Diff between perlunicode 5.36.0 and 5.10.0

1	1
2		=encoding u~~tf8~~
	2	=encoding euc-jp
3	3
4	4	=head1 NAME
5	5
6	6	=begin original
7	7
8	8	perlunicode - Unicode support in Perl
9	9
10	10	=end original
11	11
12	12	perlunicode - Perl における Unicode サポート
13	13
14	14	=head1 DESCRIPTION
15	15
16		=begin original
17
18		If you haven't already, before reading this document, you should become
19		familiar with both L<perlunitut> and L<perluniintro>.
20
21		=end original
22
23		もしまだなら、この文書を読む前に、L<perlunitut> と L<perluniintro> に
24		親しんでおく方が良いでしょう。
25
26		=begin original
27
28		Unicode aims to B<UNI>-fy the en-B<CODE>-ings of all the world's
29		character sets into a single Standard. For quite a few of the various
30		coding standards that existed when Unicode was first created, converting
31		from each to Unicode essentially meant adding a constant to each code
32		point in the original standard, and converting back meant just
33		subtracting that same constant. For ASCII and ISO-8859-1, the constant
34		is 0. For ISO-8859-5 (Cyrillic), the constant is 864; for Hebrew
35		(ISO-8859-8), it's 1264; Thai (ISO-8859-11), 3424; and so forth. This
36		made it easy to do the conversions, and facilitated the adoption of
37		Unicode.
38
39		=end original
40
41		Unicode は世界中の全ての文字集合のエンコーディング(en-B<CODE>-ings) を
42		一つの標準に統合(B<UNI>-fy)することを目標としています。
43		Unicode が最初に作られたときに存在していたいくつかの符号標準に
44		ついては、それぞれから Unicode への変換は、元の標準のそれぞれの符号位置に
45		ある定数を足すことで、
46		逆変換は単に同じ定数を引くことでした。
47		ASCII と ISO-8859-1 では、定数は 0 です。
48		ISO-8859-5 (キリル文字) では、定数は 864 です;
49		ヘブライ文字 (ISO-8859-8) では、これは 1264 です;
50		タイ (ISO-8859-11) は 3424、などです。
51		これは変換を容易にし、Unicode の採用を促進しました。
52
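The constant-offset rule described above can be checked with a short script. This is only a sketch: it assumes the core C<Encode> module is available, and the byte ranges it tests (the letter ranges of each charset) are chosen for illustration, since a few symbols in each legacy charset do not follow the rule.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Offsets quoted in the text: legacy byte value + offset = Unicode code point.
# The byte ranges are an assumption of this sketch; each charset also holds a
# few symbols (for example NO-BREAK SPACE at 0xA0) that do not follow the rule.
my %charset = (
    'iso-8859-5'  => { offset => 864,  first => 0xB0, last => 0xEF },  # Cyrillic letters
    'iso-8859-11' => { offset => 3424, first => 0xA1, last => 0xDA },  # Thai letters and signs
);

my $mismatches = 0;
for my $name (sort keys %charset) {
    my $c = $charset{$name};
    for my $byte ($c->{first} .. $c->{last}) {
        # Decode one byte with Encode and compare it with the arithmetic rule.
        my $decoded = ord decode($name, chr $byte);
        $mismatches++ if $decoded != $byte + $c->{offset};
    }
}
print "mismatches: $mismatches\n";
```

Subtracting the same offset from the Unicode code point converts back to the legacy byte value for the characters in these ranges.
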
53		=begin original
54
55		And it worked; nowadays, those legacy standards are rarely used. Most
56		everyone uses Unicode.
57
58		=end original
59
60		そしてこれはうまくいきました; 最近は、これらの昔の標準はめったに使われません。
61		ほとんどみんなが Unicode を使います。
62
63		=begin original
64
65		Unicode is a comprehensive standard. It specifies many things outside
66		the scope of Perl, such as how to display sequences of characters. For
67		a full discussion of all aspects of Unicode, see
68		L<https://www.unicode.org>.
69
70		=end original
71
72		Unicode は包括的な標準です。
73		これは、文字の並びをどのように表示するかといった、Perl のスコープの
74		範囲外の多くのことを規定します。
75		Unicode のあらゆる側面に関する完全な議論については、
76		L<https://www.unicode.org> を参照してください。
77
78	16	=head2 Important Caveats
79	17
80	18	(重要な警告)
81	19
82	20	=begin original
83	21
84		Even though some of this section may not be understandable to you on
85		first reading, we think it's important enough to highlight some of the
86		gotchas before delving further, so here goes:
87
88		=end original
89
90		この節の一部は最初に読んだときには理解できないかもしれませんが、
91		さらに掘り下げる前にいくつかの癖について強調することは重要だと考えたので、
92		ここで行います:
93
94		=begin original
95
96	22	Unicode support is an extensive requirement. While Perl does not
97	23	implement the Unicode standard or the accompanying technical reports
98	24	from cover to cover, Perl does support many Unicode features.
99	25
100	26	=end original
101	27
102		Unicode サポートは大規模な要求です。
	28	Uncode サポートは大規模な要求です。
103	29	Perl は標準 Unicode や付随する技術的なレポートを一つ残らず
104	30	実装しているわけではありませんが、多くの Unicode 機能を
105	31	サポートしています。
106	32
107	33	=begin original
108	34
109		Also, the use of Unicode ~~may prese~~nt secur~~ity~~ issues that are~~n't~~
	35	People who want to learn to use Unicode in Perl, should probably read
110		~~obvious, see~~ L</Securi~~ty Im~~plications of Unicode~~> below.~~
	36	L<the Perl Unicode tutorial\|perlunitut> before reading this reference
	37	document.
111	38
112	39	=end original
113	40
114		~~また、~~Unicode を使うと~~、明らかではな~~い~~セキュ~~リ~~ティ問題が姿~~を~~現すかも~~
	41	Perl で Unicode を使うことを学びたい人は、多分このリファレンスを読む前に
115		~~知れませ~~ん;
	42	L<the Perl Unicode tutorial\|perlunitut> を読んだ方がよいでしょう。
116		後述の L</Security Implications of Unicode> を参照してください。
117	43
118	44	=over 4
119	45
120		=item ~~Safes~~t if you C<use fea~~tur~~e ~~'unicode_st~~r~~ing~~s'>
	46	=item Input and Output Layers
121	47
122		(~~C<use feature 'unicode_strings'> とすれば一番安全~~)
	48	(入出力層)
123	49
124	50	=begin original
125	51
126		~~In ord~~er to pre~~serve~~ ba~~ckw~~ard ~~compatibi~~lit~~y, P~~erl does no~~t tur~~n
	52	Perl knows when a filehandle uses Perl's internal Unicode encodings
127		on f~~ull~~ inte~~rnal~~ Unicode s~~upport~~ unless th~~e pragma~~
	53	(UTF-8, or UTF-EBCDIC if in EBCDIC) if the filehandle is opened with
128		~~L<S<C<us~~e feature 'unicod~~e_str~~ings~~'>>\|fe~~a~~tur~~e~~/The~~ ~~'uni~~code~~_st~~r~~ings' f~~eature>
	54	the ":utf8" layer. Other encodings can be converted to Perl's
129		~~is sp~~ec~~ifie~~d. ~~(Th~~is is aut~~oma~~t~~icall~~y
	55	encoding on input or from Perl's encoding on output by use of the
130		selected i~~f you S<C<use v5~~.~~12>>~~ or ~~high~~er.) ~~Failur~~e to ~~do this ca~~n
	56	":encoding(...)" layer. See L<open>.
131		trigger unexpected surprises. See L</The "Unicode Bug"> below.
132	57
133	58	=end original
134	59
135		~~後方互換性を維持するために、~~Perl は
	60	Perl は、ファイルハンドルが ":utf8" 層を指定してオープンされると、
136		~~L<S<C<use~~ fe~~atu~~re ~~'unicode_strings'>>\|feature/The~~ 'unicode~~_strings'~~ ~~feature> プラ~~グ~~マが指定されない限り~~
	61	ファイルハンドルが Perl の内部 Unicode エンコーディング
137		~~完全な内部~~ U~~nicode~~ 対応を~~オンにし~~ません。
	62	(UTF-8, または EBCDIC の時は UTF-EBCDIC) を使うことが分かります。
138		~~(これ~~は ~~S<C<us~~e v5.~~12>>~~ 以上を使うと~~自動的に選択されます。)~~
	63	その他のエンコーディングは、":encoding(...)" 層を使うことで、
139		~~こうする~~の~~に失敗すると予測できない驚きを引き起こすかも知れません。~~
	64	入力時の Perl のエンコーディングへの変換や出力時の Perl の
140		~~後述する L</The "Unicode Bug">~~ を~~参照してください~~。
	65	エンコーディングからの変換を行えます。
	66	L<open> を参照してください。
141	67
142	68	=begin original
143	69
144		T~~his pragma d~~oesn't a~~ffec~~t ~~I/O. No~~r does it ~~cha~~nge the int~~ernal~~
	70	To indicate that Perl source itself is in UTF-8, use C<use utf8;>.
145		representation of strings, only their interpretation. There are still
146		several places where Unicode isn't fully supported, such as in
147		filenames.
148	71
149	72	=end original
150	73
151		この~~プラグマは~~ ~~I/O~~ には~~影響しません。~~
	74	Perl のソース自身が UTF-8 であることを示すには、C<use utf8;> を
152		~~また、文字列の内部表現も変更しません; その解釈~~だ~~けです~~。
	75	使ってください。
153		ファイル名のように Unicode に完全に対応していない場所がいくつかあります。
154	76
155		=item ~~Inp~~ut and ~~Out~~p~~ut Laye~~rs
	77	=item Regular Expressions
156	78
157		(~~入出力層~~)
	79	(正規表現)
158	80
159	81	=begin original
160	82
161		Use the ~~C<:~~encodi~~ng(...)>~~ layer to read from ~~and w~~rite to
	83	The regular expression compiler produces polymorphic opcodes. That is,
162		~~file~~handles ~~using~~ the ~~specifie~~d encodi~~ng.~~ (See L<open~~>.)~~
	84	the pattern adapts to the data and automatically switches to the Unicode
	85	character scheme when presented with data that is internally encoded in
	86	UTF-8 -- or instead uses a traditional byte scheme when presented with
	87	byte data.
163	88
164	89	=end original
165	90
166		~~特定のエン~~コ~~ーディ~~ン~~グを使ってファ~~イ~~ルハン~~ド~~ルと読み書き~~す~~るには、~~
	91	正規表現コンパイラは多態的なオペコードを生成します。
167		~~C<:encoding(...)> 層を使っ~~てください。
	92	つまり、パターンはデータに対して適応し、データが内部で UTF-8 で
168		~~(L<~~open> ~~を参照してください。)~~
	93	エンコードされている場合には Unicode 文字スキームに自動的に
	94	切り替わります -- さもなければ、バイトデータで表されている場合には
	95	伝統的なバイトスキームが使われます。
169	96
170		=item You must convert your n~~on-ASCII,~~ ~~non-~~UTF-8 ~~Perl~~ scripts ~~to be~~
	97	=item C<use utf8> still needed to enable UTF-8/UTF-EBCDIC in scripts
171		UTF-8.
172	98
173		(非 ASCII、非 UTF-8 Perl スクリプトは UTF-8 に変換しなければなりません)
174
175	99	=begin original
176	100
177		~~The~~ ~~L<en~~codi~~ng>~~ m~~odul~~e has ~~been dep~~recated s~~inc~~e ~~perl 5.1~~8 and the
	101	As a compatibility measure, the C<use utf8> pragma must be explicitly
178		~~perl~~ internals it ~~requ~~i~~res~~ have been r~~emoved~~ with perl ~~5.26.~~
	102	included to enable recognition of UTF-8 in the Perl scripts themselves
	103	(in string or regular expression literals, or in identifier names) on
	104	ASCII-based machines or to recognize UTF-EBCDIC on EBCDIC-based
	105	machines. B<These are the only times when an explicit C<use utf8>
	106	is needed.> See L<utf8>.
179	107
180	108	=end original
181	109
182		~~L<encoding>~~ ~~モジュ~~ールは perl ~~5.18 から廃止予定で、~~
	110	互換性のために、ASCII ベースのマシンにおいて Perl スクリプトそれ自身の
183		~~これが要求している perl~~ の~~内部は~~ ~~perl~~ ~~5.26~~ で~~削除されました。~~
	111	中の UTF-8 を(文字列や正規表現リテラル、あるいは変数名で) 認識可能に
	112	するためや、EBCDIC ベースのマシンで UTF-EBCDIC を認識させるために
	113	C<use utf8> プラグマを明示的に含めなければなりません。
	114	B<これらは明示的に C<use utf8> が必要な唯一の場合です。>
	115	L<utf8> を参照してください。
184	116
185		=item ~~C<us~~e ~~utf8>~~ still needed ~~to enable L<~~UTF-~~8\|/Unicode~~ ~~Encoding~~s~~> in s~~cripts
	117	=item BOM-marked scripts and UTF-16 scripts autodetected
186	118
187		(スクリプト内で L<UTF-8\|/Unicode Encodings> を有効にするには、まだ C<use utf8> が必要です)
188
189	119	=begin original
190	120
191		If ~~your~~ Perl script is it~~self~~ encoded in L<UTF-~~8\|/Unicode~~ E~~ncodings>~~,
	121	If a Perl script begins marked with the Unicode BOM (UTF-16LE, UTF16-BE,
192		~~the~~ ~~S<C<use utf~~8>> pr~~agma~~ ~~must~~ be ~~expli~~citly in~~clud~~ed to e~~nabl~~e
	122	or UTF-8), or if the script looks like non-BOM-marked UTF-16 of either
193		re~~cog~~ni~~tion of th~~a~~t (i~~n string or regular ~~exp~~ression literals, ~~or i~~n
	123	endianness, Perl will correctly read in the script as Unicode.
194		ide~~ntifier name~~s). B<T~~his~~ is the ~~only~~ time when an e~~xpl~~icit ~~S<C<us~~e
	124	(BOMless UTF-8 cannot be effectively recognized or differentiated from
195		~~utf~~8>> is needed.~~> (See L<utf8>~~).
	125	ISO 8859-1 or other eight-bit encodings.)
196	126
197	127	=end original
198	128
199		~~Perl スクリプト自身が L<~~U~~TF-8\|/U~~nicode E~~ncodings>~~ で
	129	Unicode BOM (UTF-16LE, UTF16-BE, またはUTF-8)で Perl スクリプトが
200		~~エンコードされ~~てい~~る場合~~、~~Perl~~ スクリプト~~それ自身の~~
	130	始まっていたり、スクリプトが BOM がついていない
201		中を(~~文字列や正規表現リテラル、ある~~い~~は変数名で~~) ~~認識可能に~~
	131	UTF-16(BE か LE のいずれか) であった場合、Perl はそのスクリプトを
202		~~するために、C<us~~e ~~utf8>~~ ~~プラグマを明示的に含め~~な~~ければなりません。~~
	132	Unicode であるとして正しく読み込みます(BOM がない UTF-8 は、
203		~~B<これは明示~~的に ~~C<use~~ ~~utf~~8> ~~が必要~~な唯一の~~場合です。>~~
	133	効果的に ISO 8859-1 などの 8 ビットエンコーディングと区別したり
204		~~(L<utf8> を参照してください~~。)
	134	認識することができません。)
205	135
	136	=item C<use encoding> needed to upgrade non-Latin-1 byte strings
	137
206	138	=begin original
207	139
208		If ~~a P~~erl ~~scrip~~t ~~begins wi~~th the ~~byt~~es that form the ~~UTF-8~~ encod~~ing~~ of
	140	By default, there is a fundamental asymmetry in Perl's Unicode model:
209		the Unicode ~~BYTE ORDER MARK (C<BOM>,~~ see ~~L</~~Unicode ~~Encod~~ings~~>),~~ ~~tho~~se
	141	implicit upgrading from byte strings to Unicode strings assumes that
210		bytes are co~~mpl~~etely ignored.
	142	they were encoded in I<ISO 8859-1 (Latin-1)>, but Unicode strings are
	143	downgraded with UTF-8 encoding. This happens because the first 256
	144	codepoints in Unicode happen to agree with Latin-1.
211	145
212	146	=end original
213	147
214		Perl ~~スクリプトが~~ Unicode ~~のバイト順マーク~~
	148	デフォルトでは、Perl の Unicode モデルにおける基本的な非対称があります:
215		~~(BYTE~~ ~~ORDER MARK, C<BOM>, L</~~Unicode ~~Encodings> 参照)~~ の ~~UTF-8~~
	149	バイト文字列から Unicode 文字列への暗黙の昇格はその文字列が
216		エンコー~~ディングを示すバイト列で始まっ~~ている場合、
	150	I<ISO 8859-1 (Latin-1)> でエンコードされているものと仮定しますが、
217		これらの~~バイト列~~は~~完全に無視されます。~~
	151	Unicode 文字列からのダウングレードは
	152	UTF-8 エンコーディングへと行われます。
	153	これは Unicode の最初の 256 文字が Latin-1 と共通であるからです。
218	154
219		=item L<UTF-16\|/Unicode Encodings> scripts autodetected
220
221		(L<UTF-16\|/Unicode Encodings> スクリプトは自動認識されます)
222
223	155	=begin original
224	156
225		~~If a P~~erl ~~scrip~~t begins ~~wit~~h the Unicode ~~C<BOM> (UTF-16LE,~~
	157	See L</"Byte and Character Semantics"> for more details.
226		UTF16-BE), or if the script looks like non-C<BOM>-marked
227		UTF-16 of either endianness, Perl will correctly read in the script as
228		the appropriate Unicode encoding.
229	158
230	159	=end original
231	160
232		~~しかし、Unicode~~ C<B~~OM>~~ ~~(UTF-16LE,~~ ~~UTF16-BE)で P~~erl ~~スクリプトが~~
	161	詳細は L</"Byte and Character Semantics"> を参照してください。
233		始まっていたり、スクリプトが C<BOM> がついていない
234		UTF-16(BE か LE のいずれか) であった場合、Perl はそのスクリプトを
235		適切な Unicode エンコーディングとして正しく読み込みます。
236	162
237	163	=back
238	164
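The C<:encoding(...)> layer mentioned in the list above can be illustrated with a round trip through an in-memory filehandle. This is a minimal sketch, not part of either version of the document: the sample string and the variable names are made up for illustration, and the non-ASCII characters are written with C<\x{...}> escapes so that the script does not depend on its own source encoding.

```perl
use strict;
use warnings;

# A string of 7 characters; two of them are outside ASCII.
my $text = "na\x{EF}ve \x{263A}";

# Write through an :encoding(UTF-8) layer into an in-memory file.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "open for write: $!";
print {$out} $text;
close $out or die "close: $!";

# $buffer now holds encoded bytes, so its length is larger than the
# number of characters in $text.
printf "bytes written:   %d\n", length $buffer;

# Reading back through the same layer decodes the bytes into characters.
open my $in, '<:encoding(UTF-8)', \$buffer or die "open for read: $!";
my $round = <$in>;
close $in;
printf "characters read: %d\n", length $round;
```

The layer does the conversion at the I/O boundary, so the rest of the program only ever sees characters.
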
239	165	=head2 Byte and Character Semantics
240	166
241	167	(バイトと文字のセマンティクス)
242	168
243	169	=begin original
244	170
245		Be~~fore U~~ni~~code,~~ ~~mos~~t e~~ncod~~ings use~~d 8 bit~~s ~~(a sin~~gle byte) to ~~encode~~
	171	Beginning with version 5.6, Perl uses logically-wide characters to
246		~~each cha~~r~~act~~er~~. Thu~~s ~~a charac~~ter was ~~a by~~te~~, a~~nd a by~~te was a~~
	172	represent strings internally.
247		character, and there could be only 256 or fewer possible characters.
248		"Byte Semantics" in the title of this section refers to
249		this behavior. There was no need to distinguish between "Byte" and
250		"Character".
251	173
252	174	=end original
253	175
254		~~Unicode 以前、ほとんどのエンコ~~ーディングは~~それぞれの~~文字の~~エンコードに~~
	176	バージョン 5.6 から、Perl は論理的なワイド文字を内部的な文字列の
255		~~8 ビット (1 バイト) を~~使っていました。
	177	表現のために使っています。
256		従って文字はバイトであり、バイトは文字であり、可能な文字は 256 文字
257		以下でした。
258		この章のタイトルである「バイトのセマンティクス」は、この振る舞いを
259		示しています。
260		「バイト」と「文字」を区別する必要はありませんでした。
261	178
262	179	=begin original
263	180
264		~~The~~n al~~ong~~ come~~s Un~~icode which ~~has~~ ~~room~~ for over ~~a m~~i~~llion charac~~t~~ers~~
	181	In future, Perl-level operations will be expected to work with
265		(a~~nd Pe~~rl a~~llows for~~ e~~ven mo~~r~~e). Thi~~s ~~means that a cha~~racter may
	182	characters rather than bytes.
266		require more than a single byte to represent it, and so the two terms
267		are no longer equivalent. What matter are the characters as whole
268		entities, and not usually the bytes that comprise them. That's what the
269		term "Character Semantics" in the title of this section refers to.
270	183
271	184	=end original
272	185
273		~~それから~~、~~100 万文字以上を扱える (そして~~ Perl はもっと扱える~~) Unicode が~~
	186	将来は、Perl レベルの操作はバイトではなく文字に対して働くことになるでしょう。
274		登場します。
275		これは、一つの文字を表現するのに複数バイトが必要になる場合があり、
276		二つの用語はもはや等価ではないということを意味します。
277		問題となるのはエンティティ全体としての文字であり、通常はそれを構成する
278		バイトではありません。
279		これが、この章のタイトルにある「文字セマンティクス」が指しているものです。
280	187
281	188	=begin original
282	189
283		Perl had ~~to ch~~ange inter~~nally~~ ~~to de~~coup~~le "~~byt~~es"~~ ~~fro~~m ~~"ch~~ar~~act~~ers".
	190	However, as an interim compatibility measure, Perl aims to
284		It is important that you t~~oo chang~~e ~~your id~~eas, ~~if y~~ou ha~~ven't al~~rea~~dy,~~
	191	provide a safe migration path from byte semantics to character
285		s~~o that "byt~~e" an~~d "charac~~t~~er"~~ no longer mean the ~~sam~~e ~~thi~~ng in your
	192	semantics for programs. For operations where Perl can unambiguously
286		mind.
	193	decide that the input data are characters, Perl switches to
	194	character semantics. For operations where this determination cannot
	195	be made without additional information from the user, Perl decides in
	196	favor of compatibility and chooses to use byte semantics.
287	197
288	198	=end original
289	199
290		Perl は~~、「バイト」と「文字」から切り離すために内部を変更する~~
	200	しかしながら、一時的な互換性の措置として、Perl は
291		~~必要がありました。~~
	201	プログラムに対するバイトセマンティクスから文字セマンティクスへの
292		あな~~たの頭の中で「バイト」~~と~~「文字」はもはや同じもの~~を
	202	安全な移行パスを提供することを目指します。
293		~~意味しないように、(もしまだなら)考え方を変え~~ることが重要です。
	203	入力データが文字であると Perl が曖昧さなく決定できる操作については、
	204	Perl は文字セマンティクスに切り替えます。
	205	ユーザーからの付加的な情報抜きに決定することができない操作については
	206	Perl は互換性の観点からバイトセマンティクスを選択します。
294	207
295	208	=begin original
296	209
297		The basic building ~~block~~ ~~of P~~erl strings ~~has~~ ~~always b~~e~~en a "cha~~r~~acter".~~
	210	This behavior preserves compatibility with earlier versions of Perl,
298		The ch~~anges~~ ba~~sica~~ll~~y c~~ome d~~own~~ to that the imple~~ment~~ation no lon~~ger~~
	211	which allowed byte semantics in Perl operations only if
299		~~thi~~nks that a char~~act~~er is a~~lway~~s just a single ~~byte.~~
	212	none of the program's inputs were marked as being a source of Unicode
	213	character data. Such data may come from filehandles, from calls to
	214	external programs, from information provided by the system (such as %ENV),
	215	or from literals and constants in the source text.
300	216
301	217	=end original
302	218
303		Perl の~~文字列~~の~~基礎要素は常に「文字」で~~した。
	219	この動作は Perl の以前のバージョンとの互換性を維持し、プログラムの
304		~~変更は基本的に、実装はもはや文字~~が常に 1 ~~バイト~~であるとは
	220	入力が Unicode の文字データのソースであるとマークされていない場合にのみ
305		~~考えないということ~~です。
	221	Perl の操作でバイトセマンティクスを許可します。
	222	そのようなデータは、ファイルハンドル、外部プログラムの呼び出し、
	223	システムから提供される情報(%ENV のような)、ソーステキスト中のリテラルや
	224	定数といったものからくるものです。
306	225
307	226	=begin original
308	227
309		There are various ~~thin~~gs to note:
	228	The C<bytes> pragma will always, regardless of platform, force byte
	229	semantics in a particular lexical scope. See L<bytes>.
310	230
311	231	=end original
312	232
313		~~記しておくべき様々なこ~~と~~があります:~~
	233	C<bytes> プラグマは常に、プラットフォームとは無関係に、特定の
	234	レキシカルスコープにおいてバイトセマンティクスを強制します。
	235	L<bytes> を参照してください。
314	236
315		=over 4
316
317		=item *
318
319	237	=begin original
320	238
321		~~String~~ h~~andling~~ funct~~ions,~~ for ~~the~~ most part, conti~~nue~~ to operate in
	239	The C<utf8> pragma is primarily a compatibility device that enables
322		terms of ~~characters. C<length~~()>, ~~for~~ example, returns the ~~numb~~er of
	240	recognition of UTF-(8\|EBCDIC) in literals encountered by the parser.
323		~~charac~~ters in a stri~~ng, ju~~st as ~~bef~~ore~~. B~~ut that ~~numb~~er no lo~~nger~~ is
	241	Note that this pragma is only required while Perl defaults to byte
324		~~nece~~ssarily the same as the number o~~f byt~~es in the st~~ring~~ (th~~ere~~ ma~~y be~~
	242	semantics; when character semantics become the default, this pragma
325		m~~ore~~ bytes than ~~characters)~~. The ~~oth~~er su~~ch func~~t~~ions include~~
	243	may become a no-op. See L<utf8>.
326		C<chop()>, C<chomp()>, C<substr()>, C<pos()>, C<index()>, C<rindex()>,
327		C<sort()>, C<sprintf()>, and C<write()>.
328	244
329	245	=end original
330	246
331		~~文字列処理関数~~は、ほと~~んどの場合、引き続き文字に関~~して~~動作しま~~す。
	247	C<utf8> プラグマは主としてパーサが遭遇するリテラル中の UTF-(8\|EBCDIC) の
332		~~たとえば C<~~l~~eng~~t~~h()>~~ ~~は、以前と同じように文字列内の文字の数を返しま~~す。
	248	認識を有効にする互換デバイス(compatibility device)です。
333		~~しかし、そ~~の数は~~もはや文字列内~~のバイト~~数と常に同じ~~ではあ~~りません~~
	249	このプラグマは Perl のデフォルトがバイトセマンティクスであるときにのみ
334		~~(文字数よりもバイト数が多い場合が~~あ~~ります)~~。
	250	必要であることに注意してください。
335		~~その他のそのような関数~~には、
	251	文字セマンティクスがデフォルトである場合には、
336		~~C<chop()>, C<chomp()>, C<substr()>, C<pos()>, C<index()>, C<rindex()>,~~
	252	このプラグマは何もしません。
337		C<~~sor~~t~~()>, C<sprint~~f()>, ~~and C<write()> があります~~。
	253	L<utf8> を参照してください。
338	254
339	255	=begin original
340	256
341		The exceptions are:
	257	Unless explicitly stated, Perl operators use character semantics
	258	for Unicode data and byte semantics for non-Unicode data.
	259	The decision to use character semantics is made transparently. If
	260	input data comes from a Unicode source--for example, if a character
	261	encoding layer is added to a filehandle or a literal Unicode
	262	string constant appears in a program--character semantics apply.
	263	Otherwise, byte semantics are in effect. The C<bytes> pragma should
	264	be used to force byte semantics on Unicode data.
342	265
343	266	=end original
344	267
345		例外は:
	268	明示的に指定されない限り、Perl の演算子は Unicode データに対しては
	269	文字セマンティクスを用い、非 Unicode データに対しては
	270	バイトセマンティクスを用います。
	271	文字セマンティクスの使用の決定はトランスペアレントに行われます。
	272	もし入力データが Unicode ソースから来たもの -- たとえば、
	273	文字エンコーディング層がファイルハンドルに附加されているか
	274	リテラルの Unicode 文字列定数がプログラムの中にある -- のであれば
	275	文字セマンティクスが適用されます。
	276	そうでなければ、バイトセマンティクスが有効になります。
	277	C<bytes> プラグマは Unicode データに対してバイトセマンティクスを
	278	強制するときに使うと良いでしょう。
346	279
347		=over 4
348
349		=item *
350
351	280	=begin original
352	281
353		the bit-oriented ~~C<ve~~c>
	282	If strings operating under byte semantics and strings with Unicode
	283	character data are concatenated, the new string will be created by
	284	decoding the byte strings as I<ISO 8859-1 (Latin-1)>, even if the
	285	old Unicode string used EBCDIC. This translation is done without
	286	regard to the system's native 8-bit encoding.
354	287
355	288	=end original
356	289
357		ビット単位の ~~C<ve~~c>
	290	バイトセマンティクスの下で動作している文字列と Unicode 文字データを持つ
	291	文字列が連結される場合、新たな文字列は、古い Unicode 文字列が
	292	EBCDIC を使っていたとしても、バイト文字列を I<ISO 8859-1 (Latin-1)> として
	293	デコードすることで作成されます。
	294	この変換はシステムのネイティブな 8 ビットエンコーディングとは
	295	無関係に行われます。
358	296
359		E<nbsp>
360
361		=item *
362
363	297	=begin original
364	298
365		the byte-oriented ~~C<p~~a~~ck>/C<u~~npa~~ck>~~ ~~C<"C">~~ format
	299	Under character semantics, many operations that formerly operated on
	300	bytes now operate on characters. A character in Perl is
	301	logically just a number ranging from 0 to 2**31 or so. Larger
	302	characters may encode into longer sequences of bytes internally, but
	303	this internal detail is mostly hidden for Perl code.
	304	See L<perluniintro> for more.
366	305
367	306	=end original
368	307
369		バイト単位の ~~C<pack>/C<unpack> C<"C"> フォーマット~~
	308	文字セマンティクスの元では、伝統的にバイトに対して働いていた操作の多くが
	309	文字に対して働きます。
	310	Perl における文字は論理的には 0 から 2**31 までの範囲の数値です。
	311	大きな文字は内部的にはより長いシーケンスにエンコードされる可能性が
	312	ありますが、この内部の詳細は Perl プログラムからほとんど隠されています。
	313	詳細は L<perluniintro> を参照してください。
370	314
371		=be~~gin~~ or~~igin~~al
	315	=head2 Effects of Character Semantics
372	316
373		~~However, the C<W> specifier does operate on whole characters, as does the~~
	317	(文字セマンティクスの効果)
374		C<U> specifier.
375	318
376		=end original
377
378		しかし、C<W> 指示子は C<U> 指示子と同様、文字全体を操作します。
379
380		=item *
381
382	319	=begin original
383	320
384		~~some ope~~rators tha~~t i~~nt~~era~~ct ~~wit~~h the ~~plat~~fo~~rm's~~ o~~perat~~ing ~~syst~~em
	321	Character semantics have the following effects:
385	322
386	323	=end original
387	324
388		~~プラットフォームのオペレー~~ティ~~ングシ~~ス~~テムと相互作用する一部~~の~~演算子~~
	325	文字セマンティクスは以下の効果を持っています:
389	326
390		=~~begin~~ or~~iginal~~
	327	=over 4
391	328
392		Operators dealing with filenames are examples.
393
394		=end original
395
396		例としてはファイル名を扱う演算子です。
397
398	329	=item *
399	330
400	331	=begin original
401	332
402		when the functions are called from within the scope of the
403		S<C<L<use bytes\|bytes>>> pragma
404
405		=end original
406
407		関数が S<C<L<use bytes\|bytes>>> プラグマのスコープ内から呼び出された場合
408
409		=begin original
410
411		Likely, you should use this only for debugging anyway.
412
413		=end original
414
415		おそらく、これはデバッグのためだけに行うべきです。
416
417		=back
418
419		=item *
420
421		=begin original
422
423	333	Strings--including hash keys--and regular expression patterns may
424		contain characters that have ordinal values larger than 255.
	334	contain characters that have an ordinal value larger than 255.
425	335
426	336	=end original
427	337
428	338	文字列 -- ハッシュのキーを含め -- と正規表現パターンは序数値として 255 を
429	339	超える値を持つ文字を含めることができます。
430	340
431	341	=begin original
432	342
433	343	If you use a Unicode editor to edit your program, Unicode characters may
434	344	occur directly within the literal strings in UTF-8 encoding, or UTF-16.
435		(The former requires a C<use utf8>, the latter ~~may~~ require a C<BOM>.)
	345	(The former requires a BOM or C<use utf8>, the latter requires a BOM.)
436	346
437	347	=end original
438	348
	349
439	350	プログラムを編集するのに Unicode エディタを使っているのであれば、Unicode の
440	351	文字を UTF-8 か UTF-16 のエンコーディングでリテラル文字列に
441	352	含めることができます。
442		(前者は C<use utf8> を必要とし、後者は C<BOM> を必要と~~するかも~~しれません。)
	353	(前者は BOM か C<use utf8> を必要とし、後者は BOM を必要とします。)
443	354
444	355	=begin original
445	356
446		~~L<perluniintro/Creating~~ Unicode> ~~giv~~es other ways to place non~~-AS~~CII
	357	Unicode characters can also be added to a string by using the C<\x{...}>
447		cha~~rac~~ters in your strin~~gs.~~
	358	notation. The Unicode code for the desired character, in hexadecimal,
	359	should be placed in the braces. For instance, a smiley face is
	360	C<\x{263A}>. This encoding scheme works for all characters, but
	361	for characters under 0x100, note that Perl may use an 8 bit encoding
	362	internally, for optimization and/or backward compatibility.
448	363
449	364	=end original
450	365
451		~~L<perluniintro/Creating~~ Unicode> は、文字~~列に非~~ ASCII 文字~~を置くための~~
	366	Unicode の文字は C<\x{...}> 表記を使うことにより文字列に
452		~~その他の方法を提供し~~ます。
	367	追加することもできます。
	368	その表現される Unicode コードは、16 進でブレースに囲みます。
	369	たとえば、smiley face は C<\x{263A}> です。
	370	このエンコーディングスキームは全ての文字で動作しますが、0x100 未満の
	371	文字については、Perl は最適化や後方互換性のために内部で 8 ビット
	372	エンコーディングを使うかもしれないことに注意してください。
453	373
454		=item *
455
456	374	=begin original
457	375
458		~~The C<chr()> an~~d ~~C<or~~d~~()> func~~tions ~~work~~ o~~n whole characters.~~
	376	Additionally, if you
459	377
460	378	=end original
461	379
462		~~C<chr()> 関数と C<ord()> 関数は文字全体~~に対して~~働きます。~~
	380	これに加えて、
463	381
464		~~=it~~em *
	382	use charnames ':full';
465	383
466	384	=begin original
467	385
468		~~Reg~~ular e~~xpr~~essions match whole cha~~racters. For examp~~le, ~~C<"."> mat~~ches
	386	you can use the C<\N{...}> notation and put the official Unicode
469		~~a whole~~ character instead of ~~only~~ a s~~ingle~~ ~~byte~~.
	387	character name within the braces, such as C<\N{WHITE SMILING FACE}>.
470	388
471	389	=end original
472	390
473		正規表現は文字~~全体にマッチします。~~
	391	とすると C<\N{...}> 表記を使うことができ、公式な Unicode 文字名を
474		~~例えば、~~C<~~".">~~ は 1 ~~バイトだけではなく、ひとつ~~の~~文字全体~~に~~マッチし~~ます。
	392	C<\N{WHITE SMILING FACE}> のようにブレースの中に置くことができます。
475	393
476	394	=item *
477	395
478	396	=begin original
479	397
480		~~The~~ ~~C<tr///>~~ operator tra~~nsla~~tes ~~whol~~e c~~harac~~ters. ~~(No~~t~~e t~~hat the
	398	If an appropriate L<encoding> is specified, identifiers within the
481		~~C<t~~r~~///CU>~~ ~~fun~~ctionality ha~~s bee~~n remove~~d. Fo~~r si~~mil~~ar ~~fun~~ctionali~~ty to~~
	399	Perl script may contain Unicode alphanumeric characters, including
482		that, see ~~C<pack('U0',~~ ~~...)> a~~nd C<pac~~k('C0',~~ ~~...)>).~~
	400	ideographs. Perl does not currently attempt to canonicalize variable
	401	names.
483	402
484	403	=end original
485	404
486		C<~~tr///~~> 演算子~~は文字全体を変換します。~~
	405	適切な L<encoding> が指定されていれば、Perl スクリプトの中の識別子で
487		~~C<tr///C~~U> ~~は削除された~~こと~~に注意してください~~。
	406	表意文字を含めた Unicode の英数字を含めることができます。
488		~~(これと同様のこと~~を行う~~には C<pack('U0', ...)>~~ と ~~C<pack('C0', ...)> を~~
	407	Perl は現在、変数名を正規化しようとはしません。
489		参照してください。)
490	408
491	409	=item *
492	410
493	411	=begin original
494	412
495		~~C<sca~~lar reverse~~()> rever~~ses by character rathe~~r th~~an by byte.
	413	Regular expressions match characters instead of bytes. "." matches
	414	a character instead of a byte.
496	415
497	416	=end original
498	417
499		~~C<scalar reverse()>~~ はバイト単位ではなく文字~~単位で~~
	418	正規表現はバイトではなく文字にマッチします。
500		~~反転を行い~~ます。
	419	"." は一バイトではなく、ひとつの文字にマッチします。
501	420
502	421	=item *
503	422
504	423	=begin original
505	424
506		The ~~bit~~ string operators, ~~C<& \| ^ ~>~~ a~~nd (s~~tart~~ing~~ in ~~v5.22)~~
	425	Character classes in regular expressions match characters instead of
507		~~C<&.~~ \|. ~~^. ~.>~~ can operate o~~n bit st~~rings encoded in ~~UTF-8, bu~~t this
	426	bytes and match against the character properties specified in the
508		can give ~~unex~~pecte~~d re~~sults if any ~~of th~~e s~~trings~~ ~~con~~t~~ain~~ c~~ode~~ points
	427	Unicode properties database. C<\w> can be used to match a Japanese
509		~~above 0xFF. Start~~ing ~~in v5.28~~, ~~it is a~~ f~~atal err~~or t~~o h~~a~~ve such a~~n
	428	ideograph, for instance.
510		operand. Otherwise, the operation is performed on a non-UTF-8 copy of
511		the operand. If you're not sure about the encoding of a string,
512		downgrade it before using any of these operators; you can use
513		L<C<utf8::utf8_downgrade()>\|utf8/Utility functions>.
514	429
515	430	=end original
516	431
517		ビット文字~~列演算子~~ ~~C<& \| ^ ~> および~~
	432	正規表現中の文字クラスはバイトではなく文字にマッチし、Unicode の
518		~~(v5.22 からの) C<&. \|. ^. ~.> は UTF-8 でエンコ~~ードされ~~たビット~~文字列を
	433	特性データベースで定義されている文字特性に対してマッチを行います。
519		~~操作できますが~~、文字~~列の一部~~に ~~0xFF を超え~~る~~符号位置を含ん~~で~~いる場合、~~
	434	たとえば、C<\w> は日本語の表意文字にマッチさせるために使うことができます。
520		予想外の結果になるかもしれません。
521		v5.28 から、そのようなオペランドに対しては致命的エラーになります。
522		さもなければ、処理はオペランドの非 UTF-8 のコピーに対して行われます。
523		文字列のエンコーディングがはっきりしない場合、
524		これらの演算子を使う前に降格してください;
525		L<C<utf8::utf8_downgrade()>\|utf8/Utility functions> が使えます。
526	435
527		=~~back~~
	436	=item *
528	437
529	438	=begin original
530	439
531		The bo~~ttom~~ line is that ~~Per~~l has a~~lwa~~ys ~~practic~~ed ~~"Charact~~er ~~Semant~~i~~cs",~~
	440	Named Unicode properties, scripts, and block ranges may be used like
532		~~but wit~~h the advent ~~of Un~~i~~code,~~ that is no~~w diff~~erent than ~~"Byte~~
	441	character classes via the C<\p{}> "matches property" construct and
533		Semantics".
	442	the C<\P{}> negation, "doesn't match property".
534	443
535	444	=end original
536	445
537		~~まとめとしては、P~~erl ~~は常に「文字~~の~~意味論」で動作しますが~~、
	446	名前付けされた Unicode の特性、用字、ブロックの範囲は C<\p{}>
538		~~Uni~~code の~~登場により、これは「バイト~~の~~意味論」とは~~
	447	"matches property" 構造やその否定形の C<\P{}> "doesn't match property" を
539		~~異なるようにな~~っています。
	448	使った文字クラスで使うことができます。
540	449
541		=head2 ASCII Rules versus Unicode Rules
542
543		(ASCII 規則対 Unicode 規則)
544
545	450	=begin original
546	451
547		Be~~for~~e Unicode, when a ~~cha~~racter ~~was a by~~te was a ~~cha~~r~~act~~er,
	452	See L</"Unicode Character Properties"> for more details.
548		Perl knew only about the 128 characters defined by ASCII, code points 0
549		through 127 (except for under L<S<C<use locale>>\|perllocale>). That
550		left the code
551		points 128 to 255 as unassigned, and available for whatever use a
552		program might want. The only semantics they have is their ordinal
553		numbers, and that they are members of none of the non-negative character
554		classes. None are considered to match C<\w> for example, but all match
555		C<\W>.
556	453
557	454	=end original
558	455
559		Unicode ~~以前、文字はバイトでバイトは文字という時代は、~~
	456	さらなる詳細については L</"Unicode Character Properties"> を
560		~~Perl は ASCII で定義~~さ~~れた 128 文字、符号位置 0 から 127 に~~
	457	参照してください。
561		ついてしか知りませんでした (L<S<C<use locale>>\|perllocale> の下を除く)。
562		そのため、符号位置 128 から 255 は割り当てられておらず、プログラムが
563		望むあらゆる用途に利用可能でした。
564		それらが持つ唯一のセマンティクスは序数であり、
565		それらは否定でない文字クラスのメンバーではありません。
566		たとえば、どれも C<\w> にマッチングするとは見なされず、すべてが C<\W> に
567		マッチングします。
568	458
569	459	=begin original
570	460
571		~~Unic~~ode~~, o~~f cour~~se,~~ ~~assig~~ns each of t~~hos~~e ~~code~~ points a ~~partic~~u~~lar~~
	461	You can define your own character properties and use them
572		~~mean~~ing (along with ones above ~~255).~~ To ~~pre~~ser~~ve ba~~c~~kward~~
	462	in the regular expression with the C<\p{}> or C<\P{}> construct.
573		compatibility, Perl only uses the Unicode meanings when there is some
574		indication that Unicode is what is intended; otherwise the non-ASCII
575		code points remain treated as if they are unassigned.
576	463
577	464	=end original
578	465
579		~~Unicodeはもちろん、これら~~の~~符号位置のそれぞれに~~特定の~~意味を割り当て~~ます
	466	独自の文字特性を定義でき、C<\p{}> や C<\P{}> 構造での正規表現で使えます。
580		(255 より上も同様です)。
581		後方互換性を保つために、Perl は Unicode が意図されたものであることを示す
582		何らかの表示がある場合にのみ Unicode の意味を使用します;
583		それ以外の場合、非 ASCII 符号位置は、割り当てられていないものとして
584		扱われます。
585	467
586	468	=begin original
587	469
588		Here are the ~~ways t~~hat Perl kno~~ws that a st~~ring sho~~uld~~ be treated as
	470	See L</"User-Defined Character Properties"> for more details.
589		Unicode:
590	471
591	472	=end original
592	473
593		~~次のもの~~は~~、文字列が~~ Unicode ~~として扱われるべきと~~ Perl ~~が分かる方法です:~~
	474	更なる詳細については L</"User-Defined Character Properties"> を
	475	参照してください。
594	476
595		=over
596
597	477	=item *
598	478
599	479	=begin original
600	480
601		~~Wit~~h~~in th~~e scope ~~of S<~~C<use ut~~f8>>~~
	481	The special pattern C<\X> matches any extended Unicode
	482	sequence--"a combining character sequence" in Standardese--where the
	483	first character is a base character and subsequent characters are mark
	484	characters that apply to the base character. C<\X> is equivalent to
	485	C<(?:\PM\pM*)>.
602	486
603	487	=end original
604	488
605		S<C<use ut~~f8>>~~ の~~スコープの内側~~
	489	特殊なパターン C<\X> は拡張 Unicode シーケンス -- Standardese の
	490	"a combining character sequence" -- 最初の文字が基本となる文字で
	491	続く文字が基本文字に適用されるマーク文字にマッチします。
	492	C<\X> は C<(?:\PM\pM*)> と等価です。
606	493
607		=begin original
608
609		If the whole program is Unicode (signified by using 8-bit B<U>nicode
610		B<T>ransformation B<F>ormat), then all literal strings within it must be
611		Unicode.
612
613		=end original
614
615		プログラム全体が (8-bit B<U>nicode B<T>ransformation B<F>ormat を
616		使うことで示される) Unicode の場合、その中の全てのリテラルな文字列は
617		Unicode でなければなりません。
618
619	494	=item *
620	495
621	496	=begin original
622	497
623		~~Wit~~hin the scope of
	498	The C<tr///> operator translates characters instead of bytes. Note
624		~~L<S<C<use~~ feature 'unic~~ode_s~~trin~~gs'>>\|fe~~at~~ure/T~~he 'unicode~~_st~~r~~ings'~~ feature>
	499	that the C<tr///CU> functionality has been removed. For similar
	500	functionality see pack('U0', ...) and pack('C0', ...).
625	501
626	502	=end original
627	503
628		~~L<S<~~C<~~use fea~~tur~~e 'unicode_strings'>>\|feature~~/~~The 'unicode_strings' feature~~>
	504	C<tr///> 演算子はバイトではなく文字で変換します。
629		~~のスコープの内側~~
	505	C<tr///CU> は削除されたことに注意してください。
	506	同様のことを行うには pack('U0', ...) と pack('C0', ...) を
	507	参照してください。
630	508
631		=begin original
632
633		This pragma was created so you can explicitly tell Perl that operations
634		executed within its scope are to use Unicode rules. More operations are
635		affected with newer perls. See L</The "Unicode Bug">.
636
637		=end original
638
639		このプラグマは、このスコープ内で実行される操作は Unicode の規則が
640		使われるべきということを明示的に Perl に伝えるために作られました。
641		より新しい perl では更なる操作が影響を受けます。
642		L</The "Unicode Bug"> を参照してください。
643
644	509	=item *
645	510
646	511	=begin original
647	512
648		Within ~~the sc~~ope of ~~S<C<~~use ~~v5.12>>~~ or ~~high~~er
	513	Case translation operators use the Unicode case translation tables
	514	when character input is provided. Note that C<uc()>, or C<\U> in
	515	interpolated strings, translates to uppercase, while C<ucfirst>,
	516	or C<\u> in interpolated strings, translates to titlecase in languages
	517	that make the distinction.
649	518
650	519	=end original
651	520
652		~~S<C<us~~e ~~v5.12>> 以上~~のスコープの内側
	521	大小文字の変換演算子は Unicode の大小文字変換テーブルを、文字の入力が
	522	あったときに使用します。
	523	C<uc()> や展開文字列中の C<\U> は大文字に変換し、C<ucfirst> や
	524	展開文字列中の C<\u> はその言語で区別されているときに
	525	タイトルケースに変換します。
653	526
654		=begin original
655
656		This implicitly turns on S<C<use feature 'unicode_strings'>>.
657
658		=end original
659
660		これは暗黙に S<C<use feature 'unicode_strings'>> を有効にします。
661
662	527	=item *
663	528
664	529	=begin original
665	530
666		Within the s~~cope~~ of
	531	Most operators that deal with positions or lengths in a string will
667		~~L<S<C<~~u~~se l~~ocale ~~'no~~t_character~~s'>>\|~~p~~erll~~o~~cale/Un~~icode and ~~UTF-8>,~~
	532	automatically switch to using character positions, including
668		or L<S<C<us~~e locale~~>>\|p~~erll~~o~~cale~~> and the current
	533	C<chop()>, C<chomp()>, C<substr()>, C<pos()>, C<index()>, C<rindex()>,
669		~~locale~~ is a ~~UTF-8~~ l~~ocal~~e.
	534	C<sprintf()>, C<write()>, and C<length()>. An operator that
	535	specifically does not switch is C<vec()>. Operators that really don't
	536	care include operators that treat strings as a bucket of bits such as
	537	C<sort()>, and operators dealing with filenames.
670	538
671	539	=end original
672	540
673		~~L<S<C<use locale 'not_characters'>>\|perllocale/Unicode and UTF-8> か~~
	541	文字列の位置や長さを取り扱う演算子の大部分は自動的に文字の位置を
674		~~L<S<C<use locale>>\|perllocale> のスコープ内で、現在のロケールが~~
	542	使うように変更されました。
675		~~UTF-8~~ ~~ロケール。~~
	543	これには C<chop()>, C<chomp()>, C<substr()>, C<pos()>, C<index()>,
	544	C<rindex()>, C<sprintf()>, C<write()>, C<length()> が含まれます。
	545	C<vec()> は変更されていません。
	546	文字列をビットのバケツのように扱う C<sort()>、ファイル名を取り扱う
	547	演算子は文字かどうかを気にしません。
676	548
677		=begin original
678
679		The former is defined to imply Unicode handling; and the latter
680		indicates a Unicode locale, hence a Unicode interpretation of all
681		strings within it.
682
683		=end original
684
685		前者は Unicode の扱いを暗示し、後者は Unicode ロケールを示すので、
686		この中の全ての文字列は Unicode の解釈になります。
687
688	549	=item *
689	550
690	551	=begin original
691	552
692		When the str~~ing~~ cont~~ains~~ a Un~~icod~~e~~-only~~ code point
	553	The C<pack()>/C<unpack()> letter C<C> does I<not> change, since it is often
	554	used for byte-oriented formats. Again, think C<char> in the C language.
693	555
694	556	=end original
695	557
696		~~文字列が U~~nic~~ode~~ の~~みの符号位置を含んで~~い~~るとき~~
	558	C<pack()>/C<unpack()> の文字 C<C> は I<変更されていません>。
	559	なぜなら、これらはしばしばバイト指向の書式のために使われるからです。
	560	繰り返しますが、C 言語の C<char> を考えてください。
697	561
698	562	=begin original
699	563
700		Perl has never a~~ccep~~ted co~~de poi~~nts above ~~255~~ without the~~m being~~
	564	There is a new C<U> specifier that converts between Unicode characters
701		Unicode, so their use i~~mplie~~s ~~Unic~~ode for the whole ~~str~~ing.
	565	and code points. There is also a C<W> specifier that is the equivalent of
	566	C<chr>/C<ord> and properly handles character values even if they are above 255.
702	567
703	568	=end original
704	569
705		~~Perl は~~ Unicode ~~でない限り 255 を超える~~符号位置を~~決して受け入れ~~な~~いので、~~
	570	Unicode の文字と符号位置の間の変換を行う新たな C<U> 指定子があります。
706		~~これらを使う~~と文字~~列全体~~が ~~Unicode~~ ~~であること~~を~~暗示します。~~
	571	C<chr>/C<ord> と等価で、文字の値が 255 を超えていても適切に扱える
	572	C<W> 指定子もあります。
707	573
708	574	=item *
709	575
710	576	=begin original
711	577
712		When th~~e st~~ring contains ~~a Unic~~ode n~~amed~~ code point ~~C<\N{...}>~~
	578	The C<chr()> and C<ord()> functions work on characters, similar to
	579	C<pack("W")> and C<unpack("W")>, I<not> C<pack("C")> and
	580	C<unpack("C")>. C<pack("C")> and C<unpack("C")> are methods for
	581	emulating byte-oriented C<chr()> and C<ord()> on Unicode strings.
	582	While these methods reveal the internal encoding of Unicode strings,
	583	that is not something one normally needs to care about at all.
713	584
714	585	=end original
715	586
716		~~文字列が Uni~~code ~~の名前付き符号位置~~ C<~~\N{...}~~> ~~を含んでいるとき~~
	587	C<chr()> 関数と C<ord()> 関数は C<pack("W")> や C<unpack("W")> のように
	588	文字に対して働き、C<pack("C")> や C<unpack("C")> のようには I<働きません>。
	589	C<pack("C")> と C<unpack("C")> は Unicode 文字列においてバイト指向の
	590	C<chr()> や C<ord()> をエミュレートするためのメソッドです。
	591	これらのメソッドは Unicode 文字列の内部エンコーディングを明らかにしますが、
	592	通常それを気にする必要は全くありません。
717	593
718		=begin original
719
720		The C<\N{...}> construct explicitly refers to a Unicode code point,
721		even if it is one that is also in ASCII. Therefore the string
722		containing it must be Unicode.
723
724		=end original
725
726		C<\N{...}> 構文は、たとえ ASCII にもあるものだとしても、
727		明示的に Unicode 符号位置を参照します。
728		従ってこれを含む文字列は Unicode でなければなりません。
729
730	594	=item *
731	595
732	596	=begin original
733	597
734		When the string ~~has c~~ome from an e~~xte~~rnal so~~urce~~ marked as
	598	The bit string operators, C<& \| ^ ~>, can operate on character data.
735		~~Unic~~ode
	599	However, for backward compatibility, such as when using bit string
	600	operations when characters are all less than 256 in ordinal value, one
	601	should not use C<~> (the bit complement) with characters of both
	602	values less than 256 and values greater than 256. Most importantly,
	603	DeMorgan's laws (C<~($x\|$y) eq ~$x&~$y> and C<~($x&$y) eq ~$x\|~$y>)
	604	will not hold. The reason for this mathematical I<faux pas> is that
	605	the complement cannot return B<both> the 8-bit (byte-wide) bit
	606	complement B<and> the full character-wide bit complement.
736	607
737	608	=end original
738	609
739		文字列が ~~Unicode~~ とマー~~クされている外部ソースから来たと~~き
	610	ビット文字列演算子 C<& \| ^ ~> は文字データを操作できます。
	611	しかし、例えば全ての文字の値が 255 以下のときに
	612	ビット文字列演算を使った場合の後方互換性のために、
	613	256 以上の値の文字と 255 以下の値の文字の両方が含まれている文字列に
	614	C<~> (ビット補数) を使うべきではありません。
	615	最も重要なことは、ド・モルガンの法則 (C<~($x\|$y) eq ~$x&~$y> と
	616	C<~($x&$y) eq ~$x\|~$y>) が成り立たないということです。
	617	この数学的な I<過失> の理由は補数(complement)が 8 ビットのビット補数
	618	B<および> 文字幅のビット補数の B<両方> を返すことができないためです。
740	619
741		=begin original
742
743		The L<C<-C>\|perlrun/-C [numberE<sol>list]> command line option can
744		specify that certain inputs to the program are Unicode, and the values
745		of this can be read by your Perl code, see L<perlvar/"${^UNICODE}">.
746
747		=end original
748
749		L<C<-C>\|perlrun/-C [numberE<sol>list]> コマンドラインオプションは
750		プログラムへの特定の入力が Unicode であることを指定でき、その値は
751		Perl のコードで読み込めます; L<perlvar/"${^UNICODE}"> を参照してください。
752
753		=item * When the string has been upgraded to UTF-8
754
755		(文字列が UTF-8 に昇格したとき)
756
757		=begin original
758
759		The function L<C<utf8::upgrade()>\|utf8/Utility functions>
760		can be explicitly used to permanently (unless a subsequent
761		C<utf8::downgrade()> is called) cause a string to be treated as
762		Unicode.
763
764		=end original
765
766		L<C<utf8::upgrade()>\|utf8/Utility functions> 関数は、
767		(後に C<utf8::downgrade()> が呼び出されるまで)
768		恒久的に文字列を Unicode として扱うことを明示的に示すために使われます。
769
770		=item * There are additional methods for regular expression patterns
771
772		(正規表現パターンに追加の手法があるとき)
773
774		=begin original
775
776		A pattern that is compiled with the C<< /u >> or C<< /a >> modifiers is
777		treated as Unicode (though there are some restrictions with C<< /a >>).
778		Under the C<< /d >> and C<< /l >> modifiers, there are several other
779		indications for Unicode; see L<perlre/Character set modifiers>.
780
781		=end original
782
783		C<< /u >> や C<< /a >> の修飾子付きでコンパイルされたパターンは、
784		Unicode として扱われます (但し、C<< /a >> にはいくつかの制限があります)。
785		C<< /d >> と C<< /l >> の修飾子の下では、Unicode を示す他の方法がいくつか
786		あります; L<perlre/Character set modifiers> を参照してください。
787
788		=back
789
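The indications listed above can be compared side by side. The following is a minimal sketch, assuming it runs without `use feature 'unicode_strings'`, without `use v5.12` or higher, and without `use locale`; the variable names are illustrative only.

```perl
use strict;
use warnings;

# No indication of Unicode: native rules apply to this non-ASCII character.
my $native    = "\xE9";
my $native_uc = uc $native;          # left unchanged under native rules

# \N{...} always refers to a Unicode code point, so the string is Unicode.
my $named    = "\N{U+E9}";
my $named_uc = uc $named;            # U+00C9

# utf8::upgrade() makes an existing string be treated as Unicode.
my $upgraded = "\xE9";
utf8::upgrade($upgraded);
my $upgraded_uc = uc $upgraded;      # U+00C9

# /u forces Unicode rules for a pattern; /a restricts \w to ASCII.
my $word_default = ($native =~ /\w/)  ? 1 : 0;
my $word_u       = ($native =~ /\w/u) ? 1 : 0;
my $word_a       = ($native =~ /\w/a) ? 1 : 0;

printf "%vX %vX %vX\n", $native_uc, $named_uc, $upgraded_uc;
print "$word_default $word_u $word_a\n";
```

The same character value behaves differently only because of how Perl was told to interpret it, which is the behaviour described under L</The "Unicode Bug">.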
790		=begin original
791
792		Note that all of the above are overridden within the scope of
793		C<L<use bytes\|bytes>>; but you should be using this pragma only for
794		debugging.
795
796		=end original
797
798		前述の全ては C<L<use bytes\|bytes>> のスコープ内では上書きされます;
799		しかしこのプラグマはデバッグ用にのみ使うべきです。
800
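As a debugging aid only, the effect of `use bytes` can be observed by comparing the character length of a string with its length inside the pragma's scope. This is a small sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my $str = "\x{100}";                 # one character, LATIN CAPITAL LETTER A WITH MACRON

my $chars = length $str;             # counted in characters

my $bytes = do {
    use bytes;                       # lexically scoped: affects only this block
    length $str;                     # counted in bytes of the internal encoding
};

print "characters=$chars bytes=$bytes\n";
```

The byte count exposes the internal UTF-8 representation, which is why the pragma is best reserved for inspection rather than ordinary string handling.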
801		=begin original
802
803		Note also that some interactions with the platform's operating system
804		never use Unicode rules.
805
806		=end original
807
808		また、プラットフォームのオペレーティングシステムとの相互作用の一部は
809		決して Unicode の規則を使いません。
810
811		=begin original
812
813		When Unicode rules are in effect:
814
815		=end original
816
817		Unicode の規則が有効の場合:
818
819		=over 4
820
821	620	=item *
822	621
823	622	=begin original
824	623
825		~~Case~~ translat~~ion~~ operators ~~use~~ the ~~Unic~~o~~de case trans~~l~~ati~~on tables.
	624	lc(), uc(), lcfirst(), and ucfirst() work for the following cases:
826	625
827	626	=end original
828	627
829		~~大小文字の変換演算子は~~ Unic~~ode~~ の~~大小文字変換テーブルを使用し~~ます。
	628	lc(), uc(), lcfirst(), ucfirst() は以下の場合に働きます:
830	629
831		=~~begin~~ or~~iginal~~
	630	=over 8
832	631
833		Note that C<uc()>, or C<\U> in interpolated strings, translates to
834		uppercase, while C<ucfirst>, or C<\u> in interpolated strings,
835		translates to titlecase in languages that make the distinction (which is
836		equivalent to uppercase in languages without the distinction).
837
838		=end original
839
840		C<uc()> や展開文字列中の C<\U> は大文字に変換し、C<ucfirst> や
841		展開文字列中の C<\u> はその言語で区別されているときに
842		タイトルケースに変換します (これは、区別がない言語では大文字と等価です)。
843
844		=begin original
845
846		There is a CPAN module, C<L<Unicode::Casing>>, which allows you to
847		define your own mappings to be used in C<lc()>, C<lcfirst()>, C<uc()>,
848		C<ucfirst()>, and C<fc> (or their double-quoted string inlined versions
849		such as C<\U>). (Prior to Perl 5.16, this functionality was partially
850		provided in the Perl core, but suffered from a number of insurmountable
851		drawbacks, so the CPAN module was written instead.)
852
853		=end original
854
855		C<lc()>, C<lcfirst()>, C<uc()>, C<ucfirst()>, C<fc> (および C<\U> のような
856		ダブルクォート文字列インライン版) で使える独自のマッピングを定義できる
857		CPAN モジュール C<L<Unicode::Casing>> があります。
858		(Perl 5.16 より前では、この機能は Perl コアで部分的に提供されていましたが、
859		多くの克服できない欠点があったため、代わりに CPAN モジュールが書かれました。)
860
861	632	=item *
862	633
863	634	=begin original
864	635
865		~~Charac~~ter classes in reg~~ular~~ ~~express~~ions m~~atch~~ based on the character
	636	the case mapping is from a single Unicode character to another
866		~~propertie~~s ~~spec~~i~~fied i~~n the Unicode pr~~operties d~~at~~abas~~e.
	637	single Unicode character, or
867	638
868	639	=end original
869	640
870		~~正規表現の文字クラ~~スは、Unicode ~~特性データベースで定義されている~~文字~~特性を~~
	641	ケースマッピングが単一の Unicode 文字から
871		~~基にしてマッチングします。~~
	642	別の単一の Unicode 文字へのものであるか、
872	643
873		=begin original
874
875		C<\w> can be used to match a Japanese ideograph, for instance; and
876		C<[[:digit:]]> a Bengali number.
877
878		=end original
879
880		例えば、C<\w> は日本語の文字にマッチングするために使われ、
881		C<[[:digit:]]> はベンガル数字に使われます。
882
883	644	=item *
884	645
885	646	=begin original
886	647
887		~~Nam~~ed ~~Uni~~code prop~~ert~~i~~es,~~ ~~scr~~ipts, ~~and bl~~ock ranges ~~may b~~e used ~~(lik~~e
	648	the case mapping is from a single Unicode character to more
888		~~bracke~~t~~ed c~~ha~~racter~~ ~~class~~es) ~~by usi~~n~~g th~~e ~~C<\p{}> "mat~~ch~~es p~~roper~~ty"~~
	649	than one Unicode character.
889		construct and the C<\P{}> negation, "doesn't match property".
890	650
891	651	=end original
892	652
893		~~名前付き~~ Unicode ~~特性、用~~字~~、ブロック範囲は、~~
	653	ケースマッピングが単一の Unicode 文字から
894		~~C<\p{}>~~ ~~「特性にマッチング」構~~文~~および否定~~である ~~C<\P{}>~~
	654	二文字以上の Unicode 文字へのものである。
895		「特性にマッチングしない」を使って(大かっこ文字クラスのように)使えます。
896	655
897		=b~~egin origin~~al
	656	=back
898	657
899		See L</"Unicode Character Properties"> for more details.
900
901		=end original
902
903		さらなる詳細については L</"Unicode Character Properties"> を参照してください。
904
905	658	=begin original
906	659
907		~~You can def~~ine your own cha~~ract~~er ~~proper~~ties and use t~~hem~~
	660	Things to do with locales (Lithuanian, Turkish, Azeri) do B<not> work
908		in the re~~gula~~r e~~xpre~~s~~sio~~n with the ~~C<\~~p~~{}>~~ or ~~C<\P{}>~~ co~~nstru~~ct.
	661	since Perl does not understand the concept of Unicode locales.
909		See L</"User-Defined Character Properties"> for more details.
910	662
911	663	=end original
912	664
913		~~独自の文字特性を定義して、C<\p{}>~~ と C<~~\P{}~~> ~~構文によって~~
	665	ロケール(Lithuanian, Turkish, Azeri)に付随することは B<働きません>。
914		~~正規表現で~~それ~~らを使うこと~~ができます。
	666	それは Perl が Unicode のロケールのコンセプトを理解しないからです。
915		さらなる詳細については L</"User-Defined Character Properties"> を
916		参照してください。
917	667
918		=back
919
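The effects listed above can be checked with a few characters. This sketch assumes the strings contain code points above 255, so Unicode rules apply without any pragma; the chosen characters are examples only.

```perl
use strict;
use warnings;

# U+01C6 LATIN SMALL LETTER DZ WITH CARON has distinct upper- and titlecase forms.
my $dz     = "\x{01C6}";
my $upper  = uc $dz;                 # U+01C4, the uppercase form
my $title  = ucfirst $dz;            # U+01C5, the titlecase form

# \w matches an ideograph, and [[:digit:]] matches a Bengali digit.
my $ideograph_is_word = ("\x{65E5}" =~ /^\w$/)          ? 1 : 0;
my $bengali_is_digit  = ("\x{09E9}" =~ /^[[:digit:]]$/) ? 1 : 0;

printf "uc=U+%04X ucfirst=U+%04X\n", ord $upper, ord $title;
print "word=$ideograph_is_word digit=$bengali_is_digit\n";
```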
920		=head2 Extended Grapheme Clusters (Logical characters)
921
922		(拡張書記素クラスタ (論理文字))
923
924	668	=begin original
925	669
926		Consider a ch~~ara~~cter, say ~~C<H>. It could~~ app~~ear w~~ith ~~vari~~ous marks ~~aroun~~d it,
	670	See the Unicode Technical Report #21, Case Mappings, for more details.
927		such as an acute accent, or a circumflex, or various hooks, circles, arrows,
928		I<etc.>, above, below, to one side or the other, I<etc>. There are many
929		possibilities among the world's languages. The number of combinations is
930		astronomical, and if there were a character for each combination, it would
931		soon exhaust Unicode's more than a million possible characters. So Unicode
932		took a different approach: there is a character for the base C<H>, and a
933		character for each of the possible marks, and these can be variously combined
934		to get a final logical character. So a logical character--what appears to be a
935		single character--can be a sequence of more than one individual characters.
936		The Unicode standard calls these "extended grapheme clusters" (which
937		is an improved version of the no-longer much used "grapheme cluster");
938		Perl furnishes the C<\X> regular expression construct to match such
939		sequences in their entirety.
940	671
941	672	=end original
942	673
943		一つの~~文字、例えば~~ C~~<H>~~ ~~につい~~て~~考えてみます~~。
	674	詳しくは Unicode Technical Report #21 の Case Mappings を参照してください。
944		これは文字の回りの様々なマークと共に現れることがあって、
945		鋭アクセント、曲折アクセント、フック、円、矢など、上、下、左、右、などです。
946		世界中の言語の中では多くの可能性があります。
947		組み合わせの数は天文学的で、
948		それぞれの組み合わせを一つの文字にすると、Unicode の 100 万を超える可能な文字を
949		すぐに使い切ってしまいます。
950		それで Unicode は異なる手法を取りました:
951		基本となる C<H> を一つの文字として、
952		それぞれの可能なマークのそれぞれを一つの文字として、
953		最後に論理的な文字でこれらを様々に結合できるようにしました。
954		それで一つの論理文字--単一の文字として現れるもの--は
955		複数の独立した文字の並びになることがあります。
956		Unicode 標準はこれを「拡張書記素クラスタ」("extended grapheme cluster")
957		(もはやあまり使われない「書記素クラスタ」"grapheme cluster" の改良版) と
958		呼びます;
959		Perl はこのような並び丸ごとにマッチングする C<\X> 正規表現構文を
960		用意しています。
961	675
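The difference between code points and logical characters can be seen by counting both. A minimal sketch, using a base letter followed by a combining mark; the variable names are illustrative.

```perl
use strict;
use warnings;

# "E" + COMBINING ACUTE ACCENT + "x": three code points, two logical characters.
my $str = "E\x{301}x";

my $code_points = length $str;
my @clusters    = $str =~ /(\X)/g;   # each \X match is one extended grapheme cluster

printf "code points=%d clusters=%d first cluster length=%d\n",
    $code_points, scalar @clusters, length $clusters[0];
```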
962	676	=begin original
963	677
964		But ~~Uni~~code's intent ~~is t~~o un~~ify~~ ~~the ex~~i~~sti~~ng ~~charac~~ter set s~~tan~~d~~ards~~ and
	678	But you can also define your own mappings to be used in the lc(),
965		~~pra~~ctices, and ~~several pre-ex~~ist~~ing~~ ~~standa~~rds have single ~~charact~~ers ~~that~~
	679	lcfirst(), uc(), and ucfirst() (or their string-inlined versions).
966		mean the same thing as some of these combinations, like ISO-8859-1,
967		which has quite a few of them. For example, C<"LATIN CAPITAL LETTER E
968		WITH ACUTE"> was already in this standard when Unicode came along.
969		Unicode therefore added it to its repertoire as that single character.
970		But this character is considered by Unicode to be equivalent to the
971		sequence consisting of the character C<"LATIN CAPITAL LETTER E">
972		followed by the character C<"COMBINING ACUTE ACCENT">.
973	680
974	681	=end original
975	682
976		しかし~~、Un~~ic~~ode~~ の~~目的は、既存の文字集合の標準とプラクティスを~~
	683	しかし lc(), lcfirst(), uc(), ucfirst() (およびこれらの
977		~~統合すること~~で~~あり、いくつか~~の~~既存の標準には、これらの組み合わせの~~
	684	文字列インライン版) で使える独自のマッピングも定義できます。
978		いくつかと同じことを意味する単一の文字があります;
979		たとえば、ISO-8859-1 には、かなりの数のそのような文字があります。
980		たとえば、C<"LATIN CAPITAL LETTER E WITH ACUTE"> は、Unicode が
981		登場したときにすでにこの標準に含まれていました。
982		したがって、Unicode はそれを単一の文字としてレパートリーに追加しました。
983		しかし、Unicode では、この文字は、文字 C<"LATIN CAPITAL LETTER E"> の後に
984		文字 C<"COMBINING ACUTE ACCENT"> が続く並びと等価であると見なされます。
985	685
986	686	=begin original
987	687
988		~~C<"LATIN~~ ~~CAPITA~~L ~~LETTER E WITH ACUTE~~"~~> i~~s ~~call~~ed a "p~~re-c~~omposed"
	688	See L</"User-Defined Case Mappings"> for more details.
989		character, and its equivalence with the "E" and the "COMBINING ACCENT"
990		sequence is called canonical equivalence. All pre-composed characters
991		are said to have a decomposition (into the equivalent sequence), and the
992		decomposition type is also called canonical. A string may be comprised
993		as much as possible of precomposed characters, or it may be comprised of
994		entirely decomposed characters. Unicode calls these respectively,
995		"Normalization Form Composed" (NFC) and "Normalization Form Decomposed".
996		The C<L<Unicode::Normalize>> module contains functions that convert
997		between the two. A string may also have both composed characters and
998		decomposed characters; this module can be used to make it all one or the
999		other.
1000	689
1001	690	=end original
1002	691
1003		C<"~~LATIN~~ C~~APITAL~~ ~~LETTER E WITH ACUTE"> は「合成済」(~~p~~re-com~~pos~~ed)~~ ~~文字と~~
	692	更なる詳細については L</"User-Defined Case Mappings"> を参照してください。
1004		呼ばれ、"E" および "COMBINING ACCENT" と等価な並びは正準等価
1005		(canonical equivalence) と呼ばれます。
1006		全ての合成済文字は(等価な並びに)分解でき、分解の種類もまた正準と呼ばれます。
1007		文字列は、可能な限り合成済文字で構成される場合もあれば、
1008		完全に分解された文字で構成される場合もあります。
1009		Unicode では、これらをそれぞれ
1010		「正規化形式 C」("Normalization Form Composed": NFC) と
1011		「正規化形式 D」("Normalization Form Decomposed") と呼んでいます。
1012		C<L<Unicode::Normalize>> モジュールには、
1013		この二つの形式の間で変換する関数が含まれています。
1014		文字列は、合成された文字と分解された文字の両方を持つこともできます。
1015		このモジュールを使用して、すべてを片方にすることも、
1016		もう片方にすることもできます。
1017	693
1018		=b~~egin origin~~al
	694	=back
1019	695
1020		Yo~~u may be presented with strings in any of these equi~~vale~~nt fo~~r~~ms.~~
	696	=over 4
1021		There is currently nothing in Perl 5 that ignores the differences. So
1022		you'll have to specially handle it. The usual advice is to convert your
1023		inputs to C<NFD> before processing further.
1024	697
1025		=end ~~original~~
	698	=item *
1026	699
1027		これらの同等のどの形式でも文字列が表現される場合があります。
1028		現在のところ、Perl 5 にはこの違いを無視するものは何もありません。
1029		そのため、特別にそれを扱う必要があります。
1030		通常のアドバイスは、処理を進める前に入力を C<NFD> に変換することです。
1031
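Canonical equivalence and the advice to normalize inputs can be sketched with the core C<Unicode::Normalize> module. The example uses its C<NFC> and C<NFD> functions; the variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

my $precomposed = "\N{U+C9}";        # LATIN CAPITAL LETTER E WITH ACUTE
my $decomposed  = "E\N{U+301}";      # E followed by COMBINING ACUTE ACCENT

# Perl compares code points, so the two equivalent forms are not equal as is.
my $equal_raw = ($precomposed eq $decomposed) ? 1 : 0;

# After normalizing both sides to the same form they compare equal.
my $equal_nfd = (NFD($precomposed) eq NFD($decomposed)) ? 1 : 0;
my $equal_nfc = (NFC($precomposed) eq NFC($decomposed)) ? 1 : 0;

print "raw=$equal_raw nfd=$equal_nfd nfc=$equal_nfc\n";
```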
1032	700	=begin original
1033	701
1034		~~For more~~ d~~etailed~~ inf~~ormat~~ion, see L<h~~ttp://uni~~code.org/reports/t~~r15/>~~.
	702	And finally, C<scalar reverse()> reverses by character rather than by byte.
1035	703
1036	704	=end original
1037	705
1038		~~さらに詳~~し~~い情報につい~~ては、L<~~http://uni~~c~~ode.o~~rg/reports~~/tr15/~~> を
	706	そして最後に、C<scalar reverse()> はバイト単位ではなく文字単位で
1039		~~参照してくださ~~い。
	707	反転を行います。
1040	708
	709	=back
	710
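The character-oriented behaviour described in the list above can be sketched as follows. The sample string and variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = "a\x{263A}b";              # U+263A lies well above 255

# Positions and lengths are counted in characters, not bytes.
my $len   = length $str;
my $mid   = substr $str, 1, 1;
my $pos_b = index $str, "b";

# pack("W")/unpack("W") behave like chr()/ord(), also above 255.
my $packed   = pack "W", 300;
my $unpacked = unpack "W", chr 300;

# pack("U") converts a Unicode code point to a character.
my $from_u = pack "U", 0x263A;

# scalar reverse() reverses by character.
my $reversed = scalar reverse $str;

printf "len=%d index=%d unpacked=%d\n", $len, $pos_b, $unpacked;
```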
1041	711	=head2 Unicode Character Properties
1042	712
1043	713	(Unicode 文字特性)
1044	714
1045	715	=begin original
1046	716
1047		~~(Th~~e only time that ~~Per~~l con~~sid~~ers a seque~~nce of in~~d~~ividual~~ ~~cod~~e
	717	Named Unicode properties, scripts, and block ranges may be used like
1048		~~points as a single logi~~c~~al c~~haracter is in the C<\X> construct, a~~lrea~~dy
	718	character classes via the C<\p{}> "matches property" construct and
1049		~~men~~t~~ion~~ed ~~above.~~ The~~refore "ch~~a~~rac~~t~~er"~~ in ~~this~~ d~~iscussi~~on means ~~a singl~~e
	719	the C<\P{}> negation, "doesn't match property".
1050		Unicode code point.)
1051	720
1052	721	=end original
1053	722
1054		(Perl ~~が個々~~の~~符号位置の並びを単一の論理文~~字~~として扱う~~
	723	名前付けされた Unicode の特性、用字、ブロックの範囲は C<\p{}>
1055		唯一の~~タイミングは、既に前述した~~ C<\X> ~~構文です。~~
	724	"matches property" 構造やその否定形の C<\P{}> "doesn't match property" を
1056		従っ~~て、この議論での「~~文字~~」は単一の Unicode 符号位置を意味し~~ます。)
	725	使った文字クラスで使うことができます。
1057	726
1058	727	=begin original
1059	728
1060		Very nea~~rly~~ ~~all~~ ~~Uni~~code character ~~propert~~ies are acce~~ssible~~ ~~thro~~ugh
	729	For instance, C<\p{Lu}> matches any character with the Unicode "Lu"
1061		re~~gula~~r expres~~sions~~ by ~~using t~~he C<\p{}> "matches ~~propert~~y" c~~onst~~ruct
	730	(Letter, uppercase) property, while C<\p{M}> matches any character
1062		~~and~~ the ~~C<\P{}>~~ "doesn't match property" ~~for~~ its ~~neg~~a~~tio~~n.
	731	with an "M" (mark--accents and such) property. Brackets are not
	732	required for single letter properties, so C<\p{M}> is equivalent to
	733	C<\pM>. Many predefined properties are available, such as
	734	C<\p{Mirrored}> and C<\p{Tibetan}>.
1063	735
1064	736	=end original
1065	737
1066		~~ほぼ全ての~~ Unicode 文字特性は、
	738	たとえば、C<\p{Lu}> は Unicode の "Lu" (Letter, uppercase) 特性を持つ任意の
1067		C<\p{}> "ma~~tches p~~r~~operty"~~ ~~構文とその否定形の~~ ~~C<\P{}>~~
	739	文字にマッチし、C<\p{M}> は "M" (mark -- アクセントなど) 特性を持つ任意の
1068		~~"doesn't match property" を使った正規表現を通~~し~~てアクセス可能で~~す。
	740	文字にマッチします。
	741	ブラケットは一文字の特性では省略することができるので、C<\p{M}> は
	742	C<\pM> と等価です。
	743	C<\p{Mirrored}> や C<\p{Tibetan}> など多くの特性が定義されています。
1069	744
1070	745	=begin original
1071	746
1072		~~For~~ i~~nstan~~ce, ~~C<\p{~~U~~pper~~case}> ~~matche~~s any ~~sing~~le character ~~with~~ the ~~Unicode~~
	747	The official Unicode script and block names have spaces and dashes as
1073		~~C<"Upperca~~se"> property, ~~while~~ ~~C<\p{L}>~~ ~~mat~~che~~s a~~ny cha~~ract~~er ~~wit~~h a
	748	separators, but for convenience you can use dashes, spaces, or
1074		~~C<Ge~~nera~~l_C~~ategory> of ~~C<"L">~~ (letter) property ~~(see~~
	749	underbars, and case is unimportant. It is recommended, however, that
1075		~~L</Gene~~r~~al_Ca~~te~~gor~~y> below). Bra~~cke~~ts are not
	750	for consistency you use the following naming: the official Unicode
1076		r~~equ~~i~~red for single le~~t~~ter~~ property names, so ~~C<\p{L}>~~ is equival~~ent~~ ~~to C<\pL>.~~
	751	script, property, or block name (see below for the additional rules
	752	that apply to block names) with whitespace and dashes removed, and the
	753	words "uppercase-first-lowercase-rest". C<Latin-1 Supplement> thus
	754	becomes C<Latin1Supplement>.
1077	755
1078	756	=end original
1079	757
1080		~~たとえば、C<\p{Uppercase}>~~ は Unicode の ~~C<"Uppercase"> 特性~~を~~持つ任意の~~
	758	公式の Unicode 用字およびブロックの名前はスペースとダッシュを
1081		単一の~~文字にマ~~ッ~~チングし~~、~~C<\p{L}> は C<General_Category> C<"L"> (letter)~~
	759	セパレータとして使っていますが、便利のため、ダッシュ、スペース、
1082		特性を~~持つ任意の~~文字~~にマッチングし~~ます
	760	アンダーバーを使うことができ、また、大小文字の違いは重要ではありません。
1083		~~(後述する L</General_Category> 参照)。~~
	761	しかしながら、以下のネーミングにしたがって、首尾一貫して使うことを
1084		~~中かっこは一文字~~の特性~~名では省略することができるので~~、~~C<\p{L}>~~ は
	762	お勧めします: Unicode の用字、特性、ブロックの名前 (ブロック名に
1085		~~C<\pL>~~ ~~と等価です。~~
	763	適用される付加的なルールについて以下を参照してください) から
	764	空白とダッシュを取り除き、単語の先頭を大文字にし残りを小文字にします。
	765	したがって、C<Latin-1 Supplement> は C<Latin1Supplement> となります。
1086	766
1087	767	=begin original
1088	768
1089		More formally, C<\p{Uppercase}> matches any single character whose Unicode
1090		C<Uppercase> property value is C<True>, and C<\P{Uppercase}> matches any character
1091		whose C<Uppercase> property value is C<False>, and they could have been written as
1092		C<\p{Uppercase=True}> and C<\p{Uppercase=False}>, respectively.
1093
1094		=end original
1095
1096		より正式には、C<\p{Uppercase}> は Unicode の C<Uppercase> 特性値が
1097		C<True> である任意の単一の文字とマッチングし、C<\P{Uppercase}> は
1098		C<Uppercase> 特性値が C<False> である任意の文字とマッチングします;
1099		そしてこれらはそれぞれ C<\p{Uppercase=True}>, C<\p{Uppercase=False}> と書けます。
1100
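The single and compound forms of a binary property can be compared directly. A small sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

# Single form and its explicit compound equivalent.
my $single_true   = ("A" =~ /^\p{Uppercase}$/)       ? 1 : 0;
my $compound_true = ("A" =~ /^\p{Uppercase=True}$/)  ? 1 : 0;

# Negated single form and its explicit compound equivalent.
my $single_false   = ("a" =~ /^\P{Uppercase}$/)      ? 1 : 0;
my $compound_false = ("a" =~ /^\p{Uppercase=False}$/) ? 1 : 0;

print "$single_true $compound_true $single_false $compound_false\n";
```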
1101		=begin original
1102
1103		This formality is needed when properties are not binary; that is, if they can
1104		take on more values than just C<True> and C<False>. For example, the
1105		C<Bidi_Class> property (see L</"Bidirectional Character Types"> below),
1106		can take on several different
1107		values, such as C<Left>, C<Right>, C<Whitespace>, and others. To match these, one needs
1108		to specify both the property name (C<Bidi_Class>), AND the value being
1109		matched against
1110		(C<Left>, C<Right>, I<etc.>). This is done, as in the examples above, by having the
1111		two components separated by an equal sign (or interchangeably, a colon), like
1112		C<\p{Bidi_Class: Left}>.
1113
1114		=end original
1115
1116		この形式は、特性が 2 値でない場合、つまり、単に C<True> と C<False> より多くの
1117		値を取ることができる場合に必要です。
1118		たとえば、C<Bidi_Class> 特性(L</"Bidirectional Character Types"> を参照)は、
1119		C<Left>, C<Right>, C<Whitespace> などのさまざまな値を取ることができます。
1120		これらにマッチングするには、特性名(C<Bidi_Class>)と、
1121		マッチングする値 (C<Left>, C<Right> など) の両方を指定する必要があります。
1122		これは、前述の例のように、二つの要素を等号
1123		(または、C<\p{Bidi_Class: Left}> のように交換可能なコロン)で
1124		区切ることによって、実行されます。
1125
1126		=begin original
1127
1128		All Unicode-defined character properties may be written in these compound forms
1129		of C<\p{I<property>=I<value>}> or C<\p{I<property>:I<value>}>, but Perl provides some
1130		additional properties that are written only in the single form, as well as
1131		single-form short-cuts for all binary properties and certain others described
1132		below, in which you may omit the property name and the equals or colon
1133		separator.
1134
1135		=end original
1136
1137		すべての Unicode が定義した文字特性は、C<\p{I<property>=I<value>}> や
1138		C<\p{I<property>:I<value>}> のような複合形式で書けますが、
1139		Perl は特性名および等号やコロンの区切り文字を省略できるように、
1140		単一形式でのみ書ける追加の特性や、全ての 2 値特性と一部の後述する
1141		ものに対する単一形式のショートカットを提供します。
1142
1143		=begin original
1144
1145		Most Unicode character properties have at least two synonyms (or aliases if you
1146		prefer): a short one that is easier to type and a longer one that is more
1147		descriptive and hence easier to understand. Thus the C<"L"> and
1148		C<"Letter"> properties above are equivalent and can be used
1149		interchangeably. Likewise, C<"Upper"> is a synonym for C<"Uppercase">,
1150		and we could have written C<\p{Uppercase}> equivalently as C<\p{Upper}>.
1151		Also, there are typically various synonyms for the values the property
1152		can be. For binary properties, C<"True"> has 3 synonyms: C<"T">,
1153		C<"Yes">, and C<"Y">; and C<"False"> has correspondingly C<"F">,
1154		C<"No">, and C<"N">. But be careful. A short form of a value for one
1155		property may not mean the same thing as the short form spelled the same
1156		for another.
1157		Thus, for the C<L</General_Category>> property, C<"L"> means
1158		C<"Letter">, but for the L<C<Bidi_Class>\|/Bidirectional Character Types>
1159		property, C<"L"> means C<"Left">. A complete list of properties and
1160		synonyms is in L<perluniprops>.
1161
1162		=end original
1163
1164		ほとんどの Unicode 文字特性には、少なくとも二つの同義語
1165		(またはあなたが好むなら別名)があります; 簡単に入力できる短いものと、
1166		より長いけれども説明的で理解しやすいものです。
1167		したがって、前述の C<"L"> および C<"Letter"> 特性は等価であり、
1168		交換可能です。
1169		同様に、C<"Upper"> は C<"Uppercase"> の同義語であり、C<\p{Uppercase}> は
1170		等価に C<\p{Upper}> と書けます。
1171		また、典型的には特性の値に対してさまざまな同義語があります。
1172		2 値特性の場合、C<"True"> には三つの同義語があります:
1173		C<"T">, C<"Yes">, C<"Y">; C<"False"> には C<"F">, C<"No">, C<"N"> が
1174		あります。
1175		しかし注意してください。
1176		ある特性に対する値の短い形式は、他の特性の同じ綴りの短い形式と同じものを
1177		意味するとは限りません。
1178		従って、C<L</General_Category>> 特性では C<"L"> は C<"Letter"> を
1179		意味しますが、L<C<Bidi_Class>\|/Bidirectional Character Types> 特性では、
1180		C<"L"> は C<"Left"> を意味します。
1181		特性および同義語の完全な一覧は L<perluniprops> にあります。
1182
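The warning about short value names can be illustrated with a character that is a letter but is not left-to-right. This sketch uses HEBREW LETTER ALEF (U+05D0) as the example and the short value names C<L> and C<R>; the variable names are illustrative.

```perl
use strict;
use warnings;

my $alef = "\x{05D0}";               # a letter whose Bidi_Class is R

# Synonyms for the same property are interchangeable.
my $short_name = ("A" =~ /^\p{Upper}$/)       ? 1 : 0;
my $long_name  = ("A" =~ /^\p{Uppercase}$/)   ? 1 : 0;
my $with_value = ("A" =~ /^\p{Uppercase=Y}$/) ? 1 : 0;

# "L" means Letter for General_Category but left-to-right for Bidi_Class.
my $gc_l = ($alef =~ /^\p{General_Category=L}$/) ? 1 : 0;
my $bc_l = ($alef =~ /^\p{Bidi_Class=L}$/)       ? 1 : 0;
my $bc_r = ($alef =~ /^\p{Bidi_Class=R}$/)       ? 1 : 0;

print "$short_name $long_name $with_value gc_l=$gc_l bc_l=$bc_l bc_r=$bc_r\n";
```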
1183		=begin original
1184
1185		Upper/lower case differences in property names and values are irrelevant;
1186		thus C<\p{Upper}> means the same thing as C<\p{upper}> or even C<\p{UpPeR}>.
1187		Similarly, you can add or subtract underscores anywhere in the middle of a
1188		word, so that these are also equivalent to C<\p{U_p_p_e_r}>. And white space
1189		is generally irrelevant adjacent to non-word characters, such as the
1190		braces and the equals or colon separators, so C<\p{ Upper }> and
1191		C<\p{ Upper_case : Y }> are equivalent to these as well. In fact, white
1192		space and even hyphens can usually be added or deleted anywhere. So
1193		even C<\p{ Up-per case = Yes}> is equivalent. All this is called
1194		"loose-matching" by Unicode. The "name" property has some restrictions
1195		on this due to a few outlier names. Full details are given in
1196		L<https://www.unicode.org/reports/tr44/tr44-24.html#UAX44-LM2>.
1197
1198		=end original
1199
1200		特性名と値の大文字と小文字の違いは無関係です;
1201		したがって C<\p{Upper}> は C<\p{upper}>, さらには C<\p{UpPeR}> とも同じことを
1202		意味します。
1203		同様に、一般的に単語の中のどこにでも下線を追加または削除できるので、
1204		これらは C<\p{U_p_p_e_r}> とも等価です。
1205		また、中かっこや等号、コロンなどの非単語文字に隣接した空白は無視されるので、
1206		C<\p{ Upper }> と C<\p{ Upper_case : Y }> も等価です。
1207		実際には、通常、空白とハイフンさえどこにでも追加または削除できます。
1208		したがって、C<\p{ Up-per case = Yes}> ですらも等価です。
1209		これはすべて Unicode で「緩いマッチング」と呼ばれます。
1210		"name" 特性は、いくつかの特殊な名前のためにいくつかの制限があります。
1211		完全な詳細は
1212		L<https://www.unicode.org/reports/tr44/tr44-24.html#UAX44-LM2> にあります。
1213
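Loose matching can be checked by running several spellings taken from the paragraph above against the same characters. A minimal sketch; the list of patterns and the counters are illustrative.

```perl
use strict;
use warnings;

# Different spellings of the same property; all should behave identically.
my @patterns = (
    qr/^\p{Upper}$/,
    qr/^\p{upper}$/,
    qr/^\p{UpPeR}$/,
    qr/^\p{ Upper }$/,
    qr/^\p{ Upper_case : Y }$/,
);

my ($upper_hits, $lower_hits) = (0, 0);
for my $re (@patterns) {
    $upper_hits++ if "A" =~ $re;     # an uppercase letter
    $lower_hits++ if "a" =~ $re;     # a lowercase letter
}

printf "patterns=%d upper_hits=%d lower_hits=%d\n",
    scalar @patterns, $upper_hits, $lower_hits;
```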
1214		=begin original
1215
1216		The few places where stricter matching is
1217		used are in the middle of numbers, the "name" property, and in the Perl
1218		extension properties that begin or end with an underscore. Stricter
1219		matching cares about white space (except adjacent to non-word
1220		characters), hyphens, and non-interior underscores.
1221
1222		=end original
1223
1224		数少ない厳密なマッチングが採用されている場所は数値の中、
1225		"name" 特性、下線で始まったり終わったりする Perl 拡張特性です。
1226		より厳密なマッチングは空白(非単語文字に隣接するものを除く)、ハイフン、
1227		非内部下線を考慮します。
1228
1229		=begin original
1230
1231	769	You can also use negation in both C<\p{}> and C<\P{}> by introducing a caret
1232		(C<^>) between the first brace and the property name: C<\p{^Tamil}> is
	770	(^) between the first brace and the property name: C<\p{^Tamil}> is
1233	771	equal to C<\P{Tamil}>.
1234	772
1235	773	=end original
1236	774
1237		C<\p{}> と C<\P{}> の両方で、キャレット(C<^>) を最初のブレースと
	775	C<\p{}> と C<\P{}> の両方で、キャレット(^) を最初のブレースと
1238	776	特性名の間に置くことによって意味を反転することができます:
1239	777	C<\p{^Tamil}> は C<\P{Tamil}> と等価です。
1240	778
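The caret form of negation can be compared with C<\P{}> on the same input. This sketch uses TAMIL LETTER A (U+0B85) as an example character; the variable names are illustrative.

```perl
use strict;
use warnings;

my $tamil_a = "\x{0B85}";            # TAMIL LETTER A

my $is_tamil     = ($tamil_a =~ /^\p{Tamil}$/)  ? 1 : 0;
my $caret_on_a   = ("A"      =~ /^\p{^Tamil}$/) ? 1 : 0;
my $bigp_on_a    = ("A"      =~ /^\P{Tamil}$/)  ? 1 : 0;
my $caret_on_tam = ($tamil_a =~ /^\p{^Tamil}$/) ? 1 : 0;

print "$is_tamil $caret_on_a $bigp_on_a $caret_on_tam\n";
```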
1241	779	=begin original
1242	780
1243		~~Almos~~t ~~all~~ properties are i~~mmu~~ne to case-ins~~ensi~~tive ~~matc~~h~~ing.~~ That is,
	781	B<NOTE: the properties, scripts, and blocks listed here are as of
1244		~~addi~~n~~g a C</~~i~~> regular expressi~~o~~n mo~~d~~ifi~~er ~~does~~ not ~~change what the~~y
	782	Unicode 5.0.0 in July 2006.>
1245		match. There are two sets that are affected.
1246		The first set is
1247		C<Uppercase_Letter>,
1248		C<Lowercase_Letter>,
1249		and C<Titlecase_Letter>,
1250		all of which match C<Cased_Letter> under C</i> matching.
1251		And the second set is
1252		C<Uppercase>,
1253		C<Lowercase>,
1254		and C<Titlecase>,
1255		all of which match C<Cased> under C</i> matching.
1256		This set also includes its subsets C<PosixUpper> and C<PosixLower> both
1257		of which under C</i> match C<PosixAlpha>.
1258		(The difference between these sets is that some things, such as Roman
1259		numerals, come in both upper and lower case so they are C<Cased>, but
1260		aren't considered letters, so they aren't C<Cased_Letter>'s.)
1261	783
1262	784	=end original
1263	785
1264		~~ほとんど全て~~の特性~~は大文~~字~~小文字を考慮したマ~~ッ~~チング~~の~~影響を受けません。~~
	786	B<注意: ここでの特性、用字、ブロックは 2006 年 7 月の Unicode 5.0.0 に
1265		つま~~り、C</i> 正規表現修飾子を追加~~す~~ることは、~~
	787	従っています。>
1266		それらがマッチングするものを変えません。
1267		影響を受ける二つの集合があります。
1268		最初の集合は、
1269		C<Uppercase_Letter>,
1270		C<Lowercase_Letter>,
1271		C<Titlecase_Letter>,
1272		C</i> の下で C<Cased_Letter> にマッチングする全てです。
1273		二番目の集合は、
1274		C<Uppercase>,
1275		C<Lowercase>,
1276		C<Titlecase>,
1277		C</i> マッチングの下で C<Cased> にマッチングする全てです。
1278		この集合はまた、C</i> マッチングの下で C<PosixAlpha> にマッチングする
1279		そのサブセット C<PosixUpper> と C<PosixLower> を含みます。
1280		(これらの集合の違いは、ローマ数字のような一部のもので、
1281		大文字と小文字の両方に含まれるので C<Cased> であるけれども、
1282		しかし字と考えられないので、C<Cased_Letter> ではありません。)
1283	788
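The two case-sensitive sets described above can be illustrated with a small sketch. It assumes a recent perl (the 5.36.0 side of this diff); the sample characters and variable names are illustrative choices, not part of the original text.

```perl
use strict;
use warnings;

my $lower_a = "a";
my $roman_i = "\x{2170}";   # SMALL ROMAN NUMERAL ONE: Cased, but not a letter

# Without /i, \p{Lu} matches only uppercase letters.
my $plain = $lower_a =~ /\p{Lu}/  ? 1 : 0;
# Under /i, \p{Lu} behaves like \p{Cased_Letter}.
my $fold  = $lower_a =~ /\p{Lu}/i ? 1 : 0;

# The Roman numeral is Cased but is not a Cased_Letter.
my $roman_letter = $roman_i =~ /\p{Lu}/i        ? 1 : 0;
my $roman_cased  = $roman_i =~ /\p{Uppercase}/i ? 1 : 0;

print "plain=$plain fold=$fold roman_letter=$roman_letter roman_cased=$roman_cased\n";
```

The Roman numeral shows why the two sets differ: it is matched by C<\p{Uppercase}> under C</i>, but not by C<\p{Lu}> under C</i>.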
1284		=~~begin~~ or~~iginal~~
	789	=over 4
1285	790
1286		~~See L</Beyond Un~~i~~cod~~e ~~cod~~e ~~poi~~n~~ts> for sp~~ecial ~~consider~~at~~ions wh~~en
	791	=item General Category
1287		matching Unicode properties against non-Unicode code points.
1288	792
1289		=end original
1290
1291		非 Unicode 符号位置に対して Unicode 特性をマッチングしたときの
1292		特殊処理については L</Beyond Unicode code points> を参照してください。
1293
1294		=head3 B<General_Category>
1295
1296	793	=begin original
1297	794
1298		Ever~~y Unicod~~e character is assigned ~~a g~~eneral category, w~~hich~~ is the ~~"most~~
	795	Here are the basic Unicode General Category properties, followed by their
1299		~~usua~~l ~~cate~~gor~~ization~~ of a character~~" (from~~
	796	long form. You can use either; C<\p{Lu}> and C<\p{UppercaseLetter}>,
1300		~~L<http~~s~~://www.u~~nicode.or~~g/r~~e~~por~~t~~s/tr44>)~~.
	797	for instance, are identical.
1301	798
1302	799	=end original
1303	800
1304		全ての Unicode ~~文字は一つ~~の一般カテゴリ~~に割り当てられています;~~
	801	以下に挙げるのは、Unicode の一般カテゴリ特性(General Category properties) で、
1305		~~これは「その文字の最も普通のカテゴライズ」~~
	802	長形式が並んでいます。
1306		(L<~~htt~~p~~s://www.~~u~~nicod~~e.or~~g/r~~e~~por~~ts/tr44> ~~より)です。~~
	803	たとえば、C<\p{Lu}> と C<\p{UppercaseLetter}> は同じものとして
	804	扱うことができます。
1307	805
1308		=begin original
1309
1310		The compound way of writing these is like C<\p{General_Category=Number}>
1311		(short: C<\p{gc:n}>). But Perl furnishes shortcuts in which everything up
1312		through the equal or colon separator is omitted. So you can instead just write
1313		C<\pN>.
1314
1315		=end original
1316
1317		これらを書く複合的な方法は C<\p{General_Category=Number}>
1318		(短縮形: C<\p{gc:n}>) のようなものです。
1319		Perl は等号またはコロンの区切り文字までの全てを省略できる機能を
1320		提供しています。
1321		従って、代わりに単に C<\pN> と書けます。
1322
1323		=begin original
1324
1325		Here are the short and long forms of the values the C<General Category> property
1326		can have:
1327
1328		=end original
1329
1330		以下は、Unicode の C<一般カテゴリ> 特性が持つことができる値の
1331		短形式と長形式です:
1332
1333	806	Short Long
1334	807
1335	808	L Letter
1336		LC, L& Cased_Letter ~~(that is: [\p{Ll}\p{Lu}\p{Lt}])~~
	809	LC CasedLetter
1337		Lu Uppercase_Letter
	810	Lu UppercaseLetter
1338		Ll Lowercase_Letter
	811	Ll LowercaseLetter
1339		Lt Titlecase_Letter
	812	Lt TitlecaseLetter
1340		Lm Modifier_Letter
	813	Lm ModifierLetter
1341		Lo Other_Letter
	814	Lo OtherLetter
1342	815
1343	816	M Mark
1344		Mn Nonspacing_Mark
	817	Mn NonspacingMark
1345		Mc Spacing_Mark
	818	Mc SpacingMark
1346		Me Enclosing_Mark
	819	Me EnclosingMark
1347	820
1348	821	N Number
1349		Nd Decimal_Number ~~(also Digit)~~
	822	Nd DecimalNumber
1350		Nl Letter_Number
	823	Nl LetterNumber
1351		No Other_Number
	824	No OtherNumber
1352	825
1353		P Punctuation ~~(also Punct)~~
	826	P Punctuation
1354		Pc Connector_Punctuation
	827	Pc ConnectorPunctuation
1355		Pd Dash_Punctuation
	828	Pd DashPunctuation
1356		Ps Open_Punctuation
	829	Ps OpenPunctuation
1357		Pe Close_Punctuation
	830	Pe ClosePunctuation
1358		Pi Initial_Punctuation
	831	Pi InitialPunctuation
1359	832	(may behave like Ps or Pe depending on usage)
1360		Pf Final_Punctuation
	833	Pf FinalPunctuation
1361	834	(may behave like Ps or Pe depending on usage)
1362		Po Other_Punctuation
	835	Po OtherPunctuation
1363	836
1364	837	S Symbol
1365		Sm Math_Symbol
	838	Sm MathSymbol
1366		Sc Currency_Symbol
	839	Sc CurrencySymbol
1367		Sk Modifier_Symbol
	840	Sk ModifierSymbol
1368		So Other_Symbol
	841	So OtherSymbol
1369	842
1370	843	Z Separator
1371		Zs Space_Separator
	844	Zs SpaceSeparator
1372		Zl Line_Separator
	845	Zl LineSeparator
1373		Zp Paragraph_Separator
	846	Zp ParagraphSeparator
1374	847
1375	848	C Other
1376		Cc Control ~~(also Cntrl)~~
	849	Cc Control
1377	850	Cf Format
1378		Cs Surrogate
	851	Cs Surrogate (not usable)
1379		Co Private_Use
	852	Co PrivateUse
1380	853	Cn Unassigned
1381	854
1382	855	=begin original
1383	856
1384	857	Single-letter properties match all characters in any of the
1385	858	two-letter sub-properties starting with the same letter.
1386		C<LC> and C<L&> are special: ~~bot~~h are aliases for the set co~~nsisting o~~f ~~everything matched by C<Ll>, C<Lu>, and C<Lt>.~~
	859	C<LC> and C<L&> are special cases, which are aliases for the set of
	860	C<Ll>, C<Lu>, and C<Lt>.
1387	861
1388	862	=end original
1389	863
1390	864	単一文字の特性は同じ文字で始まる二文字の任意のサブ特性に含まれる
1391	865	すべての文字にマッチします。
1392		C<LC> と C<L&> は特別です: ~~両方とも~~ C<Ll>, C<Lu>, C<Lt> に
	866	C<LC> と C<L&> は特別なケースで、これは C<Ll>, C<Lu>, C<Lt> の別名です。
1393		マッチングする全てからなる集合への別名です。
1394	867
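The relationship between C<L>, C<LC>, and C<L&> can be sketched as follows. The five sample characters are chosen here to cover one value each of C<Lu>, C<Ll>, C<Lt>, C<Lm>, and C<Lo>; they are illustrative, not taken from the original.

```perl
use strict;
use warnings;

my @samples = (
    [ "A",        "Lu" ],
    [ "a",        "Ll" ],
    [ "\x{01C5}", "Lt" ],   # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
    [ "\x{02B0}", "Lm" ],   # MODIFIER LETTER SMALL H
    [ "\x{05D0}", "Lo" ],   # HEBREW LETTER ALEF
);

my %result;
for my $s (@samples) {
    my ($ch, $gc) = @$s;
    $result{$gc} = {
        L  => ($ch =~ /\p{L}/  ? 1 : 0),   # any letter sub-category
        LC => ($ch =~ /\p{LC}/ ? 1 : 0),   # only Lu, Ll, Lt
        LA => ($ch =~ /\p{L&}/ ? 1 : 0),   # L& is an alias for LC
    };
    print "$gc: L=$result{$gc}{L} LC=$result{$gc}{LC} L&=$result{$gc}{LA}\n";
}
```

C<\p{L}> should accept all five characters, while C<\p{LC}> and C<\p{L&}> should accept only the first three.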
1395		=he~~ad3 B<B~~idir~~ect~~ional ~~Character Types>~~
	868	=begin original
1396	869
1397		(B~~<双方向文字型>)~~
	870	Because Perl hides the need for the user to understand the internal
	871	representation of Unicode characters, there is no need to implement
	872	the somewhat messy concept of surrogates. C<Cs> is therefore not
	873	supported.
1398	874
	875	=end original
	876
	877	Perl はユーザーが Unicode 文字の内部表現について理解する必要が
	878	ないようにしているので、サロゲートの面倒なコンセプトについて
	879	実装する必要はありません。
	880	従って、C<Cs> はサポートされていません。
	881
	882	=item Bidirectional Character Types
	883
1399	884	=begin original
1400	885
1401		Because scripts differ in their directionality (Hebrew ~~and Arab~~i~~c are~~
	886	Because scripts differ in their directionality--Hebrew is
1402		written right to left, for example) Unicode supplies ~~a C<Bidi_Cla~~ss> property.
	887	written right to left, for example--Unicode supplies these properties in
1403		~~Some of~~ the values ~~thi~~s ~~property~~ ca~~n have are~~:
	888	the BidiClass class:
1404	889
1405	890	=end original
1406	891
1407		用字はその方向性で異なるので (例えばヘブライ語~~とアラビア語~~は右から左に
	892	用字はその方向性で異なるので--たとえばヘブライ語は右から左に書きます --
1408		~~書きます)~~ Unicode は以下の特性を C<Bidi_Class> 特性で提供しています。
	893	Unicode は以下の特性を BidiClass クラスで提供しています:
1409		この特性が持つことができる値の一部は:
1410	894
1411		~~Valu~~e Meaning
	895	Property Meaning
1412	896
1413	897	L Left-to-Right
1414	898	LRE Left-to-Right Embedding
1415	899	LRO Left-to-Right Override
1416	900	R Right-to-Left
1417		AL Arabic ~~Letter~~
	901	AL Right-to-Left Arabic
1418	902	RLE Right-to-Left Embedding
1419	903	RLO Right-to-Left Override
1420	904	PDF Pop Directional Format
1421	905	EN European Number
1422		ES European Separator
	906	ES European Number Separator
1423		ET European Terminator
	907	ET European Number Terminator
1424	908	AN Arabic Number
1425		CS Common Separator
	909	CS Common Number Separator
1426	910	NSM Non-Spacing Mark
1427	911	BN Boundary Neutral
1428	912	B Paragraph Separator
1429	913	S Segment Separator
1430	914	WS Whitespace
1431	915	ON Other Neutrals
1432	916
1433	917	=begin original
1434	918
1435		~~This~~ p~~rop~~e~~rty~~ is a~~lway~~s written in the ~~compou~~n~~d f~~orm.
	919	For example, C<\p{BidiClass:R}> matches characters that are normally
1436		For ~~example, C<\p{B~~i~~di_Class:R}> ma~~t~~ches charac~~ters that ~~are n~~o~~rma~~lly
	920	written right to left.
1437		written right to left. Unlike the
1438		C<L</General_Category>> property, this
1439		property can have more values added in a future Unicode release. Those
1440		listed above comprised the complete set for many Unicode releases, but
1441		others were added in Unicode 6.3; you can always find what the
1442		current ones are in L<perluniprops>. And
1443		L<https://www.unicode.org/reports/tr9/> describes how to use them.
1444	921
1445	922	=end original
1446	923
1447		~~この特性~~は常に~~複合形式で~~書かれます。
	924	たとえば、C<\p{BidiClass:R}> は通常右から左に書く文字にマッチします。
1448		たとえば、C<\p{Bidi_Class:R}> は通常右から左に書く文字にマッチします。
1449		C<L</General_Category>> 特性とは異なり、
1450		この特性は将来リリースされる Unicode でさらに値が追加されるかもしれません。
1451		これらの上述したものは何回もの Unicode のリリースの間完全な一覧でしたが、
1452		その他の物は Unicode 6.3 で追加されたものです;
1453		現在の内容についてはいつでも L<perluniprops> で確認できます。
1454		これらの使い方については
1455		L<https://www.unicode.org/reports/tr9/> に記述されています。
1456	925
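A small sketch of the compound form for C<Bidi_Class>, using three of the values from the table above. The sample characters and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $hebrew_alef = "\x{05D0}";   # HEBREW LETTER ALEF, Bidi_Class R
my $arabic_alef = "\x{0627}";   # ARABIC LETTER ALEF, Bidi_Class AL
my $latin_a     = "a";          # Bidi_Class L

my %is_r = (
    hebrew => ($hebrew_alef =~ /\p{Bidi_Class:R}/ ? 1 : 0),
    arabic => ($arabic_alef =~ /\p{Bidi_Class:R}/ ? 1 : 0),
    latin  => ($latin_a     =~ /\p{Bidi_Class:R}/ ? 1 : 0),
);

# Arabic letters carry their own value, AL, rather than R.
my $arabic_is_al = $arabic_alef =~ /\p{Bidi_Class=AL}/ ? 1 : 0;
my $latin_is_l   = $latin_a     =~ /\p{Bidi_Class=L}/  ? 1 : 0;

print "R: hebrew=$is_r{hebrew} arabic=$is_r{arabic} latin=$is_r{latin}\n";
print "AL: arabic=$arabic_is_al  L: latin=$latin_is_l\n";
```

Note that a test for right-to-left text usually needs to check both C<R> and C<AL>.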
1457		=he~~ad3~~ B<Scripts>
	926	=item Scripts
1458	927
1459		(B<用字>)
1460
1461	928	=begin original
1462	929
1463		The ~~world'~~s languages ~~are~~ wri~~tten~~ ~~in m~~any ~~diff~~e~~rent~~ s~~cri~~pts. ~~This se~~n~~tence~~
	930	The script names which can be used by C<\p{...}> and C<\P{...}>,
1464		~~(unle~~s~~s yo~~u~~'re~~ rea~~ding~~ i~~t i~~n tra~~nsla~~tion) is wri~~tten~~ i~~n Latin~~, ~~whil~~e ~~Russi~~an is
	931	such as in C<\p{Latin}> or C<\p{Cyrillic}>, are as follows:
1465		written in Cyrillic, and Greek is written in, well, Greek; Japanese mainly in
1466		Hiragana or Katakana. There are many more.
1467	932
1468	933	=end original
1469	934
1470		世界の~~言語は多くの異~~な~~った用字~~で~~書かれています。~~
	935	C<\p{Latin}> や C<\p{Cyrillic}> のような、C<\p{...}> と C<\P{...}> で
1471		この~~文は(訳文を読ん~~で~~いない限り)ラテン文~~字~~で書かれていますが、ロシア語~~は
	936	使うことのできる用字名は以下の通り:
1472		キリル文字で書かれています; そしてギリシャ語は、ええと、ギリシャ文字です;
1473		日本語は主にひらがなやカタカナで書かれています。
1474		もっとたくさんあります。
1475	937
1476		~~=begin~~ or~~igin~~al
	938	Arabic
	939	Armenian
	940	Balinese
	941	Bengali
	942	Bopomofo
	943	Braille
	944	Buginese
	945	Buhid
	946	CanadianAboriginal
	947	Cherokee
	948	Coptic
	949	Cuneiform
	950	Cypriot
	951	Cyrillic
	952	Deseret
	953	Devanagari
	954	Ethiopic
	955	Georgian
	956	Glagolitic
	957	Gothic
	958	Greek
	959	Gujarati
	960	Gurmukhi
	961	Han
	962	Hangul
	963	Hanunoo
	964	Hebrew
	965	Hiragana
	966	Inherited
	967	Kannada
	968	Katakana
	969	Kharoshthi
	970	Khmer
	971	Lao
	972	Latin
	973	Limbu
	974	LinearB
	975	Malayalam
	976	Mongolian
	977	Myanmar
	978	NewTaiLue
	979	Nko
	980	Ogham
	981	OldItalic
	982	OldPersian
	983	Oriya
	984	Osmanya
	985	PhagsPa
	986	Phoenician
	987	Runic
	988	Shavian
	989	Sinhala
	990	SylotiNagri
	991	Syriac
	992	Tagalog
	993	Tagbanwa
	994	TaiLe
	995	Tamil
	996	Telugu
	997	Thaana
	998	Thai
	999	Tibetan
	1000	Tifinagh
	1001	Ugaritic
	1002	Yi
1477	1003
1478		~~The Un~~i~~cod~~e ~~C<Script> and C<Script_~~Exten~~sions>~~ propert~~ies~~ ~~give wh~~at
	1004	=item Extended property classes
1479		script a given character is in. The C<Script_Extensions> property is an
1480		improved version of C<Script>, as demonstrated below. Either property
1481		can be specified with the compound form like
1482		C<\p{Script=Hebrew}> (short: C<\p{sc=hebr}>), or
1483		C<\p{Script_Extensions=Javanese}> (short: C<\p{scx=java}>).
1484		In addition, Perl furnishes shortcuts for all
1485		C<Script_Extensions> property names. You can omit everything up through
1486		the equals (or colon), and simply write C<\p{Latin}> or C<\P{Cyrillic}>.
1487		(This is not true for C<Script>, which is required to be
1488		written in the compound form. Prior to Perl v5.26, the single form
1489		returned the plain old C<Script> version, but was changed because
1490		C<Script_Extensions> gives better results.)
1491	1005
1492		=end original
1493
1494		Unicode の C<Script> と C<Script_Extensions> 特性は、指定された
1495		文字の中にある用字を示します。
1496		C<Script_Extensions> 特性は、後述するように、
1497		C<Script> の改良版です。
1498		それぞれの用字は C<\p{Script=Hebrew}> (短縮: C<\p{sc=hebr}>)
1499		または
1500		C<\p{Script_Extensions=Javanese}> (短縮: C<\p{scx=java}>) のような
1501		複合形式で指定できます。
1502		さらに Perl は、すべての C<Script_Extensions> 用字のショートカットを
1503		提供します。
1504		等号(またはコロン)までのすべてを省略できます;
1505		そして単に C<\p{Latin}> や C<\P{Cyrillic}> と書けます。
1506		(これは C<Script> では正しくありません; これは
1507		複合形式で書かれることを要求します。
1508		Perl v5.26 より前は、単一形式は昔ながらの単純な
1509		C<Script> 版を返していましたが、C<Script_Extensions> が
1510		より良い結果を返すので、変更されました。)
1511
1512	1006	=begin original
1513	1007
1514		~~The difference be~~tween these ~~two~~ propert~~ies~~ ~~invo~~lves cha~~racter~~s that are
	1008	Extended property classes can supplement the basic
1515		use~~d in mul~~tiple s~~cripts.~~ ~~For~~ e~~xampl~~e the di~~git~~s ~~'0'~~ thro~~ugh~~ ~~'9'~~ are
	1009	properties, defined by the F<PropList> Unicode database:
1516		used in many parts of the world. These are placed in a script named
1517		C<Common>. Other characters are used in just a few scripts. For
1518		example, the C<"KATAKANA-HIRAGANA DOUBLE HYPHEN"> is used in both Japanese
1519		scripts, Katakana and Hiragana, but nowhere else. The C<Script>
1520		property places all characters that are used in multiple scripts in the
1521		C<Common> script, while the C<Script_Extensions> property places those
1522		that are used in only a few scripts into each of those scripts; while
1523		still using C<Common> for those used in many scripts. Thus both these
1524		match:
1525	1010
1526	1011	=end original
1527	1012
1528		~~これら二つの~~特性~~の違い~~は、複数の~~用字で使われている文字に関係があります。~~
	1013	拡張特性クラスは基本特性を補完し、Unicode データベースの
1529		~~例えば、数字~~ ~~'0' から '9' は世界中の大部分~~で使われています。
	1014	F<PropList> で定義されています:
1530		これらは C<Common> という名前の用字に置かれています。
1531		その他の文字はほんのいくつかの用字でのみ使われています。
1532		例えば、C<"KATAKANA-HIRAGANA DOUBLE HYPHEN"> は日本語の二つの用字
1533		Katakana と Hiragana の両方で使われていますが、その他では使われていません。
1534		C<Script> 特性は、C<Common> 用字にあって、複数の用字で使われている
1535		全ての文字に与えられています;
1536		一方 C<Script_Extensions> 特性は、それらの用字それぞれのほんのいくつかの
1537		用字でのみ使われているものに与えられます;
1538		一方多くの用字で使われているものについては未だ C<Common> が使われています。
1539		従ってこれらは両方ともマッチングし:
1540	1015
1541		~~"0"~~ =~ ~~/\p{sc=Common}/~~ ~~# Match~~es
	1016	ASCIIHexDigit
1542		~~"0"~~ =~ ~~/\p{scx=~~Co~~mmo~~n~~}/ # Ma~~t~~ches~~
	1017	BidiControl
	1018	Dash
	1019	Deprecated
	1020	Diacritic
	1021	Extender
	1022	HexDigit
	1023	Hyphen
	1024	Ideographic
	1025	IDSBinaryOperator
	1026	IDSTrinaryOperator
	1027	JoinControl
	1028	LogicalOrderException
	1029	NoncharacterCodePoint
	1030	OtherAlphabetic
	1031	OtherDefaultIgnorableCodePoint
	1032	OtherGraphemeExtend
	1033	OtherIDStart
	1034	OtherIDContinue
	1035	OtherLowercase
	1036	OtherMath
	1037	OtherUppercase
	1038	PatternSyntax
	1039	PatternWhiteSpace
	1040	QuotationMark
	1041	Radical
	1042	SoftDotted
	1043	STerm
	1044	TerminalPunctuation
	1045	UnifiedIdeograph
	1046	VariationSelector
	1047	WhiteSpace
1543	1048
1544	1049	=begin original
1545	1050
1546		and ~~only~~ the first of these match:
	1051	and there are further derived properties:
1547	1052
1548	1053	=end original
1549	1054
1550		そし~~てこれらは最初だけ~~が~~マッチングし~~ます:
	1055	その他にも派生した特性があります:
1551	1056
1552		~~"\N{KATAKANA-HIRAGANA~~ ~~DOUBLE~~ ~~HYPHEN}"~~ ~~=~ /\~~p{sc=Common} # Matches
	1057	Alphabetic = Lu + Ll + Lt + Lm + Lo + Nl + OtherAlphabetic
1553		~~"\N{KATAKANA-HIRAGANA~~ ~~DOUB~~LE ~~HYPHEN}"~~ =~ ~~/\p{scx~~=~~Common}~~ # No match
	1058	Lowercase = Ll + OtherLowercase
	1059	Uppercase = Lu + OtherUppercase
	1060	Math = Sm + OtherMath
1554	1061
1555		=~~begin~~ or~~igin~~al
	1062	IDStart = Lu + Ll + Lt + Lm + Lo + Nl + OtherIDStart
	1063	IDContinue = IDStart + Mn + Mc + Nd + Pc + OtherIDContinue
1556	1064
1557		~~And~~ ~~only~~ ~~the~~ last two o~~f th~~e~~se ma~~t~~ch:~~
	1065	DefaultIgnorableCodePoint
	1066	= OtherDefaultIgnorableCodePoint
	1067	+ Cf + Cc + Cs + Noncharacters + VariationSelector
	1068	- WhiteSpace - FFF9..FFFB (Annotation Characters)
1558	1069
1559		=end original
	1070	Any = Any code points (i.e. U+0000 to U+10FFFF)
	1071	Assigned = Any non-Cn code points (i.e. synonym for \P{Cn})
	1072	Unassigned = Synonym for \p{Cn}
	1073	ASCII = ASCII (i.e. U+0000 to U+007F)
1560	1074
1561		~~それこれらは最後の二つだけがマッチングします:~~
	1075	Common = Any character (or unassigned code point)
	1076	not explicitly assigned to a script
1562	1077
1563		~~"\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}"~~ =~ ~~/\p{~~s~~c=Hiragana}~~ ~~# N~~o ~~match~~
	1078	=item Use of "Is" Prefix
1564		"\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{sc=Katakana}/ # No match
1565		"\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{scx=Hiragana}/ # Matches
1566		"\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}" =~ /\p{scx=Katakana}/ # Matches
1567	1079
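The difference between C<Script> and C<Script_Extensions> can be gathered into one runnable sketch. It assumes a perl built with a Unicode release that has C<Script_Extensions> (the 5.36.0 side of this diff); the hash name and loop are illustrative.

```perl
use strict;
use warnings;
use charnames ':full';   # explicit, so \N{...} also works on older perls

my $hyphen = "\N{KATAKANA-HIRAGANA DOUBLE HYPHEN}";

# Record which property values accept the character.
my %match = (
    'sc=Common'    => ($hyphen =~ /\p{sc=Common}/    ? 1 : 0),
    'scx=Common'   => ($hyphen =~ /\p{scx=Common}/   ? 1 : 0),
    'sc=Hiragana'  => ($hyphen =~ /\p{sc=Hiragana}/  ? 1 : 0),
    'sc=Katakana'  => ($hyphen =~ /\p{sc=Katakana}/  ? 1 : 0),
    'scx=Hiragana' => ($hyphen =~ /\p{scx=Hiragana}/ ? 1 : 0),
    'scx=Katakana' => ($hyphen =~ /\p{scx=Katakana}/ ? 1 : 0),
);

print "$_: $match{$_}\n" for sort keys %match;
```

With C<Script> the character is only in C<Common>; with C<Script_Extensions> it moves into both Japanese scripts and out of C<Common>.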
1568	1080	=begin original
1569	1081
1570		~~C<S~~cript~~_Extens~~i~~ons>~~ is thus an improve~~d C<Sc~~ript>, i~~n which th~~ere are
	1082	For backward compatibility (with Perl 5.6), all properties mentioned
1571		f~~ewe~~r cha~~racters~~ ~~in t~~he C<~~Common~~> scrip~~t, a~~nd correspo~~ndingly~~ mor~~e in~~
	1083	so far may have C<Is> prepended to their name, so C<\P{IsLu}>, for
1572		~~oth~~e~~r scri~~p~~ts. It is n~~ew i~~n Unicode ver~~s~~ion~~ ~~6.0,~~ and its ~~data are likely~~
	1084	example, is equal to C<\P{Lu}>.
1573		to change significantly in later releases, as things get sorted out.
1574		New code should probably be using C<Script_Extensions> and not plain
1575		C<Script>. If you compile perl with a Unicode release that doesn't have
1576		C<Script_Extensions>, the single form Perl extensions will instead refer
1577		to the plain C<Script> property. If you compile with a version of
1578		Unicode that doesn't have the C<Script> property, these extensions will
1579		not be defined at all.
1580	1085
1581	1086	=end original
1582	1087
1583		このように C<~~Script_Exten~~s~~ions~~> ~~は改良された C<Script> で、~~
	1088	(Perl 5.6 との)後方互換性のため、すべての特性はその名前の前に C<Is> を
1584		~~C<Common> 用字にある文字はより少な~~く~~、それに応じて他の用字の文字は~~
	1089	置くことができます。
1585		~~より多くな~~っています。
	1090	したがって、C<\P{IsLu}> は C<\P{Lu}> と等価です。
1586		これは Unicode バージョン 6.0 からの新しいもので、そのデータは
1587		将来のリリースで整理されて大きく変更される可能性が高いです。
1588		新しいコードはおそらく、単なる C<Script> ではなく
1589		C<Script_Extensions> を使うべきです。
1590		C<Script_Extensions> がない Unicode のリリースで perl を
1591		コンパイルしている場合、単一形式の Perl 拡張は代わりに
1592		単なる C<Script> 特性を参照します。
1593		C<Script> 特性がないバージョンでコンパイルしている場合、
1594		これらの拡張は何も定義されません。
1595	1091
1596		=~~beg~~in ~~origina~~l
	1092	=item Blocks
1597	1093
1598		(Actually, besides C<Common>, the C<Inherited> script contains
1599		characters that are used in multiple scripts. These are modifier
1600		characters which inherit the script value
1601		of the controlling character. Some of these are used in many scripts,
1602		and so go into C<Inherited> in both C<Script> and C<Script_Extensions>.
1603		Others are used in just a few scripts, so are in C<Inherited> in
1604		C<Script>, but not in C<Script_Extensions>.)
1605
1606		=end original
1607
1608		(実際、C<Common> を除くと、C<Inherited> 用字は複数の用字で使われている
1609		文字を含みます。
1610		制御文字の用字の値を継承する文字のための修飾文字です。
1611		その一部は多くの用字で使われているので、
1612		C<Script> と C<Script_Extensions> の両方の中に
1613		C<Inherited> が入っています。
1614		その他のものはいくつかの用字でのみ使われているので、
1615		C<Script> の C<Inherited> にはありますが、
1616		C<Script_Extensions> にはありません。)
1617
1618	1094	=begin original
1619	1095
1620		It is worth stressing that there are several different sets of digits in
1621		Unicode that are equivalent to 0-9 and are matchable by C<\d> in a
1622		regular expression. If they are used in a single language only, they
1623		are in that language's C<Script> and C<Script_Extensions>. If they are
1624		used in more than one script, they will be in C<sc=Common>, but only
1625		if they are used in many scripts should they be in C<scx=Common>.
1626
1627		=end original
1628
1629		Unicode には、0-9 と等価で、正規表現内で C<\d> にマッチングできる数字の
1630		集合がいくつかあることは強調する価値があります。
1631		それらが単一の言語だけで使われた場合、それらはその言語の
1632		C<Script> と C<Script_Extensions> です。
1633		これらが複数の用字で使われている場合、
1634		それらは C<sc=Common> の中にありますが、
1635		C<scx=Common> にあるべき多くの用字で使われている場合のみです。
1636
1637		=begin original
1638
1639		The explanation above has omitted some detail; refer to UAX#24 "Unicode
1640		Script Property": L<https://www.unicode.org/reports/tr24>.
1641
1642		=end original
1643
1644		前述の説明は一部の詳細を省略しています;
1645		UAX#24 "Unicode Script Property": L<https://www.unicode.org/reports/tr24> を
1646		参照してください。
1647
1648		=begin original
1649
1650		A complete list of scripts and their shortcuts is in L<perluniprops>.
1651
1652		=end original
1653
1654		用字とその省略形の完全な一覧は L<perluniprops> にあります。
1655
1656		=head3 B<Use of the C<"Is"> Prefix>
1657
1658		(B<C<"Is"> 接頭辞の使用>)
1659
1660		=begin original
1661
1662		For backward compatibility (with ancient Perl 5.6), all properties writable
1663		without using the compound form mentioned
1664		so far may have C<Is> or C<Is_> prepended to their name, so C<\P{Is_Lu}>, for
1665		example, is equal to C<\P{Lu}>, and C<\p{IsScript:Arabic}> is equal to
1666		C<\p{Arabic}>.
1667
1668		=end original
1669
1670		(とても古い Perl 5.6 との)後方互換性のため、
1671		これまでのところ記述している複合形式を使うことなく書き込み可能な
1672		すべての特性はその名前の前に C<Is>
1673		または C<Is_> を置くことができます; したがって、C<\P{Is_Lu}> は C<\P{Lu}> と
1674		等価で、C<\p{IsScript:Arabic}> は C<\p{Arabic}> と等価です。
1675
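A minimal sketch of the C<Is> and C<Is_> prefixes on a single-form property. Only the C<Lu> spellings are exercised; the sample characters and the C<%agree> hash are assumptions made for the example.

```perl
use strict;
use warnings;

my %agree;
for my $ch ("A", "a", "7") {
    my $bare  = $ch =~ /\P{Lu}/    ? 1 : 0;
    my $is    = $ch =~ /\P{IsLu}/  ? 1 : 0;
    my $is_us = $ch =~ /\P{Is_Lu}/ ? 1 : 0;

    # All three spellings should give the same answer.
    $agree{$ch} = ($bare == $is && $is == $is_us) ? 1 : 0;
    print "'$ch': \\P{Lu}=$bare \\P{IsLu}=$is \\P{Is_Lu}=$is_us\n";
}
```

The prefix changes nothing about what is matched; it exists only so that older code keeps working.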
1676		=head3 B<Blocks>
1677
1678		(B<ブロック>)
1679
1680		=begin original
1681
1682	1096	In addition to B<scripts>, Unicode also defines B<blocks> of
1683	1097	characters. The difference between scripts and blocks is that the
1684	1098	concept of scripts is closer to natural languages, while the concept
1685		of blocks is more of an artificial grouping based on groups of ~~Unicode~~
	1099	of blocks is more of an artificial grouping based on groups of 256
1686		characters ~~with consecutive ordinal values~~. For example, the C<"Basic Latin">
	1100	Unicode characters. For example, the C<Latin> script contains letters
1687		block is all the characters whose o~~rdinal~~s are ~~between 0 and 127, inclusive; in~~
	1101	from many blocks but does not contain all the characters from those
1688		ot~~her~~ ~~wor~~ds, the ~~ASCII cha~~r~~acters.~~ The ~~C<"L~~a~~tin"> scri~~pt contains some let~~ter~~s
	1102	blocks. It does not, for example, contain digits, because digits are
1689		from this as ~~well~~ as s~~evera~~l ~~othe~~r blocks, like ~~C<"Latin-1 Supplement">,~~
	1103	shared across many scripts. Digits and similar groups, like
1690		~~C<"Lati~~n ~~Extended-A">, I<et~~c~~.>, b~~ut i~~t d~~oes not ~~cont~~ain all the cha~~ract~~ers from
	1104	punctuation, are in a category called C<Common>.
1691		those blocks. It does not, for example, contain the digits 0-9, because
1692		those digits are shared across many scripts, and hence are in the
1693		C<Common> script.
1694	1105
1695	1106	=end original
1696	1107
1697	1108	B<用字> に加え、Unicode では文字の B<ブロック> を定義しています。
1698	1109	用字とブロックの違いは、用字のコンセプトが自然言語に
1699		密着したものであるのに対して、ブロックのコンセプトは~~連続した番号を持つ~~
	1110	密着したものであるのに対して、ブロックのコンセプトは 256 の
1700	1111	Unicode 文字のグループに基づいたより人工的なグループ分けであることです。
1701		たとえば、C<~~"Basic~~ Latin"> ブロック~~は番号 0~~ から ~~127 まで~~の~~全ての~~文字です;
	1112	たとえば、C<Latin> 用字は多くのブロックからの文字を含んでいますが、
1702		言い換えると ASCII 文字です。
1703		C<"Latin"> 用字は、このブロックの文字と、C<"Latin-1 Supplement">,
1704		C<"Latin Extended-A"> などのその他のいくつかのブロックの文字を含んでいますが、
1705	1113	それらのブロックのすべての文字を含んではいません。
1706		例を挙げると、数字 ~~0-9~~ は多くの用字を越えて共有されているので、
	1114	例を挙げると、数字は多くの用字を越えて共有されているので、
1707		(Latin 用字は)数字を含ま~~ないので、これらは C<Common> 用字にあります~~。
	1115	(Latin 用字は)数字を含みません。
	1116	数字と、句読点のような同様のグループは C<Common> と呼ばれる
	1117	カテゴリにあります。
1708	1118
1709	1119	=begin original
1710	1120
1711		For more about scripts ~~versus blocks~~, see UAX#24 "~~Unicode~~ Script ~~Prop~~e~~rty~~":
	1121	For more about scripts, see the UAX#24 "Script Names":
1712		L<https://www.unicode.org/reports/tr24>
1713	1122
1714	1123	=end original
1715	1124
1716		用字~~とブロックに違いに関する~~詳~~細につ~~いては、
	1125	用字のより詳しい情報は UTR #24 "Script Names" を参照してください:
1717		UAX#24 "Unicode Script Property"
1718		L<https://www.unicode.org/reports/tr24> を参照してください。
1719	1126
1720		~~=begin~~ orig~~inal~~
	1127	http://www.unicode.org/reports/tr24/
1721	1128
1722		The C<Script_Extensions> or C<Script> properties are likely to be the
1723		ones you want to use when processing
1724		natural language; the C<Block> property may occasionally be useful in working
1725		with the nuts and bolts of Unicode.
1726
1727		=end original
1728
1729		C<Script_Extensions> や C<Script> 特性は自然言語を処理するときにおそらく
1730		使いたいと思うようなものです;
1731		C<Block> 特性は Unicode の基本的な部分で動作させるのに時々有用です。
1732
1733	1129	=begin original
1734	1130
1735		Block nam~~es a~~re ma~~tched in the c~~o~~mpo~~und ~~form,~~ l~~ike C<\p{Bl~~ock~~: Arrow~~s}> or
	1131	For more about blocks, see:
1736		C<\p{Blk=Hebrew}>. Unlike most other properties, only a few block names have a
1737		Unicode-defined short name.
1738	1132
1739	1133	=end original
1740	1134
1741		ブロック~~名は C<\p{Block: Arrows}> や C<\p{Blk=Hebrew}>~~ のような
	1135	ブロックについてのより詳しい情報は:
1742		複合形式でマッチングします。
1743		その他のほとんどの特性と違って、いくつかのブロック名だけが Unicode が
1744		定義した短い名前を持ちます。
1745	1136
1746		~~=begin~~ original
	1137	http://www.unicode.org/Public/UNIDATA/Blocks.txt
1747	1138
1748		Perl also defines single form synonyms for the block property in cases
1749		where these do not conflict with something else. But don't use any of
1750		these, because they are unstable. Since these are Perl extensions, they
1751		are subordinate to official Unicode property names; Unicode doesn't know
1752		nor care about Perl's extensions. It may happen that a name that
1753		currently means the Perl extension will later be changed without warning
1754		to mean a different Unicode property in a future version of the perl
1755		interpreter that uses a later Unicode release, and your code would no
1756		longer work. The extensions are mentioned here for completeness: Take
1757		the block name and prefix it with one of: C<In> (for example
1758		C<\p{Blk=Arrows}> can currently be written as C<\p{In_Arrows}>); or
1759		sometimes C<Is> (like C<\p{Is_Arrows}>); or sometimes no prefix at all
1760		(C<\p{Arrows}>). As of this writing (Unicode 9.0) there are no
1761		conflicts with using the C<In_> prefix, but there are plenty with the
1762		other two forms. For example, C<\p{Is_Hebrew}> and C<\p{Hebrew}> mean
1763		C<\p{Script_Extensions=Hebrew}> which is NOT the same thing as
1764		C<\p{Blk=Hebrew}>. Our
1765		advice used to be to use the C<In_> prefix as a single form way of
1766		specifying a block. But Unicode 8.0 added properties whose names begin
1767		with C<In>, and it's now clear that it's only luck that's so far
1768		prevented a conflict. Using C<In> is only marginally less typing than
1769		C<Blk:>, and the latter's meaning is clearer anyway, and guaranteed to
1770		never conflict. So don't take chances. Use C<\p{Blk=foo}> for new
1771		code. And be sure that block is what you really really want to do. In
1772		most cases scripts are what you want instead.
1773
1774		=end original
1775
1776		Perl はまた、他のものと競合しない場合には、
1777		ブロック特性に対して単一形式の同義語を定義します。
1778		しかし、これらは不安定なので、使わないでください。
1779		これらは Perl の拡張なので、公式の Unicode 特性名の下位にあたります;
1780		Unicode は Perl の拡張を認識しませんし、気にしません。
1781		現在は Perl 拡張を意味する名前が、将来の Unicode リリースを使用する
1782		perl インタプリタの将来のバージョンでは、警告なしに別の Unicode 特性を
1783		意味するように変更され、コードが動作しなくなる可能性があります。
1784		ここでは、完全性のために拡張について説明します:
1785		ブロック名の前に次のいずれかの接頭辞を付けます:
1786		C<In> (例えば C<\p{Blk=Arrows}> は現在 C<\p{In_Arrows}> と書けます);
1787		時々 C<Is> (C<\p{Is_Arrows}> のように);
1788		時々全く接頭辞なし (C<\p{Arrows}>)。
1789		この文書の執筆時点 (Unicode 9.0) では、C<In_> 接頭辞の使用と
1790		競合することはありませんが、他の二つの形式では多くの競合があります。
1791		例えば、C<\p{Is_Hebrew}> および C<\p{Hebrew}> は、
1792		C<\p{Script_Extensions=Hebrew}> を意味しますが、C<\p{Blk=Hebrew}> と
1793		同じものでは「ありません」。
1794		以前勧めていたのは、ブロックを指定する単一形式として
1795		C<In_> 接頭辞を使うことでした。
1796		しかし Unicode 8.0 では、名前が C<In> で始まる特性が追加されたため、
1797		今まで競合を回避できていたのは単に幸運なだけだったことが
1798		明らかになりました。
1799		C<In> を使用することは、C<Blk:> よりもわずかにタイプ数が少ないだけで、
1800		とにかく後者の意味はより明確で、決して衝突しないことが保証されます。
1801		だから、危険を冒さないでください。
1802		新しいコードには C<\p{Blk=foo}> を使ってください。
1803		そして、これが本当に本当にやりたいことであることを確認してください。
1804		ほとんどの場合、本当に必要なものはそうではなく用字です。
1805
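The warning that C<\p{Hebrew}> is not the same as C<\p{Blk=Hebrew}> can be made concrete with a short sketch. It assumes a recent perl in which the single form means C<Script_Extensions>; the three code points are chosen here for illustration.

```perl
use strict;
use warnings;

my @samples = (
    [ "\x{05D0}", "HEBREW LETTER ALEF (in the Hebrew block)" ],
    [ "\x{FB1D}", "HEBREW LETTER YOD WITH HIRIQ (Alphabetic Presentation Forms block)" ],
    [ "\x{0590}", "unassigned code point inside the Hebrew block" ],
);

my %seen;
for my $s (@samples) {
    my ($ch, $label) = @$s;
    my $script = $ch =~ /\p{Hebrew}/     ? 1 : 0;   # script test (single form)
    my $block  = $ch =~ /\p{Blk=Hebrew}/ ? 1 : 0;   # block test (compound form)
    $seen{ ord $ch } = [ $script, $block ];
    printf "U+%04X script=%d block=%d  %s\n", ord($ch), $script, $block, $label;
}
```

Only the first character satisfies both tests. The second is Hebrew by script but lives in another block, and the third lies in the Hebrew block without being a Hebrew character at all.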
1806	1139	=begin original
1807	1140
1808		~~A comp~~l~~ete list~~ o~~f blo~~cks is in ~~L<p~~e~~rlu~~niprop~~s>.~~
	1141	Block names are given with the C<In> prefix. For example, the
	1142	Katakana block is referenced via C<\p{InKatakana}>. The C<In>
	1143	prefix may be omitted if there is no naming conflict with a script
	1144	or any other property, but it is recommended that C<In> always be used
	1145	for block tests to avoid confusion.
1809	1146
1810	1147	=end original
1811	1148
1812		ブロック~~の完全な一覧~~は L<~~perlu~~n~~iprops~~> にあります。
	1149	ブロック名は C<In> 接頭辞とともに与えられます。
	1150	たとえば、カタカナブロックは C<\p{InKatakana}> として参照されます。
	1151	C<In> 接頭辞は用字や他のプロパティと衝突しなければ省略することも
	1152	可能ですが、混乱のないブロックテストのために常に C<In> を使うことを
	1153	お勧めします。
1813	1154
1814		=head3 B<Other Properties>
1815
1816		(B<その他の特性>)
1817
1818	1155	=begin original
1819	1156
1820		There ~~are~~ many more ~~propertie~~s tha~~n the ve~~r~~y basic on~~es described ~~here.~~
	1157	These block names are supported:
1821		A complete list is in L<perluniprops>.
1822	1158
1823	1159	=end original
1824	1160
1825		~~ここで記述したとても基本的なも~~の~~よりもとても多くの特性~~があります。
	1161	以下のブロック名がサポートされています:
1826		完全な一覧は L<perluniprops> にあります。
1827	1162
1828		=begin or~~iginal~~
	1163	InAegeanNumbers
	1164	InAlphabeticPresentationForms
	1165	InAncientGreekMusicalNotation
	1166	InAncientGreekNumbers
	1167	InArabic
	1168	InArabicPresentationFormsA
	1169	InArabicPresentationFormsB
	1170	InArabicSupplement
	1171	InArmenian
	1172	InArrows
	1173	InBalinese
	1174	InBasicLatin
	1175	InBengali
	1176	InBlockElements
	1177	InBopomofo
	1178	InBopomofoExtended
	1179	InBoxDrawing
	1180	InBraillePatterns
	1181	InBuginese
	1182	InBuhid
	1183	InByzantineMusicalSymbols
	1184	InCJKCompatibility
	1185	InCJKCompatibilityForms
	1186	InCJKCompatibilityIdeographs
	1187	InCJKCompatibilityIdeographsSupplement
	1188	InCJKRadicalsSupplement
	1189	InCJKStrokes
	1190	InCJKSymbolsAndPunctuation
	1191	InCJKUnifiedIdeographs
	1192	InCJKUnifiedIdeographsExtensionA
	1193	InCJKUnifiedIdeographsExtensionB
	1194	InCherokee
	1195	InCombiningDiacriticalMarks
	1196	InCombiningDiacriticalMarksSupplement
	1197	InCombiningDiacriticalMarksforSymbols
	1198	InCombiningHalfMarks
	1199	InControlPictures
	1200	InCoptic
	1201	InCountingRodNumerals
	1202	InCuneiform
	1203	InCuneiformNumbersAndPunctuation
	1204	InCurrencySymbols
	1205	InCypriotSyllabary
	1206	InCyrillic
	1207	InCyrillicSupplement
	1208	InDeseret
	1209	InDevanagari
	1210	InDingbats
	1211	InEnclosedAlphanumerics
	1212	InEnclosedCJKLettersAndMonths
	1213	InEthiopic
	1214	InEthiopicExtended
	1215	InEthiopicSupplement
	1216	InGeneralPunctuation
	1217	InGeometricShapes
	1218	InGeorgian
	1219	InGeorgianSupplement
	1220	InGlagolitic
	1221	InGothic
	1222	InGreekExtended
	1223	InGreekAndCoptic
	1224	InGujarati
	1225	InGurmukhi
	1226	InHalfwidthAndFullwidthForms
	1227	InHangulCompatibilityJamo
	1228	InHangulJamo
	1229	InHangulSyllables
	1230	InHanunoo
	1231	InHebrew
	1232	InHighPrivateUseSurrogates
	1233	InHighSurrogates
	1234	InHiragana
	1235	InIPAExtensions
	1236	InIdeographicDescriptionCharacters
	1237	InKanbun
	1238	InKangxiRadicals
	1239	InKannada
	1240	InKatakana
	1241	InKatakanaPhoneticExtensions
	1242	InKharoshthi
	1243	InKhmer
	1244	InKhmerSymbols
	1245	InLao
	1246	InLatin1Supplement
	1247	InLatinExtendedA
	1248	InLatinExtendedAdditional
	1249	InLatinExtendedB
	1250	InLatinExtendedC
	1251	InLatinExtendedD
	1252	InLetterlikeSymbols
	1253	InLimbu
	1254	InLinearBIdeograms
	1255	InLinearBSyllabary
	1256	InLowSurrogates
	1257	InMalayalam
	1258	InMathematicalAlphanumericSymbols
	1259	InMathematicalOperators
	1260	InMiscellaneousMathematicalSymbolsA
	1261	InMiscellaneousMathematicalSymbolsB
	1262	InMiscellaneousSymbols
	1263	InMiscellaneousSymbolsAndArrows
	1264	InMiscellaneousTechnical
	1265	InModifierToneLetters
	1266	InMongolian
	1267	InMusicalSymbols
	1268	InMyanmar
	1269	InNKo
	1270	InNewTaiLue
	1271	InNumberForms
	1272	InOgham
	1273	InOldItalic
	1274	InOldPersian
	1275	InOpticalCharacterRecognition
	1276	InOriya
	1277	InOsmanya
	1278	InPhagspa
	1279	InPhoenician
	1280	InPhoneticExtensions
	1281	InPhoneticExtensionsSupplement
	1282	InPrivateUseArea
	1283	InRunic
	1284	InShavian
	1285	InSinhala
	1286	InSmallFormVariants
	1287	InSpacingModifierLetters
	1288	InSpecials
	1289	InSuperscriptsAndSubscripts
	1290	InSupplementalArrowsA
	1291	InSupplementalArrowsB
	1292	InSupplementalMathematicalOperators
	1293	InSupplementalPunctuation
	1294	InSupplementaryPrivateUseAreaA
	1295	InSupplementaryPrivateUseAreaB
	1296	InSylotiNagri
	1297	InSyriac
	1298	InTagalog
	1299	InTagbanwa
	1300	InTags
	1301	InTaiLe
	1302	InTaiXuanJingSymbols
	1303	InTamil
	1304	InTelugu
	1305	InThaana
	1306	InThai
	1307	InTibetan
	1308	InTifinagh
	1309	InUgaritic
	1310	InUnifiedCanadianAboriginalSyllabics
	1311	InVariationSelectors
	1312	InVariationSelectorsSupplement
	1313	InVerticalForms
	1314	InYiRadicals
	1315	InYiSyllables
	1316	InYijingHexagramSymbols
1829	1317
1830		Unicode defines all its properties in the compound form, so all single-form
1831		properties are Perl extensions. Most of these are just synonyms for the
1832		Unicode ones, but some are genuine extensions, including several that are in
1833		the compound form. And quite a few of these are actually recommended by Unicode
1834		(in L<https://www.unicode.org/reports/tr18>).
1835
1836		=end original
1837
1838		Unicode は、複合形式ですべての特性を定義するので、
1839		単一形式の特性はすべて Perl 拡張になります。
1840		これらのほとんどは Unicode のものの同義語にすぎませんが、いくつかは
1841		本物の拡張であり、複合形式のものもあります。
1842		そしてこれらのいくつかは実際に Unicode
1843		(L<https://www.unicode.org/reports/tr18>)で推奨されています。
1844
1845		=begin original
1846
1847		This section gives some details on all extensions that aren't just
1848		synonyms for compound-form Unicode properties
1849		(for those properties, you'll have to refer to the
1850		L<Unicode Standard\|https://www.unicode.org/reports/tr44>).
1851
1852		=end original
1853
1854		この節では、単に複合形式の Unicode 特性の同義語ではないすべての
1855		拡張機能について詳しく説明します (これらの特性については、
1856		L<Unicode Standard\|https://www.unicode.org/reports/tr44> を
1857		参照してください)。
1858
1859		=over
1860
1861		=item B<C<\p{All}>>
1862
1863		=begin original
1864
1865		This matches every possible code point. It is equivalent to C<qr/./s>.
1866		Unlike all the other non-user-defined C<\p{}> property matches, no
1867		warning is ever generated if this property is matched against a
1868		non-Unicode code point (see L</Beyond Unicode code points> below).
1869
1870		=end original
1871
1872		これは全ての符号位置にマッチングします。
1873		これは C<qr/./s> と等価です。
1874		その他全てのユーザー定義でない C<\p{}> 特性のマッチングと異なり、
1875		この特性はたとえ非 Unicode 符号位置に対してマッチングしても警告は
1876		発生しません (後述する L</Beyond Unicode code points> 参照)。
1877
1878		=item B<C<\p{Alnum}>>
1879
1880		=begin original
1881
1882		This matches any C<\p{Alphabetic}> or C<\p{Decimal_Number}> character.
1883
1884		=end original
1885
1886		これは任意の C<\p{Alphabetic}> または C<\p{Decimal_Number}> 文字に
1887		マッチングします。
1888
1889		=item B<C<\p{Any}>>
1890
1891		=begin original
1892
1893		This matches any of the 1_114_112 Unicode code points. It is a synonym
1894		for C<\p{Unicode}>.
1895
1896		=end original
1897
1898		これは任意の 1_114_112 Unicode 符号位置にマッチングします。
1899		これは C<\p{Unicode}> の同義語です。
1900
1901		=item B<C<\p{ASCII}>>
1902
1903		=begin original
1904
1905		This matches any of the 128 characters in the US-ASCII character set,
1906		which is a subset of Unicode.
1907
1908		=end original
1909
1910		これは、Unicode のサブセットである、US-ASCII 文字集合の 128 文字に
1911		マッチングします。
1912
1913		=item B<C<\p{Assigned}>>
1914
1915		=begin original
1916
1917		This matches any assigned code point; that is, any code point whose L<general
1918		category\|/General_Category> is not C<Unassigned> (or equivalently, not C<Cn>).
1919
1920		=end original
1921
1922		これは任意の割り当てられた符号位置にマッチングします; つまり、
1923		L<general category\|/General_Category> が
1924		C<Unassigned> ではない(または同等に C<Cn> ではない) 符号位置です。
1925
1926		=item B<C<\p{Blank}>>
1927
1928		=begin original
1929
1930		This is the same as C<\h> and C<\p{HorizSpace}>: A character that changes the
1931		spacing horizontally.
1932
1933		=end original
1934
1935		これは C<\h> および C<\p{HorizSpace}> と同じです: スペースを水平に変更する
1936		文字です。
1937
1938		=item B<C<\p{Decomposition_Type: Non_Canonical}>> (Short: C<\p{Dt=NonCanon}>)
1939
1940		=begin original
1941
1942		Matches a character that has any of the non-canonical decomposition
1943		types. Canonical decompositions are introduced in the
1944		L</Extended Grapheme Clusters (Logical characters)> section above.
1945		However, many more characters have a different type of decomposition,
1946		generically called "compatible" decompositions, or "non-canonical". The
1947		sequences that form these decompositions are not considered canonically
1948		equivalent to the pre-composed character. An example is the
1949		C<"SUPERSCRIPT ONE">. It is somewhat like a regular digit 1, but not
1950		exactly; its decomposition into the digit 1 is called a "compatible"
1951		decomposition, specifically a "super" (for "superscript") decomposition.
1952		There are several such compatibility decompositions (see
1953		L<https://www.unicode.org/reports/tr44>). S<C<\p{Dt: Non_Canon}>> is a
1954		Perl extension that uses just one name to refer to the union of all of
1955		them.
1956
1957		=end original
1958
1959		非正準分解型の文字にマッチングします。
1960		正準分解は前述の L</Extended Grapheme Clusters (Logical characters)> 節で
1961		説明しました。
1962		しかし、多くの文字は異なる種類の分解を持ち、
1963		一般的に「互換」分解あるいは「非正準」分解と呼ばれます。
1964		これらの分解を形成する並びは合成済文字への正準等価ではないと考えられます。
1965		例えば、C<"SUPERSCRIPT ONE"> です。
1966		これは普通の数字 1 のようなものですが、正確ではありません;
1967		これの数字 1 への分解は
1968		「互換」分解と呼ばれ、特に「スーパー」("superscript" から)分解と呼ばれます。
1969		このような互換分解(L<https://www.unicode.org/reports/tr44>を参照)は
1970		いくつかあります。
1971		S<C<\p{Dt: Non_Canon}>> は、これら全ての和集合を一つの名前で参照するために
1972		使う Perl 拡張です。
1973
1974		=begin original
1975
1976		Most Unicode characters don't have a decomposition, so their
1977		decomposition type is C<"None">. Hence, C<Non_Canonical> is equivalent
1978		to
1979
1980		=end original
1981
1982		ほとんどの Unicode 文字は分解を持たないので、それらの分解型は C<"None"> です。
1983		従って、C<Non_Canonical> は次と等価です:
1984
1985		qr/(?[ \P{DT=Canonical} - \p{DT=None} ])/
1986
1987		=begin original
1988
1989		(Note that one of the non-canonical decompositions is named "compat",
1990		which could perhaps have been better named "miscellaneous". It includes
1991		just the things that Unicode couldn't figure out a better generic name
1992		for.)
1993
1994		=end original
1995
1996		(非正準分解の一つは "compat" という名前で、おそらく
1997		"miscellaneous" という名前の方がよかったものです。
1998		これは、Unicode がよりよい名前を見つけられなかったものを
1999		含んでいます。)
2000
2001		=item B<C<\p{Graph}>>
2002
2003		=begin original
2004
2005		Matches any character that is graphic. Theoretically, this means a character
2006		that on a printer would cause ink to be used.
2007
2008		=end original
2009
2010		任意の図形文字にマッチングします。
2011		理論的には、これはプリンタがインクを使うことになる文字を意味します。
2012
2013		=item B<C<\p{HorizSpace}>>
2014
2015		=begin original
2016
2017		This is the same as C<\h> and C<\p{Blank}>: a character that changes the
2018		spacing horizontally.
2019
2020		=end original
2021
2022		これは C<\h> や C<\p{Blank}> と同じです:
2023		スペースを水平に変更する文字です。
2024
2025		=item B<C<\p{In=*}>>
2026
2027		=begin original
2028
2029		This is a synonym for C<\p{Present_In=*}>
2030
2031		=end original
2032
2033		これは C<\p{Present_In=*}> の同義語です。
2034
2035		=item B<C<\p{PerlSpace}>>
2036
2037		=begin original
2038
2039		This is the same as C<\s>, restricted to ASCII, namely C<S<[ \f\n\r\t]>>
2040		and starting in Perl v5.18, a vertical tab.
2041
2042		=end original
2043
2044		これは C<\s> と同じで、ASCII に制限されます; つまり C<S<[ \f\n\r\t]>>、
2045		および、Perl v5.18 から垂直タブ、です。
2046
2047		=begin original
2048
2049		Mnemonic: Perl's (original) space
2050
2051		=end original
2052
2053		記憶法: Perl の (元々の) スペース
2054
2055		=item B<C<\p{PerlWord}>>
2056
2057		=begin original
2058
2059		This is the same as C<\w>, restricted to ASCII, namely C<[A-Za-z0-9_]>
2060
2061		=end original
2062
2063		これは C<\w> と同じで ASCII に制限されます; つまり C<[A-Za-z0-9_]> です。
2064
2065		=begin original
2066
2067		Mnemonic: Perl's (original) word.
2068
2069		=end original
2070
2071		記憶法: Perl の (元々の) 単語。
2072
2073		=item B<C<\p{Posix...}>>
2074
2075		=begin original
2076
2077		There are several of these, which are equivalents, using the C<\p{}>
2078		notation, for Posix classes and are described in
2079		L<perlrecharclass/POSIX Character Classes>.
2080
2081		=end original
2082
2083		これにはいくつかあり、Posix クラスを C<\p{}> 記法で表した
2084		等価物です; これらは
2085		L<perlrecharclass/POSIX Character Classes> に記述されています。
2086
2087		=item B<C<\p{Present_In: }>> (Short: C<\p{In=}>)
2088
2089		=begin original
2090
2091		This property is used when you need to know in which Unicode version(s) a
2092		character is present.
2093
2094		=end original
2095
2096		この特性は、この文字の Unicode バージョンを知る必要があるときに使われます。
2097
2098		=begin original
2099
2100		The "*" above stands for some Unicode version number, such as
2101		C<1.1> or C<12.0>; or the "*" can also be C<Unassigned>. This property will
2102		match the code points whose final disposition has been settled as of the
2103		Unicode release given by the version number; C<\p{Present_In: Unassigned}>
2104		will match those code points whose meaning has yet to be assigned.
2105
2106		=end original
2107
2108		前述の "*" は、C<1.1> や C<12.0> のような Unicode バージョン番号です;
2109		あるいは "*" は C<Unassigned> も取ります。
2110		この特性は、最終的な配置がバージョン番号によって指定された Unicode リリースに
2111		設定された符号位置にマッチングします;
2112		C<\p{Present_In: Unassigned}> は、まだ意味が割り当てられていない符号位置に
2113		マッチングします。
2114
2115		=begin original
2116
2117		For example, C<U+0041> C<"LATIN CAPITAL LETTER A"> was present in the very first
2118		Unicode release available, which is C<1.1>, so this property is true for all
2119		valid "*" versions. On the other hand, C<U+1EFF> was not assigned until version
2120		5.1 when it became C<"LATIN SMALL LETTER Y WITH LOOP">, so the only "*" values that
2121		would match it are 5.1, 5.2, and later.
2122
2123		=end original
2124
2125		たとえば、C<U+0041> C<"LATIN CAPITAL LETTER A"> は、使用可能な
2126		最初の Unicode リリースである C<1.1> から存在しているので、
2127		この特性はすべての有効な "*" バージョンに対して真です。
2128		一方、C<U+1EFF> は、これが C<"LATIN SMALL LETTER Y WITH LOOP"> になった
2129		バージョン 5.1 まで割り当てられていなかったので、
2130		これにマッチングする "*" は 5.1, 5.2, およびそれ以降です。
2131
2132		=begin original
2133
2134		Unicode furnishes the C<Age> property from which this is derived. The problem
2135		with Age is that a strict interpretation of it (which Perl takes) has it
2136		matching the precise release a code point's meaning is introduced in. Thus
2137		C<U+0041> would match only 1.1; and C<U+1EFF> only 5.1. This is not usually what
2138		you want.
2139
2140		=end original
2141
2142		Unicode は、これの派生元である C<Age> 特性を提供しています。
2143		Age の問題は、(Perl が採用している) 厳密な解釈では、符号位置の
2144		意味が導入されたまさにそのリリースにのみマッチングすることです。
2145		したがって、C<U+0041> は 1.1 のみにマッチングし、C<U+1EFF> は 5.1 とのみ
2146		マッチングします。
2147		これは通常、あなたが望むものではありません。
2148
2149		=begin original
2150
2151		Some non-Perl implementations of the Age property may change its meaning to be
2152		the same as the Perl C<Present_In> property; just be aware of that.
2153
2154		=end original
2155
2156		Age 特性の非 Perl 実装の中には、Perl の C<Present_In> 特性と
2157		同じ意味を持つように変更しているものがあります; 知っておいてください。
2158
2159		=begin original
2160
2161		Another confusion with both these properties is that the definition is not
2162		that the code point has been I<assigned>, but that the meaning of the code point
2163		has been I<determined>. This is because 66 code points will always be
2164		unassigned, and so the C<Age> for them is the Unicode version in which the decision
2165		to make them so was made. For example, C<U+FDD0> is to be permanently
2166		unassigned to a character, and the decision to do that was made in version 3.1,
2167		so C<\p{Age=3.1}> matches this character, as also does C<\p{Present_In: 3.1}> and up.
2168
2169		=end original
2170
2171		これらの特性に関するもう一つの混乱は、定義は
2172		この符号位置が I<割り当てられた> ということではなく、
2173		符号位置の意味が I<決定された> ということです。
2174		これは、66 の符号位置が常に割り当てられなくなり、
2175		それらに対する C<Age> はそう決定された Unicode のバージョンだからです。
2176		たとえば、C<U+FDD0> は永続的に文字が割り当てられないことになっていて、
2177		この決定はバージョン 3.1 で行われたので、
2178		C<\p{Age=3.1}> はこの文字にマッチングし、
2179		C<\p{Present_In:3.1}> 以上もマッチングします。
2180
2181		=item B<C<\p{Print}>>
2182
2183		=begin original
2184
2185		This matches any character that is graphical or blank, except controls.
2186
2187		=end original
2188
2189		制御文字を除く、任意の図形文字か空白にマッチングします。
2190
2191		=item B<C<\p{SpacePerl}>>
2192
2193		=begin original
2194
2195		This is the same as C<\s>, including beyond ASCII.
2196
2197		=end original
2198
2199		これは C<\s> と同じで、ASCII の範囲外を含みます。
2200
2201		=begin original
2202
2203		Mnemonic: Space, as modified by Perl. (Until v5.18, it didn't include the
2204		vertical tab, which both the Posix standard and Unicode consider white space.)
2205
2206		=end original
2207
2208		記憶法: スペース、Perl によって修正。
2209		(これは、v5.18 までは、Posix 標準と Unicode の両方が空白と考える
2210		垂直タブを含みません。)
2211
2212		=item B<C<\p{Title}>> and B<C<\p{Titlecase}>>
2213
2214		(B<C<\p{Title}>> と B<C<\p{Titlecase}>>)
2215
2216		=begin original
2217
2218		Under case-sensitive matching, these both match the same code points as
2219		C<\p{General Category=Titlecase_Letter}> (C<\p{gc=lt}>). The difference
2220		is that under C</i> caseless matching, these match the same as
2221		C<\p{Cased}>, whereas C<\p{gc=lt}> matches C<\p{Cased_Letter}>.
2222
2223		=end original
2224
2225		大文字小文字を区別するマッチングの下では、これらの両方は
2226		C<\p{General Category=Titlecase_Letter}> (C<\p{gc=lt}>) と
2227		同じ符号位置にマッチングします。
2228		違いは、C</i> 大文字小文字無視マッチングでは、
2229		これらのマッチングは C<\p{Cased}> と同じで、
2230		C<\p{gc=lt}> は C<\p{Cased_Letter}> にマッチングすると言うことです。
2231
2232		=item B<C<\p{Unicode}>>
2233
2234		=begin original
2235
2236		This matches any of the 1_114_112 Unicode code points.
2237		It is a synonym for C<\p{Any}>.
2238
2239		=end original
2240
2241		これは任意の 1_114_112 Unicode 符号位置にマッチングします。
2242		これは C<\p{Any}> の同義語です。
2243
2244		=item B<C<\p{VertSpace}>>
2245
2246		=begin original
2247
2248		This is the same as C<\v>: A character that changes the spacing vertically.
2249
2250		=end original
2251
2252		これは C<\v> と同じです: 垂直の空白を変更する文字です。
2253
2254		=item B<C<\p{Word}>>
2255
2256		=begin original
2257
2258		This is the same as C<\w>, including over 100_000 characters beyond ASCII.
2259
2260		=end original
2261
2262		これは C<\w> と同じで、ASCII 範囲外の 100_000 を超える文字を含みます。
2263
2264		=item B<C<\p{XPosix...}>>
2265
2266		=begin original
2267
2268		There are several of these, which are the standard Posix classes
2269		extended to the full Unicode range. They are described in
2270		L<perlrecharclass/POSIX Character Classes>.
2271
2272		=end original
2273
2274		これにはいくつかあり、完全な Unicode の範囲に拡張された標準 Posix クラスです;
2275		これらは
2276		L<perlrecharclass/POSIX Character Classes> に記述されています。
2277
2278	1318	=back
2279	1319
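Several of the extensions listed above can be checked directly. The following is a minimal sketch, assuming a Perl whose Unicode tables include version 5.2 (roughly Perl 5.12 or later). The sample code points and variable names are chosen only for illustration.

```perl
use strict;
use warnings;

# U+1EFF LATIN SMALL LETTER Y WITH LOOP was introduced in Unicode 5.1.
my $y_loop = "\x{1EFF}";

# Age matches only the release that introduced the code point ...
my $age_5_1 = ($y_loop =~ /\p{Age=5.1}/) ? 1 : 0;
my $age_5_2 = ($y_loop =~ /\p{Age=5.2}/) ? 1 : 0;

# ... whereas Present_In (short: In) also matches every later release.
my $in_5_0 = ($y_loop =~ /\p{In=5.0}/)          ? 1 : 0;
my $in_5_2 = ($y_loop =~ /\p{Present_In: 5.2}/) ? 1 : 0;

# \p{PerlWord} is \w restricted to ASCII; \p{Word} covers all of Unicode.
my $alpha          = "\x{3B1}";    # GREEK SMALL LETTER ALPHA
my $alpha_perlword = ($alpha =~ /\p{PerlWord}/) ? 1 : 0;
my $alpha_word     = ($alpha =~ /\p{Word}/)     ? 1 : 0;

# SUPERSCRIPT ONE has a "super" (compatibility) decomposition, while
# LATIN SMALL LETTER E WITH ACUTE has a canonical one.
my $sup_one_noncanon  = ("\x{B9}" =~ /\p{Dt=NonCanon}/) ? 1 : 0;
my $e_acute_noncanon  = ("\x{E9}" =~ /\p{Dt=NonCanon}/) ? 1 : 0;

print "Age=5.1: $age_5_1, Age=5.2: $age_5_2\n";
print "In=5.0: $in_5_0, Present_In=5.2: $in_5_2\n";
print "alpha PerlWord: $alpha_perlword, Word: $alpha_word\n";
print "NonCanon superscript one: $sup_one_noncanon, e-acute: $e_acute_noncanon\n";
```

Only the C<Age> test for 5.1 and the C<Present_In> test for 5.2 should succeed for U+1EFF, which is the difference between the two properties described above.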
2280		=head2 Comparison of C<\N{...}> and C<\p{name=...}>
2281
2282		=begin original
2283
2284		Starting in Perl 5.32, you can specify a character by its name in
2285		regular expression patterns using C<\p{name=...}>. This is in addition
2286		to the longstanding method of using C<\N{...}>. The following
2287		summarizes the differences between these two:
2288
2289		=end original
2290
2291		Perl 5.32 から、C<\p{name=...}> を使って、正規表現パターン内の
2292		名前で文字を指定できます。
2293		これは、C<\N{...}> を使う長年の方法に追加されます。
2294		次に、この二つの違いをまとめます:
2295
2296		                      \N{...}          \p{Name=...}
2297		can interpolate       only with eval   yes             [1]
2298		custom names          yes              no              [2]
2299		name aliases          yes              yes             [3]
2300		named sequences       yes              yes             [4]
2301		name value parsing    exact            Unicode loose   [5]
2302
2303		=over
2304
2305		=item [1]
2306
2307		=begin original
2308
2309		The ability to interpolate means you can do something like
2310
2311		=end original
2312
2313		展開能力とは、次のようなことをして:
2314
2315		qr/\p{na=latin capital letter $which}/
2316
2317		=begin original
2318
2319		and specify C<$which> elsewhere.
2320
2321		=end original
2322
2323		C<$which> を別の場所で指定するということです。
2324
2325		=item [2]
2326
2327		=begin original
2328
2329		You can create your own names for characters, and override official
2330		ones when using C<\N{...}>. See L<charnames/CUSTOM ALIASES>.
2331
2332		=end original
2333
2334		文字のための独自の名前を作り、C<\N{...}> を使うときに公式のものを
2335		上書きできます。
2336		L<charnames/CUSTOM ALIASES> を参照してください。
2337
2338		=item [3]
2339
2340		=begin original
2341
2342		Some characters have multiple names (synonyms).
2343
2344		=end original
2345
2346		一部の文字は複数の名前(同義語)を持ちます。
2347
2348		=item [4]
2349
2350		=begin original
2351
2352		Some particular sequences of characters are given a single name, in
2353		addition to their individual ones.
2354
2355		=end original
2356
2357		一部の特別な文字の並びは、個々の名前に加えて、単一の名前を
2358		与えられています。
2359
2360		=item [5]
2361
2362		=begin original
2363
2364		Exact name value matching means you have to specify case, hyphens,
2365		underscores, and spaces precisely in the name you want. Loose matching
2366		follows the Unicode rules
2367		L<https://www.unicode.org/reports/tr44/tr44-24.html#UAX44-LM2>,
2368		where these are mostly irrelevant. Except for a few outlier character
2369		names, these are the same rules as are already used for any other
2370		C<\p{...}> property.
2371
2372		=end original
2373
2374		正確な名前の値のマッチングとは、大文字と小文字、ハイフン、アンダースコア、
2375		およびスペースを正確に名前に指定する必要があることを意味します。
2376		緩いマッチングは Unicode 規則
2377		L<https://www.unicode.org/reports/tr44/tr44-24.html#UAX44-LM2> に従いますが、
2378		これらはほとんど無関係です。
2379		少数の特殊な文字名を除いて、これらは他の C<\p{...}> 特性に
2380		すでに使われている規則と同じです。
2381
2382		=back
2383
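The interpolation and parsing differences in notes [1] and [5] can be seen side by side. This is a minimal sketch, assuming Perl 5.32 or later; the variable names are illustrative and not part of any API.

```perl
use v5.32;
use warnings;

my $which = 'A';

# \p{na=...} is parsed loosely (case is ignored here) and the variable
# is interpolated before the pattern is compiled.
my $by_property = qr/\A\p{na=latin capital letter $which}\z/;

# \N{...} requires the exact Unicode name, written out literally.
my $by_escape = qr/\A\N{LATIN CAPITAL LETTER A}\z/;

for my $char ('A', 'a') {
    printf "%s: property=%d escape=%d\n",
        $char,
        ($char =~ $by_property ? 1 : 0),
        ($char =~ $by_escape   ? 1 : 0);
}
```

Both patterns are expected to accept C<"A"> and reject C<"a">; changing C<$which> selects a different letter for the C<\p{na=...}> form without touching the pattern text.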
2384		=head2 Wildcards in Property Values
2385
2386		(特性値でのワイルドカード)
2387
2388		=begin original
2389
2390		Starting in Perl 5.30, it is possible to do something like this:
2391
2392		=end original
2393
2394		Perl 5.30 から、次のようなことが出来るようになりました:
2395
2396		qr!\p{numeric_value=/\A[0-5]\z/}!
2397
2398		=begin original
2399
2400		or, by abbreviating and adding C</x>,
2401
2402		=end original
2403
2404		あるいは、省略と C</x> の追加によって:
2405
2406		qr! \p{nv= /(?x) \A [0-5] \z / }!
2407
2408		=begin original
2409
2410		This matches all code points whose numeric value is one of 0, 1, 2, 3,
2411		4, or 5. This particular example could instead have been written as
2412
2413		=end original
2414
2415		これは値が 0, 1, 2, 3, 4, 5 のいずれかである全ての符号位置に
2416		マッチングします。
2417		この特定の例は代わりに次のように書くことも出来ます:
2418
2419		qr! \A [ \p{nv=0}\p{nv=1}\p{nv=2}\p{nv=3}\p{nv=4}\p{nv=5} ] \z !xx
2420
2421		=begin original
2422
2423		in earlier perls, so in this case this feature just makes things easier
2424		and shorter to write. If we hadn't included the C<\A> and C<\z>, these
2425		would have matched things like C<1E<sol>2> because that contains a 1 (as
2426		well as a 2). As written, it matches things like subscripts that have
2427		these numeric values. If we only wanted the decimal digits with those
2428		numeric values, we could say,
2429
2430		=end original
2431
2432		以前の perl でも書けます; 従ってこの例ではこの機能は単により簡単に
2433		短く書けると言うだけです。
2434		C<\A> と C<\z> を含めていないと、
2435		これらは C<1E<sol>2> のようなものにもマッチングします;
2436		(2 と同様) 1 を含んでいるからです。
2437		書かれているように、それはこれらの数値を持つ添字のようなものに
2438		マッチングします。
2439		もしそれらの数値を持つ 10 進数だけが欲しいのであれば、
2440		次のように書けます:
2441
2442		qr! (?[ \d & \p{nv=/[0-5]/} ]) !x
2443
2444		=begin original
2445
2446		The C<\d> gets rid of needing to anchor the pattern, since it forces the
2447		result to only match C<[0-9]>, and the C<[0-5]> further restricts it.
2448
2449		=end original
2450
2451		C<\d> はパターンにアンカーをする必要性を取り除きます;
2452		これは C<[0-9]> のみにマッチングすることを強制し、
2453		C<[0-5]> はさらにそれを制限するからです。
2454
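The difference between the anchored wildcard and its intersection with C<\d> can be tried out as follows. This is a minimal sketch, assuming Perl 5.30 or later; the sample characters and variable names are illustrative only, and both experimental warning categories are silenced on the assumption that they exist in the running perl.

```perl
use v5.30;
use warnings;
no warnings 'experimental::uniprop_wildcards';
no warnings 'experimental::regex_sets';

# Any code point whose numeric value is 0 through 5.
my $nv_0_to_5 = qr!\A\p{nv=/\A[0-5]\z/}\z!;

# The same, restricted to decimal digits by intersecting with \d.
my $digit_0_to_5 = qr!\A(?[ \d & \p{nv=/\A[0-5]\z/} ])\z!;

my %sample = (
    'DIGIT THREE'              => "3",
    'DIGIT SEVEN'              => "7",
    'SUBSCRIPT THREE'          => "\x{2083}",
    'ARABIC-INDIC DIGIT THREE' => "\x{0663}",
    'VULGAR FRACTION ONE HALF' => "\x{BD}",
);

for my $name (sort keys %sample) {
    printf "%-26s nv=%d digit=%d\n",
        $name,
        ($sample{$name} =~ $nv_0_to_5    ? 1 : 0),
        ($sample{$name} =~ $digit_0_to_5 ? 1 : 0);
}
```

The subscript is expected to match only the first pattern, while the ASCII and Arabic-Indic digits match both; the fraction matches neither because the subpattern is anchored.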
2455		=begin original
2456
2457		The text in the above examples enclosed between the C<"E<sol>">
2458		characters can be just about any regular expression. It is independent
2459		of the main pattern, so doesn't share any capturing groups, I<etc>. The
2460		delimiters for it must be ASCII punctuation, but it may NOT be
2461		delimited by C<"{">, nor C<"}"> nor contain a literal C<"}">, as that
2462		delimits the end of the enclosing C<\p{}>. Like any pattern, certain
2463		other delimiters are terminated by their mirror images. These are
2464		C<"(">, C<"[">, and C<"E<lt>">. If the delimiter is any of C<"-">,
2465		C<"_">, C<"+">, or C<"\">, or is the same delimiter as is used for the
2466		enclosing pattern, it must be preceded by a backslash escape, both
2467		fore and aft.
2468
2469		=end original
2470
2471		C<"E<sol>"> 文字で囲まれた前述の例のテキストは、任意の
2472		正規表現にすることができます。
2473		これはメインパターンから独立しているため、捕捉グループなどを
2474		共有しません。
2475		区切り文字は ASCII 句読点でなければなりませんが、C<"{"> や
2476		C<"}"> で区切られたり、リテラル C<"}"> を含んだりすることはできません;
2477		これは、囲まれた C<\p{}> の終わりを区切るためです。
2478		他のパターンと同様に、特定の区切り文字は鏡像で終了します。
2479		これらは C<"(">, C<"[">, C<"E<lt>"> です。
2480		区切り文字が C<"-">, C<"_">, C<"+">, C<"\"> のいずれかである、
2481		または囲まれたパターンに使用されている区切り文字と同じ場合、
2482		前後に逆スラッシュエスケープを付けなければなりません。
2483
2484		=begin original
2485
2486		Beware of using C<"$"> to indicate to match the end of the string. It
2487		can too easily be interpreted as being a punctuation variable, like
2488		C<$/>.
2489
2490		=end original
2491
2492		文字列の末尾を示すのに C<"$"> を使う場合は注意してください。
2493		これはとても簡単に C<$/> のような句読点変数として解釈されます。
2494
2495		=begin original
2496
2497		No modifiers may follow the final delimiter. Instead, use
2498		L<perlre/(?adlupimnsx-imnsx)> and/or
2499		L<perlre/(?adluimnsx-imnsx:pattern)> to specify modifiers.
2500		However, certain modifiers are illegal in your wildcard subpattern.
2501		The only character set modifier specifiable is C</aa>;
2502		any other character set, and C<-m>, and C<p>, and C<s> are all illegal.
2503		Specifying modifiers like C<qr/.../gc> that aren't legal in the
2504		C<(?...)> notation normally raises a warning, but with wildcard
2505		subpatterns, their use is an error. The C<m> modifier is ineffective;
2506		everything that matches will be a single line.
2507
2508		=end original
2509
2510		最後の区切り文字の後に修飾子を置くことは出来ません。
2511		修飾子を指定するには代わりに
2512		L<perlre/(?adlupimnsx-imnsx)> または
2513		L<perlre/(?adluimnsx-imnsx:pattern)> を使ってください。
2514		しかし、一部の修飾子はワイルドカードサブパターンでは不正です。
2515		指定できる唯一の文字集合修飾子は C</aa> です;
2516		その他の文字集合、および C<-m>, C<p>, C<s> は全て不正です。
2517		C<qr/.../gc> のように C<(?...)> 記法では正当でない
2518		修飾子を指定すると、
2519		通常は警告が発生しますが、ワイルドカードサブパターンでは、
2520		これらの使用はエラーです。
2521		C<m> 修飾子は影響しません; マッチングする全てのものは単一行です。
2522
2523		=begin original
2524
2525		By default, your pattern is matched case-insensitively, as if C</i> had
2526		been specified. You can change this by saying C<(?-i)> in your pattern.
2527
2528		=end original
2529
2530		デフォルトでは、パターンは、C</i> が指定されているかのように、
2531		大文字小文字を無視してマッチングします。
2532		パターンに C<(?-i)> と書くことでこれを変更できます。
2533
2534		=begin original
2535
2536		There are also certain operations that are illegal. You can't nest
2537		C<\p{...}> and C<\P{...}> calls within a wildcard subpattern, and C<\G>
2538		doesn't make sense, so is also prohibited.
2539
2540		=end original
2541
2542		不正である操作もあります。
2543		ワイルドカードサブパターンの中で C<\p{...}> と C<\P{...}> を
2544		ネストすることはできず、
2545		C<\G> は意味がないので禁止されます。
2546
2547
2548		=begin original
2549
2550		And the C<*> quantifier (or its equivalent C<{0,}>) is illegal.
2551
2552		=end original
2553
2554		そして C<*> 量指定子 (およびその等価物である C<{0,}>) は不正です。
2555
2556		=begin original
2557
2558		This feature is not available when the left-hand side is prefixed by
2559		C<Is_>, nor for any form that is marked as "Discouraged" in
2560		L<perluniprops/Discouraged>.
2561
2562		=end original
2563
2564		この機能は、左側が C<Is_> を前置されているか、
2565		L<perluniprops/Discouraged> で "Discouraged" とマークされている
2566		形式では利用できません。
2567
2568		=begin original
2569
2570		This experimental feature has been added to begin to implement
2571		L<https://www.unicode.org/reports/tr18/#Wildcard_Properties>. Using it
2572		will raise a (default-on) warning in the
2573		C<experimental::uniprop_wildcards> category. We reserve the right to
2574		change its operation as we gain experience.
2575
2576		=end original
2577
2578		この実験的な機能は、
2579		L<https://www.unicode.org/reports/tr18/#Wildcard_Properties> の実装を
2580		始めるために追加されました。
2581		この機能を使うと、C<experimental::uniprop_wildcards> カテゴリで
2582		(デフォルトでオンの)警告が発生します。
2583		私たちは、経験を積むにつれて、その運用を変更する権利を留保します。
2584
2585		=begin original
2586
2587		Your subpattern can be just about anything, but for it to have some
2588		utility, it should match when called with either or both of
2589		a) the full name of the property value with underscores (and/or spaces
2590		in the Block property) and some things uppercase; or b) the property
2591		value in all lowercase with spaces and underscores squeezed out. For
2592		example,
2593
2594		=end original
2595
2596		サブパターンはどんなものでも構いませんが、サブパターンに何らかの
2597		有用性を持たせるためには、a) 特性値の完全名に
2598		下線(または Block 特性内のスペース)を使い、
2599		一部を大文字にした場合、または b) 特性値をすべて小文字にし、
2600		スペースと下線を削除した場合のいずれか、または両方を
2601		使って呼び出されたときにマッチングする必要があります。
2602		例えば:
2603
2604		qr!\p{Blk=/Old I.*/}!
2605		qr!\p{Blk=/oldi.*/}!
2606
2607		=begin original
2608
2609		would match the same things.
2610
2611		=end original
2612
2613		これは同じものにマッチングします。
2614
2615		=begin original
2616
2617		Another example that shows that within C<\p{...}>, C</x> isn't needed to
2618		have spaces:
2619
2620		=end original
2621
2622		C<\p{...}> の中では、スペースを入れるのに C</x> が不要であることを示す
2623		もう一つの例です:
2624
2625		qr!\p{scx= /Hebrew\|Greek/ }!
2626
2627		=begin original
2628
2629		To be safe, we should have anchored the above example, to prevent
2630		matches for something like C<Hebrew_Braille>, but there aren't
2631		any script names like that, so far.
2632		A warning is issued if none of the legal values for a property are
2633		matched by your pattern. It's likely that a future release will raise a
2634		warning if your pattern ends up causing every possible code point to
2635		match.
2636
2637		=end original
2638
2639		安全のためには、C<Hebrew_Braille> のようなものに
2640		マッチングするのを防ぐために、前述の例にアンカーを付けるべきでしたが、
2641		今のところそのような名前の用字名はありません。
2642		パターンと一致する有効な特性の値がない場合は、警告が発生します。
2643		将来のリリースでは、パターンが全ての符号位置にマッチングするように
2644		なった場合に警告が発生する予定です。
2645
2646		=begin original
2647
2648		Starting in 5.32, the Name, Name Aliases, and Named Sequences properties
2649		are allowed to be matched. They are considered to be a single
2650		combination property, just as has long been the case for C<\N{}>. Loose
2651		matching doesn't work in exactly the same way for these as it does for
2652		the values of other properties. The rules are given in
2653		L<https://www.unicode.org/reports/tr44/tr44-24.html#UAX44-LM2>. As a
2654		result, Perl doesn't try loose matching for you, like it does in other
2655		properties. All letters in names are uppercase, but you can add C<(?i)>
2656		to your subpattern to ignore case. If you're uncertain where a blank
2657		is, you can use C< ?> in your subpattern. No character name contains an
2658		underscore, so don't bother trying to match one. The use of hyphens is
2659		particularly problematic; refer to the above link. But note that, as of
2660		Unicode 13.0, the only script in modern usage which has weirdnesses with
2661		these is Tibetan; also the two Korean characters U+116C HANGUL JUNGSEONG
2662		OE and U+1180 HANGUL JUNGSEONG O-E. Unicode makes no promises to not
2663		add hyphen-problematic names in the future.
2664
2665		=end original
2666
2667		5.32 から、Name, Name Aliases, Named Sequences の
2668		各特性をマッチングさせることができます。
2669		これらの特性は、長年の C<\N{}> の場合と同様に、
2670		単一の組み合わせ特性と見なされます。
2671		緩いマッチングは、他の特性の値とまったく同じようには機能しません。
2672		規則は L<https://www.unicode.org/reports/tr44/tr44-24.html#UAX44-LM2> に
2673		記載されています。
2674		結果として、Perl は他の特性のように緩いマッチングを試みません。
2675		名前のすべての文字は大文字ですが、サブパターンに C<(?i)> を追加して
2676		大文字と小文字を無視することができます。
2677		空白がどこにあるかわからない場合は、サブパターンでC< ?>を使用できます。
2678		下線を含む文字名はないので、わざわざマッチングさせようとしないでください。
2679		ハイフンの使用は特に問題があります: 前述のリンクを参照してください。
2680		ただし、Unicode 13.0 の時点で、現在使われている用字の中で、
2681		これらに奇妙な点があるのはチベット語だけであることに注意してください;
2682		また、U+116C HANGUL JUNGSEONG OEとU+1180 HANGUL JUNGSEONG O-Eという
2683		二つの韓国語文字もあります。
2684		Unicodeは、将来的にハイフンの問題のある名前を追加しないという
2685		約束はしていません。
2686
2687		=begin original
2688
2689		Using wildcards on these is resource intensive, given the hundreds of
2690		thousands of legal names that must be checked against.
2691
2692		=end original
2693
2694		これらにワイルドカードを使うと、
2695		チェックされなければならない有効な名前が数十万あるので、
2696		リソースを大量に消費します。
2697
2698		=begin original
2699
2700		An example of using Name property wildcards is
2701
2702		=end original
2703
2704		Name 特性ワイルドカードを使う例は:
2705
2706		qr!\p{name=/(SMILING\|GRINNING) FACE/}!
2707
2708		=begin original
2709
2710		Another is
2711
2712		=end original
2713
2714		もう一つは:
2715
2716		qr/(?[ \p{name=\/CJK\/} - \p{ideographic} ])/
2717
2718		=begin original
2719
2720		which is the 200-ish (as of Unicode 13.0) CJK characters that aren't
2721		ideographs.
2722
2723		=end original
2724
2725		これは (Unicode 13.0 の時点で) 200 ほどの、表意文字でない CJK 文字です。
2726
2727		=begin original
2728
2729		There are certain properties that wildcard subpatterns don't currently
2730		work with. These are:
2731
2732		=end original
2733
2734		現在ワイルドカードサブパターンが動作しない特性がいくつかあります。
2735		それは:
2736
2737		Bidi Mirroring Glyph
2738		Bidi Paired Bracket
2739		Case Folding
2740		Decomposition Mapping
2741		Equivalent Unified Ideograph
2742		Lowercase Mapping
2743		NFKC Case Fold
2744		Titlecase Mapping
2745		Uppercase Mapping
2746
2747		=begin original
2748
2749		Nor is the C<@I<unicode_property>@> form implemented.
2750
2751		=end original
2752
2753		また、C<@I<unicode_property>@> 形式も実装されていません。
2754
2755		=begin original
2756
2757		Here's a complete example of matching IPV4 internet protocol addresses
2758		in any (single) script
2759
2760		=end original
2761
2762		以下は、任意の (単一の) 用字で IPV4 インターネットプロトコルアドレスに
2763		マッチングする完全な例です:
2764
2765		no warnings 'experimental::uniprop_wildcards';
2766
2767		# Can match a substring, so this intermediate regex needs to have
2768		# context or anchoring in its final use. Using nt=de yields decimal
2769		# digits. When specifying a subset of these, we must include \d to
2770		# prevent things like U+00B2 SUPERSCRIPT TWO from matching
2771		my $zero_through_255 =
2772		qr/ \b (*sr: # All from same script
2773		(?[ \p{nv=0} & \d ])* # Optional leading zeros
2774		( # Then one of:
2775		\d{1,2} # 0 - 99
2776		\| (?[ \p{nv=1} & \d ]) \d{2} # 100 - 199
2777		\| (?[ \p{nv=2} & \d ])
2778		( (?[ \p{nv=:[0-4]:} & \d ]) \d # 200 - 249
2779		\| (?[ \p{nv=5} & \d ])
2780		(?[ \p{nv=:[0-5]:} & \d ]) # 250 - 255
2781		)
2782		)
2783		)
2784		\b
2785		/x;
2786
2787		my $ipv4 = qr/ \A (*sr: $zero_through_255
2788		(?: [.] $zero_through_255 ) {3}
2789		)
2790		\z
2791		/x;
2792
2793	1320	=head2 User-Defined Character Properties
2794	1321
2795	1322	(ユーザ定義文字特性)
2796	1323
2797	1324	=begin original
2798	1325
2799		You can define your own ~~binary~~ character properties by defining subroutines
	1326	You can define your own character properties by defining subroutines
2800		whose names begin with C<"In"> or C<"Is">. (The re~~gex~~ sets fe~~ature~~
	1327	whose names begin with "In" or "Is". The subroutines can be defined in
2801		L<perlre~~/(?[~~ ~~])>~~ provides an ~~alt~~e~~rnativ~~e which ~~allows mo~~re ~~comp~~lex
	1328	any package. The user-defined properties can be used in the regular
2802		defi~~niti~~on~~s.)~~ ~~The~~ subroutines can be defined ~~in any~~
	1329	expression C<\p> and C<\P> constructs; if you are using a user-defined
2803		package. The~~y ove~~r~~ride~~ any ~~Unicod~~e prope~~rties~~ ~~exp~~re~~ssed~~ as the same
	1330	property from a package other than the one you are in, you must specify
2804		~~name~~s. The ~~user-def~~ined proper~~ties~~ can ~~be u~~s~~ed in~~ the regu~~lar~~
	1331	its package in the C<\p> or C<\P> construct.
2805		expression
2806		C<\p{}> and C<\P{}> constructs; if you are using a user-defined property from a
2807		package other than the one you are in, you must specify its package in the
2808		C<\p{}> or C<\P{}> construct.
2809	1332
2810	1333	=end original
2811	1334
2812		あなた自身の ~~2 値~~文字特性を、C<"In"> または C<"Is"> で始まる名前の
	1335	あなた自身の文字特性を、"In" または "Is" で始まる名前のサブルーチンを
2813		~~サブルーチンを~~定義することによって持つことができます。
	1336	定義することによって持つことができます。
2814		(正規表現集合機能 L<perlre/(?[ ])> はより複雑な定義を可能にする選択肢を
2815		提供します。)
2816	1337	そのサブルーチンは任意のパッケージで定義することができます。
2817		~~これら~~は~~同じ名前を持つ~~ ~~Unicode~~ ~~属性を上書~~きします。
	1338	ユーザー定義特性は正規表現の C<\p> 構造や C<\P> 構造で使うことができます。
2818		ユーザー定義特性は正規表現の C<\p{}> 構造や C<\P{}> 構造で使うことができます;
2819	1339	もしユーザー定義特性をそれがあるパッケージ以外で使いたいのであれば、
2820		パッケージ名を C<\p{}> (もしくは C<\P{}>)のために指定する必要があります。
	1340	パッケージ名を C<\p> (もしくは C<\P>)のために指定する必要があります。
2821	1341
2822	1342	# assuming property IsForeign defined in Lang::
2823	1343	package main; # property package name required
2824	1344	if ($txt =~ /\p{Lang::IsForeign}+/) { ... }
2825	1345
2826	1346	package Lang; # property package name not required
2827	1347	if ($txt =~ /\p{IsForeign}+/) { ... }
2828	1348
	1349
2829	1350	=begin original
2830	1351
2831	1352	Note that the effect is compile-time and immutable once defined.
2832		However, the subroutines are passed a single parameter, which is 0 if
2833		case-sensitive matching is in effect and non-zero if caseless matching
2834		is in effect. The subroutine may return different values depending on
2835		the value of the flag, and one set of values will immutably be in effect
2836		for all case-sensitive matches, and the other set for all case-insensitive
2837		matches.
2838	1353
2839	1354	=end original
2840	1355
2841	1356	この効果はコンパイル時のもので、一度定義してしまったら
2842	1357	変更できないことに注意してください。
2843		しかし、サブルーチンは一つの引数を取ります;
2844		大文字小文字を認識するマッチングが有効の場合は 0 となり、
2845		大文字小文字を無視するマッチングが有効の場合は非 0 となります。
2846		サブルーチンはフラグの値に依存して異なった値を返すことがあり、
2847		ある集合の値は全ての大文字小文字を認識するマッチングで変わらず有効になり、
2848		もう一つの集合は大文字小文字を無視するマッチングで有効になります。
2849	1358
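A minimal sketch of the flag parameter described above, assuming a Perl recent enough to pass it (the 5.10 text does not mention it). The property name `IsLowVowel` is hypothetical: it lists the lowercase ASCII vowels and adds the uppercase ones only when the subroutine is told that caseless matching is in effect.

```perl
use strict;
use warnings;

# Hypothetical property: lowercase ASCII vowels, widened to both cases
# when the caller signals caseless (/i) matching via the single argument.
sub IsLowVowel {
    my ($caseless) = @_;
    my @code_points = qw(61 65 69 6F 75);                 # a e i o u
    push @code_points, qw(41 45 49 4F 55) if $caseless;   # A E I O U
    return join("\n", @code_points) . "\n";
}

my $sensitive   = "E" =~ /\p{IsLowVowel}/  ? 1 : 0;
my $insensitive = "E" =~ /\p{IsLowVowel}/i ? 1 : 0;

print "sensitive=$sensitive insensitive=$insensitive\n";
```

Because each of the two result sets is fixed once computed, the subroutine should depend only on the flag and not on any other changing state.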
2850	1359	=begin original
2851	1360
2852		Note that if the regular expression is tainted, then Perl will die rather
2853		than calling the subroutine when the name of the subroutine is
2854		determined by the tainted data.
2855
2856		=end original
2857
2858		正規表現が汚染されていて、サブルーチンの名前が汚染されたデータによって
2859		決定される場合、Perl はそのサブルーチンを呼び出さずに
2860		die することに注意してください。
2861
2862		=begin original
2863
2864	1361	The subroutines must return a specially-formatted string, with one
2865	1362	or more newline-separated lines. Each line must be one of the following:
2866	1363
2867	1364	=end original
2868	1365
2869	1366	サブルーチンは、ひとつ以上の改行で区切られた特定の形式の文字列を
2870	1367	返さなければなりません。
2871	1368	各行は以下のいずれかの形式でなければなりません:
2872	1369
2873	1370	=over 4
2874	1371
2875	1372	=item *
2876	1373
2877	1374	=begin original
2878	1375
2879		A single hexadecimal number denoting a code point to include.
	1376	A single hexadecimal number denoting a Unicode code point to include.
2880	1377
2881	1378	=end original
2882	1379
2883		含まれる符号位置を示す 1 つの 16 進数。
	1380	含まれる Unicode 符号位置を示す 1 つの 16 進数。
2884	1381
2885	1382	=item *
2886	1383
2887	1384	=begin original
2888	1385
2889	1386	Two hexadecimal numbers separated by horizontal whitespace (space or
2890		tabular characters) denoting a range of code points to include. ~~The~~
	1387	tabular characters) denoting a range of Unicode code points to include.
2891		second number must not be smaller than the first.
2892	1388
2893	1389	=end original
2894	1390
2895		含まれる符号位置の範囲を示す、
	1391	含まれる Unicode の符号位置の範囲を示す、
2896	1392	水平的空白(スペースもしくはタブ)によって区切られる 2 つの 16 進数。
2897		2 番目の数字は最初の数字より小さくてはいけません。
2898	1393
2899	1394	=item *
2900	1395
2901	1396	=begin original
2902	1397
2903		Something to include, prefixed by C<"+">: a built-in character
	1398	Something to include, prefixed by "+": a built-in character
2904		property (prefixed by C<"utf8::">) or a fu~~lly qualifi~~ed (in~~clu~~d~~ing~~ package
	1399	property (prefixed by "utf8::") or a user-defined character property,
2905		name) user-defined character property,
2906	1400	to represent all the characters in that property; two hexadecimal code
2907	1401	points for a range; or a single hexadecimal code point.
2908	1402
2909	1403	=end original
2910	1404
2911		(C<"+"> を前置して) その特性に含めるもの:
	1405	("+" を前置して) その特性に含めるもの:
2912		(C<"utf8::"> が前置された) 組み込みの文字特性もしくは
	1406	("utf8::" が前置された) 組み込みの文字特性もしくはユーザー定義の文字特性;
2913		(パッケージ名を含めた)完全修飾されたユーザー定義の文字特性;
2914	1407	範囲のための 2 つの 16 進符号位置; あるいは単一の 16 進符号位置。
2915	1408
2916	1409	=item *
2917	1410
2918	1411	=begin original
2919	1412
2920		Something to exclude, prefixed by C<"-">: an existing character
	1413	Something to exclude, prefixed by "-": an existing character
2921		property (prefixed by C<"utf8::">) or a fu~~lly qualifi~~ed (in~~clu~~d~~ing~~ package
	1414	property (prefixed by "utf8::") or a user-defined character property,
2922		name) user-defined character property,
2923	1415	to represent all the characters in that property; two hexadecimal code
2924	1416	points for a range; or a single hexadecimal code point.
2925	1417
2926	1418	=end original
2927	1419
2928		(C<"-"> を前置して) その特性から除外するもの:
	1420	("-" を前置して) その特性から除外するもの:
2929		(C<"utf8::"> が前置された) 組み込みの文字特性もしくは
	1421	("utf8::" が前置された) 組み込みの文字特性もしくはユーザー定義の文字特性;
2930		(パッケージ名を含めた)完全修飾されたユーザー定義の文字特性;
2931	1422	範囲のための 2 つの 16 進符号位置; あるいは単一の 16 進符号位置。
2932	1423
2933	1424	=item *
2934	1425
2935	1426	=begin original
2936	1427
2937		Something to negate, prefixed by C<"!">: an existing character
	1428	Something to negate, prefixed by "!": an existing character
2938		property (prefixed by C<"utf8::">) or a fu~~lly qualifi~~ed (in~~clu~~d~~ing~~ package
	1429	property (prefixed by "utf8::") or a user-defined character property,
2939		name) user-defined character property,
2940	1430	to represent all the characters in that property; two hexadecimal code
2941	1431	points for a range; or a single hexadecimal code point.
2942	1432
2943	1433	=end original
2944	1434
2945		(C<"!"> を前置して)否定を取るもの:
	1435	("!" を前置して)否定を取るもの:
2946		(C<"utf8::"> が前置された) 組み込みの文字特性もしくは
	1436	("utf8::" が前置された) 組み込みの文字特性もしくはユーザー定義の文字特性;
2947		(パッケージ名を含めた)完全修飾されたユーザー定義の文字特性;
2948	1437	範囲のための 2 つの 16 進符号位置; あるいは単一の 16 進符号位置。
2949	1438
2950	1439	=item *
2951	1440
2952	1441	=begin original
2953	1442
2954		Something to intersect with, prefixed by C<"&">: an existing character
	1443	Something to intersect with, prefixed by "&": an existing character
2955		property (prefixed by C<"utf8::">) or a fu~~lly qualifi~~ed (in~~clu~~d~~ing~~ package
	1444	property (prefixed by "utf8::") or a user-defined character property,
2956		name) user-defined character property,
2957	1445	to keep only the characters that are also in that property; two
2958	1446	hexadecimal code points for a range; or a single hexadecimal code point.
2959	1447
2960	1448	=end original
2961	1449
2962		(C<"&"> を前置して)共通集合を取るもの:
	1450	("&" を前置して)共通集合を取るもの:
2963	1451	その特性にも含まれる文字だけを残すための
2964		(C<"utf8::"> が前置された) 既に存在する文字特性または
	1452	("utf8::" が前置された) 既に存在する文字特性またはユーザー定義文字特性;
2965		(パッケージ名を含めた)完全修飾されたユーザー定義文字特性;
2966	1453	範囲のための 2 つの 16 進符号位置; あるいは単一の 16 進符号位置。
2967	1454
2968	1455	=back
2969	1456
2970	1457	=begin original
2971	1458
2972	1459	For example, to define a property that covers both the Japanese
2973	1460	syllabaries (hiragana and katakana), you can define
2974	1461
2975	1462	=end original
2976	1463
2977	1464	例えば、両方の日本語の音節(ひらがなとカタカナ)を対象とする特性を
2978	1465	定義するには、以下のように定義します
2979	1466
2980	1467	sub InKana {
2981		return <<END;
	1468	return <<END;
2982	1469	3040\t309F
2983	1470	30A0\t30FF
2984	1471	END
2985	1472	}
2986	1473
2987	1474	=begin original
2988	1475
2989	1476	Imagine that the here-doc end marker is at the beginning of the line.
2990	1477	Now you can use C<\p{InKana}> and C<\P{InKana}>.
2991	1478
2992	1479	=end original
2993	1480
2994	1481	ヒアドキュメントの終端マーカーは行の先頭に置かれることを思い出してください。
2995	1482	これで、C<\p{InKana}> や C<\P{InKana}> を使うことができます。
2996	1483
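As a usage sketch for the `InKana` property defined above, the following counts how many characters of a short string fall inside and outside the property. The sample string and variable names are illustrative only, and the definition is written as a double-quoted string instead of a here-document so that the block can be pasted with any indentation.

```perl
use strict;
use warnings;

# Same two ranges as the InKana definition above.
sub InKana {
    return "3040\t309F\n30A0\t30FF\n";
}

# U+3042 HIRAGANA LETTER A, U+30A2 KATAKANA LETTER A, then two ASCII letters.
my $text = "\x{3042}\x{30A2}ab";

# The "= () =" idiom counts the matches found by a /g pattern.
my $kana     = () = $text =~ /\p{InKana}/g;   # characters inside the property
my $not_kana = () = $text =~ /\P{InKana}/g;   # characters outside it

print "kana=$kana not_kana=$not_kana\n";
```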
2997	1484	=begin original
2998	1485
2999	1486	You could also have used the existing block property names:
3000	1487
3001	1488	=end original
3002	1489
3003	1490	すでに存在しているブロック特性名を使うこともできます:
3004	1491
3005	1492	sub InKana {
3006		return <<'END';
	1493	return <<'END';
3007	1494	+utf8::InHiragana
3008	1495	+utf8::InKatakana
3009	1496	END
3010	1497	}
3011	1498
3012	1499	=begin original
3013	1500
3014	1501	Suppose you wanted to match only the allocated characters,
3015	1502	not the raw block ranges: in other words, you want to remove
3016		the un~~assig~~ned characters:
	1503	the non-characters:
3017	1504
3018	1505	=end original
3019	1506
3020	1507	生のブロック範囲ではなく、割り当てられた文字のみにマッチさせたいと
3021		考えているとしましょう: 言い換えれば、~~未割り当て~~文字を
	1508	考えているとしましょう: 言い換えれば、文字以外のものを
3022	1509	取り除きたいということです:
3023	1510
3024	1511	sub InKana {
3025		return <<'END';
	1512	return <<'END';
3026	1513	+utf8::InHiragana
3027	1514	+utf8::InKatakana
3028	1515	-utf8::IsCn
3029	1516	END
3030	1517	}
3031	1518
3032	1519	=begin original
3033	1520
3034	1521	The negation is useful for defining (surprise!) negated classes.
3035	1522
3036	1523	=end original
3037	1524
3038	1525	否定は否定クラスを定義するのに便利です。
3039	1526
3040	1527	sub InNotKana {
3041		return <<'END';
	1528	return <<'END';
3042	1529	!utf8::InHiragana
3043	1530	-utf8::InKatakana
3044	1531	+utf8::IsCn
3045	1532	END
3046	1533	}
3047	1534
3048	1535	=begin original
3049	1536
3050		~~Thi~~s will ~~match a~~ll no~~n-Un~~i~~cod~~e co~~de p~~oin~~ts,~~ ~~sin~~ce ~~eve~~ry ~~one of~~ them is
	1537	Intersection is useful for getting the common characters matched by
3051		not ~~in Kana. Y~~ou ~~can use intersecti~~on to excl~~ude the~~s~~e, if de~~sire~~d, a~~s
	1538	two (or more) classes.
3052		this modified example shows:
3053	1539
3054	1540	=end original
3055	1541
3056		~~これは全ての非 Un~~ico~~de 符号位置~~にマッチ~~ングしま~~す;
	1542	共通集合(intersection)は二つ以上のクラスにマッチする共通の文字を得るのに
3057		~~これらのどれも Kana ではないから~~です。
	1543	便利です。
3058		もし必要なら、この修正された例のように、これらを除外するために
3059		共通集合を使えます:
3060	1544
3061		sub InNo~~tKa~~na {
	1545	sub InFooAndBar {
3062	1546	return <<'END';
3063		~~!utf8::InHir~~agana
	1547	+main::Foo
3064		~~-utf8~~::~~InK~~a~~takana~~
	1548	&main::Bar
3065		+utf8::IsCn
3066		&utf8::Any
3067	1549	END
3068	1550	}
3069	1551
3070	1552	=begin original
3071	1553
3072		~~C<&u~~t~~f8::Any>~~ must be the ~~las~~t ~~lin~~e in the definit~~ion.~~
	1554	It's important to remember not to use "&" for the first set -- that
	1555	would be intersecting with nothing (resulting in an empty set).
3073	1556
3074	1557	=end original
3075	1558
3076		~~C<&utf8::Any> は定義の~~最後の行でなければな~~りません。~~
	1559	最初の集合に "&" を使わないということを忘れないでください --
	1560	そうしてしまうと空との共通集合を取ってしまいます(結果は空集合です)。
3077	1561
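The `Foo` and `Bar` names in the intersection example are placeholders. A concrete sketch, using three hypothetical properties defined in package `main`, might look like this: the first line of the definition adds a set with "+", and the second narrows it with "&".

```perl
use strict;
use warnings;

# Hypothetical building blocks, both defined in package main.
sub InAsciiLetter { return "0041\t005A\n0061\t007A\n" }              # A-Z a-z
sub InHexDigit    { return "0030\t0039\n0041\t0046\n0061\t0066\n" }  # 0-9 A-F a-f

# Start from one set with "+", then intersect with "&".
# Starting with "&" instead would intersect with nothing.
sub InHexLetter {
    return "+main::InAsciiLetter\n&main::InHexDigit\n";
}

# Keep only the single-character strings that belong to both sets.
my @hits = grep { /\A\p{InHexLetter}\z/ } qw(f g 7 C);

print "@hits\n";
```

Only characters that are both ASCII letters and hexadecimal digits should survive, so "g" (a letter but not a hex digit) and "7" (a hex digit but not a letter) are filtered out.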
	1562	=head2 User-Defined Case Mappings
	1563
	1564	(ユーザ定義の大文字・小文字の対応関係)
	1565
3078	1566	=begin original
3079	1567
3080		~~Intersecti~~on is ~~use~~d gene~~rally~~ for ~~gett~~ing t~~he c~~o~~mmon~~ ~~charact~~ers matched
	1568	You can also define your own mappings to be used in the lc(),
3081		by two (~~or more~~) cla~~sses.~~ ~~It's~~ i~~mpo~~rt~~ant~~ to re~~membe~~r ~~not to u~~se ~~C<"&"> fo~~r
	1569	lcfirst(), uc(), and ucfirst() (or their string-inlined versions).
3082		the first set; that wo~~uld~~ be ~~inte~~rsecting ~~wit~~h ~~nothing,~~ r~~esulting in~~ an
	1570	The principle is similar to that of user-defined character
3083		emp~~ty s~~et~~. (S~~i~~milarly u~~s~~ing~~ ~~C<"-"> f~~or the fi~~rst s~~et does nothing).
	1571	properties: to define subroutines in the C<main> package
	1572	with names like C<ToLower> (for lc() and lcfirst()), C<ToTitle> (for
	1573	the first character in ucfirst()), and C<ToUpper> (for uc(), and the
	1574	rest of the characters in ucfirst()).
3084	1575
3085	1576	=end original
3086	1577
3087		~~共通集合は一般的~~に二つ(またはそ~~れ以上)~~の~~クラスによってマッチングする~~
	1578	同様に、lc()、lcfirst()、uc()、ucfirst() (あるいはその文字列組み込み版)で
3088		共通の文字を得る~~ために使われ~~ます。
	1579	あなた自身の対応関係を定義することもできます。
3089		~~最初の集合に C<"&"> を使わないことを覚えておくこと~~は~~重要です;~~
	1580	原則は
3090		~~これは空集合と~~の~~共通集~~合~~を取るので、結果~~として~~空集合になり~~ます
	1581	ユーザー定義文字特性の場合と似ています: C<ToLower> (lc() と lcfirst()用),
3091		(~~同様に、~~最初の~~集合に~~ C<~~"-"~~> ~~を使っても何もしません~~)。
	1582	C<ToTitle> (ucfirst() の最初の文字用), C<ToUpper> (uc() 用と ucfirst() の
	1583	残りの文字用) のような名前のサブルーチンを C<main> パッケージで定義します。
3092	1584
3093	1585	=begin original
3094	1586
3095		~~Unlik~~e ~~non-u~~ser~~-def~~ined ~~C<\p{}> p~~roperty matches, ~~no wa~~rnin~~g i~~s ever
	1587	The string returned by the subroutines needs now to be three
3096		ge~~ner~~ated if the~~se p~~r~~opertie~~s ar~~e m~~atched aga~~ins~~t a no~~n-Unicod~~e code
	1588	hexadecimal numbers separated by tabulators: start of the source
3097		~~poi~~nt (see ~~L</B~~eyond ~~Unic~~ode code points> be~~low)~~.
	1589	range, end of the source range, and start of the destination range.
	1590	For example:
3098	1591
3099	1592	=end original
3100	1593
3101		ユー~~ザー定義~~でない ~~C<\p{}>~~ 特性の~~マッチングと異なり、~~
	1594	サブルーチンから返される文字列はタブで区切られた 3 つの 16 進数を
3102		~~この特性はた~~とえ非 ~~Unicode 符号位置に対~~して~~マッチングしても警告は~~
	1595	必要とします: ソースの範囲の始まり、ソースの範囲の終わり、そして
3103		~~発生し~~ま~~せん (後述~~す~~る L</Beyond Unicode code points> 参照)~~。
	1596	デスティネーション範囲の始まりです。
	1597	例を挙げましょう:
3104	1598
3105		~~=head2~~ ~~User-Defined~~ ~~Case~~ ~~Mappings~~ ~~(for~~ s~~erio~~us ~~hack~~ers ~~only)~~
	1599	sub ToUpper {
	1600	return <<END;
	1601	0061\t0063\t0041
	1602	END
	1603	}
3106	1604
3107		~~(ユーザ定義の大文字・小文字の対応関係(真剣なハッカー専用))~~
	1605	=begin original
3108	1606
	1607	defines a uc() mapping that causes only the characters "a", "b", and
	1608	"c" to be mapped to "A", "B", and "C"; all other characters remain
	1609	unchanged.
	1610
	1611	=end original
	1612
	1613	これは、"a", "b", "c" の文字のみを "A", "B", "C" にマッピングして
	1614	その他のすべての文字は変更しないという uc() のマッピングを定義しています。
	1615
3109	1616	=begin original
3110	1617
3111		~~B<T~~his feature has been rem~~oved~~ as of Per~~l 5.16.>~~
	1618	If there is no source range to speak of, that is, the mapping is from
3112		The ~~CPAN m~~o~~dule~~ ~~C<L<U~~nicode~~::Ca~~sing>> pr~~ovid~~es better fun~~cti~~o~~nality~~ with~~out~~
	1619	a single character to another single character, leave the end of the
3113		the dra~~wbacks~~ that th~~is f~~eature had. ~~If you~~ are usi~~ng a Per~~l e~~arli~~er
	1620	source range empty, but the two tabulator characters are still needed.
3114		~~than~~ ~~5.16, this f~~ea~~ture was~~ m~~ost fu~~l~~ly docum~~e~~nted in the 5.14 version of~~
	1621	For example:
3115		this pod:
3116		L<http://perldoc.perl.org/5.14.0/perlunicode.html#User-Defined-Case-Mappings-%28for-serious-hackers-only%29>
3117	1622
3118	1623	=end original
3119	1624
3120		~~B<こ~~の~~機能は Perl 5.16 で削除さ~~れま~~した。>~~
	1625	もしソースの範囲に言及することがなければ、つまり、対応関係が単一の
3121		~~CPAN モジュール C<L<Unicode::Casing>> はこ~~の~~機能が持~~っていた欠点なしに
	1626	文字から別の単一の文字に変換するものであったならば、ソースの範囲の
3122		よりよい~~機能を提供しま~~す。
	1627	終わりは空のままでよいけれども二つのタブは必要です。
3123		~~5.16 より前の Perl~~ を~~使っている場合、この機能は 5.14 版のこの pod に~~
	1628	例を挙げましょう:
3124		もっともよく文書化されています:
3125		L<http://perldoc.perl.org/5.14.0/perlunicode.html#User-Defined-Case-Mappings-%28for-serious-hackers-only%29>
3126	1629
	1630	sub ToLower {
	1631	return <<END;
	1632	0041\t\t0061
	1633	END
	1634	}
	1635
	1636	=begin original
	1637
	1638	defines an lc() mapping that causes only "A" to be mapped to "a"; all
	1639	other characters remain unchanged.
	1640
	1641	=end original
	1642
	1643	"A" を "a" にマッピングしてその他のすべての文字は変更しない lc() の
	1644	マッピングを定義しています。
	1645
	1646	=begin original
	1647
	1648	(For serious hackers only) If you want to introspect the default
	1649	mappings, you can find the data in the directory
	1650	C<$Config{privlib}>/F<unicore/To/>. The mapping data is returned as
	1651	the here-document, and the C<utf8::ToSpecFoo> are special exception
	1652	mappings derived from C<$Config{privlib}>/F<unicore/SpecialCasing.txt>.
	1653	The C<Digit> and C<Fold> mappings that one can see in the directory
	1654	are not directly user-accessible; one can use either the
	1655	C<Unicode::UCD> module, or just match case-insensitively (that's when
	1656	the C<Fold> mapping is used).
	1657
	1658	=end original
	1659
	1660	(真剣なハッカー専用) デフォルトのマッピングを内省したいのなら、
	1661	C<$Config{privlib}>/F<unicore/To/> というディレクトリにデータを
	1662	見つけ出すことができます。
	1663	マッピングデータはヒアドキュメントとして返され、C<utf8::ToSpecFoo> は
	1664	C<$Config{privlib}>/F<unicore/SpecialCasing.txt> から派生した特殊な
	1665	例外マッピングです。
	1666	そのディレクトリで見つけることのできる C<Digit> と C<Fold> のマッピングは
	1667	ユーザーがダイレクトにアクセスできず、C<Unicode::UCD> モジュールを使うか
	1668	大小文字を無視してマッピングします(C<Fold> マッピングが使われているとき)。
	1669
	1670	=begin original
	1671
	1672	A final note on the user-defined case mappings: they will be used
	1673	only if the scalar has been marked as having Unicode characters.
	1674	Old byte-style strings will not be affected.
	1675
	1676	=end original
	1677
	1678	ユーザー定義の大文字・小文字の対応関係に関する最後の注意: これらはスカラが
	1679	Unicode 文字としてマークされているときにのみ使われます。
	1680	古いバイト形式の文字列には影響を及ぼしません。
	1681
3127	1682	=head2 Character Encodings for Input and Output
3128	1683
3129	1684	(入出力のための文字エンコーディング)
3130	1685
3131	1686	=begin original
3132	1687
3133	1688	See L<Encode>.
3134	1689
3135	1690	=end original
3136	1691
3137	1692	L<Encode> を参照してください。
3138	1693
3139	1694	=head2 Unicode Regular Expression Support Level
3140	1695
3141	1696	(Unicode 正規表現対応レベル)
3142	1697
3143	1698	=begin original
3144	1699
3145		The following list of Unicode support~~ed features~~ for regular expressions describes
	1700	The following list of Unicode support for regular expressions describes
3146		all features currently ~~directly~~ supported ~~by core Perl~~. The references
	1701	all the features currently supported. The references to "Level N"
3147		~~to "Level I<N>"~~ and the section numbers refer to
	1702	and the section numbers refer to the Unicode Technical Standard #18,
3148		~~L<UTS#18~~ "Unicode Regular Expressions"~~\|https://www.unicod~~e.or~~g/report~~s~~/tr~~18>,
	1703	"Unicode Regular Expressions", version 11, in May 2005.
3149		version 18, October 2016.
3150	1704
3151	1705	=end original
3152	1706
3153		以下に挙げるリストは、現在~~コア Perl が直接~~対応している全ての機能を記述する、
	1707	以下に挙げるリストは、現在対応している全ての機能を記述する、
3154	1708	正規表現のための Unicode 対応のリストです。
3155		"Level I<N>" に対する参照とセクション番号は
	1709	"Level N" に対する参照とセクション番号は
3156		L<U~~TS#18 "U~~nicode Re~~gular Expressions"\|~~h~~ttps://www.u~~nicod~~e.o~~r~~g/reports/tr~~18>,
	1710	Unicode Technical Standard #18,
3157		~~2016~~ 年 10 ~~月のバージョン~~ 18 ~~を参照しています。~~
	1711	"Unicode Regular Expressions", version 11, in May 2005
	1712	を参照しています。
3158	1713
3159		=head3 Level 1 - Basic Unicode Support
3160
3161		RL1.1 Hex Notation - Done [1]
3162		RL1.2 Properties - Done [2]
3163		RL1.2a Compatibility Properties - Done [3]
3164		RL1.3 Subtraction and Intersection - Done [4]
3165		RL1.4 Simple Word Boundaries - Done [5]
3166		RL1.5 Simple Loose Matches - Done [6]
3167		RL1.6 Line Boundaries - Partial [7]
3168		RL1.7 Supplementary Code Points - Done [8]
3169
3170	1714	=over 4
3171	1715
3172		=item ~~[1] C<\N{U+...}> and C<\x{...}>~~
	1716	=item *
3173	1717
3174		=item ~~[2]~~
	1718	Level 1 - Basic Unicode Support
3175		C<\p{...}> C<\P{...}>. This requirement is for a minimal list of
3176		properties. Perl supports these. See R2.7 for other properties.
3177	1719
3178		~~([2]~~ ~~C<\p{...}>~~ ~~C<\P{...}>。この要求は最小限の特性の一覧に対するものです。Perl~~ ~~はこれらに対応しています。その他の特性については~~ R2.7 ~~を参照してください。)~~
	1720	RL1.1 Hex Notation - done [1]
	1721	RL1.2 Properties - done [2][3]
	1722	RL1.2a Compatibility Properties - done [4]
	1723	RL1.3 Subtraction and Intersection - MISSING [5]
	1724	RL1.4 Simple Word Boundaries - done [6]
	1725	RL1.5 Simple Loose Matches - done [7]
	1726	RL1.6 Line Boundaries - MISSING [8]
	1727	RL1.7 Supplementary Code Points - done [9]
3179	1728
3180		~~=item~~ [3]
	1729	[1] \x{...}
3181		~~Perl~~ ~~has~~ ~~C<\d>~~ ~~C<\D>~~ ~~C<\s>~~ ~~C<\S>~~ ~~C<\w>~~ ~~C<\W>~~ ~~C<\X> C<~~[~~:I<prop>:~~]>
	1730	[2] \p{...} \P{...}
3182		C<[~~:^I<~~pro~~p>:]>,~~ plus all the ~~prop~~erties speci~~fied by~~
	1731	[3] supports not only minimal list (general category, scripts,
3183		L<ht~~tps://www.un~~icode.or~~g/r~~ep~~orts/tr18/#Com~~p~~atibility_Prop~~er~~tie~~s>. These
	1732	Alphabetic, Lowercase, Uppercase, WhiteSpace,
3184		are de~~scr~~i~~bed~~ above in ~~L</O~~t~~her~~ ~~Properties>~~
	1733	NoncharacterCodePoint, DefaultIgnorableCodePoint, Any,
	1734	ASCII, Assigned), but also bidirectional types, blocks, etc.
	1735	(see L</"Unicode Character Properties">)
	1736	[4] \d \D \s \S \w \W \X [:prop:] [:^prop:]
	1737	[5] can use regular expression look-ahead [a] or
	1738	user-defined character properties [b] to emulate set operations
	1739	[6] \b \B
	1740	[7] note that Perl does Full case-folding in matching, not Simple:
	1741	for example U+1F88 is equivalent to U+1F00 U+03B9,
	1742	not to U+1F80. This difference matters for certain Greek
	1743	capital letters with certain modifiers: the Full case-folding
	1744	decomposes the letter, while the Simple case-folding would map
	1745	it to a single character.
	1746	[8] should do ^ and $ also on U+000B (\v in C), FF (\f), CR (\r),
	1747	CRLF (\r\n), NEL (U+0085), LS (U+2028), and PS (U+2029);
	1748	should also affect <>, $., and script line numbers;
	1749	should not split lines within CRLF [c] (i.e. there is no empty
	1750	line between \r and \n)
	1751	[9] UTF-8/UTF-EBCDIC used in perl allows not only U+10000 to U+10FFFF
	1752	but also beyond U+10FFFF [d]
3185	1753
3186		(Perl は C<\d> C<\D> C<\s> C<\S> C<\w> C<\W> C<\X> C<[:I<prop>:]> C<[:^I<prop>:]> に加えて、L<https://www.unicode.org/reports/tr18/#Compatibility_Properties> で指定されている全ての特性を持ちます。これらは前述の L</Other Properties> に記されています。)
3187
3188		=item [4]
3189
3190	1754	=begin original
3191	1755
3192		~~The~~ ~~regex~~ ~~sets fe~~a~~ture~~ ~~C<"(?[...])">~~ starting in ~~v5.18 accomp~~lishes
	1756	[a] You can mimic class subtraction using lookahead.
3193		~~this.~~ Se~~e L<~~perlre~~/(?[~~ ~~])>.~~
	1757	For example, what UTS#18 might write as
3194	1758
3195	1759	=end original
3196	1760
3197		~~v5.18 からの正規表現集合機能 C<"(?~~[~~...~~]~~)">~~ ~~がこれ~~を行います。
	1761	[a] class subtraction を先読みを使って模倣することができます。
3198		~~L<perlre/(?[~~ ~~])>~~ ~~を参照してください。~~
	1762	たとえば、以下の UTS#18 は
3199	1763
3200		~~=item~~ [5]
	1764	[{Greek}-[{UNASSIGNED}]]
3201		C<\b> C<\B> meet most, but not all, the details of this requirement, but
3202		C<\b{wb}> and C<\B{wb}> do, as well as the stricter R2.3.
3203	1765
3204		=item [6]
3205
3206	1766	=begin original
3207	1767
3208		~~Note~~ ~~that~~ Perl ~~does Full~~ ca~~se-foldi~~ng in ~~match~~i~~ng, no~~t ~~Simpl~~e:
	1768	in Perl can be written as:
3209	1769
3210	1770	=end original
3211	1771
3212		Perl ~~はマッチング~~で ~~Simple~~ で~~はなく Full 大文字小文字畳み込みを~~
	1772	以下のように Perl で記述できます:
3213		行うことに注意してください:
3214	1773
	1774	(?!\p{Unassigned})\p{InGreekAndCoptic}
	1775	(?=\p{Assigned})\p{InGreekAndCoptic}
	1776
3215	1777	=begin original
3216	1778
3217		For example ~~C<U+1F88> is eq~~uival~~ent~~ ~~to C<U+1F00 U+03B9>, inst~~ead ~~of jus~~t
	1779	But in this particular example, you probably really want
3218		C<U+1F80>. This difference matters mainly for certain Greek capital
3219		letters with certain modifiers: the Full case-folding decomposes the
3220		letter, while the Simple case-folding would map it to a single
3221		character.
3222	1780
3223	1781	=end original
3224	1782
3225		例~~えば C<U+1F88>~~ は単な~~る C<U+1F80>~~ では~~なく C<U+1F00 U+03B9> と等価~~です。
	1783	しかし、この特定の例では、あなたが実際に望んでいたのは次のものでしょう
3226		この違いは、主にある種の修飾子付きのある種のギリシャ大文字に対して
3227		問題になります: Full 大文字小文字畳み込みは文字を分解しますが、
3228		Simple 大文字小文字畳み込みはそれを単一文字にマッピングします。
3229	1784
3230		~~=item~~ ~~[7]~~
	1785	\p{GreekAndCoptic}
3231	1786
3232	1787	=begin original
3233	1788
3234		The ~~reason~~ this is ~~con~~sidered to be ~~only~~ part~~ially~~ ~~impl~~emented is t~~hat~~
	1789	which will match assigned characters known to be part of the Greek script.
3235		Perl has L<C<qrE<sol>\b{lb}E<sol>>\|perlrebackslash/\b{lb}> and
3236		C<L<Unicode::LineBreak>> that are conformant with
3237		L<UAX#14 "Unicode Line Breaking Algorithm"\|https://www.unicode.org/reports/tr14>.
3238		The regular expression construct provides default behavior, while the
3239		heavier-weight module provides customizable line breaking.
3240	1790
3241	1791	=end original
3242	1792
3243		これ~~が部分的に~~の~~み実装~~して~~いると考え~~られる~~理由は、P~~erl は
	1793	これは Greek 用字の一部として知られている assigned character にマッチします。
3244		L<UAX#14 "Unicode Line Breaking Algorithm"\|https://www.unicode.org/reports/tr14>
3245		に準拠している
3246		L<C<qrE<sol>\b{lb}E<sol>>\|perlrebackslash/\b{lb}> と
3247		C<L<Unicode::LineBreak>> があるからです。
3248		正規表現構文はデフォルトの振る舞いを提供する一方、
3249		この重量級モジュールは行区切りのカスタマイズを提供します。
3250	1794
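The look-ahead emulation of class subtraction quoted in note [a] can be checked with a short sketch. It spells the block as `\p{Block=Greek_And_Coptic}` rather than the `In` form, and it assumes that U+0378 is still an unassigned code point inside that block in the Unicode version shipped with your Perl.

```perl
use strict;
use warnings;

# Block membership minus unassigned code points, emulated with a
# negative look-ahead placed in front of the block test.
my $block          = qr/\p{Block=Greek_And_Coptic}/;
my $assigned_greek = qr/(?!\p{Unassigned})$block/;

my $alpha    = "\x{03B1}";   # GREEK SMALL LETTER ALPHA (assigned)
my $reserved = "\x{0378}";   # in the block; assumed to be unassigned

my $alpha_ok      = $alpha    =~ $assigned_greek ? 1 : 0;
my $reserved_ok   = $reserved =~ $assigned_greek ? 1 : 0;
my $reserved_blk  = $reserved =~ $block          ? 1 : 0;

print "alpha=$alpha_ok reserved=$reserved_ok reserved_in_block=$reserved_blk\n";
```

On Perl 5.18 and later, the regex sets construct `(?[ ])` described in L<perlre> expresses the same subtraction directly; the look-ahead form also works on older versions.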
3251	1795	=begin original
3252	1796
3253		~~But~~ Perl tre~~ats~~ ~~C<\~~n~~> as~~ the st~~art-~~ and en~~d-lin~~e
	1797	Also see the Unicode::Regex::Set module; it implements the full
3254		~~del~~imiter, ~~whereas U~~nicode spe~~cifies~~ mo~~re ch~~aract~~ers~~ tha~~t should be~~
	1798	UTS#18 grouping, intersection, union, and removal (subtraction) syntax.
3255		so-interpreted.
3256	1799
3257	1800	=end original
3258	1801
3259		~~しかし~~ Perl ~~は C<\n>~~ を~~行の先頭と末尾の区切り文字と~~して扱い、
	1802	同様に Unicode::Regex::Set モジュールを参照してください。
3260		一方 Unicode ~~はそのように解釈するべきより多くの~~文字を~~指定しています。~~
	1803	これは UTS#18 のグルーピング、intersection、union、removal(subtraction)構文を
	1804	フルに実装しています。
3261	1805
3262	1806	=begin original
3263	1807
3264		These are:
	1808	[b] '+' for union, '-' for removal (set-difference), '&' for intersection
	1809	(see L</"User-Defined Character Properties">)
3265	1810
3266	1811	=end original
3267	1812
3268		これは:
	1813	[b] 結合のためには '+'、除去(差集合)のためには '-'、
	1814	共通集合のためには '&' です
	1815	(L</"User-Defined Character Properties"> を参照してください)
3269	1816
3270		VT U+000B (\v in C)
3271		FF U+000C (\f)
3272		CR U+000D (\r)
3273		NEL U+0085
3274		LS U+2028
3275		PS U+2029
3276
3277	1817	=begin original
3278	1818
3279		~~C<^>~~ ~~and~~ C<$> ~~in regu~~lar expr~~ession~~ ~~pattern~~s are ~~suppos~~ed ~~to match a~~ll
	1819	[c] Try the C<:crlf> layer (see L<PerlIO>).
3280		these, but don't.
3281		These characters also don't, but should, affect C<< <> >> C<$.>, and
3282		script line numbers.
3283	1820
3284	1821	=end original
3285	1822
3286		~~正規表現パターンの~~ C<^> と C<$> ~~はこれら全~~て~~にマッチングすることが~~
	1823	[c] C<:crlf> 層を試してください (L<PerlIO> を参照してください)。
3287		想定されますが、マッチングしません。
3288		これらの文字は、C<< <> >>, C<$.>, スクリプトの行番号にも影響を
3289		与えるべきですが、与えません。
3290	1824
3291	1825	=begin original
3292	1826
3293		Also, lines ~~sho~~uld not be s~~plit~~ wi~~thi~~n ~~C<CRLF~~> ~~(i.e.~~ t~~here~~ ~~is n~~o
	1827	[d] Avoid C<use warnings 'utf8';> (or say C<no warnings 'utf8';>) to allow
3294		~~empty~~ ~~line between~~ C<\~~r> and C<\n>).~~ F~~or C<CRL~~F>~~, try the C<:crlf>~~
	1828	U+FFFF (C<\x{FFFF}>).
3295		layer (see L<PerlIO>).
3296	1829
3297	1830	=end original
3298	1831
3299		~~また、C<CRL~~F> ~~の中の行を分割しません~~ (~~つまり~~ C<\r> と C<\n> ~~の間に~~
	1832	[d] U+FFFF (C<\x{FFFF}>) を許可するために、C<use warnings 'utf8';> を
3300		~~空行はあり~~ません)。
	1833	しないでください (または C<no warnings 'utf8';> としてください)。
3301		C<CRLF> については、C<:crlf> 層 (L<PerlIO> 参照) を試してください。
3302	1834
3303		=item ~~[8]~~
	1835	=item *
3304		UTF-8/UTF-EBCDIC used in Perl allows not only C<U+10000> to
3305		C<U+10FFFF> but also beyond C<U+10FFFF>
3306	1836
3307		([8] Perl で使われる UTF-8/UTF-EBCDIC は C<U+10000> から C<U+10FFFF> だけでなく C<U+10FFFF> を超える値も認めます)
	1837	Level 2 - Extended Unicode Support
3308	1838
3309		=back
	1839	RL2.1 Canonical Equivalents - MISSING [10][11]
	1840	RL2.2 Default Grapheme Clusters - MISSING [12][13]
	1841	RL2.3 Default Word Boundaries - MISSING [14]
	1842	RL2.4 Default Loose Matches - MISSING [15]
	1843	RL2.5 Name Properties - MISSING [16]
	1844	RL2.6 Wildcard Properties - MISSING
3310	1845
3311		~~=head3~~ ~~Level~~ 2 - ~~Ext~~ended Unicode ~~Supp~~ort
	1846	[10] see UAX#15 "Unicode Normalization Forms"
	1847	[11] have Unicode::Normalize but not integrated to regexes
	1848	[12] have \X but at this level . should equal that
	1849	[13] UAX#29 "Text Boundaries" considers CRLF and Hangul syllable
	1850	clusters as a single grapheme cluster.
	1851	[14] see UAX#29, Word Boundaries
	1852	[15] see UAX#21 "Case Mappings"
	1853	[16] have \N{...} but neither compute names of CJK Ideographs
	1854	and Hangul Syllables nor use a loose match [e]
3312	1855
3313		~~RL2.1 Ca~~noni~~cal Equ~~i~~vale~~n~~ts - Retr~~a~~cted [9]~~
	1856	=begin original
3314		by Unicode
3315		RL2.2 Extended Grapheme Clusters and - Partial [10]
3316		Character Classes with Strings
3317		RL2.3 Default Word Boundaries - Done [11]
3318		RL2.4 Default Case Conversion - Done
3319		RL2.5 Name Properties - Done
3320		RL2.6 Wildcards in Property Values - Partial [12]
3321		RL2.7 Full Properties - Partial [13]
3322		RL2.8 Optional Properties - Partial [14]
3323	1857
3324		=over 4
	1858	[e] C<\N{...}> allows namespaces (see L<charnames>).
3325	1859
3326		=item ~~[9]~~
	1860	=end original
3327		Unicode has rewritten this portion of UTS#18 to say that getting
3328		canonical equivalence (see UAX#15
3329		L<"Unicode Normalization Forms"\|https://www.unicode.org/reports/tr15>)
3330		is basically to be done at the programmer level. Use NFD to write
3331		both your regular expressions and text to match them against (you
3332		can use L<Unicode::Normalize>).
3333	1861
3334		~~=it~~em ~~[10]~~
	1862	[e] C<\N{...}> は名前空間を許可します (L<charnames> を参照してください)。
3335		Perl has C<\X> and C<\b{gcb}>. Unicode has retracted their "Grapheme
3336		Cluster Mode", and recently added string properties, which Perl does not
3337		yet support.
3338	1863
3339		=item ~~[11] see~~
	1864	=item *
3340		L<UAX#29 "Unicode Text Segmentation"\|https://www.unicode.org/reports/tr29>,
3341	1865
3342		~~=it~~em ~~[12]~~ see
	1866	Level 3 - Tailored Support
3343		L</Wildcards in Property Values> above.
3344	1867
3345		=item ~~[13]~~
	1868	RL3.1 Tailored Punctuation - MISSING
3346		~~Perl~~ ~~supports~~ ~~all~~ ~~the~~ ~~properties~~ in ~~the~~ Unicode ~~Cha~~racter ~~Databa~~se
	1869	RL3.2 Tailored Grapheme Clusters - MISSING [17][18]
3347		~~(UCD).~~ It ~~does~~ ~~not~~ ~~yet~~ ~~support~~ ~~the~~ listed properties ~~that~~ ~~come~~ ~~from~~
	1870	RL3.3 Tailored Word Boundaries - MISSING
3348		~~other~~ Unicode sources.
	1871	RL3.4 Tailored Loose Matches - MISSING
	1872	RL3.5 Tailored Ranges - MISSING
	1873	RL3.6 Context Matching - MISSING [19]
	1874	RL3.7 Incremental Matches - MISSING
	1875	( RL3.8 Unicode Set Sharing )
	1876	RL3.9 Possible Match Sets - MISSING
	1877	RL3.10 Folded Matching - MISSING [20]
	1878	RL3.11 Submatchers - MISSING
3349	1879
3350		~~=item~~ [14]
	1880	[17] see UAX#10 "Unicode Collation Algorithms"
3351		The on~~ly opt~~io~~nal prop~~e~~rty th~~at Perl supports i~~s N~~amed Sequence~~. None~~
	1881	[18] have Unicode::Collate but not integrated to regexes
3352		of these prope~~rtie~~s are in the ~~UCD.~~
	1882	[19] have (?<=x) and (?=x), but look-aheads or look-behinds should see
	1883	outside of the target substring
	1884	[20] need insensitive matching for linguistic features other than case;
	1885	for example, hiragana to katakana, wide and narrow, simplified Han
	1886	to traditional Han (see UTR#30 "Character Foldings")
3353	1887
3354	1888	=back
3355	1889
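The Level 2 notes on canonical equivalence say that L<Unicode::Normalize> is available but not integrated into the regex engine, and that text and pattern should both be normalized by the programmer. A minimal sketch of that approach, with illustrative variable names, could be:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

my $precomposed = "caf\x{E9}";     # e-acute as a single code point
my $decomposed  = "cafe\x{301}";   # "e" followed by COMBINING ACUTE ACCENT

# Normalize the text to search for, then normalize each subject string.
my $needle = NFD("\x{E9}");        # becomes "e" plus U+0301

my @normalized = map { NFD($_) =~ /\Q$needle\E\z/ ? 1 : 0 }
                 ($precomposed, $decomposed);

# Without normalizing the subject, the precomposed form is missed.
my $unnormalized = $precomposed =~ /\Q$needle\E\z/ ? 1 : 0;

print "normalized=@normalized unnormalized=$unnormalized\n";
```

Normalizing both sides to the same form is what makes the two spellings compare equal; the regex engine itself does no such folding.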
3356		=head3 Level 3 - Tailored Support
3357
3358		=begin original
3359
3360		This has been retracted by Unicode.
3361
3362		=end original
3363
3364		これは Unicode によって取り下げられました。
3365
3366	1890	=head2 Unicode Encodings
3367	1891
3368	1892	(Unicode のエンコーディング)
3369	1893
3370	1894	=begin original
3371	1895
3372	1896	Unicode characters are assigned to I<code points>, which are abstract
3373	1897	numbers. To use these numbers, various encodings are needed.
3374	1898
3375	1899	=end original
3376	1900
3377	1901	Unicode 文字は抽象的な数値である I<符号位置> にアサインされています。
3378	1902	これらの数値を使うために、さまざまなエンコーディングが必要となります。
3379	1903
3380	1904	=over 4
3381	1905
3382	1906	=item *
3383	1907
3384	1908	UTF-8
3385	1909
3386	1910	=begin original
3387	1911
3388		UTF-8 is a variable-length (1 to 4 bytes), ~~byt~~e-order in~~dependent~~
	1912	UTF-8 is a variable-length (1 to 6 bytes, current character allocations
3389		e~~ncod~~i~~ng.~~ ~~In mos~~t o~~f P~~er~~l's~~ d~~ocum~~en~~tatio~~n, including ~~elsewhe~~re in ~~this~~
	1913	require 4 bytes), byte-order independent encoding. For ASCII (and we
3390		document, the term ~~"UTF-~~8" mean~~s als~~o "UTF-~~EBCDIC".~~ ~~But~~ i~~n thi~~s ~~section,~~
	1914	really do mean 7-bit ASCII, not another 8-bit encoding), UTF-8 is
3391		~~"UTF-8"~~ r~~efers o~~n~~ly to the encoding u~~s~~ed on ASCII~~ pla~~tfo~~r~~ms. I~~t ~~is a~~
	1915	transparent.
3392		superset of 7-bit US-ASCII, so anything encoded in ASCII has the
3393		identical representation when encoded in UTF-8.
3394	1916
3395	1917	=end original
3396	1918
3397		UTF-8 は可変長(1 から 4 バイト)で、
	1919	UTF-8 は可変長(1 から 6 バイト; 現在の文字配置では 4 バイトを要求します)で、
3398	1920	バイトの並び順に依存しないエンコーディングです。
3399		この~~文書の~~他の~~場所を含む~~ ~~Perl~~ の~~文書のほ~~とんどで、
	1921	ASCII(ここでは 7-bit ASCII のことで、他の 8-bit エンコーディングのことでは
3400		~~"UTF-8"~~ と~~いう用語は~~ "UTF-~~EBCDIC"~~ ~~も意味しま~~す。
	1922	ありません)と UTF-8 は透過です。
3401		しかしこの節では、
3402		"UTF-8" は ASCII プラットフォームで使われているエンコーディングを
3403		意味します。
3404		これは 7 ビット US-ASCII のスーパーセットなので、
3405		ASCII でエンコードされたものは全て UTF-8 でエンコードしたものと
3406		同じ表現になります。
3407	1923
3408	1924	=begin original
3409	1925
3410	1926	The following table is from Unicode 3.2.
3411	1927
3412	1928	=end original
3413	1929
3414	1930	以下のテーブルは Unicode 3.2 のものです。
3415	1931
3416		Code Points           1st Byte  2nd Byte  3rd Byte  4th Byte
	1932	Code Points           1st Byte  2nd Byte  3rd Byte  4th Byte
3417	1933
3418	1934	U+0000..U+007F        00..7F
3419		U+0080..U+07FF      * C2..DF    80..BF
	1935	U+0080..U+07FF        C2..DF    80..BF
3420		U+0800..U+0FFF        E0      * A0..BF    80..BF
	1936	U+0800..U+0FFF        E0        A0..BF    80..BF
3421	1937	U+1000..U+CFFF        E1..EC    80..BF    80..BF
3422	1938	U+D000..U+D7FF        ED        80..9F    80..BF
3423		U+D800..U+DFFF        ~~+++++~~ utf~~16 surr~~o~~gat~~es, ~~not legal utf8 +++++~~
	1939	U+D800..U+DFFF        ***** ill-formed *****
3424	1940	U+E000..U+FFFF        EE..EF    80..BF    80..BF
3425		U+10000..U+3FFFF      F0      * 90..BF    80..BF    80..BF
	1941	U+10000..U+3FFFF      F0        90..BF    80..BF    80..BF
3426	1942	U+40000..U+FFFFF      F1..F3    80..BF    80..BF    80..BF
3427	1943	U+100000..U+10FFFF    F4        80..8F    80..BF    80..BF
3428	1944
3429	1945	=begin original
3430	1946
3431		Note the ~~gaps~~ ~~marked~~ by ~~"*" before several of~~ the ~~byte entries above~~. ~~These are~~
	1947	Note the C<A0..BF> in C<U+0800..U+0FFF>, the C<80..9F> in
3432		~~caused~~ ~~by l~~e~~gal~~ UTF-8 ~~avo~~iding ~~non-shortest~~ encod~~ings:~~ it ~~is tec~~h~~nically~~
	1948	C<U+D000..U+D7FF>, the C<90..BF> in C<U+10000..U+3FFFF>, and the
3433		~~poss~~i~~ble~~ to UTF~~-8-encode~~ a ~~singl~~e ~~code~~ p~~oint~~ ~~in diffe~~rent ways, but tha~~t is~~
	1949	C<80...8F> in C<U+100000..U+10FFFF>. The "gaps" are caused by legal
3434		~~explicitly~~ forbidden, an~~d the~~ shortest ~~possibl~~e encoding s~~hould~~ always be ~~used~~
	1950	UTF-8 avoiding non-shortest encodings: it is technically possible to
3435		(and t~~hat~~ is what ~~Perl doe~~s).
	1951	UTF-8-encode a single code point in different ways, but that is
	1952	explicitly forbidden, and the shortest possible encoding should always
	1953	be used. So that's what Perl does.
3436	1954
3437	1955	=end original
3438	1956
3439		~~上記で~~ ~~'*'~~ の~~マークが付いているいくつか~~の~~バイトエントリ~~の前の
	1957	C<U+0800..U+0FFF> の中の C<A0..BF>、C<U+D000...U+D7FF> の中の C<80..9F>、
3440		~~隙間に注意してください。~~
	1958	C<U+10000..U+3FFFF> の中の C<90..BF>、C<U+100000..U+10FFFF> の中の
3441		~~これらは、正当な UTF-~~8 ~~が最短でないエンコードを避けるため~~に
	1959	C<80...8F> に注意してください。
	1960	この「隙間」は、正当な UTF-8 が最短でないエンコードを避けるために
3442	1961	あります: 技術的には UTF-8 エンコードは一つの符号位置を複数の方法で
3443	1962	表すことができますが、これは明示的に禁止されていて、可能な限り最短の
3444		エンコードが常に使われます~~(そしてそれが Perl のすることです)~~。
	1963	エンコードが常に使われます。
	1964	従って、Perl もそうします。
3445	1965
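The ban on non-shortest encodings can be observed from Perl itself. The following is a minimal sketch, assuming the core C<Encode> module and an ASCII platform; C<is_strict_utf8> is a hypothetical helper name used only for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if the byte string decodes under the strict
# "UTF-8" encoding, false if the decoder rejects it.
sub is_strict_utf8 {
    my ($bytes) = @_;    # work on a copy; decode() may modify its input when CHECK is set
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# "\x00" is the shortest form of U+0000; "\xC0\x80" is a two-byte
# (non-shortest) spelling of the same code point.
for my $bytes ("\x00", "\xC0\x80") {
    my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
    printf "%-6s %s\n", $hex, is_strict_utf8($bytes) ? 'accepted' : 'rejected';
}
```

The strict decoder should accept the one-byte form and reject the two-byte form, which is why C<C0> and C<C1> never appear as start bytes in the table.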
3446	1966	=begin original
3447	1967
3448	1968	Another way to look at it is via bits:
3449	1969
3450	1970	=end original
3451	1971
3452	1972	これを見るもう一つの方法はビット単位で見ることです:
3453	1973
3454		~~Code~~ ~~Points~~                 1st Byte  2nd Byte  3rd Byte  4th Byte
	1974	Code Points                 1st Byte  2nd Byte  3rd Byte  4th Byte
3455	1975
3456		                  0aaaaaaa  0aaaaaaa
	1976	                  0aaaaaaa  0aaaaaaa
3457		          00000bbbbbaaaaaa  110bbbbb  10aaaaaa
	1977	          00000bbbbbaaaaaa  110bbbbb  10aaaaaa
3458		          ccccbbbbbbaaaaaa  1110cccc  10bbbbbb  10aaaaaa
	1978	          ccccbbbbbbaaaaaa  1110cccc  10bbbbbb  10aaaaaa
3459		00000dddccccccbbbbbbaaaaaa  11110ddd  10cccccc  10bbbbbb  10aaaaaa
	1979	00000dddccccccbbbbbbaaaaaa  11110ddd  10cccccc  10bbbbbb  10aaaaaa
3460	1980
3461	1981	=begin original
3462	1982
3463		As you can see, the continuation bytes all begin with C<"10">, and the
	1983	As you can see, the continuation bytes all begin with C<10>, and the
3464		leading bits of the start byte tell how many bytes there are in the
	1984	leading bits of the start byte tell how many bytes the are in the
3465	1985	encoded character.
3466	1986
3467	1987	=end original
3468	1988
3469		見ての通り、後続バイトはすべて C<"10"> から始まっていて、開始バイトの
	1989	見ての通り、後続バイトはすべて C<10> から始まっていて、開始バイトの
3470	1990	先行ビットはエンコードされた文字がどのくらいの長さであるかを示しています。
3471	1991
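The bit layout above maps directly onto shifts and masks. The following sketch is an illustration only, not how Perl encodes internally; C<utf8_bytes_for> is a hypothetical helper that covers just the 1 to 4 byte forms for C<U+0000..U+10FFFF>.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 bytes (as integers) for one code
# point, following the bit layout in the table above.
sub utf8_bytes_for {
    my ($cp) = @_;
    die "surrogates are ill-formed in UTF-8\n" if $cp >= 0xD800 && $cp <= 0xDFFF;
    die "beyond U+10FFFF\n"                    if $cp > 0x10FFFF;

    return ($cp) if $cp < 0x80;                      # 0aaaaaaa

    return (0xC0 | ($cp >> 6),                       # 110bbbbb
            0x80 | ($cp & 0x3F)) if $cp < 0x800;     # 10aaaaaa

    return (0xE0 | ($cp >> 12),                      # 1110cccc
            0x80 | (($cp >> 6) & 0x3F),              # 10bbbbbb
            0x80 | ($cp & 0x3F)) if $cp < 0x10000;   # 10aaaaaa

    return (0xF0 | ($cp >> 18),                      # 11110ddd
            0x80 | (($cp >> 12) & 0x3F),             # 10cccccc
            0x80 | (($cp >> 6) & 0x3F),              # 10bbbbbb
            0x80 | ($cp & 0x3F));                    # 10aaaaaa
}

for my $cp (0x41, 0xE9, 0x20AC, 0x1F600) {
    printf "U+%04X => %s\n", $cp,
        join ' ', map { sprintf '%02X', $_ } utf8_bytes_for($cp);
}
```

Because each length is chosen from the smallest range that contains the code point, the helper always produces the shortest form. On an ASCII platform the result can be compared with what C<utf8::encode> produces for C<chr($cp)>.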
3472		=begin original
3473
3474		The original UTF-8 specification allowed up to 6 bytes, to allow
3475		encoding of numbers up to C<0x7FFF_FFFF>. Perl continues to allow those,
3476		and has extended that up to 13 bytes to encode code points up to what
3477		can fit in a 64-bit word. However, Perl will warn if you output any of
3478		these as being non-portable; and under strict UTF-8 input protocols,
3479		they are forbidden. In addition, it is now illegal to use a code point
3480		larger than what a signed integer variable on your system can hold. On
3481		32-bit ASCII systems, this means C<0x7FFF_FFFF> is the legal maximum
3482		(much higher on 64-bit systems).
3483
3484		=end original
3485
3486		元の UTF-8 仕様は、C<0x7FFF_FFFF> までの数値をエンコードできるように、
3487		6 バイトまで許されていました。
3488		Perl はこれを許し続け、さらに 64 ビットワードに適合する符号位置を
3489		エンコードするために 13 バイトまで拡張しています。
3490		しかし、これらを出力すると、Perl は互換性がないとして警告します;
3491		そして厳密な UTF-8 入力プロトコルでは、これらは禁止されています。
3492		さらに、システムの符号付き整数変数が保持できるよりも
3493		大きな符号位置は不正になりました。
3494		32 ビット ASCII システムでは、
3495		これは、C<0x7FFF_FFFF> が正当な最大であることを意味します
3496		(64 ビットシステムでは遙かに大きいです)。
3497
3498	1992	=item *
3499	1993
3500	1994	UTF-EBCDIC
3501	1995
3502	1996	=begin original
3503	1997
3504		Like UTF-8, but EBCDIC-safe, in the way that UTF-8 is ASCII-safe.
	1998	Like UTF-8 but EBCDIC-safe, in the way that UTF-8 is ASCII-safe.
3505		This means that all the basic characters (which includes all
3506		those that have ASCII equivalents, like C<"A">, C<"0">, C<"%">, I<etc.>)
3507		are the same in both EBCDIC and UTF-EBCDIC.
3508	1999
3509	2000	=end original
3510	2001
3511	2002	UTF-8 と似ていますが、UTF-8 が ASCII-safe であるように EBCDIC-safe です。
3512		つまり、全ての基本文字 (C<"A">, C<"0">, C<"%"> などのような、ASCII に
3513		等価物がある文字を全て含みます) は EBCDIC と UTF-EBCDIC で同じです。
3514	2003
3515		=begin original
3516
3517		UTF-EBCDIC is used on EBCDIC platforms. It generally requires more
3518		bytes to represent a given code point than UTF-8 does; the largest
3519		Unicode code points take 5 bytes to represent (instead of 4 in UTF-8),
3520		and, extended for 64-bit words, it uses 14 bytes instead of 13 bytes in
3521		UTF-8.
3522
3523		=end original
3524
3525		UTF-EBCDIC は EBCDIC プラットフォームで使われます。
3526		一般的に、ある符号位置を表現するのに UTF-8 よりも多くのバイト数を
3527		必要とします; 最大の Unicode 符号位置は表現するのに (UTF-8 の 4 バイト
3528		ではなく) 5 バイトを使い、64 ビットワードのために拡張されると、
3529		UTF-8 の場合の 13 バイトではなく 14 バイトを使います。
3530
3531	2004	=item *
3532	2005
3533	2006	=begin original
3534	2007
3535		UTF-16, UTF-16BE, UTF-16LE, Surrogates, and C<BOM>'s (Byte Order Marks)
	2008	UTF-16, UTF-16BE, UTF-16LE, Surrogates, and BOMs (Byte Order Marks)
3536	2009
3537	2010	=end original
3538	2011
3539		UTF-16, UTF-16BE, UTF-16LE, サロゲート, C<BOM> (Byte Order Marks)
	2012	UTF-16, UTF-16BE, UTF-16LE, サロゲート, BOM (Byte Order Marks)
3540	2013
3541	2014	=begin original
3542	2015
3543	2016	The following items are mostly for reference and general Unicode
3544	2017	knowledge; Perl doesn't use these constructs internally.
3545	2018
3546	2019	=end original
3547	2020
3548	2021	以下の項目はほとんど参照および一般的な Unicode 知識のためのもので、
3549	2022	Perl はこれらの構造を内部で使っていません。
3550	2023
3551	2024	=begin original
3552	2025
3553		~~Like~~ UTF-~~8, UTF-~~16 is a variab~~le-wid~~th encoding, ~~but~~ where
	2026	UTF-16 is a 2 or 4 byte encoding. The Unicode code points
3554		UTF-8 uses ~~8-bi~~t code uni~~ts,~~ ~~UTF-16~~ uses 16-bit ~~code~~ units.
	2027	C<U+0000..U+FFFF> are stored in a single 16-bit unit, and the code
3555		All code points occupy either 2 or 4 bytes in UTF-16: code points
3556		C<U+0000..U+FFFF> are stored in a single 16-bit unit, and code
3557	2028	points C<U+10000..U+10FFFF> in two 16-bit units. The latter case is
3558	2029	using I<surrogates>, the first 16-bit unit being the I<high
3559	2030	surrogate>, and the second being the I<low surrogate>.
3560	2031
3561	2032	=end original
3562	2033
3563		UTF-~~8 と同様、UTF-~~16 は~~可変長~~エンコーディングですが、
	2034	UTF-16 は 2 バイトもしくは 4 バイトのエンコーディングです。
3564		UTF-8 が 8 ビットの符号ユニットを使っているところ、
3565		UTF-16 は 16 ビットの符号ユニットを使います。
3566		UTF-16 は全ての符号位置が 2 バイトもしくは 4 バイトです:
3567	2035	C<U+0000..U+FFFF> の範囲の Unicode の符号位置はひとつの 16 ビット
3568	2036	ユニットに収められ、C<U+10000..U+10FFFF> の範囲の符号位置は 2 つの
3569	2037	16 ビットユニットに収められます。
3570		後者をサロゲート(surrogates) と呼びます~~; 最初の 16 ビットユニットは~~
	2038	後者をサロゲート(surrogates) と呼びます。
3571		I<high surrogate> で、二番目は ~~I<low surrogate> となります。~~
	2039	最初の 16 ビットユニットは I<high surrogate> で、二番目は
	2040	I<low surrogate> となります。
3572	2041
3573	2042	=begin original
3574	2043
3575	2044	Surrogates are code points set aside to encode the C<U+10000..U+10FFFF>
3576	2045	range of Unicode code points in pairs of 16-bit units. The I<high
3577		surrogates> are the range C<U+D800..U+DBFF> and the I<low surrogates>
	2046	surrogates> are the range C<U+D800..U+DBFF>, and the I<low surrogates>
3578	2047	are the range C<U+DC00..U+DFFF>. The surrogate encoding is
3579	2048
3580	2049	=end original
3581	2050
3582	2051	サロゲートは Unicode の符号位置の C<U+10000..U+10FFFF> の範囲を
3583	2052	16 ビットユニットのペアで表現する集合です。
3584	2053	I<high surrogates> は C<U+D800..U+DBFF> の範囲で、I<low surrogates> は
3585	2054	C<U+DC00..U+DFFF> の範囲です。
3586	2055	サロゲートのエンコーディングは
3587	2056
3588		$hi = ($uni - 0x10000) / 0x400 + 0xD800;
	2057	$hi = ($uni - 0x10000) / 0x400 + 0xD800;
3589		$lo = ($uni - 0x10000) % 0x400 + 0xDC00;
	2058	$lo = ($uni - 0x10000) % 0x400 + 0xDC00;
3590	2059
3591	2060	=begin original
3592	2061
3593	2062	and the decoding is
3594	2063
3595	2064	=end original
3596	2065
3597	2066	であり、デコードは以下のようなものです
3598	2067
3599		$uni = 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
	2068	$uni = 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
3600	2069
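As a worked example of the two formulas, the sketch below round-trips one supplementary code point. The sub names are hypothetical. Note that Perl's C</> is floating-point division, so the high-surrogate formula needs C<int()> (or C<use integer>) when run as real code.

```perl
use strict;
use warnings;

# Hypothetical helper: split a code point in U+10000..U+10FFFF into a
# (high, low) surrogate pair.
sub to_surrogates {
    my ($uni) = @_;
    my $hi = int(($uni - 0x10000) / 0x400) + 0xD800;   # int(): "/" is not integer division
    my $lo = ($uni - 0x10000) % 0x400 + 0xDC00;
    return ($hi, $lo);
}

# Hypothetical helper: combine a surrogate pair back into a code point.
sub from_surrogates {
    my ($hi, $lo) = @_;
    return 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
}

my ($hi, $lo) = to_surrogates(0x1F600);
printf "U+1F600 => %04X %04X\n", $hi, $lo;
printf "back    => U+%X\n", from_surrogates($hi, $lo);
```

For C<U+1F600> the offset is C<0xF600>, which gives the pair C<D83D DE00>; decoding that pair returns the original code point.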
3601	2070	=begin original
3602	2071
	2072	If you try to generate surrogates (for example by using chr()), you
	2073	will get a warning if warnings are turned on, because those code
	2074	points are not valid for a Unicode character.
	2075
	2076	=end original
	2077
	2078	(たとえば chr() を使って)サロゲートを生成しようとしたならば、
	2079	警告が有効であれば警告が発生するでしょう。
	2080	なぜなら、そういった符号位置は Unicode 文字としては正しいものではないからです。
	2081
	2082	=begin original
	2083
3603	2084	Because of the 16-bitness, UTF-16 is byte-order dependent. UTF-16
3604	2085	itself can be used for in-memory computations, but if storage or
3605	2086	transfer is required either UTF-16BE (big-endian) or UTF-16LE
3606	2087	(little-endian) encodings must be chosen.
3607	2088
3608	2089	=end original
3609	2090
3610	2091	16-bitness のため、UTF-16 はバイトの並び順に依存します。
3611	2092	UTF-16 それ自身はメモリ内の計算に使うことができますが、格納や転送の際には
3612	2093	UTF-16BE (ビッグエンディアン)か UTF-16LE (リトルエンディアン)の
3613	2094	いずれかのエンコーディングを選択しなければなりません。
3614	2095
3615	2096	=begin original
3616	2097
3617	2098	This introduces another problem: what if you just know that your data
3618	2099	is UTF-16, but you don't know which endianness? Byte Order Marks, or
3619		C<BOM>'s, are a solution to this. A special character has been reserved
	2100	BOMs, are a solution to this. A special character has been reserved
3620	2101	in Unicode to function as a byte order marker: the character with the
3621		code point C<U+FEFF> is the C<BOM>.
	2102	code point C<U+FEFF> is the BOM.
3622	2103
3623	2104	=end original
3624	2105
3625	2106	このことは別の問題を引き起こします: あなたのデータが UTF-16 であることだけを
3626	2107	知っていて、そのバイト並び順を知らなかったとしたら?
3627		バイト順マーク (Byte Order Marks)、略して C<BOM> はこれを解決します。
	2108	バイト順マーク (Byte Order Marks)、略して BOM はこれを解決します。
3628	2109	バイト並びのマーカーとしての機能のために Unicode では特殊な文字が
3629		予約されています: 符号位置 C<U+FEFF~~> の文字が C<BOM~~> です。
	2110	予約されています: その文字は符号位置の C<U+FEFF> です。
3630	2111
3631	2112	=begin original
3632	2113
3633		The trick is that if you read a C<BOM>, you will know the byte order,
	2114	The trick is that if you read a BOM, you will know the byte order,
3634	2115	since if it was written on a big-endian platform, you will read the
3635	2116	bytes C<0xFE 0xFF>, but if it was written on a little-endian platform,
3636	2117	you will read the bytes C<0xFF 0xFE>. (And if the originating platform
3637		was writing in ~~ASCII platform~~ UTF-8, you will read the bytes
	2118	was writing in UTF-8, you will read the bytes C<0xEF 0xBB 0xBF>.)
3638		C<0xEF 0xBB 0xBF>.)
3639	2119
3640	2120	=end original
3641	2121
3642		このトリックは、C<BOM> を読み込んだときにバイト順がわかるということです;
	2122	このトリックは、BOM を読み込んだときにバイト順がわかるということです。
3643	2123	ビッグエンディアンのプラットフォームで書かれたものなら
3644	2124	C<0xFE 0xFF> を読み出し、リトルエンディアンのプラットフォームで
3645	2125	書かれたものなら C<0xFF 0xFE> を読み出します。
3646		(そしてもし元のプラットフォームで ~~ASCII プラットフォームの~~ UTF-8 で
	2126	(そしてもし元のプラットフォームで UTF-8 で書かれたものならば
3647		~~書かれたものならば、~~C<0xEF 0xBB 0xBF> というバイト列を
	2127	C<0xEF 0xBB 0xBF> というバイト列を読むことになるでしょう。)
3648		読むことになるでしょう。)
3649	2128
3650	2129	=begin original
3651	2130
3652	2131	The way this trick works is that the character with the code point
3653		C<U+FFFE> is not ~~suppos~~ed to be in in~~put~~ streams, so the
	2132	C<U+FFFE> is guaranteed not to be a valid Unicode character, so the
3654		sequence of bytes C<0xFF 0xFE> is unambiguously "C<BOM>, represented in
	2133	sequence of bytes C<0xFF 0xFE> is unambiguously "BOM, represented in
3655	2134	little-endian format" and cannot be "C<U+FFFE>, represented in big-endian
3656	2135	format".
3657	2136
3658	2137	=end original
3659	2138
3660		このトリックがうまくいくのは符号位置 C<U+FFFE> の文字は
	2139	このトリックがうまくいくのは符号位置 C<U+FFFE> の文字は正当な
3661		~~入力ストリームには現れ~~ない~~はずである~~ということによって、
	2140	Unicode 文字でないということによって、C<0xFF 0xFE> という並びは紛れなく
3662		~~C<0xFF~~ ~~0xFE>~~ ~~という並びは紛れなく~~
	2141	"リトルエンディアンフォーマットの BOM" であって
3663		~~「リトル~~エンディアン~~フォーマット~~の C<~~BOM~~>」で~~あって~~
	2142	"ビッグエンディアンの C<U+FFFE>" とはならないのです。
3664		「ビッグエンディアンの C<U+FFFE>」とはならないのです。
3665	2143
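The BOM check described above can be sketched as a test on the first few bytes of a buffer. C<sniff_bom> is a hypothetical helper. It looks only at the three signatures mentioned so far, and assumes the data is not UTF-32, whose little-endian signature also begins with C<0xFF 0xFE>.

```perl
use strict;
use warnings;

# Hypothetical helper: guess an encoding from a leading BOM in a byte
# string. Returns undef when no known signature is present.
sub sniff_bom {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return undef;
}

# "A" (U+0041) preceded by a BOM, in three serializations, plus plain ASCII.
for my $buf ("\xFE\xFF\x00A", "\xFF\xFEA\x00", "\xEF\xBB\xBFA", "A") {
    printf "%s\n", sniff_bom($buf) // 'no BOM';
}
```

A buffer without a BOM yields no answer, so a caller still needs a default or out-of-band information about the byte order.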
3666		=begin original
3667
3668		Surrogates have no meaning in Unicode outside their use in pairs to
3669		represent other code points. However, Perl allows them to be
3670		represented individually internally, for example by saying
3671		C<chr(0xD801)>, so that all code points, not just those valid for open
3672		interchange, are
3673		representable. Unicode does define semantics for them, such as their
3674		C<L</General_Category>> is C<"Cs">. But because their use is somewhat dangerous,
3675		Perl will warn (using the warning category C<"surrogate">, which is a
3676		sub-category of C<"utf8">) if an attempt is made
3677		to do things like take the lower case of one, or match
3678		case-insensitively, or to output them. (But don't try this on Perls
3679		before 5.14.)
3680
3681		=end original
3682
3683		サロゲートは、他の符号位置を表すためにペアで使用する以外は、
3684		Unicode では意味を持ちません。
3685		ただし、Perl では、例えば C<chr(0xD801)> と記述することによって、
3686		内部的に個別に表すことができるため、
3687		オープンな交換に妥当な符号位置だけでなく、
3688		すべての符号位置を表すことができます。
3689		Unicode では、C<L</General_Category>> が C<"Cs"> であるなどの、
3690		このための意味論が定義されています。
3691		しかし、これらの使用はやや危険であるため、Perl では、小文字を使用したり、
3692		大文字と小文字を無視してマッチングしたり、出力しようとした場合には、
3693		(C<"utf8"> のサブカテゴリである C<"surrogate"> 警告カテゴリを使って) 警告が
3694		出されます。
3695		(ただし、5.14 より前の Perl でこれを使用しないでください。)
3696
3697	2144	=item *
3698	2145
3699	2146	UTF-32, UTF-32BE, UTF-32LE
3700	2147
3701	2148	=begin original
3702	2149
3703		The UTF-32 family is pretty much like the UTF-16 family, except that
	2150	The UTF-32 family is pretty much like the UTF-16 family, expect that
3704	2151	the units are 32-bit, and therefore the surrogate scheme is not
3705		needed. UT~~F-32 is a fixed-widt~~h e~~ncoding.~~ ~~The C<~~BOM> signatures are
	2152	needed. The BOM signatures will be C<0x00 0x00 0xFE 0xFF> for BE and
3706		C<0x~~00 0x00 0xFE 0xFF> for BE and C<0x~~FF 0xFE 0x00 0x00> for LE.
	2153	C<0xFF 0xFE 0x00 0x00> for LE.
3707	2154
3708	2155	=end original
3709	2156
3710	2157	UTF-32 ファミリーは UTF-16 ファミリーと良く似ていますが、ユニットが
3711	2158	32 ビットで、そのためサロゲート方式の必要がないという点が異なります。
3712		~~UTF-32~~ ~~は固定長エンコーディン~~グです。
	2159	BOM シグネチャは BE では C<0x00 0x00 0xFE 0xFF> に、
3713		C<BOM> シグネチャは BE では C<0x00 0x00 0xFE 0xFF> に、
3714	2160	LE では C<0xFF 0xFE 0x00 0x00> になります。
3715	2161
3716	2162	=item *
3717	2163
3718	2164	UCS-2, UCS-4
3719	2165
3720	2166	=begin original
3721	2167
3722		~~Legacy, fixed-width e~~ncodings defined by the ISO 10646 standard. UCS-2 is a 16-bit
	2168	Encodings defined by the ISO 10646 standard. UCS-2 is a 16-bit
3723	2169	encoding. Unlike UTF-16, UCS-2 is not extensible beyond C<U+FFFF>,
3724	2170	because it does not use surrogates. UCS-4 is a 32-bit encoding,
3725		functionally identical to UTF-32 ~~(the difference being that~~
	2171	functionally identical to UTF-32.
3726		UCS-4 forbids neither surrogates nor code points larger than C<0x10_FFFF>).
3727	2172
3728	2173	=end original
3729	2174
3730		ISO 10646 標準で定義されている~~古い固定長の~~エンコーディングです。
	2175	ISO 10646 標準で定義されているエンコーディングです。
3731	2176	UCS-2 は 16 ビットエンコーディングです。
3732		UTF-16 とは異なり、UCS-2 は C<U+FFFF> を超えた範囲に拡張できません;
	2177	UTF-16 とは異なり、UCS-2 は C<U+FFFF> を超えた範囲に拡張できません。
3733	2178	これはサロゲートを使わないためです。
3734		UCS-4 は 32 ビットエンコーディングで、機能的には UTF-32 と同じです
	2179	UCS-4 は 32 ビットエンコーディングで、機能的には UTF-32 と同じです。
3735		(違いは、UCS-4 はサロゲートや C<0x10_FFFF> より大きな符号位置を
3736		禁止していることです)。
3737	2180
3738	2181	=item *
3739	2182
3740	2183	UTF-7
3741	2184
3742	2185	=begin original
3743	2186
3744	2187	A seven-bit safe (non-eight-bit) encoding, which is useful if the
3745	2188	transport or storage is not eight-bit safe. Defined by RFC 2152.
3746	2189
3747	2190	=end original
3748	2191
3749	2192	7 ビットセーフ(非 8 ビット)エンコーディングで、8 ビットセーフでない
3750	2193	転送や格納に便利です。
3751	2194	RFC 2152 によって定義されています。
3752	2195
3753	2196	=back
3754	2197
3755		=head2 ~~Non~~character code points
	2198	=head2 Security Implications of Unicode
3756	2199
3757		(~~非文字符号位置~~)
	2200	(Unicode のセキュリティへの影響)
3758	2201
3759		=begin original
3760
3761		66 code points are set aside in Unicode as "noncharacter code points".
3762		These all have the C<Unassigned> (C<Cn>) C<L</General_Category>>, and
3763		no character will ever be assigned to any of them. They are the 32 code
3764		points between C<U+FDD0> and C<U+FDEF> inclusive, and the 34 code
3765		points:
3766
3767		=end original
3768
3769		66 の符号位置は、Unicode では「非文字符号位置」として確保されています。
3770		これらはすべて C<Unassigned> (C<Cn>) C<L</General_Category>> を持ち、
3771		これらに文字が割り当てられることはありません。
3772		それは、C<U+FDD0> と C<U+FDEF> の間にある 32 の符号位置と、
3773		次の 34 の符号位置です。
3774
3775		U+FFFE U+FFFF
3776		U+1FFFE U+1FFFF
3777		U+2FFFE U+2FFFF
3778		...
3779		U+EFFFE U+EFFFF
3780		U+FFFFE U+FFFFF
3781		U+10FFFE U+10FFFF
3782
3783		=begin original
3784
3785		Until Unicode 7.0, the noncharacters were "B<forbidden> for use in open
3786		interchange of Unicode text data", so that code that processed those
3787		streams could use these code points as sentinels that could be mixed in
3788		with character data, and would always be distinguishable from that data.
3789		(Emphasis above and in the next paragraph are added in this document.)
3790
3791		=end original
3792
3793		Unicode 7.0 までは、非文字は
3794		「Unicode テキストデータのオープンな交換での使用は B<禁止>」であったため、
3795		これらのストリームを処理するコードは、これらの符号位置を
3796		文字データと混在させられる標識として使うことができ、
3797		それは常にデータと区別できました。
3798		(前述および次の段落の強調は、この文書によって追加されています。)
3799
3800		=begin original
3801
3802		Unicode 7.0 changed the wording so that they are "B<not recommended> for
3803		use in open interchange of Unicode text data". The 7.0 Standard goes on
3804		to say:
3805
3806		=end original
3807
3808		Unicode 7.0 では、「Unicode テキストデータのオープンな交換での使用は
3809		B<非推奨>」という表現に変更されました。
3810		7.0 標準では、次のように記述されています。
3811
3812	2202	=over 4
3813	2203
3814		=~~beg~~in ~~original~~
	2204	=item *
3815	2205
3816		"If a noncharacter is received in open interchange, an application is
3817		not required to interpret it in any way. It is good practice, however,
3818		to recognize it as a noncharacter and to take appropriate action, such
3819		as replacing it with C<U+FFFD> replacement character, to indicate the
3820		problem in the text. It is not recommended to simply delete
3821		noncharacter code points from such text, because of the potential
3822		security issues caused by deleting uninterpreted characters. (See
3823		conformance clause C7 in Section 3.2, Conformance Requirements, and
3824		L<Unicode Technical Report #36, "Unicode Security
3825		Considerations"\|https://www.unicode.org/reports/tr36/#Substituting_for_Ill_Formed_Subsequences>)."
3826
3827		=end original
3828
3829		「オープンな交換で非文字が受信された場合、
3830		アプリケーションはそれを解釈する必要はない。
3831		しかし、それを非文字として認識し、テキスト内の問題を示すために
3832		C<U+FFFD> 置換文字で置き換えるなどの適切なアクションを
3833		実行することはよい習慣である。
3834		解釈されない文字を削除することによってセキュリティ上の問題が発生する
3835		可能性があるため、そのようなテキストから非文字符号位置を単純に
3836		削除することは推奨しない。
3837		(Conformance Clause C7 in Section 3.2,Conformance Requirements および
3838		L<Unicode Technical Report #36, "Unicode Security
3839		Considerations"\|https://www.unicode.org/reports/tr36/#Substituting_for_Ill_Formed_Subsequences>
3840		を参照)。」
3841
3842		=back
3843
3844	2206	=begin original
3845	2207
3846		~~This ch~~a~~nge was made because it was~~ fo~~und that va~~r~~ious co~~mme~~rcial~~ ~~tools~~
	2208	Malformed UTF-8
3847		like editors, or for things like source code control, had been written
3848		so that they would not handle program files that used these code points,
3849		effectively precluding their use almost entirely! And that was never
3850		the intent. They've always been meant to be usable within an
3851		application, or cooperating set of applications, at will.
3852	2209
3853	2210	=end original
3854	2211
3855		~~この変更は、~~
	2212	不正な UTF-8
3856		エディタやソースコード制御のような様々な市販のツールが、
3857		これらの符号位置を使ったプログラムファイルを扱わず、
3858		事実上これらの使用をほぼ完全に排除していることが分かったからです!
3859		そしてこれは決して意図したものではありません。
3860		これは常に、アプリケーション単体や、協調するアプリケーションの
3861		集合の中で、任意に使えることを意味していました。
3862	2213
3863	2214	=begin original
3864	2215
3865		If you'r~~e wri~~tin~~g cod~~e, ~~suc~~h as an editor, ~~that~~ is s~~upp~~osed to ~~be able~~
	2216	Unfortunately, the specification of UTF-8 leaves some room for
3866		to h~~andle~~ any Unicode text ~~data, the~~n ~~you~~ should~~n't~~ be ~~usi~~n~~g th~~e~~se cod~~e
	2217	interpretation of how many bytes of encoded output one should generate
3867		points yo~~urs~~e~~lf,~~ a~~nd ins~~tead ~~allow~~ t~~hem~~ in the ~~input. If y~~o~~u n~~eed
	2218	from one input Unicode character. Strictly speaking, the shortest
3868		s~~ent~~inels, they should ~~instead~~ be ~~som~~e~~thi~~n~~g th~~at ~~isn't l~~e~~gal Unico~~de.
	2219	possible sequence of UTF-8 bytes should be generated,
3869		~~For UTF-8 data, you~~ can use the bytes ~~0xC1 a~~nd ~~0xC2~~ a~~s se~~ntinels, as
	2220	because otherwise there is potential for an input buffer overflow at
3870		they ~~neve~~r ~~app~~ear in we~~ll-forme~~d UTF-8. ~~(Th~~ere are e~~quivale~~nts ~~for~~
	2221	the receiving end of a UTF-8 connection. Perl always generates the
3871		UTF-~~EBCDIC).~~ ~~You c~~an ~~also s~~t~~ore~~ ~~you~~r Uni~~code~~ code poi~~nts~~ in int~~eger~~
	2222	shortest length UTF-8, and with warnings on Perl will warn about
3872		var~~iabl~~es and ~~use~~ negative values as sentine~~ls.~~
	2223	non-shortest length UTF-8 along with other malformations, such as the
	2224	surrogates, which are not real Unicode code points.
3873	2225
3874	2226	=end original
3875	2227
3876		任意の Unicode ~~テキストデータを扱えると想定される、エディタなど~~の~~コードを~~
	2228	残念ながら、UTF-8 の仕様ではひとつの Unicode 文字の入力から
3877		書いてい~~る場合は、これ~~らの~~符号位置を自分自身では使わず、~~
	2229	何バイトのエンコードされた出力として解釈するのかについていくらかの
3878		~~入力として使用できるようにする必要~~があります。
	2230	余地があります。
3879		~~番兵文字が必要な場合は~~、正当な U~~nicode~~ ~~ではないものを使う必要~~が~~ありま~~す。
	2231	厳密にいえば、可能な限り最も短い UTF-8 バイト列が生成されるべきです。
3880		UTF-8 ~~データ~~の~~場合は、バイト 0xC1~~ お~~よび 0xC2 を番兵文字とし~~て~~使用できます;~~
	2232	なぜなら、そうしないと UTF-8 コネクションの終わりにおいて、入力バッファが
3881		~~これらは整形式の UTF-8 では決して現れない~~からです
	2233	オーバーフローする可能性があるからです。
3882		(UTF-~~EBCDIC~~ ~~にも同等~~のもの~~があります。)~~
	2234	Perl は常に最も短い長さの UTF-8 を生成し、本当の Unicode の符号位置でない
3883		U~~nicode~~ ~~符号位置を整数変数~~に格納し~~、負の値~~を~~センチネルと~~して
	2235	サロゲートのような不正な形式の最短でない UTF-8 に関して警告を発します。
3884		使用することもできます。
3885	2236
	2237	=item *
3886	2238
3887	2239	=begin original
3888	2240
3889		~~If yo~~u're ~~not w~~riting s~~uch~~ a tool, then whether you a~~ccep~~t ~~nonch~~a~~racters~~
	2241	Regular expressions behave slightly differently between byte data and
3890		a~~s inpu~~t is up to ~~you~~ ~~(th~~o~~ugh~~ the ~~Standa~~rd rec~~omm~~e~~nds~~ that ~~you not). If~~
	2242	character (Unicode) data. For example, the "word character" character
3891		~~you~~ do ~~str~~ict input ~~str~~e~~am ch~~ecking with ~~Perl,~~ these ~~cod~~e points
	2243	class C<\w> will work differently depending on if data is eight-bit bytes
3892		co~~ntinue to be fo~~r~~bidden.~~ ~~This is to mai~~ntai~~n ba~~c~~kward c~~o~~mpatibility~~
	2244	or Unicode.
3893		(otherwise potential security holes could open up, as an unsuspecting
3894		application that was written assuming the noncharacters would be
3895		filtered out before getting to it, could now, without warning, start
3896		getting them). To do strict checking, you can use the layer
3897		C<:encoding('UTF-8')>.
3898	2245
3899	2246	=end original
3900	2247
3901		~~そのようなツ~~ー~~ルを書いているの~~でな~~いのなら、~~
	2248	正規表現はバイトデータと文字(Unicode)データとでわずかに異なる
3902		~~非文字~~を~~入力と~~し~~て受け付けるかどうかはあなた次第で~~す
	2249	振る舞いをします。
3903		(~~しかし標準~~はそ~~うしないことを勧めています)。~~
	2250	たとえば、単語文字("word character")クラス C<\w> はそのデータが
3904		Perl ~~で厳密~~な~~入力ストリームチェック~~をす~~るなら、~~
	2251	8 ビットバイトか Unicode かに依存して異なる働きをします。
3905		これらの符号位置は禁止され続けます。
3906		これは後方互換性を維持するためです
3907		(そうしないと、非文字は受け取る前にフィルタリングされることを仮定して
3908		書かれた、疑わないアプリケーションが、警告なしに受け取るようになるため、
3909		潜在的なセキュリティホールが開く可能性があります)。
3910		厳密なチェックをするためには、C<:encoding('UTF-8')> 層を使えます。
3911	2252
3912	2253	=begin original
3913	2254
3914		~~Perl co~~nt~~inu~~es to wa~~rn (u~~s~~ing~~ the ~~warning cat~~egory C<~~"non~~char~~">,~~ which
	2255	In the first case, the set of C<\w> characters is either small--the
3915		is a su~~b-ca~~te~~gory~~ of ~~C<"u~~t~~f8">)~~ if an atte~~mpt~~ is made to ou~~tput~~
	2256	default set of alphabetic characters, digits, and the "_"--or, if you
3916		noncharacte~~rs.~~
	2257	are using a locale (see L<perllocale>), the C<\w> might contain a few
	2258	more letters according to your language and country.
3917	2259
3918	2260	=end original
3919	2261
3920		~~Perl は~~、~~非文字を出力しようとすると(~~C<~~"utf8"~~> の~~サブカテゴリ~~である
	2262	第一の場合、C<\w> 文字の集合は相対的に小さいものです -- アルファベット、
3921		C<"nonchar"> ~~警告カテゴリ~~を~~使って~~)~~警告し続けます。~~
	2263	数字、そして "_" のデフォルト集合 -- もしくはロケール(L<perllocale> を参照)を
	2264	使っているのであれば、C<\w> はあなたの使っている言語や国に応じていくつかの
	2265	文字が増えているかもしれません。
3922	2266
3923		=head2 Beyond Unicode code points
3924
3925		(Unicode 符号位置を越えたもの)
3926
3927	2267	=begin original
3928	2268
3929		The ~~maximum Uni~~code code ~~poin~~t is C<~~U+10FFFF~~>, ~~and Unicod~~e o~~nly~~ defines
	2269	In the second case, the C<\w> set of characters is much, much larger.
3930		o~~pera~~tions on ~~code po~~ints up th~~rough~~ t~~hat.~~ But Perl works on code
	2270	Most importantly, even in the set of the first 256 characters, it will
3931		po~~ints~~ up t~~o t~~he ~~max~~i~~mum p~~er~~missibl~~e signed num~~ber~~ ~~avai~~lable ~~on t~~he
	2271	probably match different characters: unlike most locales, which are
3932		p~~lat~~f~~orm.~~ Ho~~wever,~~ ~~Perl~~ will not accept ~~the~~se f~~rom~~ i~~nput str~~eams unless
	2272	specific to a language and country pair, Unicode classifies all the
3933		lax rules are be~~ing~~ used, and w~~ill~~ warn ~~(using th~~e wa~~rning cat~~egory
	2273	characters that are letters I<somewhere> as C<\w>. For example, your
3934		~~C<"n~~o~~n_uni~~code~~">,~~ which is a ~~sub-ca~~t~~egory~~ of ~~C<"utf8">)~~ if any are output.
	2274	locale might not think that LATIN SMALL LETTER ETH is a letter (unless
	2275	you happen to speak Icelandic), but Unicode does.
3935	2276
3936	2277	=end original
3937	2278
3938		~~Unicode 符号位置~~の~~最大値は~~ C<~~U+10FFFF~~> で、
	2279	第二の場合、C<\w> の文字集合は相対的に大きなものになります。
3939		~~Unicode は~~こ~~こまで~~の~~符号位置に対する操作~~の~~みを定義し~~て~~います。~~
	2280	最も重要なことは、最初の 256 文字の集合にあってさえ異なる文字と
3940		~~しかし、Perl は、プラ~~ッ~~トフォームで利用~~可能~~な符号付きの最大数ま~~での
	2281	マッチする可能性があるということです: 言語と国のペアで指定される
3941		~~符号位置で動作します。~~
	2282	大部分のロケールと異なり、Unicode のクラス分けは I<どこかにある>
3942		~~しかし、Perl は、緩い規則が使用され~~て~~いないかぎり、入力ストリームから~~
	2283	すべての文字を C<\w> に属するものとします。
3943		~~これらを受け入れず、それらを出力しよう~~とする~~と(C<"utf8">~~ の
	2284	たとえば、あなたの使っているロケールは LATIN SMALL LETTER ETH が
3944		~~サブカテゴリである C<"non_unicode"> 警告カテゴリ~~を使って)警告し~~ます。~~
	2285	(アイスランド語を使っていない限り)属していないとみなしているでしょうが、
	2286	Unicode は属するものとしてみなすのです。
3945	2287
3946	2288	=begin original
3947	2289
3948		~~Since~~ Unicode rules are no~~t defi~~ned on these ~~code~~ points, i~~f a~~
	2290	As discussed elsewhere, Perl has one foot (two hooves?) planted in
3949		~~Uni~~co~~de-de~~f~~ined~~ ~~opera~~tion is ~~done on~~ them, Perl uses what we be~~lieve~~ are
	2291	each of two worlds: the old world of bytes and the new world of
3950		~~sensible~~ rules, ~~while~~ g~~ene~~ra~~lly warn~~ing, us~~ing~~ the ~~C<"~~non~~_uni~~code">
	2292	characters, upgrading from bytes to characters when necessary.
3951		~~categ~~ory. For exampl~~e, C<u~~c~~("\x{11_0000}")> w~~ill gene~~rate~~ ~~such~~ a
	2293	If your legacy code does not explicitly use Unicode, no automatic
3952		w~~arn~~i~~ng, re~~t~~urning t~~he ~~inpu~~t parameter as its ~~res~~ult, since Perl d~~efi~~nes
	2294	switch-over to characters should happen. Characters shouldn't get
3953		~~the uppe~~rcase of ~~ever~~y ~~non-Unicod~~e ~~cod~~e point to be the code point
	2295	downgraded to bytes, either. It is possible to accidentally mix bytes
3954		itse~~lf.~~ ~~(All t~~he case ~~changing o~~perations, not ~~just~~ ~~upper~~cas~~ing,~~ w~~ork~~
	2296	and characters, however (see L<perluniintro>), in which case C<\w> in
3955		this way.)
	2297	regular expressions might start behaving differently. Review your
	2298	code. Use warnings and the C<strict> pragma.
3956	2299
3957	2300	=end original
3958	2301
3959		~~Unicod~~e ~~の規則~~は~~これら~~の~~符号位置に対して定義さ~~れ~~ていないため、~~
	2302	すでに述べている通り、Perl は二つの世界のそれぞれに片方の足
3960		~~Unicode~~ ~~が定義~~し~~た操作がこれらに対して行われた場合~~、
	2303	(二つのひづめ?) を突っ込んでいます: 古いバイトの世界と新しい文字の世界で、
3961		~~Perl は私たちがふさわしいと信~~じ~~る規則を使い~~ます~~が、一般的には~~
	2304	必要に応じてバイトから文字に昇格します。
3962		~~C<"~~n~~on_un~~icode"> ~~カテゴリの警告~~を行い~~ます。~~
	2305	もしあなたの古いコードが明示的に Unicode を使っていないのなら、文字への
3963		例え~~ば、C<uc("\x{11_0000}")> は~~こ~~の警告を生成し、入力パラメータを~~
	2306	切り替えが自動的になされることはありません。
3964		~~その結果として返します; Perl はすべての非 Unicode 符号位置の大~~文字~~をその~~
	2307	文字はバイトにダウングレードされるべきではありません。
3965		~~符号位置自身~~と~~定義してい~~る~~からで~~す。
	2308	偶発的にバイトと文字が混じる可能性がありますが(L<perluniintro> を参照)、
3966		~~(大文字化だけでなく、全て~~の~~大文字小文字変更操作はこの~~ように
	2309	そのような場合正規表現中の C<\w> は異なるふるまいをするかもしれません。
3967		動作します。)
	2310	あなたのコードをレビューしてください。
	2311	warnings と C<strict> プラグマを使ってください。
3968	2312
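The difference in C<\w> between byte data and character data can be seen with LATIN SMALL LETTER ETH. This is a minimal sketch, assuming an ASCII platform, no C<use locale>, and no C<unicode_strings> feature (so no C<use v5.12> or later in scope); under other settings the two results may agree.

```perl
use strict;
use warnings;

# U+00F0 LATIN SMALL LETTER ETH, held as a single byte ...
my $eth = "\xF0";
my $as_bytes = $eth =~ /\w/ ? 1 : 0;

# ... then the same string upgraded to Perl's internal character
# (UTF-8 flagged) representation. The string still compares equal.
utf8::upgrade($eth);
my $as_chars = $eth =~ /\w/ ? 1 : 0;

print "bytes: $as_bytes, characters: $as_chars\n";
```

Under those assumptions the byte form should not match C<\w> while the upgraded form should, even though both strings contain the same single character. This is the kind of change that accidental mixing of bytes and characters can introduce.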
3969		=b~~egin origin~~al
	2313	=back
3970	2314
3971		The ~~situ~~a~~tion~~ ~~with matching~~ Unicode ~~propert~~i~~es i~~n re~~gula~~r ~~expressi~~ons,
	2315	=head2 Unicode in Perl on EBCDIC
3972		the C<\p{}> and C<\P{}> constructs, against these code points is not as
3973		clear cut, and how these are handled has changed as we've gained
3974		experience.
3975	2316
3976		~~=end~~ or~~igina~~l
	2317	(EBCDIC 上の Perl での Unicode)
3977	2318
3978		正規表現中で C<\p{}> や C<\P{}> 構文による Unicode 特性を
3979		このような符号位置に対してマッチングさせる状況は
3980		はっきりしたものではなく、これらをどのように扱うかは
3981		経験を積むにつれて変更されてきました。
3982
3983	2319	=begin original
3984	2320
3985		One ~~possibilit~~y i~~s t~~o treat ~~any match aga~~inst these code points as
	2321	The way Unicode is handled on EBCDIC platforms is still
3986		~~und~~efined. ~~But~~ since ~~Per~~l does~~n't~~ ~~hav~~e the concept of a ~~match b~~eing
	2322	experimental. On such platforms, references to UTF-8 encoding in this
3987		undefin~~ed, i~~t converts this to ~~fai~~l~~ing~~ or ~~C<FALSE>. Thi~~s is ~~almos~~t, ~~but~~
	2323	document and elsewhere should be read as meaning the UTF-EBCDIC
3988		not quite, what Perl ~~did from v5.~~14 ~~(when~~ use of ~~the~~se ~~code~~ points
	2324	specified in Unicode Technical Report 16, unless ASCII vs. EBCDIC issues
3989		~~bec~~ame ge~~ner~~ally ~~rel~~i~~able) thro~~u~~gh v5~~.~~18.~~ The ~~diffe~~re~~nce~~ is that Perl
	2325	are specifically discussed. There is no C<utfebcdic> pragma or
3990		tre~~ate~~d all ~~C<\p{}> m~~atches as fa~~ili~~ng, but all ~~C<\P{}> match~~es as
	2326	":utfebcdic" layer; rather, "utf8" and ":utf8" are reused to mean
3991		succeeding.
	2327	the platform's "natural" 8-bit encoding of Unicode. See L<perlebcdic>
	2328	for more discussion of the issues.
3992	2329
3993	2330	=end original
3994	2331
3995		一つの~~可能性は、これら~~の~~符号位置に対~~す~~るあらゆるマッチングを~~
	2332	EBCDIC プラットフォームでの Unicode の扱い方は未だ実験的です。
3996		~~未定義として扱~~うことです。
	2333	このようなプラットフォームでは、この文書やその他での
3997		~~しかし、Perl~~ ~~はマッチ~~ング~~が未定義であるという概念を持っていない~~ので、
	2334	UTF-8 エンコーディングへの言及は、特に ASCII 対 EBCDIC 問題について
3998		これは失敗、~~または~~ ~~C<FALSE>~~ ~~に変換されます。~~
	2335	議論されている場合でない限りは、Unicode Technical Report 16 で
3999		これ~~は、(完全にではありませんが)~~ ほぼ ~~(これら~~の~~符号位置が一般的に~~
	2336	定義されている UTF-EBCDIC を意味するものとして読むべきです。
4000		~~信頼できるようになった)~~ ~~v5.14~~ から ~~v5.18~~ ま~~での動作です。~~
	2337	C<utfebcdic> プラグマや ":utfebcdic" 層はありません;
4001		~~違いは~~、 ~~Perl~~ ~~は全ての~~ ~~C<\p{}> マッチングを失敗として扱うけれども~~、
	2338	代わりに、"utf8" と ":utf8" が、そのプラットフォームの「自然な」
4002		全ての ~~C<\P{}>~~ マッチング~~は成功として扱~~う~~ことで~~す。
	2339	Unicode の 8 ビットエンコーディングを意味するように再利用されています。
	2340	この問題に関する更なる議論については L<perlebcdic> を参照してください。
4003	2341
4004		=be~~gin~~ o~~rigin~~al
	2342	=head2 Locales
4005	2343
4006		~~One problem with this is that it leads to unexpected, and confusing~~
	2344	(ロケール)
4007		results in some cases:
4008	2345
4009		=end original
4010
4011		これの問題の一つは、場合によっては、予想外で混乱する結果が
4012		導かれることです:
4013
4014		chr(0x110000) =~ \p{ASCII_Hex_Digit=True} # Failed on <= v5.18
4015		chr(0x110000) =~ \p{ASCII_Hex_Digit=False} # Failed! on <= v5.18
4016
4017	2346	=begin original
4018	2347
4019		~~That i~~s~~, it tre~~a~~ted~~ bo~~th mat~~ches as ~~und~~efin~~ed,~~ and converted that to
	2348	Usually locale settings and Unicode do not affect each other, but
4020		~~fals~~e (r~~aising~~ a wa~~rning~~ on e~~ach).~~ ~~The~~ f~~irst~~ ~~cas~~e ~~is the e~~xpected
	2349	there are a couple of exceptions:
4021		result, but the second is likely counterintuitive: "How could both be
4022		false when they are complements?" Another problem was that the
4023		implementation optimized many Unicode property matches down to already
4024		existing simpler, faster operations, which don't raise the warning. We
4025		chose to not forgo those optimizations, which help the vast majority of
4026		matches, just to generate a warning for the unlikely event that an
4027		above-Unicode code point is being matched against.
4028	2350
4029	2351	=end original
4030	2352
4031		~~つまり、両方~~の~~マッチングは未~~定義と~~して扱われ~~、
	2353	通常ロケールの設定と Unicode は互いに影響を及ぼすことはありませんが、
4032		~~偽に変換されます (それぞれ警告~~が~~発生し~~ます)。
	2354	いくつかの例外があります:
4033		最初の場合は予想される結果ですが、2 番目のものはおそらく直感に反します:
4034		「補集合の両方で失敗するってどういうこと?」
4035		もう一つの問題は、多くの Unicode 特性は、警告を発生させない、
4036		既存のより単純で高速な演算に最適化される実装であることです。
4037		私たちは、単に Unicode を超える符号位置に対してマッチングをするという
4038		めったに起きないことで警告を出すということのために、
4039		圧倒的多数のマッチングの助けになるこれらの最適化をしないで
4040		済ませるということはしないことを選びました;
4041	2355
4042		=begin original
4043
4044		As a result of these problems, starting in v5.20, what Perl does is
4045		to treat non-Unicode code points as just typical unassigned Unicode
4046		characters, and matches accordingly. (Note: Unicode has atypical
4047		unassigned code points. For example, it has noncharacter code points,
4048		and ones that, when they do get assigned, are destined to be written
4049		Right-to-left, as Arabic and Hebrew are. Perl assumes that no
4050		non-Unicode code point has any atypical properties.)
4051
4052		=end original
4053
4054		これらの問題の結果として、v5.20 から Perl が行うことは、
4055		非 Unicode 符号位置を単なる典型的な未割り当て Unicode 文字として扱い、
4056		それに応じてマッチングするということです。
4057		(注意: Unicode には典型的でない未割り当て符号位置があります。
4058		例えば、非文字符号位置があります; また、将来割り当てられたときには、
4059		アラビア語やヘブライ語のように右から左に書かれることになっている符号位置もあります。
4060		Perl は、非 Unicode 符号位置に典型的でない属性はないことを仮定しています。)
4061
4062		=begin original
4063
4064		Perl, in most cases, will raise a warning when matching an above-Unicode
4065		code point against a Unicode property when the result is C<TRUE> for
4066		C<\p{}>, and C<FALSE> for C<\P{}>. For example:
4067
4068		=end original
4069
4070		ほとんどの場合、Perl は、Unicode 特性を Unicode を超える符号位置に
4071		マッチングして、結果が C<\p{}> なら C<TRUE>、C<\P{}> なら C<FALSE> の
4072		場合、警告を発生させます。
4073		例えば:
4074
4075		chr(0x110000) =~ \p{ASCII_Hex_Digit=True} # Fails, no warning
4076		chr(0x110000) =~ \p{ASCII_Hex_Digit=False} # Succeeds, with warning
4077
4078		=begin original
4079
4080		In both these examples, the character being matched is non-Unicode, so
4081		Unicode doesn't define how it should match. It clearly isn't an ASCII
4082		hex digit, so the first example clearly should fail, and so it does,
4083		with no warning. But it is arguable that the second example should have
4084		an undefined, hence C<FALSE>, result. So a warning is raised for it.
4085
4086		=end original
4087
4088		これら両方の例において、マッチングする文字は非 Unicode なので、
4089		Unicode はこれがどのようにマッチングするべきかを定義していません。
4090		これは明らかに ASCII 16 進文字ではないので、
4091		最初の例は明らかに失敗するべきで、実際警告なしで失敗します。
4092		しかし 2 番目の例は、未定義の、つまり C<FALSE> の結果となるべきだという議論もありえます。
4093		従って警告が発生します。
4094
4095		=begin original
4096
4097		Thus the warning is raised for many fewer cases than in earlier Perls,
4098		and only when what the result is could be arguable. It turns out that
4099		none of the optimizations made by Perl (or are ever likely to be made)
4100		cause the warning to be skipped, so it solves both problems of Perl's
4101		earlier approach. The most commonly used property that is affected by
4102		this change is C<\p{Unassigned}> which is a short form for
4103		C<\p{General_Category=Unassigned}>. Starting in v5.20, all non-Unicode
4104		code points are considered C<Unassigned>. In earlier releases the
4105		matches failed because the result was considered undefined.
4106
4107		=end original
4108
4109		従って、警告は以前の Perl よりも遙かに少ない場合で、
4110		結果に議論の余地がある場合にのみ発生します。
4111		Perl によって行われた (および行われうる) どの最適化によっても
4112		警告は飛ばされなくなったので、Perl の以前の手法の二つの問題両方を
4113		解決しています。
4114		この変更の影響を受ける、もっともよく使われている特性は、
4115		C<\p{General_Category=Unassigned}> の短縮版である C<\p{Unassigned}> です。
4116		v5.20 から、全ての非 Unicode 符号位置は C<Unassigned> として扱われます。
4117		以前のリリースでは、結果は未定義として扱われていたのでこのマッチングは
4118		失敗していました。
4119
4120		=begin original
4121
4122		The only place where the warning is not raised when it might ought to
4123		have been is if optimizations cause the whole pattern match to not even
4124		be attempted. For example, Perl may figure out that for a string to
4125		match a certain regular expression pattern, the string has to contain
4126		the substring C<"foobar">. Before attempting the match, Perl may look
4127		for that substring, and if not found, immediately fail the match without
4128		actually trying it; so no warning gets generated even if the string
4129		contains an above-Unicode code point.
4130
4131		=end original
4132
4133		警告が発生するべきかもしれないけれども発生しない唯一の場所は、
4134		最適化によってパターンマッチング自体が試みられさえもしなかった場合です。
4135		例えば、ある正規表現パターンにマッチングする文字列に対して、
4136		文字列が特定の部分文字列 C<"foobar"> を含んでいなければならないことに
4137		Perl が気付いたとします。
4138		マッチングを試みる前に Perl はその部分文字列を探し、もし見つからなければ、
4139		実際にマッチングを試みる前に直ちに失敗します;
4140		従って、文字列に非 Unicode 符号位置が含まれていたとしても、
4141		警告は発生しません。
4142
4143		=begin original
4144
4145		This behavior is more "Do what I mean" than in earlier Perls for most
4146		applications. But it catches fewer issues for code that needs to be
4147		strictly Unicode compliant. Therefore there is an additional mode of
4148		operation available to accommodate such code. This mode is enabled if a
4149		regular expression pattern is compiled within the lexical scope where
4150		the C<"non_unicode"> warning class has been made fatal, say by:
4151
4152		=end original
4153
4154		この振る舞いは、ほとんどのアプリケーションにとって、
4155		以前の Perl よりもより「空気を読む」ものです。
4156		しかし、厳密に Unicode に準拠していることが必要なコードにとっては、
4157		検出される問題がより少なくなります。
4158		従って、そのようなコードに適応するために追加の
4159		操作モードが利用可能です。
4160		このモードは次のように、C<"non_unicode"> 警告クラスが致命的になっている
4161		レキシカルスコープ内で正規表現がコンパイルされたときに有効になります:
4162
4163		use warnings FATAL => "non_unicode"
4164
4165		=begin original
4166
4167		(see L<warnings>). In this mode of operation, Perl will raise the
4168		warning for all matches against a non-Unicode code point (not just the
4169		arguable ones), and it skips the optimizations that might cause the
4170		warning to not be output. (It currently still won't warn if the match
4171		isn't even attempted, like in the C<"foobar"> example above.)
4172
4173		=end original
4174
4175		(L<warnings> 参照)。
4176		この操作モードでは、Perl は (議論の余地のあるものだけでなく)
4177		非 Unicode 符号位置に対する全てのマッチングで警告を出力し、
4178		警告が出力されなくなるかもしれない最適化を飛ばします。
4179		(現在のところ、前述の C<"foobar"> の例のように、マッチングが試みられさえ
4180		しなかった場合は、警告は出ないままです。)
4181
4182		=begin original
4183
4184		In summary, Perl now normally treats non-Unicode code points as typical
4185		Unicode unassigned code points for regular expression matches, raising a
4186		warning only when it is arguable what the result should be. However, if
4187		this warning has been made fatal, it isn't skipped.
4188
4189		=end original
4190
4191		まとめると、Perl は通常正規表現マッチングでは非 Unicode 符号位置を
4192		典型的な Unicode 未割り当て符号位置として扱い、
4193		その結果に議論の余地がある場合にのみ警告を出力するようになりました。
4194		しかし、警告が致命的になっている場合は、これは飛ばされません。
4195
4196		=begin original
4197
4198		There is one exception to all this. C<\p{All}> looks like a Unicode
4199		property, but it is a Perl extension that is defined to be true for all
4200		possible code points, Unicode or not, so no warning is ever generated
4201		when matching this against a non-Unicode code point. (Prior to v5.20,
4202		it was an exact synonym for C<\p{Any}>, matching code points C<0>
4203		through C<0x10FFFF>.)
4204
4205		=end original
4206
4207		これら全てに関して一つの例外があります。
4208		C<\p{All}> は Unicode 特性のように見えますが、
4209		これは Unicode であろうがなかろうが全ての可能な符号位置に対して
4210		真と定義されている Perl 拡張なので、非 Unicode 符号位置に対してこれを
4211		マッチングしても警告は発生しません。
4212		(v5.20 より前では、これは C<\p{Any}> の正確な別名で、
4213		C<0> から C<0x10FFFF> の符号位置にマッチングしていました。)
4214
4215		=head2 Security Implications of Unicode
4216
4217		(Unicode のセキュリティへの影響)
4218
4219		=begin original
4220
4221		First, read
4222		L<Unicode Security Considerations\|https://www.unicode.org/reports/tr36>.
4223
4224		=end original
4225
4226		まず、
4227		L<Unicode Security Considerations\|https://www.unicode.org/reports/tr36> を
4228		読んでください。
4229
4230		=begin original
4231
4232		Also, note the following:
4233
4234		=end original
4235
4236		また、以下のことに注意してください:
4237
4238	2356	=over 4
4239	2357
4240	2358	=item *
4241	2359
4242	2360	=begin original
4243	2361
4244		Malformed UTF-8
	2362	You can enable automatic UTF-8-ification of your standard file
	2363	handles, default C<open()> layer, and C<@ARGV> by using either
	2364	the C<-C> command line switch or the C<PERL_UNICODE> environment
	2365	variable, see L<perlrun> for the documentation of the C<-C> switch.
4245	2366
4246	2367	=end original
4247	2368
4248		~~不正な~~ ~~UTF-8~~
	2369	標準ファイルハンドル、デフォルトの C<open()> 層、C<@ARGV> の
	2370	自動的な UTF-8 化を、C<-C> コマンドラインスイッチか
	2371	環境変数 C<PERL_UNICODE> によって有効にできます。
	2372	C<-C> スイッチについての説明は L<perlrun> を参照してください。
4249	2373
4250		=begin original
4251
4252		UTF-8 is very structured, so many combinations of bytes are invalid. In
4253		the past, Perl tried to soldier on and make some sense of invalid
4254		combinations, but this can lead to security holes, so now, if the Perl
4255		core needs to process an invalid combination, it will either raise a
4256		fatal error, or will replace those bytes by the sequence that forms the
4257		Unicode REPLACEMENT CHARACTER, for which purpose Unicode created it.
4258
4259		=end original
4260
4261		UTF-8 は非常に構造化されているので、バイトの組み合わせの多くは不正です。
4262		以前は、Perl はこれと戦って、不正な組み合わせに意味を持たせようとしましたが、
4263		これはセキュリティホールを引き起こすことがあったので、
4264		今では、Perl コアが不正な組み合わせを処理する必要があると、
4265		致命的エラーが発生するか、それらのバイト列を、Unicode がこのために作った
4266		Unicode の REPLACEMENT CHARACTER を形成する並びに置き換えます。
4267
4268		=begin original
4269
4270		Every code point can be represented by more than one possible
4271		syntactically valid UTF-8 sequence. Early on, both Unicode and Perl
4272		considered any of these to be valid, but now, all sequences longer
4273		than the shortest possible one are considered to be malformed.
4274
4275		=end original
4276
4277		全ての符号位置は、文法的に正当な複数の UTF-8 並びで
4278		表現できます。
4279		以前は、Unicode と Perl はこれらのいずれも正当であると考えていましたが、
4280		今は、可能な最短のものより長い並びは全て不正と考えられます。
4281
4282		=begin original
4283
4284		Unicode considers many code points to be illegal, or to be avoided.
4285		Perl generally accepts them, once they have passed through any input
4286		filters that may try to exclude them. These have been discussed above
4287		(see "Surrogates" under UTF-16 in L</Unicode Encodings>,
4288		L</Noncharacter code points>, and L</Beyond Unicode code points>).
4289
4290		=end original
4291
4292		Unicode は多くの符号位置を不正または避けるべきと考えています。
4293		Perl は、一旦これらを除外しようとする入力フィルタを通過したものは、
4294		一般的にこれらを受け入れます。
4295		これらは先に議論しています
4296		(L</Unicode Encodings> の UTF-16 の「サロゲート」("Surrogates"),
4297		L</Noncharacter code points>, L</Beyond Unicode code points> を
4298		参照してください)。
4299
4300	2374	=item *
4301	2375
4302	2376	=begin original
4303	2377
4304		Re~~gula~~r ~~exp~~res~~sion~~ ~~patte~~r~~n m~~a~~tching~~ may sur~~prise~~ you if yo~~u'r~~e not
	2378	Perl tries really hard to work both with Unicode and the old
4305		~~accus~~tomed to ~~Unico~~de. Starting in Pe~~rl 5.14~~, seve~~ral~~ ~~patt~~ern
	2379	byte-oriented world. Most often this is nice, but sometimes Perl's
4306		~~modifier~~s are a~~vai~~l~~able~~ to ~~con~~tro~~l th~~i~~s, c~~al~~led~~ the cha~~ract~~er set
	2380	straddling of the proverbial fence causes problems.
4307		modifiers. Details are given in L<perlre/Character set modifiers>.
4308	2381
4309	2382	=end original
4310	2383
4311		Unicode に慣れてい~~ないなら、正規表現パターンマッチングは~~
	2384	Perl は Unicode と古いバイト指向の世界の両方で働くために苦労しています。
4312		~~あなた~~を驚かせる~~かもしれません。~~
	2385	ほとんどの場合はうまくいきますが、ときには Perl が二股をかけていることが
4313		~~Perl 5.14 から、これ~~を制御す~~るためのいくつかのパターンマッチング修飾子が~~
	2386	問題を引き起こすこともあります。
4314		利用可能になりました; これは文字集合修飾子と呼ばれます。
4315		詳細は L<perlre/Character set modifiers> にあります。
4316	2387
4317	2388	=back
4318	2389
4319		=begin original
4320
4321		As discussed elsewhere, Perl has one foot (two hooves?) planted in
4322		each of two worlds: the old world of ASCII and single-byte locales, and
4323		the new world of Unicode, upgrading when necessary.
4324		If your legacy code does not explicitly use Unicode, no automatic
4325		switch-over to Unicode should happen.
4326
4327		=end original
4328
4329		すでに述べている通り、Perl は二つの世界のそれぞれに片方の足
4330		(二つのひづめ?) を突っ込んでいます: ASCII と単一バイトロケールの
4331		古い世界と、必要に応じて昇格する Unicode の新しい世界です。
4332		もしあなたの古いコードが明示的に Unicode を使っていないのなら、
4333		Unicode への切り替えが自動的になされることはありません。
4334
4335		=head2 Unicode in Perl on EBCDIC
4336
4337		(EBCDIC 上の Perl での Unicode)
4338
4339		=begin original
4340
4341		Unicode is supported on EBCDIC platforms. See L<perlebcdic>.
4342
4343		=end original
4344
4345		Unicode は EBCDIC プラットフォームで対応しています。
4346		L<perlebcdic> を参照してください。
4347
4348		=begin original
4349
4350		Unless ASCII vs. EBCDIC issues are specifically being discussed,
4351		references to UTF-8 encoding in this document and elsewhere should be
4352		read as meaning UTF-EBCDIC on EBCDIC platforms.
4353		See L<perlebcdic/Unicode and UTF>.
4354
4355		=end original
4356
4357		特に ASCII 対 EBCDIC 問題について
4358		議論されている場合でない限り、
4359		EBCDIC プラットフォームでは、
4360		この文書やその他での
4361		UTF-8 エンコーディングへの言及は、
4362		UTF-EBCDIC を意味するものとして読むべきです。
4363		L<perlebcdic/Unicode and UTF> を参照してください。
4364
4365		=begin original
4366
4367		Because UTF-EBCDIC is so similar to UTF-8, the differences are mostly
4368		hidden from you; S<C<use utf8>> (and NOT something like
4369		S<C<use utfebcdic>>) declares the script is in the platform's
4370		"native" 8-bit encoding of Unicode. (Similarly for the C<":utf8">
4371		layer.)
4372
4373		=end original
4374
4375		UTF-EBCDIC は UTF-8 にとても似ているので、違いはほとんど隠されています;
4376		S<C<use utf8>> (そして S<C<use utfebcdic>> のようなものでは「ありません」) は
4377		スクリプトがそのプラットフォームの「ネイティブな」Unicode の 8 ビット
4378		エンコーディングであることを宣言します。
4379		(C<":utf8"> 層も同様です。)
4380
4381		=head2 Locales
4382
4383		(ロケール)
4384
4385		=begin original
4386
4387		See L<perllocale/Unicode and UTF-8>
4388
4389		=end original
4390
4391		L<perllocale/Unicode and UTF-8> を参照してください。
4392
4393	2390	=head2 When Unicode Does Not Happen
4394	2391
4395	2392	(Unicode ではない場合)
4396	2393
4397	2394	=begin original
4398	2395
4399		There are still many ~~place~~s ~~where Unic~~ode (in ~~some e~~ncoding or
	2396	While Perl does have extensive ways to input and output in Unicode,
4400		another) ~~could b~~e given as ~~argum~~ents or ~~rece~~i~~ved~~ as re~~sults,~~ ~~or both~~ in
	2397	and few other 'entry points' like the @ARGV which can be interpreted
4401		~~Perl, but it i~~s not, in spite ~~of Pe~~rl having e~~xten~~s~~ive~~ w~~ays~~ to in~~put~~ ~~and~~
	2398	as Unicode (UTF-8), there still are many places where Unicode (in some
4402		o~~utput~~ in ~~Unic~~o~~de,~~ an~~d a few~~ other "e~~ntry~~ points" like ~~the C<@ARGV>~~
	2399	encoding or another) could be given as arguments or received as
4403		array ~~(which can s~~omet~~imes~~ be int~~erpreted~~ as ~~UTF-8)~~.
	2400	results, or both, but it is not.
4404	2401
4405	2402	=end original
4406	2403
4407		Perl には入出力を Unicode で行うための~~幅広い~~方法があり、
	2404	Perl には入出力を Unicode で行うための多数の方法があり、
4408		C<@ARGV> 配列のように ~~(時々~~ UTF-8 として解釈できるような)その他の
	2405	@ARGV のように Unicode (UTF-8) として解釈できるようなその他の
4409		「エントリポイント」がい~~くつかあるにも関わらず~~、
	2406	「エントリポイント」はほとんどない一方、(何らかのエンコーディングで)
4410		(何ら~~かのエンコーディングで)~~
	2407	Unicode が引数として与えられたり結果として返されるべきにも関わらず、
4411		~~Unicode が引数とし~~て~~与えられたり結果として返されたり、ある~~いは
	2408	そうなっていない場所も未だ多くあります。
4412		両方であるべきにも関わらず、そうなっていない場所も未だ多くあります。
4413	2409
4414	2410	=begin original
4415	2411
4416		The following are such interfaces. Also, see ~~L</The "Un~~icode ~~Bug">.~~
	2412	The following are such interfaces. For all of these interfaces Perl
4417		For all of these int~~erf~~aces Perl
	2413	currently (as of 5.8.3) simply assumes byte strings both as arguments
4418		currentl~~y (a~~s of ~~v5.16.0)~~ s~~imply assumes by~~t~~e st~~rings both as arguments
	2414	and results, or UTF-8 strings if the C<encoding> pragma has been used.
4419		and results, or UTF-8 strings if the (deprecated) C<encoding> pragma has been used.
4420	2415
4421	2416	=end original
4422	2417
4423	2418	以下に挙げるのはそのようなインターフェースです。
4424		~~また、L</The~~ ~~"Unicod~~e ~~Bug"> を参照してください。~~
	2419	これらすべてが現在の Perl(5.8.3) では単純に引数と戻り値の両方が
4425		これ~~らすべ~~て~~が現在の~~ ~~Perl(v5.16.0)~~ で~~は単純に引数と戻り値の両方が~~
	2420	バイト文字列か、C<encoding> プラグマが使われていれば UTF-8 文字列で
4426		~~バイト文字列か、(廃止予~~定~~の) C<encoding> プラグマが使われ~~ていれば
	2421	あると仮定しています。
4427		UTF-8 文字列であると仮定しています。
4428	2422
4429	2423	=begin original
4430	2424
4431		One reason that Perl does not attempt to resolve the role of Unicode in
	2425	One reason why Perl does not attempt to resolve the role of Unicode in
4432		these ~~situ~~a~~tion~~s is that the answers are highly dependent on the operating
	2426	this cases is that the answers are highly dependent on the operating
4433	2427	system and the file system(s). For example, whether filenames can be
4434		in Unicode and in exactly what kind of encoding, is not exactly a
	2428	in Unicode, and in exactly what kind of encoding, is not exactly a
4435		portable concept. Similarly for C<qx> and C<system>: how well will the
	2429	portable concept. Similarly for the qx and system: how well will the
4436		"command-line interface" (and which of them?) handle Unicode?
	2430	'command line interface' (and which of them?) handle Unicode?
4437	2431
4438	2432	=end original
4439	2433
4440		このような状況において、Perl が Unicode による解決を~~しないのかの~~
	2434	このようなケースにおいて、Perl がなぜ Unicode による解決を
4441		理由の一つは、答えがオペレーティングシステムや
	2435	しないのかの理由の一つは、答えがオペレーティングシステムや
4442	2436	ファイルシステムに強く依存しているからです。
4443	2437	たとえば、ファイル名に Unicode が使えるかどうか、そして正確には
4444	2438	どのエンコーディングなのかは、移植性のある概念とは言えません。
4445		同様なことが C<qx> や C<system> にも言えます:
	2439	同様なことが qx や system にも言えます:
4446	2440	「コマンドラインインターフェース」は Unicode をどのように
4447	2441	扱うのでしょうか?
4448	2442
4449	2443	=over 4
4450	2444
4451	2445	=item *
4452	2446
4453		C<chdir>, C<chmod>, C<chown>, C<chroot>, C<exec>, C<link>, C<lstat>, C<mkdir>,
	2447	chdir, chmod, chown, chroot, exec, link, lstat, mkdir,
4454		C<rename>, C<rmdir>, C<stat>, C<symlink>, C<truncate>, C<unlink>, C<utime>, C<-X>
	2448	rename, rmdir, stat, symlink, truncate, unlink, utime, -X
4455	2449
4456	2450	=item *
4457	2451
4458		C<%ENV>
	2452	%ENV
4459	2453
4460	2454	=item *
4461	2455
4462	2456	=begin original
4463	2457
4464		C<glob> (aka the C<~~E<lt>~~*~~E<gt>~~>)
	2458	glob (aka the <*>)
4465	2459
4466	2460	=end original
4467	2461
4468		C<glob> (または C<~~E<lt>~~*~~E<gt>~~>)
	2462	glob (または <*>)
4469	2463
4470	2464	=item *
4471	2465
4472		C<open>, C<opendir>, C<sysopen>
	2466	open, opendir, sysopen
4473	2467
4474	2468	=item *
4475	2469
4476	2470	=begin original
4477	2471
4478		C<qx> (aka the backtick operator), C<system>
	2472	qx (aka the backtick operator), system
4479	2473
4480	2474	=end original
4481	2475
4482		C<qx> (または逆クォート演算子), C<system>
	2476	qx (または逆クォート演算子), system
4483	2477
4484	2478	=item *
4485	2479
4486		C<readdir>, C<readlink>
	2480	readdir, readlink
4487	2481
4488	2482	=back
4489	2483
4490		=head2 ~~The~~ "Unicode Bug"
	2484	=head2 Forcing Unicode in Perl (Or Unforcing Unicode in Perl)
4491	2485
4492		(「Unicode ~~バグ」~~)
	2486	(Unicode を Perl に強制する (あるいは Unicode でないことを Perl に強制する))
4493	2487
4494	2488	=begin original
4495	2489
4496		The term, "Unicode ~~bug" ha~~s ~~been~~ applie~~d to a~~n ~~inconsis~~te~~ncy~~ ~~with th~~e
	2490	Sometimes (see L</"When Unicode Does Not Happen">) there are
4497		~~code point~~s in t~~he C<L~~atin-1 Supplement> block, that ~~is,~~ between
	2491	situations where you simply need to force Perl to believe that a byte
4498		~~128 a~~nd ~~255. W~~i~~thout~~ a locale spe~~cified,~~ ~~unlik~~e all other cha~~racter~~s or
	2492	string is UTF-8, or vice versa. The low-level calls
4499		code ~~poin~~t~~s, th~~es~~e charac~~ters can ~~have ve~~ry d~~iff~~e~~ren~~t s~~eman~~tics
	2493	utf8::upgrade($bytestring) and utf8::downgrade($utf8string) are
4500		~~depending on~~ the ~~rules in effect. (Ch~~a~~racter~~s w~~hos~~e ~~code points a~~re
	2494	the answers.
4501		above 255 force Unicode rules; whereas the rules for ASCII characters
4502		are the same under both ASCII and Unicode rules.)
4503	2495
4504	2496	=end original
4505	2497
4506		「Unicode ~~バグ」("Unic~~ode ~~bug"~~)~~という用語は~~、
	2498	ときとして(L</When Unicode Does Not Happen> を参照)、Perl にバイト列を
4507		~~C<Latin~~-~~1 Supplement> ブロック、つまり 12~~8 ~~から 255 に~~ある~~符号位置~~の
	2499	UTF-8 であるように強制したりその逆を行う場合があるかもしれません。
4508		~~非一貫性に対~~し~~て使われます。~~
	2500	低レベルの呼び出し utf8::upgrade($bytestring) と
4509		その~~他の文字や符号位置とは異なり、これらの文字は~~
	2501	utf8::downgrade($utf8string) がその回答です。
4510		有効な規則によってとても異なったセマンティクスです。
4511		(255 を超える符号位置の文字は Unicode の規則が強制されます;
4512		一方 ASCII 文字のための規則は、ASCII と Unicode の規則で同じです。)
4513	2502
4514	2503	=begin original
4515	2504
4516		~~Under~~ Unicode rules, these u~~pper-La~~t~~in1~~ char~~act~~ers ~~are~~ inter~~preted~~ as
	2505	Do not use them without careful thought, though: Perl may easily get
4517		~~Unicod~~e code ~~poi~~nts, ~~which~~ means they have the ~~same sema~~n~~tics~~ a~~s La~~t~~in-1~~
	2506	very confused, angry, or even crash, if you suddenly change the 'nature'
4518		~~(ISO-8859-1)~~ and C1 controls.
	2507	of scalar like that. Especially careful you have to be if you use the
	2508	utf8::upgrade(): any random byte string is not valid UTF-8.
4519	2509
4520	2510	=end original
4521	2511
4522		~~Unicode の規則の下では~~、これら~~の上位の Latin1 文字は Unicode 符号位置~~として
	2512	しかし、これらを使うときには十分注意しなければなりません: あなたが突然
4523		~~解釈され、L~~at~~in-1 (ISO-8859-1~~) ~~および C1 制御文字と~~
	2513	スカラの'性質'(nature)をこのように変えたりしたら、Perl は簡単に混乱し、
4524		~~同じセマンティ~~ク~~スを持ち~~ます。
	2514	怒り、クラッシュしてしまいます。
	2515	utf8::upgrade() を使うときには特に注意が必要です: 任意のランダムな
	2516	バイト列は正当な UTF-8 ではありません。
4525	2517
4526		=begin original
	2518	=head2 Using Unicode in XS
4527	2519
4528		~~As explained in L</A~~S~~CII~~ ~~Rules~~ ~~versus~~ Unicode ~~Rules>, under ASCII rules,~~
	2520	(XS で Unicode を使う)
4529		they are considered to be unassigned characters.
4530	2521
4531		=end original
4532
4533		L</ASCII Rules versus Unicode Rules> で説明されているように、
4534		ASCII の規則では、これらは未割り当て文字と見なされます。
4535
4536	2522	=begin original
4537	2523
4538		~~This~~ can ~~lead~~ to unexpected re~~sults.~~ ~~For~~ ex~~ampl~~e, a ~~str~~in~~g's~~
	2524	If you want to handle Perl Unicode in XS extensions, you may find the
4539		~~semant~~i~~cs ca~~n suddenly ~~chang~~e if a code point above ~~255 is~~ a~~ppe~~n~~ded to~~
	2525	following C APIs useful. See also L<perlguts/"Unicode Support"> for an
4540		it, which cha~~nges~~ the rules from ~~ASCII t~~o ~~Unicod~~e. A~~s an~~
	2526	explanation about Unicode at the XS level, and L<perlapi> for the API
4541		~~example, consi~~der t~~he following progr~~a~~m and~~ its ~~output:~~
	2527	details.
4542	2528
4543	2529	=end original
4544	2530
4545		~~これによ~~り~~、予期しな~~い~~結果が生じ~~る~~可能性があります。~~
	2531	Perl の Unicode を XS 拡張で取り扱いたいと思うのなら、以下に挙げる
4546		~~たとえば、255~~ ~~を超えるコードポイント~~が~~文字列に追加さ~~れ~~た場合、~~
	2532	API 群が便利かも知れません。
4547		~~文字列のセマンティクスが突然変更され、規則が A~~S~~CII~~ から Unicode に
	2533	XS レベルでの Unicode に関しての説明は L<perlguts/"Unicode Support"> を、
4548		変更さ~~れる可能性があります~~。
	2534	API の詳細については L<perlapi> を参照してください。
4549		例として、次のプログラムとその出力を考えてみます:
4550	2535
4551		~~$ p~~erl ~~-le'~~
	2536	=over 4
4552		no feature "unicode_strings";
4553		$s1 = "\xC2";
4554		$s2 = "\x{2660}";
4555		for ($s1, $s2, $s1.$s2) {
4556		print /\w/ \|\| 0;
4557		}
4558		'
4559		0
4560		0
4561		1
4562	2537
4563		=~~beg~~in ~~original~~
	2538	=item *
4564	2539
4565		If there's no C<\w> in C<s1> nor in C<s2>, why does their concatenation
4566		have one?
4567
4568		=end original
4569
4570		C<s1> にも C<s2> にも C<\w> がないのに、なぜこれらを結合したものにはあるのでしょう?
4571
4572	2540	=begin original
4573	2541
4574		This an~~omaly~~ stems f~~rom~~ ~~Perl's a~~tte~~mpt~~ to ~~not~~ dis~~turb~~ o~~lder~~ ~~progr~~ams that
	2542	C<DO_UTF8(sv)> returns true if the C<UTF8> flag is on and the bytes
4575		~~didn't~~ use Unico~~de,~~ ~~alo~~ng with ~~Perl's~~ desire t~~o add U~~n~~icod~~e s~~uppor~~t
	2543	pragma is not in effect. C<SvUTF8(sv)> returns true is the C<UTF8>
4576		~~seam~~les~~sly.~~ But the res~~ult~~ tur~~ned~~ ~~out~~ to no~~t b~~e ~~seamless~~. ~~(By t~~he way,
	2544	flag is on; the bytes pragma is ignored. The C<UTF8> flag being on
4577		you can choose to be warned ~~whe~~n th~~ing~~s ~~lik~~e this happe~~n. See~~
	2545	does B<not> mean that there are any characters of code points greater
4578		~~C<L<e~~ncodin~~g::w~~arnings~~>>.)~~
	2546	than 255 (or 127) in the scalar or that there are even any characters
	2547	in the scalar. What the C<UTF8> flag means is that the sequence of
	2548	octets in the representation of the scalar is the sequence of UTF-8
	2549	encoded code points of the characters of a string. The C<UTF8> flag
	2550	being off means that each octet in this representation encodes a
	2551	single character with code point 0..255 within the string. Perl's
	2552	Unicode model is not to use UTF-8 until it is absolutely necessary.
4579	2553
4580	2554	=end original
4581	2555
4582		~~この異常~~は、U~~nicode~~ を~~使用していない、~~
	2556	C<DO_UTF8(sv)> は C<UTF8> フラグがオンでバイトプラグマが効果を
4583		古い~~プログラムを妨害し~~ないようにし~~ようという Perl の試みと、~~
	2557	もっていないときに真を返します。
4584		U~~nicode~~ ~~対応をシームレスに追加しようとする~~ ~~Perl~~ の
	2558	C<SvUTF8(sv)> は C<UTF8> がオンのとき、バイトプラグマの状態には
4585		~~願望によるもので~~す。
	2559	関係なく真を返します。
4586		~~しかしその結果~~は~~シームレ~~ス~~になりません~~でした。
	2560	C<UTF8> フラグはスカラの中で 255(もしくは127)を超える符号位置の文字が
4587		(と~~ころで、このよ~~うなこと~~が起きたときに警告されるようにでき~~ます。
	2561	あるということを I<意味しません>。
4588		C<~~L<encoding::warnings~~>> ~~を参照してください。)~~
	2562	C<UTF8> フラグの意味するところは、スカラ中のそのオクテットの並びが
	2563	文字列としてUTF-8でエンコードされた符号位置の並びだということです。
	2564	C<UTF8> フラグがオフであるということは文字列の中のエンコードされた
	2565	文字が 0..255 の範囲でエンコードされたオクテットであることを意味します。
	2566	Perl の Unicode モデルは本当に必要となるまで UTF-8 を使用しません。
4589	2567
	2568	=item *
	2569
4590	2570	=begin original
4591	2571
4592		~~L<S<~~C<u~~se fea~~ture 'unicode~~_st~~r~~ings'>>\|fe~~ature~~/The~~ ~~'uni~~code~~_str~~in~~gs'~~ ~~fea~~t~~ure>~~
	2572	C<uvuni_to_utf8(buf, chr)> writes a Unicode character code point into
4593		was ~~add~~e~~d, sta~~rting in ~~Perl~~ ~~v5.12~~, to address this pro~~blem. I~~t ~~aff~~e~~cts~~
	2573	a buffer encoding the code point as UTF-8, and returns a pointer
4594		these th~~ing~~s:
	2574	pointing after the UTF-8 bytes.
4595	2575
4596	2576	=end original
4597	2577
4598		Perl ~~v5.12~~ ~~から、こ~~の~~問題に対応に対応するために~~
	2578	C<uvuni_to_utf8(buf, chr)> は Unicode の文字符号位置を UTF-8 で
4599		~~L<S<C<use feature 'unicode_strings'>>\|feature/The 'unicode_strings' feature>~~
	2579	エンコードされた符号位置としてバッファに書き込みます。
4600		~~が追加されま~~した。
	2580	そして、その UTF-8 バイトの後を指し示すポインタを返します。
4601		これは以下のような影響があります:
4602	2581
4603		=over 4
4604
4605	2582	=item *
4606	2583
4607	2584	=begin original
4608	2585
4609		Changing the case of ~~a s~~c~~alar,~~ t~~hat i~~s, ~~using~~ ~~C<uc()>,~~ C<ucfir~~st()>,~~ ~~C<lc()>,~~
	2586	C<utf8_to_uvuni(buf, lenp)> reads UTF-8 encoded bytes from a buffer and
4610		and ~~C<l~~cfirst~~()>,~~ or ~~C<\L>,~~ ~~C<\U>, C<\u>~~ and ~~C<\l>~~ in ~~doub~~le~~-quo~~tish
	2587	returns the Unicode character code point and, optionally, the length of
4611		~~con~~te~~xts,~~ ~~such~~ as regu~~lar~~ e~~xpressio~~n ~~substitutions~~.
	2588	the UTF-8 byte sequence.
4612	2589
4613	2590	=end original
4614	2591
4615		~~スカラの大文字小文字を変える; つまり、~~C<u~~c()>, C<uc~~f~~irs~~t()>, C<lc()>,
	2592	C<utf8_to_uvuni(buf, lenp)> はバッファから UTF-8 エンコードされたバイトを
4616		~~C<lcf~~i~~rst()>~~ ~~を使ったり、正規表現~~置換の~~ようなダブルクォート風~~
	2593	読み出し、Unicode の文字符号位置と、オプションでその
4617		コンテキストの~~中で C<\L>, C<\U>, C<\u>, C<\l>~~ を使う。
	2594	UTF-8 バイトシーケンスの長さを返します。
4618	2595
	2596	=item *
	2597
4619	2598	=begin original
4620	2599
4621		~~Under~~ C<u~~nicode_s~~trings~~> s~~tart~~ing~~ in Perl ~~5.12.0,~~ Unicode rule~~s a~~re
	2600	C<utf8_length(start, end)> returns the length of the UTF-8 encoded buffer
4622		genera~~lly us~~ed. ~~See L~~<~~per~~lfun~~c/lc~~> for det~~ail~~s on how this wo~~rks~~
	2601	in characters. C<sv_len_utf8(sv)> returns the length of the UTF-8 encoded
4623		in c~~ombin~~a~~tion with v~~ar~~ious other pragmas~~.
	2602	scalar.
4624	2603
4625	2604	=end original
4626	2605
4627		~~Perl 5.12.0 からの~~ C<u~~nicode~~_strings> では~~、一般的に~~
	2606	C<utf8_length(start, end)> は UTF-8 エンコードされたバッファの長さを
4628		~~Unicode の規則が使われ~~ます。
	2607	文字で返します。
4629		これがさま~~ざまなプラグマと組み合わせて動作~~す~~る方法の~~
	2608	C<sv_len_utf8(sv)> は UTF-8 エンコードされたスカラの長さを返します。
4630		詳細については、L<perlfunc/lc> を参照してください。
4631	2609
4632	2610	=item *
4633	2611
4634	2612	=begin original
4635	2613
4636		Using case~~less~~ (~~C</i>~~) re~~gula~~r e~~xpre~~ssion matchi~~ng.~~
	2614	C<sv_utf8_upgrade(sv)> converts the string of the scalar to its UTF-8
	2615	encoded form. C<sv_utf8_downgrade(sv)> does the opposite, if
	2616	possible. C<sv_utf8_encode(sv)> is like sv_utf8_upgrade except that
	2617	it does not set the C<UTF8> flag. C<sv_utf8_decode()> does the
	2618	opposite of C<sv_utf8_encode()>. Note that none of these are to be
	2619	used as general-purpose encoding or decoding interfaces: C<use Encode>
	2620	for that. C<sv_utf8_upgrade()> is affected by the encoding pragma
	2621	but C<sv_utf8_downgrade()> is not (since the encoding pragma is
	2622	designed to be a one-way street).
4637	2623
4638	2624	=end original
4639	2625
4640		大文字~~小文字~~を~~無視した~~ ~~(C</i>)~~ ~~正規表現マッチ~~ン~~グを使う。~~
	2626	C<sv_utf8_upgrade(sv)> はスカラの文字列をその UTF-8 エンコードされた
	2627	形式に変換します。
	2628	C<sv_utf8_downgrade(sv)> は(可能であれば)その反対の動作をします。
	2629	C<sv_utf8_encode(sv)> は C<sv_utf8_upgrade> に似ていますが、
	2630	C<UTF8> フラグをセットしない点が異なります。
	2631	C<sv_utf8_decode()> は C<sv_utf8_encode()> の逆を行います。
	2632	これらはいずれも、汎用のエンコードやデコードのインターフェースとして
	2633	使うべきものではないことに注意してください: それには C<use Encode> を使います。
	2634	C<sv_utf8_upgrade()> はエンコーディングプラグマに影響を受けますが、
	2635	C<sv_utf8_downgrade()> はそうではありません(なぜならエンコーディング
	2636	プラグマは一方通行にデザインされているからです)。
4641	2637
	2638	=item *
	2639
4642	2640	=begin original
4643	2641
4644		Starting in Perl ~~5.14.0, r~~e~~gula~~r exp~~ressi~~ons co~~mpi~~led ~~within~~
	2642	C<is_utf8_char(s)> returns true if the pointer points to a valid UTF-8
4645		th~~e s~~c~~ope of C<unicode_s~~t~~rings> us~~e ~~Unicode~~ r~~ules~~
	2643	character.
4646		even when executed or compiled into larger
4647		regular expressions outside the scope.
4648	2644
4649	2645	=end original
4650	2646
4651		~~Perl 5.14.0 から、~~C<uni~~code_~~str~~ing~~s> ~~のスコープ内でコンパ~~イ~~ルされた~~
	2647	C<is_utf8_char(s)> はポインタが正しい UTF-8 文字を指し示しているときに
4652		~~正規表現は、スコープの外で実行されたり、~~
	2648	真を返します。
4653		より大きな正規表現の中にコンパイルされたりした場合でも、
4654		Unicode の規則を使います。
4655	2649
4656	2650	=item *
4657	2651
4658	2652	=begin original
4659	2653
4660		Matching ~~any o~~f ~~severa~~l properties in ~~regu~~lar e~~xpre~~ssio~~ns.~~
	2654	C<is_utf8_string(buf, len)> returns true if C<len> bytes of the buffer
	2655	are valid UTF-8.
4661	2656
4662	2657	=end original
4663	2658
4664		正~~規表現中に~~い~~くつかの特性を使う。~~
	2659	C<is_utf8_string(buf, len)> はバッファの C<len> バイトが正しい
	2660	UTF-8 文字であるときに真を返します。
4665	2661
	2662	=item *
	2663
4666	2664	=begin original
4667	2665
4668		~~These properties are~~ C<\b> (without braces), ~~C<\B> (w~~ith~~out~~ ~~bra~~ce~~s),~~
	2666	C<UTF8SKIP(buf)> will return the number of bytes in the UTF-8 encoded
4669		~~C<\s>,~~ ~~C<\S>,~~ ~~C<\w>,~~ C<\W>, ~~and a~~ll the ~~Posix charact~~er ~~class~~es
	2667	character in the buffer. C<UNISKIP(chr)> will return the number of bytes
4670		I<except> ~~C<[[:~~asci~~i:]]>~~.
	2668	required to UTF-8-encode the Unicode character code point. C<UTF8SKIP()>
	2669	is useful for example for iterating over the characters of a UTF-8
	2670	encoded buffer; C<UNISKIP()> is useful, for example, in computing
	2671	the size required for a UTF-8 encoded buffer.
4671	2672
4672	2673	=end original
4673	2674
4674		~~その特性は、(大かっこなしの)~~ C<\b>, ~~(大かっこなし~~の) ~~C<\B>,~~
	2675	C<UTF8SKIP(buf)> はバッファの中にある UTF-8 エンコードされた文字の
4675		~~C<\s>, C<\S>, C<\w>, C<\W> および、~~
	2676	バイト数を返します。
4676		C<~~[[:as~~c~~ii:]]~~> ~~I<以外の>~~ ~~Pos~~ix 文字~~クラスで~~す。
	2677	C<UNISKIP(chr)> は UTF-8 エンコードする Unicode 文字の符号位置が要求する
	2678	バイト数を返します。
	2679	C<UTF8SKIP()> は UTF-8 エンコードされたバッファの文字に対して繰り返しを
	2680	行うような例に便利です。
	2681	C<UNISKIP()> はたとえば、UTF-8 エンコードされたバッファの要求する大きさを
	2682	計算するのに便利です。
4677	2683
	2684	=item *
	2685
4678	2686	=begin original
4679	2687
4680		St~~art~~ing in ~~Per~~l ~~5.14.0, r~~egular e~~xpre~~ssions c~~ompil~~ed within
	2688	C<utf8_distance(a, b)> will tell the distance in characters between the
4681		t~~he sc~~ope o~~f C<un~~i~~cod~~e_strings> use Unicode rules
	2689	two pointers pointing to the same UTF-8 encoded buffer.
4682		even when executed or compiled into larger
4683		regular expressions outside the scope.
4684	2690
4685	2691	=end original
4686	2692
4687		~~Perl 5.14.0 から、~~C<u~~nicode~~_strings> のスコー~~プ内でコンパイル~~された
	2693	C<utf8_distance(a, b)> は同じ UTF-8 エンコードされたバッファをさす
4688		~~正規表現は、スコープ~~の~~外で実行されたり、~~
	2694	二つのポインタの間の文字単位の距離を返します。
4689		より大きな正規表現の中にコンパイルされたりした場合でも、
4690		Unicode の規則を使います。
4691	2695
4692	2696	=item *
4693	2697
4694	2698	=begin original
4695	2699
4696		In C<quotemeta> or its inline equ~~ival~~e~~nt C<\Q>.~~
	2700	C<utf8_hop(s, off)> will return a pointer to a UTF-8 encoded buffer
	2701	that is C<off> (positive or negative) Unicode characters displaced
	2702	from the UTF-8 buffer C<s>. Be careful not to overstep the buffer:
	2703	C<utf8_hop()> will merrily run off the end or the beginning of the
	2704	buffer if told to do so.
4697	2705
4698	2706	=end original
4699	2707
4700		C<quot~~emeta~~> や、~~インラインの等価物~~である C<\Q> ~~の中。~~
	2708	C<utf8_hop(s, off)> は、UTF-8 バッファ C<s> から Unicode で C<off> 文字分
	2709	(正数でも負数でも) 移動した UTF-8 エンコーディングバッファへの
	2710	ポインタを返します。
	2711	バッファを超えないように注意してください: C<utf8_hop()> は、そう
	2712	指示されれば何も気にせずにバッファの先頭や末尾を踏み越えます。
4701	2713
	2714	=item *
	2715
4702	2716	=begin original
4703	2717
4704		~~Starti~~ng i~~n Per~~l ~~5.16.0~~, ~~con~~s~~ist~~ent ~~quot~~ing rules a~~re u~~sed ~~withi~~n ~~the~~
	2718	C<pv_uni_display(dsv, spv, len, pvlim, flags)> and
4705		~~scope of~~ C<uni~~code~~_~~str~~ings>, as described ~~in L<p~~erlfun~~c/quo~~te~~meta>.~~
	2719	C<sv_uni_display(dsv, ssv, pvlim, flags)> are useful for debugging the
4706		~~Pri~~or to t~~hat,~~ or ~~outs~~ide its sc~~ope,~~ no code ~~poin~~ts ~~abov~~e ~~127~~ are quoted
	2720	output of Unicode strings and scalars. By default they are useful
4707		in ~~UTF-8 enc~~ode~~d str~~ing~~s, bu~~t i~~n b~~yte e~~nco~~de~~d str~~i~~ngs,~~ code ~~points~~
	2721	only for debugging--they display B<all> characters as hexadecimal code
4708		betw~~een~~ ~~128-255 ar~~e always ~~quoted.~~
	2722	points--but with the flags C<UNI_DISPLAY_ISPRINT>,
	2723	C<UNI_DISPLAY_BACKSLASH>, and C<UNI_DISPLAY_QQ> you can make the
	2724	output more readable.
4709	2725
4710	2726	=end original
4711	2727
4712		~~Perl 5.16.0 から、L~~<p~~erlf~~un~~c/quot~~emeta> ~~で記述されているように、~~
	2728	C<pv_uni_display(dsv, spv, len, pvlim, flags)> と
4713		C<uni~~code~~_strings> のス~~コープ~~の~~中では、~~
	2729	C<sv_uni_display(dsv, ssv, pvlim, flags)> は Unicode の文字列やスカラの
4714		~~一貫したクォート規則が使われま~~す。
	2730	出力をデバッグするのに便利です。
4715		~~それ以前~~で~~あったり、スコープ外~~の~~場合、~~
	2731	デフォルトではデバッグのみに便利です -- B<すべての> 文字を
4716		~~UTF-8 エンコードされた文字列では~~ 128 ~~を超える~~符号位置~~の文字は~~
	2732	16 進の符号位置として表示します -- しかし C<UNI_DISPLAY_ISPRINT>,
4717		~~クォートされな~~いが、
	2733	C<UNI_DISPLAY_BACKSLASH>, C<UNI_DISPLAY_QQ> というフラグを
4718		~~バイトエンコードされた文字列では、128-255 の符号位置は常にクォートされ~~る。
	2734	与えることによって、出力を読みやすくできます。
4719	2735
4720	2736	=item *
4721	2737
4722	2738	=begin original
4723	2739
4724		In the ~~C<..>~~ or ~~L<range\|~~perl~~op/R~~ange Ope~~rators>~~ operator.
	2740	C<ibcmp_utf8(s1, pe1, l1, u1, s2, pe2, l2, u2)> can be used to
	2741	compare two strings case-insensitively in Unicode. For case-sensitive
	2742	comparisons you can just use C<memEQ()> and C<memNE()> as usual.
4725	2743
4726	2744	=end original
4727	2745
4728		C<~~..>~~ ~~という L<範囲\|~~perl~~op/Range~~ Ope~~rators~~> ~~演算子の中。~~
	2746	C<ibcmp_utf8(s1, pe1, l1, u1, s2, pe2, l2, u2)> は Unicode に
	2747	おいて大小文字を無視した文字列比較に使うことができます。
	2748	大小文字を意識した比較には通常どおり C<memEQ()> や C<memNE()> を
	2749	使うことができます。
4729	2750
	2751	=back
	2752
4730	2753	=begin original
4731	2754
4732		~~Sta~~r~~ting~~ ~~in Pe~~r~~l 5.26.0, th~~e range operato~~r o~~n stri~~ngs~~ ~~tre~~ats t~~heir~~ lengths
	2755	For more information, see L<perlapi>, and F<utf8.c> and F<utf8.h>
4733		~~cons~~i~~ste~~n~~tly~~ within the scope ~~of C<uni~~code_stri~~ngs>. Pr~~io~~r to that, or~~
	2756	in the Perl source code distribution.
4734		outside its scope, it could produce strings whose length in characters
4735		exceeded that of the right-hand side, where the right-hand side took up more
4736		bytes than the correct range endpoint.
4737	2757
4738	2758	=end original
4739	2759
4740		Perl ~~5.26.0 から~~、~~文字列へ~~の~~範囲演算子は、~~
	2760	もっと詳しい情報は、L<perlapi> と、Perl のソースコード配布の
4741		C<unic~~ode_s~~t~~rings~~> ~~のスコープ内で、その長さ~~を~~一貫性を持っ~~て扱います。
	2761	F<utf8.c> と F<utf8.h> を参照してください。
4742		それより前、あるいはスコープの外では、
4743		右側が正しい範囲の端点より多くのバイト数を必要とする場所では、
4744		右側の長さを超えた長さの文字を生成することがあります。
4745	2762
4746		=item *
	2763	=head1 BUGS
4747	2764
4748		=be~~gin~~ original
	2765	=head2 Interaction with Locales
4749	2766
4750		~~In L<< C<split>'s special-case whitespace splitting\|perlfunc/split >>.~~
	2767	(ロケールとの相互作用)
4751	2768
4752		=end original
4753
4754		L<< C<split> の空白分割の特殊処理\|perlfunc/split >> の中。
4755
4756	2769	=begin original
4757	2770
4758		~~Starting in P~~erl ~~5.28.0,~~ the C<split> function with a pat~~tern~~ ~~specifie~~d as
	2771	Use of locales with Unicode data may lead to odd results. Currently,
4759		a st~~ring~~ conta~~ining~~ ~~a s~~i~~ngle~~ ~~spa~~c~~e h~~andles whit~~espace~~ characters ~~cons~~isten~~tly~~
	2772	Perl attempts to attach 8-bit locale info to characters in the range
4760		within the scope ~~of C<un~~icode_str~~ings>.~~ Prior to ~~that,~~ or o~~utsid~~e its ~~scope,~~
	2773	0..255, but this technique is demonstrably incorrect for locales that
4761		characters that are white~~space~~ a~~ccor~~ding to Unicode rules ~~but not according to~~
	2774	use characters above that range when mapped into Unicode. Perl's
4762		~~ASCII rul~~es were t~~reated~~ ~~as f~~ield contents rat~~her~~ ~~tha~~n ~~fie~~ld se~~parat~~ors when
	2775	Unicode support will also tend to run slower. Use of locales with
4763		~~they appear i~~n ~~byte-en~~coded strings.
	2776	Unicode is discouraged.
4764	2777
4765	2778	=end original
4766	2779
4767		Perl ~~5.28.0 から、単一の空白からなる文字列をパ~~ターンとして
	2780	Unicode データと共にロケールを使うことはおかしな結果を
4768		~~指定され~~た ~~C<split> 関数は、C<unicode_strings> のスコープ内~~では
	2781	もたらすことになりやすいです。
4769		空白文字を~~一貫性を持って扱います。~~
	2782	現在のところ、Perl は文字に 0..255 の範囲の 8 ビットロケールを
4770		~~それよ~~り~~前、ある~~い~~はスコープ~~の外では、
	2783	割り当てようとしていますが、このテクニックは Unicode に
4771		~~Unicode~~ の~~規則では空白だけれども ASCII~~ の~~規則ではそうではない~~文字は、
	2784	マップしようとしたときに先の範囲の文字を使用するロケールに対して
4772		~~バイトエンコードされた文字列~~に~~現れた場合、~~
	2785	明らかに正しくありません。
4773		フィー~~ルド区切りで~~は~~なくフィールドの内容として扱われてい~~ました。
	2786	Perl の Unicode サポートはまた、遅くなりがちです。
	2787	Unicode といっしょにロケールを使うことはお勧めできません。
4774	2788
4775		=back
	2789	=head2 Interaction with Extensions
4776	2790
	2791	(エクステンションとの相互作用)
	2792
4777	2793	=begin original
4778	2794
4779		~~You ca~~n se~~e f~~rom the a~~bove~~ that the e~~ffec~~t ~~of C<unicod~~e_strings>
	2795	When Perl exchanges data with an extension, the extension should be
4780		~~incre~~ased o~~ver~~ several Perl relea~~ses.~~ (And ~~Perl's~~ ~~supp~~ort for ~~Unicod~~e
	2796	able to understand the UTF8 flag and act accordingly. If the
4781		~~con~~t~~inu~~es to ~~impr~~ove~~; it'~~s ~~bes~~t to use the latest ~~ava~~ilable release in
	2797	extension doesn't know about the flag, it's likely that the extension
4782		order t~~o get the most complete a~~nd accur~~ate~~ re~~sul~~t~~s possib~~le.) ~~Note th~~at
	2798	will return incorrectly-flagged data.
4783		C<unicode_strings> is automatically chosen if you S<C<use v5.12>> or
4784		higher.
4785	2799
4786	2800	=end original
4787	2801
4788		~~前述のところから、C<unicode_strings> の効果は~~ Perl ~~のリリ~~ース~~が進むにつれて~~
	2802	Perl がエクステンションとデータをやり取りするとき、そのエクステンションは
4789		拡大し~~ていることが分かり~~ます。
	2803	UTF8 フラグを理解し、また、それに従った振る舞いをすべきです。
4790		~~(そして Perl~~ の ~~Unicode 対応は改良し続け~~られ~~ています;~~
	2804	エクステンションがこのフラグについて何も知らなければ、そのエクステンションは
4791		~~最大限に完全で~~正確な~~結果を得る~~た~~めには、利用可能な最新のリリ~~ースを
	2805	正しくないフラグがついたデータを返す可能性があります。
4792		使うのが最良です。)
4793		S<C<use v5.12>> 以上を使うと、C<unicode_strings> は自動的に選択されることに
4794		注意してください。
4795	2806
4796	2807	=begin original
4797	2808
4798		For ~~Perls~~ earlier ~~tha~~n th~~ose~~ ~~descr~~ibed a~~bove~~, or when a string ~~is passed~~
	2809	So if you're working with Unicode data, consult the documentation of
4799		to a fu~~nct~~ion ~~outs~~ide the ~~scop~~e ~~of C<u~~ni~~code_~~s~~tring~~s~~>, s~~ee the next ~~sec~~t~~ion.~~
	2810	every module you're using if there are any issues with Unicode data
	2811	exchange. If the documentation does not talk about Unicode at all,
	2812	suspect the worst and probably look at the source to learn how the
	2813	module is implemented. Modules written completely in Perl shouldn't
	2814	cause problems. Modules that directly or indirectly access code written
	2815	in other programming languages are at risk.
4800	2816
4801	2817	=end original
4802	2818
4803		~~前述し~~たも~~のより古い~~ Perl の~~場合や~~、~~文字列が~~ ~~C<u~~nicode~~_strings>~~ の
	2819	そのため、もし Unicode データを扱おうというのであれば、 Unicode データの
4804		~~スコープ外の~~関数から~~渡された場合、次~~の節を~~参照してください。~~
	2820	交換に関して何らかの記述があるのなら使うモジュールすべてのドキュメントを
	2821	調べてください。
	2822	ドキュメントが Unicode に関して何の言及もしていないのなら、最悪のケースを
	2823	考慮し、そしてそのモジュールがどのように実装されているかを知るために
	2824	ソースを見ることになるかもしれません。
	2825	完全に Perl で書かれたモジュールは問題を引き起こさないはずです。
	2826	他のプログラミング言語で書かれたコードに直接または間接にアクセスする
	2827	モジュールにはリスクがあります。
4805	2828
4806		=head2 Forcing Unicode in Perl (Or Unforcing Unicode in Perl)
4807
4808		(Unicode を Perl に強制する (あるいは Unicode でないことを Perl に強制する))
4809
4810	2829	=begin original
4811	2830
4812		Sometimes (see ~~L</"W~~hen Uni~~cod~~e ~~Doe~~s Not Happe~~n">~~ or ~~L</The "Unic~~ode Bu~~g">)~~
	2831	For affected functions, the simple strategy to avoid data corruption is
4813		t~~here~~ a~~re situ~~a~~tion~~s where you ~~simply~~ need to ~~for~~ce a ~~byte~~
	2832	to always make the encoding of the exchanged data explicit. Choose an
4814		~~str~~ing into ~~UTF-8,~~ or ~~vic~~e versa. Th~~e st~~and~~ard modu~~le ~~L<Enc~~ode> can be
	2833	encoding that you know the extension can handle. Convert arguments passed
4815		~~used f~~or this, or the lo~~w-le~~vel ca~~lls~~
	2834	to the extensions to that encoding and convert results back from that
4816		~~L<C<utf8::upgrad~~e~~($bytestr~~ing~~)>\|utf8/Ut~~ility functions> and
	2835	encoding. Write wrapper functions that do the conversions for you, so
4817		~~L<C<utf8::d~~owngra~~de($u~~t~~f8st~~ring[, ~~FAIL_OK])>\|u~~t~~f8/Utility~~ functions>.
	2836	you can later change the functions when the extension catches up.
4818	2837
4819	2838	=end original
4820	2839
4821		~~ときとして~~(~~L</When Unico~~d~~e Does No~~t Happen~~> を参照~~)~~、バイト列~~を
	2840	影響を受けた関数のための、データの劣化(data corruption)を防ぐ単純な
4822		~~UTF-8 であ~~る~~ように強制したりそ~~の逆を~~行う場合があ~~る~~かもしれません~~。
	2841	戦略とは、交換するデータのエンコーディングを常に明確にするということです。
4823		~~標準モジュ~~ー~~ル L<Encode> や、~~
	2842	エクステンションが取り扱うことができると知っているエンコーディングを
4824		~~低レベルの呼び出~~し
	2843	選択してください。
4825		~~L<C<utf8::upgrade($bytestring)>\|utf8/Utility functions> と~~
	2844	エクステンションに渡す引数を選択したエンコーディングに変換し、
4826		~~L<C<utf8::downgrade($utf8string[, FAIL_OK])>\|utf8/Utility functions> が~~
	2845	エクステンションから返ってきた結果をそのエンコーディングから
4827		~~このため~~に使えます。
	2846	逆方向に変換します。
	2847	変換を行ってくれるラッパ関数を書いておいて、
	2848	エクステンションが追いついた時に関数を変更できるようにしておきます。
4828	2849
4829	2850	=begin original
4830	2851
4831		Note ~~that C<utf8::downg~~rade~~()>~~ can fail if the str~~ing~~ contains cha~~ract~~ers
	2852	To provide an example, let's say the popular Foo::Bar::escape_html
4832		t~~hat~~ don't fit into a byte.
	2853	function doesn't deal with Unicode data yet. The wrapper function
	2854	would convert the argument to raw UTF-8 and convert the result back to
	2855	Perl's internal representation like so:
4833	2856
4834	2857	=end original
4835	2858
4836		~~C<utf8::dow~~n~~gra~~de~~()>~~ ~~は、バイト~~に~~収まら~~ない~~文字を含む文字列の場合は~~
	2859	例として、まだ Unicode データを取り扱うようにはできていない、
4837		~~失敗することがあること~~に~~注意してくださ~~い。
	2860	有名な Foo::Bar::escape_html について述べましょう。
	2861	ラッパ関数は引数を生の UTF-8 に変換し、結果を Perl の内部表現に
	2862	逆変換します:
4838	2863
	2864	sub my_escape_html ($) {
	2865	my($what) = shift;
	2866	return unless defined $what;
	2867	Encode::decode_utf8(Foo::Bar::escape_html(Encode::encode_utf8($what)));
	2868	}
	2869
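The wrapper pattern above can be tried without the (fictional) C<Foo::Bar> module. The following is a minimal sketch, assuming a hypothetical byte-oriented stand-in named C<legacy_escape_html>; that name and its behavior are illustrative only and are not part of any real extension.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for an extension function that only
# understands octets and escapes a few ASCII characters.
sub legacy_escape_html {
    my ($octets) = @_;
    my %map = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    $octets =~ s/([&<>])/$map{$1}/g;
    return $octets;
}

# Wrapper: encode characters to UTF-8 octets on the way in,
# decode the returned UTF-8 octets back to characters on the way out.
sub my_escape_html {
    my ($what) = @_;
    return unless defined $what;
    my $octets = Encode::encode('UTF-8', $what);
    return Encode::decode('UTF-8', legacy_escape_html($octets));
}

# A string containing one Latin-1 character and one wide character.
my $escaped = my_escape_html("caf\x{e9} <\x{263a}>");
```

Because the conversion is confined to the wrapper, only this one function would need to change if the extension later became Unicode-aware.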
4839	2870	=begin original
4840	2871
4841		~~Call~~ing either ~~func~~tion o~~n a~~ s~~tri~~ng that a~~lready~~ ~~is in~~ the des~~ired~~ state is a
	2872	Sometimes, when the extension does not convert data but just stores
4842		no-op.
	2873	and retrieves them, you will be in a position to use the otherwise
	2874	dangerous Encode::_utf8_on() function. Let's say the popular
	2875	C<Foo::Bar> extension, written in C, provides a C<param> method that
	2876	lets you store and retrieve data according to these prototypes:
4843	2877
4844	2878	=end original
4845	2879
4846		~~既に望み通りの状態に~~なってい~~る文字列に対してこ~~れ~~らの関数を呼び~~出しても、
	2880	エクステンションがデータを変換しないけれども格納したり取り出したりするときに、
4847		何も~~起こりません。~~
	2881	本来なら危険な Encode::_utf8_on() 関数を使ってもよい場合が
	2882	あります。
	2883	C で書かれていて、データを以下のプロトタイプに従って格納したり
	2884	取り出したりする C<param> メソッドを持っている
	2885	有名な C<Foo::Bar> エクステンションについて述べてみましょう:
4848	2886
	2887	$self->param($name, $value); # set a scalar
	2888	$value = $self->param($name); # retrieve a scalar
	2889
4849	2890	=begin original
4850	2891
4851		~~L</ASC~~II ~~Rules~~ versus Unicode ~~Rul~~es> gives ~~all~~ the ways ~~that a str~~ing is
	2892	If it does not yet provide support for any encoding, one could write a
4852		made ~~to u~~se Unic~~ode~~ rules.
	2893	derived class with such a C<param> method:
4853	2894
4854	2895	=end original
4855	2896
4856		~~L</ASCII Rules versus Unicode Rules> は、Unicode~~ の~~規則を使う文字列が~~
	2897	どのエンコーディングもまだサポートしていないのなら、
4857		~~作られる全て~~の方法を~~提供します。~~
	2898	以下のような C<param> メソッドを持った派生クラスを
	2899	記述することができるでしょう:
4858	2900
4859		~~=head2~~ ~~Using~~ ~~Unicode~~ in XS
	2901	sub param {
	2902	my($self,$name,$value) = @_;
	2903	utf8::upgrade($name); # make sure it is UTF-8 encoded
	2904	if (defined $value) {
	2905	utf8::upgrade($value); # make sure it is UTF-8 encoded
	2906	return $self->SUPER::param($name,$value);
	2907	} else {
	2908	my $ret = $self->SUPER::param($name);
	2909	Encode::_utf8_on($ret); # we know, it is UTF-8 encoded
	2910	return $ret;
	2911	}
	2912	}
4860	2913
4861		(XS で Unicode を使う)
4862
4863	2914	=begin original
4864	2915
4865		See ~~L<p~~e~~rlgu~~t~~s/"U~~nicode Suppor~~t">~~ for an intr~~oduct~~ion to ~~Uni~~c~~ode~~ at
	2916	Some extensions provide filters on data entry/exit points, such as
4866		the XS level, and ~~L<perl~~api~~/Unic~~ode ~~Supp~~ort> for the ~~API deta~~ils.
	2917	DB_File::filter_store_key and family. Look out for such filters in
	2918	the documentation of your extensions, they can make the transition to
	2919	Unicode data much easier.
4867	2920
4868	2921	=end original
4869	2922
4870		~~XS レベル~~の ~~Unicode の紹介について~~は ~~L<perlguts~~/~~"Unicode Support">~~ を、
	2923	一部のエクステンションはデータのエントリ/脱出ポイントでフィルターを
4871		~~API の詳細については L<perlapi/Unicode Support> を参照~~して~~くださ~~い。
	2924	提供しています。
	2925	たとえば DB_File::filter_store_key とその仲間です。
	2926	あなたが使うエクステンションのドキュメントで、そのようなフィルターを
	2927	探してみてください。
	2928	それらは Unicode データへの移行をずっと容易にします。
4872	2929
4873		=head2 ~~Hacking P~~e~~rl to work on~~ e~~arlier Unico~~d~~e versions (for very serious hackers only)~~
	2930	=head2 Speed
4874	2931
4875		(~~以前の Unicode のバージョンで動作させるように Perl をハックする (とても真剣なハッカー専用)~~)
	2932	(速度)
4876	2933
4877	2934	=begin original
4878	2935
4879		Perl ~~by de~~fault comes ~~with the l~~atest s~~upp~~orted Unicode ~~ver~~s~~ion buil~~t-in, but
	2936	Some functions are slower when working on UTF-8 encoded strings than
4880		the goal is to allow you to cha~~nge~~ to ~~use a~~ny e~~arli~~er o~~ne.~~ ~~In P~~erls
	2937	on byte encoded strings. All functions that need to hop over
4881		~~v5.20 and v5.22,~~ however, the earliest usab~~le ver~~sion i~~s U~~n~~ico~~de ~~5.1.~~
	2938	characters such as length(), substr() or index(), or matching regular
4882		Perl ~~v5.18~~ and ~~v5.24~~ are ~~abl~~e to handle all earlie~~r versions.~~
	2939	expressions can work B<much> faster when the underlying data are
	2940	byte-encoded.
4883	2941
4884	2942	=end original
4885	2943
4886		~~Perl はデフォルトでは最新~~の U~~nicode~~ ~~バージョ~~ン~~が組み込ま~~れて~~いますが、~~
	2944	一部の関数は UTF-8 でエンコードされた文字列に対して適用したときにバイト
4887		~~目標は、より古いもの~~に~~変更できるように~~することです。
	2945	エンコードされた文字列に対するときよりも遅くなります。
4888		し~~かし、P~~er~~l v5.20 と v5.22 は~~、~~利用可能~~なもっと~~早いバージョンは~~
	2946	文字に対して働く必要のある length()、substr()、index()のような関数のすべてと
4889		~~Unicode 5.1 です。~~
	2947	正規表現マッチングは、データが
4890		~~Perl v5.18~~ と ~~v5.24~~ で~~は、それ以前の全てのバージョンが利用可能で~~す。
	2948	バイトエンコードされているときには B<かなり> 速く動作できます。
4891	2949
4892	2950	=begin original
4893	2951
4894		~~Dow~~n~~load~~ the files in the desired ~~ver~~s~~ion~~ of Uni~~cod~~e from ~~the Un~~i~~code~~ web
	2952	In Perl 5.8.0 the slowness was often quite spectacular; in Perl 5.8.1
4895		~~site~~ L<h~~ttps://www.un~~i~~code.or~~g~~>).~~ These should replace the ~~exi~~s~~ting fi~~les in
	2953	a caching scheme was introduced which will hopefully make the slowness
4896		~~F<lib/unic~~ore~~> in t~~he ~~Per~~l source tree. Fo~~llow~~ the ~~inst~~ructions in
	2954	somewhat less spectacular, at least for some operations. In general,
4897		~~F<README.~~perl> in that ~~dir~~ectory t~~o cha~~nge s~~ome~~ o~~f th~~eir ~~name~~s, and the~~n bui~~ld
	2955	operations with UTF-8 encoded strings are still slower. As an example,
4898		perl (see L<IN~~STALL~~>).
	2956	the Unicode properties (character classes) like C<\p{Nd}> are known to
	2957	be quite a bit slower (5-20 times) than their simpler counterparts
	2958	like C<\d> (then again, there are 268 Unicode characters matching C<Nd>
	2959	compared with the 10 ASCII characters matching C<d>).
4899	2960
4900	2961	=end original
4901	2962
4902		~~Unicod~~e ~~の Web サイト L<https://www~~.~~unicode~~.~~org>~~ ~~から、~~目的の ~~Unicode~~
	2963	Perl 5.8.0 ではこの遅さはしばしば目立つものでした。
4903		~~バージョン~~の~~ファイル~~を~~ダウンロードしま~~す。
	2964	Perl 5.8.1 では少なくとも一部の操作については、遅さを改善することを
4904		~~これらのファイルは、Perl ソー~~スツリー~~の F<l~~i~~b/u~~nicore~~> の既存のファイルを~~
	2965	期待するキャッシングスキーム(caching scheme)が導入されました。
4905		~~置き換え~~る~~必要があり~~ます。
	2966	一般的には、UTF-8 エンコードされた文字列に対する操作はまだ遅いものです。
4906		~~一部の名前を変~~え~~るには~~、~~そのディレクトリにある F~~<~~README.~~p~~erl~~> の~~指示に従って、~~
	2967	たとえば、C<\p{Nd}> のような Unicode の特性(文字クラス)は対応する
4907		~~perl~~ ~~をビルドし~~て~~くださ~~い (~~L<INSTALL>~~ 参照)。
	2968	C<\d> のような単純なものよりも目立って遅い(5 倍から 20 倍)ことが
	2969	知られています(とはいえ、C<d> は 10 の ASCII 文字に
	2970	マッチするのに対して C<Nd> は 268 の Unicode 文字にマッチします)。
4908	2971
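The difference in coverage between C<\p{Nd}> and an ASCII-only digit class, mentioned above as one reason for the speed gap, can be illustrated with a small sketch. The variable names are illustrative, and the example makes no claim about timing.

```perl
use strict;
use warnings;

# U+0663 is ARABIC-INDIC DIGIT THREE, a decimal digit (Nd) outside ASCII.
my $arabic_three = "\x{0663}";

# \p{Nd} accepts any Unicode decimal digit; [0-9] accepts only ASCII digits.
my $nd_matches_arabic    = $arabic_three =~ /\A\p{Nd}\z/ ? 1 : 0;
my $ascii_matches_arabic = $arabic_three =~ /\A[0-9]\z/  ? 1 : 0;
my $nd_matches_ascii     = "7"           =~ /\A\p{Nd}\z/ ? 1 : 0;
```

When only ASCII digits are expected in the input, the narrower class states that intent directly.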
4909	2972	=head2 Porting code from perl-5.6.X
4910	2973
4911	2974	(perl 5.6.X からコードを移植する)
4912	2975
4913	2976	=begin original
4914	2977
4915		Perls ~~starting in~~ 5.8 have a different Unicode model from 5.6. In 5.6 the
	2978	Perl 5.8 has a different Unicode model from 5.6. In 5.6 the programmer
4916		~~programmer~~ was required to use the C<utf8> pragma to declare that a
	2979	was required to use the C<utf8> pragma to declare that a given scope
4917		~~giv~~e~~n scope e~~xpected to deal with Unicode data and had to make sure that
	2980	expected to deal with Unicode data and had to make sure that only
4918		~~only~~ Unicode data were reaching that scope. If you have code that is
	2981	Unicode data were reaching that scope. If you have code that is
4919	2982	working with 5.6, you will need some of the following adjustments to
4920		your code. The examples are written such that the code will continue to
	2983	your code. The examples are written such that the code will continue
4921		work under 5.6, so you should be safe to try them out.
	2984	to work under 5.6, so you should be safe to try them out.
4922	2985
4923	2986	=end original
4924	2987
4925		Perl 5.8 からは 5.6 とは異なる Unicode モデルを持っています。
	2988	Perl 5.8 は 5.6 とは異なる Unicode モデルを持っています。
4926	2989	5.6 ではプログラマは、ある与えられたスコープが Unicode データを
4927	2990	取り扱うのと Unicode データだけがそのスコープにあることを宣言するのに
4928	2991	C<utf8> プラグマの使用を要求されていました。
4929	2992	5.6 で動いていたプログラムを持っているのなら、以下に挙げる微調整を施す
4930	2993	必要があるでしょう。
4931	2994	例は 5.6 でも動くように書かれているので、安心して試すことができます。
4932	2995
4933		=over 3
	2996	=over 4
4934	2997
4935	2998	=item *
4936	2999
4937	3000	=begin original
4938	3001
4939	3002	A filehandle that should read or write UTF-8
4940	3003
4941	3004	=end original
4942	3005
4943	3006	UTF-8 で読み書きすべきファイルハンドル
4944	3007
4945		if ($] > 5.008) {
	3008	if ($] > 5.007) {
4946		binmode $fh, ":encoding(~~UTF-~~8)";
	3009	binmode $fh, ":encoding(utf8)";
4947	3010	}
4948	3011
4949	3012	=item *
4950	3013
4951	3014	=begin original
4952	3015
4953	3016	A scalar that is going to be passed to some extension
4954	3017
4955	3018	=end original
4956	3019
4957	3020	何らかのエクステンションに渡そうとするスカラ
4958	3021
4959	3022	=begin original
4960	3023
4961		Be it C<Compress::Zlib>, C<Apache::Request> or any extension that has no
	3024	Be it Compress::Zlib, Apache::Request or any extension that has no
4962	3025	mention of Unicode in the manpage, you need to make sure that the
4963	3026	UTF8 flag is stripped off. Note that at the time of this writing
4964		(~~Janua~~ry 2012) the mentioned modules are not UTF-8-aware. Please
	3027	(October 2002) the mentioned modules are not UTF-8-aware. Please
4965	3028	check the documentation to verify if this is still true.
4966	3029
4967	3030	=end original
4968	3031
4969		C<Compress::Zlib~~>, C<~~Apache::Request> などの、マニュアルページに Unicode に
	3032	Compress::Zlib、Apache::Request などの、マニュアルページに Unicode に
4970	3033	関する記載がない何らかのエクステンションでは、UTF8 フラグを確実に
4971	3034	オフにする必要があります。
4972		これを書いている時点(2012 年 1 月)では、上記のモジュールは
	3035	これを書いている時点(2002 年 10 月)では、上記のモジュールは
4973	3036	UTF-8 対応でないことに注意してください。
4974	3037	これが今でも正しいかどうかは、ドキュメントをチェックして確かめてください。
4975	3038
4976		if ($] > 5.008) {
	3039	if ($] > 5.007) {
4977	3040	require Encode;
4978		$val = Encode::encode~~("UTF-~~8", $val); # make octets
	3041	$val = Encode::encode_utf8($val); # make octets
4979	3042	}
4980	3043
4981	3044	=item *
4982	3045
4983	3046	=begin original
4984	3047
4985	3048	A scalar we got back from an extension
4986	3049
4987	3050	=end original
4988	3051
4989	3052	エクステンションから返ってきたスカラ
4990	3053
4991	3054	=begin original
4992	3055
4993	3056	If you believe the scalar comes back as UTF-8, you will most likely
4994	3057	want the UTF8 flag restored:
4995	3058
4996	3059	=end original
4997	3060
4998	3061	そのスカラが UTF-8 として返ってきたものだと信じているのなら、
4999	3062	UTF-8 フラグをリストアしたいと考えるでしょう:
5000	3063
5001		if ($] > 5.008) {
	3064	if ($] > 5.007) {
5002	3065	require Encode;
5003		$val = Encode::decode~~("UTF-~~8", $val);
	3066	$val = Encode::decode_utf8($val);
5004	3067	}
5005	3068
5006	3069	=item *
5007	3070
5008	3071	=begin original
5009	3072
5010	3073	Same thing, if you are really sure it is UTF-8
5011	3074
5012	3075	=end original
5013	3076
5014	3077	同様に、UTF-8 だと確信しているのなら
5015	3078
5016		if ($] > 5.008) {
	3079	if ($] > 5.007) {
5017	3080	require Encode;
5018	3081	Encode::_utf8_on($val);
5019	3082	}
5020	3083
5021	3084	=item *
5022	3085
5023	3086	=begin original
5024	3087
5025		A wrapper for ~~L<DBI> C<~~fetchrow_array> and C<fetchrow_hashref>
	3088	A wrapper for fetchrow_array and fetchrow_hashref
5026	3089
5027	3090	=end original
5028	3091
5029		~~L<DBI> の C<~~fetchrow_array> と C<fetchrow_hashref> へのラッパ
	3092	fetchrow_array と fetchrow_hashref へのラッパ
5030	3093
5031	3094	=begin original
5032	3095
5033	3096	When the database contains only UTF-8, a wrapper function or method is
5034		a convenient way to replace all your C<fetchrow_array> and
	3097	a convenient way to replace all your fetchrow_array and
5035		C<fetchrow_hashref> calls. A wrapper function will also make it easier to
	3098	fetchrow_hashref calls. A wrapper function will also make it easier to
5036	3099	adapt to future enhancements in your database driver. Note that at the
5037		time of this writing (~~Janua~~ry 2012), the DBI has no standardized way
	3100	time of this writing (October 2002), the DBI has no standardized way
5038		to deal with UTF-8 data. Please check the ~~L<DBI~~ documentation~~\|DBI>~~ to verify if
	3101	to deal with UTF-8 data. Please check the documentation to verify if
5039	3102	that is still true.
5040	3103
5041	3104	=end original
5042	3105
5043	3106	データベースが UTF-8 のみから構成されているとき、ラッパ関数や
5044		ラッパメソッドはあなたの C<fetchrow_array> や C<fetchrow_hashref> の
	3107	ラッパメソッドはあなたの fetchrow_array や fetchrow_hashref の呼び出しを
5045		~~呼び出しを~~置き換えるのに便利な方法でしょう。
	3108	置き換えるのに便利な方法でしょう。
5046	3109	ラッパ関数はまた、あなたの使っているデータベースドライバが
5047	3110	将来拡張されたときに適応しやすくするでしょう。
5048		このドキュメントを書いている時点(2012 年 1 月)では、DBI は UTF-8 のデータを
	3111	このドキュメントを書いている時点(2002 年 10 月)では、DBI は UTF-8 のデータを
5049	3112	扱う標準的な方法を持っていません。
5050		これが今でも正しいかどうかは、~~L<DBI の文書\|DBI>~~ をチェックして確かめてください。
	3113	これが今でも正しいかどうかは、ドキュメントをチェックして確かめてください。
5051	3114
5052	3115	sub fetchrow {
5053		# $what is one of fetchrow_{array,hashref}
	3116	my($self, $sth, $what) = @_; # $what is one of fetchrow_{array,hashref}
5054		~~my($sel~~f, $~~sth,~~ ~~$what~~) ~~= @_;~~
	3117	if ($] < 5.007) {
5055		if ($] < 5.008) {
5056	3118	return $sth->$what;
5057	3119	} else {
5058	3120	require Encode;
5059	3121	if (wantarray) {
5060	3122	my @arr = $sth->$what;
5061	3123	for (@arr) {
5062	3124	defined && /[^\000-\177]/ && Encode::_utf8_on($_);
5063	3125	}
5064	3126	return @arr;
5065	3127	} else {
5066	3128	my $ret = $sth->$what;
5067	3129	if (ref $ret) {
5068	3130	for my $k (keys %$ret) {
5069		defined
	3131	defined && /[^\000-\177]/ && Encode::_utf8_on($_) for $ret->{$k};
5070		&& /[^\000-\177]/
5071		&& Encode::_utf8_on($_) for $ret->{$k};
5072	3132	}
5073	3133	return $ret;
5074	3134	} else {
5075	3135	defined && /[^\000-\177]/ && Encode::_utf8_on($_) for $ret;
5076	3136	return $ret;
5077	3137	}
5078	3138	}
5079	3139	}
5080	3140	}
5081	3141
5082	3142	=item *
5083	3143
5084	3144	=begin original
5085	3145
5086	3146	A large scalar that you know can only contain ASCII
5087	3147
5088	3148	=end original
5089	3149
5090	3150	ASCII だけが含まれていると分かっている大きなスカラ
5091	3151
5092	3152	=begin original
5093	3153
5094	3154	Scalars that contain only ASCII and are marked as UTF-8 are sometimes
5095	3155	a drag to your program. If you recognize such a situation, just remove
5096	3156	the UTF8 flag:
5097	3157
5098	3158	=end original
5099	3159
5100	3160	ASCII だけから構成されているのに UTF8 として印付けされているスカラが
5101	3161	あなたのプログラムの足を引っ張ることがあります。
5102	3162	そのような状況に気付いたら、単に UTF8 フラグを取り除いてください:
5103	3163
5104		utf8::downgrade($val) if $] > 5.008;
	3164	utf8::downgrade($val) if $] > 5.007;
5105	3165
5106	3166	=back
5107	3167
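The flag handling used in the items above can be observed directly with the core C<utf8::> utility functions. This is a minimal sketch; the variable names are illustrative, and the second argument to C<utf8::downgrade> is its documented FAIL_OK parameter.

```perl
use strict;
use warnings;

# Every code point here fits in a byte, so both internal forms are possible.
my $s = "caf\x{e9}";

utf8::upgrade($s);                     # switch to the UTF-8 internal form
my $flag_after_upgrade = utf8::is_utf8($s) ? 1 : 0;
my $len_after_upgrade  = length $s;    # still counted in characters

utf8::downgrade($s);                   # back to the byte internal form
my $flag_after_downgrade = utf8::is_utf8($s) ? 1 : 0;

# A character above 0xFF cannot be downgraded; with a true FAIL_OK
# argument, utf8::downgrade returns false instead of dying.
my $wide = "\x{263a}";
my $downgraded_ok = utf8::downgrade($wide, 1) ? 1 : 0;
```

The string compares equal to its original value throughout; only the internal representation changes.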
5108		=head1 BUGS
5109
5110		=begin original
5111
5112		See also L</The "Unicode Bug"> above.
5113
5114		=end original
5115
5116		前述の L</The "Unicode Bug"> も参照してください。
5117
5118		=head2 Interaction with Extensions
5119
5120		(エクステンションとの相互作用)
5121
5122		=begin original
5123
5124		When Perl exchanges data with an extension, the extension should be
5125		able to understand the UTF8 flag and act accordingly. If the
5126		extension doesn't recognize that flag, it's likely that the extension
5127		will return incorrectly-flagged data.
5128
5129		=end original
5130
5131		Perl がエクステンションとデータをやり取りするとき、そのエクステンションは
5132		UTF8 フラグを理解し、また、それに従った振る舞いをすべきです。
5133		エクステンションがこのフラグを認識しない場合、そのエクステンションは
5134		正しくないフラグがついたデータを返す可能性があります。
5135
5136		=begin original
5137
5138		So if you're working with Unicode data, consult the documentation of
5139		every module you're using if there are any issues with Unicode data
5140		exchange. If the documentation does not talk about Unicode at all,
5141		suspect the worst and probably look at the source to learn how the
5142		module is implemented. Modules written completely in Perl shouldn't
5143		cause problems. Modules that directly or indirectly access code written
5144		in other programming languages are at risk.
5145
5146		=end original
5147
5148		そのため、もし Unicode データを扱おうというのであれば、 Unicode データの
5149		交換に関して何らかの記述があるのなら使うモジュールすべてのドキュメントを
5150		調べてください。
5151		ドキュメントが Unicode に関して何の言及もしていないのなら、最悪のケースを
5152		考慮し、そしてそのモジュールがどのように実装されているかを知るために
5153		ソースを見ることになるかもしれません。
5154		完全に Perl で書かれたモジュールは問題を引き起こさないはずです。
5155		他のプログラミング言語で書かれたコードに直接または間接にアクセスする
5156		モジュールにはリスクがあります。
5157
5158		=begin original
5159
5160		For affected functions, the simple strategy to avoid data corruption is
5161		to always make the encoding of the exchanged data explicit. Choose an
5162		encoding that you know the extension can handle. Convert arguments passed
5163		to the extensions to that encoding and convert results back from that
5164		encoding. Write wrapper functions that do the conversions for you, so
5165		you can later change the functions when the extension catches up.
5166
5167		=end original
5168
5169		影響を受けた関数のための、データの劣化(data corruption)を防ぐ単純な
5170		戦略とは、交換するデータのエンコーディングを常に明確にするということです。
5171		エクステンションが取り扱うことができると知っているエンコーディングを
5172		選択してください。
5173		エクステンションに渡す引数を選択したエンコーディングに変換し、
5174		エクステンションから返ってきた結果をそのエンコーディングから
5175		逆方向に変換します。
5176		変換を行ってくれるラッパ関数を書いておいて、
5177		エクステンションが追いついた時に関数を変更できるようにしておきます。
5178
5179		=begin original
5180
5181		To provide an example, let's say the popular C<Foo::Bar::escape_html>
5182		function doesn't deal with Unicode data yet. The wrapper function
5183		would convert the argument to raw UTF-8 and convert the result back to
5184		Perl's internal representation like so:
5185
5186		=end original
5187
5188		例として、まだ Unicode データを取り扱うようにはできていない、
5189		有名な C<Foo::Bar::escape_html> について述べましょう。
5190		ラッパ関数は引数を生の UTF-8 に変換し、結果を Perl の内部表現に
5191		逆変換します:
5192
5193		sub my_escape_html ($) {
5194		my($what) = shift;
5195		return unless defined $what;
5196		Encode::decode("UTF-8", Foo::Bar::escape_html(
5197		Encode::encode("UTF-8", $what)));
5198		}
5199
5200		=begin original
5201
5202		Sometimes, when the extension does not convert data but just stores
5203		and retrieves it, you will be able to use the otherwise
5204		dangerous L<C<Encode::_utf8_on()>\|Encode/_utf8_on> function. Let's say
5205		the popular C<Foo::Bar> extension, written in C, provides a C<param>
5206		method that lets you store and retrieve data according to these prototypes:
5207
5208		=end original
5209
5210		エクステンションがデータを変換しないけれども格納したり取り出したりするときに、
5211		本来なら危険な L<C<Encode::_utf8_on()>\|Encode/_utf8_on> 関数を
5212		使える場合があります。
5213		C で書かれていて、データを以下のプロトタイプに従って格納したり
5214		取り出したりする C<param> メソッドを持っている
5215		有名な C<Foo::Bar> エクステンションについて述べてみましょう:
5216
5217		$self->param($name, $value); # set a scalar
5218		$value = $self->param($name); # retrieve a scalar
5219
5220		=begin original
5221
5222		If it does not yet provide support for any encoding, one could write a
5223		derived class with such a C<param> method:
5224
5225		=end original
5226
5227		どのエンコーディングもまだサポートしていないのなら、
5228		以下のような C<param> メソッドを持った派生クラスを
5229		記述することができるでしょう:
5230
5231		sub param {
5232		my($self,$name,$value) = @_;
5233		utf8::upgrade($name); # make sure it is UTF-8 encoded
5234		if (defined $value) {
5235		utf8::upgrade($value); # make sure it is UTF-8 encoded
5236		return $self->SUPER::param($name,$value);
5237		} else {
5238		my $ret = $self->SUPER::param($name);
5239		Encode::_utf8_on($ret); # we know, it is UTF-8 encoded
5240		return $ret;
5241		}
5242		}
5243
5244		=begin original
5245
5246		Some extensions provide filters on data entry/exit points, such as
5247		C<DB_File::filter_store_key> and family. Look out for such filters in
5248		the documentation of your extensions; they can make the transition to
5249		Unicode data much easier.
5250
5251		=end original
5252
5253		一部のエクステンションは、データの入口/出口にあたる箇所でフィルターを
5254		提供しています; たとえば C<DB_File::filter_store_key> とその仲間です。
5255		あなたが使うエクステンションのドキュメントでそのようなフィルターを
5256		探してみてください; それらは Unicode データへの移行をずっと容易にします。
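
A minimal sketch of such filters, assuming C<DB_File> is installed and built against Berkeley DB. It uses an in-memory database (an C<undef> file name) purely for illustration; the key and value are arbitrary sample data. The store filters encode characters to UTF-8 bytes, and the fetch filters decode them again.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use DB_File;
use Encode ();

# In-memory Berkeley DB hash (undef file name), used only for this example.
my %db;
my $handle = tie %db, 'DB_File', undef, O_RDWR | O_CREAT, 0666, $DB_HASH
    or die "Cannot create in-memory DB_File hash: $!";

# Each filter receives the datum in $_ and must modify $_ in place.
# Characters become UTF-8 bytes on the way in ...
$handle->filter_store_key  (sub { $_ = Encode::encode("UTF-8", $_) });
$handle->filter_store_value(sub { $_ = Encode::encode("UTF-8", $_) });
# ... and bytes become characters again on the way out.
$handle->filter_fetch_key  (sub { $_ = Encode::decode("UTF-8", $_) });
$handle->filter_fetch_value(sub { $_ = Encode::decode("UTF-8", $_) });

my $key   = "\x{65e5}\x{672c}\x{8a9e}";          # three CJK characters
my $value = "Unicode \x{30b5}\x{30dd}\x{30fc}\x{30c8}";

$db{$key} = $value;
my $roundtrip   = $db{$key};
my ($first_key) = keys %db;
```

With the filters installed, the rest of the program can treat keys and values as character strings, while the database only ever sees bytes.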
5257
5258		=head2 Speed
5259
5260		(速度)
5261
5262		=begin original
5263
5264		Some functions are slower when working on UTF-8 encoded strings than
5265		on byte encoded strings. All functions that need to hop over
5266		characters such as C<length()>, C<substr()> or C<index()>, or matching
5267		regular expressions can work B<much> faster when the underlying data are
5268		byte-encoded.
5269
5270		=end original
5271
5272		一部の関数は UTF-8 でエンコードされた文字列に対して適用したときにバイト
5273		エンコードされた文字列に対するときよりも遅くなります。
5274		文字を順にたどっていく必要のある C<length()>, C<substr()>, C<index()>
5275		のような関数のすべてと正規表現マッチングは、データが
5276		バイトエンコードされているときには B<かなり> 速く動作できます。
5277
5278		=begin original
5279
5280		In Perl 5.8.0 the slowness was often quite spectacular; in Perl 5.8.1
5281		a caching scheme was introduced which improved the situation. In general,
5282		operations with UTF-8 encoded strings are still slower. As an example,
5283		the Unicode properties (character classes) like C<\p{Nd}> are known to
5284		be quite a bit slower (5-20 times) than their simpler counterparts
5285		like C<[0-9]> (then again, there are hundreds of Unicode characters matching
5286		C<Nd> compared with the 10 ASCII characters matching C<[0-9]>).
5287
5288		=end original
5289
5290		Perl 5.8.0 ではこの遅さはしばしば目立つものでした; Perl 5.8.1 では
5291		状況を改善するキャッシュ機構が導入されました。
5292		一般的には、UTF-8 エンコードされた文字列に対する操作はまだ遅いものです。
5293		たとえば、C<\p{Nd}> のような Unicode の特性(文字クラス)は対応する
5294		C<[0-9]> のような単純なものよりもかなり遅い(5 倍から 20 倍)ことが
5295		知られています(とはいえ、C<[0-9]> にマッチする ASCII 文字が 10 個なのに対して、
5296		C<Nd> にマッチする Unicode 文字は数百あります)。
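
The difference can be measured locally with the core C<Benchmark> module. The following is only a sketch: the sample text and the helper names C<count_ascii> and C<count_unicode> are assumptions for illustration, and the ratio reported will depend heavily on the Perl version and platform, so it may not match the figures quoted above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# ASCII-only sample text, so both patterns match exactly the same characters.
my $text = join '', map { "item$_ " } 1 .. 2000;

# Count matches of the simple ASCII character class.
sub count_ascii {
    my $n = () = $text =~ /[0-9]/g;
    return $n;
}

# Count matches of the Unicode decimal-digit property.
sub count_unicode {
    my $n = () = $text =~ /\p{Nd}/g;
    return $n;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'ascii [0-9]'    => \&count_ascii,
    'unicode \p{Nd}' => \&count_unicode,
});
```

Both subs return the same count on this input; only the time taken differs.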
5297
5298	3168	=head1 SEE ALSO
5299	3169
5300		L<perlunitut>, L<perluniintro>, L<~~perluniprops>, L<~~Encode>, L<open>, L<utf8>, L<bytes>,
	3170	L<perlunitut>, L<perluniintro>, L<Encode>, L<open>, L<utf8>, L<bytes>,
5301		L<perlretut>, L<perlvar/"${^UNICODE}">,
	3171	L<perlretut>, L<perlvar/"${^UNICODE}">
5302		L<https://www.unicode.org/reports/tr44>.
5303	3172
5304		=cut
5305
5306	3173	=begin meta
5307	3174
5308	3175	Translate: KIMURA Koichi (-5.8.5)
5309		Update: ~~SHIRA~~K~~ATA K~~entaro <argrath@ub32.org> (5.10.0-)
	3176	Update: Kentaro Shirakata <argrath@ub32.org> (5.10.0-)
5310		Status: completed
5311	3177
5312	3178	=end meta
	3179
	3180	=cut
