perlrecharclass 5.30.0 と 5.26.1 の差分

1	1
2	2	=encoding euc-jp
3	3
4	4	=head1 NAME
5	5	X<character class>
6	6
7	7	=begin original
8	8
9	9	perlrecharclass - Perl Regular Expression Character Classes
10	10
11	11	=end original
12	12
13	13	perlrecharclass - Perl 正規表現文字クラス
14	14
15	15	=head1 DESCRIPTION
16	16
17	17	=begin original
18	18
19	19	The top level documentation about Perl regular expressions
20	20	is found in L<perlre>.
21	21
22	22	=end original
23	23
24	24	Perl 正規表現に関する最上位文書は L<perlre> です。
25	25
26	26	=begin original
27	27
28	28	This manual page discusses the syntax and use of character
29	29	classes in Perl regular expressions.
30	30
31	31	=end original
32	32
33	33	このマニュアルページは Perl 正規表現の文字クラスの文法と使用法について
34	34	議論します。
35	35
36	36	=begin original
37	37
38	38	A character class is a way of denoting a set of characters
39	39	in such a way that one character of the set is matched.
40	40	It's important to remember that: matching a character class
41	41	consumes exactly one character in the source string. (The source
42	42	string is the string the regular expression is matched against.)
43	43
44	44	=end original
45	45
46	46	文字クラスは、集合の中の一文字がマッチングするというような方法で、
47	47	文字の集合を指定するための方法です。
48	48	次のことを覚えておくことは重要です: 文字クラスへのマッチングは、ソース文字列の
49	49	中から正確に一文字だけを消費します。
50	50	(ソース文字列とは正規表現がマッチングしようとしている文字列です。)
51	51
52	52	=begin original
53	53
54	54	There are three types of character classes in Perl regular
55	55	expressions: the dot, backslash sequences, and the form enclosed in square
56	56	brackets. Keep in mind, though, that often the term "character class" is used
57	57	to mean just the bracketed form. Certainly, most Perl documentation does that.
58	58
59	59	=end original
60	60
61	61	Perl 正規表現には 3 種類の文字クラスがあります: ドット、
62	62	逆スラッシュシーケンス、大かっこで囲まれた形式です。
63	63	しかし、「文字クラス」という用語はしばしば大かっこ形式だけを意味するために
64	64	使われることに注意してください。
65	65	確かに、ほとんどの Perl 文書ではそうなっています。
66	66
67	67	=head2 The dot
68	68
69	69	(ドット)
70	70
71	71	=begin original
72	72
73	73	The dot (or period), C<.> is probably the most used, and certainly
74	74	the most well-known character class. By default, a dot matches any
75	75	character, except for the newline. That default can be changed to
76	76	add matching the newline by using the I<single line> modifier:
77	77	for the entire regular expression with the C</s> modifier, or
78	78	locally with C<(?s)> (and even globally within the scope of
79	79	L<C<use re '/s'>\|re/'E<sol>flags' mode>). (The C<L</\N>> backslash
80	80	sequence, described
81	81	below, matches any character except newline without regard to the
82	82	I<single line> modifier.)
83	83
84	84	=end original
85	85
86	86	ドット (またはピリオド) C<.> はおそらくもっともよく使われ、そして確実に
87	87	もっともよく知られている文字クラスです。
88	88	デフォルトでは、ドットは改行を除く任意の文字にマッチングします。
89	89	このデフォルトは I<単一行> 修飾子を使うことで改行にもマッチングするように
90	90	変更されます: 正規表現全体に対して C</s> 修飾子を使うか、ローカルには
91	91	C<(?s)> を使います
92	92	(さらに、L<C<use re '/s'>\|re/'E<sol>flags' mode> のスコープ内では
93	93	グローバルに適用することもできます)。
94	94	(後述する C<L</\N>> 逆スラッシュシーケンスでは、I<単一行> 修飾子に
95	95	関わりなく改行以外の任意の文字にマッチングします。)
96	96
97	97	=begin original
98	98
99	99	Here are some examples:
100	100
101	101	=end original
102	102
103	103	以下は例です:
104	104
105	105	=begin original
106	106
107	107	 "a"  =~  /./       # Match
108	108	 "."  =~  /./       # Match
109	109	 ""   =~  /./       # No match (dot has to match a character)
110	110	 "\n" =~  /./       # No match (dot does not match a newline)
111	111	 "\n" =~  /./s      # Match (global 'single line' modifier)
112	112	 "\n" =~  /(?s:.)/  # Match (local 'single line' modifier)
113	113	 "ab" =~  /^.$/     # No match (dot matches one character)
114	114
115	115	=end original
116	116
117	117	 "a"  =~  /./       # マッチングする
118	118	 "."  =~  /./       # マッチングする
119	119	 ""   =~  /./       # マッチングしない (ドットは文字にマッチングする必要がある)
120	120	 "\n" =~  /./       # マッチングしない (ドットは改行にはマッチングしない)
121	121	 "\n" =~  /./s      # マッチングする (グローバル「単一行」修飾子)
122	122	 "\n" =~  /(?s:.)/  # マッチングする (ローカル「単一行」修飾子)
123	123	 "ab" =~  /^.$/     # マッチングしない (ドットは一文字にマッチングする)
124	124
125	125	=head2 Backslash sequences
126	126	X<\w> X<\W> X<\s> X<\S> X<\d> X<\D> X<\p> X<\P>
127	127	X<\N> X<\v> X<\V> X<\h> X<\H>
128	128	X<word> X<whitespace>
129	129
130	130	(逆スラッシュシーケンス)
131	131
132	132	=begin original
133	133
134	134	A backslash sequence is a sequence of characters, the first one of which is a
135	135	backslash. Perl ascribes special meaning to many such sequences, and some of
136	136	these are character classes. That is, they match a single character each,
137	137	provided that the character belongs to the specific set of characters defined
138	138	by the sequence.
139	139
140	140	=end original
141	141
142	142	逆スラッシュシーケンスは、最初が逆スラッシュである文字の並びです。
143	143	Perl はそのような並びの多くに特別な意味を持たせていて、
144	144	その一部は文字クラスです。
145	145	つまり、それらはそれぞれ並びによって定義されている特定の文字の集合に
146	146	帰属する一文字にマッチングします。
147	147
148	148	=begin original
149	149
150	150	Here's a list of the backslash sequences that are character classes. They
151	151	are discussed in more detail below. (For the backslash sequences that aren't
152	152	character classes, see L<perlrebackslash>.)
153	153
154	154	=end original
155	155
156	156	以下は文字クラスの逆スラッシュシーケンスの一覧です。
157	157	以下でさらに詳細に議論します。
158	158	(文字クラスではない逆スラッシュシーケンスについては、L<perlrebackslash> を
159	159	参照してください。)
160	160
161	161	=begin original
162	162
163	163	 \d             Match a decimal digit character.
164	164	 \D             Match a non-decimal-digit character.
165	165	 \w             Match a "word" character.
166	166	 \W             Match a non-"word" character.
167	167	 \s             Match a whitespace character.
168	168	 \S             Match a non-whitespace character.
169	169	 \h             Match a horizontal whitespace character.
170	170	 \H             Match a character that isn't horizontal whitespace.
171	171	 \v             Match a vertical whitespace character.
172	172	 \V             Match a character that isn't vertical whitespace.
173	173	 \N             Match a character that isn't a newline.
174	174	 \pP, \p{Prop}  Match a character that has the given Unicode property.
175	175	 \PP, \P{Prop}  Match a character that doesn't have the Unicode property
176	176
177	177	=end original
178	178
179	179	 \d             10 進数字にマッチング。
180	180	 \D             非 10 進数字にマッチング。
181	181	 \w             「単語」文字にマッチング。
182	182	 \W             非「単語」文字にマッチング。
183	183	 \s             空白文字にマッチング。
184	184	 \S             非空白文字にマッチング。
185	185	 \h             水平空白文字にマッチング。
186	186	 \H             水平空白でない文字にマッチング。
187	187	 \v             垂直空白文字にマッチング。
188	188	 \V             垂直空白でない文字にマッチング。
189	189	 \N             改行以外の文字にマッチング。
190	190	 \pP, \p{Prop}  指定された Unicode 特性を持つ文字にマッチング。
191	191	 \PP, \P{Prop}  指定された Unicode 特性を持たない文字にマッチング。
192	192
193	193	=head3 \N
194	194
195	195	=begin original
196	196
197	197	C<\N>, available starting in v5.12, like the dot, matches any
198	198	character that is not a newline. The difference is that C<\N> is not influenced
199	199	by the I<single line> regular expression modifier (see L</The dot> above). Note
200	200	that the form C<\N{...}> may mean something completely different. When the
201	201	C<{...}> is a L<quantifier\|perlre/Quantifiers>, it means to match a non-newline
202	202	character that many times. For example, C<\N{3}> means to match 3
203	203	non-newlines; C<\N{5,}> means to match 5 or more non-newlines. But if C<{...}>
204	204	is not a legal quantifier, it is presumed to be a named character. See
205	205	L<charnames> for those. For example, none of C<\N{COLON}>, C<\N{4F}>, and
206	206	C<\N{F4}> contain legal quantifiers, so Perl will try to find characters whose
207	207	names are respectively C<COLON>, C<4F>, and C<F4>.
208	208
209	209	=end original
210	210
211	211	v5.12 から利用可能な C<\N> は、ドットのように、
212	212	改行以外の任意の文字にマッチングします。
213	213	違いは、C<\N> は I<単一行> 正規表現修飾子の影響を受けないことです
214	214	(上述の L</The dot> 参照)。
215	215	C<\N{...}> 型式は何か全く違うものを意味するかも知れないことに
216	216	注意してください。
217	217	C<{...}> が L<量指定子\|perlre/Quantifiers> なら、これは指定された回数の
218	218	非改行文字にマッチングします。
219	219	例えば、C<\N{3}> は三つの非改行にマッチングします;
220	220	C<\N{5,}> は五つ以上の非改行にマッチングします。
221	221	しかし、C<{...}> が有効な量指定子でない場合、これは名前付き文字と
222	222	推定されます。
223	223	これについては L<charnames> を参照してください。
224	224	例えば、C<\N{COLON}>, C<\N{4F}>, C<\N{F4}> はどれも有効な
225	225	量指定子ではないので、Perl はそれぞれ C<COLON>, C<4F>, C<F4> という名前の
226	226	文字を探そうとします。
227	227
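The difference between the dot, C<\N>, and the two meanings of C<\N{...}> can be checked with a short script. This is only an illustrative sketch: the sample string and the variable names are made up for the example, and it assumes a Perl recent enough (v5.16 or later) to load character names automatically.

```perl
use strict;
use warnings;

my $s = "ab\ncd";    # hypothetical sample: four letters and one newline

# The dot skips the newline by default, but matches it under /s.
my @dot   = $s =~ /./g;
my @dot_s = $s =~ /./gs;

# \N never matches a newline, even when /s is in effect.
my @n_s = $s =~ /\N/gs;

# \N{3} is a quantified \N: exactly three non-newline characters.
my $three = "abc" =~ /^\N{3}\z/ ? 1 : 0;

# {COLON} is not a legal quantifier, so \N{COLON} is a named character.
my $colon = ":" =~ /^\N{COLON}\z/ ? 1 : 0;

printf "dot=%d dot/s=%d \\N/s=%d \\N{3}=%d \\N{COLON}=%d\n",
    scalar @dot, scalar @dot_s, scalar @n_s, $three, $colon;
```

Only the C</s> variant of the dot counts the newline; C<\N> ignores the modifier.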
228	228	=head3 Digits
229	229
230	230	(数字)
231	231
232	232	=begin original
233	233
234	234	C<\d> matches a single character considered to be a decimal I<digit>.
235	235	If the C</a> regular expression modifier is in effect, it matches [0-9].
236	236	Otherwise, it
237	237	matches anything that is matched by C<\p{Digit}>, which includes [0-9].
238	238	(An unlikely possible exception is that under locale matching rules, the
239	239	current locale might not have C<[0-9]> matched by C<\d>, and/or might match
240	240	other characters whose code point is less than 256. The only such locale
241	241	definitions that are legal would be to match C<[0-9]> plus another set of
242	242	10 consecutive digit characters; anything else would be in violation of
243	243	the C language standard, but Perl doesn't currently assume anything in
244	244	regard to this.)
245	245
246	246	=end original
247	247
248	248	C<\d> は 10 進 I<数字> と考えられる単一の文字にマッチングします。
249	249	C</a> 正規表現修飾子が有効の場合、これは [0-9] にマッチングします。
250	250	さもなければ、これは C<[0-9]> を含む、C<\p{Digit}> にマッチングするものに
251	251	マッチングします。
252	252	(ありそうもない例外として、ロケールマッチング規則の下では、現在のロケールで
253	253	C<[0-9]> が C<\d> にマッチングしなかったり、
254	254	符号位置が 256 未満の他の文字がマッチングしたりする可能性があります。
255	255	このようなロケール定義で唯一正当なものは、C<[0-9]> に加えてもう一組の
256	256	連続した 10 個の数字にマッチングするものです;
257	257	それ以外は C 言語標準に違反していますが、
258	258	Perl は今のところこれに関して何も仮定しません。)
259	259
260	260	=begin original
261	261
262	262	What this means is that unless the C</a> modifier is in effect C<\d> not
263	263	only matches the digits '0' - '9', but also Arabic, Devanagari, and
264	264	digits from other languages. This may cause some confusion, and some
265	265	security issues.
266	266
267	267	=end original
268	268
269	269	これが意味することは、C</a> 修飾子が有効でない限り、C<\d> は数字
270	270	'0' - '9' だけでなく、アラビア文字、デバナーガリ文字、およびその他の言語の
271	271	数字にもマッチングするということです。
272	272	これは混乱やセキュリティ問題を引き起こすことがあります。
273	273
274	274	=begin original
275	275
276	276	Some digits that C<\d> matches look like some of the [0-9] ones, but
277	277	have different values. For example, BENGALI DIGIT FOUR (U+09EA) looks
278		very much like an ASCII DIGIT EIGHT (U+0038), and LEPCHA DIGIT SIX
	278	very much like an ASCII DIGIT EIGHT (U+0038). An application that
279		(U+1C46) looks very much like an ASCII DIGIT FIVE (U+0035). An
280		application that
281	279	is expecting only the ASCII digits might be misled, or if the match is
282	280	C<\d+>, the matched string might contain a mixture of digits from
283	281	different writing systems that look like they signify a number different
284	282	than they actually do. L<Unicode::UCD/num()> can
285	283	be used to safely
286	284	calculate the value, returning C<undef> if the input string contains
287		such a mixture. Otherwise, for example, a displayed price might be
	285	such a mixture.
288		deliberately different than it appears.
289	286
290	287	=end original
291	288
292	289	C<\d> にマッチングする数字には、[0-9] のように見えるけれども、
293	290	異なる値を持つものもあります。
294		例えば、 C<BENGALI DIGIT FOUR> (U+09EA) は C<ASCII DIGIT EIGHT> (U+0038) に
	291	例えば、BENGALI DIGIT FOUR (U+09EA) は ASCII DIGIT EIGHT (U+0038) と
295		とてもよく似ていて、
296		C<LEPCHA DIGIT SIX> (U+1C46) は C<ASCII DIGIT FIVE> (U+0035) に
297	292	とてもよく似ています。
298	293	ASCII 数字のみを想定しているアプリケーションはミスリードされるかも知れず、
299	294	マッチングが C<\d+> の場合、
300	295	マッチングした文字列は、実際と異なる値を示しているように見える、
301	296	異なった書記体系からの数字が混ざったものかもしれません。
302	297	L<Unicode::UCD/num()> は値を安全に計算するのに使えます;
303	298	入力文字列がこのような混合を含んでいる場合は C<undef> を返します。
304		さもなければ、例えば、表示された価格は見た目と意図的に違うものに
305		なるかもしれません。
306	299
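The look-alike problem described above can be made concrete with a small sketch. It assumes the core module Unicode::UCD is available (its C<num()> function is the one the text refers to); the sample strings and variable names are invented for illustration.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

my $ascii = "48";
my $mixed = "4\x{09EA}";    # ASCII DIGIT FOUR followed by BENGALI DIGIT FOUR

# Without /a, \d+ accepts the mixed string; with /a only [0-9] qualifies.
my $d_unicode = $mixed =~ /^\d+\z/  ? 1 : 0;
my $d_ascii   = $mixed =~ /^\d+\z/a ? 1 : 0;

# num() returns the value for a consistent string, undef for a mixture.
my $val_ascii = num($ascii);
my $val_mixed = num($mixed);

printf "\\d+=%d \\d+/a=%d num(ascii)=%s num(mixed)=%s\n",
    $d_unicode, $d_ascii, $val_ascii,
    defined $val_mixed ? $val_mixed : 'undef';
```

An application that must treat the match as a number can either add C</a> to the pattern or validate the captured string with C<num()> before using it.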
307	300	=begin original
308	301
309	302	What C<\p{Digit}> means (and hence C<\d> except under the C</a>
310	303	modifier) is C<\p{General_Category=Decimal_Number}>, or synonymously,
311	304	C<\p{General_Category=Digit}>. Starting with Unicode version 4.1, this
312	305	is the same set of characters matched by C<\p{Numeric_Type=Decimal}>.
313	306	But Unicode also has a different property with a similar name,
314	307	C<\p{Numeric_Type=Digit}>, which matches a completely different set of
315	308	characters. These characters are things such as C<CIRCLED DIGIT ONE>
316	309	or subscripts, or are from writing systems that lack all ten digits.
317	310
318	311	=end original
319	312
320	313	C<\p{Digit}> が意味するもの(つまり、C</a> 修飾子の下でない C<\d>)は、
321	314	C<\p{General_Category=Decimal_Number}>、または同義語として
322	315	C<\p{General_Category=Digit}> です。
323	316	Unicode バージョン 4.1 以降では、これは C<\p{Numeric_Type=Decimal}> に
324	317	マッチングする文字集合と同じです。
325	318	ただし、Unicode には、C<\p{Numeric_Type=Digit}> という類似した名前を持つ
326	319	別の特性もあります; これは完全に異なる文字集合とマッチングします。
327	320	これらの文字は、C<CIRCLED DIGIT ONE> や添字のようなものであるか、
328	321	10 の数字すべてが揃っていない書記体系からのものです。
329	322
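As a rough illustration of the two similarly named properties, the following sketch tests one character from each set. The two sample characters are chosen for the example only; the property names are the ones given in the text.

```perl
use strict;
use warnings;

my $arabic4  = "\x{0664}";    # ARABIC-INDIC DIGIT FOUR (a decimal digit)
my $circled1 = "\x{2460}";    # CIRCLED DIGIT ONE (Numeric_Type=Digit)

my %match = (
    arabic_d        => ($arabic4  =~ /^\d\z/                       ? 1 : 0),
    arabic_decimal  => ($arabic4  =~ /^\p{Numeric_Type=Decimal}\z/ ? 1 : 0),
    circled_d       => ($circled1 =~ /^\d\z/                       ? 1 : 0),
    circled_nt_digi => ($circled1 =~ /^\p{Numeric_Type=Digit}\z/   ? 1 : 0),
);

printf "%-16s %d\n", $_, $match{$_} for sort keys %match;
```

The circled digit has a numeric value but is not matched by C<\d>, because it cannot be used in positional decimal notation.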
330	323	=begin original
331	324
332	325	The design intent is for C<\d> to exactly match the set of characters
333	326	that can safely be used with "normal" big-endian positional decimal
334	327	syntax, where, for example 123 means one 'hundred', plus two 'tens',
335	328	plus three 'ones'. This positional notation does not necessarily apply
336	329	to characters that match the other type of "digit",
337	330	C<\p{Numeric_Type=Digit}>, and so C<\d> doesn't match them.
338	331
339	332	=end original
340	333
341	334	設計意図は、C<\d> が「通常の」ビッグエンディアンの
342	335	位置 10 進構文 (例えば、123 は一つの「100」に二つの「10」と三つの「1」を
343	336	加えたものを意味する) で安全に使用できる文字集合と
344	337	正確にマッチングするようにすることです;
345	338	この位置表記は、他のタイプの「digit」である C<\p{Numeric_Type=Digit}> に
346	339	マッチングする文字には必ずしも適用されないため、
347	340	C<\d> はこれらの文字にマッチングしません。
348	341
349	342	=begin original
350	343
351	344	The Tamil digits (U+0BE6 - U+0BEF) can also legally be
352	345	used in old-style Tamil numbers in which they would appear no more than
353	346	one in a row, separated by characters that mean "times 10", "times 100",
354	347	etc. (See L<http://www.unicode.org/notes/tn21>.)
355	348
356	349	=end original
357	350
358	351	タミル語の数字 (U+0BE6 - U+0BEF) は、古い様式のタミル語の
359	352	数にも合法的に使用できます;
360	353	この場合、数字は二つ以上連続して現れることはなく、
361	354	「×10」や「×100」などを意味する文字で区切られます。
362	355	(L<http://www.unicode.org/notes/tn21> を参照してください。)
363	356
364	357	=begin original
365	358
366	359	Any character not matched by C<\d> is matched by C<\D>.
367	360
368	361	=end original
369	362
370	363	C<\d> にマッチングしない任意の文字は C<\D> にマッチングします。
371	364
372	365	=head3 Word characters
373	366
374	367	(単語文字)
375	368
376	369	=begin original
377	370
378	371	A C<\w> matches a single alphanumeric character (an alphabetic character, or a
379	372	decimal digit); or a connecting punctuation character, such as an
380	373	underscore ("_"); or a "mark" character (like some sort of accent) that
381	374	attaches to one of those. It does not match a whole word. To match a
382	375	whole word, use C<\w+>. This isn't the same thing as matching an
383	376	English word, but in the ASCII range it is the same as a string of
384	377	Perl-identifier characters.
385	378
386	379	=end original
387	380
388	381	C<\w> は単一の英数字 (英字または 10 進数字)、または
389	382	下線 (C<_>) のような接続句読点、
390	383	またはこれらの一つに付いている(ある種のアクセントのような)「マーク」文字に
391	384	マッチングします。
392	385	これは単語全体にはマッチングしません。
393	386	単語全体にマッチングするには、C<\w+> を使ってください。
394	387	これは英語の単語にマッチングするのと同じことではありませんが、
395	388	ASCII の範囲では、Perl の識別子文字の文字列と同じです。
396	389
397	390	=over
398	391
399	392	=item If the C</a> modifier is in effect ...
400	393
401	394	(C</a> 修飾子が有効なら ...)
402	395
403	396	=begin original
404	397
405	398	C<\w> matches the 63 characters [a-zA-Z0-9_].
406	399
407	400	=end original
408	401
409	402	C<\w> は 63 文字 [a-zA-Z0-9_] にマッチングします。
410	403
411	404	=item otherwise ...
412	405
413	406	(さもなければ ...)
414	407
415	408	=over
416	409
417	410	=item For code points above 255 ...
418	411
419	412	(256 以上の符号位置では ...)
420	413
421	414	=begin original
422	415
423	416	C<\w> matches the same as C<\p{Word}> matches in this range. That is,
424	417	it matches Thai letters, Greek letters, etc. This includes connector
425	418	punctuation (like the underscore) which connect two words together, or
426	419	diacritics, such as a C<COMBINING TILDE> and the modifier letters, which
427	420	are generally used to add auxiliary markings to letters.
428	421
429	422	=end original
430	423
431	424	C<\w> はこの範囲で C<\p{Word}> がマッチングするものと同じものに
432	425	マッチングします。
433	426	つまり、タイ文字、ギリシャ文字などです。
434	427	これには(下線のような)二つの単語を繋ぐ接続句読点、
435	428	C<COMBINING TILDE> や一般的に文字に追加のマークを付けるために
436	429	使われる修飾字のようなダイアクリティカルマークが含まれます。
437	430
438	431	=item For code points below 256 ...
439	432
440	433	(255 以下の符号位置では ...)
441	434
442	435	=over
443	436
444	437	=item if locale rules are in effect ...
445	438
446	439	(ロケール規則が有効なら ...)
447	440
448	441	=begin original
449	442
450	443	C<\w> matches the platform's native underscore character plus whatever
451	444	the locale considers to be alphanumeric.
452	445
453	446	=end original
454	447
455	448	C<\w> は、プラットフォームのネイティブな下線に加えてロケールが英数字と
456	449	考えるものにマッチングします。
457	450
458	451	=item if, instead, Unicode rules are in effect ...
459	452
460	453	(そうではなく、Unicode 規則が有効なら ...)
461	454
462	455	=begin original
463	456
464	457	C<\w> matches exactly what C<\p{Word}> matches.
465	458
466	459	=end original
467	460
468	461	C<\w> は C<\p{Word}> がマッチングするものと同じものにマッチングします。
469	462
470	463	=item otherwise ...
471	464
472	465	(さもなければ ...)
473	466
474	467	=begin original
475	468
476	469	C<\w> matches [a-zA-Z0-9_].
477	470
478	471	=end original
479	472
480	473	C<\w> は [a-zA-Z0-9_] にマッチングします。
481	474
482	475	=back
483	476
484	477	=back
485	478
486	479	=back
487	480
488	481	=begin original
489	482
490	483	Which rules apply are determined as described in L<perlre/Which character set modifier is in effect?>.
491	484
492	485	=end original
493	486
494	487	どの規則を適用するかは L<perlre/Which character set modifier is in effect?> で
495	488	記述されている方法で決定されます。
496	489
497	490	=begin original
498	491
499	492	There are a number of security issues with the full Unicode list of word
500	493	characters. See L<http://unicode.org/reports/tr36>.
501	494
502	495	=end original
503	496
504	497	完全な Unicode の単語文字の一覧には多くのセキュリティ問題があります。
505	498	L<http://unicode.org/reports/tr36> を参照してください。
506	499
507	500	=begin original
508	501
509	502	Also, for a somewhat finer-grained set of characters that are in programming
510	503	language identifiers beyond the ASCII range, you may wish to instead use the
511	504	more customized L</Unicode Properties>, C<\p{ID_Start}>,
512	505	C<\p{ID_Continue}>, C<\p{XID_Start}>, and C<\p{XID_Continue}>. See
513	506	L<http://unicode.org/reports/tr31>.
514	507
515	508	=end original
516	509
517	510	また、ASCII の範囲を超えたプログラミング言語識別子のための
518	511	より高精度の文字集合のためには、代わりによりカスタマイズされた
519	512	L<Unicode 特性\|/Unicode Properties>である
520	513	C<\p{ID_Start}>,
521	514	C<\p{ID_Continue}>, C<\p{XID_Start}>, C<\p{XID_Continue}> を
522	515	使った方がよいでしょう。
523	516	L<http://unicode.org/reports/tr31> を参照してください。
524	517
525	518	=begin original
526	519
527	520	Any character not matched by C<\w> is matched by C<\W>.
528	521
529	522	=end original
530	523
531	524	C<\w> にマッチングしない任意の文字は C<\W> にマッチングします。
532	525
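The rules for C<\w> listed above can be sketched as follows. To avoid depending on whatever character set modifier happens to be in effect, the example states C</u> and C</a> explicitly; the sample characters and variable names are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $e_acute = "\x{E9}";     # LATIN SMALL LETTER E WITH ACUTE (below 256)
my $alpha   = "\x{3B1}";    # GREEK SMALL LETTER ALPHA (above 255)

my $e_unicode = $e_acute =~ /^\w\z/u ? 1 : 0;    # Unicode rules: \p{Word}
my $e_ascii   = $e_acute =~ /^\w\z/a ? 1 : 0;    # /a: only [a-zA-Z0-9_]
my $a_unicode = $alpha   =~ /^\w\z/u ? 1 : 0;
my $a_ascii   = $alpha   =~ /^\w\z/a ? 1 : 0;
my $under     = "_"      =~ /^\w\z/a ? 1 : 0;    # underscore always counts

print "e/u=$e_unicode e/a=$e_ascii alpha/u=$a_unicode ",
      "alpha/a=$a_ascii underscore=$under\n";
```

With no explicit modifier, the result for the character below 256 depends on the rules described in L<perlre/Which character set modifier is in effect?>.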
533	526	=head3 Whitespace
534	527
535	528	(空白)
536	529
537	530	=begin original
538	531
539	532	C<\s> matches any single character considered whitespace.
540	533
541	534	=end original
542	535
543	536	C<\s> は空白と考えられる単一の文字にマッチングします。
544	537
545	538	=over
546	539
547	540	=item If the C</a> modifier is in effect ...
548	541
549	542	(C</a> 修飾子が有効なら ...)
550	543
551	544	=begin original
552	545
553	546	In all Perl versions, C<\s> matches the 5 characters [\t\n\f\r ]; that
554	547	is, the horizontal tab,
555	548	the newline, the form feed, the carriage return, and the space.
556	549	Starting in Perl v5.18, it also matches the vertical tab, C<\cK>.
557	550	See note C<[1]> below for a discussion of this.
558	551
559	552	=end original
560	553
561	554	全ての Perl バージョンで、C<\s> は [\t\n\f\r ] の 5 文字にマッチングします;
562	555	つまり、水平タブ、改行、改頁、復帰、スペースです。
563	556	Perl 5.18 から、垂直タブ C<\cK> にもマッチングします。
564	557	ここでの議論については後述する C<[1]> を参照してください。
565	558
566	559	=item otherwise ...
567	560
568	561	(さもなければ ...)
569	562
570	563	=over
571	564
572	565	=item For code points above 255 ...
573	566
574	567	(256 以上の符号位置では ...)
575	568
576	569	=begin original
577	570
578	571	C<\s> matches exactly the code points above 255 shown with an "s" column
579	572	in the table below.
580	573
581	574	=end original
582	575
583	576	C<\s> は、後述する表の "s" の列で示されている、
584	577	255 を超える符号位置に正確にマッチングします。
585	578
586	579	=item For code points below 256 ...
587	580
588	581	(255 以下の符号位置では ...)
589	582
590	583	=over
591	584
592	585	=item if locale rules are in effect ...
593	586
594	587	(ロケール規則が有効なら ...)
595	588
596	589	=begin original
597	590
598	591	C<\s> matches whatever the locale considers to be whitespace.
599	592
600	593	=end original
601	594
602	595	C<\s> はロケールが空白だと考えるものにマッチングします。
603	596
604	597	=item if, instead, Unicode rules are in effect ...
605	598
606	599	(そうではなく、Unicode 規則が有効なら ...)
607	600
608	601	=begin original
609	602
610	603	C<\s> matches exactly the characters shown with an "s" column in the
611	604	table below.
612	605
613	606	=end original
614	607
615	608	C<\s> は正確に以下の表で "s" の列にある文字にマッチングします。
616	609
617	610	=item otherwise ...
618	611
619	612	(さもなければ ...)
620	613
621	614	=begin original
622	615
623	616	C<\s> matches [\t\n\f\r ] and, starting in Perl
624	617	v5.18, the vertical tab, C<\cK>.
625	618	(See note C<[1]> below for a discussion of this.)
626	619	Note that this list doesn't include the non-breaking space.
627	620
628	621	=end original
629	622
630	623	C<\s> は [\t\n\f\r ] にマッチングし、Perl v5.18 から、
631	624	垂直タブ C<\cK> にもマッチングします。
632	625	(これの議論については後述する C<[1]> を参照してください。)
633	626	この一覧にはノーブレークスペースが含まれていないことに注意してください。
634	627
635	628	=back
636	629
637	630	=back
638	631
639	632	=back
640	633
641	634	=begin original
642	635
643	636	Which rules apply are determined as described in L<perlre/Which character set modifier is in effect?>.
644	637
645	638	=end original
646	639
647	640	どの規則を適用するかは L<perlre/Which character set modifier is in effect?> で
648	641	記述されている方法で決定されます。
649	642
650	643	=begin original
651	644
652	645	Any character not matched by C<\s> is matched by C<\S>.
653	646
654	647	=end original
655	648
656	649	C<\s> にマッチングしない任意の文字は C<\S> にマッチングします。
657	650
658	651	=begin original
659	652
660	653	C<\h> matches any character considered horizontal whitespace;
661	654	this includes the platform's space and tab characters and several others
662	655	listed in the table below. C<\H> matches any character
663	656	not considered horizontal whitespace. They use the platform's native
664	657	character set, and do not consider any locale that may otherwise be in
665	658	use.
666	659
667	660	=end original
668	661
669	662	C<\h> は水平空白と考えられる任意の文字にマッチングします; これは
670	663	プラットフォームのスペースとタブ文字および以下の表に上げられている
671	664	いくつかのその他の文字です。
672	665	C<\H> は水平空白と考えられない文字にマッチングします。
673	666	これらはプラットフォームのネイティブな文字集合を使い、
674	667	他の場所では有効なロケールを考慮しません。
675	668
676	669	=begin original
677	670
678	671	C<\v> matches any character considered vertical whitespace;
679	672	this includes the platform's carriage return and line feed characters (newline)
680	673	plus several other characters, all listed in the table below.
681	674	C<\V> matches any character not considered vertical whitespace.
682	675	They use the platform's native character set, and do not consider any
683	676	locale that may otherwise be in use.
684	677
685	678	=end original
686	679
687	680	C<\v> は垂直空白と考えられる任意の文字にマッチングします; これは
688	681	プラットフォームの復帰と行送り(改行)文字に加えていくつかのその他の文字です;
689	682	全ては以下の表に挙げられています。
690	683	C<\V> は垂直空白と考えられない任意の文字にマッチングします。
691	684	これらはプラットフォームのネイティブな文字集合を使い、
692	685	他の場所では有効なロケールを考慮しません。
693	686
694	687	=begin original
695	688
696	689	C<\R> matches anything that can be considered a newline under Unicode
697	690	rules. It can match a multi-character sequence. It cannot be used inside
698	691	a bracketed character class; use C<\v> instead (vertical whitespace).
699	692	It uses the platform's
700	693	native character set, and does not consider any locale that may
701	694	otherwise be in use.
702	695	Details are discussed in L<perlrebackslash>.
703	696
704	697	=end original
705	698
706	699	C<\R> は Unicode の規則で改行と考えられるものにマッチングします。
707	700	複数文字の並びにマッチングすることもあります。
708	701	従って、大かっこ文字クラスの中では使えません; 代わりに C<\v> (垂直空白) を
709	702	使ってください。
710	703	これらはプラットフォームのネイティブな文字集合を使い、
711	704	他の場所では有効なロケールを考慮しません。
712	705	詳細は L<perlrebackslash> で議論しています。
713	706
714	707	=begin original
715	708
716	709	Note that unlike C<\s> (and C<\d> and C<\w>), C<\h> and C<\v> always match
717	710	the same characters, without regard to other factors, such as the active
718	711	locale or whether the source string is in UTF-8 format.
719	712
720	713	=end original
721	714
722	715	C<\s> (および C<\d> と C<\w>) と違って、C<\h> および C<\v> は、現在の
723	716	ロケールやソース文字列が UTF-8 形式かどうかといった他の要素に関わらず
724	717	同じ文字にマッチングします。
725	718
726	719	=begin original
727	720
728	721	One might think that C<\s> is equivalent to C<[\h\v]>. This is indeed true
729	722	starting in Perl v5.18, but prior to that, the sole difference was that the
730	723	vertical tab (C<"\cK">) was not matched by C<\s>.
731	724
732	725	=end original
733	726
734	727	C<\s> が C<[\h\v]> と等価と考える人がいるかもしれません。
735	728	Perl 5.18 からは実際にその通りです; しかしそれより前では、
736	729	唯一の違いは、垂直タブ (C<"\cK">) が C<\s> にマッチングしなかったことです。
737	730
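A minimal sketch of how C<\s>, C<\h> and C<\v> relate, assuming Perl v5.18 or later on an ASCII platform. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $vt   = "\cK";       # LINE TABULATION (vertical tab)
my $nbsp = "\x{A0}";    # NO-BREAK SPACE

my $vt_s  = $vt =~ /^\s\z/ ? 1 : 0;    # matched by \s since v5.18
my $vt_v  = $vt =~ /^\v\z/ ? 1 : 0;    # vertical whitespace
my $vt_h  = $vt =~ /^\h\z/ ? 1 : 0;    # not horizontal whitespace

# \h matches NO-BREAK SPACE regardless of the rules in effect,
# while \s matches it under Unicode rules but not under /a.
my $nbsp_h   = $nbsp =~ /^\h\z/  ? 1 : 0;
my $nbsp_s_u = $nbsp =~ /^\s\z/u ? 1 : 0;
my $nbsp_s_a = $nbsp =~ /^\s\z/a ? 1 : 0;

print "vt: s=$vt_s v=$vt_v h=$vt_h  ",
      "nbsp: h=$nbsp_h s/u=$nbsp_s_u s/a=$nbsp_s_a\n";
```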
738	731	=begin original
739	732
740	733	The following table is a complete listing of characters matched by
741	734	C<\s>, C<\h> and C<\v> as of Unicode 6.3.
742	735
743	736	=end original
744	737
745	738	以下の表は Unicode 6.3 現在で C<\s>, C<\h>, C<\v> にマッチングする文字の
746	739	完全な一覧です。
747	740
748	741	=begin original
749	742
750	743	The first column gives the Unicode code point of the character (in hex format),
751	744	the second column gives the (Unicode) name. The third column indicates
752	745	by which class(es) the character is matched (assuming no locale is in
753	746	effect that changes the C<\s> matching).
754	747
755	748	=end original
756	749
757	750	最初の列は文字の Unicode 符号位置(16 進形式)、2 番目の列は (Unicode の)
758	751	名前です。
759	752	3 番目の列はどのクラスにマッチングするかを示しています
760	753	(C<\s> のマッチングを変更するようなロケールが
761	754	有効でないことを仮定しています)。
762	755
763	756	 0x0009  CHARACTER TABULATION       h s
764	757	 0x000a  LINE FEED (LF)              vs
765	758	 0x000b  LINE TABULATION             vs  [1]
766	759	 0x000c  FORM FEED (FF)              vs
767	760	 0x000d  CARRIAGE RETURN (CR)        vs
768	761	 0x0020  SPACE                      h s
769	762	 0x0085  NEXT LINE (NEL)             vs  [2]
770	763	 0x00a0  NO-BREAK SPACE             h s  [2]
771	764	 0x1680  OGHAM SPACE MARK           h s
772	765	 0x2000  EN QUAD                    h s
773	766	 0x2001  EM QUAD                    h s
774	767	 0x2002  EN SPACE                   h s
775	768	 0x2003  EM SPACE                   h s
776	769	 0x2004  THREE-PER-EM SPACE         h s
777	770	 0x2005  FOUR-PER-EM SPACE          h s
778	771	 0x2006  SIX-PER-EM SPACE           h s
779	772	 0x2007  FIGURE SPACE               h s
780	773	 0x2008  PUNCTUATION SPACE          h s
781	774	 0x2009  THIN SPACE                 h s
782	775	 0x200a  HAIR SPACE                 h s
783	776	 0x2028  LINE SEPARATOR              vs
784	777	 0x2029  PARAGRAPH SEPARATOR         vs
785	778	 0x202f  NARROW NO-BREAK SPACE      h s
786	779	 0x205f  MEDIUM MATHEMATICAL SPACE  h s
787	780	 0x3000  IDEOGRAPHIC SPACE          h s
788	781
789	782	=over 4
790	783
791	784	=item [1]
792	785
793	786	=begin original
794	787
795	788	Prior to Perl v5.18, C<\s> did not match the vertical tab.
796	789	C<[^\S\cK]> (obscurely) matches what C<\s> traditionally did.
797	790
798	791	=end original
799	792
800	793	Perl v5.18 より前では、C<\s> は垂直タブにマッチングしませんでした。
801	794	C<[^\S\cK]> は (分かりにくいですが) C<\s> が伝統的に
802	795	マッチングしていたものにマッチングします。
803	796
804	797	=item [2]
805	798
806	799	=begin original
807	800
808	801	NEXT LINE and NO-BREAK SPACE may or may not match C<\s> depending
809	802	on the rules in effect. See
810	803	L<the beginning of this section\|/Whitespace>.
811	804
812	805	=end original
813	806
814	807	NEXT LINE と NO-BREAK SPACE はどの規則が有効かによって C<\s> に
815	808	マッチングしたりマッチングしなかったりします。
816	809	L<この節の冒頭\|/Whitespace> を参照してください。
817	810
818	811	=back
819	812
820	813	=head3 Unicode Properties
821	814
822	815	(Unicode 特性)
823	816
824	817	=begin original
825	818
826	819	C<\pP> and C<\p{Prop}> are character classes to match characters that fit given
827	820	Unicode properties. One letter property names can be used in the C<\pP> form,
828	821	with the property name following the C<\p>, otherwise, braces are required.
829	822	When using braces, there is a single form, which is just the property name
830	823	enclosed in the braces, and a compound form which looks like C<\p{name=value}>,
831	824	which means to match if the property "name" for the character has that particular
832	825	"value".
833	826	For instance, a match for a number can be written as C</\pN/> or as
834	827	C</\p{Number}/>, or as C</\p{Number=True}/>.
835	828	Lowercase letters are matched by the property I<Lowercase_Letter> which
836	829	has the short form I<Ll>. They need the braces, so are written as C</\p{Ll}/> or
837	830	C</\p{Lowercase_Letter}/>, or C</\p{General_Category=Lowercase_Letter}/>
838	831	(the underscores are optional).
839	832	C</\pLl/> is valid, but means something different.
840	833	It matches a two character string: a letter (Unicode property C<\pL>),
841	834	followed by a lowercase C<l>.
842	835
843	836	=end original
844	837
845	838	C<\pP> と C<\p{Prop}> は指定された Unicode 特性に一致する文字に
846	839	マッチングする文字クラスです。
847	840	一文字特性は C<\pP> 形式で、C<\p> に引き続いて特性名です; さもなければ
848	841	中かっこが必要です。
849	842	中かっこを使うとき、単に特性名を中かっこで囲んだ単一形式と、
850	843	C<\p{name=value}> のような形で、文字の特性 "name" が特定の "value" を
851	844	持つものにマッチングすることになる複合形式があります。
852	845	例えば、数字にマッチングするものは C</\pN/> または C</\p{Number}/> または
853	846	C</\p{Number=True}/> と書けます。
854	847	小文字は I<Lowercase_Letter> 特性にマッチングします; これには
855	848	I<Ll> と言う短縮形式があります。
856	849	中かっこが必要なので、C</\p{Ll}/> または C</\p{Lowercase_Letter}/> または
857	850	C</\p{General_Category=Lowercase_Letter}/> と書きます(下線はオプションです)。
858	851	C</\pLl/> も妥当ですが、違う意味になります。
859	852	これは 2 文字にマッチングします: 英字 (Unicode 特性 C<\pL>)に引き続いて
860	853	小文字の C<l> です。
861	854
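The forms described above, including the C</\pLl/> pitfall, can be tried out with a short sketch. The test strings are arbitrary examples.

```perl
use strict;
use warnings;

# One-letter property names work without braces.
my $pn       = "7" =~ /^\pN\z/ ? 1 : 0;
my $p_number = "7" =~ /^\p{Number}\z/ ? 1 : 0;

# Longer names need braces; the single and compound forms are equivalent.
my $p_ll       = "a" =~ /^\p{Ll}\z/ ? 1 : 0;
my $p_long     = "a" =~ /^\p{Lowercase_Letter}\z/ ? 1 : 0;
my $p_compound = "a" =~ /^\p{General_Category=Lowercase_Letter}\z/ ? 1 : 0;

# \pLl is \pL followed by a literal "l": it needs two characters.
my $pll_two = "Al" =~ /^\pLl\z/ ? 1 : 0;
my $pll_one = "a"  =~ /^\pLl\z/ ? 1 : 0;

print "pN=$pn Number=$p_number Ll=$p_ll long=$p_long ",
      "compound=$p_compound pLl(Al)=$pll_two pLl(a)=$pll_one\n";
```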
862	855	=begin original
863	856
864		What a Unicode property matches is never subject to locale rules, and
	857	If locale rules are not in effect, the use of
865		if locale rules are not otherwise in effect, the use of a Unicode
	858	a Unicode property will force the regular expression into using Unicode
866		property will force the regular expression into using Unicode rules, if
	859	rules, if it isn't already.
867		it isn't already.
868	860
869	861	=end original
870	862
871		Unicode 特性が何にマッチングするかは決してロケールの規則に影響されず、
872	863	ロケール規則が有効でない場合、Unicode 特性を使うと
873	864	正規表現に (まだそうでなければ) Unicode 規則を使うように強制します。
874	865
875	866	=begin original
876	867
877	868	Note that almost all properties are immune to case-insensitive matching.
878	869	That is, adding a C</i> regular expression modifier does not change what
879	870	they match. There are two sets that are affected. The first set is
880	871	C<Uppercase_Letter>,
881	872	C<Lowercase_Letter>,
882	873	and C<Titlecase_Letter>,
883	874	all of which match C<Cased_Letter> under C</i> matching.
884	875	The second set is
885	876	C<Uppercase>,
886	877	C<Lowercase>,
887	878	and C<Titlecase>,
888	879	all of which match C<Cased> under C</i> matching.
889	880	(The difference between these sets is that some things, such as Roman
890	881	numerals, come in both upper and lower case, so they are C<Cased>, but
891	882	aren't considered to be letters, so they aren't C<Cased_Letter>s. They're
892	883	actually C<Letter_Number>s.)
893	884	This set also includes its subsets C<PosixUpper> and C<PosixLower>, both
894	885	of which under C</i> match C<PosixAlpha>.
895	886
896	887	=end original
897	888
898	889	ほとんど全ての特性は大文字小文字を無視したマッチングから免除されることに
899	890	注意してください。
900	891	つまり、C</i> 正規表現修飾子はこれらがマッチングするものに影響を
901	892	与えないということです。
902	893	影響を与える二つの集合があります。
903	894	一つ目の集合は
904	895	C<Uppercase_Letter>,
905	896	C<Lowercase_Letter>,
906	897	C<Titlecase_Letter> で、全て C</i> マッチングの下で
907	898	C<Cased_Letter> にマッチングします。
908	899	二つ目の集合は
909	900	C<Uppercase>,
910	901	C<Lowercase>,
911	902	C<Titlecase> で、全てC</i> マッチングの下で
912	903	C<Cased> にマッチングします。
913	904	(これらの集合の違いは、ローマ数字のような一部のものは、
914	905	大文字と小文字があるので C<Cased> ですが、
915	906	文字とは扱われないので C<Cased_Letter> ではありません。
916	907	これらは実際には C<Letter_Number> です。)
917	908	この集合はその部分集合である C<PosixUpper> と C<PosixLower> を含みます;
918	909	これら両方は C</i> マッチングの下では C<PosixAlpha> にマッチングします。
919	910
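The effect of C</i> on the case-related properties can be sketched like this. The sample characters are assumptions chosen to show the difference between the two sets: ROMAN NUMERAL ONE is cased but is a C<Letter_Number>, not a letter.

```perl
use strict;
use warnings;

my $upper_a = "A";
my $roman_1 = "\x{2160}";    # ROMAN NUMERAL ONE

# Lowercase_Letter does not match "A", but under /i it behaves like
# Cased_Letter, which does.
my $ll_plain = $upper_a =~ /^\p{Lowercase_Letter}\z/  ? 1 : 0;
my $ll_fold  = $upper_a =~ /^\p{Lowercase_Letter}\z/i ? 1 : 0;

# Lowercase under /i behaves like Cased, which includes Roman numerals;
# Lowercase_Letter under /i does not, because they are not letters.
my $lc_fold_roman = $roman_1 =~ /^\p{Lowercase}\z/i        ? 1 : 0;
my $ll_fold_roman = $roman_1 =~ /^\p{Lowercase_Letter}\z/i ? 1 : 0;

print "Ll=$ll_plain Ll/i=$ll_fold ",
      "Lowercase/i(roman)=$lc_fold_roman Ll/i(roman)=$ll_fold_roman\n";
```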
920	911	=begin original
921	912
922	913	For more details on Unicode properties, see L<perlunicode/Unicode
923	914	Character Properties>; for a
924	915	complete list of possible properties, see
925	916	L<perluniprops/Properties accessible through \p{} and \P{}>,
926	917	which notes all forms that have C</i> differences.
927	918	It is also possible to define your own properties. This is discussed in
928	919	L<perlunicode/User-Defined Character Properties>.
929	920
930	921	=end original
931	922
932	923	Unicode 特性に関するさらなる詳細については、
933	924	L<perlunicode/Unicode Character Properties> を参照してください; 特性の完全な
934	925	一覧については、C</i> に違いのある全ての形式について記されている
935	926	L<perluniprops/Properties accessible through \p{} and \P{}> を参照して
936	927	ください。
937	928	独自の特性を定義することも可能です。
938	929	これは L<perlunicode/User-Defined Character Properties> で
939	930	議論されています。
940	931
941	932	=begin original
942	933
943	934	Unicode properties are defined (surprise!) only on Unicode code points.
944	935	Starting in v5.20, when matching against C<\p> and C<\P>, Perl treats
945	936	non-Unicode code points (those above the legal Unicode maximum of
946	937	0x10FFFF) as if they were typical unassigned Unicode code points.
947	938
948	939	=end original
949	940
950	941	Unicode 特性は (驚くべきことに!) Unicode 符号位置に対してのみ
951	942	定義されています。
952	943	v5.20 から、C<\p> と C<\P> に対してマッチングするとき、
953	944	Perl は
954	945	非 Unicode 符号位置 (正当な Unicode の上限の 0x10FFFF を超えるもの) を、
955	946	典型的な未割り当て Unicode 符号位置であるかのように扱います。
956	947
957	948	=begin original
958	949
959	950	Prior to v5.20, Perl raised a warning and made all matches fail on
960	951	non-Unicode code points. This could be somewhat surprising:
961	952
962	953	=end original
963	954
964	955	v5.20 より前では、非 Unicode 符号位置に対しては全てのマッチングは失敗して、
965	956	Perl は警告を出していました。
966	957	これは驚かされるものだったかもしれません。
967	958
968	959	chr(0x110000) =~ /\p{ASCII_Hex_Digit=True}/ # Fails on Perls < v5.20.
969	960	chr(0x110000) =~ /\p{ASCII_Hex_Digit=False}/ # Also fails on Perls
970	961	# < v5.20
971	962
972	963	=begin original
973	964
974	965	Even though these two matches might be thought of as complements, until
975	966	v5.20 they were so only on Unicode code points.
976	967
977	968	=end original
978	969
979	970	これら二つのマッチングは互いに補集合だと考えるかもしれませんが、
980	971	v5.20 までは、Unicode 符号位置に対してのみ補集合になっていました。
981	972
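The complement behavior described above can be sketched as follows, assuming Perl v5.20 or later (on older Perls both matches fail). The C<non_unicode> warning category is disabled here only to keep the sketch quiet; the variable names are illustrative.

```perl
use v5.20;
use warnings;
no warnings 'non_unicode';   # code points above 0x10FFFF may trigger this category

my $above = chr(0x110000);   # one past the legal Unicode maximum

# From v5.20 on, the code point is treated like a typical unassigned one,
# so exactly one of the two complementary matches succeeds.
my $is_hex  = $above =~ /\p{ASCII_Hex_Digit=True}/  ? 1 : 0;
my $not_hex = $above =~ /\p{ASCII_Hex_Digit=False}/ ? 1 : 0;

print "True: $is_hex, False: $not_hex\n";
```
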
982		=begin original
983
984		Starting in perl v5.30, wildcards are allowed in Unicode property
985		values. See L<perlunicode/Wildcards in Property Values>.
986
987		=end original
988
989		perl v5.30 から、Unicode 特性にワイルドカードを使えます。
990		L<perlunicode/Wildcards in Property Values> を参照してください。
991
992	973	=head4 Examples
993	974
994	975	(例)
995	976
996	977	=begin original
997	978
998	979	"a" =~ /\w/ # Match, "a" is a 'word' character.
999	980	"7" =~ /\w/ # Match, "7" is a 'word' character as well.
1000	981	"a" =~ /\d/ # No match, "a" isn't a digit.
1001	982	"7" =~ /\d/ # Match, "7" is a digit.
1002	983	" " =~ /\s/ # Match, a space is whitespace.
1003	984	"a" =~ /\D/ # Match, "a" is a non-digit.
1004	985	"7" =~ /\D/ # No match, "7" is not a non-digit.
1005	986	" " =~ /\S/ # No match, a space is not non-whitespace.
1006	987
1007	988	=end original
1008	989
1009	990	"a" =~ /\w/ # マッチング; "a" は「単語」文字。
1010	991	"7" =~ /\w/ # マッチング; "7" も「単語」文字。
1011	992	"a" =~ /\d/ # マッチングしない; "a" は数字ではない。
1012	993	"7" =~ /\d/ # マッチング; "7" は数字。
1013	994	" " =~ /\s/ # マッチング; スペースは空白。
1014	995	"a" =~ /\D/ # マッチング; "a" は非数字。
1015	996	"7" =~ /\D/ # マッチングしない; "7" は非数字ではない。
1016	997	" " =~ /\S/ # マッチングしない; スペースは非空白ではない。
1017	998
1018	999	=begin original
1019	1000
1020	1001	" " =~ /\h/ # Match, space is horizontal whitespace.
1021	1002	" " =~ /\v/ # No match, space is not vertical whitespace.
1022	1003	"\r" =~ /\v/ # Match, a return is vertical whitespace.
1023	1004
1024	1005	=end original
1025	1006
1026	1007	" " =~ /\h/ # マッチング; スペースは水平空白。
1027	1008	" " =~ /\v/ # マッチングしない; スペースは垂直空白ではない。
1028	1009	"\r" =~ /\v/ # マッチング; 復帰は垂直空白。
1029	1010
1030	1011	=begin original
1031	1012
1032	1013	"a" =~ /\pL/ # Match, "a" is a letter.
1033	1014	"a" =~ /\p{Lu}/ # No match, /\p{Lu}/ matches upper case letters.
1034	1015
1035	1016	=end original
1036	1017
1037	1018	"a" =~ /\pL/ # マッチング; "a" は英字。
1038	1019	"a" =~ /\p{Lu}/ # マッチングしない; /\p{Lu}/ は大文字にマッチングする。
1039	1020
1040	1021	=begin original
1041	1022
1042	1023	"\x{0e0b}" =~ /\p{Thai}/ # Match, \x{0e0b} is the character
1043	1024	# 'THAI CHARACTER SO SO', and that's in
1044	1025	# Thai Unicode class.
1045	1026	"a" =~ /\P{Lao}/ # Match, as "a" is not a Laotian character.
1046	1027
1047	1028	=end original
1048	1029
1049	1030	"\x{0e0b}" =~ /\p{Thai}/ # マッチング; \x{0e0b} は文字
1050	1031	# 'THAI CHARACTER SO SO' で、これは
1051	1032	# Thai Unicode クラスにある。
1052	1033	"a" =~ /\P{Lao}/ # マッチング; "a" はラオス文字ではない。
1053	1034
1054	1035	=begin original
1055	1036
1056	1037	It is worth emphasizing that C<\d>, C<\w>, etc, match single characters, not
1057	1038	complete numbers or words. To match a number (that consists of digits),
1058	1039	use C<\d+>; to match a word, use C<\w+>. But be aware of the security
1059	1040	considerations in doing so, as mentioned above.
1060	1041
1061	1042	=end original
1062	1043
1063	1044	C<\d>, C<\w> などは数値や単語全体ではなく、1 文字にマッチングすることは
1064	1045	強調する価値があります。
1065	1046	(数字で構成される) 数値にマッチングするには C<\d+> を使います;
1066	1047	単語にマッチングするには C<\w+> を使います。
1067	1048	しかし前述したように、そうする場合のセキュリティ問題について
1068	1049	注意してください。
1069	1050
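The difference between a single-character class and a quantified one can be sketched as below. The sample string and variable names are made up for illustration. The last two lines hint at the security consideration: C<\d> also matches digits outside ASCII, which the C</a> modifier (available from Perl v5.14) prevents.

```perl
use strict;
use warnings;

my $text = "room 2024, floor 7";

my ($single) = $text =~ /(\d)/;    # one digit only
my ($number) = $text =~ /(\d+)/;   # the whole run of digits
my ($word)   = $text =~ /(\w+)/;   # the first run of word characters

# U+0663 is ARABIC-INDIC DIGIT THREE: a digit, but not an ASCII one.
my $arabic        = "\x{0663}";
my $unicode_digit = $arabic =~ /^\d+$/  ? 1 : 0;
my $ascii_only    = $arabic =~ /^\d+$/a ? 1 : 0;

print "$single $number $word $unicode_digit $ascii_only\n";
```
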
1070	1051	=head2 Bracketed Character Classes
1071	1052
1072	1053	(かっこ付き文字クラス)
1073	1054
1074	1055	=begin original
1075	1056
1076	1057	The third form of character class you can use in Perl regular expressions
1077	1058	is the bracketed character class. In its simplest form, it lists the characters
1078	1059	that may be matched, surrounded by square brackets, like this: C<[aeiou]>.
1079	1060	This matches one of C<a>, C<e>, C<i>, C<o> or C<u>. Like the other
1080	1061	character classes, exactly one character is matched.* To match
1081	1062	a longer string consisting of characters mentioned in the character
1082	1063	class, follow the character class with a L<quantifier\|perlre/Quantifiers>. For
1083	1064	instance, C<[aeiou]+> matches one or more lowercase English vowels.
1084	1065
1085	1066	=end original
1086	1067
1087	1068	Perl 正規表現で使える文字クラスの第 3 の形式は大かっこ文字クラスです。
1088	1069	もっとも単純な形式では、以下のように大かっこの中にマッチングする文字を
1089	1070	リストします: C<[aeiou]>.
1090	1071	これは C<a>, C<e>, C<i>, C<o>, C<u> のどれかにマッチングします。
1091	1072	他の文字クラスと同様、正確に一つの文字にマッチングします。
1092	1073	文字クラスで言及した文字で構成されるより長い文字列にマッチングするには、
1093	1074	文字クラスに L<量指定子\|perlre/Quantifiers> を付けます。
1094	1075	例えば、C<[aeiou]+> は一つまたはそれ以上の小文字英語母音に
1095	1076	マッチングします。
1096	1077
1097	1078	=begin original
1098	1079
1099	1080	Repeating a character in a character class has no
1100	1081	effect; it's considered to be in the set only once.
1101	1082
1102	1083	=end original
1103	1084
1104	1085	文字クラスの中で文字を繰り返しても効果はありません; 一度だけ現れたものと
1105	1086	考えられます。
1106	1087
1107	1088	=begin original
1108	1089
1109	1090	Examples:
1110	1091
1111	1092	=end original
1112	1093
1113	1094	例:
1114	1095
1115	1096	=begin original
1116	1097
1117	1098	"e" =~ /[aeiou]/ # Match, as "e" is listed in the class.
1118	1099	"p" =~ /[aeiou]/ # No match, "p" is not listed in the class.
1119	1100	"ae" =~ /^[aeiou]$/ # No match, a character class only matches
1120	1101	# a single character.
1121	1102	"ae" =~ /^[aeiou]+$/ # Match, due to the quantifier.
1122	1103
1123	1104	=end original
1124	1105
1125	1106	"e" =~ /[aeiou]/ # マッチング; "e" はクラスにある。
1126	1107	"p" =~ /[aeiou]/ # マッチングしない; "p" はクラスにない。
1127	1108	"ae" =~ /^[aeiou]$/ # マッチングしない; 一つの文字クラスは
1128	1109	# 一文字だけにマッチングする。
1129	1110	"ae" =~ /^[aeiou]+$/ # マッチング; 量指定子により。
1130	1111
1131	1112	-------
1132	1113
1133	1114	=begin original
1134	1115
1135	1116	* There are two exceptions to a bracketed character class matching a
1136	1117	single character only. Each requires special handling by Perl to make
1137	1118	things work:
1138	1119
1139	1120	=end original
1140	1121
1141	1122	* 大かっこ文字クラスは単一の文字にのみマッチングするということには
1142	1123	二つの例外があります。
1143	1124	それぞれは Perl がうまく動くために特別な扱いが必要です:
1144	1125
1145	1126	=over
1146	1127
1147	1128	=item *
1148	1129
1149	1130	=begin original
1150	1131
1151	1132	When the class is to match caselessly under C</i> matching rules, and a
1152	1133	character that is explicitly mentioned inside the class matches a
1153	1134	multiple-character sequence caselessly under Unicode rules, the class
1154	1135	will also match that sequence. For example, Unicode says that the
1155	1136	letter C<LATIN SMALL LETTER SHARP S> should match the sequence C<ss>
1156	1137	under C</i> rules. Thus,
1157	1138
1158	1139	=end original
1159	1140
1160	1141	クラスが C</i> マッチング規則の下で大文字小文字を無視したマッチングを
1161	1142	して、クラスの中で明示的に記述された文字が Unicode の規則の下で複数文字並びに
1162	1143	大文字小文字を無視してマッチングするとき、
1163	1144	そのクラスはその並びにもマッチングします。
1164	1145	例えば、Unicode は文字 C<LATIN SMALL LETTER SHARP S> は C</i> 規則の下では
1165	1146	並び C<ss> にマッチングするとしています。
1166	1147	従って:
1167	1148
1168	1149	'ss' =~ /\A\N{LATIN SMALL LETTER SHARP S}\z/i # Matches
1169	1150	'ss' =~ /\A[aeioust\N{LATIN SMALL LETTER SHARP S}]\z/i # Matches
1170	1151
1171	1152	=begin original
1172	1153
1173	1154	For this to happen, the class must not be inverted (see L</Negation>)
1174	1155	and the character must be explicitly specified, and not be part of a
1175	1156	multi-character range (not even as one of its endpoints). (L</Character
1176	1157	Ranges> will be explained shortly.) Therefore,
1177	1158
1178	1159	=end original
1179	1160
1180	1161	これが起きるためには、
1181	1162	そのクラスは否定 (L</Negation> 参照) ではなく、
1182	1163	その文字は明示的に指定され、複数文字範囲の一部
1183	1164	(たとえその端でも)でない必要があります。
1184	1165	(L</Character Ranges> はこのすぐ後で説明します。)
1185	1166	従って:
1186	1167
1187	1168	'ss' =~ /\A[\0-\x{ff}]\z/ui # Doesn't match
1188	1169	'ss' =~ /\A[\0-\N{LATIN SMALL LETTER SHARP S}]\z/ui # No match
1189	1170	'ss' =~ /\A[\xDF-\xDF]\z/ui # Matches on ASCII platforms, since
1190	1171	# \xDF is LATIN SMALL LETTER SHARP S,
1191	1172	# and the range is just a single
1192	1173	# element
1193	1174
1194	1175	=begin original
1195	1176
1196	1177	Note that it isn't a good idea to specify these types of ranges anyway.
1197	1178
1198	1179	=end original
1199	1180
1200	1181	どちらにしろこれらの種類の範囲を指定するのは良い考えではありません。
1201	1182
1202	1183	=item *
1203	1184
1204	1185	=begin original
1205	1186
1206	1187	Some names known to C<\N{...}> refer to a sequence of multiple characters,
1207	1188	instead of the usual single character. When one of these is included in
1208	1189	the class, the entire sequence is matched. For example,
1209	1190
1210	1191	=end original
1211	1192
1213	1194	C<\N{...}> で知られているいくつかの名前は、通常の単一の文字ではなく、
1214	1195	複数の文字の並びを参照します。
1215	1196	その一つがこのクラスに含まれている場合、並び全体がマッチングします。
1216	1197	例えば:
1217	1198
1218	1199	"\N{TAMIL LETTER KA}\N{TAMIL VOWEL SIGN AU}"
1219	1200	=~ / ^ [\N{TAMIL SYLLABLE KAU}] $ /x;
1220	1201
1221	1202	=begin original
1222	1203
1223	1204	matches, because C<\N{TAMIL SYLLABLE KAU}> is a named sequence
1224	1205	consisting of the two characters matched against. Like the other
1225	1206	instance where a bracketed class can match multiple characters, and for
1226	1207	similar reasons, the class must not be inverted, and the named sequence
1227	1208	may not appear in a range, even one where it is both endpoints. If
1228	1209	these happen, it is a fatal error if the character class is within the
1229	1210	scope of L<C<use re 'strict'>\|re/'strict' mode>, or within an extended
1230	1211	L<C<(?[...])>\|/Extended Bracketed Character Classes> class; otherwise
1231	1212	only the first code point is used (with a C<regexp>-type warning
1232	1213	raised).
1233	1214
1234	1215	=end original
1235	1216
1236	1217	これはマッチングします; なぜなら C<\N{TAMIL SYLLABLE KAU}> は
1237	1218	マッチングする二つの文字からなる名前付き並びだからです。
1238	1219	大かっこクラスが複数の文字にマッチングするその他の例と同じように、
1239	1220	そして同様の理由で、クラスは否定できず、
1240	1221	名前付き並びは、たとえそれ自身が両方の端点であっても範囲の中には置けません。
1241	1222	これらが起きたとき、文字クラスが
1242	1223	L<C<use re 'strict'>\|re/'strict' mode> のスコープ内か、
1243	1224	拡張された L<C<(?[...])>\|/Extended Bracketed Character Classes> クラスの
1244	1225	中の場合には致命的エラーになります;
1245	1226	さもなければ、最初の符号位置のみが使われます
1246	1227	(そして C<regexp> 系の警告が発生します)。
1247	1228
1248	1229	=back
1249	1230
1250	1231	=head3 Special Characters Inside a Bracketed Character Class
1251	1232
1252	1233	(かっこ付き文字クラスの中の特殊文字)
1253	1234
1254	1235	=begin original
1255	1236
1256	1237	Most characters that are meta characters in regular expressions (that
1257	1238	is, characters that carry a special meaning like C<.>, C<*>, or C<(>) lose
1258	1239	their special meaning and can be used inside a character class without
1259	1240	the need to escape them. For instance, C<[()]> matches either an opening
1260	1241	parenthesis, or a closing parenthesis, and the parens inside the character
1261	1242	class don't group or capture. Be aware that, unless the pattern is
1262	1243	evaluated in single-quotish context, variable interpolation will take
1263	1244	place before the bracketed class is parsed:
1264	1245
1265	1246	=end original
1266	1247
1267	1248	正規表現内でメタ文字(つまり、C<.>, C<*>, C<(> のように特別な意味を持つ
1268	1249	文字)となるほとんどの文字は文字クラス内ではエスケープしなくても特別な意味を
1269	1250	失うので、エスケープする必要はありません。
1270	1251	例えば、C<[()]> は開きかっこまたは閉じかっこにマッチングし、文字クラスの中の
1271	1252	かっこはグループや捕捉にはなりません。
1272	1253	パターンがシングルクォート風コンテキストの中で評価されない限り、
1273	1254	変数展開は大かっこクラスがパースされる前に行われることに注意してください:
1274	1255
1275	1256	$, = "\t\| ";
1276	1257	$a =~ m'[$,]'; # single-quotish: matches '$' or ','
1277	1258	$a =~ q{[$,]}; # same
1278	1259	$a =~ m/[$,]/; # double-quotish: matches "\t", "\|", or " "
1279	1260
1280	1261	=begin original
1281	1262
1282	1263	Characters that may carry a special meaning inside a character class are:
1283	1264	C<\>, C<^>, C<->, C<[> and C<]>, and are discussed below. They can be
1284	1265	escaped with a backslash, although this is sometimes not needed, in which
1285	1266	case the backslash may be omitted.
1286	1267
1287	1268	=end original
1288	1269
1289	1270	文字クラスの中でも特別な意味を持つ文字は:
1290	1271	C<\>, C<^>, C<->, C<[>, C<]> で、以下で議論します。
1291	1272	これらは逆スラッシュでエスケープできますが、不要な場合もあり、そのような
1292	1273	場合では逆スラッシュは省略できます。
1293	1274
1294	1275	=begin original
1295	1276
1296	1277	The sequence C<\b> is special inside a bracketed character class. While
1297	1278	outside the character class, C<\b> is an assertion indicating a point
1298	1279	that does not have either two word characters or two non-word characters
1299	1280	on either side, inside a bracketed character class, C<\b> matches a
1300	1281	backspace character.
1301	1282
1302	1283	=end original
1303	1284
1304	1285	シーケンス C<\b> は大かっこ文字クラスの内側では特別です。
1305	1286	文字クラスの外側では C<\b> は、その両側が二つの単語文字でも二つの非単語文字でもない
1306	1287	位置を示す表明ですが、大かっこ文字クラスの内側では C<\b> は後退文字に
1307	1288	マッチングします。
1308	1289
1309	1290	=begin original
1310	1291
1311	1292	The sequences
1312	1293	C<\a>,
1313	1294	C<\c>,
1314	1295	C<\e>,
1315	1296	C<\f>,
1316	1297	C<\n>,
1317	1298	C<\N{I<NAME>}>,
1318	1299	C<\N{U+I<hex char>}>,
1319	1300	C<\r>,
1320	1301	C<\t>,
1321	1302	and
1322	1303	C<\x>
1323	1304	are also special and have the same meanings as they do outside a
1324	1305	bracketed character class.
1325	1306
1326	1307	=end original
1327	1308
1328	1309	並び
1329	1310	C<\a>,
1330	1311	C<\c>,
1331	1312	C<\e>,
1332	1313	C<\f>,
1333	1314	C<\n>,
1334	1315	C<\N{I<NAME>}>,
1335	1316	C<\N{U+I<hex char>}>,
1336	1317	C<\r>,
1337	1318	C<\t>,
1338	1319	C<\x>
1339	1320	も特別で、大かっこ文字クラスの外側と同じ意味を持ちます。
1340	1321
1341	1322	=begin original
1342	1323
1343	1324	Also, a backslash followed by two or three octal digits is considered an octal
1344	1325	number.
1345	1326
1346	1327	=end original
1347	1328
1348	1329	また、逆スラッシュに引き続いて 2 または 3 桁の 8 進数字があると 8 進数として
1349	1330	扱われます。
1350	1331
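A short sketch of octal escapes inside a bracketed class, assuming an ASCII platform (where octal 101 is C<"A"> and octal 11 is the tab character); the variable names are illustrative.

```perl
use strict;
use warnings;

# Three octal digits: \101 is code point 65, "A" on ASCII platforms.
my $three_digits = "A" =~ /[\101]/ ? 1 : 0;

# Two octal digits: \11 is code point 9, the tab character.
my $two_digits = "\t" =~ /[\11]/ ? 1 : 0;

# The class holds only that one character.
my $other = "B" =~ /[\101]/ ? 1 : 0;

print "$three_digits $two_digits $other\n";
```
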
1351	1332	=begin original
1352	1333
1353	1334	A C<[> is not special inside a character class, unless it's the start of a
1354	1335	POSIX character class (see L</POSIX Character Classes> below). It normally does
1355	1336	not need escaping.
1356	1337
1357	1338	=end original
1358	1339
1359	1340	C<[> は、POSIX 文字クラス(後述の L</POSIX Character Classes> 参照)の
1360	1341	開始でない限りは文字クラスの中では特別ではありません。
1361	1342	これは普通エスケープは不要です。
1362	1343
1363	1344	=begin original
1364	1345
1365	1346	A C<]> is normally either the end of a POSIX character class (see
1366	1347	L</POSIX Character Classes> below), or it signals the end of the bracketed
1367	1348	character class. If you want to include a C<]> in the set of characters, you
1368	1349	must generally escape it.
1369	1350
1370	1351	=end original
1371	1352
1372	1353	C<]> は普通は POSIX 文字クラス(後述の L</POSIX Character Classes> 参照)の
1373	1354	終わりか、大かっこ文字クラスの終了を示すかどちらかです。
1374	1355	文字集合に C<]> を含める必要がある場合、一般的には
1375	1356	エスケープしなければなりません。
1376	1357
1377	1358	=begin original
1378	1359
1379	1360	However, if the C<]> is the I<first> (or the second if the first
1380	1361	character is a caret) character of a bracketed character class, it
1381	1362	does not denote the end of the class (as you cannot have an empty class)
1382	1363	and is considered part of the set of characters that can be matched without
1383	1364	escaping.
1384	1365
1385	1366	=end original
1386	1367
1387	1368	しかし、C<]> が大かっこ文字クラスの I<最初> (または最初の文字がキャレットなら
1388	1369	2 番目) の文字の場合、(空クラスを作ることはできないので)これはクラスの
1389	1370	終了を意味せず、エスケープなしでマッチングできる文字の集合の一部と
1390	1371	考えられます。
1391	1372
1392	1373	=begin original
1393	1374
1394	1375	Examples:
1395	1376
1396	1377	=end original
1397	1378
1398	1379	例:
1399	1380
1400	1381	=begin original
1401	1382
1402	1383	"+" =~ /[+?*]/ # Match, "+" in a character class is not special.
1403	1384	"\cH" =~ /[\b]/ # Match, \b inside a character class
1404	1385	# is equivalent to a backspace.
1405	1386	"]" =~ /[][]/ # Match, as the character class contains
1406	1387	# both [ and ].
1407	1388	"[]" =~ /[[]]/ # Match, the pattern contains a character class
1408	1389	# containing just [, and the character class is
1409	1390	# followed by a ].
1410	1391
1411	1392	=end original
1412	1393
1413	1394	"+" =~ /[+?*]/ # マッチング; 文字クラス内の "+" は特別ではない。
1414	1395	"\cH" =~ /[\b]/ # マッチング; 文字クラスの内側の \b は後退と
1415	1396	# 等価。
1416	1397	"]" =~ /[][]/ # マッチング; 文字クラスに [ と ] の両方を
1417	1398	# 含んでいる。
1418	1399	"[]" =~ /[[]]/ # マッチング; パターンは [ だけを含んでいる
1419	1400	# 文字クラスと、それに引き続く
1420	1401	# ] からなる。
1421	1402
1422	1403	=head3 Bracketed Character Classes and the C</xx> pattern modifier
1423	1404
1424	1405	=begin original
1425	1406
1426	1407	Normally SPACE and TAB characters have no special meaning inside a
1427	1408	bracketed character class; they are just added to the list of characters
1428	1409	matched by the class. But if the L<C</xx>\|perlre/E<sol>x and E<sol>xx>
1429	1410	pattern modifier is in effect, they are generally ignored and can be
1430	1411	added to improve readability. They can't be added in the middle of a
1431	1412	single construct:
1432	1413
1433	1414	=end original
1434	1415
1435	1416	通常、大かっこ文字クラスの内側では SPACE と TAB の文字は
1436	1417	特別な意味はありません; これらは単にクラスによってマッチングされる文字の
1437	1418	リストに加えられます。
1438	1419	しかし、L<C</xx>\|perlre/E<sol>x and E<sol>xx> パターン修飾子が有効の場合、
1439	1420	これらは一般的に無視されるので、可読性を向上させるために追加できます。
1440	1421	これらは単一の構文の中には追加できません:
1441	1422
1442	1423	/ [ \x{10 FFFF} ] /xx # WRONG!
1443	1424
1444	1425	=begin original
1445	1426
1446	1427	The SPACE in the middle of the hex constant is illegal.
1447	1428
1448	1429	=end original
1449	1430
1450	1431	16 進定数の中の SPACE は不正です。
1451	1432
1452	1433	=begin original
1453	1434
1454	1435	To specify a literal SPACE character, you can escape it with a
1455	1436	backslash, like:
1456	1437
1457	1438	=end original
1458	1439
1459	1440	リテラルな SPACE 文字を指定するには、次のように逆スラッシュで
1460	1441	エスケープします:
1461	1442
1462	1443	/[ a e i o u \ ]/xx
1463	1444
1464	1445	=begin original
1465	1446
1466	1447	This matches the English vowels plus the SPACE character.
1467	1448
1468	1449	=end original
1469	1450
1470	1451	これは英語の母音と SPACE 文字に一致します。
1471	1452
1472	1453	=begin original
1473	1454
1474	1455	For clarity, you should already have been using C<\t> to specify a
1475	1456	literal tab, and C<\t> is unaffected by C</xx>.
1476	1457
1477	1458	=end original
1478	1459
1479	1460	確認すると、リテラルなタブのためには既に C<\t> を使っているべきで、
1480	1461	C<\t> は C</xx> の影響を受けません。
1481	1462
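The behavior of C</xx> inside a bracketed class can be sketched as follows. This assumes Perl v5.26 or later, where the C</xx> modifier was introduced; the variable names are illustrative.

```perl
use v5.26;
use warnings;

# Unescaped blanks inside the class are ignored under /xx.
my $vowel         = "e" =~ /[ a e i o u ]/xx ? 1 : 0;
my $space_ignored = " " =~ /[ a e i o u ]/xx ? 1 : 0;

# A backslash-escaped blank is a literal SPACE in the class.
my $space_escaped = " " =~ /[ a e i o u \ ]/xx ? 1 : 0;

# \t is unaffected by /xx and still means a literal tab.
my $tab = "\t" =~ /[ a \t ]/xx ? 1 : 0;

say "$vowel $space_ignored $space_escaped $tab";
```
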
1482	1463	=head3 Character Ranges
1483	1464
1484	1465	(文字範囲)
1485	1466
1486	1467	=begin original
1487	1468
1488	1469	It is not uncommon to want to match a range of characters. Luckily, instead
1489	1470	of listing all characters in the range, one may use the hyphen (C<->).
1490	1471	If inside a bracketed character class you have two characters separated
1491	1472	by a hyphen, it's treated as if all characters between the two were in
1492	1473	the class. For instance, C<[0-9]> matches any ASCII digit, and C<[a-m]>
1493	1474	matches any lowercase letter from the first half of the ASCII alphabet.
1494	1475
1495	1476	=end original
1496	1477
1497	1478	文字のある範囲にマッチングしたいというのは珍しくありません。
1498	1479	幸運なことに、その範囲の文字を全て一覧に書く代わりに、ハイフン (C<->) を
1499	1480	使えます。
1500	1481	大かっこ文字クラスの内側で二つの文字がハイフンで区切られていると、
1501	1482	二つの文字の間の全ての文字がクラスに書かれているかのように扱われます。
1502	1483	例えば、C<[0-9]> は任意の ASCII 数字にマッチングし、C<[a-m]> は
1503	1484	ASCII アルファベットの前半分の小文字にマッチングします。
1504	1485
1505	1486	=begin original
1506	1487
1507	1488	Note that the two characters on either side of the hyphen are not
1508	1489	necessarily both letters or both digits. Any character is possible,
1509	1490	although not advisable. C<['-?]> contains a range of characters, but
1510	1491	most people will not know which characters that means. Furthermore,
1511	1492	such ranges may lead to portability problems if the code has to run on
1512	1493	a platform that uses a different character set, such as EBCDIC.
1513	1494
1514	1495	=end original
1515	1496
1516	1497	ハイフンのそれぞれの側の二つの文字は両方とも英字であったり両方とも
1517	1498	数字であったりする必要はないことに注意してください。
1518	1499	任意の文字が可能ですが、勧められません。
1519	1500	C<['-?]> は文字の範囲を含みますが、ほとんどの人はどの文字が含まれるか
1520	1501	分かりません。
1521	1502	さらに、このような範囲は、コードが EBCDIC のような異なった文字集合を使う
1522	1503	プラットフォームで実行されると移植性の問題を引き起こします。
1523	1504
1524	1505	=begin original
1525	1506
1526	1507	If a hyphen in a character class cannot syntactically be part of a range, for
1527	1508	instance because it is the first or the last character of the character class,
1528	1509	or if it immediately follows a range, the hyphen isn't special, and so is
1529	1510	considered a character to be matched literally. If you want a hyphen in
1530	1511	your set of characters to be matched and its position in the class is such
1531	1512	that it could be considered part of a range, you must escape that hyphen
1532	1513	with a backslash.
1533	1514
1534	1515	=end original
1535	1516
1536	1517	例えば文字クラスの最初または最後であったり、範囲の直後のために、文字クラスの
1537	1518	中のハイフンが文法的に範囲の一部となれない場合、ハイフンは特別ではなく、
1538	1519	リテラルにマッチングするべき文字として扱われます。
1539	1520	マッチングする文字の集合にハイフンを入れたいけれどもその位置が範囲の
1540	1521	一部として考えられる場合はハイフンを逆スラッシュで
1541	1522	エスケープしなければなりません。
1542	1523
1543	1524	=begin original
1544	1525
1545	1526	Examples:
1546	1527
1547	1528	=end original
1548	1529
1549	1530	例:
1550	1531
1551	1532	=begin original
1552	1533
1553	1534	[a-z] # Matches a character that is a lower case ASCII letter.
1554	1535	[a-fz] # Matches any letter between 'a' and 'f' (inclusive) or
1555	1536	# the letter 'z'.
1556	1537	[-z] # Matches either a hyphen ('-') or the letter 'z'.
1557	1538	[a-f-m] # Matches any letter between 'a' and 'f' (inclusive), the
1558	1539	# hyphen ('-'), or the letter 'm'.
1559	1540	['-?] # Matches any of the characters '()*+,-./0123456789:;<=>?
1560	1541	# (But not on an EBCDIC platform).
1561	1542	[\N{APOSTROPHE}-\N{QUESTION MARK}]
1562	1543	# Matches any of the characters '()*+,-./0123456789:;<=>?
1563	1544	# even on an EBCDIC platform.
1564	1545	[\N{U+27}-\N{U+3F}] # Same. (U+27 is "'", and U+3F is "?")
1565	1546
1566	1547	=end original
1567	1548
1568	1549	[a-z] # 小文字 ASCII 英字にマッチング。
1569	1550	[a-fz] # 'a' から 'f' の英字および 'z' の英字に
1570	1551	# マッチング。
1571	1552	[-z] # ハイフン ('-') または英字 'z' にマッチング。
1572	1553	[a-f-m] # 'a' から 'f' の英字、ハイフン ('-')、英字 'm' に
1573	1554	# マッチング。
1574	1555	['-?] # 文字 '()*+,-./0123456789:;<=>? のどれかにマッチング
1575	1556	# (しかし EBCDIC プラットフォームでは異なります)。
1576	1557	[\N{APOSTROPHE}-\N{QUESTION MARK}]
1577	1558	# たとえ EBCDIC プラットフォームでも '()*+,-./0123456789:;<=>?
1578	1559	# のいずれかの文字にマッチング。
1579	1560	[\N{U+27}-\N{U+3F}] # 同じ。 (U+27 は "'", U+3F は "?")
1580	1561
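The hyphen rules above can be checked with a few matches. This is a minimal sketch with illustrative variable names.

```perl
use strict;
use warnings;

# A leading hyphen cannot start a range, so it is literal.
my $leading = "-" =~ /[-z]/ ? 1 : 0;

# A hyphen right after a range is literal too: a-f, "-", and "m".
my $after_range = "-" =~ /[a-f-m]/ ? 1 : 0;
my $not_range   = "g" =~ /[a-f-m]/ ? 1 : 0;   # "g" is not in a-f

# Escaping forces a literal hyphen where a range would otherwise form.
my $escaped     = "-" =~ /[a\-z]/ ? 1 : 0;
my $no_range    = "m" =~ /[a\-z]/ ? 1 : 0;    # only "a", "-", and "z"

print "$leading $after_range $not_range $escaped $no_range\n";
```
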
1581	1562	=begin original
1582	1563
1583		As the final two examples above show, you can achieve portability to
1584	1565	non-ASCII platforms by using the C<\N{...}> form for the range
1585	1566	endpoints. These indicate that the specified range is to be interpreted
1586	1567	using Unicode values, so C<[\N{U+27}-\N{U+3F}]> means to match
1587	1568	C<\N{U+27}>, C<\N{U+28}>, C<\N{U+29}>, ..., C<\N{U+3D}>, C<\N{U+3E}>,
1588	1569	and C<\N{U+3F}>, whatever the native code point versions for those are.
1589	1570	These are called "Unicode" ranges. If either end is of the C<\N{...}>
1590	1571	form, the range is considered Unicode. A C<regexp> warning is raised
1591	1572	under C<S<"use re 'strict'">> if the other endpoint is specified
1592	1573	non-portably:
1593	1574
1594	1575	=end original
1595	1576
1596	1577	前述の最後の二つの例が示すように、範囲の端点に
1597	1578	C<\N{...}> 形式を使用することで、非 ASCII プラットフォームへの
1598	1579	移植性を実現できます。
1599	1580	これらは、指定された範囲が Unicode 値を使用して解釈されることを示しています;
1600	1581	したがって、C<[\N{U+27}-\N{U+3F}]>は、C<\N{U+27}>、C<\N{U+28}>、
1601	1582	C<\N{U+29}>、...、C<\N{U+3D}>、C<\N{U+3E}>、C<\N{U+3F}> に
1602	1583	マッチングすることを意味します;
1603	1584	これらのネイティブ符号位置のバージョンが何であっても一致します。
1604	1585	これらは "Unicode" 範囲と呼ばれます。
1605	1586	いずれかの端点が C<\N{...}> 形式の場合、範囲は Unicode と見なされます。
1606	1587	もう一方の端点が移植性がない形で指定されている場合、
1607	1588	C<S<"use re 'strict'">> の下で C<regexp> 警告が発生します:
1608	1589
1609	1590	[\N{U+00}-\x09] # Warning under re 'strict'; \x09 is non-portable
1610	1591	[\N{U+00}-\t] # No warning;
1611	1592
1612	1593	=begin original
1613	1594
1614	1595	Both of the above match the characters C<\N{U+00}>, C<\N{U+01}>, ...,
1615	1596	C<\N{U+08}>, C<\N{U+09}>, but the C<\x09> looks like it could be a
1616	1597	mistake so the warning is raised (under C<re 'strict'>) for it.
1617	1598
1618	1599	=end original
1619	1600
1620	1601	前述の両方とも文字 C<\N{U+00}> C<\N{U+01}>, ...
1621	1602	C<\N{U+08}>, C<\N{U+09}> にマッチングしますが、
1622	1603	C<\x09> は誤りのように見えるので、
1623	1604	(C<re 'strict'> の下で) 警告が発生します。
1624	1605
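To see which characters a Unicode range covers, one can enumerate the matching code points. The sketch below assumes an ASCII platform, where native code points 0 to 255 coincide with Unicode ones; the variable names are illustrative.

```perl
use strict;
use warnings;

# Collect every code point in 0..255 that the Unicode range accepts.
my @matched = grep { chr($_) =~ /[\N{U+27}-\N{U+3F}]/ } 0 .. 255;

my $count = scalar @matched;              # U+27 through U+3F inclusive
my $chars = join '', map { chr } @matched;

print "$count characters: $chars\n";
```

The range runs from U+27 to U+3F inclusive, so it contains 0x3F - 0x27 + 1 = 25 characters.
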
1625	1606	=begin original
1626	1607
1627	1608	Perl also guarantees that the ranges C<A-Z>, C<a-z>, C<0-9>, and any
1628	1609	subranges of these match what an English-only speaker would expect them
1629	1610	to match on any platform. That is, C<[A-Z]> matches the 26 ASCII
1630	1611	uppercase letters;
1631	1612	C<[a-z]> matches the 26 lowercase letters; and C<[0-9]> matches the 10
1632	1613	digits. Subranges, like C<[h-k]>, match correspondingly, in this case
1633	1614	just the four letters C<"h">, C<"i">, C<"j">, and C<"k">. This is the
1634	1615	natural behavior on ASCII platforms where the code points (ordinal
1635	1616	values) for C<"h"> through C<"k"> are consecutive integers (0x68 through
1636	1617	0x6B). But special handling to achieve this may be needed on platforms
1637	1618	with a non-ASCII native character set. For example, on EBCDIC
1638	1619	platforms, the code point for C<"h"> is 0x88, C<"i"> is 0x89, C<"j"> is
1639	1620	0x91, and C<"k"> is 0x92. Perl specially treats C<[h-k]> to exclude the
1640	1621	seven code points in the gap: 0x8A through 0x90. This special handling is
1641	1622	only invoked when the range is a subrange of one of the ASCII uppercase,
1642	1623	lowercase, and digit ranges, AND each end of the range is expressed
1643	1624	either as a literal, like C<"A">, or as a named character (C<\N{...}>,
1644	1625	including the C<\N{U+...}> form).
1645	1626
1646	1627	=end original
1647	1628
1648	1629	Perl はまた、範囲 C<A-Z>、C<a-z>、C<0-9>、およびこれらの部分範囲が、
1649	1630	英語のみの話者が一致すると予想する範囲とどのプラットフォームでも
1650	1631	一致することを保証します。
1651	1632	つまり、C<[A-Z]> はASCII の大文字 26 文字と一致します;
1652	1633	C<[a-z]> は小文字 26 文字と一致します;
1653	1634	C<[0-9]>は 10 の数字と一致します。
1654	1635	C<[h-k]> のような部分範囲もこれに対応して一致します;
1655	1636	この場合、4 文字 C<"h">、C<"i">、C<"j">、C<"k"> だけが一致します。
1656	1637	これは、C<"h"> から C<"k"> までの符号位置(序数値)が連続した
1657	1638	整数(0x68 から 0x6B)である ASCII プラットフォームでの自然な動作です。
1658	1639	しかし、非 ASCII ネイティブ文字集合を持つプラットフォームでは、
1659	1640	これを実現するための特別な処理が必要になるかもしれません。
1660	1641	たとえば、EBCDIC プラットフォームでは、C<"h"> のコードポイントは
1661	1642	0x88、C<"i"> は 0x89、C<"j"> は 0x91、C<"k"> は 0x92 です。
1662	1643	Perl は C<[h-k]> を特別に扱い、隙間にある七つの符号位置
1663	1644	(0x8A から 0x90)を除外します。
1664	1645	この特殊処理は、範囲が ASCII の大文字、小文字、数字の範囲の
1665	1646	いずれかの部分範囲であり、範囲の両端が C<"A"> のようなリテラル
1666	1647	または名前付き文字 (C<\N{...}>; C<\N{U+...}> 形式を含む) として表現されている
1667	1648	場合にのみ呼び出されます。
1668	1649
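The guarantee for subranges of C<A-Z>, C<a-z>, and C<0-9> can be sketched like this. The code enumerates all native code points from 0 to 255 and keeps those that C<[h-k]> accepts; the variable name is illustrative.

```perl
use strict;
use warnings;

# Keep every single-byte character accepted by the subrange [h-k].
my $matched = join '', grep { /[h-k]/ } map { chr } 0 .. 255;

print "[h-k] matches: $matched\n";
```

On an ASCII platform this is simply the four consecutive code points 0x68 through 0x6B. According to the text above, an EBCDIC platform gives the same four letters because Perl excludes the gap between C<"i"> and C<"j">.
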
1669	1650	=begin original
1670	1651
1671	1652	EBCDIC Examples:
1672	1653
1673	1654	=end original
1674	1655
1675	1656	EBCDIC の例:
1676	1657
1677	1658	[i-j] # Matches either "i" or "j"
1678	1659	[i-\N{LATIN SMALL LETTER J}] # Same
1679	1660	[i-\N{U+6A}] # Same
1680	1661	[\N{U+69}-\N{U+6A}] # Same
1681	1662	[\x{89}-\x{91}] # Matches 0x89 ("i"), 0x8A .. 0x90, 0x91 ("j")
1682	1663	[i-\x{91}] # Same
1683	1664	[\x{89}-j] # Same
1684	1665	[i-J] # Matches, 0x89 ("i") .. 0xC1 ("J"); special
1685	1666	# handling doesn't apply because range is mixed
1686	1667	# case
1687	1668
1688	1669	=head3 Negation
1689	1670
1690	1671	(否定)
1691	1672
1692	1673	=begin original
1693	1674
1694	1675	It is also possible to instead list the characters you do not want to
1695	1676	match. You can do so by using a caret (C<^>) as the first character in the
1696	1677	character class. For instance, C<[^a-z]> matches any character that is not a
1697	1678	lowercase ASCII letter, which therefore includes more than a million
1698	1679	Unicode code points. The class is said to be "negated" or "inverted".
1699	1680
1700	1681	=end original
1701	1682
1702	1683	代わりにマッチングしたくない文字の一覧を指定することも可能です。
1703	1684	文字クラスの先頭の文字としてキャレット (C<^>) を使うことで実現します。
1704	1685	例えば、C<[^a-z]> は小文字の ASCII 英字以外の文字にマッチングします;
1705	1686	従って 100 万種類以上の Unicode 符号位置が含まれます。
1706	1687	このクラスは「否定」("negated") や「反転」("inverted")と呼ばれます。
1707	1688
1708	1689	=begin original
1709	1690
1710	1691	This syntax makes the caret a special character inside a bracketed character
1711	1692	class, but only if it is the first character of the class. So if you want
1712	1693	the caret as one of the characters to match, either escape the caret or
1713	1694	else don't list it first.
1714	1695
1715	1696	=end original
1716	1697
1717	1698	この文法はキャレットを大かっこ文字クラスの内側で特別な文字にしますが、
1718	1699	クラスの最初の文字の場合のみです。
1719	1700	それでマッチングしたい文字の一つでキャレットを使いたい場合、キャレットを
1720	1701	エスケープするか、最初以外の位置に書いてください。
1721	1702
1722	1703	=begin original
1723	1704
1724	1705	In inverted bracketed character classes, Perl ignores the Unicode rules
1725	1706	that normally say that named sequences and certain characters should
1726	1707	match a sequence of multiple characters under caseless C</i>
1727	1708	matching. Following those rules could lead to highly confusing
1728	1709	situations:
1729	1710
1730	1711	=end original
1731	1712
1732	1713	否定大かっこ文字クラスでは、通常は大文字小文字を無視した C</i> マッチングの
1733	1714	下では名前付き並びとある種の文字が複数の文字並びにマッチングするという Unicode の規則を
1734	1715	Perl は無視します。
1735	1716	これらの規則に従うととても混乱する状況を引き起こすことになるからです:
1736	1717
1737	1718	"ss" =~ /^[^\xDF]+$/ui; # Matches!
1738	1719
1739	1720	=begin original
1740	1721
1741	1722	This should match any sequences of characters that aren't C<\xDF> nor
1742	1723	what C<\xDF> matches under C</i>. C<"s"> isn't C<\xDF>, but Unicode
1743	1724	says that C<"ss"> is what C<\xDF> matches under C</i>. So which one
1744	1725	"wins"? Do you fail the match because the string has C<ss> or accept it
1745	1726	because it has an C<s> followed by another C<s>? Perl has chosen the
1746	1727	latter. (See note in L</Bracketed Character Classes> above.)
1747	1728
1748	1729	=end original
1749	1730
1750	1731	これは C</i> の下では C<\xDF> または C<\xDF> にマッチングするもの以外の
1751	1732	任意の文字並びにマッチングするべきです。
1752	1733	C<"s"> は C<\xDF> ではありませんが、
1753	1734	C</i> の下では C<"ss"> は C<\xDF> がマッチングするものと Unicode は
1754	1735	言っています。
1755	1736	ではどちらが「勝つ」のでしょうか?
1756	1737	文字列は C<ss> だからマッチングに失敗するのでしょうか、
1757	1738	それともこれは C<s> の後にもう一つの C<s> があるから成功するのでしょうか?
1758	1739	Perl は後者を選択しました。
1759	1740	(前述の L</Bracketed Character Classes> を参照してください。)
1760	1741
1761	1742	=begin original
1762	1743
1763	1744	Examples:
1764	1745
1765	1746	=end original
1766	1747
1767	1748	例:
1768	1749
1769	1750	=begin original
1770	1751
1771	1752	"e" =~ /[^aeiou]/ # No match, the 'e' is listed.
1772	1753	"x" =~ /[^aeiou]/ # Match, as 'x' isn't a lowercase vowel.
1773	1754	"^" =~ /[^^]/ # No match, matches anything that isn't a caret.
1774	1755	"^" =~ /[x^]/ # Match, caret is not special here.
1775	1756
1776	1757	=end original
1777	1758
1778	1759	"e" =~ /[^aeiou]/ # マッチングしない; 'e' がある。
1779	1760	"x" =~ /[^aeiou]/ # マッチング; 'x' は小文字の母音ではない。
1780	1761	"^" =~ /[^^]/ # マッチングしない; キャレット以外全てにマッチング。
1781	1762	"^" =~ /[x^]/ # マッチング; キャレットはここでは特別ではない。
1782	1763
1783	1764	=head3 Backslash Sequences
1784	1765
1785	1766	(逆スラッシュシーケンス)
1786	1767
1787	1768	=begin original
1788	1769
1789	1770	You can put any backslash sequence character class (with the exception of
1790	1771	C<\N> and C<\R>) inside a bracketed character class, and it will act just
1791	1772	as if you had put all characters matched by the backslash sequence inside the
1792	1773	character class. For instance, C<[a-f\d]> matches any decimal digit, or any
1793	1774	of the lowercase letters between 'a' and 'f' inclusive.
1794	1775
1795	1776	=end original
1796	1777
1797	1778	大かっこ文字クラスの中に(C<\N> と C<\R> を例外として)逆スラッシュシーケンス
1798	1779	文字クラスを置くことができ、逆スラッシュシーケンスにマッチングする全ての
1799	1780	文字を文字クラスの中に置いたかのように動作します。
1800	1781	例えば、C<[a-f\d]> は任意の 10 進数字、あるいは 'a' から 'f' までの小文字に
1801	1782	マッチングします。
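
As a rough illustration of the paragraph above, the following sketch filters a few sample strings through C<[a-f\d]>. The sample list and the variable names are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical sample input: single-character strings to classify.
my @samples = qw(a f g 7 F _);

# [a-f\d] behaves as if every character matched by \d had been
# listed in the class next to the range a-f.
my @hits = grep { /\A[a-f\d]\z/ } @samples;

print "@hits\n";    # prints "a f 7"
```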
1802	1783
1803	1784	=begin original
1804	1785
1805	1786	C<\N> within a bracketed character class must be of the forms C<\N{I<name>}>
1806	1787	or C<\N{U+I<hex char>}>, and NOT be the form that matches non-newlines,
1807	1788	for the same reason that a dot C<.> inside a bracketed character class loses
1808	1789	its special meaning: it matches nearly anything, which generally isn't what you
1809	1790	want to happen.
1810	1791
1811	1792	=end original
1812	1793
1813	1794	大かっこ文字クラスの中のドット C<.> が特別な意味を持たないのと同じ理由で、
1814	1795	大かっこ文字クラスの中の C<\N> は C<\N{I<name>}> または
1815	1796	C<\N{U+I<hex char>}> の形式で、かつ非改行マッチング形式でない形でなければ
1816	1797	なりません: これはほとんど何でもマッチングするので、一般的には起こって
1817	1798	欲しいことではありません。
1818	1799
1819	1800	=begin original
1820	1801
1821	1802	Examples:
1822	1803
1823	1804	=end original
1824	1805
1825	1806	例:
1826	1807
1827	1808	=begin original
1828	1809
1829	1810	/[\p{Thai}\d]/ # Matches a character that is either a Thai
1830	1811	# character, or a digit.
1831	1812	/[^\p{Arabic}()]/ # Matches a character that is neither an Arabic
1832	1813	# character, nor a parenthesis.
1833	1814
1834	1815	=end original
1835	1816
1836	1817	/[\p{Thai}\d]/ # タイ文字または数字の文字に
1837	1818	# マッチングする。
1838	1819	/[^\p{Arabic}()]/ # アラビア文字でもかっこでもない文字に
1839	1820	# マッチングする。
1840	1821
1841	1822	=begin original
1842	1823
1843	1824	Backslash sequence character classes cannot form one of the endpoints
1844	1825	of a range. Thus, you can't say:
1845	1826
1846	1827	=end original
1847	1828
1848	1829	逆スラッシュシーケンス文字クラスは範囲の端点の一つにはできません。
1849	1830	従って、以下のようにはできません:
1850	1831
1851	1832	/[\p{Thai}-\d]/ # Wrong!
1852	1833
1853	1834	=head3 POSIX Character Classes
1854	1835	X<character class> X<\p> X<\p{}>
1855	1836	X<alpha> X<alnum> X<ascii> X<blank> X<cntrl> X<digit> X<graph>
1856	1837	X<lower> X<print> X<punct> X<space> X<upper> X<word> X<xdigit>
1857	1838
1858	1839	(POSIX 文字クラス)
1859	1840
1860	1841	=begin original
1861	1842
1862	1843	POSIX character classes have the form C<[:class:]>, where I<class> is the
1863	1844	name, and the C<[:> and C<:]> delimiters. POSIX character classes only appear
1864	1845	I<inside> bracketed character classes, and are a convenient and descriptive
1865	1846	way of listing a group of characters.
1866	1847
1867	1848	=end original
1868	1849
1869	1850	POSIX 文字クラスは C<[:class:]> の形式で、I<class> は名前、C<[:> と C<:]> は
1870	1851	デリミタです。
1871	1852	POSIX 文字クラスは大かっこ文字クラスの I<内側> にのみ現れ、文字のグループを
1872	1853	一覧するのに便利で記述的な方法です。
1873	1854
1874	1855	=begin original
1875	1856
1876	1857	Be careful about the syntax,
1877	1858
1878	1859	=end original
1879	1860
1880	1861	文法について注意してください、
1881	1862
1882	1863	# Correct:
1883	1864	$string =~ /[[:alpha:]]/
1884	1865
1885	1866	# Incorrect (will warn):
1886	1867	$string =~ /[:alpha:]/
1887	1868
1888	1869	=begin original
1889	1870
1890	1871	The latter pattern would be a character class consisting of a colon,
1891	1872	and the letters C<a>, C<l>, C<p> and C<h>.
1892	1873	POSIX character classes can be part of a larger bracketed character class.
1893	1874	For example,
1894	1875
1895	1876	=end original
1896	1877
1897	1878	後者のパターンは、コロンおよび C<a>, C<l>, C<p>, C<h> の文字からなる
1898	1879	文字クラスです。
1899	1880	これら文字クラスはより大きな大かっこ文字クラスの一部にできます。
1900	1881	例えば、
1901	1882
1902	1883	[01[:alpha:]%]
1903	1884
1904	1885	=begin original
1905	1886
1906	1887	is valid and matches '0', '1', any alphabetic character, and the percent sign.
1907	1888
1908	1889	=end original
1909	1890
1910	1891	これは妥当で、'0'、'1'、任意の英字、パーセントマークにマッチングします。
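
The difference between the correct and the incorrect syntax can be checked with a short script. This is only a sketch; the sample strings are invented, and the C<regexp> warning is switched off solely so that the incorrect form runs quietly.

```perl
use strict;
use warnings;

# Hypothetical sample strings.
my @samples = ("0", "1", "x", "%", "10%", "2", "a-b");

# Correct: the POSIX class sits inside a bracketed character class.
my @ok = grep { /\A[01[:alpha:]%]+\z/ } @samples;
print "@ok\n";     # prints "0 1 x % 10%"

# Incorrect: /[:alpha:]/ is an ordinary class of ':', 'a', 'l', 'p', 'h'.
my @odd;
{
    no warnings 'regexp';
    @odd = grep { /\A[:alpha:]\z/ } qw(a l p h : z);
}
print "@odd\n";    # prints "a l p h :"
```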
1911	1892
1912	1893	=begin original
1913	1894
1914	1895	Perl recognizes the following POSIX character classes:
1915	1896
1916	1897	=end original
1917	1898
1918	1899	Perl は以下の POSIX 文字クラスを認識します:
1919	1900
1920	1901	=begin original
1921	1902
1922		alpha Any alphabetical character (~~e.g.,~~ [A-Za-z]).
	1903	alpha Any alphabetical character ("[A-Za-z]").
1923		alnum Any alphanumeric character (~~e.g.,~~ [A-Za-z0-9]).
	1904	alnum Any alphanumeric character ("[A-Za-z0-9]").
1924	1905	ascii Any character in the ASCII character set.
1925	1906	blank A GNU extension, equal to a space or a horizontal tab ("\t").
1926	1907	cntrl Any control character. See Note [2] below.
1927		digit Any decimal digit (~~e.g.,~~ [0-9]), equivalent to "\d".
	1908	digit Any decimal digit ("[0-9]"), equivalent to "\d".
1928	1909	graph Any printable character, excluding a space. See Note [3] below.
1929		lower Any lowercase character (~~e.g.,~~ [a-z]).
	1910	lower Any lowercase character ("[a-z]").
1930	1911	print Any printable character, including a space. See Note [4] below.
1931	1912	punct Any graphical character excluding "word" characters. Note [5].
1932	1913	space Any whitespace character. "\s" including the vertical tab
1933	1914	("\cK").
1934		upper Any uppercase character (~~e.g.,~~ [A-Z]).
	1915	upper Any uppercase character ("[A-Z]").
1935		word A Perl extension (~~e.g.,~~ [A-Za-z0-9_]), equivalent to "\w".
	1916	word A Perl extension ("[A-Za-z0-9_]"), equivalent to "\w".
1936		xdigit Any hexadecimal digit (~~e.g.,~~ [0-9a-fA-F])~~. Note [7]~~.
	1917	xdigit Any hexadecimal digit ("[0-9a-fA-F]").
1937	1918
1938	1919	=end original
1939	1920
1940		alpha 任意の英字 (例: [A-Za-z])。
	1921	alpha 任意の英字 ("[A-Za-z]")。
1941		alnum 任意の英数字。(例: [A-Za-z0-9])
	1922	alnum 任意の英数字。("[A-Za-z0-9]")
1942	1923	ascii 任意の ASCII 文字集合の文字。
1943		blank GNU 拡張; スペースまたは水平タブ (\t) と同じ。
	1924	blank GNU 拡張; スペースまたは水平タブ ("\t") と同じ。
1944	1925	cntrl 任意の制御文字。後述の [2] 参照。
1945		digit 任意の 10 進数字 (例: [0-9]); "\d" と等価。
	1926	digit 任意の 10 進数字 ("[0-9]"); "\d" と等価。
1946	1927	graph 任意の表示文字; スペースを除く。後述の [3] 参照。
1947		lower 任意の小文字 (例: [a-z])。
	1928	lower 任意の小文字 ("[a-z]")。
1948	1929	print 任意の表示文字; スペースを含む。後述の [4] 参照。
1949	1930	punct 任意の「単語」文字を除く表示文字。[5] 参照。
1950	1931	space 任意の空白文字。垂直タブ ("\cK") を含む "\s"。
1951		upper 任意の大文字 (例: [A-Z])。
	1932	upper 任意の大文字 ("[A-Z]")。
1952		word Perl 拡張 (例: [A-Za-z0-9_]); "\w" と等価。
	1933	word Perl 拡張 ("[A-Za-z0-9_]"); "\w" と等価。
1953		xdigit 任意の 16 進文字 (例: [0-9a-fA-F])~~。[7] 参照~~。
	1934	xdigit 任意の 16 進文字 ("[0-9a-fA-F]")。
1954	1935
1955	1936	=begin original
1956	1937
1957	1938	Like the L<Unicode properties|/Unicode Properties>, most of the POSIX
1958	1939	properties match the same regardless of whether case-insensitive (C</i>)
1959	1940	matching is in effect or not. The two exceptions are C<[:upper:]> and
1960	1941	C<[:lower:]>. Under C</i>, they each match the union of C<[:upper:]> and
1961	1942	C<[:lower:]>.
1962	1943
1963	1944	=end original
1964	1945
1965	1946	L<Unicode properties|/Unicode Properties> と同様、
1966	1947	ほとんどの POSIX 特性は、大文字小文字無視 (C</i>) が有効かどうかに関わらず
1967	1948	同じものにマッチングします。
1968	1949	二つの例外は C<[:upper:]> と C<[:lower:]> です。
1969	1950	C</i> の下では、これらそれぞれ C<[:upper:]> と C<[:lower:]> の和集合に
1970	1951	マッチングします。
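
A minimal sketch of the C</i> exception described above; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Without /i, [:upper:] does not match a lowercase letter.
my $plain = "a" =~ /\A[[:upper:]]\z/  ? 1 : 0;

# Under /i, [:upper:] and [:lower:] each match the union of the two.
my $fold  = "a" =~ /\A[[:upper:]]\z/i ? 1 : 0;

# Other POSIX classes are unaffected by /i.
my $digit = "A" =~ /\A[[:digit:]]\z/i ? 1 : 0;

print "$plain $fold $digit\n";    # prints "0 1 0"
```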
1971	1952
1972	1953	=begin original
1973	1954
1974	1955	Most POSIX character classes have two Unicode-style C<\p> property
1975	1956	counterparts. (They are not official Unicode properties, but Perl extensions
1976	1957	derived from official Unicode properties.) The table below shows the relation
1977	1958	between POSIX character classes and these counterparts.
1978	1959
1979	1960	=end original
1980	1961
1981	1962	ほとんどの POSIX 文字クラスには、対応する二つの Unicode 式の C<\p> 特性が
1982	1963	あります。
1983	1964	(これは公式 Unicode 特性ではなく、公式 Unicode 特性から派生した Perl
1984	1965	エクステンションです。)
1985	1966	以下の表は POSIX 文字クラスと対応するものとの関連を示します。
1986	1967
1987	1968	=begin original
1988	1969
1989	1970	One counterpart, in the column labelled "ASCII-range Unicode" in
1990	1971	the table, matches only characters in the ASCII character set.
1991	1972
1992	1973	=end original
1993	1974
1994	1975	対応物の一つである、表で "ASCII-range Unicode" と書かれた列のものは、
1995	1976	ASCII 文字集合の文字にのみマッチングします。
1996	1977
1997	1978	=begin original
1998	1979
1999	1980	The other counterpart, in the column labelled "Full-range Unicode", matches any
2000	1981	appropriate characters in the full Unicode character set. For example,
2001	1982	C<\p{Alpha}> matches not just the ASCII alphabetic characters, but any
2002	1983	character in the entire Unicode character set considered alphabetic.
2003	1984	An entry in the column labelled "backslash sequence" is a (short)
2004	1985	equivalent.
2005	1986
2006	1987	=end original
2007	1988
2008	1989	もう一つの対応物である、"Full-range Unicode" と書かれた列のものは、
2009	1990	Unicode 文字集合全体の中の適切な任意の文字にマッチングします。
2010	1991	例えば、C<\p{Alpha}> は単に ASCII アルファベット文字だけでなく、
2011	1992	Unicode 文字集合全体の中からアルファベットと考えられる任意の文字に
2012	1993	マッチングします。
2013	1994	"backslash sequence" の列は (短い) 同義語です。
2014	1995
2015	1996	 [[:...:]]  ASCII-range      Full-range         backslash  Note
2016	1997	            Unicode          Unicode            sequence
2017	1998	 --------------------------------------------------------------
2018	1999	 alpha      \p{PosixAlpha}   \p{XPosixAlpha}
2019	2000	 alnum      \p{PosixAlnum}   \p{XPosixAlnum}
2020	2001	 ascii      \p{ASCII}
2021	2002	 blank      \p{PosixBlank}   \p{XPosixBlank}    \h         [1]
2022	2003	                             or \p{HorizSpace}             [1]
2023	2004	 cntrl      \p{PosixCntrl}   \p{XPosixCntrl}               [2]
2024	2005	 digit      \p{PosixDigit}   \p{XPosixDigit}    \d
2025	2006	 graph      \p{PosixGraph}   \p{XPosixGraph}               [3]
2026	2007	 lower      \p{PosixLower}   \p{XPosixLower}
2027	2008	 print      \p{PosixPrint}   \p{XPosixPrint}               [4]
2028	2009	 punct      \p{PosixPunct}   \p{XPosixPunct}               [5]
2029	2010	            \p{PerlSpace}    \p{XPerlSpace}     \s         [6]
2030	2011	 space      \p{PosixSpace}   \p{XPosixSpace}               [6]
2031	2012	 upper      \p{PosixUpper}   \p{XPosixUpper}
2032	2013	 word       \p{PosixWord}    \p{XPosixWord}     \w
2033		 xdigit     \p{PosixXDigit}  \p{XPosixXDigit}              ~~[7]~~
	2014	 xdigit     \p{PosixXDigit}  \p{XPosixXDigit}
2034	2015
2035	2016	=over 4
2036	2017
2037	2018	=item [1]
2038	2019
2039	2020	=begin original
2040	2021
2041	2022	C<\p{Blank}> and C<\p{HorizSpace}> are synonyms.
2042	2023
2043	2024	=end original
2044	2025
2045	2026	C<\p{Blank}> と C<\p{HorizSpace}> は同義語です。
2046	2027
2047	2028	=item [2]
2048	2029
2049	2030	=begin original
2050	2031
2051	2032	Control characters don't produce output as such, but instead usually control
2052	2033	the terminal somehow: for example, newline and backspace are control characters.
2053	2034	On ASCII platforms, in the ASCII range, characters whose code points are
2054	2035	between 0 and 31 inclusive, plus 127 (C<DEL>) are control characters; on
2055	2036	EBCDIC platforms, their counterparts are control characters.
2056	2037
2057	2038	=end original
2058	2039
2059	2040	制御文字はそれ自体は出力されず、普通は何か端末を制御します: 例えば
2060	2041	改行と後退は制御文字です。
2061	2042	ASCII プラットフォームで、ASCII の範囲では、符号位置が 0 から 31 までの
2062	2043	範囲の文字および 127 (C<DEL>) が制御文字です;
2063	2044	EBCDIC プラットフォームでは、対応するものは制御文字です。
2064	2045
2065	2046	=item [3]
2066	2047
2067	2048	=begin original
2068	2049
2069	2050	Any character that is I<graphical>, that is, visible. This class consists
2070	2051	of all alphanumeric characters and all punctuation characters.
2071	2052
2072	2053	=end original
2073	2054
2074	2055	I<graphical>、つまり見える文字。
2075	2056	このクラスは全ての英数字と全ての句読点文字からなります。
2076	2057
2077	2058	=item [4]
2078	2059
2079	2060	=begin original
2080	2061
2081	2062	All printable characters, which is the set of all graphical characters
2082	2063	plus those whitespace characters which are not also controls.
2083	2064
2084	2065	=end original
2085	2066
2086	2067	全ての表示可能な文字; 全ての graphical 文字に加えて制御文字でない空白文字。
2087	2068
2088	2069	=item [5]
2089	2070
2090	2071	=begin original
2091	2072
2092	2073	C<\p{PosixPunct}> and C<[[:punct:]]> in the ASCII range match all
2093	2074	non-controls, non-alphanumeric, non-space characters:
2094	2075	C<[-!"#$%&'()*+,./:;<=E<gt>?@[\\\]^_`{|}~]> (although if a locale is in effect,
2095	2076	it could alter the behavior of C<[[:punct:]]>).
2096	2077
2097	2078	=end original
2098	2079
2099	2080	ASCII の範囲の C<\p{PosixPunct}> と C<[[:punct:]]> は全ての非制御、非英数字、
2100	2081	非空白文字にマッチングします:
2101	2082	C<[-!"#$%&'()*+,./:;<=E<gt>?@[\\\]^_`{|}~]> (しかしロケールが有効なら、
2102	2083	C<[[:punct:]]> の振る舞いが変わることがあります)。
2103	2084
2104	2085	=begin original
2105	2086
2106	2087	The similarly named property, C<\p{Punct}>, matches a somewhat different
2107	2088	set in the ASCII range, namely
2108	2089	C<[-!"#%&'()*,./:;?@[\\\]_{}]>. That is, it is missing the nine
2109	2090	characters C<[$+E<lt>=E<gt>^`|~]>.
2110	2091	This is because Unicode splits what POSIX considers to be punctuation into two
2111	2092	categories, Punctuation and Symbols.
2112	2093
2113	2094	=end original
2114	2095
2115	2096	似たような名前の特性 C<\p{Punct}> は、ASCII 範囲の異なる集合である
2116	2097	C<[-!"#%&'()*,./:;?@[\\\]_{}]> にマッチングします。
2117	2098	つまり、C<[$+E<lt>=E<gt>^`|~]> の 9 文字が欠けています。
2118	2099	これは、Unicode は POSIX が句読点と考えるものを二つのカテゴリ
2119	2100	Punctuation と Symbols に分けているからです。
2120	2101
2121	2102	=begin original
2122	2103
2123	2104	C<\p{XPosixPunct}> and (under Unicode rules) C<[[:punct:]]>, match what
2124	2105	C<\p{PosixPunct}> matches in the ASCII range, plus what C<\p{Punct}>
2125	2106	matches. This is different than strictly matching according to
2126	2107	C<\p{Punct}>. Another way to say it is that
2127	2108	if Unicode rules are in effect, C<[[:punct:]]> matches all characters
2128	2109	that Unicode considers punctuation, plus all ASCII-range characters that
2129	2110	Unicode considers symbols.
2130	2111
2131	2112	=end original
2132	2113
2133	2114	C<\p{XPosixPunct}> と (Unicode の規則の下での) C<[[:punct:]]> は、
2134	2115	ASCII の範囲で C<\p{PosixPunct}> がマッチングする物に加えて、
2135	2116	C<\p{Punct}> がマッチングする物にマッチングします。
2136	2117	これは C<\p{Punct}> に従って正確にマッチングする物と異なります。
2137	2118	Unicode 規則が有効な場合のもう一つの言い方は、C<[[:punct:]]> は Unicode が
2138	2119	句読点として扱うものに加えて、Unicode が "symbols" として扱う ASCII 範囲の
2139	2120	全ての文字にマッチングします。
2140	2121
2141	2122	=item [6]
2142	2123
2143	2124	=begin original
2144	2125
2145	2126	C<\p{XPerlSpace}> and C<\p{Space}> match identically starting with Perl
2146	2127	v5.18. In earlier versions, these differ only in that in non-locale
2147	2128	matching, C<\p{XPerlSpace}> did not match the vertical tab, C<\cK>.
2148	2129	Same for the two ASCII-only range forms.
2149	2130
2150	2131	=end original
2151	2132
2152	2133	C<\p{XPerlSpace}> と C<\p{Space}> は、Perl v5.18 からは同じように
2153	2134	マッチングします。
2154	2135	以前のバージョンでは、これらの違いは、非ロケールマッチングでは
2155	2136	C<\p{XPerlSpace}> が垂直タブ C<\cK> にマッチングしなかったということだけです。
2156	2137	二つの ASCII のみの範囲の形式についても同様です。
2157	2138
2158		=item [7]
2159
2160		=begin original
2161
2162		Unlike C<[[:digit:]]> which matches digits in many writing systems, such
2163		as Thai and Devanagari, there are currently only two sets of hexadecimal
2164		digits, and it is unlikely that more will be added. This is because you
2165		not only need the ten digits, but also the six C<[A-F]> (and C<[a-f]>)
2166		to correspond. That means only the Latin script is suitable for these,
2167		and Unicode has only two sets of these, the familiar ASCII set, and the
2168		fullwidth forms starting at U+FF10 (FULLWIDTH DIGIT ZERO).
2169
2170		=end original
2171
2172		タイ文字やデバナーガリ文字のように多くの書記体系の数字にマッチングする
2173		C<[[:digit:]]> と異なり、16 進数の二つの集合だけで、これ以上追加されることは
2174		おそらくありません。
2175		これは、対応するのに 10 の数字だけでなく、6 個の C<[A-F]> (および C<[a-f]>) も
2176		必要だからです。
2177		これは、Latin 用字のみがこれらに適合していて、
2178		Unicode はこれらの二つの集合、つまり慣れ親しんだ
2179		ASCII 集合と、U+FF10 (FULLWIDTH DIGIT ZERO) から始まる全角形式のみを
2180		持つということです。
2181
2182	2139	=back
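
The difference between C<\p{PosixPunct}> and C<\p{Punct}> described in note [5] can be reproduced by classifying the ASCII graphical characters. The following is a sketch only; the variable names are arbitrary.

```perl
use strict;
use warnings;

# The ASCII graphical characters, "!" through "~".
my @ascii = map { chr } 0x21 .. 0x7E;

my @posix   = grep { /\p{PosixPunct}/ } @ascii;
my %unicode = map { $_ => 1 } grep { /\p{Punct}/ } @ascii;

# Characters that POSIX treats as punctuation but that Unicode
# classifies as Symbols rather than Punctuation.
my $symbols = join "", grep { !$unicode{$_} } @posix;

printf "%d %d %s\n", scalar @posix, scalar keys %unicode, $symbols;
# prints "32 23 $+<=>^`|~"
```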
2183	2140
2184	2141	=begin original
2185	2142
2186	2143	There are various other synonyms that can be used besides the names
2187	2144	listed in the table. For example, C<\p{XPosixAlpha}> can be written as
2188	2145	C<\p{Alpha}>. All are listed in
2189	2146	L<perluniprops/Properties accessible through \p{} and \P{}>.
2190	2147
2191	2148	=end original
2192	2149
2193	2150	表に挙げられている名前以外にも様々なその他の同義語が使えます。
2194	2151	例えば、C<\p{XPosixAlpha}> は C<\p{Alpha}> と書けます。
2195	2152	全ての一覧は
2196	2153	L<perluniprops/Properties accessible through \p{} and \P{}> に
2197	2154	あります。
2198	2155
2199	2156	=begin original
2200	2157
2201	2158	Both the C<\p> counterparts always assume Unicode rules are in effect.
2202	2159	On ASCII platforms, this means they assume that the code points from 128
2203	2160	to 255 are Latin-1, and that means that using them under locale rules is
2204	2161	unwise unless the locale is guaranteed to be Latin-1 or UTF-8. In contrast, the
2205	2162	POSIX character classes are useful under locale rules. They are
2206	2163	affected by the actual rules in effect, as follows:
2207	2164
2208	2165	=end original
2209	2166
2210	2167	C<\p> に対応するものの両方は常に Unicode の規則が有効であることを仮定します。
2211	2168	これは、ASCII プラットフォームでは、128 から 255 の符号位置は
2212	2169	Latin-1 であることを仮定するということで、ロケールの規則の下で
2213	2170	これらを使うということは、ロケールが Latin-1 か UTF-8 であることが
2214	2171	保証されていない限り賢明ではないということです。
2215	2172	一方、POSIX 文字クラスはロケールの規則の下で有用です。
2216	2173	これらは次のように、実際に有効な規則に影響を受けます:
2217	2174
2218	2175	=over
2219	2176
2220	2177	=item If the C</a> modifier is in effect ...
2221	2178
2222	2179	(C</a> が有効なら...)
2223	2180
2224	2181	=begin original
2225	2182
2226	2183	Each of the POSIX classes matches exactly the same as their ASCII-range
2227	2184	counterparts.
2228	2185
2229	2186	=end original
2230	2187
2231	2188	それぞれの POSIX クラスは ASCII の範囲で対応する正確に同じものに
2232	2189	マッチングします。
2233	2190
2234	2191	=item otherwise ...
2235	2192
2236	2193	(さもなければ ...)
2237	2194
2238	2195	=over
2239	2196
2240	2197	=item For code points above 255 ...
2241	2198
2242	2199	(256 以上の符号位置では ...)
2243	2200
2244	2201	=begin original
2245	2202
2246	2203	The POSIX class matches the same as its Full-range counterpart.
2247	2204
2248	2205	=end original
2249	2206
2250	2207	POSIX クラスはその Full の範囲で対応する同じものにマッチングします。
2251	2208
2252	2209	=item For code points below 256 ...
2253	2210
2254	2211	(255 以下の符号位置では ...)
2255	2212
2256	2213	=over
2257	2214
2258	2215	=item if locale rules are in effect ...
2259	2216
2260	2217	(ロケール規則が有効なら ...)
2261	2218
2262	2219	=begin original
2263	2220
2264	2221	The POSIX class matches according to the locale, except:
2265	2222
2266	2223	=end original
2267	2224
2268	2225	POSIX クラスはロケールに従ってマッチングします; 例外は:
2269	2226
2270	2227	=over
2271	2228
2272	2229	=item C<word>
2273	2230
2274	2231	=begin original
2275	2232
2276	2233	also includes the platform's native underscore character, no matter what
2277	2234	the locale is.
2278	2235
2279	2236	=end original
2280	2237
2281	2238	ロケールが何であるかに関わらず、プラットフォームのネイティブな
2282	2239	下線文字も含みます。
2283	2240
2284	2241	=item C<ascii>
2285	2242
2286	2243	=begin original
2287	2244
2288	2245	on platforms that don't have the POSIX C<ascii> extension, this matches
2289	2246	just the platform's native ASCII-range characters.
2290	2247
2291	2248	=end original
2292	2249
2293	2250	POSIX C<ascii> 拡張を持たないプラットフォームでは、
2294	2251	これは単にプラットフォームのネイティブな ASCII の範囲の文字に
2295	2252	マッチングします。
2296	2253
2297	2254	=item C<blank>
2298	2255
2299	2256	=begin original
2300	2257
2301	2258	on platforms that don't have the POSIX C<blank> extension, this matches
2302	2259	just the platform's native tab and space characters.
2303	2260
2304	2261	=end original
2305	2262
2306	2263	POSIX C<blank> 拡張を
2307	2264	持たないプラットフォームでは、
2308	2265	これは単にプラットフォームのネイティブなタブとスペース文字に
2309	2266	マッチングします。
2310	2267
2311	2268	=back
2312	2269
2313	2270	=item if, instead, Unicode rules are in effect ...
2314	2271
2315	2272	(そうではなく、Unicode 規則が有効なら ...)
2316	2273
2317	2274	=begin original
2318	2275
2319	2276	The POSIX class matches the same as the Full-range counterpart.
2320	2277
2321	2278	=end original
2322	2279
2323	2280	POSIX クラスは Full の範囲の対応する同じものにマッチングします。
2324	2281
2325	2282	=item otherwise ...
2326	2283
2327	2284	(さもなければ ...)
2328	2285
2329	2286	=begin original
2330	2287
2331	2288	The POSIX class matches the same as the ASCII range counterpart.
2332	2289
2333	2290	=end original
2334	2291
2335	2292	POSIX クラスは ASCII の範囲の同じものにマッチングします。
2336	2293
2337	2294	=back
2338	2295
2339	2296	=back
2340	2297
2341	2298	=back
2342	2299
2343	2300	=begin original
2344	2301
2345	2302	Which rules apply are determined as described in
2346	2303	L<perlre/Which character set modifier is in effect?>.
2347	2304
2348	2305	=end original
2349	2306
2350	2307	どの規則を適用するかは L<perlre/Which character set modifier is in effect?> で
2351	2308	記述されている方法で決定されます。
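
The non-locale branches of the rules above can be sketched as follows. The locale case is not exercised here because its result depends on the locale in use; the hash keys are names chosen for this example only.

```perl
use strict;
use warnings;

my $e_acute = "\xE9";       # LATIN SMALL LETTER E WITH ACUTE, below 256
my $alpha   = "\x{3B1}";    # GREEK SMALL LETTER ALPHA, above 255

my %result = (
    a_low  => ($e_acute =~ /\A[[:alpha:]]\z/a ? 1 : 0),  # /a: ASCII-range
    u_low  => ($e_acute =~ /\A[[:alpha:]]\z/u ? 1 : 0),  # /u: Full-range
    a_high => ($alpha   =~ /\A[[:alpha:]]\z/a ? 1 : 0),  # /a: ASCII-range
    u_high => ($alpha   =~ /\A[[:alpha:]]\z/u ? 1 : 0),  # /u: Full-range
);

print join(" ", map { "$_=$result{$_}" } sort keys %result), "\n";
# prints "a_high=0 a_low=0 u_high=1 u_low=1"
```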
2352	2309
	2310	=begin original
	2311
	2312	It is proposed to change this behavior in a future release of Perl so that
	2313	whether or not Unicode rules are in effect would not change the
	2314	behavior: Outside of locale, the POSIX classes
	2315	would behave like their ASCII-range counterparts. If you wish to
	2316	comment on this proposal, send email to C<perl5-porters@perl.org>.
	2317
	2318	=end original
	2319
	2320	Perl の将来のバージョンではこの振る舞いを変えることが提案されています;
	2321	Unicode の規則が有効かどうかは振る舞いを変えません:
	2322	ロケールの外側では、
	2323	POSIX クラスはその ASCII の範囲の対応するものと同様に振る舞います。
	2324	この提案にコメントしたいなら、C<perl5-porters@perl.org> にメールを
	2325	送ってください。
	2326
2353	2327	=head4 Negation of POSIX character classes
2354	2328	X<character class, negation>
2355	2329
2356	2330	(POSIX 文字クラスの否定)
2357	2331
2358	2332	=begin original
2359	2333
2360	2334	A Perl extension to the POSIX character class is the ability to
2361	2335	negate it. This is done by prefixing the class name with a caret (C<^>).
2362	2336	Some examples:
2363	2337
2364	2338	=end original
2365	2339
2366	2340	POSIX 文字クラスに対する Perl の拡張は否定の機能です。
2367	2341	これはクラス名の前にキャレット (C<^>) を置くことで実現します。
2368	2342	いくつかの例です:
2369	2343
2370	2344	 POSIX         ASCII-range     Full-range       backslash
2371	2345	               Unicode         Unicode          sequence
2372	2346	 -----------------------------------------------------
2373	2347	 [[:^digit:]]  \P{PosixDigit}  \P{XPosixDigit}  \D
2374	2348	 [[:^space:]]  \P{PosixSpace}  \P{XPosixSpace}
2375	2349	               \P{PerlSpace}   \P{XPerlSpace}   \S
2376	2350	 [[:^word:]]   \P{PerlWord}    \P{XPosixWord}   \W
2377	2351
2378	2352	=begin original
2379	2353
2380	2354	The backslash sequence can mean either ASCII- or Full-range Unicode,
2381	2355	depending on various factors as described in L<perlre/Which character set modifier is in effect?>.
2382	2356
2383	2357	=end original
2384	2358
2385	2359	逆スラッシュシーケンスは ASCII- か Full-range Unicode のどちらかを意味します;
2386	2360	どちらが使われるかは L<perlre/Which character set modifier is in effect?> で
2387	2361	記述されている様々な要素に依存します。
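
As a sketch of the first row of the table, the loop below checks that C<[[:^digit:]]>, C<\P{PosixDigit}> and C<\D> agree on every code point in a small range when the C</a> modifier pins the POSIX class and the backslash sequence to the ASCII range. The upper bound 0x2FF is an arbitrary choice for this example.

```perl
use strict;
use warnings;

my $disagreements = 0;
for my $cp (0 .. 0x2FF) {
    my $c = chr $cp;
    my $posix = $c =~ /\A[[:^digit:]]\z/a  ? 1 : 0;
    my $prop  = $c =~ /\A\P{PosixDigit}\z/ ? 1 : 0;
    my $bs    = $c =~ /\A\D\z/a            ? 1 : 0;
    $disagreements++ unless $posix == $prop && $prop == $bs;
}

print "$disagreements\n";    # prints "0"
```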
2388	2362
2389	2363	=head4 [= =] and [. .]
2390	2364
2391	2365	([= =] と [. .])
2392	2366
2393	2367	=begin original
2394	2368
2395	2369	Perl recognizes the POSIX character classes C<[=class=]> and
2396	2370	C<[.class.]>, but does not (yet?) support them. Any attempt to use
2397	2371	either construct raises an exception.
2398	2372
2399	2373	=end original
2400	2374
2401	2375	Perl は POSIX 文字クラス C<[=class=]> と C<[.class.]> を認識しますが、
2402	2376	これらには(まだ?)対応していません。
2403	2377	このような構文を使おうとすると例外が発生します。
2404	2378
2405	2379	=head4 Examples
2406	2380
2407	2381	(例)
2408	2382
2409	2383	=begin original
2410	2384
2411	2385	/[[:digit:]]/ # Matches a character that is a digit.
2412	2386	/[01[:lower:]]/ # Matches a character that is either a
2413	2387	# lowercase letter, or '0' or '1'.
2414	2388	/[[:digit:][:^xdigit:]]/ # Matches a character that can be anything
2415	2389	# except the letters 'a' to 'f' and 'A' to
2416	2390	# 'F'. This is because the main character
2417	2391	# class is composed of two POSIX character
2418	2392	# classes that are ORed together, one that
2419	2393	# matches any digit, and the other that
2420	2394	# matches anything that isn't a hex digit.
2421	2395	# The OR adds the digits, leaving only the
2422	2396	# letters 'a' to 'f' and 'A' to 'F' excluded.
2423	2397
2424	2398	=end original
2425	2399
2426	2400	/[[:digit:]]/ # 数字の文字にマッチングする。
2427	2401	/[01[:lower:]]/ # 小文字、'0'、'1' のいずれかの文字に
2428	2402	# マッチングする。
2429	2403	/[[:digit:][:^xdigit:]]/ # 'a' から 'f' と 'A' から 'F' 以外の任意の文字に
2430	2404	# マッチング。これはメインの文字クラスでは二つの
2431	2405	# POSIX 文字クラスが OR され、一つは任意の数字に
2432	2406	# マッチングし、もう一つは 16 進文字でない全ての
2433	2407	# 文字にマッチングします。OR は数字を加え、
2434	2408	# 'a' から 'f' および 'A' から 'F' のみが
2435	2409	# 除外されて残ります。
2436	2410	#
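
The last example above can be verified with a few sample characters (chosen arbitrarily for this sketch):

```perl
use strict;
use warnings;

# Hypothetical sample characters.
my @chars = ("5", "a", "F", "g", "Z", "_");

# Union of "any digit" and "anything that is not a hex digit":
# only the letters a-f and A-F are left out.
my @kept = grep { /\A[[:digit:][:^xdigit:]]\z/ } @chars;

print "@kept\n";    # prints "5 g Z _"
```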
2437	2411
2438	2412	=head3 Extended Bracketed Character Classes
2439	2413	X<character class>
2440	2414	X<set operations>
2441	2415
2442	2416	(拡張大かっこ文字クラス)
2443	2417
2444	2418	=begin original
2445	2419
2446	2420	This is a fancy bracketed character class that can be used for more
2447	2421	readable and less error-prone classes, and to perform set operations,
2448	2422	such as intersection. An example is
2449	2423
2450	2424	=end original
2451	2425
2452	2426	これはしゃれた大かっこ文字クラスで、より読みやすく、エラーが発生しにくい
2453	2427	クラスや、積集合などの集合演算を実行するために使用できます。例えば:
2454	2428
2455	2429	/(?[ \p{Thai} & \p{Digit} ])/
2456	2430
2457	2431	=begin original
2458	2432
2459	2433	This will match all the digit characters that are in the Thai script.
2460	2434
2461	2435	=end original
2462	2436
2463	2437	これは、Thai 用字に含まれるすべての数字文字にマッチングします。
2464	2438
2465	2439	=begin original
2466	2440
2467	2441	This is an experimental feature available starting in 5.18, and is
2468	2442	subject to change as we gain field experience with it. Any attempt to
2469	2443	use it will raise a warning, unless disabled via
2470	2444
2471	2445	=end original
2472	2446
2473	2447	これは 5.18 から利用できる実験的な機能で、現場での経験を積むにつれて
2474	2448	変更される可能性があります。
2475	2449	これを使用しようとすると、次のようにして無効にしない限り、警告が表示されます:
2476	2450
2477	2451	no warnings "experimental::regex_sets";
2478	2452
2479	2453	=begin original
2480	2454
2481	2455	Comments on this feature are welcome; send email to
2482	2456	C<perl5-porters@perl.org>.
2483	2457
2484	2458	=end original
2485	2459
2486	2460	この機能に関するコメントを歓迎します。
2487	2461	C<perl5-porters@perl.org> に電子メールを送ってください。
2488	2462
2489	2463	=begin original
2490	2464
2491	2465	The rules used by L<C<use re 'strict'>|re/'strict' mode> apply to this
2492	2466	construct.
2493	2467
2494	2468	=end original
2495	2469
2496	2470	L<C<use re 'strict'>|re/'strict' mode> で使われる規則はこの構文に
2497	2471	適用されます。
2498	2472
2499	2473	=begin original
2500	2474
2501	2475	We can extend the example above:
2502	2476
2503	2477	=end original
2504	2478
2505	2479	上記の例を拡張できます:
2506	2480
2507	2481	/(?[ ( \p{Thai} + \p{Lao} ) & \p{Digit} ])/
2508	2482
2509	2483	=begin original
2510	2484
2511	2485	This matches digits that are in either the Thai or Laotian scripts.
2512	2486
2513	2487	=end original
2514	2488
2515	2489	これは Thai 用字または Lao 用字のいずれかに含まれる数字にマッチングします。
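
The two patterns above can be tried on a few code points. This is a sketch under the assumption that the Perl in use supports C<(?[ ])> (5.18 or later); the hash of sample characters is invented for the example.

```perl
use strict;
use warnings;
no warnings "experimental::regex_sets";

my %sample = (
    thai_digit  => "\x{0E53}",    # THAI DIGIT THREE
    lao_digit   => "\x{0ED3}",    # LAO DIGIT THREE
    thai_letter => "\x{0E01}",    # THAI CHARACTER KO KAI
    ascii_digit => "3",
);

my $thai        = qr/\A(?[ \p{Thai} & \p{Digit} ])\z/;
my $thai_or_lao = qr/\A(?[ ( \p{Thai} + \p{Lao} ) & \p{Digit} ])\z/;

my @thai_hits = grep { $sample{$_} =~ $thai }        sort keys %sample;
my @both_hits = grep { $sample{$_} =~ $thai_or_lao } sort keys %sample;

print "@thai_hits\n";    # prints "thai_digit"
print "@both_hits\n";    # prints "lao_digit thai_digit"
```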
2516	2490
2517	2491	=begin original
2518	2492
2519	2493	Notice the white space in these examples. This construct always has
2520	2494	the C<E<sol>xx> modifier turned on within it.
2521	2495
2522	2496	=end original
2523	2497
2524	2498	これらの例の中の空白に注意してください。
2525	2499	この構文では、その中では常に C<E<sol>xx> 修飾子がオンになっています。
2526	2500
2527	2501	=begin original
2528	2502
2529	2503	The available binary operators are:
2530	2504
2531	2505	=end original
2532	2506
2533	2507	使用可能な 2 項演算子は次のとおりです:
2534	2508
2535	2509	 &  intersection
2536	2510	 +  union
2537	2511	 |  another name for '+', hence means union
2538	2512	 -  subtraction (the result matches the set consisting of those
2539	2513	    code points matched by the first operand, excluding any that
2540	2514	    are also matched by the second operand)
2541	2515	 ^  symmetric difference (the union minus the intersection). This
2542	2516	    is like an exclusive or, in that the result is the set of code
2543	2517	    points that are matched by either, but not both, of the
2544	2518	    operands.
2545	2519
2546	2520	=begin original
2547	2521
2548	2522	There is one unary operator:
2549	2523
2550	2524	=end original
2551	2525
2552	2526	単項演算子が一つあります。
2553	2527
2554	2528	! complement
2555	2529
2556	2530	=begin original
2557	2531
2558	2532	All the binary operators left associate; C<"&"> is higher precedence
2559	2533	than the others, which all have equal precedence. The unary operator
2560	2534	right associates, and has highest precedence. Thus this follows the
2561	2535	normal Perl precedence rules for logical operators. Use parentheses to
2562	2536	override the default precedence and associativity.
2563	2537
2564	2538	=end original
2565	2539
2566	2540	すべての二項演算子は左結合です; C<"&"> はその他よりも高い優先順位を持ち、
2567	2541	それ以外は同等の優先順位を持ちます。
2568	2542	単項演算子は右結合で、最も高い優先順位を持ちます。
2569	2543	従って、これは通常の Perl の論理演算子に関する優先順位規則に従います。
2570	2544	デフォルトの優先順位と結合を上書きするにはかっこを使います。
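
The precedence rules can be illustrated with three small sets. This is a sketch; the particular sets were chosen only to make the grouping visible.

```perl
use strict;
use warnings;
no warnings "experimental::regex_sets";

# "&" binds tighter than "+": this is [a] + ( [b-z] & [b] ), i.e. {a, b}.
my $default = qr/\A(?[ [a] + [b-z] & [b] ])\z/;

# Parentheses force the union first: ( [a] + [b-z] ) & [b], i.e. {b}.
my $grouped = qr/\A(?[ ( [a] + [b-z] ) & [b] ])\z/;

# Unary "!" binds tightest: ( ![a] ) & [a-c], i.e. {b, c}.
my $negated = qr/\A(?[ ! [a] & [a-c] ])\z/;

my @rows;
for my $re ($default, $grouped, $negated) {
    push @rows, join(" ", grep { $_ =~ $re } qw(a b c));
}

print "$_\n" for @rows;    # prints "a b", then "b", then "b c"
```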
2571	2545
2572	2546	=begin original
2573	2547
2574	2548	The main restriction is that everything is a metacharacter. Thus,
2575	2549	you cannot refer to single characters by doing something like this:
2576	2550
2577	2551	=end original
2578	2552
2579	2553	主な制限は、すべてがメタ文字であるということです。
2580	2554	したがって、以下のようにして単一文字を参照することはできません:
2581	2555
2582	2556	/(?[ a + b ])/ # Syntax error!
2583	2557
2584	2558	=begin original
2585	2559
2586	2560	The easiest way to specify an individual typable character is to enclose
2587	2561	it in brackets:
2588	2562
2589	2563	=end original
2590	2564
2591	2565	タイプ可能な個々の文字を指定する最も簡単な方法は、次のように
2592	2566	かっこで囲むことです:
2593	2567
2594	2568	/(?[ [a] + [b] ])/
2595	2569
2596	2570	=begin original
2597	2571
2598	2572	(This is the same thing as C<[ab]>.) You could also have said the
2599	2573	equivalent:
2600	2574
2601	2575	=end original
2602	2576
2603	2577	(これは C<[ab]> と同じことです。)
2604	2578	次のように書いても等価です:
2605	2579
2606	2580	/(?[[ a b ]])/
2607	2581
2608	2582	=begin original
2609	2583
2610	2584	(You can, of course, specify single characters by using, C<\x{...}>,
2611	2585	C<\N{...}>, etc.)
2612	2586
2613	2587	=end original
2614	2588
2615	2589	(もちろん、C<\x{...}> や C<\N{...}> などを使用して 1 文字を
2616	2590	指定することもできます。)
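
As a sketch, the three spellings discussed above can be compared directly. The C<\x{61}> and C<\x{62}> forms assume an ASCII platform, where those code points are "a" and "b".

```perl
use strict;
use warnings;
no warnings "experimental::regex_sets";

my @forms = (
    qr/\A(?[ [a] + [b] ])\z/,
    qr/\A(?[ [ a b ] ])\z/,            # blanks inside the class are ignored
    qr/\A(?[ \x{61} + \x{62} ])\z/,    # "a" and "b" given by code point
);

# For each form, collect which of the sample characters it matches.
my @rows = map {
    my $re = $_;
    join "", grep { $_ =~ $re } ("a", "b", "c", " ");
} @forms;

print "@rows\n";    # prints "ab ab ab"
```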
2617	2591
2618	2592	=begin original
2619	2593
2620	2594	This last example shows the use of this construct to specify an ordinary
2621	2595	bracketed character class without additional set operations. Note the
2622	2596	white space within it. This is allowed because C<E<sol>xx> is
2623	2597	automatically turned on within this construct.
2624	2598
2625	2599	=end original
2626	2600
2627	2601	この最後の例では、この構文を使用して、追加の集合操作なしで
2628	2602	通常の大かっこ文字クラスを指定する方法を示しています。
2629	2603	この中に空白があることに注意してください。
2630	2604	C<E<sol>xx> は、この構文の内側で自動的に有効になるのでこれが許されます。
2631	2605
2632	2606	=begin original
2633	2607
2634	2608	All the other escapes accepted by normal bracketed character classes are
2635	2609	accepted here as well.
2636	2610
2637	2611	=end original
2638	2612
2639	2613	通常の大かっこ文字クラスで受け入れられる他のエスケープは
2640	2614	すべてここでも受け入れられます。
2641	2615
2642	2616	=begin original
2643	2617
2644	2618	Because this construct compiles under
2645	2619	L<C<use re 'strict'>|re/'strict' mode>, unrecognized escapes that
2646	2620	generate warnings in normal classes are fatal errors here, as well as
2647	2621	all other warnings from these class elements, as well as some
2648	2622	practices that don't currently warn outside C<re 'strict'>. For example
2649	2623	you cannot say
2650	2624
2651	2625	=end original
2652	2626
2653	2627	この構文は L<C<use re 'strict'>|re/'strict' mode> の下でコンパイルされるので、
2654	2628	通常のクラスで警告を生成する
2655	2629	認識されないエスケープはここでは致命的なエラーです;
2656	2630	これらのクラス要素からのその他すべての警告も同様で、
2657	2631	C<re 'strict'> の外側では、現在警告していないいくつかのプラクティスも
2658	2632	同様です。
2659	2633	例えば次のようにはできません:
2660	2634
2661	2635	/(?[ [ \xF ] ])/ # Syntax error!
2662	2636
2663	2637	=begin original
2664	2638
2665	2639	You have to have two hex digits after a braceless C<\x> (use a leading
2666	2640	zero to make two). These restrictions are to lower the incidence of
2667	2641	typos causing the class to not match what you thought it would.
2668	2642
2669	2643	=end original
2670	2644
2671	2645	中かっこのない C<\x> の後には 2 桁の 16 進数が必要です(2 桁にするには
2672	2646	先頭の 0 を使用します)。
2673	2647	これらの制限は、クラスが想定したものと一致しない原因となる
2674	2648	タイプミスの発生を減らすためです。
2675	2649
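
As a minimal sketch of the two-hex-digit rule described above, the snippet below compiles both spellings at run time so that the fatal error can be trapped with C<eval>. The helper name C<try_compile> is hypothetical and used only for illustration; the exact error text is not shown because it may differ between Perl releases.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # (?[ ]) is experimental in these releases

# Hypothetical helper: compile a pattern held in a string and return the
# compiled regex, or undef if compilation died.
sub try_compile {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return $re;
}

# Braceless \x followed by a single hex digit.
my $one_digit  = try_compile('(?[ [ \xF ] ])');

# The same code point, padded with a leading zero to two hex digits.
my $two_digits = try_compile('(?[ [ \x0F ] ])');

print "one hex digit:  ", (defined $one_digit  ? "compiled" : "rejected"), "\n";
print "two hex digits: ", (defined $two_digits ? "compiled" : "rejected"), "\n";
```

Holding the pattern in a string defers compilation to run time; a literal pattern with the same mistake would stop the whole program at compile time instead.
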
2676	2650	=begin original
2677	2651
2678	2652	If a regular bracketed character class contains a C<\p{}> or C<\P{}> and
2679	2653	is matched against a non-Unicode code point, a warning may be
2680	2654	raised, as the result is not Unicode-defined. No such warning will come
2681	2655	when using this extended form.
2682	2656
2683	2657	=end original
2684	2658
2685	2659	通常の大かっこ文字クラスに C<\p{}> や C<\P{}> が含まれていて、
2686	2660	非 Unicode 符号位置に対してマッチングした場合、
2687	2661	結果は Unicode で定義されていないので、警告が発生することがあります。
2688	2662	このような警告は、拡張形式を使った場合は発生しません。
2689	2663
2690	2664	=begin original
2691	2665
2692	2666	The final difference between regular bracketed character classes and
2693	2667	these, is that it is not possible to get these to match a
2694	2668	multi-character fold. Thus,
2695	2669
2696	2670	=end original
2697	2671
2698	2672	通常の大かっこ文字クラスとこれらのクラスの最後の違いは、
2699	2673	これらを複数文字畳み込みにマッチングさせることができないということです。
2700	2674	従って:
2701	2675
2702	2676	/(?[ [\xDF] ])/iu
2703	2677
2704	2678	=begin original
2705	2679
2706	2680	does not match the string C<ss>.
2707	2681
2708	2682	=end original
2709	2683
2710	2684	は文字列 C<ss> と一致しません。
2711	2685
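
The difference in multi-character fold handling can be checked with a short comparison. This is only a sketch, assuming an ASCII platform where C<\xDF> is LATIN SMALL LETTER SHARP S; the variable names are illustrative.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # (?[ ]) is experimental in these releases

# Ordinary bracketed class: under /iu, U+00DF can fold to the two
# characters "ss".
my $plain_matches    = "ss" =~ /\A[\xDF]\z/iu        ? 1 : 0;

# Extended form: multi-character folds are never matched.
my $extended_matches = "ss" =~ /\A(?[ [\xDF] ])\z/iu ? 1 : 0;

print "plain class matches 'ss':    $plain_matches\n";
print "extended class matches 'ss': $extended_matches\n";
```
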
2712	2686	=begin original
2713	2687
2714	2688	You don't have to enclose POSIX class names inside double brackets,
2715	2689	hence both of the following work:
2716	2690
2717	2691	=end original
2718	2692
2719	2693	POSIX クラス名を二重かっこで囲む必要はありません;
2720	2694	そのため、以下の両方とも動作します:
2721	2695
2722	2696	/(?[ [:word:] - [:lower:] ])/
2723	2697	/(?[ [[:word:]] - [[:lower:]] ])/
2724	2698
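
A small sketch showing that the two spellings above behave the same way. The sample characters are chosen only for illustration.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # (?[ ]) is experimental in these releases

# Word characters with the lowercase ones removed, written both ways.
my $single = qr/\A(?[ [:word:] - [:lower:] ])\z/;
my $double = qr/\A(?[ [[:word:]] - [[:lower:]] ])\z/;

my %result;
for my $char ('A', 'a', '7', '_', '-') {
    my $s = $char =~ $single ? 1 : 0;
    my $d = $char =~ $double ? 1 : 0;
    $result{$char} = [ $s, $d ];
    print "'$char': single brackets=$s, double brackets=$d\n";
}
```
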
2725	2699	=begin original
2726	2700
2727	2701	Any contained POSIX character classes, including things like C<\w> and C<\D>
2728	2702	respect the C<E<sol>a> (and C<E<sol>aa>) modifiers.
2729	2703
2730	2704	=end original
2731	2705
2732	2706	この中に含まれる POSIX 文字クラスは、C<\w> や C<\D> のようなものも含めて、C<E<sol>a>
2733	2707	(および C<E<sol>aa>) 修飾子を尊重します。
2734	2708
2735	2709	=begin original
2736	2710
2737		~~Note that~~ C<< (?[ ]) >> is a regex-compile-time construct. Any attempt
	2711	C<< (?[ ]) >> is a regex-compile-time construct. Any attempt to use
2738		~~to use~~ something which isn't knowable at the time the containing regular
	2712	something which isn't knowable at the time the containing regular
2739	2713	expression is compiled is a fatal error. In practice, this means
2740	2714	just three limitations:
2741	2715
2742	2716	=end original
2743	2717
2744		C<< (?[ ]) >> はコンパイル時正規表現構文で~~あることに注意してください~~。
	2718	C<< (?[ ]) >> はコンパイル時正規表現構文です。
2745	2719	これを含む正規表現のコンパイル時に分からないものを使おうとすると、
2746	2720	致命的なエラーになります。
2747	2721	実際には、これは三つの制限を意味します:
2748	2722
2749	2723	=over 4
2750	2724
2751	2725	=item 1
2752	2726
2753	2727	=begin original
2754	2728
2755	2729	When compiled within the scope of C<use locale> (or the C<E<sol>l> regex
2756	2730	modifier), this construct assumes that the execution-time locale will be
2757	2731	a UTF-8 one, and the generated pattern always uses Unicode rules. What
2758	2732	gets matched or not thus isn't dependent on the actual runtime locale, so
2759	2733	tainting is not enabled. But a C<locale> category warning is raised
2760	2734	if the runtime locale turns out to not be UTF-8.
2761	2735
2762	2736	=end original
2763	2737
2764	2738	C<use locale> (または C<E<sol>l> 正規表現修飾子)の
2765	2739	スコープ内でコンパイルされると、この構文は実行時ロケールが
2766	2740	UTF-8 のものであることを仮定し、
2767	2741	生成されたパターンは常に Unicode の規則を使います。
2768	2742	従ってマッチングするかどうかは実際の実行時ロケールには関係なく、
2769	2743	汚染チェックモードは有効になりません。
2770	2744	しかし、実行時ロケールが UTF-8 以外になると、
2771	2745	C<locale> カテゴリの警告が発生します。
2772	2746
2773	2747	=item 2
2774	2748
2775	2749	=begin original
2776	2750
2777	2751	Any
2778	2752	L<user-defined property|perlunicode/"User-Defined Character Properties">
2779	2753	used must be already defined by the time the regular expression is
2780	2754	compiled (but note that this construct can be used instead of such
2781	2755	properties).
2782	2756
2783	2757	=end original
2784	2758
2785	2759	使用される
2786	2760	L<ユーザー定義特性|perlunicode/"User-Defined Character Properties"> は、
2787	2761	正規表現がコンパイルされるときにすでに定義されている必要があります
2788	2762	(ただし、このような特性の代わりにこの構文を使用することもできます)。
2789	2763
2790	2764	=item 3
2791	2765
2792	2766	=begin original
2793	2767
2794	2768	A regular expression that otherwise would compile
2795	2769	using C<E<sol>d> rules, and which uses this construct will instead
2796	2770	use C<E<sol>u>. Thus this construct tells Perl that you don't want
2797	2771	C<E<sol>d> rules for the entire regular expression containing it.
2798	2772
2799	2773	=end original
2800	2774
2801	2775	本来なら C<E<sol>d> 規則でコンパイルされるはずの正規表現でも、この構文を使用していると、
2802	2776	代わりに C<E<sol>u> を使用します。
2803	2777	したがって、この構文は、これを含む正規表現全体に対して
2804	2778	C<E<sol>d> 規則を望まないことを Perl に通知します。
2805	2779
2806	2780	=back
2807	2781
2808	2782	=begin original
2809	2783
2810	2784	Note that skipping white space applies only to the interior of this
2811	2785	construct. There must not be any space between any of the characters
2812	2786	that form the initial C<(?[>. Nor may there be space between the
2813	2787	closing C<])> characters.
2814	2788
2815	2789	=end original
2816	2790
2817	2791	空白のスキップは、この構文の内部にのみ適用されることに注意してください。
2818	2792	最初の C<(?[> を形成する文字の間に空白を入れることはできません。
2819	2793	また、終わりの C<])> 文字の間に空白を入れることもできません。
2820	2794
2821	2795	=begin original
2822	2796
2823	2797	Just as in all regular expressions, the pattern can be built up by
2824	2798	including variables that are interpolated at regex compilation time.
2825		But ~~its~~ best to co~~mpil~~e each ~~sub-c~~omponent.
	2799	Care must be taken to ensure that you are getting what you expect. For
	2800	example:
2826	2801
2827	2802	=end original
2828	2803
2829	2804	すべての正規表現と同様に、正規表現コンパイル時に補完される変数を
2830	2805	含めることでパターンを構築できます。
2831		~~しかし部分要素毎にコンパイルする~~のが最善です。
	2806	期待どおりの結果が得られるように注意が必要です。
2832
2833		my $thai_or_lao = qr/(?[ \p{Thai} + \p{Lao} ])/;
2834		my $lower = qr/(?[ \p{Lower} + \p{Digit} ])/;
2835
2836		=begin original
2837
2838		When these are embedded in another pattern, what they match does not
2839		change, regardless of parenthesization or what modifiers are in effect
2840		in that outer pattern. If you fail to compile the subcomponents, you
2841		can get some nasty surprises. For example:
2842
2843		=end original
2844
2845		これらが別のパターンに埋め込まれている場合、かっこで囲むかどうかや、その外側のパターンで
2846		有効な修飾子に関係なく、マッチングするものは変わりません。
2847		部分要素をコンパイルするのに失敗すると、扱いにくい驚きを受けることに
2848		なるかもしれません。
2849	2807	例えば:
2850	2808
2851	2809	my $thai_or_lao = '\p{Thai} + \p{Lao}';
2852	2810	...
2853	2811	qr/(?[ \p{Digit} & $thai_or_lao ])/;
2854	2812
2855	2813	=begin original
2856	2814
2857	2815	compiles to
2858	2816
2859	2817	=end original
2860	2818
2861	2819	これは次のようにコンパイルされます:
2862	2820
2863	2821	qr/(?[ \p{Digit} & \p{Thai} + \p{Lao} ])/;
2864	2822
2865	2823	=begin original
2866	2824
2867		But this does not have the effect that someone reading the ~~sour~~ce code
	2825	But this does not have the effect that someone reading the code would
2868		~~would~~ likely expect, as the intersection applies just to C<\p{Thai}>,
	2826	likely expect, as the intersection applies just to C<\p{Thai}>,
2869		excluding the Laotian. Its best to c~~ompile~~ the ~~subc~~o~~mpon~~e~~nts,~~ but you
	2827	excluding the Laotian. Pitfalls like this can be avoided by
2870		~~could also~~ parenthesize the component pieces:
	2828	parenthesizing the component pieces:
2871	2829
2872	2830	=end original
2873	2831
2874		しかし、これは、~~ソース~~コードを読んでいる人が期待するような効果はありません;
	2832	しかし、これは、コードを読んでいる人が期待するような効果はありません;
2875	2833	なぜなら、この交差は C<\p{Thai}> だけに適用され、ラオス語には
2876	2834	適用されないからです。
2877		~~部分要素毎に~~コン~~パイルするのが最善~~ですが、
	2835	このような落とし穴は、コンポーネントをかっこで囲むことで回避できます:
2878		要素をかっこで囲むことでも回避できます:
2879	2836
2880	2837	my $thai_or_lao = '( \p{Thai} + \p{Lao} )';
2881	2838
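
The precedence pitfall can be reproduced with a Lao letter that is not a digit. This is a sketch under the assumption that U+0E81 (LAO LETTER KO) and U+0ED1 (LAO DIGIT ONE) are suitable test characters; the variable names are illustrative.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # (?[ ]) is experimental in these releases

my $lao_letter = "\x{0E81}";   # LAO LETTER KO: Lao script, not a digit
my $lao_digit  = "\x{0ED1}";   # LAO DIGIT ONE: Lao script, a digit

# Plain string: after interpolation the '&' applies only to \p{Thai},
# and every Lao character is then added to the set.
my $loose    = '\p{Thai} + \p{Lao}';
my $re_loose = qr/(?[ \p{Digit} & $loose ])/;

# Parenthesized string: the intersection applies to the whole union.
my $grouped    = '( \p{Thai} + \p{Lao} )';
my $re_grouped = qr/(?[ \p{Digit} & $grouped ])/;

my $loose_letter   = $lao_letter =~ $re_loose   ? 1 : 0;
my $grouped_letter = $lao_letter =~ $re_grouped ? 1 : 0;
my $grouped_digit  = $lao_digit  =~ $re_grouped ? 1 : 0;

print "unparenthesized, Lao letter: $loose_letter\n";
print "parenthesized,   Lao letter: $grouped_letter\n";
print "parenthesized,   Lao digit:  $grouped_digit\n";
```
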
2882	2839	=begin original
2883	2840
2884	2841	But any modifiers will still apply to all the components:
2885	2842
2886	2843	=end original
2887	2844
2888	2845	ただし、修飾子はすべてのコンポーネントに適用されます:
2889	2846
2890	2847	my $lower = '\p{Lower} + \p{Digit}';
2891	2848	qr/(?[ \p{Greek} & $lower ])/i;
2892	2849
2893	2850	=begin original
2894	2851
2895		matches upper case things. So ju~~st,~~ compile ~~the~~ su~~bcom~~ponents, as
	2852	matches upper case things. You can avoid surprises by making the
2896		i~~llu~~strated above.
	2853	components into instances of this construct by compiling them:
2897	2854
2898	2855	=end original
2899	2856
2900	2857	これは大文字のものと一致します。
2901		~~従って、既に示したように、単に部分要素~~をコンパイルして~~ください。~~
	2858	コンポーネントをコンパイルしてこの構文の実体にすることで、
	2859	予期せぬ事態を避けることができます:
	2860
	2861	my $thai_or_lao = qr/(?[ \p{Thai} + \p{Lao} ])/;
	2862	my $lower = qr/(?[ \p{Lower} + \p{Digit} ])/;
	2863
	2864	=begin original
	2865
	2866	When these are embedded in another pattern, what they match does not
	2867	change, regardless of parenthesization or what modifiers are in effect
	2868	in that outer pattern.
	2869
	2870	=end original
	2871
	2872	これらが別のパターンに埋め込まれている場合、かっこで囲むかどうかや、その外側のパターンで
	2873	有効な修飾子に関係なく、マッチングするものは変わりません。
2902	2874
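
The following sketch contrasts a string component with a compiled one under an outer C</i>. It assumes U+0391 (GREEK CAPITAL LETTER ALPHA) and U+03B1 (GREEK SMALL LETTER ALPHA) as sample characters, and relies on the behavior described above for compiled components; the variable names are illustrative.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # (?[ ]) is experimental in these releases

my $capital_alpha = "\x{0391}";   # GREEK CAPITAL LETTER ALPHA
my $small_alpha   = "\x{03B1}";   # GREEK SMALL LETTER ALPHA

# String component: the outer /i also applies to \p{Lower}.
my $lower_str = '\p{Lower} + \p{Digit}';
my $re_str    = qr/(?[ \p{Greek} & $lower_str ])/i;

# Compiled component: it was compiled without /i and keeps its own rules
# when embedded in the outer pattern.
my $lower_qr = qr/(?[ \p{Lower} + \p{Digit} ])/;
my $re_qr    = qr/(?[ \p{Greek} & $lower_qr ])/i;

my $str_capital = $capital_alpha =~ $re_str ? 1 : 0;
my $qr_capital  = $capital_alpha =~ $re_qr  ? 1 : 0;
my $qr_small    = $small_alpha   =~ $re_qr  ? 1 : 0;

print "string component,   capital alpha: $str_capital\n";
print "compiled component, capital alpha: $qr_capital\n";
print "compiled component, small alpha:   $qr_small\n";
```
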
2903	2875	=begin original
2904	2876
2905	2877	Due to the way that Perl parses things, your parentheses and brackets
2906	2878	may need to be balanced, even including comments. If you run into any
2907	2879	examples, please send them to C<perlbug@perl.org>, so that we can have a
2908	2880	concrete example for this man page.
2909	2881
2910	2882	=end original
2911	2883
2912	2884	Perl の構文解析方法のために、コメントの中も含めて、かっこと大かっこの
2913	2885	バランスを取る必要があるかもしれません。
2914	2886	もし何か例を見つけたら、C<perlbug@perl.org> まで送ってください。
2915	2887	そうすれば、この man ページの具体的な例を得ることができます。
2916	2888
2917	2889	=begin original
2918	2890
2919	2891	We may change it so that things that remain legal uses in normal bracketed
2920	2892	character classes might become illegal within this experimental
2921	2893	construct. One proposal, for example, is to forbid adjacent uses of the
2922	2894	same character, as in C<(?[ [aa] ])>. The motivation for such a change
2923	2895	is that this usage is likely a typo, as the second "a" adds nothing.
2924	2896
2925	2897	=end original
2926	2898
2927	2899	通常の大かっこ文字クラスでは正当な使用法のままであるものが、この実験的な構文の中では不正になるように変更するかもしれません。たとえば、C<(?[ [aa] ])> のように、同じ文字を隣接して使用すること
2928	2900	を禁止することが提案されています。
2929	2901	このような変更の動機は、2 番目の "a" は何も追加しないので、この使用は
2930	2902	タイプミスである可能性が高いということです。
2931	2903
2932	2904	=begin meta
2933	2905
2934	2906	Translate: SHIRAKATA Kentaro <argrath@ub32.org> (5.10.1-)
2935	2907	Status: completed
2936	2908
2937	2909	=end meta
