Diff of perlrecharclass between 5.36.0 and 5.22.1

1	1
2		=encoding utf8
	2	=encoding euc-jp
3	3
4	4	=head1 NAME
5	5	X<character class>
6	6
7	7	=begin original
8	8
9	9	perlrecharclass - Perl Regular Expression Character Classes
10	10
11	11	=end original
12	12
13	13	perlrecharclass - Perl 正規表現文字クラス
14	14
15	15	=head1 DESCRIPTION
16	16
17	17	=begin original
18	18
19	19	The top level documentation about Perl regular expressions
20	20	is found in L<perlre>.
21	21
22	22	=end original
23	23
24	24	Perl 正規表現に関する最上位文書は L<perlre> です。
25	25
26	26	=begin original
27	27
28	28	This manual page discusses the syntax and use of character
29	29	classes in Perl regular expressions.
30	30
31	31	=end original
32	32
33	33	このマニュアルページは Perl 正規表現の文字クラスの文法と使用法について
34	34	議論します。
35	35
36	36	=begin original
37	37
38	38	A character class is a way of denoting a set of characters
39	39	in such a way that one character of the set is matched.
40	40	It's important to remember that: matching a character class
41	41	consumes exactly one character in the source string. (The source
42	42	string is the string the regular expression is matched against.)
43	43
44	44	=end original
45	45
46	46	文字クラスは、集合の中の一文字がマッチングするというような方法で、
47	47	文字の集合を指定するための方法です。
48	48	次のことを覚えておくことは重要です: 文字集合はソース文字列の中から正確に
49	49	一文字だけを消費します。
50	50	(ソース文字列とは正規表現がマッチングしようとしている文字列です。)
51	51
52	52	=begin original
53	53
54	54	There are three types of character classes in Perl regular
55	55	expressions: the dot, backslash sequences, and the form enclosed in square
56	56	brackets. Keep in mind, though, that often the term "character class" is used
57	57	to mean just the bracketed form. Certainly, most Perl documentation does that.
58	58
59	59	=end original
60	60
61	61	Perl 正規表現には 3 種類の文字クラスがあります: ドット、
62	62	逆スラッシュシーケンス、大かっこで囲まれた形式です。
63	63	しかし、「文字クラス」という用語はしばしば大かっこ形式だけを意味するために
64	64	使われることに注意してください。
65	65	確かに、ほとんどの Perl 文書ではそうなっています。
66	66
67	67	=head2 The dot
68	68
69	69	(ドット)
70	70
71	71	=begin original
72	72
73	73	The dot (or period), C<.> is probably the most used, and certainly
74	74	the most well-known character class. By default, a dot matches any
75	75	character, except for the newline. That default can be changed to
76		add matching the newline by using the I<single line> modifier:
	76	add matching the newline by using the I<single line> modifier: either
77	77	for the entire regular expression with the C</s> modifier, or
78		locally with C<(?s)> (and even globally within the scope of
	78	locally with C<(?s)>. (The C<L</\N>> backslash sequence, described
79		L<C<use re '/s'>|re/'E<sol>flags' mode>). (The C<L</\N>> backslash
80		sequence, described
81	79	below, matches any character except newline without regard to the
82	80	I<single line> modifier.)
83	81
84	82	=end original
85	83
86	84	ドット (またはピリオド) C<.> はおそらくもっともよく使われ、そして確実に
87	85	もっともよく知られている文字クラスです。
88	86	デフォルトでは、ドットは改行を除く任意の文字にマッチングします。
89	87	このデフォルトは I<単一行> 修飾子を使うことで改行にもマッチングするように
90	88	変更されます: 正規表現全体に対して C</s> 修飾子を使うか、ローカルには
91		C<(?s)> を使います
	89	C<(?s)> を使います。
92		(そしてグローバルに L<C<use re '/s'>|re/'E<sol>flags' mode> の
93		スコープ内の場合でもそうです)。
94	90	(後述する C<L</\N>> 逆スラッシュシーケンスでは、I<単一行> 修飾子に
95	91	関わりなく改行以外の任意の文字にマッチングします。)
96	92
97	93	=begin original
98	94
99	95	Here are some examples:
100	96
101	97	=end original
102	98
103	99	以下は例です:
104	100
105	101	=begin original
106	102
107	103	"a" =~ /./ # Match
108	104	"." =~ /./ # Match
109	105	"" =~ /./ # No match (dot has to match a character)
110	106	"\n" =~ /./ # No match (dot does not match a newline)
111	107	"\n" =~ /./s # Match (global 'single line' modifier)
112	108	"\n" =~ /(?s:.)/ # Match (local 'single line' modifier)
113	109	"ab" =~ /^.$/ # No match (dot matches one character)
114	110
115	111	=end original
116	112
117	113	"a" =~ /./ # マッチングする
118	114	"." =~ /./ # マッチングする
119	115	"" =~ /./ # マッチングしない (ドットは文字にマッチングする必要がある)
120	116	"\n" =~ /./ # マッチングしない (ドットは改行にはマッチングしない)
121	117	"\n" =~ /./s # マッチングする (グローバル「単一行」修飾子)
122	118	"\n" =~ /(?s:.)/ # マッチングする (ローカル「単一行」修飾子)
123	119	"ab" =~ /^.$/ # マッチングしない (ドットは一文字にマッチングする)
124	120
125	121	=head2 Backslash sequences
126	122	X<\w> X<\W> X<\s> X<\S> X<\d> X<\D> X<\p> X<\P>
127	123	X<\N> X<\v> X<\V> X<\h> X<\H>
128	124	X<word> X<whitespace>
129	125
130	126	(逆スラッシュシーケンス)
131	127
132	128	=begin original
133	129
134	130	A backslash sequence is a sequence of characters, the first one of which is a
135	131	backslash. Perl ascribes special meaning to many such sequences, and some of
136	132	these are character classes. That is, they match a single character each,
137	133	provided that the character belongs to the specific set of characters defined
138	134	by the sequence.
139	135
140	136	=end original
141	137
142	138	逆スラッシュシーケンスは、最初がバックスラッシュの文字並びです。
143	139	Perl はそのような並びの多くに特別な意味を持たせていて、
144	140	その一部は文字クラスです。
145	141	つまり、それらはそれぞれ並びによって定義されている特定の文字の集合に
146	142	帰属する一文字にマッチングします。
147	143
148	144	=begin original
149	145
150	146	Here's a list of the backslash sequences that are character classes. They
151	147	are discussed in more detail below. (For the backslash sequences that aren't
152	148	character classes, see L<perlrebackslash>.)
153	149
154	150	=end original
155	151
156	152	以下は文字クラスの逆スラッシュシーケンスの一覧です。
157	153	以下でさらに詳細に議論します。
158	154	(文字クラスではない逆スラッシュシーケンスについては、L<perlrebackslash> を
159	155	参照してください。)
160	156
161	157	=begin original
162	158
163	159	\d Match a decimal digit character.
164	160	\D Match a non-decimal-digit character.
165	161	\w Match a "word" character.
166	162	\W Match a non-"word" character.
167	163	\s Match a whitespace character.
168	164	\S Match a non-whitespace character.
169	165	\h Match a horizontal whitespace character.
170	166	\H Match a character that isn't horizontal whitespace.
171	167	\v Match a vertical whitespace character.
172	168	\V Match a character that isn't vertical whitespace.
173	169	\N Match a character that isn't a newline.
174	170	\pP, \p{Prop} Match a character that has the given Unicode property.
175	171	\PP, \P{Prop} Match a character that doesn't have the Unicode property
176	172
177	173	=end original
178	174
179	175	\d 10 進数字にマッチング。
180	176	\D 非 10 進数字にマッチング。
181	177	\w 「単語」文字にマッチング。
182	178	\W 非「単語」文字にマッチング。
183	179	\s 空白文字にマッチング。
184	180	\S 非空白文字にマッチング。
185	181	\h 水平空白文字にマッチング。
186	182	\H 水平空白でない文字にマッチング。
187	183	\v 垂直空白文字にマッチング。
188	184	\V 垂直空白でない文字にマッチング。
189	185	\N 改行以外の文字にマッチング。
190	186	\pP, \p{Prop} 指定された Unicode 特性を持つ文字にマッチング。
191	187	\PP, \P{Prop} 指定された Unicode 特性を持たない文字にマッチング。
192	188
193	189	=head3 \N
194	190
195	191	=begin original
196	192
197	193	C<\N>, available starting in v5.12, like the dot, matches any
198	194	character that is not a newline. The difference is that C<\N> is not influenced
199	195	by the I<single line> regular expression modifier (see L</The dot> above). Note
200	196	that the form C<\N{...}> may mean something completely different. When the
201	197	C<{...}> is a L<quantifier|perlre/Quantifiers>, it means to match a non-newline
202	198	character that many times. For example, C<\N{3}> means to match 3
203	199	non-newlines; C<\N{5,}> means to match 5 or more non-newlines. But if C<{...}>
204	200	is not a legal quantifier, it is presumed to be a named character. See
205	201	L<charnames> for those. For example, none of C<\N{COLON}>, C<\N{4F}>, and
206	202	C<\N{F4}> contain legal quantifiers, so Perl will try to find characters whose
207	203	names are respectively C<COLON>, C<4F>, and C<F4>.
208	204
209	205	=end original
210	206
211	207	v5.12 から利用可能な C<\N> は、ドットのように、
212	208	改行以外の任意の文字にマッチングします。
213	209	違いは、C<\N> は I<単一行> 正規表現修飾子の影響を受けないことです
214	210	(上述の L</The dot> 参照)。
215	211	C<\N{...}> 型式は何か全く違うものを意味するかも知れないことに
216	212	注意してください。
217	213	C<{...}> が L<量指定子|perlre/Quantifiers> なら、これは指定された回数の
218	214	非改行文字にマッチングします。
219	215	例えば、C<\N{3}> は三つの非改行にマッチングします;
220	216	C<\N{5,}> は五つ以上の非改行にマッチングします。
221	217	しかし、C<{...}> が有効な量指定子でない場合、これは名前付き文字と
222	218	推定されます。
223	219	これについては L<charnames> を参照してください。
224	220	例えば、C<\N{COLON}>, C<\N{4F}>, C<\N{F4}> はどれも有効な
225	221	量指定子ではないので、Perl はそれぞれ C<COLON>, C<4F>, C<F4> という名前の
226	222	文字を探そうとします。
227	223
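The three readings of C<\N> described above can be checked with a short script. This is a minimal sketch, assuming a Perl recent enough (v5.16 or later) that named characters are available in patterns without an explicit C<use charnames>; the variable names are illustrative only.

```perl
use strict;
use warnings;

# The dot honours the single-line modifier; \N does not.
my $dot_s = ("\n" =~ /./s)  ? 1 : 0;
my $n_s   = ("\n" =~ /\N/s) ? 1 : 0;

# \N{3} is \N followed by a quantifier: exactly three non-newlines.
my $three_ok  = ("abc"   =~ /^\N{3}$/) ? 1 : 0;
my $three_bad = ("ab\nc" =~ /^\N{3}$/) ? 1 : 0;

# {COLON} is not a legal quantifier, so \N{COLON} is a named character.
my $colon = (":" =~ /^\N{COLON}$/) ? 1 : 0;

print "dot under /s matches newline:  $dot_s\n";
print "\\N under /s matches newline:   $n_s\n";
print "three non-newlines, 'abc':     $three_ok\n";
print "three non-newlines, 'ab\\nc':   $three_bad\n";
print "named character COLON:         $colon\n";
```
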
228	224	=head3 Digits
229	225
230	226	(数字)
231	227
232	228	=begin original
233	229
234	230	C<\d> matches a single character considered to be a decimal I<digit>.
235	231	If the C</a> regular expression modifier is in effect, it matches [0-9].
236	232	Otherwise, it
237	233	matches anything that is matched by C<\p{Digit}>, which includes [0-9].
238	234	(An unlikely possible exception is that under locale matching rules, the
239	235	current locale might not have C<[0-9]> matched by C<\d>, and/or might match
240	236	other characters whose code point is less than 256. The only such locale
241	237	definitions that are legal would be to match C<[0-9]> plus another set of
242	238	10 consecutive digit characters; anything else would be in violation of
243	239	the C language standard, but Perl doesn't currently assume anything in
244	240	regard to this.)
245	241
246	242	=end original
247	243
248	244	C<\d> は 10 進 I<数字> と考えられる単一の文字にマッチングします。
249	245	C</a> 正規表現修飾子が有効の場合、これは [0-9] にマッチングします。
250	246	さもなければ、これは C<[0-9]> を含む、C<\p{Digit}> にマッチングするものに
251	247	マッチングします。
252	248	(ありそうもない例外はロケールマッチングの下で、現在のロケールが
253	249	C<\d> にマッチングする [0-9] がないか、
254	250	符号位置が 256 未満の他の文字にマッチングすることです。
255	251	唯一正当なロケール定義は、C<[0-9]> に加えてもう一つの 10 の連続した
256	252	数字の集合にマッチングするもので、
257	253	それ以外は C 言語標準に違反していますが、
258	254	Perl は今のところこれに関して何も仮定しません。)
259	255
260	256	=begin original
261	257
262	258	What this means is that unless the C</a> modifier is in effect C<\d> not
263	259	only matches the digits '0' - '9', but also Arabic, Devanagari, and
264	260	digits from other languages. This may cause some confusion, and some
265	261	security issues.
266	262
267	263	=end original
268	264
269	265	これが意味することは、C</a> 修飾子が有効でない限り、C<\d> は数字
270	266	'0' - '9' だけでなく、アラビア文字、デバナーガリ文字、およびその他の言語の
271	267	数字もマッチングします。
272	268	これは混乱やセキュリティ問題を引き起こすことがあります。
273	269
274	270	=begin original
275	271
276	272	Some digits that C<\d> matches look like some of the [0-9] ones, but
277	273	have different values. For example, BENGALI DIGIT FOUR (U+09EA) looks
278		very much like an ASCII DIGIT EIGHT (U+0038), and LEPCHA DIGIT SIX
	274	very much like an ASCII DIGIT EIGHT (U+0038). An application that
279		(U+1C46) looks very much like an ASCII DIGIT FIVE (U+0035). An
280		application that
281	275	is expecting only the ASCII digits might be misled, or if the match is
282	276	C<\d+>, the matched string might contain a mixture of digits from
283	277	different writing systems that look like they signify a number different
284	278	than they actually do. L<Unicode::UCD/num()> can
285	279	be used to safely
286	280	calculate the value, returning C<undef> if the input string contains
287		such a mixture. Otherwise, for example, a displayed price might be
	281	such a mixture.
288		deliberately different than it appears.
289	282
290	283	=end original
291	284
292	285	C<\d> にマッチングする数字には、[0-9] のように見えるけれども、
293	286	異なる値を持つものもあります。
294		例えば、BENGALI DIGIT FOUR (U+09EA) は ASCII DIGIT EIGHT (U+0038) に
	287	例えば、BENGALI DIGIT FOUR (U+09EA) は ASCII DIGIT EIGHT (U+0038) と
295		とてもよく似ていて、
296		LEPCHA DIGIT SIX (U+1C46) は ASCII DIGIT FIVE (U+0035) に
297	288	とてもよく似ています。
298	289	ASCII 数字のみを想定しているアプリケーションはミスリードされるかも知れず、
299	290	マッチングが C<\d+> の場合、
300	291	マッチングした文字列は、実際と異なる値を示しているように見える、
301	292	異なった書記体系からの数字が混ざったものかもしれません。
302	293	L<Unicode::UCD/num()> は値を安全に計算するのに使えます;
303	294	入力文字列がこのような混合を含んでいる場合は C<undef> を返します。
304		さもなければ、例えば、表示された価格は見た目と意図的に違うものに
305		なるかもしれません。
306	295
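The difference between C<\d> with and without C</a>, and the use of C<num()> from L<Unicode::UCD> to reject mixed-script digit strings, might be sketched as follows. The test strings and variable names are chosen purely for illustration.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

my $bengali_four = "\x{09EA}";    # BENGALI DIGIT FOUR

# Without /a, \d follows \p{Digit}; with /a it is restricted to [0-9].
my $unicode_d = ($bengali_four =~ /^\d$/)  ? 1 : 0;
my $ascii_d   = ($bengali_four =~ /^\d$/a) ? 1 : 0;

# num() computes a value only when all digits come from one script.
my $pure  = num("\x{09EA}\x{09EB}");   # BENGALI DIGIT FOUR, BENGALI DIGIT FIVE
my $mixed = num("1\x{09EA}");          # ASCII digit followed by a Bengali digit

print "\\d without /a: $unicode_d\n";
print "\\d with /a:    $ascii_d\n";
print "single script: ", (defined $pure  ? $pure  : 'undef'), "\n";
print "mixed scripts: ", (defined $mixed ? $mixed : 'undef'), "\n";
```

An application that must accept only ASCII digits can therefore either add C</a> or write C<[0-9]> explicitly.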
307	296	=begin original
308	297
309	298	What C<\p{Digit}> means (and hence C<\d> except under the C</a>
310	299	modifier) is C<\p{General_Category=Decimal_Number}>, or synonymously,
311	300	C<\p{General_Category=Digit}>. Starting with Unicode version 4.1, this
312	301	is the same set of characters matched by C<\p{Numeric_Type=Decimal}>.
313	302	But Unicode also has a different property with a similar name,
314	303	C<\p{Numeric_Type=Digit}>, which matches a completely different set of
315	304	characters. These characters are things such as C<CIRCLED DIGIT ONE>
316	305	or subscripts, or are from writing systems that lack all ten digits.
317	306
318	307	=end original
319	308
320	309	C<\p{Digit}> が意味するもの(つまり、C</a> 修飾子の下でない C<\d>)は、
321	310	C<\p{General_Category=Decimal_Number}>、または同義語として
322	311	C<\p{General_Category=Digit}> です。
323	312	Unicode バージョン 4.1 以降では、これは C<\p{Numeric_Type=Decimal}> に
324	313	マッチングする文字集合と同じです。
325	314	ただし、Unicode には、C<\p{Numeric_Type=Digit}> という類似した名前を持つ
326	315	別の特性もあります; これは完全に異なる文字集合とマッチングします。
327	316	これらの文字は、C<CIRCLED DIGIT ONE> や添字のようなものであるか、
328	317	10 の数字すべてが揃っていない書記体系からのものです。
329	318
330	319	=begin original
331	320
332	321	The design intent is for C<\d> to exactly match the set of characters
333	322	that can safely be used with "normal" big-endian positional decimal
334	323	syntax, where, for example 123 means one 'hundred', plus two 'tens',
335	324	plus three 'ones'. This positional notation does not necessarily apply
336	325	to characters that match the other type of "digit",
337	326	C<\p{Numeric_Type=Digit}>, and so C<\d> doesn't match them.
338	327
339	328	=end original
340	329
341	330	設計意図は、C<\d> が「通常の」ビッグエンディアンの
342	331	位置 10 進構文 (例えば、123 は一つの「100」に二つの「10」と三つの「1」を
343	332	加えたものを意味する) で安全に使用できる文字集合と
344		正確にマッチングするようにすることです。
	333	正確にマッチングするようにすることです;
345	334	この位置表記は、他のタイプの「digit」である C<\p{Numeric_Type=Digit}> に
346	335	マッチングする文字には必ずしも適用されないため、
347	336	C<\d> はこれらの文字にマッチングしません。
348	337
349	338	=begin original
350	339
351	340	The Tamil digits (U+0BE6 - U+0BEF) can also legally be
352	341	used in old-style Tamil numbers in which they would appear no more than
353	342	one in a row, separated by characters that mean "times 10", "times 100",
354		etc. (See L<https://www.unicode.org/notes/tn21>.)
	343	etc. (See L<http://www.unicode.org/notes/tn21>.)
355	344
356	345	=end original
357	346
358	347	タミル語の数字(U+0BE6-U+0BEF)は、古い様式のタミル語の
359	348	数字でも合法的に使用することができます;
360	349	この数字は、「×10」や「×100」などを意味する文字で区切られて、
361	350	連続して二つ以上現れることはありません。
362		(L<https://www.unicode.org/notes/tn21>を参照してください。)
	351	(L<http://www.unicode.org/notes/tn21>を参照してください)。
363	352
364	353	=begin original
365	354
366	355	Any character not matched by C<\d> is matched by C<\D>.
367	356
368	357	=end original
369	358
370	359	C<\d> にマッチングしない任意の文字は C<\D> にマッチングします。
371	360
372	361	=head3 Word characters
373	362
374	363	(単語文字)
375	364
376	365	=begin original
377	366
378	367	A C<\w> matches a single alphanumeric character (an alphabetic character, or a
379	368	decimal digit); or a connecting punctuation character, such as an
380	369	underscore ("_"); or a "mark" character (like some sort of accent) that
381	370	attaches to one of those. It does not match a whole word. To match a
382	371	whole word, use C<\w+>. This isn't the same thing as matching an
383	372	English word, but in the ASCII range it is the same as a string of
384	373	Perl-identifier characters.
385	374
386	375	=end original
387	376
388	377	C<\w> は単語全体ではなく、単一の英数字(つまり英字または数字)または
389	378	下線(C<_>) のような接続句読点
390	379	またはこれらの一つに付いている(ある種のアクセントのような)「マーク」文字に
391	380	マッチングします。
392	381	これは単語全体にはマッチングしません。
393	382	単語全体にマッチングするには、C<\w+> を使ってください。
394	383	これは英語の単語にマッチングするのと同じことではありませんが、
395	384	ASCII の範囲では、Perl の識別子文字の文字列と同じです。
396	385
397	386	=over
398	387
399	388	=item If the C</a> modifier is in effect ...
400	389
401	390	(C</a> 修飾子が有効なら ...)
402	391
403	392	=begin original
404	393
405	394	C<\w> matches the 63 characters [a-zA-Z0-9_].
406	395
407	396	=end original
408	397
409	398	C<\w> は 63 文字 [a-zA-Z0-9_] にマッチングします。
410	399
411	400	=item otherwise ...
412	401
413	402	(さもなければ ...)
414	403
415	404	=over
416	405
417	406	=item For code points above 255 ...
418	407
419	408	(256 以上の符号位置では ...)
420	409
421	410	=begin original
422	411
423	412	C<\w> matches the same as C<\p{Word}> matches in this range. That is,
424	413	it matches Thai letters, Greek letters, etc. This includes connector
425	414	punctuation (like the underscore) which connect two words together, or
426	415	diacritics, such as a C<COMBINING TILDE> and the modifier letters, which
427	416	are generally used to add auxiliary markings to letters.
428	417
429	418	=end original
430	419
431	420	C<\w> はこの範囲で C<\p{Word}> がマッチングするものと同じものに
432	421	マッチングします。
433	422	つまり、タイ文字、ギリシャ文字などです。
434	423	これには(下線のような)二つの単語を繋ぐ接続句読点、
435	424	C<COMBINING TILDE> や一般的に文字に追加のマークを付けるために
436	425	使われる修飾字のようなダイアクリティカルマークが含まれます。
437	426
438	427	=item For code points below 256 ...
439	428
440	429	(255 以下の符号位置では ...)
441	430
442	431	=over
443	432
444	433	=item if locale rules are in effect ...
445	434
446	435	(ロケール規則が有効なら ...)
447	436
448	437	=begin original
449	438
450	439	C<\w> matches the platform's native underscore character plus whatever
451	440	the locale considers to be alphanumeric.
452	441
453	442	=end original
454	443
455	444	C<\w> は、プラットフォームのネイティブな下線に加えてロケールが英数字と
456	445	考えるものにマッチングします。
457	446
458		=item if, instead, Unicode rules are in effect ...
	447	=item if Unicode rules are in effect ...
459	448
460		(そうではなく、Unicode 規則が有効なら ...)
	449	(Unicode 規則が有効なら ...)
461	450
462	451	=begin original
463	452
464	453	C<\w> matches exactly what C<\p{Word}> matches.
465	454
466	455	=end original
467	456
468	457	C<\w> は C<\p{Word}> がマッチングするものと同じものにマッチングします。
469	458
470	459	=item otherwise ...
471	460
472	461	(さもなければ ...)
473	462
474	463	=begin original
475	464
476	465	C<\w> matches [a-zA-Z0-9_].
477	466
478	467	=end original
479	468
480	469	C<\w> は [a-zA-Z0-9_] にマッチングします。
481	470
482	471	=back
483	472
484	473	=back
485	474
486	475	=back
487	476
488	477	=begin original
489	478
490	479	Which rules apply are determined as described in L<perlre/Which character set modifier is in effect?>.
491	480
492	481	=end original
493	482
494	483	どの規則を適用するかは L<perlre/Which character set modifier is in effect?> で
495	484	記述されている方法で決定されます。
496	485
497	486	=begin original
498	487
499	488	There are a number of security issues with the full Unicode list of word
500	489	characters. See L<http://unicode.org/reports/tr36>.
501	490
502	491	=end original
503	492
504	493	完全な Unicode の単語文字の一覧には多くのセキュリティ問題があります。
505	494	L<http://unicode.org/reports/tr36> を参照してください。
506	495
507	496	=begin original
508	497
509	498	Also, for a somewhat finer-grained set of characters that are in programming
510	499	language identifiers beyond the ASCII range, you may wish to instead use the
511	500	more customized L</Unicode Properties>, C<\p{ID_Start}>,
512	501	C<\p{ID_Continue}>, C<\p{XID_Start}>, and C<\p{XID_Continue}>. See
513	502	L<http://unicode.org/reports/tr31>.
514	503
515	504	=end original
516	505
517	506	また、ASCII の範囲を超えたプログラミング言語識別子のための
518	507	より高精度の文字集合のためには、代わりによりカスタマイズされた
519	508	L<Unicode 特性|/Unicode Properties>である
520	509	C<\p{ID_Start}>,
521	510	C<\p{ID_Continue}>, C<\p{XID_Start}>, C<\p{XID_Continue}> を
522	511	使った方がよいでしょう。
523	512	L<http://unicode.org/reports/tr31> を参照してください。
524	513
525	514	=begin original
526	515
527	516	Any character not matched by C<\w> is matched by C<\W>.
528	517
529	518	=end original
530	519
531	520	C<\w> にマッチングしない任意の文字は C<\W> にマッチングします。
532	521
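The rule cascade for C<\w> can be seen by matching one character below 256 under different character set modifiers. This is a rough sketch that assumes an ASCII platform, no locale in effect, and the C<unicode_strings> feature switched off (it is disabled explicitly here so the default rules apply).

```perl
use strict;
use warnings;
no feature 'unicode_strings';   # make sure the default rules are in effect

my $e_acute = "\xE9";           # LATIN SMALL LETTER E WITH ACUTE (below 256)
my $thai    = "\x{0E01}";       # THAI CHARACTER KO KAI (above 255)

my $e_default = ($e_acute =~ /^\w$/)  ? 1 : 0;   # "otherwise": [a-zA-Z0-9_]
my $e_unicode = ($e_acute =~ /^\w$/u) ? 1 : 0;   # Unicode rules: \p{Word}
my $e_ascii   = ($e_acute =~ /^\w$/a) ? 1 : 0;   # /a: [a-zA-Z0-9_]

my $thai_default = ($thai =~ /^\w$/)  ? 1 : 0;   # above 255: \p{Word}
my $thai_ascii   = ($thai =~ /^\w$/a) ? 1 : 0;   # /a: [a-zA-Z0-9_]

print "U+00E9 default=$e_default /u=$e_unicode /a=$e_ascii\n";
print "U+0E01 default=$thai_default /a=$thai_ascii\n";
```
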
533	522	=head3 Whitespace
534	523
535	524	(空白)
536	525
537	526	=begin original
538	527
539	528	C<\s> matches any single character considered whitespace.
540	529
541	530	=end original
542	531
543	532	C<\s> は空白と考えられる単一の文字にマッチングします。
544	533
545	534	=over
546	535
547	536	=item If the C</a> modifier is in effect ...
548	537
549	538	(C</a> 修飾子が有効なら ...)
550	539
551	540	=begin original
552	541
553	542	In all Perl versions, C<\s> matches the 5 characters [\t\n\f\r ]; that
554	543	is, the horizontal tab,
555	544	the newline, the form feed, the carriage return, and the space.
556	545	Starting in Perl v5.18, it also matches the vertical tab, C<\cK>.
557	546	See note C<[1]> below for a discussion of this.
558	547
559	548	=end original
560	549
561	550	全ての Perl バージョンで、C<\s> は [\t\n\f\r ] の 5 文字にマッチングします;
562	551	つまり、水平タブ、改行、改頁、復帰、スペースです。
563	552	Perl 5.18 から、垂直タブ C<\cK> にもマッチングします。
564	553	ここでの議論については後述する C<[1]> を参照してください。
565	554
566	555	=item otherwise ...
567	556
568	557	(さもなければ ...)
569	558
570	559	=over
571	560
572	561	=item For code points above 255 ...
573	562
574	563	(256 以上の符号位置では ...)
575	564
576	565	=begin original
577	566
578	567	C<\s> matches exactly the code points above 255 shown with an "s" column
579	568	in the table below.
580	569
581	570	=end original
582	571
583	572	C<\s> は、後述する表の "s" の列で示されている、
584	573	255 を超える符号位置に正確にマッチングします。
585	574
586	575	=item For code points below 256 ...
587	576
588	577	(255 以下の符号位置では ...)
589	578
590	579	=over
591	580
592	581	=item if locale rules are in effect ...
593	582
594	583	(ロケール規則が有効なら ...)
595	584
596	585	=begin original
597	586
598	587	C<\s> matches whatever the locale considers to be whitespace.
599	588
600	589	=end original
601	590
602	591	C<\s> はロケールが空白だと考えるものにマッチングします。
603	592
604		=item if, instead, Unicode rules are in effect ...
	593	=item if Unicode rules are in effect ...
605	594
606		(そうではなく、Unicode 規則が有効なら ...)
	595	(Unicode 規則が有効なら ...)
607	596
608	597	=begin original
609	598
610	599	C<\s> matches exactly the characters shown with an "s" column in the
611	600	table below.
612	601
613	602	=end original
614	603
615	604	C<\s> は正確に以下の表で "s" の列にある文字にマッチングします。
616	605
617	606	=item otherwise ...
618	607
619	608	(さもなければ ...)
620	609
621	610	=begin original
622	611
623	612	C<\s> matches [\t\n\f\r ] and, starting in Perl
624	613	v5.18, the vertical tab, C<\cK>.
625	614	(See note C<[1]> below for a discussion of this.)
626	615	Note that this list doesn't include the non-breaking space.
627	616
628	617	=end original
629	618
630	619	C<\s> は [\t\n\f\r ] にマッチングし、Perl v5.18 から、
631	620	垂直タブ C<\cK> にもマッチングします。
632	621	(これの議論については後述する C<[1]> を参照してください。)
633	622	この一覧にはノーブレークスペースが含まれていないことに注意してください。
634	623
635	624	=back
636	625
637	626	=back
638	627
639	628	=back
640	629
641	630	=begin original
642	631
643	632	Which rules apply are determined as described in L<perlre/Which character set modifier is in effect?>.
644	633
645	634	=end original
646	635
647	636	どの規則を適用するかは L<perlre/Which character set modifier is in effect?> で
648	637	記述されている方法で決定されます。
649	638
650	639	=begin original
651	640
652	641	Any character not matched by C<\s> is matched by C<\S>.
653	642
654	643	=end original
655	644
656	645	C<\s> にマッチングしない任意の文字は C<\S> にマッチングします。
657	646
658	647	=begin original
659	648
660	649	C<\h> matches any character considered horizontal whitespace;
661	650	this includes the platform's space and tab characters and several others
662	651	listed in the table below. C<\H> matches any character
663	652	not considered horizontal whitespace. They use the platform's native
664	653	character set, and do not consider any locale that may otherwise be in
665	654	use.
666	655
667	656	=end original
668	657
669	658	C<\h> は水平空白と考えられる任意の文字にマッチングします; これは
670	659	プラットフォームのスペースとタブ文字および以下の表に上げられている
671	660	いくつかのその他の文字です。
672	661	C<\H> は水平空白と考えられない文字にマッチングします。
673	662	これらはプラットフォームのネイティブな文字集合を使い、
674	663	他の場所では有効なロケールを考慮しません。
675	664
676	665	=begin original
677	666
678	667	C<\v> matches any character considered vertical whitespace;
679	668	this includes the platform's carriage return and line feed characters (newline)
680	669	plus several other characters, all listed in the table below.
681	670	C<\V> matches any character not considered vertical whitespace.
682	671	They use the platform's native character set, and do not consider any
683	672	locale that may otherwise be in use.
684	673
685	674	=end original
686	675
687	676	C<\v> は垂直空白と考えられる任意の文字にマッチングします; これは
688	677	プラットフォームの復帰と行送り(改行)文字に加えていくつかのその他の文字です;
689	678	全ては以下の表に挙げられています。
690	679	C<\V> は垂直空白と考えられない任意の文字にマッチングします。
691	680	これらはプラットフォームのネイティブな文字集合を使い、
692	681	他の場所では有効なロケールを考慮しません。
693	682
694	683	=begin original
695	684
696	685	C<\R> matches anything that can be considered a newline under Unicode
697	686	rules. It can match a multi-character sequence. It cannot be used inside
698	687	a bracketed character class; use C<\v> instead (vertical whitespace).
699	688	It uses the platform's
700	689	native character set, and does not consider any locale that may
701	690	otherwise be in use.
702	691	Details are discussed in L<perlrebackslash>.
703	692
704	693	=end original
705	694
706	695	C<\R> は Unicode の規則で改行と考えられるものにマッチングします。
707	696	複数文字の並びにマッチングすることもあります。
708	697	従って、大かっこ文字クラスの中では使えません; 代わりに C<\v> (垂直空白) を
709	698	使ってください。
710	699	これらはプラットフォームのネイティブな文字集合を使い、
711	700	他の場所では有効なロケールを考慮しません。
712	701	詳細は L<perlrebackslash> で議論しています。
713	702
714	703	=begin original
715	704
716	705	Note that unlike C<\s> (and C<\d> and C<\w>), C<\h> and C<\v> always match
717	706	the same characters, without regard to other factors, such as the active
718	707	locale or whether the source string is in UTF-8 format.
719	708
720	709	=end original
721	710
722	711	C<\s> (および C<\d> と C<\w>) と違って、C<\h> および C<\v> は、現在の
723	712	ロケールやソース文字列が UTF-8 形式かどうかといった他の要素に関わらず
724	713	同じ文字にマッチングします。
725	714
726	715	=begin original
727	716
728	717	One might think that C<\s> is equivalent to C<[\h\v]>. This is indeed true
729	718	starting in Perl v5.18, but prior to that, the sole difference was that the
730	719	vertical tab (C<"\cK">) was not matched by C<\s>.
731	720
732	721	=end original
733	722
734	723	C<\s> が C<[\h\v]> と等価と考える人がいるかもしれません。
735	724	Perl 5.18 からはもちろん正しいです; しかしそれより前では、
736	725	唯一の違いは、垂直タブ (C<"\cK">) が C<\s> にマッチングしないということでした。
737	726
738	727	=begin original
739	728
740	729	The following table is a complete listing of characters matched by
741		C<\s>, C<\h> and C<\v> as of Unicode 14.0.
	730	C<\s>, C<\h> and C<\v> as of Unicode 6.3.
742	731
743	732	=end original
744	733
745		以下の表は Unicode 14.0 現在で C<\s>, C<\h>, C<\v> にマッチングする文字の
	734	以下の表は Unicode 6.3 現在で C<\s>, C<\h>, C<\v> にマッチングする文字の
746	735	完全な一覧です。
747	736
748	737	=begin original
749	738
750	739	The first column gives the Unicode code point of the character (in hex format),
751	740	the second column gives the (Unicode) name. The third column indicates
752	741	by which class(es) the character is matched (assuming no locale is in
753	742	effect that changes the C<\s> matching).
754	743
755	744	=end original
756	745
757	746	最初の列は文字の Unicode 符号位置(16 進形式)、2 番目の列は (Unicode の)
758	747	名前です。
759	748	3 番目の列はどのクラスにマッチングするかを示しています
760	749	(C<\s> のマッチングを変更するようなロケールが
761	750	有効でないことを仮定しています)。
762	751
763	752	0x0009 CHARACTER TABULATION h s
764	753	0x000a LINE FEED (LF) vs
765	754	0x000b LINE TABULATION vs [1]
766	755	0x000c FORM FEED (FF) vs
767	756	0x000d CARRIAGE RETURN (CR) vs
768	757	0x0020 SPACE h s
769	758	0x0085 NEXT LINE (NEL) vs [2]
770	759	0x00a0 NO-BREAK SPACE h s [2]
771	760	0x1680 OGHAM SPACE MARK h s
772	761	0x2000 EN QUAD h s
773	762	0x2001 EM QUAD h s
774	763	0x2002 EN SPACE h s
775	764	0x2003 EM SPACE h s
776	765	0x2004 THREE-PER-EM SPACE h s
777	766	0x2005 FOUR-PER-EM SPACE h s
778	767	0x2006 SIX-PER-EM SPACE h s
779	768	0x2007 FIGURE SPACE h s
780	769	0x2008 PUNCTUATION SPACE h s
781	770	0x2009 THIN SPACE h s
782	771	0x200a HAIR SPACE h s
783	772	0x2028 LINE SEPARATOR vs
784	773	0x2029 PARAGRAPH SEPARATOR vs
785	774	0x202f NARROW NO-BREAK SPACE h s
786	775	0x205f MEDIUM MATHEMATICAL SPACE h s
787	776	0x3000 IDEOGRAPHIC SPACE h s
788	777
789	778	=over 4
790	779
791	780	=item [1]
792	781
793	782	=begin original
794	783
795	784	Prior to Perl v5.18, C<\s> did not match the vertical tab.
796	785	C<[^\S\cK]> (obscurely) matches what C<\s> traditionally did.
797	786
798	787	=end original
799	788
800	789	Perl v5.18 より前では、C<\s> は垂直タブにマッチングしませんでした。
801	790	C<[^\S\cK]> は(ひっそりと)C<\s> が伝統的に
802	791	マッチングしていたものにマッチングします。
803	792
804	793	=item [2]
805	794
806	795	=begin original
807	796
808	797	NEXT LINE and NO-BREAK SPACE may or may not match C<\s> depending
809	798	on the rules in effect. See
810	799	L<the beginning of this section|/Whitespace>.
811	800
812	801	=end original
813	802
814	803	NEXT LINE と NO-BREAK SPACE はどの規則が有効かによって C<\s> に
815	804	マッチングしたりマッチングしなかったりします。
816	805	L<この節の冒頭|/Whitespace> を参照してください。
817	806
818	807	=back
819	808
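A few rows of the table and both notes can be exercised directly. The sketch below assumes Perl v5.18 or later, an ASCII platform, and no locale in effect; C<unicode_strings> is disabled explicitly so that the rule-dependent behaviour of note [2] is visible.

```perl
use strict;
use warnings;
no feature 'unicode_strings';   # keep the default rules for code points below 256

# Note [1]: the vertical tab is matched by \s from v5.18 on;
# [^\S\cK] reproduces the older behaviour.
my $vt_s   = ("\cK" =~ /^\s$/)        ? 1 : 0;
my $vt_old = ("\cK" =~ /^[^\S\cK]$/)  ? 1 : 0;
my $sp_old = (" "   =~ /^[^\S\cK]$/)  ? 1 : 0;

# \h and \v do not depend on the rules in effect.
my $nbsp_h = ("\xA0" =~ /^\h$/) ? 1 : 0;   # NO-BREAK SPACE
my $nel_v  = ("\x85" =~ /^\v$/) ? 1 : 0;   # NEXT LINE (NEL)

# Note [2]: whether \s matches NO-BREAK SPACE depends on the rules.
my $nbsp_s_default = ("\xA0" =~ /^\s$/)  ? 1 : 0;
my $nbsp_s_unicode = ("\xA0" =~ /^\s$/u) ? 1 : 0;

print "VT: \\s=$vt_s, [^\\S\\cK]=$vt_old; space: [^\\S\\cK]=$sp_old\n";
print "NBSP: \\h=$nbsp_h, \\s default=$nbsp_s_default, \\s /u=$nbsp_s_unicode\n";
print "NEL: \\v=$nel_v\n";
```
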
820	809	=head3 Unicode Properties
821	810
822	811	(Unicode 特性)
823	812
824	813	=begin original
825	814
826	815	C<\pP> and C<\p{Prop}> are character classes to match characters that fit given
827	816	Unicode properties. One letter property names can be used in the C<\pP> form,
828	817	with the property name following the C<\p>, otherwise, braces are required.
829	818	When using braces, there is a single form, which is just the property name
830	819	enclosed in the braces, and a compound form which looks like C<\p{name=value}>,
831	820	which means to match if the property "name" for the character has that particular
832	821	"value".
833	822	For instance, a match for a number can be written as C</\pN/> or as
834	823	C</\p{Number}/>, or as C</\p{Number=True}/>.
835	824	Lowercase letters are matched by the property I<Lowercase_Letter> which
836	825	has the short form I<Ll>. They need the braces, so are written as C</\p{Ll}/> or
837	826	C</\p{Lowercase_Letter}/>, or C</\p{General_Category=Lowercase_Letter}/>
838	827	(the underscores are optional).
839	828	C</\pLl/> is valid, but means something different.
840	829	It matches a two character string: a letter (Unicode property C<\pL>),
841	830	followed by a lowercase C<l>.
842	831
843	832	=end original
844	833
845	834	C<\pP> と C<\p{Prop}> は指定された Unicode 特性に一致する文字に
846	835	マッチングする文字クラスです。
847	836	一文字特性は C<\pP> 形式で、C<\p> に引き続いて特性名です; さもなければ
848	837	中かっこが必要です。
849	838	中かっこを使うとき、単に特性名を中かっこで囲んだ単一形式と、
850	839	C<\p{name=value}> のような形で、文字の特性 "name" が特定の "value" を
851	840	持つものにマッチングすることになる複合形式があります。
852	841	例えば、数字にマッチングするものは C</\pN/> または C</\p{Number}/> または
853	842	C</\p{Number=True}/> と書けます。
854	843	小文字は I<Lowercase_Letter> 特性にマッチングします; これには
855	844	I<Ll> と言う短縮形式があります。
856	845	中かっこが必要なので、C</\p{Ll}/> または C</\p{Lowercase_Letter}/> または
857	846	C</\p{General_Category=Lowercase_Letter}/> と書きます(下線はオプションです)。
858	847	C</\pLl/> も妥当ですが、違う意味になります。
859	848	これは 2 文字にマッチングします: 英字 (Unicode 特性 C<\pL>)に引き続いて
860	849	小文字の C<l> です。
861	850
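The single, compound and braceless forms described above can be compared side by side; in particular, C</\pLl/> behaves differently from C</\p{Ll}/>. The sample strings below are arbitrary illustrations.

```perl
use strict;
use warnings;

# Three equivalent spellings of "lowercase letter".
my $short    = ("a" =~ /^\p{Ll}$/)                                ? 1 : 0;
my $long     = ("a" =~ /^\p{Lowercase_Letter}$/)                  ? 1 : 0;
my $compound = ("a" =~ /^\p{General_Category=Lowercase_Letter}$/) ? 1 : 0;

# One-letter property names may drop the braces.
my $number = ("7" =~ /^\pN$/) ? 1 : 0;

# \pLl is \pL (any letter) followed by a literal "l": two characters.
my $pLl_two = ("Xl" =~ /^\pLl$/) ? 1 : 0;
my $pLl_one = ("a"  =~ /^\pLl$/) ? 1 : 0;

print "Ll forms: $short $long $compound\n";
print "\\pN on '7': $number\n";
print "\\pLl on 'Xl': $pLl_two, on 'a': $pLl_one\n";
```
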
862	851	=begin original
863	852
864		What a Unicode property matches is never subject to locale rules, and
	853	If locale rules are not in effect, the use of
865		if locale rules are not otherwise in effect, the use of a Unicode
	854	a Unicode property will force the regular expression into using Unicode
866		property will force the regular expression into using Unicode rules, if
	855	rules, if it isn't already.
867		it isn't already.
868	856
869	857	=end original
870	858
871		Unicode 特性が何にマッチングするかは決してロケールの規則に影響されず、
872	859	ロケール規則が有効でない場合、Unicode 特性を使うと
873	860	正規表現に (まだそうでなければ) Unicode 規則を使うように強制します。
874	861
875	862	=begin original
876	863
877	864	Note that almost all properties are immune to case-insensitive matching.
878	865	That is, adding a C</i> regular expression modifier does not change what
879		they match. But there are two sets that are affected. The first set is
	866	they match. There are two sets that are affected. The first set is
880	867	C<Uppercase_Letter>,
881	868	C<Lowercase_Letter>,
882	869	and C<Titlecase_Letter>,
883	870	all of which match C<Cased_Letter> under C</i> matching.
884	871	The second set is
885	872	C<Uppercase>,
886	873	C<Lowercase>,
887	874	and C<Titlecase>,
888	875	all of which match C<Cased> under C</i> matching.
889	876	(The difference between these sets is that some things, such as Roman
890	877	numerals, come in both upper and lower case, so they are C<Cased>, but
891	878	aren't considered to be letters, so they aren't C<Cased_Letter>s. They're
892	879	actually C<Letter_Number>s.)
893	880	This set also includes its subsets C<PosixUpper> and C<PosixLower>, both
894	881	of which under C</i> match C<PosixAlpha>.
895	882
896	883	=end original
897	884
898	885	ほとんど全ての特性は大文字小文字を無視したマッチングから免除されることに
899	886	注意してください。
900	887	つまり、C</i> 正規表現修飾子はこれらがマッチングするものに影響を
901	888	与えないということです。
902		しかし、影響を与える二つの集合があります。
	889	影響を与える二つの集合があります。
903	890	一つ目の集合は
904	891	C<Uppercase_Letter>,
905	892	C<Lowercase_Letter>,
906	893	C<Titlecase_Letter> で、全て C</i> マッチングの下で
907	894	C<Cased_Letter> にマッチングします。
908	895	二つ目の集合は
909	896	C<Uppercase>,
910	897	C<Lowercase>,
911	898	C<Titlecase> で、全てC</i> マッチングの下で
912	899	C<Cased> にマッチングします。
913	900	(これらの集合の違いは、ローマ数字のような一部のものは、
914	901	大文字と小文字があるので C<Cased> ですが、
915	902	文字とは扱われないので C<Cased_Letter> ではありません。
916	903	これらは実際には C<Letter_Number> です。)
917	904	この集合はその部分集合である C<PosixUpper> と C<PosixLower> を含みます;
918	905	これら両方は C</i> マッチングの下では C<PosixAlpha> にマッチングします。
919	906
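The effect of C</i> on the cased properties can be checked with a short script. This is a minimal sketch; the variable names are illustrative only, and C<\p{Thai}> merely stands in for any property that C</i> leaves alone.

```perl
use strict;
use warnings;

# Without /i, \p{Lu} (Uppercase_Letter) matches only uppercase letters.
my $without_i = ("a" =~ /\p{Lu}/) ? "match" : "no match";

# With /i, \p{Lu} behaves like \p{Cased_Letter}, so a lowercase letter matches.
my $with_i = ("a" =~ /\p{Lu}/i) ? "match" : "no match";

# \p{Thai} is one of the many properties that /i does not change.
my $thai_i = ("a" =~ /\p{Thai}/i) ? "match" : "no match";

print "without /i: $without_i\n";
print "with /i:    $with_i\n";
print "Thai, /i:   $thai_i\n";
```
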
920	907	=begin original
921	908
922	909	For more details on Unicode properties, see L<perlunicode/Unicode
923	910	Character Properties>; for a
924	911	complete list of possible properties, see
925	912	L<perluniprops/Properties accessible through \p{} and \P{}>,
926	913	which notes all forms that have C</i> differences.
927	914	It is also possible to define your own properties. This is discussed in
928	915	L<perlunicode/User-Defined Character Properties>.
929	916
930	917	=end original
931	918
932	919	Unicode 特性に関するさらなる詳細については、
933	920	L<perlunicode/Unicode Character Properties> を参照してください; 特性の完全な
934	921	一覧については、C</i> に違いのある全ての形式について記されている
935	922	L<perluniprops/Properties accessible through \p{} and \P{}> を参照して
936	923	ください。
937	924	独自の特性を定義することも可能です。
938	925	これは L<perlunicode/User-Defined Character Properties> で
939	926	議論されています。
940	927
941	928	=begin original
942	929
943	930	Unicode properties are defined (surprise!) only on Unicode code points.
944	931	Starting in v5.20, when matching against C<\p> and C<\P>, Perl treats
945	932	non-Unicode code points (those above the legal Unicode maximum of
946	933	0x10FFFF) as if they were typical unassigned Unicode code points.
947	934
948	935	=end original
949	936
950	937	Unicode 特性は (驚くべきことに!) Unicode 符号位置に対してのみ
951	938	定義されています。
952	939	v5.20 から、C<\p> と C<\P> に対してマッチングするとき、
953	940	Perl は
954	941	非 Unicode 符号位置 (正当な Unicode の上限の 0x10FFFF を超えるもの) を、
955	942	典型的な未割り当て Unicode 符号位置であるかのように扱います。
956	943
957	944	=begin original
958	945
959	946	Prior to v5.20, Perl raised a warning and made all matches fail on
960	947	non-Unicode code points. This could be somewhat surprising:
961	948
962	949	=end original
963	950
964	951	v5.20 より前では、非 Unicode 符号位置に対しては全てのマッチングは失敗して、
965	952	Perl は警告を出していました。
966	953	これは驚かされるものだったかもしれません。
967	954
968	955	chr(0x110000) =~ /\p{ASCII_Hex_Digit=True}/ # Fails on Perls < v5.20.
969	956	chr(0x110000) =~ /\p{ASCII_Hex_Digit=False}/ # Also fails on Perls
970	957	# < v5.20
971	958
972	959	=begin original
973	960
974	961	Even though these two matches might be thought of as complements, until
975	962	v5.20 they were so only on Unicode code points.
976	963
977	964	=end original
978	965
979	966	これら二つのマッチングは補集合と考えるかもしれませんが、
980	967	v5.20 まで、これらは Unicode 符号位置だけでした。
981	968
982		=begin original
983
984		Starting in perl v5.30, wildcards are allowed in Unicode property
985		values. See L<perlunicode/Wildcards in Property Values>.
986
987		=end original
988
989		perl v5.30 から、Unicode 特性にワイルドカードを使えます。
990		L<perlunicode/Wildcards in Property Values> を参照してください。
991
992	969	=head4 Examples
993	970
994	971	(例)
995	972
996	973	=begin original
997	974
998	975	"a" =~ /\w/ # Match, "a" is a 'word' character.
999	976	"7" =~ /\w/ # Match, "7" is a 'word' character as well.
1000	977	"a" =~ /\d/ # No match, "a" isn't a digit.
1001	978	"7" =~ /\d/ # Match, "7" is a digit.
1002	979	" " =~ /\s/ # Match, a space is whitespace.
1003	980	"a" =~ /\D/ # Match, "a" is a non-digit.
1004	981	"7" =~ /\D/ # No match, "7" is not a non-digit.
1005	982	" " =~ /\S/ # No match, a space is not non-whitespace.
1006	983
1007	984	=end original
1008	985
1009	986	"a" =~ /\w/ # マッチング; "a" は「単語」文字。
1010	987	"7" =~ /\w/ # マッチング; "7" も「単語」文字。
1011	988	"a" =~ /\d/ # マッチングしない; "a" は数字ではない。
1012	989	"7" =~ /\d/ # マッチング; "7" は数字。
1013	990	" " =~ /\s/ # マッチング; スペースは空白。
1014	991	"a" =~ /\D/ # マッチング; "a" は非数字。
1015	992	"7" =~ /\D/ # マッチングしない; "7" は非数字ではない。
1016	993	" " =~ /\S/ # マッチングしない; スペースは非空白ではない。
1017	994
1018	995	=begin original
1019	996
1020	997	" " =~ /\h/ # Match, space is horizontal whitespace.
1021	998	" " =~ /\v/ # No match, space is not vertical whitespace.
1022	999	"\r" =~ /\v/ # Match, a return is vertical whitespace.
1023	1000
1024	1001	=end original
1025	1002
1026	1003	" " =~ /\h/ # マッチング; スペースは水平空白。
1027	1004	" " =~ /\v/ # マッチングしない; スペースは垂直空白ではない。
1028	1005	"\r" =~ /\v/ # マッチング; 復帰は垂直空白。
1029	1006
1030	1007	=begin original
1031	1008
1032	1009	"a" =~ /\pL/ # Match, "a" is a letter.
1033	1010	"a" =~ /\p{Lu}/ # No match, /\p{Lu}/ matches upper case letters.
1034	1011
1035	1012	=end original
1036	1013
1037	1014	"a" =~ /\pL/ # マッチング; "a" は英字。
1038	1015	"a" =~ /\p{Lu}/ # マッチングしない; /\p{Lu}/ は大文字にマッチングする。
1039	1016
1040	1017	=begin original
1041	1018
1042	1019	"\x{0e0b}" =~ /\p{Thai}/ # Match, \x{0e0b} is the character
1043	1020	# 'THAI CHARACTER SO SO', and that's in
1044	1021	# Thai Unicode class.
1045	1022	"a" =~ /\P{Lao}/ # Match, as "a" is not a Laotian character.
1046	1023
1047	1024	=end original
1048	1025
1049	1026	"\x{0e0b}" =~ /\p{Thai}/ # マッチング; \x{0e0b} は文字
1050	1027	# 'THAI CHARACTER SO SO' で、これは
1051	1028	# Thai Unicode クラスにある。
1052	1029	"a" =~ /\P{Lao}/ # マッチング; "a" はラオス文字ではない。
1053	1030
1054	1031	=begin original
1055	1032
1056	1033	It is worth emphasizing that C<\d>, C<\w>, etc, match single characters, not
1057	1034	complete numbers or words. To match a number (that consists of digits),
1058	1035	use C<\d+>; to match a word, use C<\w+>. But be aware of the security
1059	1036	considerations in doing so, as mentioned above.
1060	1037
1061	1038	=end original
1062	1039
1063	1040	C<\d>, C<\w> などは数値や単語全体ではなく、1 文字にマッチングすることは
1064	1041	強調する価値があります。
1065	1042	(数字で構成される) 数値にマッチングするには C<\d+> を使います;
1066	1043	単語にマッチングするには C<\w+> を使います。
1067	1044	しかし前述したように、そうする場合のセキュリティ問題について
1068	1045	注意してください。
1069	1046
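The difference between a bare class and a quantified one can be seen as follows. This is a minimal sketch; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "order 12345 shipped";

# \d alone consumes exactly one character, so only the first digit is captured.
my ($one_digit) = $text =~ /(\d)/;

# \d+ consumes the whole run of consecutive digits.
my ($number) = $text =~ /(\d+)/;

# \w+ likewise captures a whole run of word characters.
my ($word) = $text =~ /(\w+)/;

print "single \\d: $one_digit\n";
print "\\d+:       $number\n";
print "\\w+:       $word\n";
```
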
1070	1047	=head2 Bracketed Character Classes
1071	1048
1072	1049	(かっこ付き文字クラス)
1073	1050
1074	1051	=begin original
1075	1052
1076	1053	The third form of character class you can use in Perl regular expressions
1077	1054	is the bracketed character class. In its simplest form, it lists the characters
1078	1055	that may be matched, surrounded by square brackets, like this: C<[aeiou]>.
1079	1056	This matches one of C<a>, C<e>, C<i>, C<o> or C<u>. Like the other
1080	1057	character classes, exactly one character is matched.* To match
1081	1058	a longer string consisting of characters mentioned in the character
1082	1059	class, follow the character class with a L<quantifier|perlre/Quantifiers>. For
1083	1060	instance, C<[aeiou]+> matches one or more lowercase English vowels.
1084	1061
1085	1062	=end original
1086	1063
1087	1064	Perl 正規表現で使える文字クラスの第 3 の形式は大かっこ文字クラスです。
1088	1065	もっとも単純な形式では、以下のように大かっこの中にマッチングする文字を
1089	1066	リストします: C<[aeiou]>.
1090	1067	これは C<a>, C<e>, C<i>, C<o>, C<u> のどれかにマッチングします。
1091	1068	他の文字クラスと同様、正確に一つの文字にマッチングします。
1092	1069	文字クラスで言及した文字で構成されるより長い文字列にマッチングするには、
1093	1070	文字クラスに L<量指定子|perlre/Quantifiers> を付けます。
1094	1071	例えば、C<[aeiou]+> は一つまたはそれ以上の小文字英語母音に
1095	1072	マッチングします。
1096	1073
1097	1074	=begin original
1098	1075
1099	1076	Repeating a character in a character class has no
1100	1077	effect; it's considered to be in the set only once.
1101	1078
1102	1079	=end original
1103	1080
1104	1081	文字クラスの中で文字を繰り返しても効果はありません; 一度だけ現れたものと
1105	1082	考えられます。
1106	1083
1107	1084	=begin original
1108	1085
1109	1086	Examples:
1110	1087
1111	1088	=end original
1112	1089
1113	1090	例:
1114	1091
1115	1092	=begin original
1116	1093
1117	1094	"e" =~ /[aeiou]/ # Match, as "e" is listed in the class.
1118	1095	"p" =~ /[aeiou]/ # No match, "p" is not listed in the class.
1119	1096	"ae" =~ /^[aeiou]$/ # No match, a character class only matches
1120	1097	# a single character.
1121	1098	"ae" =~ /^[aeiou]+$/ # Match, due to the quantifier.
1122	1099
1123	1100	=end original
1124	1101
1125	1102	"e" =~ /[aeiou]/ # マッチング; "e" はクラスにある。
1126	1103	"p" =~ /[aeiou]/ # マッチングしない; "p" はクラスにない。
1127	1104	"ae" =~ /^[aeiou]$/ # マッチングしない; 一つの文字クラスは
1128	1105	# 一文字だけにマッチングする。
1129	1106	"ae" =~ /^[aeiou]+$/ # マッチング; 量指定子により。
1130	1107
1131	1108	-------
1132	1109
1133	1110	=begin original
1134	1111
1135	1112	* There are two exceptions to a bracketed character class matching a
1136	1113	single character only. Each requires special handling by Perl to make
1137	1114	things work:
1138	1115
1139	1116	=end original
1140	1117
1141	1118	* 大かっこ文字クラスは単一の文字にのみマッチングするということには
1142	1119	二つの例外があります。
1143	1120	それぞれは Perl がうまく動くために特別な扱いが必要です:
1144	1121
1145	1122	=over
1146	1123
1147	1124	=item *
1148	1125
1149	1126	=begin original
1150	1127
1151	1128	When the class is to match caselessly under C</i> matching rules, and a
1152	1129	character that is explicitly mentioned inside the class matches a
1153	1130	multiple-character sequence caselessly under Unicode rules, the class
1154	1131	will also match that sequence. For example, Unicode says that the
1155	1132	letter C<LATIN SMALL LETTER SHARP S> should match the sequence C<ss>
1156	1133	under C</i> rules. Thus,
1157	1134
1158	1135	=end original
1159	1136
1160	1137	クラスが C</i> マッチング規則の下で大文字小文字を無視したマッチングを
1161	1138	して、クラスの中で明示的に記述された文字が Unicode の規則の下で複数文字並びに
1162	1139	大文字小文字を無視してマッチングするとき、
1163	1140	そのクラスはその並びにもマッチングします。
1164	1141	例えば、Unicode は文字 C<LATIN SMALL LETTER SHARP S> は C</i> 規則の下では
1165	1142	並び C<ss> にマッチングするとしています。
1166	1143	従って:
1167	1144
1168	1145	'ss' =~ /\A\N{LATIN SMALL LETTER SHARP S}\z/i # Matches
1169	1146	'ss' =~ /\A[aeioust\N{LATIN SMALL LETTER SHARP S}]\z/i # Matches
1170	1147
1171	1148	=begin original
1172	1149
1173	1150	For this to happen, the class must not be inverted (see L</Negation>)
1174	1151	and the character must be explicitly specified, and not be part of a
1175	1152	multi-character range (not even as one of its endpoints). (L</Character
1176	1153	Ranges> will be explained shortly.) Therefore,
1177	1154
1178	1155	=end original
1179	1156
1180	1157	これが起きるためには、
1181	1158	そのクラスは否定 (L</Negation> 参照) ではなく、
1182	1159	その文字は明示的に指定され、複数文字範囲の一部
1183	1160	(たとえその端でも)でない必要があります。
1184	1161	(L</Character Ranges> はこのすぐ後で説明します。)
1185	1162	従って:
1186	1163
1187	1164	'ss' =~ /\A[\0-\x{ff}]\z/ui # Doesn't match
1188	1165	'ss' =~ /\A[\0-\N{LATIN SMALL LETTER SHARP S}]\z/ui # No match
1189	1166	'ss' =~ /\A[\xDF-\xDF]\z/ui # Matches on ASCII platforms, since
1190	1167	# \xDF is LATIN SMALL LETTER SHARP S,
1191	1168	# and the range is just a single
1192	1169	# element
1193	1170
1194	1171	=begin original
1195	1172
1196	1173	Note that it isn't a good idea to specify these types of ranges anyway.
1197	1174
1198	1175	=end original
1199	1176
1200	1177	どちらにしろこれらの種類の範囲を指定するのは良い考えではありません。
1201	1178
1202	1179	=item *
1203	1180
1204	1181	=begin original
1205	1182
1206	1183	Some names known to C<\N{...}> refer to a sequence of multiple characters,
1207	1184	instead of the usual single character. When one of these is included in
1208	1185	the class, the entire sequence is matched. For example,
1209	1186
1210	1187	=end original
1211	1188
1212	1189	C<\N{...}> で知られているいくつかの名前は、
1213	1190	通常の単一の文字ではなく、
1214	1191	複数の文字の並びを参照します。
1215	1192	その一つがこのクラスに含まれている場合、並び全体がマッチングします。
1216	1193	例えば:
1217	1194
1218	1195	"\N{TAMIL LETTER KA}\N{TAMIL VOWEL SIGN AU}"
1219	1196	=~ / ^ [\N{TAMIL SYLLABLE KAU}] $ /x;
1220	1197
1221	1198	=begin original
1222	1199
1223	1200	matches, because C<\N{TAMIL SYLLABLE KAU}> is a named sequence
1224	1201	consisting of the two characters matched against. Like the other
1225	1202	instance where a bracketed class can match multiple characters, and for
1226	1203	similar reasons, the class must not be inverted, and the named sequence
1227	1204	may not appear in a range, even one where it is both endpoints. If
1228		these happen, it is a fatal error if the character class is within ~~the~~
	1205	these happen, it is a fatal error if the character class is within an
1229		~~scop~~e of L<C<use re ~~'st~~ric~~t>\|r~~e~~/'s~~trict~~' mod~~e~~>, o~~r ~~within~~ an e~~xtended~~
	1206	extended L<C<(?[...])>|/Extended Bracketed Character Classes>
1230		~~L<C<(?[...])>\|/Extended Bra~~c~~keted Character C~~lass~~es>~~ cla~~ss;~~ otherwise
	1207	class; and only the first code point is used (with
1231		~~only the first code point is used (with~~ a C<regexp>-type warning
	1208	a C<regexp>-type warning raised) otherwise.
1232		raised).
1233	1209
1234	1210	=end original
1235	1211
1236	1212	これはマッチングします; なぜなら C<\N{TAMIL SYLLABLE KAU}> は
1237	1213	マッチングする二つの文字からなる名前付き並びだからです。
1238	1214	大かっこクラスが複数の文字にマッチングするその他の例と同じように、
1239	1215	そして同様の理由で、クラスは否定できず、
1240	1216	たとえ両端の間であっても名前付き並びは範囲の中には現れません。
1241	1217	これらが起きたとき、文字クラスが
1242		L<C<use re 'strict'>|re/'strict' mode> のスコープ内か、
1243	1218	拡張された L<C<(?[...])>|/Extended Bracketed Character Classes> クラスの
1244	1219	中の場合には致命的エラーになります;
1245	1220	さもなければ、最初の符号位置のみが使われます
1246	1221	(そして C<regexp> 系の警告が発生します)。
1247	1222
1248	1223	=back
1249	1224
1250	1225	=head3 Special Characters Inside a Bracketed Character Class
1251	1226
1252	1227	(かっこ付き文字クラスの中の特殊文字)
1253	1228
1254	1229	=begin original
1255	1230
1256	1231	Most characters that are meta characters in regular expressions (that
1257	1232	is, characters that carry a special meaning like C<.>, C<*>, or C<(>) lose
1258	1233	their special meaning and can be used inside a character class without
1259	1234	the need to escape them. For instance, C<[()]> matches either an opening
1260	1235	parenthesis, or a closing parenthesis, and the parens inside the character
1261		class don't group or capture. ~~Be aware that, unless the pattern is~~
	1236	class don't group or capture.
1262		evaluated in single-quotish context, variable interpolation will take
1263		place before the bracketed class is parsed:
1264	1237
1265	1238	=end original
1266	1239
1267	1240	正規表現内でメタ文字(つまり、C<.>, C<*>, C<(> のように特別な意味を持つ
1268	1241	文字)となるほとんどの文字は文字クラス内ではエスケープしなくても特別な意味を
1269	1242	失うので、エスケープする必要はありません。
1270	1243	例えば、C<[()]> は開きかっこまたは閉じかっこにマッチングし、文字クラスの中の
1271	1244	かっこはグループや捕捉にはなりません。
1272		パターンがシングルクォート風コンテキストの中で評価されない限り、
1273		変数展開は大かっこクラスがパースされる前に行われることに注意してください:
1274	1245
1275		$, = "\t| ";
1276		$a =~ m'[$,]'; # single-quotish: matches '$' or ','
1277		$a =~ q{[$,]}; # same
1278		$a =~ m/[$,]/; # double-quotish: Because we made an
1279		# assignment to $, above, this now
1280		# matches "\t", "|", or " "
1281
1282	1246	=begin original
1283	1247
1284	1248	Characters that may carry a special meaning inside a character class are:
1285	1249	C<\>, C<^>, C<->, C<[> and C<]>, and are discussed below. They can be
1286	1250	escaped with a backslash, although this is sometimes not needed, in which
1287	1251	case the backslash may be omitted.
1288	1252
1289	1253	=end original
1290	1254
1291	1255	文字クラスの中でも特別な意味を持つ文字は:
1292	1256	C<\>, C<^>, C<->, C<[>, C<]> で、以下で議論します。
1293	1257	これらは逆スラッシュでエスケープできますが、不要な場合もあり、そのような
1294	1258	場合では逆スラッシュは省略できます。
1295	1259
1296	1260	=begin original
1297	1261
1298	1262	The sequence C<\b> is special inside a bracketed character class. While
1299	1263	outside the character class, C<\b> is an assertion indicating a point
1300	1264	that does not have either two word characters or two non-word characters
1301	1265	on either side, inside a bracketed character class, C<\b> matches a
1302	1266	backspace character.
1303	1267
1304	1268	=end original
1305	1269
1306	1270	シーケンス C<\b> は大かっこ文字クラスの内側では特別です。
1307	1271	文字クラスの外側では C<\b> は二つの単語文字か二つの非単語文字のどちらかではない
1308	1272	位置を示す表明ですが、大かっこ文字クラスの内側では C<\b> は後退文字に
1309	1273	マッチングします。
1310	1274
1311	1275	=begin original
1312	1276
1313	1277	The sequences
1314	1278	C<\a>,
1315	1279	C<\c>,
1316	1280	C<\e>,
1317	1281	C<\f>,
1318	1282	C<\n>,
1319	1283	C<\N{I<NAME>}>,
1320	1284	C<\N{U+I<hex char>}>,
1321	1285	C<\r>,
1322	1286	C<\t>,
1323	1287	and
1324	1288	C<\x>
1325	1289	are also special and have the same meanings as they do outside a
1326	1290	bracketed character class.
1327	1291
1328	1292	=end original
1329	1293
1330	1294	並び
1331	1295	C<\a>,
1332	1296	C<\c>,
1333	1297	C<\e>,
1334	1298	C<\f>,
1335	1299	C<\n>,
1336	1300	C<\N{I<NAME>}>,
1337	1301	C<\N{U+I<hex char>}>,
1338	1302	C<\r>,
1339	1303	C<\t>,
1340	1304	C<\x>
1341	1305	も特別で、大かっこ文字クラスの外側と同じ意味を持ちます。
1342	1306
1343	1307	=begin original
1344	1308
1345	1309	Also, a backslash followed by two or three octal digits is considered an octal
1346	1310	number.
1347	1311
1348	1312	=end original
1349	1313
1350	1314	また、逆スラッシュに引き続いて 2 または 3 桁の 8 進数字があると 8 進数として
1351	1315	扱われます。
1352	1316
1353	1317	=begin original
1354	1318
1355	1319	A C<[> is not special inside a character class, unless it's the start of a
1356	1320	POSIX character class (see L</POSIX Character Classes> below). It normally does
1357	1321	not need escaping.
1358	1322
1359	1323	=end original
1360	1324
1361	1325	C<[> は、POSIX 文字クラス(後述の L</POSIX Character Classes> 参照)の
1362	1326	開始でない限りは文字クラスの中では特別ではありません。
1363	1327	これは普通エスケープは不要です。
1364	1328
1365	1329	=begin original
1366	1330
1367	1331	A C<]> is normally either the end of a POSIX character class (see
1368	1332	L</POSIX Character Classes> below), or it signals the end of the bracketed
1369	1333	character class. If you want to include a C<]> in the set of characters, you
1370	1334	must generally escape it.
1371	1335
1372	1336	=end original
1373	1337
1374	1338	C<]> は普通は POSIX 文字クラス(後述の L</POSIX Character Classes> 参照)の
1375	1339	終わりか、大かっこ文字クラスの終了を示すかどちらかです。
1376	1340	文字集合に C<]> を含める必要がある場合、一般的には
1377	1341	エスケープしなければなりません。
1378	1342
1379	1343	=begin original
1380	1344
1381	1345	However, if the C<]> is the I<first> (or the second if the first
1382	1346	character is a caret) character of a bracketed character class, it
1383	1347	does not denote the end of the class (as you cannot have an empty class)
1384	1348	and is considered part of the set of characters that can be matched without
1385	1349	escaping.
1386	1350
1387	1351	=end original
1388	1352
1389	1353	しかし、C<]> が大かっこ文字クラスの I<最初> (または最初の文字がキャレットなら
1390	1354	2 番目) の文字の場合、(空クラスを作ることはできないので)これはクラスの
1391	1355	終了を意味せず、エスケープなしでマッチングできる文字の集合の一部と
1392	1356	考えられます。
1393	1357
1394	1358	=begin original
1395	1359
1396	1360	Examples:
1397	1361
1398	1362	=end original
1399	1363
1400	1364	例:
1401	1365
1402	1366	=begin original
1403	1367
1404	1368	"+" =~ /[+?*]/ # Match, "+" in a character class is not special.
1405	1369	"\cH" =~ /[\b]/ # Match, \b inside a character class
1406	1370	# is equivalent to a backspace.
1407	1371	"]" =~ /[][]/ # Match, as the character class contains
1408	1372	# both [ and ].
1409	1373	"[]" =~ /[[]]/ # Match, the pattern contains a character class
1410	1374	# containing just [, and the character class is
1411	1375	# followed by a ].
1412	1376
1413	1377	=end original
1414	1378
1415	1379	"+" =~ /[+?*]/ # マッチング; 文字クラス内の "+" は特別ではない。
1416	1380	"\cH" =~ /[\b]/ # マッチング; 文字クラスの内側の \b は後退と
1417	1381	# 等価。
1418	1382	"]" =~ /[][]/ # マッチング; 文字クラスに [ と ] の両方を
1419	1383	# 含んでいる。
1420	1384	"[]" =~ /[[]]/ # マッチング; パターンは [ だけを含んでいる
1421	1385	# 文字クラスと、それに引き続く
1422	1386	# ] からなる。
1423	1387
1424		=head3 Bracketed Character Classes and the C</xx> pattern modifier
1425
1426		=begin original
1427
1428		Normally SPACE and TAB characters have no special meaning inside a
1429		bracketed character class; they are just added to the list of characters
1430		matched by the class. But if the L<C</xx>|perlre/E<sol>x and E<sol>xx>
1431		pattern modifier is in effect, they are generally ignored and can be
1432		added to improve readability. They can't be added in the middle of a
1433		single construct:
1434
1435		=end original
1436
1437		通常、大かっこ文字クラスの内側では SPACE と TAB の文字は
1438		特別な意味はありません; これらは単にクラスによってマッチングされる文字の
1439		リストに加えられます。
1440		しかし、L<C</xx>|perlre/E<sol>x and E<sol>xx> パターン修飾子が有効の場合、
1441		これらは一般的に無視されるので、可読性を向上させるために追加できます。
1442		これらは単一の構文の中には追加できません:
1443
1444		/ [ \x{10 FFFF} ] /xx # WRONG!
1445
1446		=begin original
1447
1448		The SPACE in the middle of the hex constant is illegal.
1449
1450		=end original
1451
1452		16 進定数の中の SPACE は不正です。
1453
1454		=begin original
1455
1456		To specify a literal SPACE character, you can escape it with a
1457		backslash, like:
1458
1459		=end original
1460
1461		リテラルな SPACE 文字を指定するには、次のように逆スラッシュで
1462		エスケープします:
1463
1464		/[ a e i o u \ ]/xx
1465
1466		=begin original
1467
1468		This matches the English vowels plus the SPACE character.
1469
1470		=end original
1471
1472		これは英語の母音と SPACE 文字に一致します。
1473
1474		=begin original
1475
1476		For clarity, you should already have been using C<\t> to specify a
1477		literal tab, and C<\t> is unaffected by C</xx>.
1478
1479		=end original
1480
1481		確認すると、リテラルなタブのためには既に C<\t> を使っているべきで、
1482		C<\t> は C</xx> の影響を受けません。
1483
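The C</xx> rules above can be tried out directly. This is a minimal sketch, assuming Perl v5.26 or later (the release that introduced C</xx>); the variable names are illustrative only.

```perl
use v5.26;
use warnings;

# Under /xx, unescaped blanks inside the brackets are ignored.
my $vowels = qr/[ a e i o u ]/xx;

# An escaped blank is kept as a literal SPACE in the class.
my $vowels_or_space = qr/[ a e i o u \ ]/xx;

my $e_matches      = ("e" =~ $vowels)          ? 1 : 0;
my $space_ignored  = (" " =~ $vowels)          ? 1 : 0;
my $space_matches  = (" " =~ $vowels_or_space) ? 1 : 0;

say "e in vowels:               $e_matches";
say "space in vowels:           $space_ignored";
say "space in vowels_or_space:  $space_matches";
```
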
1484	1388	=head3 Character Ranges
1485	1389
1486	1390	(文字範囲)
1487	1391
1488	1392	=begin original
1489	1393
1490	1394	It is not uncommon to want to match a range of characters. Luckily, instead
1491	1395	of listing all characters in the range, one may use the hyphen (C<->).
1492	1396	If inside a bracketed character class you have two characters separated
1493	1397	by a hyphen, it's treated as if all characters between the two were in
1494	1398	the class. For instance, C<[0-9]> matches any ASCII digit, and C<[a-m]>
1495	1399	matches any lowercase letter from the first half of the ASCII alphabet.
1496	1400
1497	1401	=end original
1498	1402
1499	1403	文字のある範囲にマッチングしたいというのは珍しくありません。
1500	1404	幸運なことに、その範囲の文字を全て一覧に書く代わりに、ハイフン (C<->) を
1501	1405	使えます。
1502	1406	大かっこ文字クラスの内側で二つの文字がハイフンで区切られていると、
1503	1407	二つの文字の間の全ての文字がクラスに書かれているかのように扱われます。
1504	1408	例えば、C<[0-9]> は任意の ASCII 数字にマッチングし、C<[a-m]> は
1505	1409	ASCII アルファベットの前半分の小文字にマッチングします。
1506	1410
1507	1411	=begin original
1508	1412
1509	1413	Note that the two characters on either side of the hyphen are not
1510	1414	necessarily both letters or both digits. Any character is possible,
1511	1415	although not advisable. C<['-?]> contains a range of characters, but
1512	1416	most people will not know which characters that means. Furthermore,
1513	1417	such ranges may lead to portability problems if the code has to run on
1514	1418	a platform that uses a different character set, such as EBCDIC.
1515	1419
1516	1420	=end original
1517	1421
1518	1422	ハイフンのそれぞれの側の二つの文字は両方とも英字であったり両方とも
1519	1423	数字であったりする必要はないことに注意してください。
1520	1424	任意の文字が可能ですが、勧められません。
1521	1425	C<['-?]> は文字の範囲を含みますが、ほとんどの人はどの文字が含まれるか
1522	1426	分かりません。
1523	1427	さらに、このような範囲は、コードが EBCDIC のような異なった文字集合を使う
1524	1428	プラットフォームで実行されると移植性の問題を引き起こします。
1525	1429
1526	1430	=begin original
1527	1431
1528	1432	If a hyphen in a character class cannot syntactically be part of a range, for
1529	1433	instance because it is the first or the last character of the character class,
1530	1434	or if it immediately follows a range, the hyphen isn't special, and so is
1531	1435	considered a character to be matched literally. If you want a hyphen in
1532	1436	your set of characters to be matched and its position in the class is such
1533	1437	that it could be considered part of a range, you must escape that hyphen
1534	1438	with a backslash.
1535	1439
1536	1440	=end original
1537	1441
1538	1442	例えば文字クラスの最初または最後であったり、範囲の直後のために、文字クラスの
1539	1443	中のハイフンが文法的に範囲の一部となれない場合、ハイフンは特別ではなく、
1540	1444	リテラルにマッチングするべき文字として扱われます。
1541	1445	マッチングする文字の集合にハイフンを入れたいけれどもその位置が範囲の
1542	1446	一部として考えられる場合はハイフンを逆スラッシュで
1543	1447	エスケープしなければなりません。
1544	1448
1545	1449	=begin original
1546	1450
1547	1451	Examples:
1548	1452
1549	1453	=end original
1550	1454
1551	1455	例:
1552	1456
1553	1457	=begin original
1554	1458
1555	1459	[a-z] # Matches a character that is a lower case ASCII letter.
1556	1460	[a-fz] # Matches any letter between 'a' and 'f' (inclusive) or
1557	1461	# the letter 'z'.
1558	1462	[-z] # Matches either a hyphen ('-') or the letter 'z'.
1559	1463	[a-f-m] # Matches any letter between 'a' and 'f' (inclusive), the
1560	1464	# hyphen ('-'), or the letter 'm'.
1561	1465	['-?] # Matches any of the characters '()*+,-./0123456789:;<=>?
1562	1466	# (But not on an EBCDIC platform).
1563	1467	[\N{APOSTROPHE}-\N{QUESTION MARK}]
1564	1468	# Matches any of the characters '()*+,-./0123456789:;<=>?
1565	1469	# even on an EBCDIC platform.
1566	1470	[\N{U+27}-\N{U+3F}] # Same. (U+27 is "'", and U+3F is "?")
1567	1471
1568	1472	=end original
1569	1473
1570	1474	[a-z] # 小文字 ASCII 英字にマッチング。
1571	1475	[a-fz] # 'a' から 'f' の英字および英字 'z' に
1572	1476	# マッチング。
1573	1477	[-z] # ハイフン ('-') または英字 'z' にマッチング。
1574	1478	[a-f-m] # 'a' から 'f' の英字、ハイフン ('-')、英字 'm' に
1575	1479	# マッチング。
1576	1480	['-?] # 文字 '()*+,-./0123456789:;<=>? のどれかにマッチング
1577	1481	# (しかし EBCDIC プラットフォームでは異なります)。
1578	1482	[\N{APOSTROPHE}-\N{QUESTION MARK}]
1579	1483	# たとえ EBCDIC プラットフォームでも '()*+,-./0123456789:;<=>?
1580	1484	# のいずれかの文字にマッチング。
1581	1485	[\N{U+27}-\N{U+3F}] # 同じ。 (U+27 は "'", U+3F は "?")
1582	1486
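The hyphen rules can be verified with a few matches. This is a minimal sketch; the hash of results and its keys are illustrative only.

```perl
use strict;
use warnings;

my %result = (
    # A leading hyphen cannot start a range, so it is literal.
    leading_hyphen  => ("-" =~ /[-z]/)    ? 1 : 0,
    # A hyphen right after the range a-f is literal too.
    after_range     => ("-" =~ /[a-f-m]/) ? 1 : 0,
    # "g" lies between "f" and "m", but f-m is not a range here.
    g_not_in_class  => ("g" =~ /[a-f-m]/) ? 1 : 0,
    # An escaped hyphen is literal in any position.
    escaped_hyphen  => ("-" =~ /[a\-z]/)  ? 1 : 0,
    # With the hyphen escaped, a and z are no longer range endpoints.
    b_not_in_class  => ("b" =~ /[a\-z]/)  ? 1 : 0,
);

print "$_: $result{$_}\n" for sort keys %result;
```
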
1583	1487	=begin original
1584	1488
1585		As the final two examples above show, you can achieve portability to
	1489	As the final two examples above show, you can achieve portablity to
1586	1490	non-ASCII platforms by using the C<\N{...}> form for the range
1587	1491	endpoints. These indicate that the specified range is to be interpreted
1588	1492	using Unicode values, so C<[\N{U+27}-\N{U+3F}]> means to match
1589	1493	C<\N{U+27}>, C<\N{U+28}>, C<\N{U+29}>, ..., C<\N{U+3D}>, C<\N{U+3E}>,
1590	1494	and C<\N{U+3F}>, whatever the native code point versions for those are.
1591	1495	These are called "Unicode" ranges. If either end is of the C<\N{...}>
1592	1496	form, the range is considered Unicode. A C<regexp> warning is raised
1593	1497	under C<S<"use re 'strict'">> if the other endpoint is specified
1594	1498	non-portably:
1595	1499
1596	1500	=end original
1597	1501
1598	1502	前述の最後の二つの例が示すように、範囲の端点に
1599	1503	C<\N{...}> 形式を使用することで、非 ASCII プラットフォームへの
1600	1504	移植性を実現できます。
1601	1505	これらは、指定された範囲が Unicode 値を使用して解釈されることを示しています;
1602	1506	したがって、C<[\N{U+27}-\N{U+3F}]>は、C<\N{U+27}>、C<\N{U+28}>、
1603	1507	C<\N{U+29}>、...、C<\N{U+3D}>、C<\N{U+3E}>、C<\N{U+3F}> に
1604	1508	マッチングすることを意味します;
1605	1509	これらのネイティブ符号位置のバージョンが何であっても一致します。
1606	1510	これらは "Unicode" 範囲と呼ばれます。
1607	1511	いずれかの端点が C<\N{...}> 形式の場合、範囲は Unicode と見なされます。
1608	1512	もう一方の端点が移植性がない形で指定されている場合、
1609	1513	C<S<"use re 'strict'">> の下で C<regexp> 警告が発生します:
1610	1514
1611	1515	[\N{U+00}-\x09] # Warning under re 'strict'; \x09 is non-portable
1612	1516	[\N{U+00}-\t] # No warning;
1613	1517
1614	1518	=begin original
1615	1519
1616	1520	Both of the above match the characters C<\N{U+00}> C<\N{U+01}>, ...
1617	1521	C<\N{U+08}>, C<\N{U+09}>, but the C<\x09> looks like it could be a
1618	1522	mistake so the warning is raised (under C<re 'strict'>) for it.
1619	1523
1620	1524	=end original
1621	1525
1622	1526	前述の両方とも文字 C<\N{U+00}> C<\N{U+01}>, ...
1623	1527	C<\N{U+08}>, C<\N{U+09}> にマッチングしますが、
1624	1528	C<\x09> は誤りのように見えるので、
1625	1529	(C<re 'strict'> の下で) 警告が発生します。
1626	1530
1627	1531	=begin original
1628	1532
1629	1533	Perl also guarantees that the ranges C<A-Z>, C<a-z>, C<0-9>, and any
1630	1534	subranges of these match what an English-only speaker would expect them
1631	1535	to match on any platform. That is, C<[A-Z]> matches the 26 ASCII
1632	1536	uppercase letters;
1633	1537	C<[a-z]> matches the 26 lowercase letters; and C<[0-9]> matches the 10
1634	1538	digits. Subranges, like C<[h-k]>, match correspondingly, in this case
1635	1539	just the four letters C<"h">, C<"i">, C<"j">, and C<"k">. This is the
1636	1540	natural behavior on ASCII platforms where the code points (ordinal
1637	1541	values) for C<"h"> through C<"k"> are consecutive integers (0x68 through
1638	1542	0x6B). But special handling to achieve this may be needed on platforms
1639	1543	with a non-ASCII native character set. For example, on EBCDIC
1640	1544	platforms, the code point for C<"h"> is 0x88, C<"i"> is 0x89, C<"j"> is
1641	1545	0x91, and C<"k"> is 0x92. Perl specially treats C<[h-k]> to exclude the
1642	1546	seven code points in the gap: 0x8A through 0x90. This special handling is
1643	1547	only invoked when the range is a subrange of one of the ASCII uppercase,
1644	1548	lowercase, and digit ranges, AND each end of the range is expressed
1645	1549	either as a literal, like C<"A">, or as a named character (C<\N{...}>,
1646	1550	including the C<\N{U+...}> form).
1647	1551
1648	1552	=end original
1649	1553
1650	1554	Perl はまた、範囲 C<A-Z>、C<a-z>、C<0-9>、およびこれらの部分範囲が、
1651	1555	英語のみの話者が一致すると予想する範囲とどのプラットフォームでも
1652	1556	一致することを保証します。
1653	1557	つまり、C<[A-Z]> はASCII の大文字 26 文字と一致します;
1654	1558	C<[a-z]> は小文字 26 文字と一致します;
1655	1559	C<[0-9]>は 10 の数字と一致します。
1656	1560	C<[h-k]> のような部分範囲もこれに対応して一致します;
1657	1561	この場合、4 文字 C<"h">、C<"i">、C<"j">、C<"k"> だけが一致します。
1658	1562	これは、C<"h"> から C<"k"> までの符号位置(序数値)が連続した
1659	1563	整数(0x68 から 0x6B)である ASCII プラットフォームでの自然な動作です。
1660	1564	しかし、非 ASCII ネイティブ文字集合を持つプラットフォームでは、
1661	1565	これを実現するための特別な処理が必要になるかもしれません。
1662	1566	たとえば、EBCDIC プラットフォームでは、C<"h"> のコードポイントは
1663	1567	0x88、C<"i"> は 0x89、C<"j"> は 0x91、C<"k"> は 0x92 です。
1664	1568	Perl は C<[h-k]> を特別に扱い、隙間にある七つの符号位置
1665	1569	(0x8A から 0x90)を除外します。
1666	1570	この特殊処理は、範囲が ASCII の大文字、小文字、数字の範囲の
1667	1571	いずれかの部分範囲であり、範囲の両端が C<"A"> のようなリテラル
1668	1572	または名前付き文字(C<\N{...}>(C<\N{U+...}> 形式を含む))として表現されている
1669	1573	場合にのみ呼び出されます。
1670	1574
1671	1575	=begin original
1672	1576
1673	1577	EBCDIC Examples:
1674	1578
1675	1579	=end original
1676	1580
1677	1581	EBCDIC の例:
1678	1582
1679	1583	[i-j] # Matches either "i" or "j"
1680	1584	[i-\N{LATIN SMALL LETTER J}] # Same
1681	1585	[i-\N{U+6A}] # Same
1682	1586	[\N{U+69}-\N{U+6A}] # Same
1683	1587	[\x{89}-\x{91}] # Matches 0x89 ("i"), 0x8A .. 0x90, 0x91 ("j")
1684	1588	[i-\x{91}] # Same
1685	1589	[\x{89}-j] # Same
1686	1590	[i-J] # Matches 0x89 ("i") .. 0xD1 ("J"); special
1687	1591	# handling doesn't apply because range is mixed
1688	1592	# case
1689	1593
1690	1594	=head3 Negation
1691	1595
1692	1596	(否定)
1693	1597
1694	1598	=begin original
1695	1599
1696	1600	It is also possible to instead list the characters you do not want to
1697	1601	match. You can do so by using a caret (C<^>) as the first character in the
1698	1602	character class. For instance, C<[^a-z]> matches any character that is not a
1699	1603	lowercase ASCII letter, which therefore includes more than a million
1700	1604	Unicode code points. The class is said to be "negated" or "inverted".
1701	1605
1702	1606	=end original
1703	1607
1704	1608	代わりにマッチングしたくない文字の一覧を指定することも可能です。
1705	1609	文字クラスの先頭の文字としてキャレット (C<^>) を使うことで実現します。
1706	1610	例えば、C<[^a-z]> は小文字の ASCII 英字以外の文字にマッチングします;
1707	1611	従って 100 万種類以上の Unicode 符号位置が含まれます。
1708	1612	このクラスは「否定」("negated") や「反転」("inverted")と呼ばれます。
1709	1613
1710	1614	=begin original
1711	1615
1712	1616	This syntax makes the caret a special character inside a bracketed character
1713	1617	class, but only if it is the first character of the class. So if you want
1714	1618	the caret as one of the characters to match, either escape the caret or
1715	1619	else don't list it first.
1716	1620
1717	1621	=end original
1718	1622
1719	1623	この文法はキャレットを大かっこ文字クラスの内側で特別な文字にしますが、
1720	1624	クラスの最初の文字の場合のみです。
1721	1625	それでマッチングしたい文字の一つでキャレットを使いたい場合、キャレットを
1722	1626	エスケープするか、最初以外の位置に書いてください。
1723	1627
1724	1628	=begin original
1725	1629
1726	1630	In inverted bracketed character classes, Perl ignores the Unicode rules
1727	1631	that normally say that named sequences and certain characters should
1728	1632	match a sequence of multiple characters under caseless C</i>
1729	1633	matching. Following those rules could lead to highly confusing
1730	1634	situations:
1731	1635
1732	1636	=end original
1733	1637
1734	1638	否定大かっこ文字クラスでは、通常は大文字小文字を無視した C</i> マッチングの
1735	1639	下では名前付き並びとある種の文字が複数の文字並びにマッチングするということを
1736	1640	Perl は無視します。
1737	1641	これらの規則に従うととても混乱する状況を引き起こすことになるからです:
1738	1642
1739	1643	"ss" =~ /^[^\xDF]+$/ui; # Matches!
1740	1644
1741	1645	=begin original
1742	1646
1743	1647	This should match any sequences of characters that aren't C<\xDF> nor
1744	1648	what C<\xDF> matches under C</i>. C<"s"> isn't C<\xDF>, but Unicode
1745	1649	says that C<"ss"> is what C<\xDF> matches under C</i>. So which one
1746	1650	"wins"? Do you fail the match because the string has C<ss> or accept it
1747	1651	because it has an C<s> followed by another C<s>? Perl has chosen the
1748	1652	latter. (See note in L</Bracketed Character Classes> above.)
1749	1653
1750	1654	=end original
1751	1655
1752	1656	これは C<\xDF> でも、C</i> の下で C<\xDF> がマッチングするものでもない
1753	1657	任意の文字並びにマッチングするべきです。
1754	1658	C<"s"> は C<\xDF> ではありませんが、
1755	1659	C</i> の下では C<"ss"> は C<\xDF> がマッチングするものと Unicode は
1756	1660	言っています。
1757	1661	ではどちらが「勝つ」のでしょうか?
1758	1662	文字列は C<ss> だからマッチングに失敗するのでしょうか、
1759	1663	それともこれは C<s> の後にもう一つの C<s> があるから成功するのでしょうか?
1760	1664	Perl は後者を選択しました。
1761	1665	(前述の L</Bracketed Character Classes> を参照してください。)
1762	1666
1763	1667	=begin original
1764	1668
1765	1669	Examples:
1766	1670
1767	1671	=end original
1768	1672
1769	1673	例:
1770	1674
1771	1675	=begin original
1772	1676
1773	1677	"e" =~ /[^aeiou]/ # No match, the 'e' is listed.
1774	1678	"x" =~ /[^aeiou]/ # Match, as 'x' isn't a lowercase vowel.
1775	1679	"^" =~ /[^^]/ # No match, matches anything that isn't a caret.
1776	1680	"^" =~ /[x^]/ # Match, caret is not special here.
1777	1681
1778	1682	=end original
1779	1683
1780	1684	"e" =~ /[^aeiou]/ # マッチングしない; 'e' がある。
1781	1685	"x" =~ /[^aeiou]/ # マッチング; 'x' は小文字の母音ではない。
1782	1686	"^" =~ /[^^]/ # マッチングしない; キャレット以外全てにマッチング。
1783	1687	"^" =~ /[x^]/ # マッチング; キャレットはここでは特別ではない。
1784	1688
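The negated-class examples above can be checked with a short script. This is a minimal sketch rather than part of the original documentation; the `@cases` table and its variable names are illustrative assumptions, and the last row repeats the `\xDF` case discussed earlier.

```perl
use strict;
use warnings;

# Each entry: [ string, pattern, 1 if a match is expected ].
my @cases = (
    [ 'e',  qr/[^aeiou]/,      0 ],   # 'e' is listed, so the negated class rejects it
    [ 'x',  qr/[^aeiou]/,      1 ],   # 'x' is not a lowercase vowel
    [ '^',  qr/[^^]/,          0 ],   # matches anything that is not a caret
    [ '^',  qr/[x^]/,          1 ],   # a caret that is not first is literal
    [ '^',  qr/[\^x]/,         1 ],   # an escaped caret is literal even when first
    [ 'ss', qr/^[^\xDF]+$/ui,  1 ],   # multi-character folds are ignored when inverted
);

my @got = map { $_->[0] =~ $_->[1] ? 1 : 0 } @cases;
for my $i (0 .. $#cases) {
    printf "%-3s %s\n", $cases[$i][0], $got[$i] ? 'match' : 'no match';
}
```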
1785	1689	=head3 Backslash Sequences
1786	1690
1787	1691	(逆スラッシュシーケンス)
1788	1692
1789	1693	=begin original
1790	1694
1791	1695	You can put any backslash sequence character class (with the exception of
1792	1696	C<\N> and C<\R>) inside a bracketed character class, and it will act just
1793	1697	as if you had put all characters matched by the backslash sequence inside the
1794	1698	character class. For instance, C<[a-f\d]> matches any decimal digit, or any
1795	1699	of the lowercase letters between 'a' and 'f' inclusive.
1796	1700
1797	1701	=end original
1798	1702
1799	1703	大かっこ文字クラスの中に(C<\N> と C<\R> を例外として)逆スラッシュシーケンス
1800	1704	文字クラスを置くことができ、逆スラッシュシーケンスにマッチングする全ての
1801	1705	文字を文字クラスの中に置いたかのように動作します。
1802	1706	例えば、C<[a-f\d]> は任意の 10 進数字、あるいは 'a' から 'f' までの小文字に
1803	1707	マッチングします。
1804	1708
1805	1709	=begin original
1806	1710
1807	1711	C<\N> within a bracketed character class must be of the forms C<\N{I<name>}>
1808	1712	or C<\N{U+I<hex char>}>, and NOT be the form that matches non-newlines,
1809	1713	for the same reason that a dot C<.> inside a bracketed character class loses
1810	1714	its special meaning: it matches nearly anything, which generally isn't what you
1811	1715	want to happen.
1812	1716
1813	1717	=end original
1814	1718
1815	1719	大かっこ文字クラスの中のドット C<.> が特別な意味を持たないのと同じ理由で、
1816	1720	大かっこ文字クラスの中の C<\N> は C<\N{I<name>}> または
1817	1721	C<\N{U+I<hex char>}> の形式で、かつ非改行マッチング形式でない形でなければ
1818	1722	なりません: これはほとんど何でもマッチングするので、一般的には起こって
1819	1723	欲しいことではありません。
1820	1724
1821	1725	=begin original
1822	1726
1823	1727	Examples:
1824	1728
1825	1729	=end original
1826	1730
1827	1731	例:
1828	1732
1829	1733	=begin original
1830	1734
1831	1735	/[\p{Thai}\d]/ # Matches a character that is either a Thai
1832	1736	# character, or a digit.
1833	1737	/[^\p{Arabic}()]/ # Matches a character that is neither an Arabic
1834	1738	# character, nor a parenthesis.
1835	1739
1836	1740	=end original
1837	1741
1838	1742	/[\p{Thai}\d]/ # タイ文字または数字の文字に
1839	1743	# マッチングする。
1840	1744	/[^\p{Arabic}()]/ # アラビア文字でもかっこでもない文字に
1841	1745	# マッチングする。
1842	1746
1843	1747	=begin original
1844	1748
1845	1749	Backslash sequence character classes cannot form one of the endpoints
1846	1750	of a range. Thus, you can't say:
1847	1751
1848	1752	=end original
1849	1753
1850	1754	逆スラッシュシーケンス文字クラスは範囲の端点の一つにはできません。
1851	1755	従って、以下のようにはできません:
1852	1756
1853	1757	/[\p{Thai}-\d]/ # Wrong!
1854	1758
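As a hedged illustration of backslash sequences inside a bracketed class, the sketch below filters a few sample strings. The sample characters and variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;

# [a-f\d] behaves as if every character matched by \d were listed in the class.
my @hits = grep { /^[a-f\d]$/ } qw(a f g 7 F);
print "@hits\n";

# A property escape combines with ordinary class members in the same way.
my $ko_kai = "\x{0E01}";    # THAI CHARACTER KO KAI
my $thai_or_digit  = $ko_kai =~ /[\p{Thai}\d]/    ? 1 : 0;
my $paren_accepted = '('     =~ /[^\p{Arabic}()]/ ? 1 : 0;
print "$thai_or_digit $paren_accepted\n";
```

Here `g` and the uppercase `F` fall outside the class, and the parenthesis is rejected because it is listed in the negated class.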
1855	1759	=head3 POSIX Character Classes
1856	1760	X<character class> X<\p> X<\p{}>
1857	1761	X<alpha> X<alnum> X<ascii> X<blank> X<cntrl> X<digit> X<graph>
1858	1762	X<lower> X<print> X<punct> X<space> X<upper> X<word> X<xdigit>
1859	1763
1860	1764	(POSIX 文字クラス)
1861	1765
1862	1766	=begin original
1863	1767
1864	1768	POSIX character classes have the form C<[:class:]>, where I<class> is the
1865	1769	name, and C<[:> and C<:]> are the delimiters. POSIX character classes only appear
1866	1770	I<inside> bracketed character classes, and are a convenient and descriptive
1867	1771	way of listing a group of characters.
1868	1772
1869	1773	=end original
1870	1774
1871	1775	POSIX 文字クラスは C<[:class:]> の形式で、I<class> は名前、C<[:> と C<:]> は
1872	1776	デリミタです。
1873	1777	POSIX 文字クラスは大かっこ文字クラスの I<内側> にのみ現れ、文字のグループを
1874	1778	一覧するのに便利で記述的な方法です。
1875	1779
1876	1780	=begin original
1877	1781
1878	1782	Be careful about the syntax,
1879	1783
1880	1784	=end original
1881	1785
1882	1786	文法について注意してください、
1883	1787
1884	1788	# Correct:
1885	1789	$string =~ /[[:alpha:]]/
1886	1790
1887	1791	# Incorrect (will warn):
1888	1792	$string =~ /[:alpha:]/
1889	1793
1890	1794	=begin original
1891	1795
1892	1796	The latter pattern would be a character class consisting of a colon,
1893	1797	and the letters C<a>, C<l>, C<p> and C<h>.
1894	1798	POSIX character classes can be part of a larger bracketed character class.
1895	1799	For example,
1896	1800
1897	1801	=end original
1898	1802
1899	1803	後者のパターンは、コロンおよび C<a>, C<l>, C<p>, C<h> の文字からなる
1900	1804	文字クラスです。
1901	1805	これら文字クラスはより大きな大かっこ文字クラスの一部にできます。
1902		例えば:
	1806	例えば、
1903	1807
1904	1808	[01[:alpha:]%]
1905	1809
1906	1810	=begin original
1907	1811
1908	1812	is valid and matches '0', '1', any alphabetic character, and the percent sign.
1909	1813
1910	1814	=end original
1911	1815
1912	1816	これは妥当で、'0'、'1'、任意の英字、パーセントマークにマッチングします。
1913	1817
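The difference between the correct and the incorrect syntax can be seen in a small sketch. It assumes the look-alike pattern is built from a string with the `regexp` warning category disabled, purely so the script runs quietly; the variable names are illustrative.

```perl
use strict;
use warnings;

my @candidates = ('0', '1', '2', 'q', '%', ':', 'p');

# Correct: the POSIX class sits inside a bracketed character class.
my @valid = grep { /^[01[:alpha:]%]$/ } @candidates;
print "@valid\n";

# Incorrect: without the outer brackets this is an ordinary class that
# lists ':', 'a', 'l', 'p' and 'h'.
my $lookalike = do {
    no warnings 'regexp';
    my $source = '[:alpha:]';
    qr/^$source$/;
};
my @lookalike_hits = grep { $_ =~ $lookalike } @candidates;
print "@lookalike_hits\n";
```

The second list contains only the colon and `p`, because `q` is not one of the five characters the look-alike class lists.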
1914	1818	=begin original
1915	1819
1916	1820	Perl recognizes the following POSIX character classes:
1917	1821
1918	1822	=end original
1919	1823
1920	1824	Perl は以下の POSIX 文字クラスを認識します:
1921	1825
1922	1826	=begin original
1923	1827
1924		alpha Any alphabetical character (~~e.g.,~~ [A-Za-z]).
	1828	alpha Any alphabetical character ("[A-Za-z]").
1925		alnum Any alphanumeric character (~~e.g.,~~ [A-Za-z0-9]).
	1829	alnum Any alphanumeric character ("[A-Za-z0-9]").
1926	1830	ascii Any character in the ASCII character set.
1927	1831	blank A GNU extension, equal to a space or a horizontal tab ("\t").
1928	1832	cntrl Any control character. See Note [2] below.
1929		digit Any decimal digit (~~e.g.,~~ [0-9]), equivalent to "\d".
	1833	digit Any decimal digit ("[0-9]"), equivalent to "\d".
1930	1834	graph Any printable character, excluding a space. See Note [3] below.
1931		lower Any lowercase character (~~e.g.,~~ [a-z]).
	1835	lower Any lowercase character ("[a-z]").
1932	1836	print Any printable character, including a space. See Note [4] below.
1933	1837	punct Any graphical character excluding "word" characters. Note [5].
1934	1838	space Any whitespace character. "\s" including the vertical tab
1935	1839	("\cK").
1936		upper Any uppercase character (~~e.g.,~~ [A-Z]).
	1840	upper Any uppercase character ("[A-Z]").
1937		word A Perl extension (~~e.g.,~~ [A-Za-z0-9_]), equivalent to "\w".
	1841	word A Perl extension ("[A-Za-z0-9_]"), equivalent to "\w".
1938		xdigit Any hexadecimal digit (~~e.g.,~~ [0-9a-fA-F])~~. Note [7]~~.
	1842	xdigit Any hexadecimal digit ("[0-9a-fA-F]").
1939	1843
1940	1844	=end original
1941	1845
1942		alpha 任意の英字 (例: [A-Za-z])。
	1846	alpha 任意の英字 ("[A-Za-z]")。
1943		alnum 任意の英数字。(例: [A-Za-z0-9])
	1847	alnum 任意の英数字。("[A-Za-z0-9]")
1944	1848	ascii 任意の ASCII 文字集合の文字。
1945		blank GNU 拡張; スペースまたは水平タブ (\t) と同じ。
	1849	blank GNU 拡張; スペースまたは水平タブ ("\t") と同じ。
1946	1850	cntrl 任意の制御文字。後述の [2] 参照。
1947		digit 任意の 10 進数字 (例: [0-9]); "\d" と等価。
	1851	digit 任意の 10 進数字 ("[0-9]"); "\d" と等価。
1948	1852	graph 任意の表示文字; スペースを除く。後述の [3] 参照。
1949		lower 任意の小文字 (例: [a-z])。
	1853	lower 任意の小文字 ("[a-z]")。
1950	1854	print 任意の表示文字; スペースを含む。後述の [4] 参照。
1951	1855	punct 任意の「単語」文字を除く表示文字。[5] 参照。
1952	1856	space 任意の空白文字。垂直タブ ("\cK") を含む "\s"。
1953		upper 任意の大文字 (例: [A-Z])。
	1857	upper 任意の大文字 ("[A-Z]")。
1954		word Perl 拡張 (例: [A-Za-z0-9_]); "\w" と等価。
	1858	word Perl 拡張 ("[A-Za-z0-9_]"); "\w" と等価。
1955		xdigit 任意の 16 進文字 (例: [0-9a-fA-F])~~。[7] 参照~~。
	1859	xdigit 任意の 16 進文字 ("[0-9a-fA-F]")。
1956	1860
1957	1861	=begin original
1958	1862
1959	1863	Like the L<Unicode properties|/Unicode Properties>, most of the POSIX
1960	1864	properties match the same regardless of whether case-insensitive (C</i>)
1961	1865	matching is in effect or not. The two exceptions are C<[:upper:]> and
1962	1866	C<[:lower:]>. Under C</i>, they each match the union of C<[:upper:]> and
1963	1867	C<[:lower:]>.
1964	1868
1965	1869	=end original
1966	1870
1967	1871	L<Unicode properties|/Unicode Properties> と同様、
1968	1872	ほとんどの POSIX 特性は、大文字小文字無視 (C</i>) が有効かどうかに関わらず
1969	1873	同じものにマッチングします。
1970	1874	二つの例外は C<[:upper:]> と C<[:lower:]> です。
1971	1875	C</i> の下では、これらそれぞれ C<[:upper:]> と C<[:lower:]> の和集合に
1972	1876	マッチングします。
1973	1877
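The effect of C</i> on `[:upper:]` and `[:lower:]` can be sketched as follows. The hash keys are illustrative names, not anything defined by Perl.

```perl
use strict;
use warnings;

my %match = (
    upper_plain => ('a' =~ /[[:upper:]]/  ? 1 : 0),   # case-sensitive: no match
    upper_i     => ('a' =~ /[[:upper:]]/i ? 1 : 0),   # union of upper and lower
    lower_i     => ('A' =~ /[[:lower:]]/i ? 1 : 0),   # union of upper and lower
    digit_i     => ('a' =~ /[[:digit:]]/i ? 1 : 0),   # other classes are unaffected
);
print "$_: $match{$_}\n" for sort keys %match;
```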
1974	1878	=begin original
1975	1879
1976	1880	Most POSIX character classes have two Unicode-style C<\p> property
1977	1881	counterparts. (They are not official Unicode properties, but Perl extensions
1978	1882	derived from official Unicode properties.) The table below shows the relation
1979	1883	between POSIX character classes and these counterparts.
1980	1884
1981	1885	=end original
1982	1886
1983	1887	ほとんどの POSIX 文字クラスには、対応する二つの Unicode 式の C<\p> 特性が
1984	1888	あります。
1985	1889	(これは公式 Unicode 特性ではなく、公式 Unicode 特性から派生した Perl
1986	1890	エクステンションです。)
1987	1891	以下の表は POSIX 文字クラスと対応するものとの関連を示します。
1988	1892
1989	1893	=begin original
1990	1894
1991	1895	One counterpart, in the column labelled "ASCII-range Unicode" in
1992	1896	the table, matches only characters in the ASCII character set.
1993	1897
1994	1898	=end original
1995	1899
1996	1900	対応物の一つである、表で "ASCII-range Unicode" と書かれた列のものは、
1997	1901	ASCII 文字集合の文字にのみマッチングします。
1998	1902
1999	1903	=begin original
2000	1904
2001	1905	The other counterpart, in the column labelled "Full-range Unicode", matches any
2002	1906	appropriate characters in the full Unicode character set. For example,
2003	1907	C<\p{Alpha}> matches not just the ASCII alphabetic characters, but any
2004	1908	character in the entire Unicode character set considered alphabetic.
2005	1909	An entry in the column labelled "backslash sequence" is a (short)
2006	1910	equivalent.
2007	1911
2008	1912	=end original
2009	1913
2010	1914	もう一つの対応物である、"Full-range Unicode" と書かれた列のものは、
2011	1915	Unicode 文字集合全体の中の適切な任意の文字にマッチングします。
2012	1916	例えば、C<\p{Alpha}> は単に ASCII アルファベット文字だけでなく、
2013	1917	Unicode 文字集合全体の中からアルファベットと考えられる任意の文字に
2014	1918	マッチングします。
2015	1919	"backslash sequence" の列は (短い) 同義語です。
2016	1920
2017	1921	 [[:...:]]      ASCII-range          Full-range  backslash  Note
2018	1922	                 Unicode              Unicode     sequence
2019	1923	 -----------------------------------------------------
2020	1924	   alpha      \p{PosixAlpha}       \p{XPosixAlpha}
2021	1925	   alnum      \p{PosixAlnum}       \p{XPosixAlnum}
2022	1926	   ascii      \p{ASCII}
2023	1927	   blank      \p{PosixBlank}       \p{XPosixBlank}  \h      [1]
2024	1928	                                   or \p{HorizSpace}        [1]
2025	1929	   cntrl      \p{PosixCntrl}       \p{XPosixCntrl}          [2]
2026	1930	   digit      \p{PosixDigit}       \p{XPosixDigit}  \d
2027	1931	   graph      \p{PosixGraph}       \p{XPosixGraph}          [3]
2028	1932	   lower      \p{PosixLower}       \p{XPosixLower}
2029	1933	   print      \p{PosixPrint}       \p{XPosixPrint}          [4]
2030	1934	   punct      \p{PosixPunct}       \p{XPosixPunct}          [5]
2031	1935	              \p{PerlSpace}        \p{XPerlSpace}   \s      [6]
2032	1936	   space      \p{PosixSpace}       \p{XPosixSpace}          [6]
2033	1937	   upper      \p{PosixUpper}       \p{XPosixUpper}
2034	1938	   word       \p{PosixWord}        \p{XPosixWord}   \w
2035		   xdigit     \p{PosixXDigit}      \p{XPosixXDigit}         ~~[7]~~
	1939	   xdigit     \p{PosixXDigit}      \p{XPosixXDigit}
2036	1940
2037	1941	=over 4
2038	1942
2039	1943	=item [1]
2040	1944
2041	1945	=begin original
2042	1946
2043	1947	C<\p{Blank}> and C<\p{HorizSpace}> are synonyms.
2044	1948
2045	1949	=end original
2046	1950
2047	1951	C<\p{Blank}> と C<\p{HorizSpace}> は同義語です。
2048	1952
2049	1953	=item [2]
2050	1954
2051	1955	=begin original
2052	1956
2053	1957	Control characters don't produce output as such, but instead usually control
2054	1958	the terminal somehow: for example, newline and backspace are control characters.
2055	1959	On ASCII platforms, in the ASCII range, characters whose code points are
2056	1960	between 0 and 31 inclusive, plus 127 (C<DEL>) are control characters; on
2057	1961	EBCDIC platforms, their counterparts are control characters.
2058	1962
2059	1963	=end original
2060	1964
2061	1965	制御文字はそれ自体は出力されず、普通は何か端末を制御します: 例えば
2062	1966	改行と後退は制御文字です。
2063	1967	ASCII プラットフォームで、ASCII の範囲では、符号位置が 0 から 31 までの
2064	1968	範囲の文字および 127 (C<DEL>) が制御文字です;
2065	1969	EBCDIC プラットフォームでは、対応するものは制御文字です。
2066	1970
2067	1971	=item [3]
2068	1972
2069	1973	=begin original
2070	1974
2071	1975	Any character that is I<graphical>, that is, visible. This class consists
2072	1976	of all alphanumeric characters and all punctuation characters.
2073	1977
2074	1978	=end original
2075	1979
2076	1980	I<graphical>、つまり見える文字。
2077	1981	このクラスは全ての英数字と全ての句読点文字からなります。
2078	1982
2079	1983	=item [4]
2080	1984
2081	1985	=begin original
2082	1986
2083	1987	All printable characters, which is the set of all graphical characters
2084	1988	plus those whitespace characters which are not also controls.
2085	1989
2086	1990	=end original
2087	1991
2088	1992	全ての表示可能な文字; 全ての graphical 文字に加えて制御文字でない空白文字。
2089	1993
2090	1994	=item [5]
2091	1995
2092	1996	=begin original
2093	1997
2094	1998	C<\p{PosixPunct}> and C<[[:punct:]]> in the ASCII range match all
2095	1999	non-controls, non-alphanumeric, non-space characters:
2096	2000	C<[-!"#$%&'()*+,./:;<=E<gt>?@[\\\]^_`{|}~]> (although if a locale is in effect,
2097	2001	it could alter the behavior of C<[[:punct:]]>).
2098	2002
2099	2003	=end original
2100	2004
2101	2005	ASCII の範囲の C<\p{PosixPunct}> と C<[[:punct:]]> は全ての非制御、非英数字、
2102	2006	非空白文字にマッチングします:
2103	2007	C<[-!"#$%&'()*+,./:;<=E<gt>?@[\\\]^_`{|}~]> (しかしロケールが有効なら、
2104	2008	C<[[:punct:]]> の振る舞いが変わることがあります)。
2105	2009
2106	2010	=begin original
2107	2011
2108	2012	The similarly named property, C<\p{Punct}>, matches a somewhat different
2109	2013	set in the ASCII range, namely
2110	2014	C<[-!"#%&'()*,./:;?@[\\\]_{}]>. That is, it is missing the nine
2111	2015	characters C<[$+E<lt>=E<gt>^`|~]>.
2112	2016	This is because Unicode splits what POSIX considers to be punctuation into two
2113	2017	categories, Punctuation and Symbols.
2114	2018
2115	2019	=end original
2116	2020
2117	2021	似たような名前の特性 C<\p{Punct}> は、ASCII 範囲の異なる集合である
2118	2022	C<[-!"#%&'()*,./:;?@[\\\]_{}]> にマッチングします。
2119	2023	つまり、C<[$+E<lt>=E<gt>^`|~]> の 9 文字はありません。
2120	2024	これは、Unicode は POSIX が句読点と考えるものを二つのカテゴリ
2121	2025	Punctuation と Symbols に分けているからです。
2122	2026
2123	2027	=begin original
2124	2028
2125	2029	C<\p{XPosixPunct}> and (under Unicode rules) C<[[:punct:]]>, match what
2126	2030	C<\p{PosixPunct}> matches in the ASCII range, plus what C<\p{Punct}>
2127	2031	matches. This is different than strictly matching according to
2128	2032	C<\p{Punct}>. Another way to say it is that
2129	2033	if Unicode rules are in effect, C<[[:punct:]]> matches all characters
2130	2034	that Unicode considers punctuation, plus all ASCII-range characters that
2131	2035	Unicode considers symbols.
2132	2036
2133	2037	=end original
2134	2038
2135	2039	C<\p{XPosixPunct}> と (Unicode の規則の下での) C<[[:punct:]]> は、
2136	2040	ASCII の範囲で C<\p{PosixPunct}> がマッチングする物に加えて、
2137	2041	C<\p{Punct}> がマッチングする物にマッチングします。
2138	2042	これは C<\p{Punct}> に従って正確にマッチングする物と異なります。
2139	2043	Unicode 規則が有効な場合のもう一つの言い方は、C<[[:punct:]]> は Unicode が
2140	2044	句読点として扱うものに加えて、Unicode が "symbols" として扱う ASCII 範囲の
2141	2045	全ての文字にマッチングします。
2142	2046
2143	2047	=item [6]
2144	2048
2145	2049	=begin original
2146	2050
2147	2051	C<\p{XPerlSpace}> and C<\p{Space}> match identically starting with Perl
2148	2052	v5.18. In earlier versions, these differ only in that in non-locale
2149	2053	matching, C<\p{XPerlSpace}> did not match the vertical tab, C<\cK>.
2150	2054	Same for the two ASCII-only range forms.
2151	2055
2152	2056	=end original
2153	2057
2154	2058	C<\p{XPerlSpace}> と C<\p{Space}> は、Perl v5.18 からは同じように
2155	2059	マッチングします。
2156	2060	以前のバージョンでは、これらの違いは、非ロケールマッチングでは
2157	2061	C<\p{XPerlSpace}> が垂直タブ C<\cK> にマッチングしなかったということだけです。
2158	2062	二つの ASCII のみの範囲の形式でも同じです。
2159	2063
2160		=item [7]
2161
2162		=begin original
2163
2164		Unlike C<[[:digit:]]> which matches digits in many writing systems, such
2165		as Thai and Devanagari, there are currently only two sets of hexadecimal
2166		digits, and it is unlikely that more will be added. This is because you
2167		not only need the ten digits, but also the six C<[A-F]> (and C<[a-f]>)
2168		to correspond. That means only the Latin script is suitable for these,
2169		and Unicode has only two sets of these, the familiar ASCII set, and the
2170		fullwidth forms starting at U+FF10 (FULLWIDTH DIGIT ZERO).
2171
2172		=end original
2173
2174		タイ文字やデバナーガリ文字のように多くの書記体系の数字にマッチングする
2175		C<[[:digit:]]> と異なり、16 進数の二つの集合だけで、これ以上追加されることは
2176		おそらくありません。
2177		これは、対応するのに 10 の数字だけでなく、6 個の C<[A-F]> (および C<[a-f]>) も
2178		必要だからです。
2179		これは、Latin 用字のみがこれらに適合していて、
2180		Unicode はこれらの二つの集合、つまり慣れ親しんだ
2181		ASCII 集合と、U+FF10 (FULLWIDTH DIGIT ZERO) から始まる全角形式のみを
2182		持つということです。
2183
2184	2064	=back
2185	2065
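Note [5] above describes how `\p{PosixPunct}` and `\p{Punct}` differ in the ASCII range. The following sketch enumerates the visible ASCII characters to show that difference; the variable names are illustrative, and the code point range `0x21 .. 0x7E` assumes an ASCII platform.

```perl
use strict;
use warnings;

# Visible ASCII characters (assumes an ASCII platform).
my @ascii_graph = map { chr } 0x21 .. 0x7E;

my $posix_punct = join '', grep { /\p{PosixPunct}/ } @ascii_graph;
my $posix_only  = join '',
    grep { /\p{PosixPunct}/ && !/\p{Punct}/ } @ascii_graph;

print length($posix_punct), " characters match \\p{PosixPunct}\n";
print "matched by \\p{PosixPunct} but not \\p{Punct}: $posix_only\n";
```

The second line lists the nine characters that Unicode classifies as symbols rather than punctuation.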
2186	2066	=begin original
2187	2067
2188	2068	There are various other synonyms that can be used besides the names
2189		listed in the table. For example, C<\p{XPosixAlpha}> can be written as
	2069	listed in the table. For example, C<\p{PosixAlpha}> can be written as
2190	2070	C<\p{Alpha}>. All are listed in
2191	2071	L<perluniprops/Properties accessible through \p{} and \P{}>.
2192	2072
2193	2073	=end original
2194	2074
2195	2075	表に挙げられている名前以外にも様々なその他の同義語が使えます。
2196		例えば、C<\p{XPosixAlpha}> は C<\p{Alpha}> と書けます。
	2076	例えば、C<\p{PosixAlpha}> は C<\p{Alpha}> と書けます。
2197	2077	全ての一覧は
2198	2078	L<perluniprops/Properties accessible through \p{} and \P{}> に
2199	2079	あります。
2200	2080
2201	2081	=begin original
2202	2082
2203	2083	Both the C<\p> counterparts always assume Unicode rules are in effect.
2204	2084	On ASCII platforms, this means they assume that the code points from 128
2205	2085	to 255 are Latin-1, and that means that using them under locale rules is
2206	2086	unwise unless the locale is guaranteed to be Latin-1 or UTF-8. In contrast, the
2207	2087	POSIX character classes are useful under locale rules. They are
2208	2088	affected by the actual rules in effect, as follows:
2209	2089
2210	2090	=end original
2211	2091
2212	2092	C<\p> に対応するものの両方は常に Unicode の規則が有効であることを仮定します。
2213	2093	これは、ASCII プラットフォームでは、128 から 255 の符号位置は
2214	2094	Latin-1 であることを仮定するということで、ロケールの規則の下で
2215	2095	これらを使うということは、ロケールが Latin-1 か UTF-8 であることが
2216	2096	保証されていない限り賢明ではないということです。
2217	2097	一方、POSIX 文字クラスはロケールの規則の下で有用です。
2218	2098	これらは次のように、実際に有効な規則に影響を受けます:
2219	2099
2220	2100	=over
2221	2101
2222	2102	=item If the C</a> modifier is in effect ...
2223	2103
2224	2104	(C</a> が有効なら...)
2225	2105
2226	2106	=begin original
2227	2107
2228	2108	Each of the POSIX classes matches exactly the same as their ASCII-range
2229	2109	counterparts.
2230	2110
2231	2111	=end original
2232	2112
2233	2113	それぞれの POSIX クラスは ASCII の範囲で対応する正確に同じものに
2234	2114	マッチングします。
2235	2115
2236	2116	=item otherwise ...
2237	2117
2238	2118	(さもなければ ...)
2239	2119
2240	2120	=over
2241	2121
2242	2122	=item For code points above 255 ...
2243	2123
2244	2124	(256 以上の符号位置では ...)
2245	2125
2246	2126	=begin original
2247	2127
2248	2128	The POSIX class matches the same as its Full-range counterpart.
2249	2129
2250	2130	=end original
2251	2131
2252	2132	POSIX クラスはその Full の範囲で対応する同じものにマッチングします。
2253	2133
2254	2134	=item For code points below 256 ...
2255	2135
2256	2136	(255 以下の符号位置では ...)
2257	2137
2258	2138	=over
2259	2139
2260	2140	=item if locale rules are in effect ...
2261	2141
2262	2142	(ロケール規則が有効なら ...)
2263	2143
2264	2144	=begin original
2265	2145
2266	2146	The POSIX class matches according to the locale, except:
2267	2147
2268	2148	=end original
2269	2149
2270	2150	POSIX クラスはロケールに従ってマッチングします; 例外は:
2271	2151
2272	2152	=over
2273	2153
2274	2154	=item C<word>
2275	2155
2276	2156	=begin original
2277	2157
2278	2158	also includes the platform's native underscore character, no matter what
2279	2159	the locale is.
2280	2160
2281	2161	=end original
2282	2162
2283	2163	それに加えて、ロケールが何かに関わらず、プラットフォームのネイティブな
2284	2164	下線文字も含みます。
2285	2165
2286	2166	=item C<ascii>
2287	2167
2288	2168	=begin original
2289	2169
2290	2170	on platforms that don't have the POSIX C<ascii> extension, this matches
2291	2171	just the platform's native ASCII-range characters.
2292	2172
2293	2173	=end original
2294	2174
2295	2175	POSIX C<ascii> 拡張を持たないプラットフォームでは、
2296	2176	これは単にプラットフォームのネイティブな ASCII の範囲の文字に
2297	2177	マッチングします。
2298	2178
2299	2179	=item C<blank>
2300	2180
2301	2181	=begin original
2302	2182
2303	2183	on platforms that don't have the POSIX C<blank> extension, this matches
2304	2184	just the platform's native tab and space characters.
2305	2185
2306	2186	=end original
2307	2187
2309	2189	POSIX C<blank> 拡張を持たないプラットフォームでは、
2310	2190	これは単にプラットフォームのネイティブなタブとスペース文字に
2311	2191	マッチングします。
2312	2192
2313	2193	=back
2314	2194
2315		=item if~~, instead,~~ Unicode rules are in effect ...
	2195	=item if Unicode rules are in effect ...
2316	2196
2317		(~~そうではなく、~~Unicode 規則が有効なら ...)
	2197	(Unicode 規則が有効なら ...)
2318	2198
2319	2199	=begin original
2320	2200
2321	2201	The POSIX class matches the same as the Full-range counterpart.
2322	2202
2323	2203	=end original
2324	2204
2325	2205	POSIX クラスは Full の範囲の対応する同じものにマッチングします。
2326	2206
2327	2207	=item otherwise ...
2328	2208
2329	2209	(さもなければ ...)
2330	2210
2331	2211	=begin original
2332	2212
2333	2213	The POSIX class matches the same as the ASCII range counterpart.
2334	2214
2335	2215	=end original
2336	2216
2337	2217	POSIX クラスは ASCII の範囲の同じものにマッチングします。
2338	2218
2339	2219	=back
2340	2220
2341	2221	=back
2342	2222
2343	2223	=back
2344	2224
2345	2225	=begin original
2346	2226
2347	2227	Which rules apply are determined as described in
2348	2228	L<perlre/Which character set modifier is in effect?>.
2349	2229
2350	2230	=end original
2351	2231
2352	2232	どの規則を適用するかは L<perlre/Which character set modifier is in effect?> で
2353	2233	記述されている方法で決定されます。
2354	2234
	2235	=begin original
	2236
	2237	It is proposed to change this behavior in a future release of Perl so that
	2238	whether or not Unicode rules are in effect would not change the
	2239	behavior: Outside of locale, the POSIX classes
	2240	would behave like their ASCII-range counterparts. If you wish to
	2241	comment on this proposal, send email to C<perl5-porters@perl.org>.
	2242
	2243	=end original
	2244
	2245	Perl の将来のバージョンではこの振る舞いを変えることが提案されています;
	2246	Unicode の規則が有効かどうかは振る舞いを変えません:
	2247	ロケールの外側では、
	2248	POSIX クラスはその ASCII の範囲の対応するものと同様に振る舞います。
	2249	この提案にコメントしたいなら、C<perl5-porters@perl.org> にメールを
	2250	送ってください。
	2251
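The rules listed above can be probed with explicit character set modifiers. This is a sketch under stated assumptions: it runs on an ASCII platform, no locale is in effect, and the sample characters and hash keys are chosen only for illustration.

```perl
use strict;
use warnings;

my $thai_one = "\x{0E51}";   # THAI DIGIT ONE, a code point above 255
my $e_acute  = "\xE9";       # LATIN SMALL LETTER E WITH ACUTE in Latin-1

my %match = (
    thai_default => ($thai_one =~ /[[:digit:]]/  ? 1 : 0),  # Full-range counterpart
    thai_a       => ($thai_one =~ /[[:digit:]]/a ? 1 : 0),  # /a: ASCII-range only
    acute_u      => ($e_acute  =~ /[[:alpha:]]/u ? 1 : 0),  # Unicode rules
    acute_d      => ($e_acute  =~ /[[:alpha:]]/d ? 1 : 0),  # neither locale nor Unicode
    acute_a      => ($e_acute  =~ /[[:alpha:]]/a ? 1 : 0),  # /a: ASCII-range only
);
print "$_: $match{$_}\n" for sort keys %match;
```

Only `thai_default` and `acute_u` report a match; the other three fall back to the ASCII-range counterpart.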
2355	2252	=head4 Negation of POSIX character classes
2356	2253	X<character class, negation>
2357	2254
2358	2255	(POSIX 文字クラスの否定)
2359	2256
2360	2257	=begin original
2361	2258
2362	2259	A Perl extension to the POSIX character class is the ability to
2363	2260	negate it. This is done by prefixing the class name with a caret (C<^>).
2364	2261	Some examples:
2365	2262
2366	2263	=end original
2367	2264
2368	2265	POSIX 文字クラスに対する Perl の拡張は否定の機能です。
2369	2266	これはクラス名の前にキャレット (C<^>) を置くことで実現します。
2370	2267	いくつかの例です:
2371	2268
2372	2269	     POSIX         ASCII-range     Full-range  backslash
2373	2270	                    Unicode         Unicode    sequence
2374	2271	 -----------------------------------------------------
2375	2272	 [[:^digit:]]   \P{PosixDigit}  \P{XPosixDigit}   \D
2376	2273	 [[:^space:]]   \P{PosixSpace}  \P{XPosixSpace}
2377	2274	                \P{PerlSpace}   \P{XPerlSpace}    \S
2378	2275	 [[:^word:]]    \P{PerlWord}    \P{XPosixWord}    \W
2379	2276
2380	2277	=begin original
2381	2278
2382	2279	The backslash sequence can mean either ASCII- or Full-range Unicode,
2383	2280	depending on various factors as described in L<perlre/Which character set modifier is in effect?>.
2384	2281
2385	2282	=end original
2386	2283
2387	2284	逆スラッシュシーケンスは ASCII- か Full-range Unicode のどちらかを意味します;
2388	2285	どちらが使われるかは L<perlre/Which character set modifier is in effect?> で
2389	2286	記述されている様々な要素に依存します。
2390	2287
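A brief sketch of negated POSIX classes, compared against the matching backslash sequences. The sample characters are arbitrary ASCII choices made for the example.

```perl
use strict;
use warnings;

my @chars = ('a', '7', ' ', '_', '-');

my @non_digit = grep { /[[:^digit:]]/ } @chars;
my @non_word  = grep { /[[:^word:]]/  } @chars;

# For these ASCII samples the negated class agrees with \D.
my @disagree = grep { (/[[:^digit:]]/ ? 1 : 0) != (/\D/ ? 1 : 0) } @chars;

print join('|', @non_digit), "\n";
print join('|', @non_word), "\n";
print scalar(@disagree), " disagreements with \\D\n";
```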
2391	2288	=head4 [= =] and [. .]
2392	2289
2393	2290	([= =] と [. .])
2394	2291
2395	2292	=begin original
2396	2293
2397	2294	Perl recognizes the POSIX character classes C<[=class=]> and
2398	2295	C<[.class.]>, but does not (yet?) support them. Any attempt to use
2399	2296	either construct raises an exception.
2400	2297
2401	2298	=end original
2402	2299
2403	2300	Perl は POSIX 文字クラス C<[=class=]> と C<[.class.]> を認識しますが、
2404	2301	これらには(まだ?)対応していません。
2405	2302	このような構文を使おうとすると例外が発生します。
2406	2303
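Because the failure happens when the pattern is compiled, it can be observed by compiling from a string inside `eval`. This is a minimal sketch; the exact wording of the error message is not assumed here, only that compilation fails.

```perl
use strict;
use warnings;

# Built from a string so that the error is raised at run time,
# where eval can trap it, rather than when the script is parsed.
my $source   = '[[=a=]]';
my $compiled = eval { qr/$source/ };
my $error    = $@;

print defined $compiled ? "compiled\n" : "rejected: $error";
```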
2407	2304	=head4 Examples
2408	2305
2409	2306	(例)
2410	2307
2411	2308	=begin original
2412	2309
2413	2310	/[[:digit:]]/ # Matches a character that is a digit.
2414	2311	/[01[:lower:]]/ # Matches a character that is either a
2415	2312	# lowercase letter, or '0' or '1'.
2416	2313	/[[:digit:][:^xdigit:]]/ # Matches a character that can be anything
2417	2314	# except the letters 'a' to 'f' and 'A' to
2418	2315	# 'F'. This is because the main character
2419	2316	# class is composed of two POSIX character
2420	2317	# classes that are ORed together, one that
2421	2318	# matches any digit, and the other that
2422	2319	# matches anything that isn't a hex digit.
2423	2320	# The OR adds the digits, leaving only the
2424	2321	# letters 'a' to 'f' and 'A' to 'F' excluded.
2425	2322
2426	2323	=end original
2427	2324
2428	2325	/[[:digit:]]/ # 数字の文字にマッチングする。
2429	2326	/[01[:lower:]]/ # 小文字、'0'、'1' のいずれかの文字に
2430	2327	# マッチングする。
2431	2328	/[[:digit:][:^xdigit:]]/ # 'a' から 'f' と 'A' から 'F' 以外の任意の文字に
2432	2329	# マッチング。これはメインの文字クラスでは二つの
2433	2330	# POSIX 文字クラスが OR され、一つは任意の数字に
2434	2331	# マッチングし、もう一つは 16 進文字でない全ての
2435	2332	# 文字にマッチングします。OR は数字を加え、
2436	2333	# 'a' から 'f' および 'A' から 'F' のみが
2437	2334	# 除外されて残ります。
	2335	#
2438	2336
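The last example above is easy to misread, so here is a short sketch that applies it to a few ASCII characters. The sample list is an assumption made for illustration.

```perl
use strict;
use warnings;

my @chars = ('5', 'a', 'F', 'g', 'Z', '%');

# Digits are added back by [:digit:], so only the hex letters are excluded.
my @hits = grep { /[[:digit:][:^xdigit:]]/ } @chars;
print "@hits\n";
```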
2439	2337	=head3 Extended Bracketed Character Classes
2440	2338	X<character class>
2441	2339	X<set operations>
2442	2340
2443	2341	(拡張大かっこ文字クラス)
2444	2342
2445	2343	=begin original
2446	2344
2447	2345	This is a fancy bracketed character class that can be used for more
2448	2346	readable and less error-prone classes, and to perform set operations,
2449	2347	such as intersection. An example is
2450	2348
2451	2349	=end original
2452	2350
2453	2351	これはしゃれた大かっこ文字クラスで、より読みやすく、エラーが発生しにくい
2454	2352	クラスや、交差などの集合演算を実行するために使用できます。
2455		例は:
2456	2353
2457	2354	/(?[ \p{Thai} & \p{Digit} ])/
2458	2355
2459	2356	=begin original
2460	2357
2461	2358	This will match all the digit characters that are in the Thai script.
2462	2359
2463	2360	=end original
2464	2361
2465	2362	これは、タイ文字 (Thai 用字) に含まれるすべての数字にマッチングします。
2466	2363
2467	2364	=begin original
2468	2365
2469		This feature became available in Perl 5.18, as experimental; accepted in
	2366	This is an experimental feature available starting in 5.18, and is
2470		5.36.
	2367	subject to change as we gain field experience with it. Any attempt to
	2368	use it will raise a warning, unless disabled via
2471	2369
2472	2370	=end original
2473	2371
2474		この機能は Perl 5.18 で実験的に利用可能になりました;
	2372	これは 5.18 から利用できる実験的な機能で、現場での経験を積むにつれて
2475		5.36 で受け入れられました。
	2373	変更される可能性があります。
	2374	これを使用しようとすると、次のようにして無効にしない限り、警告が表示されます:
2476	2375
	2376	no warnings "experimental::regex_sets";
	2377
2477	2378	=begin original
2478	2379
2479		The rules used by L<C<use re 'strict>|re/'strict' mode> apply to this
	2380	Comments on this feature are welcome; send email to
2480		construct.
	2381	C<perl5-porters@perl.org>.
2481	2382
2482	2383	=end original
2483	2384
2484		L<C<use re 'strict>|re/'strict' mode> で使われる規則はこの構文に
	2385	この機能に関するコメントを歓迎します。
2485		適用されます。
	2386	C<perl5-porters@perl.org> に電子メールを送ってください。
2486	2387
2487	2388	=begin original
2488	2389
2489	2390	We can extend the example above:
2490	2391
2491	2392	=end original
2492	2393
2493	2394	上記の例を拡張できます:
2494	2395
2495	2396	/(?[ ( \p{Thai} + \p{Lao} ) & \p{Digit} ])/
2496	2397
2497	2398	=begin original
2498	2399
2499	2400	This matches digits that are in either the Thai or Laotian scripts.
2500	2401
2501	2402	=end original
2502	2403
2503	2404	これはタイ文字またはラオス文字のいずれかの用字に含まれる数字にマッチングします。
2504	2405
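The two extended classes shown so far can be exercised as below. This is a sketch with assumptions: it needs Perl 5.18 or later, the `no warnings` line is only required on versions where the feature is still experimental, and the sample code points are chosen for illustration.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # only needed while the feature is experimental

my $thai_digit        = qr/^(?[ \p{Thai} & \p{Digit} ])$/;
my $thai_or_lao_digit = qr/^(?[ ( \p{Thai} + \p{Lao} ) & \p{Digit} ])$/;

my %sample = (
    thai_one    => "\x{0E51}",   # THAI DIGIT ONE
    lao_one     => "\x{0ED1}",   # LAO DIGIT ONE
    ascii_one   => '1',
    thai_letter => "\x{0E01}",   # THAI CHARACTER KO KAI
);

my %thai = map { $_ => ($sample{$_} =~ $thai_digit        ? 1 : 0) } keys %sample;
my %both = map { $_ => ($sample{$_} =~ $thai_or_lao_digit ? 1 : 0) } keys %sample;

printf "%-12s thai=%d thai_or_lao=%d\n", $_, $thai{$_}, $both{$_}
    for sort keys %sample;
```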
2505	2406	=begin original
2506	2407
2507	2408	Notice the white space in these examples. This construct always has
2508		the C<E<sol>xx> modifier turned on within it.
	2409	the C<E<sol>x> modifier turned on within it.
2509	2410
2510	2411	=end original
2511	2412
2512	2413	これらの例の中の空白に注意してください。
2513		この構文では、その中では常に C<E<sol>xx> 修飾子がオンになっています。
	2414	この構文では、その中では常に C<E<sol>x> 修飾子がオンになっています。
2514	2415
2515	2416	=begin original
2516	2417
2517	2418	The available binary operators are:
2518	2419
2519	2420	=end original
2520	2421
2521	2422	使用可能な 2 項演算子は次のとおりです:
2522	2423
2523	2424	 &  intersection
2524	2425	 +  union
2525	2426	 |  another name for '+', hence means union
2526	2427	 -  subtraction (the result matches the set consisting of those
2527	2428	    code points matched by the first operand, excluding any that
2528	2429	    are also matched by the second operand)
2529	2430	 ^  symmetric difference (the union minus the intersection).  This
2530	2431	    is like an exclusive or, in that the result is the set of code
2531	2432	    points that are matched by either, but not both, of the
2532	2433	    operands.
2533	2434
2534	2435	=begin original
2535	2436
2536	2437	There is one unary operator:
2537	2438
2538	2439	=end original
2539	2440
2540	2441	単項演算子が一つあります。
2541	2442
2542	2443	! complement
2543	2444
2544	2445	=begin original
2545	2446
2546	2447	All the binary operators left associate; C<"&"> is higher precedence
2547	2448	than the others, which all have equal precedence. The unary operator
2548	2449	right associates, and has highest precedence. Thus this follows the
2549	2450	normal Perl precedence rules for logical operators. Use parentheses to
2550	2451	override the default precedence and associativity.
2551	2452
2552	2453	=end original
2553	2454
2554	2455	すべての二項演算子は左結合です; C<"&"> はその他よりも高い優先順位を持ち、
2555	2456	それ以外は同等の優先順位を持ちます。
2556	2457	単項演算子は右結合で、最も高い優先順位を持ちます。
2557	2458	従って、これは通常の Perl の論理演算子に関する優先順位規則に従います。
2558	2459	デフォルトの優先順位と結合を上書きするにはかっこを使います。
2559	2460
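The operators and the precedence rule described above can be tried out with a few small classes. This sketch assumes a Perl version in which C<"&"> binds more tightly than the other binary operators, as the text states; the pattern names are illustrative.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # only needed while the feature is experimental

my $consonant = qr/^(?[ [a-z] - [aeiou] ])$/;         # subtraction
my $one_side  = qr/^(?[ [a-f] ^ [d-z] ])$/;           # symmetric difference
my $non_digit = qr/^(?[ ! \d ])$/;                    # complement
my $default   = qr/^(?[ [a] + [b-z] & [x] ])$/;       # parsed as [a] + ( [b-z] & [x] )
my $grouped   = qr/^(?[ ( [a] + [b-z] ) & [x] ])$/;   # parentheses change the grouping

my %result = (
    consonant => join('', grep { $_ =~ $consonant } qw(b e)),
    one_side  => join('', grep { $_ =~ $one_side  } qw(b e k)),
    non_digit => join('', grep { $_ =~ $non_digit } qw(a 5)),
    default   => join('', grep { $_ =~ $default   } qw(a b x)),
    grouped   => join('', grep { $_ =~ $grouped   } qw(a b x)),
);
print "$_: $result{$_}\n" for sort keys %result;
```

With the default precedence the class contains both `a` and `x`; with the explicit parentheses only `x` remains.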
2560	2461	=begin original
2561	2462
2562	2463	The main restriction is that everything is a metacharacter. Thus,
2563	2464	you cannot refer to single characters by doing something like this:
2564	2465
2565	2466	=end original
2566	2467
2567	2468	主な制限は、すべてがメタ文字であるということです。
2568	2469	したがって、以下のようにして単一文字を参照することはできません:
2569	2470
2570	2471	/(?[ a + b ])/ # Syntax error!
2571	2472
2572	2473	=begin original
2573	2474
2574	2475	The easiest way to specify an individual typable character is to enclose
2575	2476	it in brackets:
2576	2477
2577	2478	=end original
2578	2479
2579	2480	タイプ可能な個々の文字を指定する最も簡単な方法は、次のように
2580	2481	大かっこで囲むことです:
2581	2482
2582	2483	/(?[ [a] + [b] ])/
2583	2484
2584	2485	=begin original
2585	2486
2586	2487	(This is the same thing as C<[ab]>.) You could also have said the
2587	2488	equivalent:
2588	2489
2589	2490	=end original
2590	2491
2591	2492	(これは C<[ab]> と同じことです。)
2592	2493	次のように等価な書き方もできます:
2593	2494
2594	2495	/(?[[ a b ]])/
2595	2496
2596	2497	=begin original
2597	2498
2598	2499	(You can, of course, specify single characters by using, C<\x{...}>,
2599	2500	C<\N{...}>, etc.)
2600	2501
2601	2502	=end original
2602	2503
2603	2504	(もちろん、C<\x{...}> や C<\N{...}> などを使用して 1 文字を
2604	2505	指定することもできます。)
2605	2506
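The equivalent spellings mentioned above can be compared side by side. In this sketch the escapes `\x{61}` and `\N{U+62}` stand for `a` and `b`, which assumes an ASCII platform for the `\x` form; the array names are illustrative.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # only needed while the feature is experimental

my @equivalent = (
    qr/^(?[ [a] + [b] ])$/,
    qr/^(?[[ a b ]])$/,
    qr/^(?[ \x{61} + \N{U+62} ])$/,
    qr/^[ab]$/,
);

# One row per pattern: a 1 or 0 for each of 'a', 'b' and 'c'.
my @rows = map {
    my $re = $_;
    join '', map { $_ =~ $re ? 1 : 0 } qw(a b c);
} @equivalent;
print "$_\n" for @rows;
```

Every row is identical, since all four patterns describe the same two-character set.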
2606	2507	=begin original
2607	2508
2608	2509	This last example shows the use of this construct to specify an ordinary
2609	2510	bracketed character class without additional set operations. Note the
2610		white space within it. ~~This is allowed because~~ C<E<sol>xx> is
	2511	white space within it; C<E<sol>x> is turned on even within bracketed
2611		automatically turned on within this construct.
	2512	character classes, except you can't have comments inside them. Hence,
2612	2513
2613	2514	=end original
2614	2515
2615	2516	この最後の例では、この構文を使用して、追加の集合操作なしで
2616	2517	通常の大かっこ文字クラスを指定する方法を示しています。
2617		この中に空白があることに注意してください。
	2518	この中に空白があることに注意してください;
2618		C<E<sol>xx> は、この構文の内側で自動的に有効になるのでこれが許されます。
	2519	C<E<sol>x> は、大かっこ文字クラス内でも有効になりますが、
	2520	コメントを含めることはできません。
	2521	したがって:
2619	2522
	2523	(?[ [#] ])
	2524
2620	2525	=begin original
2621	2526
	2527	matches the literal character "#". To specify a literal white space character,
	2528	you can escape it with a backslash, like:
	2529
	2530	=end original
	2531
	2532	は、リテラル文字 "#" にマッチングします。
	2533	リテラル空白文字を指定するには、次のように逆スラッシュでエスケープします:
	2534
	2535	/(?[ [ a e i o u \ ] ])/
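
The two points above (a literal "#" inside brackets, and a backslash-escaped space) can be checked with a short sketch. The variable names are hypothetical, and the `no warnings` line is an assumption for perls older than 5.36.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # only needed before Perl 5.36

# Inside the inner brackets "#" is a literal, not the start of a comment.
my $hash = qr/^(?[ [#] ])$/;

# Unescaped blanks are ignored; the backslash-escaped one is a set member.
my $vowels_or_space = qr/^(?[ [ a e i o u \ ] ])$/;

for my $ch ('#', 'e', ' ', 'x') {
    printf "%-3s hash=%d vowels_or_space=%d\n",
        "'$ch'",
        ($ch =~ $hash)            ? 1 : 0,
        ($ch =~ $vowels_or_space) ? 1 : 0;
}
```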
	2536
	2537	=begin original
	2538
	2539	This matches the English vowels plus the SPACE character.
2622	2540	All the other escapes accepted by normal bracketed character classes are
2623		accepted here as well.
	2541	accepted here as well; but unrecognized escapes that generate warnings
	2542	in normal classes are fatal errors here.
2624	2543
2625	2544	=end original
2626	2545
	2546	これは英語の母音と SPACE 文字に一致します。
2627	2547	通常の大かっこ文字クラスで受け入れられる他のエスケープは
2628		すべてここでも受け入れられます。
	2548	すべてここでも受け入れられますが、通常のクラスで警告を生成する
	2549	認識されないエスケープはここでは致命的なエラーです。
2629	2550
2630	2551	=begin original
2631	2552
2632		Because this construct compiles under
	2553	All warnings from these class elements are fatal, as well as some
2633		L<C<use re 'strict>\|re/'strict' mode>, unrecognized escapes that
	2554	practices that don't currently warn. For example you cannot say
2634		generate warnings in normal classes are fatal errors here, as well as
2635		all other warnings from these class elements, as well as some
2636		practices that don't currently warn outside C<re 'strict'>. For example
2637		you cannot say
2638	2555
2639	2556	=end original
2640	2557
2641		この構文は L<C<use re 'strict>\|re/'strict' mode> の下でコンパイルされるので、
	2558	これらのクラス要素からのすべての警告は致命的であり、
2642		通常のクラスで警告を生成する
	2559	現在警告していないいくつかのプラクティスも同様です。
2643		認識されないエスケープはここでは致命的なエラーです;
2644		これらのクラス要素からのその他すべての警告も同様で、
2645		C<re 'strict'> の外側では、現在警告していないいくつかのプラクティスも
2646		同様です。
2647	2560	例えば次のようにはできません:
2648	2561
2649	2562	/(?[ [ \xF ] ])/ # Syntax error!
2650	2563
2651	2564	=begin original
2652	2565
2653	2566	You have to have two hex digits after a braceless C<\x> (use a leading
2654	2567	zero to make two). These restrictions are to lower the incidence of
2655	2568	typos causing the class to not match what you thought it would.
2656	2569
2657	2570	=end original
2658	2571
2659	2572	中かっこのない C<\x> の後には 2 桁の 16 進数が必要です(2 桁にするには
2660	2573	先頭の 0 を使用します)。
2661	2574	これらの制限は、クラスが想定したものと一致しない原因となる
2662	2575	タイプミスの発生を減らすためです。
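
As a hedged illustration of the two-hex-digit rule, the sketch below compiles the patterns at run time from strings so that the failure can be trapped with `eval`. The names `$src`, `$bad`, `$err` and `$good` are illustrative; the exact error text varies between Perl versions and is not shown.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # only needed before Perl 5.36

# Single quotes keep the backslash for the regex compiler to see.
my $src = '(?[ [ \xF ] ])';
my $bad = eval { qr/$src/ };              # dies: one hex digit is not enough
my $err = $@;

# A leading zero makes two digits, which is accepted.
my $good = qr/^(?[ [ \x0F ] ])$/;

print "one digit:  ", (defined $bad ? "compiled" : "rejected"), "\n";
print "two digits: ", ("\x0F" =~ $good ? "matches" : "no match"), " chr(0x0F)\n";
```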
2663	2576
2664	2577	=begin original
2665	2578
2666	2579	If a regular bracketed character class contains a C<\p{}> or C<\P{}> and
2667	2580	is matched against a non-Unicode code point, a warning may be
2668	2581	raised, as the result is not Unicode-defined. No such warning will come
2669	2582	when using this extended form.
2670	2583
2671	2584	=end original
2672	2585
2673	2586	通常の大かっこ文字クラスに C<\p{}> や C<\P{}> が含まれていて、
2674	2587	非 Unicode 符号位置に対してマッチングした場合、
2675	2588	結果は Unicode で定義されていないので、警告が発生することがあります。
2676	2589	このような警告は、拡張形式を使った場合は発生しません。
2677	2590
2678	2591	=begin original
2679	2592
2680	2593	The final difference between regular bracketed character classes and
2681	2594	these, is that it is not possible to get these to match a
2682	2595	multi-character fold. Thus,
2683	2596
2684	2597	=end original
2685	2598
2686	2599	通常の大かっこ文字クラスとこれらのクラスの最後の違いは、
2687	2600	これらを複数文字畳み込みにマッチングさせることができないということです。
2688	2601	従って:
2689	2602
2690	2603	/(?[ [\xDF] ])/iu
2691	2604
2692	2605	=begin original
2693	2606
2694	2607	does not match the string C<ss>.
2695	2608
2696	2609	=end original
2697	2610
2698	2611	は文字列 C<ss> と一致しません。
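
A small sketch contrasting the two forms on the multi-character fold of U+00DF (LATIN SMALL LETTER SHARP S). The variable names are illustrative; the `no warnings` line is an assumption for perls older than 5.36.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # only needed before Perl 5.36

my $s = "ss";

# Ordinary bracketed class: under /iu the sharp s can fold to "ss".
my $classic  = ($s =~ /^[\xDF]$/iu)        ? 1 : 0;

# Extended form: always matches exactly one character, so "ss" fails.
my $extended = ($s =~ /^(?[ [\xDF] ])$/iu) ? 1 : 0;

print "classic=$classic extended=$extended\n";
```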
2699	2612
2700	2613	=begin original
2701	2614
2702	2615	You don't have to enclose POSIX class names inside double brackets,
2703	2616	hence both of the following work:
2704	2617
2705	2618	=end original
2706	2619
2707	2620	POSIX クラス名を二重かっこで囲む必要はありません;
2708	2621	そのため、以下の両方とも動作します:
2709	2622
2710	2623	/(?[ [:word:] - [:lower:] ])/
2711	2624	/(?[ [[:word:]] - [[:lower:]] ])/
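
The following sketch checks that the single-bracket and double-bracket spellings behave alike: word characters minus lower-case letters. The probe characters and variable names are only illustrative.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # only needed before Perl 5.36

my $single = qr/^(?[ [:word:] - [:lower:] ])$/;
my $double = qr/^(?[ [[:word:]] - [[:lower:]] ])$/;

for my $ch ('A', 'a', '7', '_', '-') {
    printf "%s single=%d double=%d\n",
        $ch,
        ($ch =~ $single) ? 1 : 0,
        ($ch =~ $double) ? 1 : 0;
}
```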
2712	2625
2713	2626	=begin original
2714	2627
2715	2628	Any contained POSIX character classes, including things like C<\w> and C<\D>
2716	2629	respect the C<E<sol>a> (and C<E<sol>aa>) modifiers.
2717	2630
2718	2631	=end original
2719	2632
2720	2633	C<\w> や C<\D> などの POSIX 文字クラスは、C<E<sol>a>
2721	2634	(および C<E<sol>aa> )修飾子を尊重します。
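
A brief sketch of the effect of C<E<sol>a> on a contained class, using U+0660 (ARABIC-INDIC DIGIT ZERO) as a non-ASCII digit. The variable names are hypothetical.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # only needed before Perl 5.36

my $arabic_zero = "\x{0660}";             # ARABIC-INDIC DIGIT ZERO

# Without /a the construct uses Unicode rules, so \d covers all digits.
my $unicode = ($arabic_zero =~ /^(?[ \d ])$/)  ? 1 : 0;

# With /a, \d inside the construct is restricted to ASCII 0-9.
my $ascii   = ($arabic_zero =~ /^(?[ \d ])$/a) ? 1 : 0;

print "unicode=$unicode ascii=$ascii\n";
```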
2722	2635
2723	2636	=begin original
2724	2637
2725		Note that C<< (?[ ]) >> is a regex-compile-time construct. Any attempt
	2638	C<< (?[ ]) >> is a regex-compile-time construct. Any attempt to use
2726		to use something which isn't knowable at the time the containing regular
	2639	something which isn't knowable at the time the containing regular
2727	2640	expression is compiled is a fatal error. In practice, this means
2728	2641	just three limitations:
2729	2642
2730	2643	=end original
2731	2644
2732		C<< (?[ ]) >> はコンパイル時正規表現構文であることに注意してください。
	2645	C<< (?[ ]) >> はコンパイル時正規表現構文です。
2733	2646	この構文を含む正規表現のコンパイル時に分からないものを使おうとすると、
2734	2647	致命的なエラーになります。
2735	2648	実際には、これは三つの制限を意味します:
2736	2649
2737	2650	=over 4
2738	2651
2739	2652	=item 1
2740	2653
2741	2654	=begin original
2742	2655
2743		When compiled within the scope of C<use locale> (or the C<E<sol>l> regex
	2656	This construct cannot be used within the scope of
2744		modifier), this construct assumes that the execution-time locale will be
	2657	C<use locale> (or the C<E<sol>l> regex modifier).
2745		a UTF-8 one, and the generated pattern always uses Unicode rules. What
2746		gets matched or not thus isn't dependent on the actual runtime locale, so
2747		tainting is not enabled. But a C<locale> category warning is raised
2748		if the runtime locale turns out to not be UTF-8.
2749	2658
2750	2659	=end original
2751	2660
2752		C<use locale> (または C<E<sol>l> 正規表現修飾子)の
	2661	この構文は、C<use locale> (または C<E<sol>l> 正規表現修飾子)の
2753		スコープ内でコンパイルされると、この構文は実行時ロケールが
	2662	スコープ内では使用できません。
2754		UTF-8 のものであることを仮定し、
2755		生成されたパターンは常に Unicode の規則を使います。
2756		従ってマッチングするかどうかは実際の実行時ロケールには関係なく、
2757		汚染チェックモードは有効になりません。
2758		しかし、実行時ロケールが UTF-8 以外になると、
2759		C<locale> カテゴリの警告が発生します。
2760	2663
2761	2664	=item 2
2762	2665
2763	2666	=begin original
2764	2667
2765	2668	Any
2766	2669	L<user-defined property\|perlunicode/"User-Defined Character Properties">
2767	2670	used must be already defined by the time the regular expression is
2768	2671	compiled (but note that this construct can be used instead of such
2769	2672	properties).
2770	2673
2771	2674	=end original
2772	2675
2773	2676	使用される
2774	2677	L<ユーザー定義特性\|perlunicode/"User-Defined Character Properties"> は、
2775	2678	正規表現がコンパイルされるときにすでに定義されている必要があります
2776	2679	(ただし、このような特性の代わりにこの構文を使用することもできます)。
2777	2680
2778	2681	=item 3
2779	2682
2780	2683	=begin original
2781	2684
2782	2685	A regular expression that otherwise would compile
2783	2686	using C<E<sol>d> rules, and which uses this construct will instead
2784	2687	use C<E<sol>u>. Thus this construct tells Perl that you don't want
2785	2688	C<E<sol>d> rules for the entire regular expression containing it.
2786	2689
2787	2690	=end original
2788	2691
2789	2692	本来なら C<E<sol>d> 規則でコンパイルされる正規表現でも、この構文を使用すると、
2790	2693	代わりに C<E<sol>u> を使用します。
2791	2694	したがって、この構文は、これを含む正規表現全体に対して
2792	2695	C<E<sol>d> 規則を望まないことを Perl に伝えます。
2793	2696
2794	2697	=back
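
The third limitation can be sketched as follows. The example assumes a script with no C<use locale>, no C<use utf8> and no C<unicode_strings> feature in effect, so that a plain pattern gets C<E<sol>d> rules; the variable names are illustrative.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # only needed before Perl 5.36

# A native 8-bit string: U+00E9 followed by "x".
my $str = "\xE9x";

# Plain pattern under /d rules: \w does not match the non-ASCII byte.
my $plain    = ($str =~ /^\wx$/)          ? 1 : 0;

# Using the construct anywhere switches the whole pattern to /u rules,
# so the \w outside the construct now matches U+00E9.
my $with_set = ($str =~ /^\w(?[ [x] ])$/) ? 1 : 0;

print "plain=$plain with_set=$with_set\n";
```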
2795	2698
2796	2699	=begin original
2797	2700
2798	2701	Note that skipping white space applies only to the interior of this
2799	2702	construct. There must not be any space between any of the characters
2800	2703	that form the initial C<(?[>. Nor may there be space between the
2801	2704	closing C<])> characters.
2802	2705
2803	2706	=end original
2804	2707
2805	2708	空白のスキップは、この構文の内部にのみ適用されることに注意してください。
2806	2709	最初の C<(?[> を形成する文字の間に空白を入れることはできません。
2807	2710	また、終わりの C<])> 文字の間に空白を入れることもできません。
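
A sketch of where white space is and is not tolerated. The helper `compiles` is hypothetical: it compiles a pattern string at run time and reports whether that succeeded.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # only needed before Perl 5.36

# Hypothetical helper: 1 if the pattern string compiles, 0 if it dies.
sub compiles {
    my ($src) = @_;
    my $re = eval { qr/$src/ };
    return defined $re ? 1 : 0;
}

my @candidates = (
    '(?[ [a] + [b] ])',            # well formed
    "(?[\n  [a]\n  + [b]\n])",     # newlines inside the construct are fine
    '(? [ [a] + [b] ])',           # blank inside the opening "(?["
    '(?[ [a] + [b] ] )',           # blank inside the closing "])"
);

my @results = map { compiles($_) } @candidates;
print "@results\n";
```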
2808	2711
2809	2712	=begin original
2810	2713
2811	2714	Just as in all regular expressions, the pattern can be built up by
2812	2715	including variables that are interpolated at regex compilation time.
2813		But currently each such sub-component should be an already-compiled
	2716	Care must be taken to ensure that you are getting what you expect. For
2814		extended bracketed character class.
	2717	example:
2815	2718
2816	2719	=end original
2817	2720
2818	2721	すべての正規表現と同様に、正規表現コンパイル時に補完される変数を
2819	2722	含めることでパターンを構築できます。
2820		しかし、現在の所、このような部分要素のそれぞれは
	2723	期待どおりの結果が得られるように注意が必要です。
2821		すでにコンパイルされた拡張大かっこ文字クラスであるべきです。
	2724	例えば:
2822	2725
2823		my $thai_or_lao = qr/(?[ \p{Thai} + \p{Lao} ])/;
	2726	my $thai_or_lao = '\p{Thai} + \p{Lao}';
2824	2727	...
2825	2728	qr/(?[ \p{Digit} & $thai_or_lao ])/;
2826	2729
2827	2730	=begin original
2828	2731
2829		If you interpolate something else, the pattern may still compile (or it
	2732	compiles to
2830		may die), but if it compiles, it very well may not behave as you would
2831		expect:
2832	2733
2833	2734	=end original
2834	2735
2835		何か違うものを変数展開すると、パターンはやはりコンパイルされます
	2736	これは次のようにコンパイルされます:
2836		(あるいは die します)が、コンパイルされると、想像しているものと
2837		かなり違う振る舞いになるかもしれません:
2838	2737
2839		my $thai_or_lao = '\p{Thai} + \p{Lao}';
	2738	qr/(?[ \p{Digit} & \p{Thai} + \p{Lao} ])/;
2840		qr/(?[ \p{Digit} & $thai_or_lao ])/;
2841	2739
2842	2740	=begin original
2843	2741
2844		compiles to
	2742	But this does not have the effect that someone reading the code would
	2743	likely expect, as the intersection applies just to C<\p{Thai}>,
	2744	excluding the Laotian. Pitfalls like this can be avoided by
	2745	parenthesizing the component pieces:
2845	2746
2846	2747	=end original
2847	2748
2848		これは次のようにコンパイルされます:
	2749	しかし、これは、コードを読んでいる人が期待するような効果はありません;
	2750	なぜなら、この交差は C<\p{Thai}> だけに適用され、ラオス語には
	2751	適用されないからです。
	2752	このような落とし穴は、コンポーネントをかっこで囲むことで回避できます:
2849	2753
2850		qr/(?[ \p{Digit} & \p{Thai} + \p{Lao} ])/;
	2754	my $thai_or_lao = '( \p{Thai} + \p{Lao} )';
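
The precedence pitfall can be made visible with a character that is Lao but not a digit. This is a sketch using interpolated strings as in the text; `$loose`, `$tight` and the probe characters are illustrative names, and newer perls recommend interpolating only precompiled C<qr//> components.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # only needed before Perl 5.36

my $loose = '\p{Thai} + \p{Lao}';         # no parentheses
my $tight = '( \p{Thai} + \p{Lao} )';     # parenthesized

# "&" binds tighter than "+", so the first pattern is
# (Digit & Thai) + Lao, while the second is Digit & (Thai + Lao).
my $re_loose = qr/^(?[ \p{Digit} & $loose ])$/;
my $re_tight = qr/^(?[ \p{Digit} & $tight ])$/;

my $lao_letter = "\x{0E81}";              # LAO LETTER KO, not a digit
my $lao_digit  = "\x{0ED1}";              # LAO DIGIT ONE

printf "Lao letter: loose=%d tight=%d\n",
    ($lao_letter =~ $re_loose) ? 1 : 0, ($lao_letter =~ $re_tight) ? 1 : 0;
printf "Lao digit:  loose=%d tight=%d\n",
    ($lao_digit  =~ $re_loose) ? 1 : 0, ($lao_digit  =~ $re_tight) ? 1 : 0;
```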
2851	2755
2852	2756	=begin original
2853	2757
2854		This does not have the effect that someone reading the source code
	2758	But any modifiers will still apply to all the components:
2855		would likely expect, as the intersection applies just to C<\p{Thai}>,
2856		excluding the Laotian.
2857	2759
2858	2760	=end original
2859	2761
2860		これは、ソースコードを読んでいる人が期待するような効果はありません;
	2762	ただし、修飾子はすべてのコンポーネントに適用されます:
2861		なぜなら、この交差は C<\p{Thai}> だけに適用され、ラオス語には
2862		適用されないからです。
2863	2763
	2764	my $lower = '\p{Lower} + \p{Digit}';
	2765	qr/(?[ \p{Greek} & $lower ])/i;
	2766
2864	2767	=begin original
2865	2768
	2769	matches upper case things. You can avoid surprises by making the
	2770	components into instances of this construct by compiling them:
	2771
	2772	=end original
	2773
	2774	これは大文字のものと一致します。
	2775	コンポーネントをコンパイルしてこの構文の実体にすることで、
	2776	予期せぬ事態を避けることができます:
	2777
	2778	my $thai_or_lao = qr/(?[ \p{Thai} + \p{Lao} ])/;
	2779	my $lower = qr/(?[ \p{Lower} + \p{Digit} ])/;
	2780
	2781	=begin original
	2782
	2783	When these are embedded in another pattern, what they match does not
	2784	change, regardless of parenthesization or what modifiers are in effect
	2785	in that outer pattern.
	2786
	2787	=end original
	2788
	2789	これらが別のパターンに埋め込まれている場合、かっこの付け方やその外側のパターンで
	2790	有効な修飾子に関係なく、一致するものは変わりません。
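
A hedged sketch of the difference between interpolating a string and interpolating a precompiled component when the outer pattern uses C<E<sol>i>. All variable names are illustrative, and the behavior shown is the one described in the text above; it has not been verified on every Perl version.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';   # only needed before Perl 5.36

my $lower_str = '\p{Lower} + \p{Digit}';             # plain text
my $lower_qr  = qr/(?[ \p{Lower} + \p{Digit} ])/;    # compiled without /i

# The outer /i applies to the interpolated text, but should not change
# what the already-compiled component matches.
my $from_str = qr/^(?[ \p{Greek} & $lower_str ])$/i;
my $from_qr  = qr/^(?[ \p{Greek} & $lower_qr ])$/i;

my $capital_sigma = "\x{03A3}";           # GREEK CAPITAL LETTER SIGMA
my $small_sigma   = "\x{03C3}";           # GREEK SMALL LETTER SIGMA

printf "capital sigma: from_str=%d from_qr=%d\n",
    ($capital_sigma =~ $from_str) ? 1 : 0, ($capital_sigma =~ $from_qr) ? 1 : 0;
printf "small sigma:   from_str=%d from_qr=%d\n",
    ($small_sigma   =~ $from_str) ? 1 : 0, ($small_sigma   =~ $from_qr) ? 1 : 0;
```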
	2791
	2792	=begin original
	2793
2866	2794	Due to the way that Perl parses things, your parentheses and brackets
2867	2795	may need to be balanced, even including comments. If you run into any
2868		examples, please submit them to L<https://github.com/Perl/perl5/issues>,
	2796	examples, please send them to C<perlbug@perl.org>, so that we can have a
2869		so that we can have a concrete example for this man page.
	2797	concrete example for this man page.
2870	2798
2871	2799	=end original
2872	2800
2873	2801	Perl の構文解析方法によっては、コメントを含めてもかっこと大かっこの
2874	2802	バランスを取る必要がある場合があります。
2875		もし何か例を見つけたら、L<https://github.com/Perl/perl5/issues> に
	2803	もし何か例を見つけたら、C<perlbug@perl.org> まで送ってください。
2876		登録してください;
2877	2804	そうすれば、この man ページの具体的な例を得ることができます。
	2805
	2806	=begin original
	2807
	2808	We may change it so that things that remain legal uses in normal bracketed
	2809	character classes might become illegal within this experimental
	2810	construct. One proposal, for example, is to forbid adjacent uses of the
	2811	same character, as in C<(?[ [aa] ])>. The motivation for such a change
	2812	is that this usage is likely a typo, as the second "a" adds nothing.
	2813
	2814	=end original
	2815
	2816	通常の大かっこ文字クラスでは引き続き合法な使い方が、この実験的な構文の中では不正になるように変更するかもしれません。
	2817	たとえば、C<(?[ [aa] ])> のように、同じ文字を隣接して使用することを禁止することが提案されています。
	2818	このような変更の動機は、2 番目の "a" は何も追加しないので、この使用は
	2819	タイプミスである可能性が高いということです。
2878	2820
2879	2821	=begin meta
2880	2822
2881	2823	Translate: SHIRAKATA Kentaro <argrath@ub32.org> (5.10.1-)
2882	2824	Status: completed
2883	2825
2884	2826	=end meta
