perlunicook 5.38.0 と 5.36.0 の差分

1	1
2	2	=encoding utf8
3	3
4	4	=head1 NAME
5	5
6	6	=begin original
7	7
8	8	perlunicook - cookbookish examples of handling Unicode in Perl
9	9
10	10	=end original
11	11
12	12	perlunicook - Perl で Unicode を扱うためのクックブック風の例
13	13
14	14	=head1 DESCRIPTION
15	15
16	16	=begin original
17	17
18	18	This manpage contains short recipes demonstrating how to handle common Unicode
19	19	operations in Perl, plus one complete program at the end. Any undeclared
20	20	variables in individual recipes are assumed to have a previous appropriate
21	21	value in them.
22	22
23	23	=end original
24	24
25	25	この man ページには、Perl で一般的な Unicode 操作を扱う方法を説明する
26	26	短いレシピと、最後に一つの完全なプログラムが含まれています。
27	27	個々のレシピ内の宣言されていない変数は、それ以前に適切な値が
28	28	設定されていることを仮定しています。
29	29
30	30	=head1 EXAMPLES
31	31
32	32	=head2 ℞ 0: Standard preamble
33	33
34	34	(℞ 0: 標準の前提)
35	35
36	36	=begin original
37	37
38	38	Unless otherwise noted, all examples below require this standard preamble
39	39	to work correctly, with the C<#!> adjusted to work on your system:
40	40
41	41	=end original
42	42
43	43	特に注記がない限り、以下のすべての例では、この標準の前提が正しく動作し、
44	44	C<#!> がシステム上で動作するように調整されている必要があります。
45	45
46	46	#!/usr/bin/env perl
47	47
48	48	=begin original
49	49
50	50	use v5.36; # or later to get "unicode_strings" feature,
51	51	# plus strict, warnings
52	52	use utf8; # so literals and identifiers can be in UTF-8
53	53	use warnings qw(FATAL utf8); # fatalize encoding glitches
54	54	use open qw(:std :encoding(UTF-8)); # undeclared streams in UTF-8
55	55	use charnames qw(:full :short); # unneeded in v5.16
56	56
57	57	=end original
58	58
59	59	use v5.36; # またはそれ以降; "unicode_strings" 機能を有効に
60	60	# 加えて strict, warnings
61	61	use utf8; # 従ってリテラルと識別子で UTF-8 を使える
62	62	use warnings qw(FATAL utf8); # エンコーディングエラーを致命的エラーに
63	63	use open qw(:std :encoding(UTF-8)); # 未宣言ストリームを UTF-8 に
64	64	use charnames qw(:full :short); # v5.16 では不要
65	65
66	66	=begin original
67	67
68	68	This I<does> make even Unix programmers C<binmode> your binary streams,
69	69	or open them with C<:raw>, but that's the only way to get at them
70	70	portably anyway.
71	71
72	72	=end original
73	73
74	74	これは Unix プログラマでさえバイナリストリームを C<binmode> したり、
75	75	C<:raw> で開いたり I<しています> が、それがとにかくこれらを
76	76	移植性のあるものにする唯一の方法です。
77	77
78	78	=begin original
79	79
80	80	B<WARNING>: C<use autodie> (pre 2.26) and C<use open> do not get along with each
81	81	other.
82	82
83	83	=end original
84	84
85	85	B<警告>: C<use autodie>(2.26 より前)と C<use open> は同時に使えません。
86	86
87	87	=head2 ℞ 1: Generic Unicode-savvy filter
88	88
89	89	(℞ 1: 一般的な Unicode が使えるフィルタ)
90	90
91	91	=begin original
92	92
93	93	Always decompose on the way in, then recompose on the way out.
94	94
95	95	=end original
96	96
97	97	常に、入り口で分解し、出口で再合成します。
98	98
99	99	use Unicode::Normalize;
100	100
101	101	while (<>) {
102	102	$_ = NFD($_); # decompose + reorder canonically
103	103	...
104	104	} continue {
105	105	print NFC($_); # recompose (where possible) + reorder canonically
106	106	}
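
A minimal sketch of what the decompose/recompose round trip does to a single word. The sample string is an assumption chosen for illustration; only C<NFD> and C<NFC> from the core Unicode::Normalize module are used.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

# "crème" written with the precomposed U+00E8 (e with grave)
my $composed   = "cr\x{E8}me";

# NFD splits U+00E8 into "e" followed by U+0300 COMBINING GRAVE ACCENT
my $decomposed = NFD($composed);

# NFC puts the pair back together into the precomposed codepoint
my $recomposed = NFC($decomposed);

printf "composed:   %d codepoints\n", length $composed;
printf "decomposed: %d codepoints\n", length $decomposed;
printf "round trip identical: %s\n", $recomposed eq $composed ? "yes" : "no";
```

Working on the decomposed form inside the loop means a pattern such as C</e/> also sees the base letter of an accented character.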
107	107
108	108	=head2 ℞ 2: Fine-tuning Unicode warnings
109	109
110	110	(℞ 2: Unicode 警告の微調整)
111	111
112	112	=begin original
113	113
114	114	As of v5.14, Perl distinguishes three subclasses of UTF‑8 warnings.
115	115
116	116	=end original
117	117
118	118	v5.14 から、Perl は UTF-8 警告の三つのサブクラスを区別しています。
119	119
120	120	use v5.14; # subwarnings unavailable any earlier
121	121	no warnings "nonchar"; # the 66 forbidden non-characters
122	122	no warnings "surrogate"; # UTF-16/CESU-8 nonsense
123	123	no warnings "non_unicode"; # for codepoints over 0x10_FFFF
124	124
125	125	=head2 ℞ 3: Declare source in utf8 for identifiers and literals
126	126
127	127	(℞ 3: 識別子とリテラルのためにソースが utf8 であると宣言する)
128	128
129	129	=begin original
130	130
131	131	Without the all-critical C<use utf8> declaration, putting UTF‑8 in your
132	132	literals and identifiers won’t work right. If you used the standard
133	133	preamble just given above, this already happened. If you did, you can
134	134	do things like this:
135	135
136	136	=end original
137	137
138	138	最も重要な C<use utf8> 宣言なしの場合、リテラルと識別子に
139	139	UTF-8 を入れると正しく動作しません。
140	140	前述した標準の前提を使った場合、これは既に含まれています。
141	141	その場合、以下のようなことができます:
142	142
143	143	use utf8;
144	144
145	145	my $measure = "Ångström";
146	146	my @μsoft = qw( cp852 cp1251 cp1252 );
147	147	my @ὑπέρμεγας = qw( ὑπέρ μεγας );
148	148	my @鯉 = qw( koi8-f koi8-u koi8-r );
149	149	my $motto = "👪 💗 🐪"; # FAMILY, GROWING HEART, DROMEDARY CAMEL
150	150
151	151	=begin original
152	152
153	153	If you forget C<use utf8>, high bytes will be misunderstood as
154	154	separate characters, and nothing will work right.
155	155
156	156	=end original
157	157
158	158	C<use utf8> を忘れると、上位バイトは別々の文字として誤解され、
159	159	何も正しく動作しません。
160	160
161	161	=head2 ℞ 4: Characters and their numbers
162	162
163	163	(℞ 4: 文字とその番号)
164	164
165	165	=begin original
166	166
167	167	The C<ord> and C<chr> functions work transparently on all codepoints,
168	168	not just on ASCII alone — nor in fact, not even just on Unicode alone.
169	169
170	170	=end original
171	171
172	172	C<ord> 関数と C<chr> 関数は、すべての符号位置で透過的に動作します;
173	173	ASCII だけではなく、実際には Unicode だけでもありません。
174	174
175	175	# ASCII characters
176	176	ord("A")
177	177	chr(65)
178	178
179	179	# characters from the Basic Multilingual Plane
180	180	ord("Σ")
181	181	chr(0x3A3)
182	182
183	183	# beyond the BMP
184	184	ord("𝑛") # MATHEMATICAL ITALIC SMALL N
185	185	chr(0x1D45B)
186	186
187	187	# beyond Unicode! (up to MAXINT)
188	188	ord("\x{20_0000}")
189	189	chr(0x20_0000)
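
As an illustration, the following sketch round-trips two of the codepoints above through C<chr> and C<ord>. It prints only the ASCII labels, so no output encoding needs to be set up; the variable names are illustrative.

```perl
use strict;
use warnings;

my $sigma = chr(0x3A3);      # GREEK CAPITAL LETTER SIGMA, inside the BMP
my $ital_n = chr(0x1D45B);   # MATHEMATICAL ITALIC SMALL N, beyond the BMP

# ord() returns the full codepoint, not a byte or a UTF-16 code unit
my $sigma_label = sprintf "U+%04X", ord $sigma;
my $n_label     = sprintf "U+%04X", ord $ital_n;

print "$sigma_label $n_label\n";
print "each is one character\n"
    if length($sigma) == 1 && length($ital_n) == 1;
```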
190	190
191	191	=head2 ℞ 5: Unicode literals by character number
192	192
193	193	(℞ 5: 文字番号による Unicode リテラル)
194	194
195	195	=begin original
196	196
197	197	In an interpolated literal, whether a double-quoted string or a
198	198	regex, you may specify a character by its number using the
199	199	C<\x{I<HHHHHH>}> escape.
200	200
201	201	=end original
202	202
203	203	展開リテラルでは、ダブルクォートで囲まれた文字列か正規表現かにかかわらず、
204	204	C<\x{I<HHHHHH>}> エスケープを使用して番号で文字を指定できます。
205	205
206	206	String: "\x{3a3}"
207	207	Regex: /\x{3a3}/
208	208
209	209	String: "\x{1d45b}"
210	210	Regex: /\x{1d45b}/
211	211
212	212	# even non-BMP ranges in regex work fine
213	213	/[\x{1D434}-\x{1D467}]/
214	214
215	215	=head2 ℞ 6: Get character name by number
216	216
217	217	(℞ 6: 番号で文字名を取得する)
218	218
219	219	use charnames ();
220	220	my $name = charnames::viacode(0x03A3);
221	221
222	222	=head2 ℞ 7: Get character number by name
223	223
224	224	(℞ 7: 名前で文字番号を取得する)
225	225
226	226	use charnames ();
227	227	my $number = charnames::vianame("GREEK CAPITAL LETTER SIGMA");
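
The two lookups in ℞ 6 and ℞ 7 are inverses of each other for ordinary character names, which this small sketch demonstrates. It relies only on C<charnames::viacode> and C<charnames::vianame>.

```perl
use strict;
use warnings;
use charnames ();

# number -> name
my $name   = charnames::viacode(0x03A3);

# name -> number (an ordinal codepoint, not a string)
my $number = charnames::vianame("GREEK CAPITAL LETTER SIGMA");

printf "%s is U+%04X\n", $name, $number;
```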
228	228
229	229	=head2 ℞ 8: Unicode named characters
230	230
231	231	(℞ 8: Unicode 名による文字)
232	232
233	233	=begin original
234	234
235	235	Use the C<< \N{I<charname>} >> notation to get the character
236	236	by that name for use in interpolated literals (double-quoted
237	237	strings and regexes). In v5.16, there is an implicit
238	238
239	239	=end original
240	240
241	241	展開リテラル(ダブルクォートで囲まれた文字列と正規表現)で用いる、
242	242	名前で文字を得るために C<<\N{I<charname>}>> 表記を使います。
243	243	v5.16 では、これは暗黙に指定されます:
244	244
245	245	use charnames qw(:full :short);
246	246
247	247	=begin original
248	248
249	249	But prior to v5.16, you must be explicit about which set of charnames you
250	250	want. The C<:full> names are the official Unicode character name, alias, or
251	251	sequence, which all share a namespace.
252	252
253	253	=end original
254	254
255	255	しかし、v5.16 より前のバージョンでは、どの charnames の集合を使用するかを
256	256	明示的に指定しなければなりません。
257	257	C<:full> の名前は、Unicode の正式な文字名、別名、または
258	258	並びであり、すべて名前空間を共有します。
259	259
260	260	use charnames qw(:full :short latin greek);
261	261
262	262	"\N{MATHEMATICAL ITALIC SMALL N}" # :full
263	263	"\N{GREEK CAPITAL LETTER SIGMA}" # :full
264	264
265	265	=begin original
266	266
267	267	Anything else is a Perl-specific convenience abbreviation. Specify one or
268	268	more scripts by names if you want short names that are script-specific.
269	269
270	270	=end original
271	271
272	272	それ以外は、Perl 固有の便利な省略形です。
273	273	用字固有の短い名前が必要な場合は、一つ以上の用字を名前で指定します。
274	274
275	275	"\N{Greek:Sigma}" # :short
276	276	"\N{ae}" # latin
277	277	"\N{epsilon}" # greek
278	278
279	279	=begin original
280	280
281	281	The v5.16 release also supports a C<:loose> import for loose matching of
282	282	character names, which works just like loose matching of property names:
283	283	that is, it disregards case, whitespace, and underscores:
284	284
285	285	=end original
286	286
287	287	v5.16 リリースでは、文字名の緩やかなマッチングのための
288	288	C<:loose> インポートにも対応しています;
289	289	これは特性名の緩やかなマッチングと同じように機能します:
290	290	つまり、大文字小文字、空白、下線は無視されます:
291	291
292	292	"\N{euro sign}" # :loose (from v5.16)
293	293
294	294	=begin original
295	295
296	296	Starting in v5.32, you can also use
297	297
298	298	=end original
299	299
300	300	v5.32 から、次のものを使って:
301	301
302	302	qr/\p{name=euro sign}/
303	303
304	304	=begin original
305	305
306	306	to get official Unicode named characters in regular expressions. Loose
307	307	matching is always done for these.
308	308
309	309	=end original
310	310
311	311	公式な正規表現での Unicode の名前の文字を得られます。
312	312	緩いマッチングは常にこれらで行われます。
313	313
314	314	=head2 ℞ 9: Unicode named sequences
315	315
316	316	(℞ 9: Unicode 名による並び)
317	317
318	318	=begin original
319	319
320	320	These look just like character names but return multiple codepoints.
321	321	Notice the C<%vx> vector-print functionality in C<printf>.
322	322
323	323	=end original
324	324
325	325	これらは文字名のように見えますが、複数の符号位置を返します。
326	326	C<printf> の C<%vx> ベクトル表示機能に注目してください。
327	327
328	328	use charnames qw(:full);
329	329	my $seq = "\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}";
330	330	printf "U+%v04X\n", $seq;
331	331	U+0100.0300
332	332
333	333	=head2 ℞ 10: Custom named characters
334	334
335	335	(℞ 10: カスタム名による文字)
336	336
337	337	=begin original
338	338
339	339	Use C<:alias> to give your own lexically scoped nicknames to existing
340	340	characters, or even to give unnamed private-use characters useful names.
341	341
342	342	=end original
343	343
344	344	C<:alias> を使用して、既存の文字に対してレキシカルスコープの
345	345	独自のニックネームを付けたり、無名の私用文字に有用な名前を
346	346	付けることができます。
347	347
348	348	use charnames ":full", ":alias" => {
349	349	ecute => "LATIN SMALL LETTER E WITH ACUTE",
350	350	"APPLE LOGO" => 0xF8FF, # private use character
351	351	};
352	352
353	353	"\N{ecute}"
354	354	"\N{APPLE LOGO}"
355	355
356	356	=head2 ℞ 11: Names of CJK codepoints
357	357
358	358	(℞ 11: CJK 符号位置の名前)
359	359
360	360	=begin original
361	361
362	362	Sinograms like “東京” come back with character names of
363	363	C<CJK UNIFIED IDEOGRAPH-6771> and C<CJK UNIFIED IDEOGRAPH-4EAC>,
364	364	because their “names” vary. The CPAN C<Unicode::Unihan> module
365	365	has a large database for decoding these (and a whole lot more), provided you
366	366	know how to understand its output.
367	367
368	368	=end original
369	369
370	370	「東京」のような中国漢字は、「名前」が異なるため、
371	371	C<CJK UNIFIED IDEOGRAPH-6771> と
372	372	C<CJK UNIFIED IDEOGRAPH-4EAC> という文字名で戻ってきます。
373	373	CPAN の C<Unicode::Unihan> モジュールは、その出力を理解する方法を
374	374	知っていれば、これら(およびさらに多くの)文字をデコードするための
375	375	大規模なデータベースを持ちます。
376	376
377	377	# cpan -i Unicode::Unihan
378	378	use Unicode::Unihan;
379	379	my $str = "東京";
380	380	my $unhan = Unicode::Unihan->new;
381	381	for my $lang (qw(Mandarin Cantonese Korean JapaneseOn JapaneseKun)) {
382	382	printf "CJK $str in %-12s is ", $lang;
383	383	say $unhan->$lang($str);
384	384	}
385	385
386	386	=begin original
387	387
388	388	prints:
389	389
390	390	=end original
391	391
392	392	これは次のものを表示します:
393	393
394	394	CJK 東京 in Mandarin is DONG1JING1
395	395	CJK 東京 in Cantonese is dung1ging1
396	396	CJK 東京 in Korean is TONGKYENG
397	397	CJK 東京 in JapaneseOn is TOUKYOU KEI KIN
398	398	CJK 東京 in JapaneseKun is HIGASHI AZUMAMIYAKO
399	399
400	400	=begin original
401	401
402	402	If you have a specific romanization scheme in mind,
403	403	use the specific module:
404	404
405	405	=end original
406	406
407	407	特定のローマ字化スキームを考えている場合は、特定のモジュールを使います:
408	408
409	409	# cpan -i Lingua::JA::Romanize::Japanese
410	410	use Lingua::JA::Romanize::Japanese;
411	411	my $k2r = Lingua::JA::Romanize::Japanese->new;
412	412	my $str = "東京";
413	413	say "Japanese for $str is ", $k2r->chars($str);
414	414
415	415	=begin original
416	416
417	417	prints
418	418
419	419	=end original
420	420
421	421	これは次のものを表示します:
422	422
423	423	Japanese for 東京 is toukyou
424	424
425	425	=head2 ℞ 12: Explicit encode/decode
426	426
427	427	(℞ 12: 明示的なエンコード/デコード)
428	428
429	429	=begin original
430	430
431	431	On rare occasion, such as a database read, you may be
432	432	given encoded text you need to decode.
433	433
434	434	=end original
435	435
436	436	まれに、データベースの読み取りなど、デコードする必要がある
437	437	エンコードされたテキストを受け取ることがあります。
438	438
439	439	use Encode qw(encode decode);
440	440
441	441	my $chars = decode("shiftjis", $bytes, 1);
442	442	# OR
443	443	my $bytes = encode("MIME-Header-ISO_2022_JP", $chars, 1);
444	444
445	445	=begin original
446	446
447	447	For streams all in the same encoding, don't use encode/decode; instead
448	448	set the file encoding when you open the file or immediately after with
449	449	C<binmode> as described later below.
450	450
451	451	=end original
452	452
453	453	同じエンコーディングのストリームに対しては、encode/decode を
454	454	使わないでください;
455	455	代わりに、後述するように、ファイルを開くとき、またはその直後に
456	456	C<binmode> でファイルエンコーディングを設定してください。
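
A small sketch of an explicit decode/encode round trip, assuming the incoming bytes are UTF-8. Unlike the recipe above, it passes C<Encode::FB_CROAK | Encode::LEAVE_SRC> instead of a bare C<1>, so that malformed input still dies but the source variable is left untouched; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Croak on malformed data, but do not modify the input variable
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

my $bytes = "\xC3\xA9";                       # UTF-8 bytes for U+00E9
my $chars = decode("UTF-8", $bytes, $check);  # one character
my $again = encode("UTF-8", $chars, $check);  # back to two bytes

printf "%d bytes -> %d character -> %d bytes\n",
    length $bytes, length $chars, length $again;

# Invalid input raises an exception instead of silently passing through
my $bad = "\xFF";
my $ok  = eval { decode("UTF-8", $bad, $check); 1 };
print "malformed input rejected\n" unless $ok;
```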
457	457
458	458	=head2 ℞ 13: Decode program arguments as utf8
459	459
460	460	(℞ 13: プログラム引数を utf8 としてデコードする)
461	461
462	462	$ perl -CA ...
463	463	or
464	464	$ export PERL_UNICODE=A
465	465	or
466	466	use Encode qw(decode);
467	467	@ARGV = map { decode('UTF-8', $_, 1) } @ARGV;
468	468
469	469	=head2 ℞ 14: Decode program arguments as locale encoding
470	470
471	471	(℞ 14: プログラム引数をロケールエンコーディングとしてデコードする)
472	472
473	473	# cpan -i Encode::Locale
474	474	use Encode qw(decode);
475	475	use Encode::Locale;
476	476
477	477	# use "locale" as an arg to encode/decode
478	478	@ARGV = map { decode(locale => $_, 1) } @ARGV;
479	479
480	480	=head2 ℞ 15: Declare STD{IN,OUT,ERR} to be utf8
481	481
482	482	(℞ 15: STD{IN,OUT,ERR} を utf8 として宣言する)
483	483
484	484	=begin original
485	485
486	486	Use a command-line option, an environment variable, or else
487	487	call C<binmode> explicitly:
488	488
489	489	=end original
490	490
491	491	コマンドラインオプションや環境変数を使うか、明示的に
492	492	C<binmode> を呼び出します。
493	493
494	494	$ perl -CS ...
495	495	or
496	496	$ export PERL_UNICODE=S
497	497	or
498	498	use open qw(:std :encoding(UTF-8));
499	499	or
500	500	binmode(STDIN, ":encoding(UTF-8)");
501	501	binmode(STDOUT, ":utf8");
502	502	binmode(STDERR, ":utf8");
503	503
504	504	=head2 ℞ 16: Declare STD{IN,OUT,ERR} to be in locale encoding
505	505
506	506	(℞ 16: STD{IN,OUT,ERR} をロケールエンコーディングとして宣言する)
507	507
508	508	# cpan -i Encode::Locale
509	509	use Encode;
510	510	use Encode::Locale;
511	511
512	512	# or as a stream for binmode or open
513	513	binmode STDIN, ":encoding(console_in)" if -t STDIN;
514	514	binmode STDOUT, ":encoding(console_out)" if -t STDOUT;
515	515	binmode STDERR, ":encoding(console_out)" if -t STDERR;
516	516
517	517	=head2 ℞ 17: Make file I/O default to utf8
518	518
519	519	(℞ 17: ファイル I/O のデフォルトを utf8 にする)
520	520
521	521	=begin original
522	522
523	523	Files opened without an encoding argument will be in UTF-8:
524	524
525	525	=end original
526	526
527	527	encoding 引数なしで開かれたファイルは UTF-8 になります:
528	528
529	529	$ perl -CD ...
530	530	or
531	531	$ export PERL_UNICODE=D
532	532	or
533	533	use open qw(:encoding(UTF-8));
534	534
535	535	=head2 ℞ 18: Make all I/O and args default to utf8
536	536
537	537	(℞ 18: 全ての I/O と引数のデフォルトを utf8 にする)
538	538
539	539	$ perl -CSDA ...
540	540	or
541	541	$ export PERL_UNICODE=SDA
542	542	or
543	543	use open qw(:std :encoding(UTF-8));
544	544	use Encode qw(decode);
545	545	@ARGV = map { decode('UTF-8', $_, 1) } @ARGV;
546	546
547	547	=head2 ℞ 19: Open file with specific encoding
548	548
549	549	(℞ 19: 特定のエンコーディングでファイルを開く)
550	550
551	551	=begin original
552	552
553	553	Specify stream encoding. This is the normal way
554	554	to deal with encoded text, not by calling low-level
555	555	functions.
556	556
557	557	=end original
558	558
559	559	ストリームエンコーディングを指定します。
560	560	これは、低レベル関数を呼び出すのではなく、エンコードされたテキストを
561	561	処理する通常の方法です。
562	562
563	563	# input file
564	564	open(my $in_file, "< :encoding(UTF-16)", "wintext");
565	565	OR
566	566	open(my $in_file, "<", "wintext");
567	567	binmode($in_file, ":encoding(UTF-16)");
568	568	THEN
569	569	my $line = <$in_file>;
570	570
571	571	# output file
572	572	open(my $out_file, "> :encoding(cp1252)", "wintext");
573	573	OR
574	574	open(my $out_file, ">", "wintext");
575	575	binmode($out_file, ":encoding(cp1252)");
576	576	THEN
577	577	print $out_file "some text\n";
578	578
579	579	=begin original
580	580
581	581	More layers than just the encoding can be specified here. For example,
582	582	the incantation C<":raw :encoding(UTF-16LE) :crlf"> includes implicit
583	583	CRLF handling.
584	584
585	585	=end original
586	586
587	587	ここで指定できるのは、エンコーディングだけではありません。
588	588	例えば、呪文 C<":raw :encoding(UTF-16LE) :crlf"> には
589	589	暗黙的な CRLF 処理が含まれています。
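
To see an encoding layer at work without touching the filesystem, this sketch writes to and reads from an in-memory filehandle. The buffer and handle names are assumptions for illustration; the layer syntax is the same as for a real file.

```perl
use strict;
use warnings;

my $buffer = "";

# Writing: characters go in, UTF-8 bytes land in $buffer
open(my $out, ">:encoding(UTF-8)", \$buffer) or die "open for write: $!";
print {$out} "caf\x{E9}\n";          # "café" plus newline, 5 characters
close($out) or die "close: $!";

my $byte_count = length $buffer;      # U+00E9 takes two bytes in UTF-8

# Reading: bytes come in, characters come out
open(my $in, "<:encoding(UTF-8)", \$buffer) or die "open for read: $!";
my $line = <$in>;
close($in);
chomp $line;

printf "%d bytes on disk, %d characters in memory\n",
    $byte_count, length $line;
```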
590	590
591	591	=head2 ℞ 20: Unicode casing
592	592
593	593	(℞ 20: Unicode の大文字小文字)
594	594
595	595	=begin original
596	596
597	597	Unicode casing is very different from ASCII casing.
598	598
599	599	=end original
600	600
601	601	Unicode の大文字小文字は ASCII の大文字小文字とは大きく異なります。
602	602
603	603	uc("henry ⅷ") # "HENRY Ⅷ"
604	604	uc("tschüß") # "TSCHÜSS" notice ß => SS
605	605
606	606	# both are true:
607	607	"tschüß" =~ /TSCHÜSS/i # notice ß => SS
608	608	"Σίσυφος" =~ /ΣΊΣΥΦΟΣ/i # notice Σ,σ,ς sameness
609	609
610	610	=head2 ℞ 21: Unicode case-insensitive comparisons
611	611
612	612	(℞ 21: Unicode の大文字小文字を無視した比較)
613	613
614	614	=begin original
615	615
616	616	Also available in the CPAN L<Unicode::CaseFold> module,
617	617	the new C<fc> “foldcase” function from v5.16 grants
618	618	access to the same Unicode casefolding as the C</i>
619	619	pattern modifier has always used:
620	620
621	621	=end original
622	622
623	623	CPAN の L<Unicode::CaseFold> モジュールでも利用可能な、v5.16 の新しい
624	624	C<fc> "foldcase" 関数は、C</i> パターン修飾子が常に使ってきたのと同じ
625	625	Unicode 大文字小文字畳み込みへのアクセスを与えます。
626	626
627	627	use feature "fc"; # fc() function is from v5.16
628	628
629	629	# sort case-insensitively
630	630	my @sorted = sort { fc($a) cmp fc($b) } @list;
631	631
632	632	# both are true:
633	633	fc("tschüß") eq fc("TSCHÜSS")
634	634	fc("Σίσυφος") eq fc("ΣΊΣΥΦΟΣ")
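
The following sketch contrasts C<fc> with C<lc> on the same pair of strings. It uses C<\x{...}> escapes so that it does not depend on the source file's encoding, and C<use v5.16> to enable both the C<fc> and C<unicode_strings> features.

```perl
use v5.16;
use warnings;

my $lower = "tsch\x{FC}\x{DF}";   # "tschüß"
my $upper = "TSCH\x{DC}SS";       # "TSCHÜSS"

# Casefolding maps ß to "ss", so the two strings fold to the same thing
my $fc_equal = fc($lower) eq fc($upper) ? 1 : 0;

# Lowercasing leaves ß alone, so this comparison fails
my $lc_equal = lc($lower) eq lc($upper) ? 1 : 0;

my @sorted = sort { fc($a) cmp fc($b) } qw(banana Apple cherry);

say "fc equal: $fc_equal, lc equal: $lc_equal";
say "@sorted";
```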
635	635
636	636	=head2 ℞ 22: Match Unicode linebreak sequence in regex
637	637
638	638	(℞ 22: 正規表現中の Unicode 改行並びのマッチング)
639	639
640	640	=begin original
641	641
642	642	A Unicode linebreak matches the two-character CRLF
643	643	grapheme or any of seven vertical whitespace characters.
644	644	Good for dealing with textfiles coming from different
645	645	operating systems.
646	646
647	647	=end original
648	648
649	649	Unicode の改行は、2 文字の CRLF 書記素または七つの垂直空白文字の
650	650	いずれかにマッチングします。
651	651	異なるオペレーティングシステムから送られてくるテキストファイルを
652	652	扱うのに適しています。
653	653
654	654	\R
655	655
656	656	s/\R/\n/g; # normalize all linebreaks to \n
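
A short sketch of the normalization step applied to text with mixed line endings. The sample string is made up for illustration and mixes CRLF, bare CR, LF, and U+2028 LINE SEPARATOR.

```perl
use strict;
use warnings;

my $text = "one\r\ntwo\rthree\nfour\x{2028}five";

# \R treats CRLF as a single linebreak, so no empty lines are introduced
(my $clean = $text) =~ s/\R/\n/g;

my @lines = split /\n/, $clean;
printf "%d lines: %s\n", scalar @lines, join(", ", @lines);
```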
657	657
658	658	=head2 ℞ 23: Get character category
659	659
660	660	(℞ 23: 文字カテゴリを得る)
661	661
662	662	=begin original
663	663
664	664	Find the general category of a numeric codepoint.
665	665
666	666	=end original
667	667
668	668	数値符号位置の一般カテゴリを見つけます。
669	669
670	670	use Unicode::UCD qw(charinfo);
671	671	my $cat = charinfo(0x3A3)->{category}; # "Lu"
672	672
673	673	=head2 ℞ 24: Disabling Unicode-awareness in builtin charclasses
674	674
675	675	(℞ 24: 組み込み文字クラスで Unicode 判定を無効にする)
676	676
677	677	=begin original
678	678
679	679	Disable C<\w>, C<\b>, C<\s>, C<\d>, and the POSIX
680	680	classes from working correctly on Unicode either in this
681	681	scope, or in just one regex.
682	682
683	683	=end original
684	684
685	685	このスコープまたは一つの正規表現で、C<\w>、C<\b>、C<\s>、C<\d>、
686	686	および POSIX クラスが Unicode で正しく動作しないようにします。
687	687
688	688	use v5.14;
689	689	use re "/a";
690	690
691	691	# OR
692	692
693	693	my($num) = $str =~ /(\d+)/a;
694	694
695	695	=begin original
696	696
697	697	Or use specific un-Unicode properties, like C<\p{ahex}>
698	698	and C<\p{POSIX_Digit}>. Properties still work normally
699	699	no matter what charset modifiers (C</d /u /l /a /aa>)
700	700	should be in effect.
701	701
702	702	=end original
703	703
704	704	または、C<\p{ahex}> や C<\p{POSIX_Digit}> などの特定の非 Unicode 特性を
705	705	使います。
706	706	どの文字集合修飾子 (C</d /u /l /a /aa>) が有効であっても、
707	707	特性は正常に動作します。
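
As a sketch of the difference the C</a> modifier makes, the same pattern is applied below to a string of Devanagari digits and to a string of ASCII digits. The test strings are illustrative.

```perl
use v5.14;
use warnings;

my $deva  = "\x{966}\x{967}\x{968}";   # DEVANAGARI DIGIT ZERO, ONE, TWO
my $ascii = "012";

# Under Unicode rules, \d matches any decimal digit in any script
my $unicode_match = $deva =~ /^\d+$/  ? 1 : 0;

# With /a, \d is restricted to [0-9]
my $ascii_only    = $deva =~ /^\d+$/a ? 1 : 0;
my $ascii_still   = $ascii =~ /^\d+$/a ? 1 : 0;

say "Devanagari without /a: $unicode_match";
say "Devanagari with /a:    $ascii_only";
say "ASCII with /a:         $ascii_still";
```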
708	708
709	709	=head2 ℞ 25: Match Unicode properties in regex with \p, \P
710	710
711	711	(℞ 25: 正規表現中に \p, \P を使って Unicode 特性にマッチングする)
712	712
713	713	=begin original
714	714
715	715	These all match a single codepoint with the given
716	716	property. Use C<\P> in place of C<\p> to match
717	717	one codepoint lacking that property.
718	718
719	719	=end original
720	720
721	721	これらはすべて、指定された特性を持つ一つの符号位置にマッチングします。
722	722	C<\p> の代わりに C<\P> を使用すると、その特性を持たない一つの符号位置に
723	723	マッチングします。
724	724
725	725	\pL, \pN, \pS, \pP, \pM, \pZ, \pC
726	726	\p{Sk}, \p{Ps}, \p{Lt}
727	727	\p{alpha}, \p{upper}, \p{lower}
728	728	\p{Latin}, \p{Greek}
729	729	\p{script_extensions=Latin}, \p{scx=Greek}
730	730	\p{East_Asian_Width=Wide}, \p{EA=W}
731	731	\p{Line_Break=Hyphen}, \p{LB=HY}
732	732	\p{Numeric_Value=4}, \p{NV=4}
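
A brief sketch of C<\p> and C<\P> applied to real data. The sample strings are assumptions for illustration; the properties used all appear in the list above.

```perl
use strict;
use warnings;

my $sigma = "\x{3A3}";                  # GREEK CAPITAL LETTER SIGMA

my $is_greek = $sigma =~ /\p{Greek}/ ? 1 : 0;
my $is_upper = $sigma =~ /\p{Lu}/    ? 1 : 0;
my $is_latin = $sigma =~ /\p{Latin}/ ? 1 : 0;

# "a", GREEK SMALL LETTER ALPHA, "1", "!"
my $mixed       = "a\x{3B1}1!";
my @letters     = $mixed =~ /\pL/g;     # codepoints that are letters
my @non_letters = $mixed =~ /\PL/g;     # codepoints that are not

printf "Greek=%d Lu=%d Latin=%d letters=%d non-letters=%d\n",
    $is_greek, $is_upper, $is_latin,
    scalar @letters, scalar @non_letters;
```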
733	733
734	734	=head2 ℞ 26: Custom character properties
735	735
736	736	(℞ 26: カスタム文字特性)
737	737
738	738	=begin original
739	739
740	740	Define at compile-time your own custom character
741	741	properties for use in regexes.
742	742
743	743	=end original
744	744
745	745	正規表現で使用する独自のカスタム文字特性をコンパイル時に定義します。
746	746
747	747	# using private-use characters
748	748	sub In_Tengwar { "E000\tE07F\n" }
749	749
750	750	if (/\p{In_Tengwar}/) { ... }
751	751
752	752	# blending existing properties
753	753	sub Is_GraecoRoman_Title {<<'END_OF_SET'}
754	754	+utf8::IsLatin
755	755	+utf8::IsGreek
756	756	&utf8::IsTitle
757	757	END_OF_SET
758	758
759	759	if (/\p{Is_GraecoRoman_Title}/) { ... }
760	760
761	761	=head2 ℞ 27: Unicode normalization
762	762
763	763	(℞ 27: Unicode 正規化)
764	764
765	765	=begin original
766	766
767	767	Typically render into NFD on input and NFC on output. Using NFKC or NFKD
768	768	functions improves recall on searches, assuming you've already done the
769	769	same to the text to be searched. Note that this is about much more than just
770	770	precombined compatibility glyphs; it also reorders marks according to their
771	771	canonical combining classes and weeds out singletons.
772	772
773	773	=end original
774	774
775	775	通常は、入力では NFD に、出力では NFC にレンダリングされます。
776	776	NFKC または NFKD 関数を使うことで、検索対象の同じテキストに対して
777	777	既に実行していることを前提として、検索時の再現率 (recall) が改善されます。
778	778	これは単に事前結合された互換グリフ以上のものであることに
779	779	注意してください;
780	780	正準結合クラスに従ってマークを並び替え、シングルトンを削除します。
781	781
782	782	use Unicode::Normalize;
783	783	my $nfd = NFD($orig);
784	784	my $nfc = NFC($orig);
785	785	my $nfkd = NFKD($orig);
786	786	my $nfkc = NFKC($orig);
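
To illustrate how the compatibility forms help search recall, this sketch looks for the plain letters "fi" in a word spelled with the U+FB01 ligature. The sample word is an assumption for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFKC);

my $word = "\x{FB01}le";     # "file" spelled with LATIN SMALL LIGATURE FI

# Canonical normalization keeps the ligature, so a plain search misses it
my $found_nfc  = index(NFC($word),  "fi") >= 0 ? 1 : 0;

# Compatibility normalization expands the ligature to "f" + "i"
my $found_nfkc = index(NFKC($word), "fi") >= 0 ? 1 : 0;

print "found after NFC:  $found_nfc\n";
print "found after NFKC: $found_nfkc\n";
```

Because NFKC and NFKD discard formatting distinctions, they are better suited to building a search key than to rewriting the stored text.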
787	787
788	788	=head2 ℞ 28: Convert non-ASCII Unicode numerics
789	789
790	790	(℞ 28: 非 ASCII Unicode 数字を変換する)
791	791
792	792	=begin original
793	793
794	794	Unless you’ve used C</a> or C</aa>, C<\d> matches more than
795	795	ASCII digits only, but Perl’s implicit string-to-number
796	796	conversion does not currently recognize these. Here’s how to
797	797	convert such strings manually.
798	798
799	799	=end original
800	800
801	801	C</a> や C</aa> を使用していない限り、C<\d> は ASCII 数字以上のものに
802	802	マッチングしますが、
803	803	Perl の暗黙的な文字列から数値への変換では、現在のところこれらを
804	804	認識できません。
805	805	このような文字列を手動で変換する方法を以下に示します。
806	806
807	807	use v5.14; # needed for num() function
808	808	use Unicode::UCD qw(num);
809	809	my $str = "got Ⅻ and ४५६७ and ⅞ and here";
810	810	my @nums = ();
811	811	while ($str =~ /(\d+|\N)/g) { # not just ASCII!
812	812	push @nums, num($1);
813	813	}
814	814	say "@nums"; # 12 4567 0.875
815	815
816	816	use charnames qw(:full);
817	817	my $nv = num("\N{RUMI DIGIT ONE}\N{RUMI DIGIT TWO}");
818	818
819	819	=head2 ℞ 29: Match Unicode grapheme cluster in regex
820	820
821	821	(℞ 29: 正規表現中の Unicode 書記素クラスタにマッチングする)
822	822
823	823	=begin original
824	824
825	825	Programmer-visible “characters” are codepoints matched by C</./s>,
826	826	but user-visible “characters” are graphemes matched by C</\X/>.
827	827
828	828	=end original
829	829
830	830	プログラマから見える「文字」は、C</./s> がマッチする符号位置ですが、
831	831	ユーザから見える「文字」は、C</\X/> がマッチする書記素です。
832	832
833	833	# Find vowel plus any combining diacritics, underlining, etc.
834	834	my $nfd = NFD($orig);
835	835	$nfd =~ / (?=[aeiou]) \X /xi
836	836
837	837	=head2 ℞ 30: Extract by grapheme instead of by codepoint (regex)
838	838
839	839	(℞ 30: 符号位置によってではなく、書記素によって展開する (正規表現))
840	840
841	841	# match and grab the first five graphemes
842	842	my($first_five) = $str =~ /^ ( \X{5} ) /x;
843	843
844	844	=head2 ℞ 31: Extract by grapheme instead of by codepoint (substr)
845	845
846	846	(℞ 31: 符号位置によってではなく、書記素によって展開する (substr))
847	847
848	848	# cpan -i Unicode::GCString
849	849	use Unicode::GCString;
850	850	my $gcs = Unicode::GCString->new($str);
851	851	my $first_five = $gcs->substr(0, 5);
852	852
853	853	=head2 ℞ 32: Reverse string by grapheme
854	854
855	855	(℞ 32: 文字列を書記素単位で反転する)
856	856
857	857	=begin original
858	858
859	859	Reversing by codepoint messes up diacritics, mistakenly converting
860	860	C<crème brûlée> into C<éel̂urb em̀erc> instead of into C<eélûrb emèrc>;
861	861	so reverse by grapheme instead. Both these approaches work
862	862	right no matter what normalization the string is in:
863	863
864	864	=end original
865	865
866	866	符号位置による反転はダイアクリティカルマークを混乱させ、誤って
867	867	C<crème brûlée> を C<eélûrb emèrc> ではなく
868	868	C<éel̂urb em̀erc> に変換します;
869	869	そこで、代わりに書記素による反転を行います。
870	870	これらの手法はどちらも、文字列の正規化がどのようなものであっても
871	871	正しく機能します。
872	872
873	873	$str = join("", reverse $str =~ /\X/g);
874	874
875	875	# OR: cpan -i Unicode::GCString
876	876	use Unicode::GCString;
877	877	$str = reverse Unicode::GCString->new($str);
878	878
879	879	=head2 ℞ 33: String length in graphemes
880	880
881	881	(℞ 33: 書記素での文字列長)
882	882
883	883	=begin original
884	884
885	885	The string C<brûlée> has six graphemes but up to eight codepoints.
886	886	This counts by grapheme, not by codepoint:
887	887
888	888	=end original
889	889
890	890	文字列 C<brûlée> は六つの書記素を持ちますが、最大八つの符号位置を持ちます。
891	891	これは、符号位置ではなく、書記素によってカウントされます:
892	892
893	893	my $str = "brûlée";
894	894	my $count = 0;
895	895	while ($str =~ /\X/g) { $count++ }
896	896
897	897	# OR: cpan -i Unicode::GCString
898	898	use Unicode::GCString;
899	899	my $gcs = Unicode::GCString->new($str);
900	900	my $count = $gcs->length;
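
The grapheme recipes (℞ 30, ℞ 32, ℞ 33) can be tried with core Perl alone, as in this sketch. It decomposes the string first so that the gap between codepoints and graphemes is visible; the variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# "brûlée", decomposed so each accent is a separate combining codepoint
my $str = NFD("br\x{FB}l\x{E9}e");

my $codepoints = length $str;              # counts combining marks too
my $graphemes  = () = $str =~ /\X/g;       # counts user-visible characters

# Reverse by grapheme so each accent stays attached to its base letter
my $reversed = join "", reverse $str =~ /\X/g;

printf "%d codepoints, %d graphemes\n", $codepoints, $graphemes;
```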
901	901
902	902	=head2 ℞ 34: Unicode column-width for printing
903	903
904	904	(℞ 34: 表示のための Unicode 桁幅)
905	905
906	906	=begin original
907	907
908	908	Perl’s C<printf>, C<sprintf>, and C<format> think all
909	909	codepoints take up 1 print column, but many take 0 or 2.
910	910	Here to show that normalization makes no difference,
911	911	we print out both forms:
912	912
913	913	=end original
914	914
915	915	Perl の C<printf>、C<sprintf>、C<format> は、すべての符号位置が
916	916	一つの表示桁を占有すると考えていますが、多くの符号位置は 0 から 2 を
917	917	占有します。
918	918	ここでは、正規化に違いがないことを示すために、両方の形式を出力します。
919	919
920	920	use Unicode::GCString;
921	921	use Unicode::Normalize;
922	922
923	923	my @words = qw/crème brûlée/;
924	924	@words = map { NFC($_), NFD($_) } @words;
925	925
926	926	for my $str (@words) {
927	927	my $gcs = Unicode::GCString->new($str);
928	928	my $cols = $gcs->columns;
929	929	my $pad = " " x (10 - $cols);
930	930	say $str, $pad, " |";
931	931	}
932	932
933	933	=begin original
934	934
935	935	generates this to show that it pads correctly no matter
936	936	the normalization:
937	937
938	938	=end original
939	939
940	940	これは、正規化に関係なく正しくパッディングされていることを示すために
941	941	次のように生成されます。
942	942
943	943	crème      |
944	944	crème      |
945	945	brûlée     |
946	946	brûlée     |
947	947
948	948	=head2 ℞ 35: Unicode collation
949	949
950	950	(℞ 35: Unicode の照合順序)
951	951
952	952	=begin original
953	953
954	954	Text sorted by numeric codepoint follows no reasonable alphabetic order;
955	955	use the UCA for sorting text.
956	956
957	957	=end original
958	958
959	959	数値符号位置でソートされたテキストは、合理的なアルファベット順ではありません;
960	960	テキストのソートには UCA を使用してください。
961	961
962	962	use Unicode::Collate;
963	963	my $col = Unicode::Collate->new();
964	964	my @list = $col->sort(@old_list);
965	965
966	966	=begin original
967	967
968	968	See the I<ucsort> program from the L<Unicode::Tussle> CPAN module
969	969	for a convenient command-line interface to this module.
970	970
971	971	=end original
972	972
973	973	このモジュールへの便利なコマンドラインインタフェースについては、
974	974	L<Unicode::Tussle> CPAN モジュールの I<ucsort> プログラムを参照してください。
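
A small sketch contrasting Perl's built-in codepoint sort with a UCA sort from the core Unicode::Collate module. The word list is an assumption for illustration.

```perl
use strict;
use warnings;
use Unicode::Collate;

# "\x{C9}mile" is "Émile"; escapes keep the source pure ASCII
my @words = ("zebra", "\x{C9}mile", "apple", "eagle");

# Codepoint order: U+00C9 sorts after every ASCII letter
my @by_codepoint = sort @words;

# UCA order: accented E is placed among the other words starting with "e"
my $collator = Unicode::Collate->new;
my @by_uca   = $collator->sort(@words);

printf "last by codepoint is accented: %s\n",
    $by_codepoint[-1] eq "\x{C9}mile" ? "yes" : "no";
printf "last by UCA: %s\n", $by_uca[-1];
```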
975	975
976	976	=head2 ℞ 36: Case- I<and> accent-insensitive Unicode sort
977	977
978	978	(℞ 36: 大文字小文字 I<および> アクセントを無視した Unicode のソート)
979	979
980	980	=begin original
981	981
982	982	Specify a collation strength of level 1 to ignore case and
983	983	diacritics, only looking at the basic character.
984	984
985	985	=end original
986	986
987	987	照合強度レベル 1 を指定して、大文字小文字とダイアクリティカルマークを
988	988	無視し、基本文字だけを参照するようにします。
989	989
990	990	use Unicode::Collate;
991	991	my $col = Unicode::Collate->new(level => 1);
992	992	my @list = $col->sort(@old_list);
993	993
994	994	=head2 ℞ 37: Unicode locale collation
995	995
996	996	(℞ 37: Unicode ロケールの照合順序)
997	997
998	998	=begin original
999	999
1000	1000	Some locales have special sorting rules.
1001	1001
1002	1002	=end original
1003	1003
1004	1004	一部のロケールには、特別なソート規則があります。
1005	1005
1006	1006	# either use v5.12, OR: cpan -i Unicode::Collate::Locale
1007	1007	use Unicode::Collate::Locale;
1008	1008	my $col = Unicode::Collate::Locale->new(locale => "de__phonebook");
1009	1009	my @list = $col->sort(@old_list);
1010	1010
1011	1011	=begin original
1012	1012
1013	1013	The I<ucsort> program mentioned above accepts a C<--locale> parameter.
1014	1014
1015	1015	=end original
1016	1016
1017	1017	上記の I<ucsort> プログラムは、C<--locale> パラメータを受け付けます。
1018	1018
1019	1019	=head2 ℞ 38: Making C<cmp> work on text instead of codepoints
1020	1020
1021	1021	(℞ 38: 符号位置ではなくテキストで C<cmp> が動作するようにする)
1022	1022
1023	1023	=begin original
1024	1024
1025	1025	Instead of this:
1026	1026
1027	1027	=end original
1028	1028
1029	1029	次のようにせずに:
1030	1030
1031	1031	@srecs = sort {
1032	1032	$b->{AGE} <=> $a->{AGE}
1033	1033	||
1034	1034	$a->{NAME} cmp $b->{NAME}
1035	1035	} @recs;
1036	1036
1037	1037	=begin original
1038	1038
1039	1039	Use this:
1040	1040
1041	1041	=end original
1042	1042
1043	1043	次を使います:
1044	1044
1045	1045	my $coll = Unicode::Collate->new();
1046	1046	for my $rec (@recs) {
1047	1047	$rec->{NAME_key} = $coll->getSortKey( $rec->{NAME} );
1048	1048	}
1049	1049	@srecs = sort {
1050	1050	$b->{AGE} <=> $a->{AGE}
1051	1051	||
1052	1052	$a->{NAME_key} cmp $b->{NAME_key}
1053	1053	} @recs;
1054	1054
1055	1055	=head2 ℞ 39: Case- I<and> accent-insensitive comparisons
1056	1056
1057	1057	(℞ 39: 大文字小文字 I<および> アクセントを無視した比較)
1058	1058
1059	1059	=begin original
1060	1060
1061	1061	Use a collator object to compare Unicode text by character
1062	1062	instead of by codepoint.
1063	1063
1064	1064	=end original
1065	1065
1066	1066	照合オブジェクトを使用して、Unicode テキストを符号位置ではなく
1067	1067	文字で比較します。
1068	1068
1069	1069	use Unicode::Collate;
1070	1070	my $es = Unicode::Collate->new(
1071	1071	level => 1,
1072	1072	normalization => undef
1073	1073	);
1074	1074
1075	1075	# now both are true:
1076	1076	$es->eq("García", "GARCIA" );
1077	1077	$es->eq("Márquez", "MARQUEZ");
1078	1078
1079	1079	=head2 ℞ 40: Case- I<and> accent-insensitive locale comparisons
1080	1080
1081	1081	(℞ 40: 大文字小文字 I<および> アクセントを無視したロケールでの比較)
1082	1082
1083	1083	=begin original
1084	1084
1085	1085	Same, but in a specific locale.
1086	1086
1087	1087	=end original
1088	1088
1089	1089	同じですが、特定のロケールです。
1090	1090
1091	1091	my $de = Unicode::Collate::Locale->new(
1092	1092	locale => "de__phonebook",
1093	1093	);
1094	1094
1095	1095	# now this is true:
1096	1096	$de->eq("tschüß", "TSCHUESS"); # notice ü => UE, ß => SS
1097	1097
1098	1098	=head2 ℞ 41: Unicode linebreaking
1099	1099
1100	1100	(℞ 41: Unicode の改行)
1101	1101
1102	1102	=begin original
1103	1103
1104	1104	Break up text into lines according to Unicode rules.
1105	1105
1106	1106	=end original
1107	1107
1108	1108	Unicode 規則に従ってテキストを行に分割します。
1109	1109
1110	1110	# cpan -i Unicode::LineBreak
1111	1111	use Unicode::LineBreak;
1112	1112	use charnames qw(:full);
1113	1113
1114	1114	my $para = "This is a super\N{HYPHEN}long string. " x 20;
1115	1115	my $fmt = Unicode::LineBreak->new;
1116	1116	print $fmt->break($para), "\n";
1117	1117
1118	1118	=head2 ℞ 42: Unicode text in DBM hashes, the tedious way
1119	1119
1120	1120	(℞ 42: DBM ハッシュの中の Unicode テキスト、退屈な方法)
1121	1121
1122	1122	=begin original
1123	1123
1124	1124	Using a regular Perl string as a key or value for a DBM
1125	1125	hash will trigger a wide character exception if any codepoints
1126	1126	won’t fit into a byte. Here’s how to manually manage the translation:
1127	1127
1128	1128	=end original
1129	1129
1130	1130	DBM ハッシュのキーまたは値として通常の Perl 文字列を使用すると、
1131	1131	符号位置が 1 バイトに収まらない場合にワイド文字例外が発生します。
1132	1132	次に、手動で変換を管理する方法を示します:
1133	1133
1134	1134	use DB_File;
1135	1135	use Encode qw(encode decode);
1136	1136	tie %dbhash, "DB_File", "pathname";
1137	1137
1138	1138	# STORE
1139	1139
1140	1140	# assume $uni_key and $uni_value are abstract Unicode strings
1141	1141	my $enc_key = encode("UTF-8", $uni_key, 1);
1142	1142	my $enc_value = encode("UTF-8", $uni_value, 1);
1143	1143	$dbhash{$enc_key} = $enc_value;
1144	1144
1145	1145	# FETCH
1146	1146
1147	1147	# assume $uni_key holds a normal Perl string (abstract Unicode)
1148	1148	my $enc_key = encode("UTF-8", $uni_key, 1);
1149	1149	my $enc_value = $dbhash{$enc_key};
1150	1150	my $uni_value = decode("UTF-8", $enc_value, 1);
1151	1151
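The manual translation can be tried without a DBM file at all. The sketch below round-trips one key through C<encode> and C<decode>. The sample string is an illustrative assumption. It also ORs in C<Encode::LEAVE_SRC>, because the Encode documentation warns that the input may otherwise be modified in place when a CHECK value is given.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

# FB_CROAK dies on malformed data; LEAVE_SRC asks Encode not to
# modify the string that is passed in.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

my $uni_key    = "寿司";                                # two characters
my $enc_key    = encode("UTF-8", $uni_key, $check);     # UTF-8 bytes
my $round_trip = decode("UTF-8", $enc_key, $check);     # characters again

printf "%d characters, %d bytes, round trip %s\n",
    length($uni_key), length($enc_key),
    ($round_trip eq $uni_key ? "ok" : "FAILED");
```

The encoded form is what would be used as the DBM key or value; the decoded form is what the rest of the program should see.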
1152	1152	=head2 ℞ 43: Unicode text in DBM hashes, the easy way
1153	1153
1154	1154	(℞ 43: DBM ハッシュの中の Unicode テキスト、簡単な方法)
1155	1155
1156	1156	=begin original
1157	1157
1158	1158	Here’s how to implicitly manage the translation; all encoding
1159	1159	and decoding is done automatically, just as with streams that
1160	1160	have a particular encoding attached to them:
1161	1161
1162	1162	=end original
1163	1163
1164	1164	次に、変換を暗黙的に管理する方法を示します;
1165	1165	すべてのエンコードとデコードは、特定のエンコーディングが付加された
1166	1166	ストリームと同じように自動的に行われます:
1167	1167
    use DB_File;
    use DBM_Filter;

    my $dbobj = tie %dbhash, "DB_File", "pathname";
    $dbobj->Filter_Push("utf8"); # this is the magic bit; filters keys and values

    # STORE

    # assume $uni_key and $uni_value are abstract Unicode strings
    $dbhash{$uni_key} = $uni_value;

    # FETCH

    # $uni_key holds a normal Perl string (abstract Unicode)
    my $uni_value = $dbhash{$uni_key};
1183	1183
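For a version that runs without Berkeley DB installed, the following sketch swaps in C<SDBM_File>, which ships with Perl, in place of C<DB_File>, and uses the C<Filter_Push> method documented by L<DBM_Filter>. The temporary directory, the file name, and the sample key and value are assumptions made for the demonstration.

```perl
use strict;
use warnings;
use utf8;
use Fcntl;                        # O_RDWR, O_CREAT
use SDBM_File;                    # stands in for DB_File in this sketch
use DBM_Filter;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);  # scratch directory for the demo database

my %dbhash;
my $dbobj = tie %dbhash, "SDBM_File", "$dir/menu", O_RDWR | O_CREAT, 0666
    or die "cannot tie: $!";
$dbobj->Filter_Push("utf8");      # encode on store, decode on fetch

my ($uni_key, $uni_value) = ("寿司", "すし");
$dbhash{$uni_key} = $uni_value;   # STORE
my $fetched = $dbhash{$uni_key};  # FETCH

print "round trip ", ($fetched eq $uni_value ? "ok" : "FAILED"), "\n";

undef $dbobj;                     # drop the extra reference before untying
untie %dbhash;
```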
1184	1184	=head2 ℞ 44: PROGRAM: Demo of Unicode collation and printing
1185	1185
1186	1186	(℞ 44: プログラム: Unicode の照合と表示のデモ)
1187	1187
1188	1188	=begin original
1189	1189
1190	1190	Here’s a full program showing how to make use of locale-sensitive
1191	1191	sorting, Unicode casing, and managing print widths when some of the
1192	1192	characters take up zero or two columns, not just one column each time.
1193	1193	When run, the following program produces this nicely aligned output:
1194	1194
1195	1195	=end original
1196	1196
1197	1197	以下の完全なプログラムでは、ロケールを認識するソート、
1198	1198	Unicode の大文字小文字、そしていくつかの文字が 1 桁ではなく 0 または 2 桁を
1199	1199	占める場合の印刷幅の管理をどのように利用するかを示しています。
1200	1200	次のプログラムを実行すると、次のようなうまく整列した出力が生成されます:
1201	1201
1202	1202	Crème Brûlée....... €2.00
1203	1203	Éclair............. €1.60
1204	1204	Fideuà............. €4.20
1205	1205	Hamburger.......... €6.00
1206	1206	Jamón Serrano...... €4.45
1207	1207	Linguiça........... €7.00
1208	1208	Pâté............... €4.15
1209	1209	Pears.............. €2.00
1210	1210	Pêches............. €2.25
1211	1211	Smørbrød........... €5.75
1212	1212	Spätzle............ €5.50
1213	1213	Xoriço............. €3.00
1214	1214	Γύρος.............. €6.50
1215	1215	막걸리............. €4.00
1216	1216	おもち............. €2.65
1217	1217	お好み焼き......... €8.00
1218	1218	シュークリーム..... €1.85
1219	1219	寿司............... €9.99
1220	1220	包子............... €7.50
1221	1221
1222	1222	=begin original
1223	1223
1224	1224	Here's that program.
1225	1225
1226	1226	=end original
1227	1227
1228	1228	これがプログラムです。
1229	1229
1230	1230	#!/usr/bin/env perl
1231	1231	# umenu - demo sorting and printing of Unicode food
1232	1232	#
1233	1233	# (obligatory and increasingly long preamble)
1234	1234	#
1235	1235	use v5.36;
1236	1236	use utf8;
1237	1237	use warnings qw(FATAL utf8); # fatalize encoding faults
1238	1238	use open qw(:std :encoding(UTF-8)); # undeclared streams in UTF-8
1239	1239	use charnames qw(:full :short); # unneeded in v5.16
1240	1240
1241	1241	# std modules
1242	1242	use Unicode::Normalize; # std perl distro as of v5.8
1243	1243	use List::Util qw(max); # std perl distro as of v5.10
1244	1244	use Unicode::Collate::Locale; # std perl distro as of v5.14
1245	1245
1246	1246	# cpan modules
1247	1247	use Unicode::GCString; # from CPAN
1248	1248
1249	1249	my %price = (
1250	1250	"γύρος" => 6.50, # gyros
1251	1251	"pears" => 2.00, # like um, pears
1252	1252	"linguiça" => 7.00, # spicy sausage, Portuguese
1253	1253	"xoriço" => 3.00, # chorizo sausage, Catalan
1254	1254	"hamburger" => 6.00, # burgermeister meisterburger
1255	1255	"éclair" => 1.60, # dessert, French
1256	1256	"smørbrød" => 5.75, # sandwiches, Norwegian
1257	1257	"spätzle" => 5.50, # Bayerisch noodles, little sparrows
1258	1258	"包子" => 7.50, # bao1 zi5, steamed pork buns, Mandarin
1259	1259	"jamón serrano" => 4.45, # country ham, Spanish
1260	1260	"pêches" => 2.25, # peaches, French
1261	1261	"シュークリーム" => 1.85, # cream-filled pastry like eclair
1262	1262	"막걸리" => 4.00, # makgeolli, Korean rice wine
1263	1263	"寿司" => 9.99, # sushi, Japanese
1264	1264	"おもち" => 2.65, # omochi, rice cakes, Japanese
1265	1265	"crème brûlée" => 2.00, # crema catalana
1266	1266	"fideuà" => 4.20, # more noodles, Valencian
1267	1267	# (Catalan=fideuada)
1268	1268	"pâté" => 4.15, # gooseliver paste, French
1269	1269	"お好み焼き" => 8.00, # okonomiyaki, Japanese
1270	1270	);
1271	1271
1272	1272	my $width = 5 + max map { colwidth($_) } keys %price;
1273	1273
1274	1274	# So the Asian stuff comes out in an order that someone
1275	1275	# who reads those scripts won't freak out over; the
1276	1276	# CJK stuff will be in JIS X 0208 order that way.
1277	1277	my $coll = Unicode::Collate::Locale->new(locale => "ja");
1278	1278
1279	1279	for my $item ($coll->sort(keys %price)) {
1280	1280	print pad(entitle($item), $width, ".");
1281	1281	printf " €%.2f\n", $price{$item};
1282	1282	}
1283	1283
1284	1284	sub pad ($str, $width, $padchar) {
1285	1285	return $str . ($padchar x ($width - colwidth($str)));
1286	1286	}
1287	1287
1288	1288	sub colwidth ($str) {
1289	1289	return Unicode::GCString->new($str)->columns;
1290	1290	}
1291	1291
1292	1292	sub entitle ($str) {
1293	1293	$str =~ s{ (?=\pL)(\S) (\S*) }
1294	1294	{ ucfirst($1) . lc($2) }xge;
1295	1295	return $str;
1296	1296	}
1297	1297
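The C<entitle> helper can be tried on its own, since it needs no CPAN modules. This sketch repeats the same substitution with a classic C<@_> argument list so that it also runs on Perls older than v5.36. The explicit C<unicode_strings> feature line is an assumption added for safety.

```perl
use strict;
use warnings;
use utf8;
use feature 'unicode_strings';

# Titlecase the first letter of each whitespace-separated word
# and lowercase the rest of that word.
sub entitle {
    my ($str) = @_;
    $str =~ s{ (?=\pL)(\S) (\S*) }
             { ucfirst($1) . lc($2) }xge;
    return $str;
}

binmode(STDOUT, ":encoding(UTF-8)");
print entitle($_), "\n" for "crème brûlée", "éclair", "jamón serrano";
```

The lookahead C<(?=\pL)> restricts the substitution to words that start with a letter, so a word starting with a digit or punctuation is left alone.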
1298	1298	=head1 SEE ALSO
1299	1299
1300	1300	=begin original
1301	1301
1302	1302	See these manpages, some of which are CPAN modules:
1303	1303	L<perlunicode>, L<perluniprops>,
1304	1304	L<perlre>, L<perlrecharclass>,
1305	1305	L<perluniintro>, L<perlunitut>, L<perlunifaq>,
1306	1306	L<PerlIO>, L<DB_File>, L<DBM_Filter>, L<DBM_Filter::utf8>,
1307	1307	L<Encode>, L<Encode::Locale>,
1308	1308	L<Unicode::UCD>,
1309	1309	L<Unicode::Normalize>,
1310	1310	L<Unicode::GCString>, L<Unicode::LineBreak>,
1311	1311	L<Unicode::Collate>, L<Unicode::Collate::Locale>,
1312	1312	L<Unicode::Unihan>,
1313	1313	L<Unicode::CaseFold>,
1314	1314	L<Unicode::Tussle>,
1315	1315	L<Lingua::JA::Romanize::Japanese>,
1316	1316	L<Lingua::ZH::Romanize::Pinyin>,
1317	1317	L<Lingua::KO::Romanize::Hangul>.
1318	1318
1319	1319	=end original
1320	1320
1321	1321	以下の man ページ; 一部は CPAN モジュールのものです:
1322	1322	L<perlunicode>, L<perluniprops>,
1323	1323	L<perlre>, L<perlrecharclass>,
1324	1324	L<perluniintro>, L<perlunitut>, L<perlunifaq>,
1325	1325	L<PerlIO>, L<DB_File>, L<DBM_Filter>, L<DBM_Filter::utf8>,
1326	1326	L<Encode>, L<Encode::Locale>,
1327	1327	L<Unicode::UCD>,
1328	1328	L<Unicode::Normalize>,
1329	1329	L<Unicode::GCString>, L<Unicode::LineBreak>,
1330	1330	L<Unicode::Collate>, L<Unicode::Collate::Locale>,
1331	1331	L<Unicode::Unihan>,
1332	1332	L<Unicode::CaseFold>,
1333	1333	L<Unicode::Tussle>,
1334	1334	L<Lingua::JA::Romanize::Japanese>,
1335	1335	L<Lingua::ZH::Romanize::Pinyin>,
1336	1336	L<Lingua::KO::Romanize::Hangul>.
1337	1337
1338	1338	=begin original
1339	1339
1340	1340	The L<Unicode::Tussle> CPAN module includes many programs
1341	1341	to help with working with Unicode, including
1342	1342	these programs to fully or partly replace standard utilities:
1343	1343	I<tcgrep> instead of I<egrep>,
1344	1344	I<uniquote> instead of I<cat -v> or I<hexdump>,
1345	1345	I<uniwc> instead of I<wc>,
1346	1346	I<unilook> instead of I<look>,
1347	1347	I<unifmt> instead of I<fmt>,
1348	1348	and
1349	1349	I<ucsort> instead of I<sort>.
1350	1350	For exploring Unicode character names and character properties,
1351	1351	see its I<uniprops>, I<unichars>, and I<uninames> programs.
1352	1352	It also supplies these programs, all of which are general filters that do Unicode-y things:
1353	1353	I<unititle> and I<unicaps>;
1354	1354	I<uniwide> and I<uninarrow>;
1355	1355	I<unisupers> and I<unisubs>;
1356	1356	I<nfd>, I<nfc>, I<nfkd>, and I<nfkc>;
1357	1357	and I<uc>, I<lc>, and I<tc>.
1358	1358
1359	1359	=end original
1360	1360
1361	1361	L<Unicode::Tussle> CPAN モジュールには、Unicode を扱うための多くの
1362	1362	プログラムが含まれています;
1363	1363	これらのプログラムは、標準ユーティリティを完全にまたは部分的に
1364	1364	置き換えるためのものです:
1365	1365	I<egrep> の代わりに I<tcgrep>、
1366	1366	I<cat -v> または I<hexdump> の代わりに I<uniquote>、
1367	1367	I<wc> の代わりに I<uniwc>、
1368	1368	I<look> の代わりに I<unilook>、
1369	1369	I<fmt> の代わりに I<unifmt>、
1370	1370	I<sort> の代わりに I<ucsort>。
1371	1371	Unicode 文字名と文字特性を調べるには、I<uniprops>、I<unichars>、
1372	1372	I<uninames> プログラムを参照してください。
1373	1373	また、これらのプログラムも提供しています。
1374	1374	これらはすべて Unicode 対応の一般的なフィルタです:
1375	1375	I<unititle> と I<unicaps>、
1376	1376	I<uniwide> と I<uninarrow>、
1377	1377	I<unisupers> と I<unisubs>、
1378	1378	I<nfd>、I<nfc>、I<nfkd>、I<nfkc>;
1379	1379	I<uc>、I<lc>、I<tc>。
1380	1380
1381	1381	=begin original
1382	1382
1383	1383	Finally, see the published Unicode Standard (page numbers are from version
1384	1384	6.0.0), including these specific annexes and technical reports:
1385	1385
1386	1386	=end original
1387	1387
1388	1388	最後に、これらの特定の付属文書および技術報告書を含む、公開された
1389	1389	Unicode 標準(ページ番号はバージョン6.0.0 から) を参照してください。
1390	1390
1391	1391	=over
1392	1392
=item §3.13 Default Case Algorithms, page 113;
§4.2 Case, pages 120–122;
Case Mappings, pages 166–172, especially Caseless Matching starting on page 170.
1396	1396
1397	1397	=item UAX #44: Unicode Character Database
1398	1398
1399	1399	=item UTS #18: Unicode Regular Expressions
1400	1400
1401	1401	=item UAX #15: Unicode Normalization Forms
1402	1402
1403	1403	=item UTS #10: Unicode Collation Algorithm
1404	1404
1405	1405	=item UAX #29: Unicode Text Segmentation
1406	1406
1407	1407	=item UAX #14: Unicode Line Breaking Algorithm
1408	1408
1409	1409	=item UAX #11: East Asian Width
1410	1410
1411	1411	=back
1412	1412
1413	1413	=head1 AUTHOR
1414	1414
1415	1415	=begin original
1416	1416
1417	1417	Tom Christiansen E<lt>tchrist@perl.comE<gt> wrote this, with occasional
1418	1418	kibbitzing from Larry Wall and Jeffrey Friedl in the background.
1419	1419
1420	1420	=end original
1421	1421
1422	1422	Tom Christiansen E<lt>tchrist@perl.comE<gt> が、
1423	1423	時々 Larry Wall と Jeffrey Friedl に後ろから口出しされながら書きました。
1424	1424
1425	1425	=head1 COPYRIGHT AND LICENCE
1426	1426
1427	1427	Copyright © 2012 Tom Christiansen.
1428	1428
1429	1429	This program is free software; you may redistribute it and/or modify it
1430	1430	under the same terms as Perl itself.
1431	1431
1432	1432	=begin original
1433	1433
Most of these examples are taken from the current edition of the “Camel Book”;
that is, from the 4ᵗʰ Edition of I<Programming Perl>, Copyright © 2012 Tom
Christiansen I<et al.>, 2012-02-13 by O’Reilly Media. The code itself is
freely redistributable, and you are encouraged to transplant, fold,
spindle, and mutilate any of the examples in this manpage however you please
for inclusion into your own programs without any encumbrance whatsoever.
Acknowledgement via code comment is polite but not required.
1441	1441
1442	1442	=end original
1443	1443
1444	1444	これらの例のほとんどは、"Camel Book"の現在の版から引用されています:
1445	1445	すなわち、4ᵗʰ版I<Programming Perl>, Copyright © 2012 Tom
1446	1446	Christiansen <et al.>, 2012-02-13 by O'Reilly Media。
1447	1447	コード自体は自由に再配布可能であり、この man ページの例を移植したり、
1448	1448	折りたたんだり、紡錘形にしたり、切断したりすることが推奨されますが、
1449	1449	あなた自身のプログラムに含めるためには、何も気にせずに行ってください。
1450	1450	コードコメントによる謝辞は丁寧ですが、必須ではありません。
1451	1451
1452	1452	=head1 REVISION HISTORY
1453	1453
1454	1454	=begin original
1455	1455
1456	1456	v1.0.0 – first public release, 2012-02-27
1457	1457
1458	1458	=end original
1459	1459
1460	1460	v1.0.0 - 最初の一般公開、2012-02-27
1461	1461
1462	1462	=begin meta
1463	1463
1464	1464	Translate: SHIRAKATA Kentaro <argrath@ub32.org>
1465	1465	Status: completed
1466	1466
1467	1467	=end meta
