Differences between perlunicook 5.24.1 and 5.38.0

1	1
2	2	=encoding utf8
3	3
4	4	=head1 NAME
5	5
6	6	=begin original
7	7
8	8	perlunicook - cookbookish examples of handling Unicode in Perl
9	9
10	10	=end original
11	11
12	12	perlunicook - Perl で Unicode を扱うためのクックブック風の例
13	13
14	14	=head1 DESCRIPTION
15	15
16	16	=begin original
17	17
18	18	This manpage contains short recipes demonstrating how to handle common Unicode
19	19	operations in Perl, plus one complete program at the end. Any undeclared
20	20	variables in individual recipes are assumed to have a previous appropriate
21	21	value in them.
22	22
23	23	=end original
24	24
25	25	この man ページには、Perl で一般的な Unicode 操作を扱う方法を説明する
26	26	短いレシピと、最後に一つの完全なプログラムが含まれています。
27	27	個々のレシピ内の宣言されていない変数は、それ以前に適切な値が
28	28	設定されていることを仮定しています。
29	29
30	30	=head1 EXAMPLES
31	31
32	32	=head2 ℞ 0: Standard preamble
33	33
34	34	(℞ 0: 標準の前提)
35	35
36	36	=begin original
37	37
38	38	Unless otherwise noted, all examples below require this standard preamble
39	39	to work correctly, with the C<#!> adjusted to work on your system:
40	40
41	41	=end original
42	42
43	41	特に注記がない限り、以下のすべての例が正しく動作するには、この標準の前提が
44	42	必要です (C<#!> はシステムで動作するように調整してください)。
45	45
46	46	#!/usr/bin/env perl
47	47
48	48	=begin original
49	49
	50	use v5.36; # or later to get "unicode_strings" feature,
	51	# plus strict, warnings
50	52	use utf8; # so literals and identifiers can be in UTF-8
51		use v5.12; # or later to get "unicode_strings" feature
52		use strict; # quote strings, declare variables
53		use warnings; # on by default
54	53	use warnings qw(FATAL utf8); # fatalize encoding glitches
55		use open qw(:std :~~utf~~8); # undeclared streams in UTF-8
	54	use open qw(:std :encoding(UTF-8)); # undeclared streams in UTF-8
56	55	use charnames qw(:full :short); # unneeded in v5.16
57	56
58	57	=end original
59	58
	59	use v5.36; # またはそれ以降; "unicode_strings" 機能を有効に
	60	# 加えて strict, warnings
60	61	use utf8; # 従ってリテラルと識別子で UTF-8 を使える
61		use v5.12; # またはそれ以降; "unicode_strings" 機能を有効に
62		use strict; # 文字列をクォート、変数を宣言
63		use warnings; # デフォルトでオン
64	62	use warnings qw(FATAL utf8); # エンコーディングエラーを致命的エラーに
65		use open qw(:std :~~utf~~8); # 未宣言ストリームを UTF-8 に
	63	use open qw(:std :encoding(UTF-8)); # 未宣言ストリームを UTF-8 に
66	64	use charnames qw(:full :short); # v5.16 では不要
67	65
68	66	=begin original
69	67
70	68	This I<does> make even Unix programmers C<binmode> your binary streams,
71	69	or open them with C<:raw>, but that's the only way to get at them
72	70	portably anyway.
73	71
74	72	=end original
75	73
76	74	これにより Unix プログラマであってもバイナリストリームを C<binmode> するか
77	75	C<:raw> で開か I<なければならなくなります> が、いずれにせよそれが
78	76	移植性のある形でこれらを扱う唯一の方法です。
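The following is a minimal sketch of the preamble in use. It assumes perl 5.36 or later, and it uses an in-memory filehandle in place of a real binary file; the PNG signature bytes are only sample data. Text handles pick up the UTF-8 defaults. A handle opened with an explicit `:raw` layer writes its bytes unchanged:

```perl
#!/usr/bin/env perl
use v5.36;                            # strict, warnings, unicode_strings
use utf8;                             # the literal below is characters, not bytes
use warnings qw(FATAL utf8);
use open qw(:std :encoding(UTF-8));   # default layers for text handles

my $text  = "Ångström";
my $chars = length $text;             # counts characters, not UTF-8 bytes

# Binary data opts out of the default layers with an explicit :raw.
open(my $bin, '>:raw', \my $buffer) or die "open: $!";
print {$bin} "\x89PNG\r\n\x1a\n";     # 8-byte PNG signature, written as-is
close($bin) or die "close: $!";
my $bytes = length $buffer;
```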
79	77
80	78	=begin original
81	79
82	80	B<WARNING>: C<use autodie> (pre 2.26) and C<use open> do not get along with each
83	81	other.
84	82
85	83	=end original
86	84
87	85	B<警告>: C<use autodie>(2.26 より前)と C<use open> は同時に使えません。
88	86
89	87	=head2 ℞ 1: Generic Unicode-savvy filter
90	88
91	89	(℞ 1: 一般的な Unicode が使えるフィルタ)
92	90
93	91	=begin original
94	92
95	93	Always decompose on the way in, then recompose on the way out.
96	94
97	95	=end original
98	96
99	97	常に、入り口で分解し、出口で再合成します。
100	98
101	99	use Unicode::Normalize;
102	100
103	101	while (<>) {
104	102	$_ = NFD($_); # decompose + reorder canonically
105	103	...
106	104	} continue {
107	105	print NFC($_); # recompose (where possible) + reorder canonically
108	106	}
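Here is the same decompose-then-recompose flow, sketched as a plain subroutine so it runs without input. The name `filter_line` and its vowel-counting body are hypothetical stand-ins for the `...` in the loop above:

```perl
use v5.16;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD NFC);

# Hypothetical stand-in for the body of the filter loop.
sub filter_line {
    my ($line) = @_;
    my $nfd    = NFD($line);                # decompose on the way in
    my $vowels = () = $nfd =~ /[aeiou]/gi;  # base letters are now visible
    return (NFC($nfd), $vowels);            # recompose on the way out
}

my ($out, $count) = filter_line("crème brûlée");
```

In NFD form, `[aeiou]` also matches the base letters of `è`, `û`, and `é`. Against the precomposed NFC text it would count only the two plain `e`s.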
109	107
110	108	=head2 ℞ 2: Fine-tuning Unicode warnings
111	109
112	110	(℞ 2: Unicode 警告の微調整)
113	111
114	112	=begin original
115	113
116	114	As of v5.14, Perl distinguishes three subclasses of UTF‑8 warnings.
117	115
118	116	=end original
119	117
120	118	v5.14 から、Perl は UTF-8 警告の三つのサブクラスを区別しています。
121	119
122	120	use v5.14; # subwarnings unavailable any earlier
123	121	no warnings "nonchar"; # the 66 forbidden non-characters
124	122	no warnings "surrogate"; # UTF-16/CESU-8 nonsense
125	123	no warnings "non_unicode"; # for codepoints over 0x10_FFFF
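This sketch shows the lexical scoping. It assumes that case-mapping a codepoint above U+10FFFF raises a `non_unicode` warning (perldiag's "Operation ... returns its argument for non-Unicode code point"). Warnings are collected through `$SIG{__WARN__}`, so only the unsilenced call is recorded:

```perl
use v5.16;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $beyond = chr(0x11_0000);        # first codepoint above U+10FFFF
{
    no warnings "non_unicode";      # silence just this subcategory
    my $quiet = uc $beyond;
}
my $loud = uc $beyond;              # the subcategory is active again here
```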
126	124
127	125	=head2 ℞ 3: Declare source in utf8 for identifiers and literals
128	126
129	127	(℞ 3: 識別子とリテラルのためにソースが utf8 であると宣言する)
130	128
131	129	=begin original
132	130
133	131	Without the all-critical C<use utf8> declaration, putting UTF‑8 in your
134	132	literals and identifiers won’t work right. If you used the standard
135	133	preamble just given above, this already happened. If you did, you can
136	134	do things like this:
137	135
138	136	=end original
139	137
140	138	最も重要な C<use utf8> 宣言なしの場合、リテラルと識別子に
141	139	UTF-8 を入れると正しく動作しません。
142	140	前述した標準の前提を使った場合、これは既に含まれています。
143	141	その場合、以下のようなことができます:
144	142
145	143	use utf8;
146	144
147	145	my $measure = "Ångström";
148	146	my @μsoft = qw( cp852 cp1251 cp1252 );
149	147	my @ὑπέρμεγας = qw( ὑπέρ μεγας );
150	148	my @鯉 = qw( koi8-f koi8-u koi8-r );
151	149	my $motto = "👪 💗 🐪"; # FAMILY, GROWING HEART, DROMEDARY CAMEL
152	150
153	151	=begin original
154	152
155	153	If you forget C<use utf8>, high bytes will be misunderstood as
156	154	separate characters, and nothing will work right.
157	155
158	156	=end original
159	157
160	158	C<use utf8> を忘れると、上位バイトは別々の文字として誤解され、
161	159	何も正しく動作しません。
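Because `use utf8` is lexically scoped, its effect can be measured side by side. This sketch assumes the file itself is saved as UTF-8:

```perl
use strict;
use warnings;

my ($with, $without);
{
    use utf8;                       # literal decoded as characters
    $with = length "Ångström";
}
{
    no utf8;                        # literal left as raw UTF-8 bytes
    $without = length "Ångström";   # Å and ö are two bytes each
}
```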
162	160
163	161	=head2 ℞ 4: Characters and their numbers
164	162
165	163	(℞ 4: 文字とその番号)
166	164
167	165	=begin original
168	166
169	167	The C<ord> and C<chr> functions work transparently on all codepoints,
170	168	not just on ASCII alone — nor in fact, not even just on Unicode alone.
171	169
172	170	=end original
173	171
174	172	C<ord> 関数と C<chr> 関数は、すべての符号位置で透過的に動作します;
175	173	ASCII だけではなく、実際には Unicode だけでもありません。
176	174
177	175	# ASCII characters
178	176	ord("A")
179	177	chr(65)
180	178
181	179	# characters from the Basic Multilingual Plane
182	180	ord("Σ")
183	181	chr(0x3A3)
184	182
185	183	# beyond the BMP
186	184	ord("𝑛") # MATHEMATICAL ITALIC SMALL N
187	185	chr(0x1D45B)
188	186
189	187	# beyond Unicode! (up to MAXINT)
190	188	ord("\x{20_0000}")
191	189	chr(0x20_0000)
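The round trip between `ord` and `chr` can be checked across all four ranges above. The sample list is illustrative only:

```perl
use v5.16;
use warnings;
use utf8;

my @samples = (
    [ "A",          65        ],    # ASCII
    [ "Σ",          0x3A3     ],    # Basic Multilingual Plane
    [ "𝑛",          0x1D45B   ],    # beyond the BMP
    [ "\x{200000}", 0x20_0000 ],    # beyond Unicode
);

# Keep only the pairs where either direction fails to round-trip.
my @bad = grep {
    my ($char, $num) = @$_;
    ord($char) != $num || chr($num) ne $char;
} @samples;
```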
192	190
193	191	=head2 ℞ 5: Unicode literals by character number
194	192
195	193	(℞ 5: 文字番号による Unicode リテラル)
196	194
197	195	=begin original
198	196
199	197	In an interpolated literal, whether a double-quoted string or a
200	198	regex, you may specify a character by its number using the
201	199	C<\x{I<HHHHHH>}> escape.
202	200
203	201	=end original
204	202
205	203	展開リテラルでは、ダブルクォートで囲まれた文字列か正規表現かにかかわらず、
206	204	C<\x{I<HHHHHH>}> エスケープを使用して番号で文字を指定できます。
207	205
208	206	String: "\x{3a3}"
209	207	Regex: /\x{3a3}/
210	208
211	209	String: "\x{1d45b}"
212	210	Regex: /\x{1d45b}/
213	211
214	212	# even non-BMP ranges in regex work fine
215	213	/[\x{1D434}-\x{1D467}]/
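Both forms can be exercised on text built entirely from escapes, so the source stays pure ASCII. The sample string is invented. U+1D434 through U+1D467 is the Mathematical Italic A-to-z range used above:

```perl
use v5.16;
use warnings;

# MATHEMATICAL ITALIC SMALL N and GREEK CAPITAL LETTER SIGMA, by number
my $text = "let \x{1D45B} = \x{3A3} x";

my @italic    = $text =~ /([\x{1D434}-\x{1D467}])/g;   # non-BMP range
my $has_sigma = $text =~ /\x{3a3}/ ? 1 : 0;
```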
216	214
217	215	=head2 ℞ 6: Get character name by number
218	216
219	217	(℞ 6: 番号で文字名を取得する)
220	218
221	219	use charnames ();
222	220	my $name = charnames::viacode(0x03A3);
223	221
224	222	=head2 ℞ 7: Get character number by name
225	223
226	224	(℞ 7: 名前で文字番号を取得する)
227	225
228	226	use charnames ();
229	227	my $number = charnames::vianame("GREEK CAPITAL LETTER SIGMA");
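The two lookups invert each other, which a quick round trip confirms. `charnames::string_vianame`, available since v5.14, returns the character itself rather than its number:

```perl
use v5.16;
use warnings;
use charnames ();

my $name   = charnames::viacode(0x03A3);         # number -> name
my $number = charnames::vianame($name);          # name -> number
my $char   = charnames::string_vianame($name);   # name -> character
```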
230	228
231	229	=head2 ℞ 8: Unicode named characters
232	230
233	231	(℞ 8: Unicode 名による文字)
234	232
235	233	=begin original
236	234
237	235	Use the C<< \N{I<charname>} >> notation to get the character
238	236	by that name for use in interpolated literals (double-quoted
239	237	strings and regexes). In v5.16, there is an implicit
240	238
241	239	=end original
242	240
243	241	展開リテラル(ダブルクォートで囲まれた文字列と正規表現)で用いる、
244	242	名前で文字を得るために C<< \N{I<charname>} >> 表記を使います。
245	243	v5.16 では、これは暗黙に指定されます:
246	244
247	245	use charnames qw(:full :short);
248	246
249	247	=begin original
250	248
251	249	But prior to v5.16, you must be explicit about which set of charnames you
252	250	want. The C<:full> names are the official Unicode character name, alias, or
253	251	sequence, which all share a namespace.
254	252
255	253	=end original
256	254
257	255	しかし、v5.16 より前のバージョンでは、どの charnames の集合を使用するかを
258	256	明示的に指定しなければなりません。
259	257	C<:full> の名前は、Unicode の正式な文字名、別名、または
260	258	並びであり、すべて名前空間を共有します。
261	259
262	260	use charnames qw(:full :short latin greek);
263	261
264	262	"\N{MATHEMATICAL ITALIC SMALL N}" # :full
265	263	"\N{GREEK CAPITAL LETTER SIGMA}" # :full
266	264
267	265	=begin original
268	266
269	267	Anything else is a Perl-specific convenience abbreviation. Specify one or
270	268	more scripts by names if you want short names that are script-specific.
271	269
272	270	=end original
273	271
274	272	それ以外は、Perl 固有の便利な省略形です。
275	273	用字固有の短い名前が必要な場合は、一つ以上の用字を名前で指定します。
276	274
277	275	"\N{Greek:Sigma}" # :short
278	276	"\N{ae}" # latin
279	277	"\N{epsilon}" # greek
280	278
281	279	=begin original
282	280
283	281	The v5.16 release also supports a C<:loose> import for loose matching of
284	282	character names, which works just like loose matching of property names:
285	283	that is, it disregards case, whitespace, and underscores:
286	284
287	285	=end original
288	286
289	287	v5.16 リリースでは、文字名の緩やかなマッチングのための
290	288	C<:loose> インポートにも対応しています;
291	289	これは特性名の緩やかなマッチングと同じように機能します:
292	290	つまり、大文字小文字、空白、下線は無視されます:
293	291
294	292	"\N{euro sign}" # :loose (from v5.16)
295	293
	294	=begin original
	295
	296	Starting in v5.32, you can also use
	297
	298	=end original
	299
	300	v5.32 から、次のものを使って:
	301
	302	qr/\p{name=euro sign}/
	303
	304	=begin original
	305
	306	to get official Unicode named characters in regular expressions. Loose
	307	matching is always done for these.
	308
	309	=end original
	310
	311	正規表現中で Unicode の公式な名前による文字を得られます。
	312	これらには常に緩やかなマッチングが行われます。
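The following combined sketch uses all three notations. It assumes perl 5.32 or later for `\p{name=...}`. The price string is sample data:

```perl
use v5.32;                          # \p{name=...} needs v5.32
use warnings;
use charnames qw(:full :short :loose);

my $euro  = "\N{euro sign}";        # :loose ignores case and spacing
my $sigma = "\N{Greek:Sigma}";      # :short SCRIPT:name form, capital variant
my $found = "Total: 20\x{20AC}" =~ /\p{name=euro sign}/ ? 1 : 0;
```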
	313
296	314	=head2 ℞ 9: Unicode named sequences
297	315
298	316	(℞ 9: Unicode 名による並び)
299	317
300	318	=begin original
301	319
302	320	These look just like character names but return multiple codepoints.
303	321	Notice the C<%vx> vector-print functionality in C<printf>.
304	322
305	323	=end original
306	324
307	325	これらは文字名のように見えますが、複数の符号位置を返します。
308	326	C<printf> の C<%vx> ベクトル表示機能に注目してください。
309	327
310	328	use charnames qw(:full);
311	329	my $seq = "\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}";
312	330	printf "U+%v04X\n", $seq;
313	331	U+0100.0300
314	332
315	333	=head2 ℞ 10: Custom named characters
316	334
317	335	(℞ 10: カスタム名による文字)
318	336
319	337	=begin original
320	338
321	339	Use C<:alias> to give your own lexically scoped nicknames to existing
322	340	characters, or even to give unnamed private-use characters useful names.
323	341
324	342	=end original
325	343
326	344	C<:alias> を使用して、既存の文字に対してレキシカルスコープの
327	345	独自のニックネームを付けたり、無名の私用文字に有用な名前を
328	346	付けることができます。
329	347
330	348	use charnames ":full", ":alias" => {
331	349	ecute => "LATIN SMALL LETTER E WITH ACUTE",
332	350	"APPLE LOGO" => 0xF8FF, # private use character
333	351	};
334	352
335	353	"\N{ecute}"
336	354	"\N{APPLE LOGO}"
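Here is a runnable check of the aliases above. The `caf` prefix is arbitrary sample text:

```perl
use v5.16;
use warnings;
use charnames ":full", ":alias" => {
    ecute        => "LATIN SMALL LETTER E WITH ACUTE",
    "APPLE LOGO" => 0xF8FF,                # private use character
};

my $word = "caf\N{ecute}";
my $logo = "\N{APPLE LOGO}";
```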
337	355
338	356	=head2 ℞ 11: Names of CJK codepoints
339	357
340	358	(℞ 11: CJK 符号位置の名前)
341	359
342	360	=begin original
343	361
344	362	Sinograms like “東京” come back with character names of
345	363	C<CJK UNIFIED IDEOGRAPH-6771> and C<CJK UNIFIED IDEOGRAPH-4EAC>,
346	364	because their “names” vary. The CPAN C<Unicode::Unihan> module
347	365	has a large database for decoding these (and a whole lot more), provided you
348	366	know how to understand its output.
349	367
350	368	=end original
351	369
352	370	「東京」のような中国漢字は、「名前」が異なるため、
353	371	C<CJK UNIFIED IDEOGRAPH-6771> と
354	372	C<CJK UNIFIED IDEOGRAPH-4EAC> という文字名で戻ってきます。
355	373	CPAN の C<Unicode::Unihan> モジュールは、その出力を理解する方法を
356	374	知っていれば、これら(およびさらに多くの)文字をデコードするための
357	375	大規模なデータベースを持ちます。
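Core `charnames` already produces the generic names described above, so only the readings need the CPAN module. A minimal check:

```perl
use v5.16;
use warnings;
use charnames ();

# 東京, written with escapes
my @names = map { charnames::viacode(ord($_)) } split //, "\x{6771}\x{4EAC}";
```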
358	376
359	377	# cpan -i Unicode::Unihan
360	378	use Unicode::Unihan;
361	379	my $str = "東京";
362	380	my $unhan = Unicode::Unihan->new;
363	381	for my $lang (qw(Mandarin Cantonese Korean JapaneseOn JapaneseKun)) {
364	382	printf "CJK $str in %-12s is ", $lang;
365	383	say $unhan->$lang($str);
366	384	}
367	385
368	386	=begin original
369	387
370	388	prints:
371	389
372	390	=end original
373	391
374	392	これは次のものを表示します:
375	393
376	394	CJK 東京 in Mandarin     is DONG1JING1
377	395	CJK 東京 in Cantonese    is dung1ging1
378	396	CJK 東京 in Korean       is TONGKYENG
379	397	CJK 東京 in JapaneseOn   is TOUKYOU KEI KIN
380	398	CJK 東京 in JapaneseKun  is HIGASHI AZUMAMIYAKO
381	399
382	400	=begin original
383	401
384	402	If you have a specific romanization scheme in mind,
385	403	use the specific module:
386	404
387	405	=end original
388	406
389	407	特定のローマ字化スキームを考えている場合は、特定のモジュールを使います:
390	408
391	409	# cpan -i Lingua::JA::Romanize::Japanese
392	410	use Lingua::JA::Romanize::Japanese;
393	411	my $k2r = Lingua::JA::Romanize::Japanese->new;
394	412	my $str = "東京";
395	413	say "Japanese for $str is ", $k2r->chars($str);
396	414
397	415	=begin original
398	416
399	417	prints
400	418
401	419	=end original
402	420
403	421	これは次のものを表示します:
404	422
405	423	Japanese for 東京 is toukyou
406	424
407	425	=head2 ℞ 12: Explicit encode/decode
408	426
409	427	(℞ 12: 明示的なエンコード/デコード)
410	428
411	429	=begin original
412	430
413	431	On rare occasion, such as a database read, you may be
414	432	given encoded text you need to decode.
415	433
416	434	=end original
417	435
418	436	まれに、データベースの読み取りなど、デコードする必要がある
419	437	エンコードされたテキストを受け取ることがあります。
420	438
421	439	use Encode qw(encode decode);
422	440
423	441	my $chars = decode("shiftjis", $bytes, 1);
424	442	# OR
425	443	my $bytes = encode("MIME-Header-ISO_2022_JP", $chars, 1);
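One detail is worth knowing here. The `1` above is `Encode::FB_CROAK`. When CHECK is set without the `Encode::LEAVE_SRC` bit, `encode` and `decode` overwrite their source argument with whatever was left unprocessed, which is an empty string on success. This sketch round-trips through Shift_JIS with both bits set, then shows the consuming behavior on a copy:

```perl
use v5.16;
use warnings;
use utf8;
use Encode qw(encode decode);

# Die on bad data, but leave the source argument untouched.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

my $chars = "東京";
my $bytes = encode("shiftjis", $chars, $check);   # two bytes per kanji
my $back  = decode("shiftjis", $bytes, $check);

# Without LEAVE_SRC the source variable is consumed.
my $copy = $chars;
encode("shiftjis", $copy, Encode::FB_CROAK());
```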
426	444
427	445	=begin original
428	446
429	447	For streams all in the same encoding, don't use encode/decode; instead
430	448	set the file encoding when you open the file or immediately after with
431	449	C<binmode> as described later below.
432	450
433	451	=end original
434	452
435	453	同じエンコーディングのストリームに対しては、encode/decode を
436	454	使わないでください;
437	455	代わりに、後述するように、ファイルを開くとき、またはその直後に
438	456	C<binmode> でファイルエンコーディングを設定してください。
439	457
440	458	=head2 ℞ 13: Decode program arguments as utf8
441	459
442	460	(℞ 13: プログラム引数を utf8 としてデコードする)
443	461
444	462	$ perl -CA ...
445	463	or
446	464	$ export PERL_UNICODE=A
447	465	or
448		use Encode qw(decode~~_utf8~~);
	466	use Encode qw(decode);
449		@ARGV = map { decode~~_utf8~~($_, 1) } @ARGV;
	467	@ARGV = map { decode('UTF-8', $_, 1) } @ARGV;
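Since `@ARGV` normally comes from the shell, this sketch fakes it with UTF-8 bytes so the decoding can be checked in isolation. The argument values are invented. The `1` (`Encode::FB_CROAK`) flag also turns malformed input into an exception rather than a silent substitution:

```perl
use v5.16;
use warnings;
use Encode qw(decode);

@ARGV = ("caf\xC3\xA9", "plain");             # bytes a UTF-8 terminal would pass
@ARGV = map { decode('UTF-8', $_, 1) } @ARGV;

my $bad = "\xFF";                              # not valid UTF-8
my $ok  = eval { decode('UTF-8', $bad, 1); 1 };
```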
450	468
451	469	=head2 ℞ 14: Decode program arguments as locale encoding
452	470
453	471	(℞ 14: プログラム引数をロケールエンコーディングとしてデコードする)
454	472
455	473	# cpan -i Encode::Locale
456	474	use Encode qw(decode);
457	475	use Encode::Locale;
458	476
459	477	# use "locale" as an arg to encode/decode
460	478	@ARGV = map { decode(locale => $_, 1) } @ARGV;
461	479
462	480	=head2 ℞ 15: Declare STD{IN,OUT,ERR} to be utf8
463	481
464	482	(℞ 15: STD{IN,OUT,ERR} を utf8 として宣言する)
465	483
466	484	=begin original
467	485
468	486	Use a command-line option, an environment variable, or else
469	487	call C<binmode> explicitly:
470	488
471	489	=end original
472	490
473	491	コマンドラインオプションや環境変数を使うか、明示的に
474	492	C<binmode> を呼び出します。
475	493
476	494	$ perl -CS ...
477	495	or
478	496	$ export PERL_UNICODE=S
479	497	or
480		use open qw(:std :~~utf~~8);
	498	use open qw(:std :encoding(UTF-8));
481	499	or
482		binmode(STDIN, ":~~utf~~8");
	500	binmode(STDIN, ":encoding(UTF-8)");
483	501	binmode(STDOUT, ":utf8");
484	502	binmode(STDERR, ":utf8");
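You can observe the layer's effect without touching the real standard streams by opening in-memory filehandles on a scalar. Here is a sketch with arbitrary sample text:

```perl
use v5.16;
use warnings;
use utf8;

open(my $out, '>:encoding(UTF-8)', \my $buf) or die "open: $!";
print {$out} "π≈3.14";                  # 6 characters
close($out) or die "close: $!";
my $byte_len = length $buf;             # π is 2 bytes, ≈ is 3, "3.14" is 4

open(my $in, '<:encoding(UTF-8)', \$buf) or die "open: $!";
my $line = <$in>;
close($in);
```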
485	503
486	504	=head2 ℞ 16: Declare STD{IN,OUT,ERR} to be in locale encoding
487	505
488	506	(℞ 16: STD{IN,OUT,ERR} をロケールエンコーディングとして宣言する)
489	507
490	508	# cpan -i Encode::Locale
491	509	use Encode;
492	510	use Encode::Locale;
493	511
494	512	# or as a stream for binmode or open
495	513	binmode STDIN, ":encoding(console_in)" if -t STDIN;
496	514	binmode STDOUT, ":encoding(console_out)" if -t STDOUT;
497	515	binmode STDERR, ":encoding(console_out)" if -t STDERR;
498	516
499	517	=head2 ℞ 17: Make file I/O default to utf8
500	518
501	519	(℞ 17: ファイル I/O のデフォルトを utf8 にする)
502	520
503	521	=begin original
504	522
505	523	Files opened without an encoding argument will be in UTF-8:
506	524
507	525	=end original
508	526
509	527	encoding 引数なしで開かれたファイルは UTF-8 になります:
510	528
511	529	$ perl -CD ...
512	530	or
513	531	$ export PERL_UNICODE=D
514	532	or
515		use open qw(:~~utf~~8);
	533	use open qw(:encoding(UTF-8));
516	534
517	535	=head2 ℞ 18: Make all I/O and args default to utf8
518	536
519	537	(℞ 18: 全ての I/O と引数のデフォルトを utf8 にする)
520	538
521	539	$ perl -CSDA ...
522	540	or
523	541	$ export PERL_UNICODE=SDA
524	542	or
525		use open qw(:std :~~utf~~8);
	543	use open qw(:std :encoding(UTF-8));
526		use Encode qw(decode~~_utf8~~);
	544	use Encode qw(decode);
527		@ARGV = map { decode~~_utf8~~($_, 1) } @ARGV;
	545	@ARGV = map { decode('UTF-8', $_, 1) } @ARGV;
528	546
529	547	=head2 ℞ 19: Open file with specific encoding
530	548
531	549	(℞ 19: 特定のエンコーディングでファイルを開く)
532	550
533	551	=begin original
534	552
535	553	Specify stream encoding. This is the normal way
536	554	to deal with encoded text, not by calling low-level
537	555	functions.
538	556
539	557	=end original
540	558
541	559	ストリームエンコーディングを指定します。
542	560	これは、低レベル関数を呼び出すのではなく、エンコードされたテキストを
543	561	処理する通常の方法です。
544	562
545	563	# input file
546	564	open(my $in_file, "< :encoding(UTF-16)", "wintext");
547	565	OR
548	566	open(my $in_file, "<", "wintext");
549	567	binmode($in_file, ":encoding(UTF-16)");
550	568	THEN
551	569	my $line = <$in_file>;
552	570
553	571	# output file
554	572	open(my $out_file, "> :encoding(cp1252)", "wintext");
555	573	OR
556	574	open(my $out_file, ">", "wintext");
557	575	binmode($out_file, ":encoding(cp1252)");
558	576	THEN
559	577	print $out_file "some text\n";
560	578
561	579	=begin original
562	580
563	581	More layers than just the encoding can be specified here. For example,
564	582	the incantation C<":raw :encoding(UTF-16LE) :crlf"> includes implicit
565	583	CRLF handling.
566	584
567	585	=end original
568	586
569	587	ここで指定できるのは、エンコーディングだけではありません。
570	588	例えば、呪文 C<":raw :encoding(UTF-16LE) :crlf"> には
571	589	暗黙的な CRLF 処理が含まれています。
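To see what that layer stack does, this sketch writes one line through in-memory handles. It assumes they accept the same layer string as files. Because `:crlf` sits on top, `\n` becomes CR LF before encoding, and every character then takes two bytes:

```perl
use v5.16;
use warnings;

my $layers = ':raw:encoding(UTF-16LE):crlf';

open(my $out, ">$layers", \my $buf) or die "open: $!";
print {$out} "hi\n";
close($out) or die "close: $!";

open(my $in, "<$layers", \$buf) or die "open: $!";
my $line = <$in>;                       # CR LF is read back as "\n"
close($in);
```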
572	590
573	591	=head2 ℞ 20: Unicode casing
574	592
575	593	(℞ 20: Unicode の大文字小文字)
576	594
577	595	=begin original
578	596
579	597	Unicode casing is very different from ASCII casing.
580	598
581	599	=end original
582	600
583	601	Unicode の大文字小文字は ASCII の大文字小文字とは大きく異なります。
584	602
585	603	uc("henry ⅷ") # "HENRY Ⅷ"
586	604	uc("tschüß") # "TSCHÜSS" notice ß => SS
587	605
588	606	# both are true:
589	607	"tschüß" =~ /TSCHÜSS/i # notice ß => SS
590	608	"Σίσυφος" =~ /ΣΊΣΥΦΟΣ/i # notice Σ,σ,ς sameness
591	609
592	610	=head2 ℞ 21: Unicode case-insensitive comparisons
593	611
594	612	(℞ 21: Unicode の大文字小文字を無視した比較)
595	613
596	614	=begin original
597	615
598	616	Also available in the CPAN L<Unicode::CaseFold> module,
599	617	the new C<fc> “foldcase” function from v5.16 grants
600	618	access to the same Unicode casefolding as the C</i>
601	619	pattern modifier has always used:
602	620
603	621	=end original
604	622
605	623	CPAN の L<Unicode::CaseFold> モジュールでも利用可能な、v5.16 の新しい
606	624	C<fc> "foldcase" 関数は、C</i> パターン修飾子が常に使ってきたのと同じ
607	625	Unicode 大文字小文字畳み込みへのアクセスを与えます。
608	626
609	627	use feature "fc"; # fc() function is from v5.16
610	628
611	629	# sort case-insensitively
612	630	my @sorted = sort { fc($a) cmp fc($b) } @list;
613	631
614	632	# both are true:
615	633	fc("tschüß") eq fc("TSCHÜSS")
616	634	fc("Σίσυφος") eq fc("ΣΊΣΥΦΟΣ")
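The claims in ℞ 20 and ℞ 21 can be checked directly. The `lc` comparison shows why folding, not lowercasing, is the right tool: `lc` leaves `ß` alone.

```perl
use v5.16;                          # enables the fc feature
use warnings;
use utf8;

my $upper       = uc "tschüß";                         # ß uppercases to SS
my $fc_same     = fc("tschüß")  eq fc("TSCHÜSS");
my $fc_sigma    = fc("Σίσυφος") eq fc("ΣΊΣΥΦΟΣ");      # Σ, σ, ς fold together
my $lc_same     = lc("tschüß")  eq lc("TSCHÜSS");      # lc keeps ß, so false
my $regex_match = "tschüß" =~ /TSCHÜSS/i;
```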
617	635
618	636	=head2 ℞ 22: Match Unicode linebreak sequence in regex
619	637
620	638	(℞ 22: 正規表現中の Unicode 改行並びのマッチング)
621	639
622	640	=begin original
623	641
624	642	A Unicode linebreak matches the two-character CRLF
625	643	grapheme or any of seven vertical whitespace characters.
626	644	Good for dealing with textfiles coming from different
627	645	operating systems.
628	646
629	647	=end original
630	648
631	649	Unicode の改行は、2 文字の CRLF 書記素または七つの垂直空白文字の
632	650	いずれかにマッチングします。
633	651	異なるオペレーティングシステムから送られてくるテキストファイルを
634	652	扱うのに適しています。
635	653
636	654	\R
637	655
638	656	s/\R/\n/g; # normalize all linebreaks to \n
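This sketch uses one of each linebreak style, including U+0085 NEXT LINE. The sample text is invented. Note that CRLF is consumed as a single break:

```perl
use v5.16;
use warnings;

my $mixed = "unix\nwindows\r\nold mac\rnel\x{85}end";
(my $normal = $mixed) =~ s/\R/\n/g;     # normalize every linebreak to \n
my @lines = split /\n/, $normal;
```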
639	657
640	658	=head2 ℞ 23: Get character category
641	659
642	660	(℞ 23: 文字カテゴリを得る)
643	661
644	662	=begin original
645	663
646	664	Find the general category of a numeric codepoint.
647	665
648	666	=end original
649	667
650	668	数値符号位置の一般カテゴリを見つけます。
651	669
652	670	use Unicode::UCD qw(charinfo);
653	671	my $cat = charinfo(0x3A3)->{category}; # "Lu"
654	672
655	673	=head2 ℞ 24: Disabling Unicode-awareness in builtin charclasses
656	674
657	675	(℞ 24: 組み込み文字クラスで Unicode 判定を無効にする)
658	676
659	677	=begin original
660	678
661	679	Disable C<\w>, C<\b>, C<\s>, C<\d>, and the POSIX
662	680	classes from working correctly on Unicode either in this
663	681	scope, or in just one regex.
664	682
665	683	=end original
666	684
667	685	このスコープまたは一つの正規表現で、C<\w>、C<\b>、C<\s>、C<\d>、
668	686	および POSIX クラスが Unicode で正しく動作しないようにします。
669	687
670	688	use v5.14;
671	689	use re "/a";
672	690
673	691	# OR
674	692
675	693	my($num) = $str =~ /(\d+)/a;
676	694
677	695	=begin original
678	696
679	697	Or use specific un-Unicode properties, like C<\p{ahex}>
680	698	and C<\p{POSIX_Digit}>. Properties still work normally
681	699	no matter what charset modifiers (C</d /u /l /a /aa>)
682	700	should be in effect.
683	701
684	702	=end original
685	703
686	704	または、C<\p{ahex}> や C<\p{POSIX_Digit}> などの特定の非 Unicode 特性を
687	705	使います。
688	706	どの文字集合修飾子 (C</d /u /l /a /aa>) が有効であっても、
689	707	特性は正常に動作します。
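The difference is easy to see on a string that mixes Arabic-Indic and ASCII digits (sample data):

```perl
use v5.16;
use warnings;

my $str = "Price: \x{664}\x{662} or 42";             # ARABIC-INDIC DIGIT FOUR, TWO
my @any_digits   = $str =~ /(\d+)/g;                 # Unicode-aware \d
my @ascii_digits = $str =~ /(\d+)/ga;                # /a restricts \d to [0-9]
my @posix_digits = $str =~ /(\p{POSIX_Digit}+)/g;    # same result under any modifier
```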
690	708
691	709	=head2 ℞ 25: Match Unicode properties in regex with \p, \P
692	710
693	711	(℞ 25: 正規表現中に \p, \P を使って Unicode 特性にマッチングする)
694	712
695	713	=begin original
696	714
697	715	These all match a single codepoint with the given
698	716	property. Use C<\P> in place of C<\p> to match
699	717	one codepoint lacking that property.
700	718
701	719	=end original
702	720
703	721	これらはすべて、指定された特性を持つ一つの符号位置にマッチングします。
704	722	C<\p> の代わりに C<\P> を使用すると、その特性を持たない一つの符号位置に
705	723	マッチングします。
706	724
707	725	\pL, \pN, \pS, \pP, \pM, \pZ, \pC
708	726	\p{Sk}, \p{Ps}, \p{Lt}
709	727	\p{alpha}, \p{upper}, \p{lower}
710	728	\p{Latin}, \p{Greek}
711		\p{script=Latin}, \p{sc~~ript~~=Greek}
	729	\p{script_extensions=Latin}, \p{scx=Greek}
712	730	\p{East_Asian_Width=Wide}, \p{EA=W}
713	731	\p{Line_Break=Hyphen}, \p{LB=HY}
714	732	\p{Numeric_Value=4}, \p{NV=4}
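Below, a few of these properties are checked one codepoint at a time, using illustrative test characters. The last pair shows why the 5.38 text uses `script_extensions`. U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK has Script=Common, but its Script_Extensions include Hiragana:

```perl
use v5.16;
use warnings;

my @cases = (
    # [ character, pattern, expected match ]
    [ "\x{3A3}",  qr/^\p{Lu}\z/,           1 ],  # GREEK CAPITAL LETTER SIGMA
    [ "\x{3A3}",  qr/^\p{Greek}\z/,        1 ],
    [ "a",        qr/^\p{Greek}\z/,        0 ],
    [ "\x{6771}", qr/^\p{EA=W}\z/,         1 ],  # CJK ideograph: East Asian Wide
    [ "\x{664}",  qr/^\p{NV=4}\z/,         1 ],  # ARABIC-INDIC DIGIT FOUR
    [ "-",        qr/^\p{LB=HY}\z/,        1 ],  # HYPHEN-MINUS
    [ "\x{30FC}", qr/^\p{scx=Hiragana}\z/, 1 ],  # listed in Script_Extensions
    [ "\x{30FC}", qr/^\p{sc=Hiragana}\z/,  0 ],  # but its Script is Common
);
my @failed = grep { ($_->[0] =~ $_->[1] ? 1 : 0) != $_->[2] } @cases;
```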
715	733
716	734	=head2 ℞ 26: Custom character properties
717	735
718	736	(℞ 26: カスタム文字特性)
719	737
720	738	=begin original
721	739
722	740	Define at compile-time your own custom character
723	741	properties for use in regexes.
724	742
725	743	=end original
726	744
727	745	正規表現で使用する独自のカスタム文字特性をコンパイル時に定義します。
728	746
729	747	# using private-use characters
730	748	sub In_Tengwar { "E000\tE07F\n" }
731	749
732	750	if (/\p{In_Tengwar}/) { ... }
733	751
734	752	# blending existing properties
735	753	sub Is_GraecoRoman_Title {<<'END_OF_SET'}
736	754	+utf8::IsLatin
737	755	+utf8::IsGreek
738	756	&utf8::IsTitle
739	757	END_OF_SET
740	758
741	759	if (/\p{Is_GraecoRoman_Title}/) { ... }
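This is a runnable version of both properties, with the subroutines defined before the patterns that use them. U+01C5 is a Latin titlecase letter and U+1F88 a Greek one; the other test characters are arbitrary:

```perl
use v5.16;
use warnings;

# Private-use range from the recipe.
sub In_Tengwar { return "E000\tE07F\n" }

# Latin or Greek, intersected with titlecase letters.
sub Is_GraecoRoman_Title {
    return <<'END_OF_SET';
+utf8::IsLatin
+utf8::IsGreek
&utf8::IsTitle
END_OF_SET
}

my $tengwar = "\x{E010}" =~ /\p{In_Tengwar}/ ? 1 : 0;
my @titles  = grep { /^\p{Is_GraecoRoman_Title}\z/ } "\x{1C5}", "\x{1F88}", "A";
```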
742	760
743	761	=head2 ℞ 27: Unicode normalization
744	762
745	763	(℞ 27: Unicode 正規化)
746	764
747	765	=begin original
748	766
749	767	Typically render into NFD on input and NFC on output. Using NFKC or NFKD
750	768	functions improves recall on searches, assuming you've already done the same
751	769	to the text to be searched. Note that this is about much more than just
752	770	pre-combined compatibility glyphs; it also reorders marks according to their
753	771	canonical combining classes and weeds out singletons.
754	772
755	773	=end original
756	774
757	775	通常は、入力時に NFD に、出力時に NFC に変換します。
758	776	検索対象のテキストにも同じ処理を既に行っていることを前提とすれば、
759	777	NFKC または NFKD 関数を使うことで検索の再現率 (recall) が向上します。
760	778	これは単に事前結合された互換グリフ以上のものであることに
761	779	注意してください;
762	780	正準結合クラスに従ってマークを並び替え、シングルトンを削除します。
763	781
764	782	use Unicode::Normalize;
765	783	my $nfd = NFD($orig);
766	784	my $nfc = NFC($orig);
767	785	my $nfkd = NFKD($orig);
768	786	my $nfkc = NFKC($orig);
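The following sketch applies each form to a few sample strings. It covers decomposition, compatibility folding, singleton replacement, and canonical reordering of marks:

```perl
use v5.16;
use warnings;
use Unicode::Normalize qw(NFD NFC NFKC);

my $orig = "\x{E9}";                  # é, precomposed
my $nfd  = NFD($orig);                # "e" + COMBINING ACUTE ACCENT
my $nfc  = NFC($nfd);                 # back to one codepoint
my $file = NFKC("\x{FB01}le");        # LATIN SMALL LIGATURE FI + "le"
my $ohm  = NFC("\x{2126}");           # OHM SIGN is a singleton
my $ord  = NFD("a\x{301}\x{323}");    # acute (ccc 230) moves after dot below (ccc 220)
```

Because NFKC discards distinctions such as the `fi` ligature, it suits search keys better than stored text.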
769	787
770	788	=head2 ℞ 28: Convert non-ASCII Unicode numerics
771	789
772	790	(℞ 28: 非 ASCII Unicode 数字を変換する)
773	791
774	792	=begin original
775	793
776	794	Unless you’ve used C</a> or C</aa>, C<\d> matches more than
777	795	ASCII digits only, but Perl’s implicit string-to-number
778	796	conversion does not currently recognize these. Here’s how to
779	797	convert such strings manually.
780	798
781	799	=end original
782	800
783	801	C</a> や C</aa> を使用していない限り、C<\d> は ASCII 数字以上のものに
784	802	マッチングしますが、
785	803	Perl の暗黙的な文字列から数値への変換では、現在のところこれらを
786	804	認識できません。
787	805	このような文字列を手動で変換する方法を以下に示します。
788	806
789	807	use v5.14; # needed for num() function
790	808	use Unicode::UCD qw(num);
791	809	my $str = "got Ⅻ and ४५६७ and ⅞ and here";
792	810	my @nums = ();
793	811	while ($str =~ /(\d+|\p{N})/g) { # not just ASCII!
794	812	push @nums, num($1);
795	813	}
796	814	say "@nums"; # 12 4567 0.875
797	815
798	816	use charnames qw(:full);
799	817	my $nv = num("\N{RUMI DIGIT ONE}\N{RUMI DIGIT TWO}");
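Here is a runnable version of this recipe. It matches `\p{N}` (any numeric character) alongside `\d+`, because a bare `\N` means "any character except newline" and would pass letters to `num`. It also shows `num` returning `undef` for digits drawn from two different scripts:

```perl
use v5.16;                              # num() needs v5.14 or later
use warnings;
use utf8;
use Unicode::UCD qw(num);

my $str   = "got Ⅻ and ४५६७ and ⅞ and here";
my @nums  = map { num($_) } $str =~ /(\d+|\p{N})/g;   # digit runs, or one numeric char
my $mixed = num("4\x{665}");            # ASCII 4 + ARABIC-INDIC DIGIT FIVE
```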
800	818
801	819	=head2 ℞ 29: Match Unicode grapheme cluster in regex
802	820
803	821	(℞ 29: 正規表現中の Unicode 書記素クラスタにマッチングする)
804	822
805	823	=begin original
806	824
807	825	Programmer-visible “characters” are codepoints matched by C</./s>,
808	826	but user-visible “characters” are graphemes matched by C</\X/>.
809	827
810	828	=end original
811	829
812	830	プログラマから見える「文字」は、C</./s> がマッチする符号位置ですが、
813	831	ユーザから見える「文字」は、C</\X/> がマッチする書記素です。
814	832
815	833	# Find vowel plus any combining diacritics, underlining, etc.
816	834	my $nfd = NFD($orig);
817	835	$nfd =~ / (?=[aeiou]) \X /xi
818	836
819	837	=head2 ℞ 30: Extract by grapheme instead of by codepoint (regex)
820	838
821	839	(℞ 30: 符号位置によってではなく、書記素によって抽出する (正規表現))
822	840
823	841	# match and grab five first graphemes
824	842	my($first_five) = $str =~ /^ ( \X{5} ) /x;
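This sketch contrasts grapheme extraction with codepoint extraction on NFD text. The phrase `crème brûlée` is sample data:

```perl
use v5.16;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

my $str = NFD("crème brûlée");                   # 15 codepoints, 12 graphemes
my ($first_five) = $str =~ /^ ( \X{5} ) /x;      # grapheme-safe prefix
my @vowels = $str =~ / ( (?=[aeiou]) \X ) /xgi;  # vowel plus any marks
my $naive  = substr($str, 0, 5);                 # codepoint prefix splits off "e"
```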
825	843
826	844	=head2 ℞ 31: Extract by grapheme instead of by codepoint (substr)
827	845
828	846	(℞ 31: 符号位置によってではなく、書記素によって抽出する (substr))
829	847
830	848	# cpan -i Unicode::GCString
831	849	use Unicode::GCString;
832	850	my $gcs = Unicode::GCString->new($str);
833	851	my $first_five = $gcs->substr(0, 5);
834	852
835	853	=head2 ℞ 32: Reverse string by grapheme
836	854
837	855	(℞ 32: 文字列を書記素単位で反転する)
838	856
839	857	=begin original
840	858
841	859	Reversing by codepoint messes up diacritics, mistakenly converting
842	860	C<crème brûlée> into C<éel̂urb em̀erc> instead of into C<eélûrb emèrc>;
843	861	so reverse by grapheme instead. Both these approaches work
844	862	right no matter what normalization the string is in:
845	863
846	864	=end original
847	865
848	866	符号位置による反転はダイアクリティカルマークを混乱させ、誤って
849	867	C<crème brûlée> を C<eélûrb emèrc> ではなく
850	868	C<éel̂urb em̀erc> に変換します;
851	869	そこで、代わりに書記素による反転を行います。
852	870	これらの手法はどちらも、文字列の正規化がどのようなものであっても
853	871	正しく機能します。
854	872
855	873	$str = join("", reverse $str =~ /\X/g);
856	874
857	875	# OR: cpan -i Unicode::GCString
858	876	use Unicode::GCString;
859	877	$str = reverse Unicode::GCString->new($str);
860	878
861	879	=head2 ℞ 33: String length in graphemes
862	880
863	881	(℞ 33: 書記素での文字列長)
864	882
865	883	=begin original
866	884
867	885	The string C<brûlée> has six graphemes but up to eight codepoints.
868	886	This counts by grapheme, not by codepoint:
869	887
870	888	=end original
871	889
872	890	文字列 C<brûlée> は六つの書記素を持ちますが、最大八つの符号位置を持ちます。
873	891	これは、符号位置ではなく、書記素によってカウントされます:
874	892
875	893	my $str = "brûlée";
876	894	my $count = 0;
877	895	while ($str =~ /\X/g) { $count++ }
878	896
879	897	# OR: cpan -i Unicode::GCString
880	898	use Unicode::GCString;
881	899	my $gcs = Unicode::GCString->new($str);
882	900	my $count = $gcs->length;
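Without Unicode::GCString, the core `\X` approach from ℞ 32 and ℞ 33 can be checked on both normalization forms:

```perl
use v5.16;
use warnings;
use utf8;
use Unicode::Normalize qw(NFC NFD);

my %report;
for my $form ("NFC", "NFD") {
    my $str = $form eq "NFC" ? NFC("brûlée") : NFD("brûlée");
    my @graphemes = $str =~ /\X/g;
    $report{$form} = {
        codepoints => length($str),
        graphemes  => scalar(@graphemes),
        reversed   => join("", reverse @graphemes),   # marks stay with their base
    };
}
```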
883	901
884	902	=head2 ℞ 34: Unicode column-width for printing
885	903
886	904	(℞ 34: 表示のための Unicode 桁幅)
887	905
888	906	=begin original
889	907
890	908	Perl’s C<printf>, C<sprintf>, and C<format> think all
891	909	codepoints take up 1 print column, but many take 0 or 2.
892	910	Here to show that normalization makes no difference,
893	911	we print out both forms:
894	912
895	913	=end original
896	914
897	915	Perl の C<printf>、C<sprintf>、C<format> は、すべての符号位置が
898	916	一つの表示桁を占有すると考えていますが、多くの符号位置は 0 または 2 を
899	917	占有します。
900	918	ここでは、正規化に違いがないことを示すために、両方の形式を出力します。
901	919
902	920	use Unicode::GCString;
903	921	use Unicode::Normalize;
904	922
905	923	my @words = qw/crème brûlée/;
906	924	@words = map { NFC($_), NFD($_) } @words;
907	925
908	926	for my $str (@words) {
909	927	my $gcs = Unicode::GCString->new($str);
910	928	my $cols = $gcs->columns;
911	929	my $pad = " " x (10 - $cols);
912	930	say $str, $pad, " |";
913	931	}
914	932
915	933	=begin original
916	934
917	935	generates this to show that it pads correctly no matter
918	936	the normalization:
919	937
920	938	=end original
921	939
922	940	これは、正規化に関係なく正しくパッディングされていることを示すために
923	941	次のように生成されます。
924	942
925	943	crème      |
926	944	crème      |
927	945	brûlée     |
928	946	brûlée     |
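If Unicode::GCString is unavailable, a rough approximation can be built from core properties. This is only a heuristic. It ignores ambiguous-width characters, emoji sequences, and terminal quirks, and it is not what the CPAN module does. The helpers `columns` and `pad_to` are hypothetical names:

```perl
use v5.16;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Heuristic: combining marks take 0 columns, East Asian Wide/Fullwidth
# take 2, everything else takes 1.
sub columns {
    my ($str) = @_;
    my $cols = 0;
    for my $char (split //, $str) {
        next if $char =~ /\p{Mn}|\p{Me}/;
        $cols += $char =~ /\p{EA=W}|\p{EA=F}/ ? 2 : 1;
    }
    return $cols;
}

sub pad_to {
    my ($str, $width) = @_;
    return $str . " " x ($width - columns($str));
}
```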
929	947
930	948	=head2 ℞ 35: Unicode collation
931	949
932	950	(℞ 35: Unicode の照合順序)
933	951
934	952	=begin original
935	953
936	954	Text sorted by numeric codepoint follows no reasonable alphabetic order;
937	955	use the UCA for sorting text.
938	956
939	957	=end original
940	958
941	959	数値符号位置でソートされたテキストは、合理的なアルファベット順ではありません;
942	960	テキストのソートには UCA を使用してください。
943	961
944	962	use Unicode::Collate;
945	963	my $col = Unicode::Collate->new();
946	964	my @list = $col->sort(@old_list);
947	965
948	966	=begin original
949	967
950	968	See the I<ucsort> program from the L<Unicode::Tussle> CPAN module
951	969	for a convenient command-line interface to this module.
952	970
953	971	=end original
954	972
955	973	このモジュールへの便利なコマンドラインインタフェースについては、
956	974	L<Unicode::Tussle> CPAN モジュールの I<ucsort> プログラムを参照してください。
957	975
958	976	=head2 ℞ 36: Case- I<and> accent-insensitive Unicode sort
959	977
960	978	(℞ 36: 大文字小文字 I<および> アクセントを無視した Unicode のソート)
961	979
962	980	=begin original
963	981
964	982	Specify a collation strength of level 1 to ignore case and
965	983	diacritics, only looking at the basic character.
966	984
967	985	=end original
968	986
969	987	照合強度レベル 1 を指定して、大文字小文字とダイアクリティカルマークを
970	988	無視し、基本文字だけを参照するようにします。
971	989
972	990	use Unicode::Collate;
973	991	my $col = Unicode::Collate->new(level => 1);
974	992	my @list = $col->sort(@old_list);
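The following sketch contrasts codepoint order, default UCA order, and level 1 equality on a small invented word list:

```perl
use v5.16;
use warnings;
use utf8;
use Unicode::Collate;

my @old_list = ("zebra", "Zürich", "apple", "Äpfel", "Apple");

my @by_codepoint = sort @old_list;                        # uppercase first, Ä last
my @by_uca       = Unicode::Collate->new->sort(@old_list);

my $level1 = Unicode::Collate->new(level => 1);           # ignore case and accents
my @same   = ($level1->eq("apple", "APPLE"), $level1->eq("Zürich", "zurich"));
```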
975	993
976	994	=head2 ℞ 37: Unicode locale collation
977	995
978	996	(℞ 37: Unicode ロケールの照合順序)
979	997
980	998	=begin original
981	999
982	1000	Some locales have special sorting rules.
983	1001
984	1002	=end original
985	1003
986	1004	一部のロケールには、特別なソート規則があります。
987	1005
988	1006	# either use v5.12, OR: cpan -i Unicode::Collate::Locale
989	1007	use Unicode::Collate::Locale;
990	1008	my $col = Unicode::Collate::Locale->new(locale => "de__phonebook");
991	1009	my @list = $col->sort(@old_list);
992	1010
993	1011	=begin original
994	1012
995	1013	The I<ucsort> program mentioned above accepts a C<--locale> parameter.
996	1014
997	1015	=end original
998	1016
999	1017	上記の I<ucsort> プログラムは、C<--locale> パラメータを受け付けます。
1000	1018
1001	1019	=head2 ℞ 38: Making C<cmp> work on text instead of codepoints
1002	1020
1003	1021	(℞ 38: 符号位置ではなくテキストで C<cmp> が動作するようにする)
1004	1022
1005	1023	=begin original
1006	1024
1007	1025	Instead of this:
1008	1026
1009	1027	=end original
1010	1028
1011	1029	次のようにせずに:
1012	1030
1013	1031	@srecs = sort {
1014	1032	$b->{AGE} <=> $a->{AGE}
1015	1033	||
1016	1034	$a->{NAME} cmp $b->{NAME}
1017	1035	} @recs;
1018	1036
1019	1037	=begin original
1020	1038
1021	1039	Use this:
1022	1040
1023	1041	=end original
1024	1042
1025	1043	次を使います:
1026	1044
1027	1045	my $coll = Unicode::Collate->new();
1028	1046	for my $rec (@recs) {
1029	1047	$rec->{NAME_key} = $coll->getSortKey( $rec->{NAME} );
1030	1048	}
1031	1049	@srecs = sort {
1032	1050	$b->{AGE} <=> $a->{AGE}
1033	1051	||
1034	1052	$a->{NAME_key} cmp $b->{NAME_key}
1035	1053	} @recs;
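Here is a runnable version of the sort-key approach, using invented records. Each key is computed once up front, so the comparator only does cheap `cmp`s on the binary keys:

```perl
use v5.16;
use warnings;
use utf8;
use Unicode::Collate;

my @recs = (                                  # invented sample records
    { NAME => "émile", AGE => 30 },
    { NAME => "Zoë",   AGE => 30 },
    { NAME => "adam",  AGE => 30 },
    { NAME => "Bob",   AGE => 41 },
);

my $coll = Unicode::Collate->new();
$_->{NAME_key} = $coll->getSortKey($_->{NAME}) for @recs;

my @srecs = sort {
    $b->{AGE} <=> $a->{AGE}
        ||
    $a->{NAME_key} cmp $b->{NAME_key}
} @recs;
my @names = map { $_->{NAME} } @srecs;
```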
1036	1054
1037	1055	=head2 ℞ 39: Case- I<and> accent-insensitive comparisons
1038	1056
1039	1057	(℞ 39: 大文字小文字 I<および> アクセントを無視した比較)
1040	1058
1041	1059	=begin original
1042	1060
1043	1061	Use a collator object to compare Unicode text by character
1044	1062	instead of by codepoint.
1045	1063
1046	1064	=end original
1047	1065
1048	1066	照合オブジェクトを使用して、Unicode テキストを符号位置ではなく
1049	1067	文字で比較します。
1050	1068
1051	1069	use Unicode::Collate;
1052	1070	my $es = Unicode::Collate->new(
1053	1071	level => 1,
1054	1072	normalization => undef
1055	1073	);
1056	1074
1057	1075	# now both are true:
1058	1076	$es->eq("García", "GARCIA" );
1059	1077	$es->eq("Márquez", "MARQUEZ");
1060	1078
1061	1079	=head2 ℞ 40: Case- I<and> accent-insensitive locale comparisons
1062	1080
1063	1081	(℞ 40: 大文字小文字 I<および> アクセントを無視したロケールでの比較)
1064	1082
1065	1083	=begin original
1066	1084
1067	1085	Same, but in a specific locale.
1068	1086
1069	1087	=end original
1070	1088
1071	1089	同じですが、特定のロケールで比較します。
1072	1090
1073	1091	my $de = Unicode::Collate::Locale->new(
1074	1092	locale => "de__phonebook", level => 1,
1075	1093	);
1076	1094
1077	1095	# now this is true:
1078	1096	$de->eq("tschüß", "TSCHUESS"); # notice ü => UE, ß => SS
1079	1097
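The sketch below is an illustration, not part of the recipe. It compares the `de__phonebook` tailoring with the untailored collation at the same strength. It passes `level => 1`, as ℞ 39 does, because at the default level `eq` also weighs case and accent differences.

With the phonebook tailoring, `ü` counts as `ue` at the primary level. So `Müller` should match `MUELLER` but not `MULLER`. The root collation treats `ü` as an accented `u` and should give the opposite answers.

```perl
#!/usr/bin/env perl
use v5.36;
use utf8;
use open qw(:std :encoding(UTF-8));
use Unicode::Collate;           # core since v5.8
use Unicode::Collate::Locale;   # core since v5.14

# German phonebook tailoring; level => 1 ignores case and accents.
my $de = Unicode::Collate::Locale->new(
    locale => "de__phonebook",
    level  => 1,
);

# Untailored collator at the same strength, for comparison.
my $root = Unicode::Collate->new(level => 1);

for my $other ("MULLER", "MUELLER") {
    say "Müller vs $other: phonebook ",
        ($de->eq("Müller", $other)   ? "equal" : "different"),
        ", root ",
        ($root->eq("Müller", $other) ? "equal" : "different");
}
```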
1080	1098	=head2 ℞ 41: Unicode linebreaking
1081	1099
1082	1100	(℞ 41: Unicode の改行)
1083	1101
1084	1102	=begin original
1085	1103
1086	1104	Break up text into lines according to Unicode rules.
1087	1105
1088	1106	=end original
1089	1107
1090	1108	Unicode 規則に従ってテキストを行に分割します。
1091	1109
1092	1110	# cpan -i Unicode::LineBreak
1093	1111	use Unicode::LineBreak;
1094	1112	use charnames qw(:full);
1095	1113
1096	1114	my $para = "This is a super\N{HYPHEN}long string. " x 20;
1097	1115	my $fmt = Unicode::LineBreak->new;
1098	1116	print $fmt->break($para), "\n";
1099	1117
1100	1118	=head2 ℞ 42: Unicode text in DBM hashes, the tedious way
1101	1119
1102	1120	(℞ 42: DBM ハッシュの中の Unicode テキスト、退屈な方法)
1103	1121
1104	1122	=begin original
1105	1123
1106	1124	Using a regular Perl string as a key or value for a DBM
1107	1125	hash will trigger a wide character exception if any codepoints
1108	1126	won’t fit into a byte. Here’s how to manually manage the translation:
1109	1127
1110	1128	=end original
1111	1129
1112	1130	DBM ハッシュのキーまたは値として通常の Perl 文字列を使用すると、
1113	1131	1 バイトに収まらない符号位置が含まれているとワイド文字例外が発生します。
1114	1132	次に、手動で変換を管理する方法を示します:
1115	1133
1116	1134	use DB_File;
1117	1135	use Encode qw(encode decode);
1118	1136	tie %dbhash, "DB_File", "pathname";
1119	1137
1120	1138	# STORE
1121	1139
1122	1140	# assume $uni_key and $uni_value are abstract Unicode strings
1123	1141	my $enc_key = encode("UTF-8", $uni_key, 1);
1124	1142	my $enc_value = encode("UTF-8", $uni_value, 1);
1125	1143	$dbhash{$enc_key} = $enc_value;
1126	1144
1127	1145	# FETCH
1128	1146
1129	1147	# assume $uni_key holds a normal Perl string (abstract Unicode)
1130	1148	my $enc_key = encode("UTF-8", $uni_key, 1);
1131	1149	my $enc_value = $dbhash{$enc_key};
1132	1150	my $uni_value = decode("UTF-8", $enc_value, 1);
1133	1151
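Below is a runnable version of the ℞ 42 round trip. It assumes `SDBM_File`, which ships with Perl, in place of `DB_File`, which needs Berkeley DB. The temporary path and sample strings are hypothetical.

It also passes `LEAVE_SRC` along with `FB_CROAK`. According to the Encode documentation, when a CHECK argument is given without `LEAVE_SRC`, `encode` and `decode` overwrite their source string in place. That can leave `$uni_key` empty after the first call.

```perl
#!/usr/bin/env perl
use v5.36;
use utf8;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # core DBM, standing in for DB_File
use File::Temp qw(tempdir);
use File::Spec;
use Encode qw(encode decode FB_CROAK LEAVE_SRC);

# Hypothetical scratch database in a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, "demo");

tie my %dbhash, "SDBM_File", $path, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $path: $!";

# Hypothetical sample data.
my $uni_key   = "clé";
my $uni_value = "café ☕";

# FB_CROAK dies on bad data; LEAVE_SRC keeps the input strings intact.
my $check = FB_CROAK | LEAVE_SRC;

# STORE: characters -> UTF-8 bytes for both key and value.
$dbhash{ encode("UTF-8", $uni_key, $check) } = encode("UTF-8", $uni_value, $check);

# FETCH: encode the lookup key the same way, then decode the stored bytes.
my $enc_value = $dbhash{ encode("UTF-8", $uni_key, $check) };
my $fetched   = decode("UTF-8", $enc_value, $check);

untie %dbhash;

say $fetched eq $uni_value ? "round trip ok" : "round trip FAILED";
```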
1134	1152	=head2 ℞ 43: Unicode text in DBM hashes, the easy way
1135	1153
1136	1154	(℞ 43: DBM ハッシュの中の Unicode テキスト、簡単な方法)
1137	1155
1138	1156	=begin original
1139	1157
1140	1158	Here’s how to implicitly manage the translation; all encoding
1141	1159	and decoding is done automatically, just as with streams that
1142	1160	have a particular encoding attached to them:
1143	1161
1144	1162	=end original
1145	1163
1146	1164	次に、変換を暗黙的に管理する方法を示します;
1147	1165	すべてのエンコードとデコードは、特定のエンコーディングが付加された
1148	1166	ストリームと同じように自動的に行われます:
1149	1167
1150	1168	use DB_File;
1151	1169	use DBM_Filter;
1152	1170
1153	1171	my $dbobj = tie %dbhash, "DB_File", "pathname";
1154	1172	$dbobj->Filter_Push("utf8"); # this is the magic bit
1155	1173
1156	1174	# STORE
1157	1175
1158	1176	# assume $uni_key and $uni_value are abstract Unicode strings
1159	1177	$dbhash{$uni_key} = $uni_value;
1160	1178
1161	1179	# FETCH
1162	1180
1163	1181	# $uni_key holds a normal Perl string (abstract Unicode)
1164	1182	my $uni_value = $dbhash{$uni_key};
1165	1183
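Here is the same round trip using the filter approach of ℞ 43. Again, `SDBM_File` stands in for `DB_File` and the data is hypothetical.

The sketch calls `Filter_Push`, which DBM_Filter documents as applying a filter to both keys and values; `Filter_Value_Push` covers values only. Because this recipe also stores Unicode keys, both sides need the `utf8` filter.

```perl
#!/usr/bin/env perl
use v5.36;
use utf8;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # core DBM, standing in for DB_File
use DBM_Filter;                 # core; provides the "utf8" filter
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical scratch database in a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, "demo");

my $dbobj = tie my %dbhash, "SDBM_File", $path, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $path: $!";

# Encode keys and values to UTF-8 on STORE, decode them on FETCH.
$dbobj->Filter_Push("utf8");

my $uni_key   = "clé";          # hypothetical sample data
my $uni_value = "café ☕";

$dbhash{$uni_key} = $uni_value;     # no explicit encode() needed
my $fetched = $dbhash{$uni_key};    # comes back as characters

undef $dbobj;    # drop the extra reference so untie does not warn
untie %dbhash;

say $fetched eq $uni_value ? "round trip ok" : "round trip FAILED";
```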
1166	1184	=head2 ℞ 44: PROGRAM: Demo of Unicode collation and printing
1167	1185
1168	1186	(℞ 44: プログラム: Unicode の照合と表示のデモ)
1169	1187
1170	1188	=begin original
1171	1189
1172	1190	Here’s a full program showing how to make use of locale-sensitive
1173	1191	sorting, Unicode casing, and managing print widths when some of the
1174	1192	characters take up zero or two columns, not just one column each time.
1175	1193	When run, the following program produces this nicely aligned output:
1176	1194
1177	1195	=end original
1178	1196
1179	1197	以下の完全なプログラムは、ロケールを考慮したソート、
1180	1198	Unicode の大文字小文字変換、そして一部の文字が 1 桁ではなく 0 桁または 2 桁を
1181	1199	占める場合の表示幅の管理を、どのように活用するかを示しています。
1182	1200	このプログラムを実行すると、次のようにきれいに整列した出力が生成されます:
1183	1201
1184	1202	Crème Brûlée....... €2.00
1185	1203	Éclair............. €1.60
1186	1204	Fideuà............. €4.20
1187	1205	Hamburger.......... €6.00
1188	1206	Jamón Serrano...... €4.45
1189	1207	Linguiça........... €7.00
1190	1208	Pâté............... €4.15
1191	1209	Pears.............. €2.00
1192	1210	Pêches............. €2.25
1193	1211	Smørbrød........... €5.75
1194	1212	Spätzle............ €5.50
1195	1213	Xoriço............. €3.00
1196	1214	Γύρος.............. €6.50
1197	1215	막걸리............. €4.00
1198	1216	おもち............. €2.65
1199	1217	お好み焼き......... €8.00
1200	1218	シュークリーム..... €1.85
1201	1219	寿司............... €9.99
1202	1220	包子............... €7.50
1203	1221
1204	1222	=begin original
1205	1223
1206		Here's that program~~; tested on v5.14~~.
	1224	Here's that program.
1207	1225
1208	1226	=end original
1209	1227
1210		これがプログラムです~~; v5.14 でテストされています~~。
	1228	これがプログラムです。
1211	1229
1212	1230	#!/usr/bin/env perl
1213	1231	# umenu - demo sorting and printing of Unicode food
1214	1232	#
1215	1233	# (obligatory and increasingly long preamble)
1216	1234	#
	1235	use v5.36;
1217	1236	use utf8;
1218		use v5.14; # for locale sorting
1219		use strict;
1220		use warnings;
1221	1237	use warnings qw(FATAL utf8); # fatalize encoding faults
1222		use open qw(:std :~~utf~~8); # undeclared streams in UTF-8
	1238	use open qw(:std :encoding(UTF-8)); # undeclared streams in UTF-8
1223	1239	use charnames qw(:full :short); # unneeded in v5.16
1224	1240
1225	1241	# std modules
1226	1242	use Unicode::Normalize; # std perl distro as of v5.8
1227	1243	use List::Util qw(max); # std perl distro as of v5.10
1228	1244	use Unicode::Collate::Locale; # std perl distro as of v5.14
1229	1245
1230	1246	# cpan modules
1231	1247	use Unicode::GCString; # from CPAN
1232	1248
1233		# forward defs
1234		sub pad($$$);
1235		sub colwidth(_);
1236		sub entitle(_);
1237
1238	1249	my %price = (
1239	1250	"γύρος" => 6.50, # gyros
1240	1251	"pears" => 2.00, # like um, pears
1241	1252	"linguiça" => 7.00, # spicy sausage, Portuguese
1242	1253	"xoriço" => 3.00, # chorizo sausage, Catalan
1243	1254	"hamburger" => 6.00, # burgermeister meisterburger
1244	1255	"éclair" => 1.60, # dessert, French
1245	1256	"smørbrød" => 5.75, # sandwiches, Norwegian
1246	1257	"spätzle" => 5.50, # Bayerisch noodles, little sparrows
1247	1258	"包子" => 7.50, # bao1 zi5, steamed pork buns, Mandarin
1248	1259	"jamón serrano" => 4.45, # country ham, Spanish
1249	1260	"pêches" => 2.25, # peaches, French
1250	1261	"シュークリーム" => 1.85, # cream-filled pastry like eclair
1251	1262	"막걸리" => 4.00, # makgeolli, Korean rice wine
1252	1263	"寿司" => 9.99, # sushi, Japanese
1253	1264	"おもち" => 2.65, # omochi, rice cakes, Japanese
1254	1265	"crème brûlée" => 2.00, # crema catalana
1255	1266	"fideuà" => 4.20, # more noodles, Valencian
1256	1267	# (Catalan=fideuada)
1257	1268	"pâté" => 4.15, # gooseliver paste, French
1258	1269	"お好み焼き" => 8.00, # okonomiyaki, Japanese
1259	1270	);
1260	1271
1261		my $width = 5 + max map { colwidth } keys %price;
	1272	my $width = 5 + max map { colwidth($_) } keys %price;
1262	1273
1263	1274	# So the Asian stuff comes out in an order that someone
1264	1275	# who reads those scripts won't freak out over; the
1265	1276	# CJK stuff will be in JIS X 0208 order that way.
1266	1277	my $coll = Unicode::Collate::Locale->new(locale => "ja");
1267	1278
1268	1279	for my $item ($coll->sort(keys %price)) {
1269	1280	print pad(entitle($item), $width, ".");
1270	1281	printf " €%.2f\n", $price{$item};
1271	1282	}
1272	1283
1273		sub pad($$$) {
	1284	sub pad ($str, $width, $padchar) {
1274		my($str, $width, $padchar) = @_;
1275	1285	return $str . ($padchar x ($width - colwidth($str)));
1276	1286	}
1277	1287
1278		sub colwidth(_) {
	1288	sub colwidth ($str) {
1279		my($str) = @_;
1280	1289	return Unicode::GCString->new($str)->columns;
1281	1290	}
1282	1291
1283		sub entitle(_) {
	1292	sub entitle ($str) {
1284		my($str) = @_;
1285	1293	$str =~ s{ (?=\pL)(\S) (\S*) }
1286	1294	{ ucfirst($1) . lc($2) }xge;
1287	1295	return $str;
1288	1296	}
1289	1297
1290	1298	=head1 SEE ALSO
1291	1299
1292	1300	=begin original
1293	1301
1294	1302	See these manpages, some of which are CPAN modules:
1295	1303	L<perlunicode>, L<perluniprops>,
1296	1304	L<perlre>, L<perlrecharclass>,
1297	1305	L<perluniintro>, L<perlunitut>, L<perlunifaq>,
1298	1306	L<PerlIO>, L<DB_File>, L<DBM_Filter>, L<DBM_Filter::utf8>,
1299	1307	L<Encode>, L<Encode::Locale>,
1300	1308	L<Unicode::UCD>,
1301	1309	L<Unicode::Normalize>,
1302	1310	L<Unicode::GCString>, L<Unicode::LineBreak>,
1303	1311	L<Unicode::Collate>, L<Unicode::Collate::Locale>,
1304	1312	L<Unicode::Unihan>,
1305	1313	L<Unicode::CaseFold>,
1306	1314	L<Unicode::Tussle>,
1307	1315	L<Lingua::JA::Romanize::Japanese>,
1308	1316	L<Lingua::ZH::Romanize::Pinyin>,
1309	1317	L<Lingua::KO::Romanize::Hangul>.
1310	1318
1311	1319	=end original
1312	1320
1313	1321	以下の man ページを参照してください; 一部は CPAN モジュールのものです:
1314	1322	L<perlunicode>, L<perluniprops>,
1315	1323	L<perlre>, L<perlrecharclass>,
1316	1324	L<perluniintro>, L<perlunitut>, L<perlunifaq>,
1317	1325	L<PerlIO>, L<DB_File>, L<DBM_Filter>, L<DBM_Filter::utf8>,
1318	1326	L<Encode>, L<Encode::Locale>,
1319	1327	L<Unicode::UCD>,
1320	1328	L<Unicode::Normalize>,
1321	1329	L<Unicode::GCString>, L<Unicode::LineBreak>,
1322	1330	L<Unicode::Collate>, L<Unicode::Collate::Locale>,
1323	1331	L<Unicode::Unihan>,
1324	1332	L<Unicode::CaseFold>,
1325	1333	L<Unicode::Tussle>,
1326	1334	L<Lingua::JA::Romanize::Japanese>,
1327	1335	L<Lingua::ZH::Romanize::Pinyin>,
1328	1336	L<Lingua::KO::Romanize::Hangul>.
1329	1337
1330	1338	=begin original
1331	1339
1332	1340	The L<Unicode::Tussle> CPAN module includes many programs
1333	1341	to help with working with Unicode, including
1334	1342	these programs to fully or partly replace standard utilities:
1335	1343	I<tcgrep> instead of I<egrep>,
1336	1344	I<uniquote> instead of I<cat -v> or I<hexdump>,
1337	1345	I<uniwc> instead of I<wc>,
1338	1346	I<unilook> instead of I<look>,
1339	1347	I<unifmt> instead of I<fmt>,
1340	1348	and
1341	1349	I<ucsort> instead of I<sort>.
1342	1350	For exploring Unicode character names and character properties,
1343	1351	see its I<uniprops>, I<unichars>, and I<uninames> programs.
1344	1352	It also supplies these programs, all of which are general filters that do Unicode-y things:
1345	1353	I<unititle> and I<unicaps>;
1346	1354	I<uniwide> and I<uninarrow>;
1347	1355	I<unisupers> and I<unisubs>;
1348	1356	I<nfd>, I<nfc>, I<nfkd>, and I<nfkc>;
1349	1357	and I<uc>, I<lc>, and I<tc>.
1350	1358
1351	1359	=end original
1352	1360
1353	1361	L<Unicode::Tussle> CPAN モジュールには、Unicode を扱うための多くの
1354	1362	プログラムが含まれています;
1355	1363	これらのプログラムは、標準ユーティリティを完全にまたは部分的に
1356	1364	置き換えるためのものです:
1357	1365	I<egrep> の代わりに I<tcgrep>、
1358	1366	I<cat -v> または I<hexdump> の代わりに I<uniquote>、
1359	1367	I<wc> の代わりに I<uniwc>、
1360	1368	I<look> の代わりに I<unilook>、
1361	1369	I<fmt> の代わりに I<unifmt>、
1362	1370	I<sort> の代わりに I<ucsort>。
1363	1371	Unicode 文字名と文字特性を調べるには、I<uniprops>、I<unichars>、
1364	1372	I<uninames> プログラムを参照してください。
1365	1373	また、これらのプログラムも提供しています。
1366	1374	これらはすべて Unicode 対応の一般的なフィルタです:
1367	1375	I<unititle> と I<unicaps>、
1368	1376	I<uniwide> と I<uninarrow>、
1369	1377	I<unisupers> と I<unisubs>、
1370	1378	I<nfd>、I<nfc>、I<nfkd>、I<nfkc>、
1371	1379	I<uc>、I<lc>、I<tc>。
1372	1380
1373	1381	=begin original
1374	1382
1375	1383	Finally, see the published Unicode Standard (page numbers are from version
1376	1384	6.0.0), including these specific annexes and technical reports:
1377	1385
1378	1386	=end original
1379	1387
1380	1388	最後に、これらの特定の付属文書および技術報告書を含む、公開された
1381	1389	Unicode 標準 (ページ番号はバージョン 6.0.0 のもの) を参照してください。
1382	1390
1383	1391	=over
1384	1392
1385	1393	=item §3.13 Default Case Algorithms, page 113;
1386	1394	§4.2 Case, pages 120–122;
1387	1395	Case Mappings, pages 166–172, especially Caseless Matching starting on page 170.
1388	1396
1389	1397	=item UAX #44: Unicode Character Database
1390	1398
1391	1399	=item UTS #18: Unicode Regular Expressions
1392	1400
1393	1401	=item UAX #15: Unicode Normalization Forms
1394	1402
1395	1403	=item UTS #10: Unicode Collation Algorithm
1396	1404
1397	1405	=item UAX #29: Unicode Text Segmentation
1398	1406
1399	1407	=item UAX #14: Unicode Line Breaking Algorithm
1400	1408
1401	1409	=item UAX #11: East Asian Width
1402	1410
1403	1411	=back
1404	1412
1405	1413	=head1 AUTHOR
1406	1414
1407	1415	=begin original
1408	1416
1409	1417	Tom Christiansen E<lt>tchrist@perl.comE<gt> wrote this, with occasional
1410	1418	kibitzing from Larry Wall and Jeffrey Friedl in the background.
1411	1419
1412	1420	=end original
1413	1421
1414	1422	Tom Christiansen E<lt>tchrist@perl.comE<gt> が、
1415	1423	時々 Larry Wall と Jeffrey Friedl に後ろから口出しされながら書きました。
1416	1424
1417	1425	=head1 COPYRIGHT AND LICENCE
1418	1426
1419	1427	Copyright © 2012 Tom Christiansen.
1420	1428
1421	1429	This program is free software; you may redistribute it and/or modify it
1422	1430	under the same terms as Perl itself.
1423	1431
1424	1432	=begin original
1425	1433
1426	1434	Most of these examples are taken from the current edition of the “Camel Book”;
1427	1435	that is, from the 4ᵗʰ Edition of I<Programming Perl>, Copyright © 2012 Tom
1428	1436	Christiansen <et al.>, 2012-02-13 by O’Reilly Media. The code itself is
1429	1437	freely redistributable, and you are encouraged to transplant, fold,
1430	1438	spindle, and mutilate any of the examples in this manpage however you please
1431	1439	for inclusion into your own programs without any encumbrance whatsoever.
1432	1440	Acknowledgement via code comment is polite but not required.
1433	1441
1434	1442	=end original
1435	1443
1436	1444	これらの例のほとんどは、"Camel Book"の現在の版から引用されています:
1437	1445	すなわち、I<Programming Perl> 第 4 版, Copyright © 2012 Tom
1438	1446	Christiansen <et al.>, 2012-02-13 by O'Reilly Media。
1439	1447	コード自体は自由に再配布可能です。この man ページのどの例も、
1440	1448	何の制約もなく自分のプログラムに取り込むために、好きなように移植したり、
1441	1449	折り曲げたり、穴を空けたり、切り刻んだりすることが推奨されています。
1442	1450	コードコメントで謝辞を記すのは礼儀にかなっていますが、必須ではありません。
1443	1451
1444	1452	=head1 REVISION HISTORY
1445	1453
1446	1454	=begin original
1447	1455
1448	1456	v1.0.0 – first public release, 2012-02-27
1449	1457
1450	1458	=end original
1451	1459
1452	1460	v1.0.0 - 最初の一般公開、2012-02-27
1453	1461
1454	1462	=begin meta
1455	1463
1456	1464	Translate: SHIRAKATA Kentaro <argrath@ub32.org>
1457	1465	Status: completed
1458	1466
1459	1467	=end meta

Powered by Amon2, 翻訳, サイト. Operated by Japan Perl Association