Differences between perlunifaq 5.14.1 and 5.20.1

1	1
2	2	=encoding euc-jp
3	3
4	4	=head1 NAME
5	5
6	6	=begin original
7	7
8	8	perlunifaq - Perl Unicode FAQ
9	9
10	10	=end original
11	11
12	12	perlunifaq - Perl Unicode FAQ
13	13
14	14	=head1 Q and A
15	15
16	16	=begin original
17	17
18	18	This is a list of questions and answers about Unicode in Perl, intended to be
19	19	read after L<perlunitut>.
20	20
21	21	=end original
22	22
23	23	これは、L<perlunitut> の後で読むことを想定した、Perl での Unicode に関する
24	24	質問と答えの一覧です。
25	25
26	26	=head2 perlunitut isn't really a Unicode tutorial, is it?
27	27
28	28	(perlunitut は実際には Unicode チュートリアルじゃないんじゃないの?)
29	29
30	30	=begin original
31	31
32	32	No, and this isn't really a Unicode FAQ.
33	33
34	34	=end original
35	35
36	36	はい、違います; そしてこれは実際には Unicode FAQ ではありません。
37	37
38	38	=begin original
39	39
40	40	Perl has an abstracted interface for all supported character encodings, so this
41	41	is actually a generic C<Encode> tutorial and C<Encode> FAQ. But many people
42	42	think that Unicode is special and magical, and I didn't want to disappoint
43	43	them, so I decided to call the document a Unicode tutorial.
44	44
45	45	=end original
46	46
47	47	Perl は対応している全ての文字エンコーディングへの抽象インターフェースを
48	48	持っているので、実際には汎用の C<Encode> チュートリアルと
49	49	C<Encode> FAQ です。
50	50	しかし、多くの人々が、Unicode は特別でマジカルなものだと考えていて、
51	51	私は彼らを失望させたくなかったので、そのドキュメントを
52	52	Unicode チュートリアルと呼ぶことに決めました。
53	53
54	54	=head2 What character encodings does Perl support?
55	55
56	56	(Perl が対応している文字エンコーディングは何?)
57	57
58	58	=begin original
59	59
60	60	To find out which character encodings your Perl supports, run:
61	61
62	62	=end original
63	63
64	64	Perl がどの文字エンコーディングに対応しているかを見つけるには、以下を
65	65	実行してください:
66	66
67	67	perl -MEncode -le "print for Encode->encodings(':all')"
68	68
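As a small companion to the one-liner above, the following sketch checks whether particular encodings are available before a program relies on them. It uses C<Encode::find_encoding>, which returns an encoding object or C<undef>. The names in C<@wanted> (including the deliberately bogus C<no-such-encoding>) and the C<%supported> hash are illustrative assumptions, not part of the FAQ.

```perl
use strict;
use warnings;
use Encode ();

# Example names only; 'no-such-encoding' is deliberately bogus.
my @wanted = ('UTF-8', 'ISO-8859-1', 'euc-jp', 'no-such-encoding');

my %supported;
for my $name (@wanted) {
    # find_encoding returns an encoding object, or undef if the name is unknown.
    my $encoding = Encode::find_encoding($name);
    $supported{$name} = defined $encoding ? 1 : 0;
    printf "%-18s %s\n", $name,
        $supported{$name} ? 'supported' : 'not supported';
}
```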
69	69	=head2 Which version of perl should I use?
70	70
71	71	(どのバージョンの perl を使うべき?)
72	72
73	73	=begin original
74	74
75	75	Well, if you can, upgrade to the most recent, but certainly C<5.8.1> or newer.
76	76	The tutorial and FAQ assume the latest release.
77	77
78	78	=end original
79	79
80	80	うーん、もし可能なら、最新にアップグレードしてください; 但し、確実に
81	81	C<5.8.1> 以降にはしてください。
82	82	チュートリアルと FAQ は最新リリースを仮定しています。
83	83
84	84	=begin original
85	85
86	86	You should also check your modules, and upgrade them if necessary. For example,
87	87	HTML::Entities requires version >= 1.32 to function correctly, even though the
88	88	changelog is silent about this.
89	89
90	90	=end original
91	91
92	92	モジュールもチェックして、もし必要ならアップグレードするべきです。
93	93	例えば HTML::Entities は、changelog は何も触れていませんが、正しく
94	94	動作するためにはバージョン >= 1.32 が必要です。
95	95
96	96	=head2 What about binary data, like images?
97	97
98	98	(イメージのようなバイナリデータはどうするの?)
99	99
100	100	=begin original
101	101
102	102	Well, apart from a bare C<binmode $fh>, you shouldn't treat them specially.
103	103	(The binmode is needed because otherwise Perl may convert line endings on Win32
104	104	systems.)
105	105
106	106	=end original
107	107
108	108	うーん、生の C<binmode $fh> を別として、特別に扱う必要はないはずです。
109	109	(Win32 システムで Perl が行端を変更しないようにするために、binmode が
110	110	必要です。)
111	111
112	112	=begin original
113	113
114	114	Be careful, though, to never combine text strings with binary strings. If you
115	115	need text in a binary stream, encode your text strings first using the
116	116	appropriate encoding, then join them with binary strings. See also: "What if I
117	117	don't encode?".
118	118
119	119	=end original
120	120
121	121	但し、決してテキスト文字列とバイナリ文字列を結合しないように
122	122	注意してください。
123	123	もしバイナリストリームにテキストが必要なら、まずテキスト文字列を適切な
124	124	エンコーディングを使ってエンコードして、それをバイナリ文字列と
125	125	結合してください。
126	126	L<"What if I don't encode?"> も参照してください。
127	127
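A minimal sketch of the advice above: the text is encoded first, and only the resulting byte string is joined with binary data. The record layout (a 2-byte magic number followed by a 2-byte payload length) is a made-up example, not a real file format.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text    = "caf\x{e9}";               # text string: 4 characters
my $payload = encode('UTF-8', $text);    # byte string: 5 bytes

# Hypothetical record layout: 2-byte magic, 2-byte payload length, payload.
my $record = pack('n n', 0xCAFE, length $payload) . $payload;

printf "%d characters, %d payload bytes, %d record bytes\n",
    length $text, length $payload, length $record;
```

Because the length field is computed from the encoded payload, it counts bytes rather than characters, which is what a binary format needs.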
128	128	=head2 When should I decode or encode?
129	129
130	130	(デコードやエンコードはいつ行うべき?)
131	131
132	132	=begin original
133	133
134	134	Whenever you're communicating text with anything that is external to your perl
135	135	process, like a database, a text file, a socket, or another program. Even if
136	136	the thing you're communicating with is also written in Perl.
137	137
138	138	=end original
139	139
140	140	データベース、テキストファイル、ソケット、他のプログラムといった、自分の
141	141	perl プロセスの外側にある何かとテキストを通信するときはいつでも、です。
142	142	通信の相手が Perl で書かれている場合も同じです。
143	143
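The usual shape of this rule is "decode on the way in, work on text, encode on the way out". The sketch below assumes the outside world speaks UTF-8; C<shout_bytes> is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes UTF-8 bytes from "outside", returns UTF-8 bytes.
sub shout_bytes {
    my ($input_bytes) = @_;
    my $text   = decode('UTF-8', $input_bytes);   # bytes -> text at the boundary
    my $result = uc $text;                        # work on characters
    return encode('UTF-8', $result);              # text -> bytes on the way out
}

my $output_bytes = shout_bytes("caf\xC3\xA9");
printf "%d bytes returned\n", length $output_bytes;
```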
144	144	=head2 What if I don't decode?
145	145
146	146	(デコードしないとどうなるの?)
147	147
148	148	=begin original
149	149
150	150	Whenever your encoded, binary string is used together with a text string, Perl
151	151	will assume that your binary string was encoded with ISO-8859-1, also known as
152	152	latin-1. If it wasn't latin-1, then your data is unpleasantly converted. For
153	153	example, if it was UTF-8, the individual bytes of multibyte characters are seen
154	154	as separate characters, and then again converted to UTF-8. Such double encoding
155	155	can be compared to double HTML encoding (C<&gt;>), or double URI encoding
156	156	(C<%253E>).
157	157
158	158	=end original
159	159
160	160	エンコードされたバイナリ文字列をテキスト文字列と一緒に使ったときはいつでも、
161	161	Perl はバイナリ文字列が ISO-8859-1 またの名を latin-1 と仮定します。
162	162	もしこれが latin-1 でなかった場合、データは不愉快な形に変換されます。
163	163	例えば、もしデータが UTF-8 だった場合、マルチバイト文字のそれぞれのバイトが
164	164	文字として扱われ、それから再び UTF-8 に変換されます。
165	165	このような二重エンコードは二重 HTML エンコーディング (C<&gt;>) や
166	166	二重 URI エンコーディング (C<%253E>) と比較できます。
167	167
168	168	=begin original
169	169
170	170	This silent implicit decoding is known as "upgrading". That may sound
171	171	positive, but it's best to avoid it.
172	172
173	173	=end original
174	174
175	175	この、暗黙のうちに行われるデコードは「昇格」("upgrading")と呼ばれます。
176	176	これは前向きなことに聞こえるかもしれませんが、避けるのが最良です。
177	177
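The double encoding described above can be reproduced in a few lines. This is only a sketch: the variable names are illustrative, and UTF-8 is assumed as the external encoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $smiley     = "\x{263A}";    # text string (one character)
my $utf8_bytes = "\xC3\xA9";    # undecoded UTF-8 bytes for U+00E9

# Wrong: the two bytes are taken as two latin-1 characters.
my $mixed   = $smiley . $utf8_bytes;
my $doubled = encode('UTF-8', $mixed);

# Right: decode first, then combine.
my $clean   = $smiley . decode('UTF-8', $utf8_bytes);
my $correct = encode('UTF-8', $clean);

printf "mixed: %d chars, doubled output: %d bytes\n",
    length $mixed, length $doubled;
printf "clean: %d chars, correct output: %d bytes\n",
    length $clean, length $correct;
```

In the wrong version the bytes C<C3 A9> come out as C<C3 83 C2 A9>, the byte-level signature of double-encoded UTF-8.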
178	178	=head2 What if I don't encode?
179	179
180	180	(エンコードしないとどうなるの?)
181	181
182	182	=begin original
183	183
184	184	Your text string will be sent using the bytes in Perl's internal format. In
185	185	some cases, Perl will warn you that you're doing something wrong, with a
186	186	friendly warning:
187	187
188	188	=end original
189	189
190	190	テキスト文字列は Perl の内部形式のバイト列を使って送信されます。
191	191	いくつかの場合では、Perl は何かが間違っていることを、親切なメッセージで
192	192	警告します:
193	193
194	194	Wide character in print at example.pl line 2.
195	195
196	196	=begin original
197	197
198	198	Because the internal format is often UTF-8, these bugs are hard to spot,
199	199	because UTF-8 is usually the encoding you wanted! But don't be lazy, and don't
200	200	use the fact that Perl's internal format is UTF-8 to your advantage. Encode
201	201	explicitly to avoid weird bugs, and to show to maintenance programmers that you
202	202	thought this through.
203	203
204	204	=end original
205	205
206	206	内部形式はしばしば UTF-8 なので、このバグは発見しにくいです; なぜなら
207	207	あなたがほしいのは普通 UTF-8 だからです!
208	208	しかし、手を抜かないでください; そして Perl の内部形式が UTF-8 であることを
209	209	利用しようとしないでください。
210	210	奇妙なバグを防ぐため、そして保守プログラマに対してあなたが何を考えたかを
211	211	示すために、明示的にエンコードしてください。
212	212
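A minimal sketch of explicit encoding before output. An in-memory filehandle stands in for the socket or file here; that substitution, and the choice of UTF-8, are assumptions made for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "Gr\x{fc}\x{df}e \x{263A}\n";

# An in-memory filehandle stands in for a socket or file here.
open my $fh, '>', \my $sent or die "Cannot open in-memory handle: $!";
binmode $fh;
print {$fh} encode('UTF-8', $text);    # explicit: we chose the encoding
close $fh or die "Cannot close in-memory handle: $!";

printf "%d characters became %d bytes\n", length $text, length $sent;
```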
213	213	=head2 Is there a way to automatically decode or encode?
214	214
215	215	(自動的にデコードやエンコードする方法はある?)
216	216
217	217	=begin original
218	218
219	219	If all data that comes from a certain handle is encoded in exactly the same
220	220	way, you can tell the PerlIO system to automatically decode everything, with
221	221	the C<encoding> layer. If you do this, you can't accidentally forget to decode
222	222	or encode anymore, on things that use the layered handle.
223	223
224	224	=end original
225	225
226	226	もし、あるハンドルから来る全てのデータが正確に同じ方法で
227	227	エンコードされているなら、C<encoding> 層を使って、 PerlIO システムに自動的に
228	228	全てをデコードするように伝えることができます。
229	229	これを行えば、この層のハンドルを使っている限り、うっかりデコードや
230	230	エンコードを忘れることはありません。
231	231
232	232	=begin original
233	233
234	234	You can provide this layer when C<open>ing the file:
235	235
236	236	=end original
237	237
238	238	ファイルを C<open> するときにこの層を指定することができます:
239	239
240	240	open my $fh, '>:encoding(UTF-8)', $filename; # auto encoding on write
241	241	open my $fh, '<:encoding(UTF-8)', $filename; # auto decoding on read
242	242
243	243	=begin original
244	244
245	245	Or if you already have an open filehandle:
246	246
247	247	=end original
248	248
249	249	あるいは既にオープンしているファイルハンドルがあるなら:
250	250
251	251	binmode $fh, ':encoding(UTF-8)';
252	252
253	253	=begin original
254	254
255	255	Some database drivers for DBI can also automatically encode and decode, but
256	256	that is sometimes limited to the UTF-8 encoding.
257	257
258	258	=end original
259	259
260	260	DBI のデータベースドライバのいくつかも、エンコードとデコードを自動的に
261	261	行いますが、ときどきこれは UTF-8 エンコーディングに制限されています。
262	262
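The following sketch puts the two C<open> forms together in a round trip through a temporary file. C<File::Temp> is used only to get a scratch file name; the sample text is arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $text = "na\x{ef}ve \x{263A}\n";

# Temporary file used only for this demonstration.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

open my $out, '>:encoding(UTF-8)', $filename
    or die "Cannot write $filename: $!";
print {$out} $text;                     # encoded by the layer
close $out or die "Cannot close $filename: $!";

open my $in, '<:encoding(UTF-8)', $filename
    or die "Cannot read $filename: $!";
my $line = <$in>;                       # decoded by the layer
close $in;

printf "%d characters read back, %d bytes on disk\n",
    length $line, -s $filename;
```

Neither C<encode> nor C<decode> appears in the code: the layers on the two handles do both conversions.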
263	263	=head2 What if I don't know which encoding was used?
264	264
265	265	(どのエンコーディングが使われているかわからないときは?)
266	266
267	267	=begin original
268	268
269	269	Do whatever you can to find out, and if you have to: guess. (Don't forget to
270	270	document your guess with a comment.)
271	271
272	272	=end original
273	273
274	274	なんとかして見つけるか、もし必要なら、推測してください。
275	275	(どう推測したかをコメントとして文書化するのを忘れないでください。)
276	276
277	277	=begin original
278	278
279	279	You could open the document in a web browser, and change the character set or
280	280	character encoding until you can visually confirm that all characters look the
281	281	way they should.
282	282
283	283	=end original
284	284
285	285	ドキュメントを web ブラウザで開いて、全ての文字があるべき形であることを
286	286	視覚的に確認できるまで文字集合や文字エンコーディングを変更する方法も
287	287	あります。
288	288
289	289	=begin original
290	290
291	291	There is no way to reliably detect the encoding automatically, so if people
292	292	keep sending you data without charset indication, you may have to educate them.
293	293
294	294	=end original
295	295
296	296	エンコーディングを自動的に検出するための信頼性のある方法はないので、
297	297	もし人々があなたに文字集合の指示なしにデータを送り続けるなら、彼らを
298	298	教育する必要があるかもしれません。
299	299
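If guessing is unavoidable, one common heuristic (an assumption of this sketch, not something the FAQ prescribes) is to try strict UTF-8 first and fall back to ISO-8859-1, in which every byte sequence is valid. C<decode_with_guess> is a hypothetical helper name; note how the guess is documented in a comment, as recommended above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns (text, name of the encoding that was assumed).
sub decode_with_guess {
    my ($bytes) = @_;

    # Work on a copy, because decode() with a CHECK argument may modify its input.
    my $copy = $bytes;
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return ($text, 'UTF-8') if defined $text;

    # GUESS: not valid UTF-8, so assume ISO-8859-1 (every byte is valid there).
    return (decode('ISO-8859-1', $bytes), 'ISO-8859-1');
}

my ($text_a, $guess_a) = decode_with_guess("caf\xC3\xA9");
my ($text_b, $guess_b) = decode_with_guess("caf\xE9");
print "first input treated as $guess_a, second as $guess_b\n";
```

The fallback can still be wrong (the data might be Windows-1252 or EUC-JP, for example), which is why this remains a guess rather than detection.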
300	300	=head2 Can I use Unicode in my Perl sources?
301	301
302	302	(Perl のソースコードに Unicode は使える?)
303	303
304	304	=begin original
305	305
306	306	Yes, you can! If your sources are UTF-8 encoded, you can indicate that with the
307	307	C<use utf8> pragma.
308	308
309	309	=end original
310	310
311	311	はい、できます!
312	312	ソースコードが UTF-8 でエンコードされているなら、C<use utf8> プラグマを
313	313	使ってそれを示すことができます。
314	314
315	315	use utf8;
316	316
317	317	=begin original
318	318
319	319	This doesn't do anything to your input, or to your output. It only influences
320	320	the way your sources are read. You can use Unicode in string literals, in
321	321	identifiers (but they still have to be "word characters" according to C<\w>),
322	322	and even in custom delimiters.
323	323
324	324	=end original
325	325
326	326	これは入出力に対しては何も行いません。
327	327	ソースを読み込む方法のみに影響を与えます。
328	328	文字列リテラル、識別子(しかし C<\w> に従った「単語文字」である必要が
329	329	あります)、そして独自デリミタにすら Unicode が使えます。
330	330
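A small sketch of what the pragma changes, assuming the source file itself is saved as UTF-8. The same literal is measured once inside the scope of C<use utf8> and once outside it.

```perl
use strict;
use warnings;

# Assumption: this source file is saved as UTF-8.
my ($chars_with_pragma, $units_without_pragma);
{
    use utf8;                  # the literal below is decoded from UTF-8
    my $text = "日本語";
    $chars_with_pragma = length $text;
}
{
    no utf8;                   # same literal, but seen as raw bytes
    my $raw = "日本語";
    $units_without_pragma = length $raw;
}

print "with use utf8: $chars_with_pragma, without: $units_without_pragma\n";
```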
331	331	=head2 Data::Dumper doesn't restore the UTF8 flag; is it broken?
332	332
333	333	(Data::Dumper は UTF8 フラグを復元しません; これは壊れてるの?)
334	334
335	335	=begin original
336	336
337	337	No, Data::Dumper's Unicode abilities are as they should be. There have been
338	338	some complaints that it should restore the UTF8 flag when the data is read
339	339	again with C<eval>. However, you should really not look at the flag, and
340	340	nothing indicates that Data::Dumper should break this rule.
341	341
342	342	=end original
343	343
344	344	いいえ、Data::Dumper の Unicode 能力は、あるべき形であります。
345	345	C<eval> で再びデータを読み込むとき、UTF8 フラグを復元するべきだという
346	346	苦情が来ることがあります。
347	347	しかし、実際にはフラグを見るべきではないですし、Data::Dumper がこの規則を
348	348	破るべきだということを示すものは何もありません。
349	349
350	350	=begin original
351	351
352	352	Here's what happens: when Perl reads in a string literal, it sticks to 8 bit
353	353	encoding as long as it can. (But perhaps originally it was internally encoded
354	354	as UTF-8, when you dumped it.) When it has to give that up because other
355	355	characters are added to the text string, it silently upgrades the string to
356	356	UTF-8.
357	357
358	358	=end original
359	359
360	360	起きているのは以下のようなことです: Perl が文字列リテラルを読み込むとき、
361	361	可能な限り長く 8 ビットエンコーディングにこだわります。
362	362	(しかしおそらく、これをダンプしたときには内部では UTF-8 でエンコード
363	363	されていました。)
364	364	それ以外の文字をテキスト文字列に追加するためにこれを諦めなければならない
365	365	とき、Perl は暗黙のうちに文字列を UTF-8 に昇格させます。
366	366
367	367	=begin original
368	368
369	369	If you properly encode your strings for output, none of this is of your
370	370	concern, and you can just C<eval> dumped data as always.
371	371
372	372	=end original
373	373
374	374	出力用の文字列を適切にエンコードしていれば、これについてあなたは何も
375	375	心配することはなく、いつも通りにダンプしたデータを C<eval> できます。
376	376
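A minimal round trip, to show that the text survives C<eval> whatever happens to the internal flag. Setting C<$Data::Dumper::Terse> is a convenience chosen for this sketch so that the dump is a bare expression; the sample string is arbitrary.

```perl
use strict;
use warnings;
use Data::Dumper;

my $original = "caf\x{e9} \x{263A}";

my $dumped = do {
    local $Data::Dumper::Terse = 1;    # emit a bare expression, no "$VAR1 ="
    Dumper($original);
};

my $restored = eval $dumped;
die "eval of dumped data failed: $@" if $@;

print "round trip preserved the text\n" if $restored eq $original;
```

The comparison uses C<eq>, which compares characters and does not look at the UTF8 flag.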
377	377	=head2 Why do regex character classes sometimes match only in the ASCII range?
378	378
379	379	(なぜ正規表現文字クラスは時々 ASCII の範囲にしかマッチしないの?)
380	380
381		=head2 Why do some characters not uppercase or lowercase correctly?
382
383		(なぜいくつかの文字は正しく大文字や小文字にならないの?)
384
385	381	=begin original
386	382
387	383	Starting in Perl 5.14 (and partially in Perl 5.12), just put a
388	384	C<use feature 'unicode_strings'> near the beginning of your program.
389	385	Within its lexical scope you shouldn't have this problem. It also is
390		automatically enabled under C<use feature ':5.12'> or ~~using C<-E> on the~~
	386	automatically enabled under C<use feature ':5.12'> or C<use v5.12> or
391		command line for Perl 5.12 or higher.
	387	using C<-E> on the command line for Perl 5.12 or higher.
392	388
393	389	=end original
394	390
395	391	Perl 5.14 から (そして部分的に Perl 5.12 から、) 単にプログラムの先頭付近に
396	392	C<use feature 'unicode_strings'> を書いてください。
397	393	このレキシカルスコープ内ではこの問題は発生しないはずです。
398		これはまた C<use feature ':5.12'> が有効か、Perl 5.12 ~~以降でコマンドラインで~~
	394	これはまた C<use feature ':5.12'> または C<use v5.12> が有効か、Perl 5.12
399		C<-E> を使っていると自動的に有効になります。
	395	以降でコマンドラインで C<-E> を使っていると自動的に有効になります。
400	396
401	397	=begin original
402	398
403	399	The rationale for requiring this is to not break older programs that
404	400	rely on the way things worked before Unicode came along. Those older
405	401	programs knew only about the ASCII character set, and so may not work
406	402	properly for additional characters. When a string is encoded in UTF-8,
407	403	Perl assumes that the program is prepared to deal with Unicode, but when
408		the string isn't, Perl assumes that only ASCII ~~(unless it is an EBCDIC~~
	404	the string isn't, Perl assumes that only ASCII
409		~~platform)~~ is wanted, and so those characters that are not ASCII
	405	is wanted, and so those characters that are not ASCII
410	406	characters aren't recognized as to what they would be in Unicode.
411	407	C<use feature 'unicode_strings'> tells Perl to treat all characters as
412	408	Unicode, whether the string is encoded in UTF-8 or not, thus avoiding
413	409	the problem.
414	410
415	411	=end original
416	412
417	413	これが必要な理論的根拠は、Unicode がやってくる前に動作する方法に
418	414	依存している古いプログラムを壊さないことです。
419	415	このような古いプログラムは ASCII 文字集合のみを知っているので、追加の
420	416	文字については正しく動作しないかも知れません。
421	417	文字列が UTF-8 でエンコードされている場合、Perl はプログラムが Unicode を
422	418	扱えるように準備されていると仮定しますが、文字列がそうでなかった場合、
423	419	Perl は ASCII のみが求められていると仮定するので、非 ASCII 文字は
424	420	Unicode での本来の意味では認識されません。
425	421	C<use feature 'unicode_strings'> は Perl に、文字列が UTF-8 で
426	422	エンコードされているかどうかにかかわらず全ての文字を Unicode として
427	423	扱うように知らせて、この問題を回避します。
428	424
429	425	=begin original
430	426
431	427	However, on earlier Perls, or if you pass strings to subroutines outside
432		the feature's scope, you can force Unicode ~~semantics~~ by changing the
	428	the feature's scope, you can force Unicode rules by changing the
433	429	encoding to UTF-8 by doing C<utf8::upgrade($string)>. This can be used
434	430	safely on any string, as it checks and does not change strings that have
435	431	already been upgraded.
436	432
437	433	=end original
438	434
439	435	しかし、以前の Perl であったり、この機能のスコープの外側のサブルーチンに
440	436	文字列を渡した場合、C<utf8::upgrade($string)> とすることでエンコーディングを
441		UTF-8 にすることで~~強制的に~~ Unicode の動作を使えます。
	437	UTF-8 にすることで Unicode の規則を強制できます。
442	438	これは既に昇格している文字列は変更しないので、どのような文字列に対しても
443	439	安全に用いることができます。
444	440
445	441	=begin original
446	442
447	443	For a more detailed discussion, see L<Unicode::Semantics> on CPAN.
448	444
449	445	=end original
450	446
451	447	さらなる詳細な議論については、CPAN の L<Unicode::Semantics> を
452	448	参照してください。
453	449
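The effect of the feature can be seen on a single character in the 128..255 range. This sketch assumes Perl 5.14 or later and no locale in effect; the hash keys C<upper> and C<is_word> are names invented for the example.

```perl
use strict;
use warnings;

my $char = "\x{e0}";    # LATIN SMALL LETTER A WITH GRAVE, stored as one byte

my (%without, %with, $upgraded_uc);
{
    no feature 'unicode_strings';
    %without = (upper => uc($char), is_word => ($char =~ /\w/ ? 1 : 0));
}
{
    use feature 'unicode_strings';
    %with = (upper => uc($char), is_word => ($char =~ /\w/ ? 1 : 0));
}
{
    no feature 'unicode_strings';
    my $copy = $char;
    utf8::upgrade($copy);        # same characters, UTF-8 internally
    $upgraded_uc = uc $copy;     # Unicode rules apply to the upgraded copy
}

printf "without feature: uc changed=%s, \\w=%d\n",
    ($without{upper} ne $char ? 'yes' : 'no'), $without{is_word};
printf "with feature:    uc changed=%s, \\w=%d\n",
    ($with{upper} ne $char ? 'yes' : 'no'), $with{is_word};
```

The third block corresponds to the C<utf8::upgrade> workaround for code outside the feature's scope.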
	450	=head2 Why do some characters not uppercase or lowercase correctly?
	451
	452	(なぜいくつかの文字は正しく大文字や小文字にならないの?)
	453
	454	=begin original
	455
	456	See the answer to the previous question.
	457
	458	=end original
	459
	460	前述の質問の答えを参照してください。
	461
454	462	=head2 How can I determine if a string is a text string or a binary string?
455	463
456	464	(文字列がテキスト文字列かバイナリ文字列かを決定するには?)
457	465
458	466	=begin original
459	467
460	468	You can't. Some use the UTF8 flag for this, but that's misuse, and makes well
461	469	behaved modules like Data::Dumper look bad. The flag is useless for this
462	470	purpose, because it's off when an 8 bit encoding (by default ISO-8859-1) is
463	471	used to store the string.
464	472
465	473	=end original
466	474
467	475	それはできません。
468	476	このために UTF8 フラグを使う人もいますが、これは誤用で、Data::Dumper のように
469	477	正しく振る舞うモジュールをおかしくします。
470	478	このフラグはこの目的のためには使えません; なぜなら文字列の保管に 8 ビット
471	479	エンコーディングが使われている場合 (デフォルトでは ISO-8859-1 です)、
472	480	オフだからです。
473	481
474	482	=begin original
475	483
476	484	This is something you, the programmer, have to keep track of; sorry. You could
477	485	consider adopting a kind of "Hungarian notation" to help with this.
478	486
479	487	=end original
480	488
481	489	これはプログラマであるあなたが把握しておかなければならないことです; ごめんなさい。
482	490	これを助けるために、「ハンガリアン記法」のようなものの採用を
483	491	検討することもできます。
484	492
485	493	=head2 How do I convert from encoding FOO to encoding BAR?
486	494
487	495	(エンコーディング FOO からエンコーディング BAR に変換するには?)
488	496
489	497	=begin original
490	498
491	499	By first converting the FOO-encoded byte string to a text string, and then the
492	500	text string to a BAR-encoded byte string:
493	501
494	502	=end original
495	503
496	504	まず FOO でエンコードされたバイト文字列をテキスト文字列に変換し、
497	505	それからテキスト文字列を BAR エンコードされたバイト文字列に変換します:
498	506
499	507	my $text_string = decode('FOO', $foo_string);
500	508	my $bar_string = encode('BAR', $text_string);
501	509
502	510	=begin original
503	511
504	512	or by skipping the text string part, and going directly from one binary
505	513	encoding to the other:
506	514
507	515	=end original
508	516
509	517	あるいは、テキスト文字列の部分を飛ばして、あるバイナリエンコーディングから
510	518	他のものへ直接変換します:
511	519
512	520	use Encode qw(from_to);
513	521	from_to($string, 'FOO', 'BAR'); # changes contents of $string
514	522
515	523	=begin original
516	524
517	525	or by letting automatic decoding and encoding do all the work:
518	526
519	527	=end original
520	528
521	529	あるいは、自動でデコードとエンコードをさせることで全ての作業を行います:
522	530
523	531	open my $foofh, '<:encoding(FOO)', 'example.foo.txt';
524	532	open my $barfh, '>:encoding(BAR)', 'example.bar.txt';
525	533	print { $barfh } $_ while <$foofh>;
526	534
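As a concrete instance of the first two approaches, here is a sketch with ISO-8859-1 standing in for FOO and UTF-8 for BAR. Those two encodings are just example choices.

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

my $latin1_bytes = "caf\xE9";                 # ISO-8859-1 encoded input

# Two explicit steps, via a text string.
my $text       = decode('ISO-8859-1', $latin1_bytes);
my $utf8_bytes = encode('UTF-8', $text);

# One step, in place, on a copy of the input.
my $in_place = $latin1_bytes;
from_to($in_place, 'ISO-8859-1', 'UTF-8');

printf "%d bytes in, %d bytes out\n",
    length $latin1_bytes, length $utf8_bytes;
```

Both routes yield the same bytes; the two-step form is handy when the program also needs the text string itself.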
527	535	=head2 What are C<decode_utf8> and C<encode_utf8>?
528	536
529	537	(C<decode_utf8> と C<encode_utf8> って何?)
530	538
531	539	=begin original
532	540
533	541	These are alternate syntaxes for C<decode('utf8', ...)> and C<encode('utf8',
534	542	...)>.
535	543
536	544	=end original
537	545
538	546	これらは C<decode('utf8', ...)> および C<encode('utf8', ...)> のもう一つの
539	547	文法です。
540	548
541	549	=head2 What is a "wide character"?
542	550
543	551	(「ワイド文字」って何?)
544	552
545	553	=begin original
546	554
547	555	This is a term used both for characters with an ordinal value greater than 127,
548	556	characters with an ordinal value greater than 255, or any character occupying
549	557	more than one byte, depending on the context.
550	558
551	559	=end original
552	560
553	561	これは文脈に依存して、127 より大きい序数を持つ文字、255 より大きい序数を
554	562	持つ文字、1 バイトで収まらない文字、のいずれかの意味で使われる用語です。
555	563
556	564	=begin original
557	565
558	566	The Perl warning "Wide character in ..." is caused by a character with an
559	567	ordinal value greater than 255. With no specified encoding layer, Perl tries to
560	568	fit things in ISO-8859-1 for backward compatibility reasons. When it can't, it
561	569	emits this warning (if warnings are enabled), and outputs UTF-8 encoded data
562	570	instead.
563	571
564	572	=end original
565	573
566	574	Perl の警告 "Wide character in ..." は 255 より大きい序数を持つ文字によって
567	575	引き起こされます。
568	576	エンコーディング層が指定されていない場合、Perl は過去互換性の理由によって
569	577	文字を ISO-8859-1 に合わせようとします。
570	578	これができないと、(警告が有効なら)この警告が出力され、代わりに UTF-8 で
571	579	エンコードされたデータが出力されます。
572	580
573	581	=begin original
574	582
575	583	To avoid this warning and to avoid having different output encodings in a single
576	584	stream, always specify an encoding explicitly, for example with a PerlIO layer:
577	585
578	586	=end original
579	587
580	588	この警告を回避し、一つのストリームに異なった出力エンコーディングが
581	589	出力されることを回避するには、常に明示的にエンコーディングを指定してください;
582	590	例えば PerlIO 層を使って:
583	591
584	592	binmode STDOUT, ":encoding(UTF-8)";
585	593
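The mixed-encoding problem mentioned above can be made visible with in-memory filehandles. This is a sketch under the assumption that in-memory handles behave like ordinary files here; the C<__WARN__> handler only collects the warning so that it can be counted.

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my @texts = ("\x{e9}", "\x{263A}");   # one fits in ISO-8859-1, one does not

# No layer: the first string goes out as latin-1, the second as UTF-8.
open my $plain, '>', \my $mixed
    or die "Cannot open in-memory handle: $!";
print {$plain} $_ for @texts;
close $plain;

# With a layer: both strings are encoded the same way, and nothing warns.
open my $layered, '>:encoding(UTF-8)', \my $consistent
    or die "Cannot open in-memory handle: $!";
print {$layered} $_ for @texts;
close $layered;

printf "warnings: %d, unlayered bytes: %d, layered bytes: %d\n",
    scalar @warnings, length $mixed, length $consistent;
```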
586	594	=head1 INTERNALS
587	595
588	596	(内部構造)
589	597
590	598	=head2 What is "the UTF8 flag"?
591	599
592	600	(「UTF8 フラグ」って何?)
593	601
594	602	=begin original
595	603
596	604	Please, unless you're hacking the internals, or debugging weirdness, don't
597	605	think about the UTF8 flag at all. That means that you very probably shouldn't
598	606	use C<is_utf8>, C<_utf8_on> or C<_utf8_off> at all.
599	607
600	608	=end original
601	609
602	610	内部をハックしようとしているか、変なものをデバッグしようとしているのでない
603	611	限り、どうか UTF8 フラグのことは一切考えないでください。
604	612	これは、まず間違いなく C<is_utf8>, C<_utf8_on>, C<_utf8_off> を
605	613	一切使うべきでないことを意味します。
606	614
607	615	=begin original
608	616
609	617	The UTF8 flag, also called SvUTF8, is an internal flag that indicates that the
610	618	current internal representation is UTF-8. Without the flag, it is assumed to be
611	619	ISO-8859-1. Perl converts between these automatically. (Actually Perl usually
612	620	assumes the representation is ASCII; see L</Why do regex character classes
613	621	sometimes match only in the ASCII range?> above.)
614	622
615	623	=end original
616	624
617	625	UTF8 フラグ(SvUTF8 とも呼ばれます)は、現在の内部表現が UTF-8 であることを
618	626	示す内部フラグです。
619	627	このフラグがない場合、ISO-8859-1 と仮定します。
620	628	Perl はこれらを自動的に変換します。
621	629	(実際のところ Perl は普通表現が ASCII であると仮定します; 上述の L</Why do
622	630	regex character classes sometimes match only in the ASCII range?> を
623	631	参照してください。)
624	632
625	633	=begin original
626	634
627	635	One of Perl's internal formats happens to be UTF-8. Unfortunately, Perl can't
628	636	keep a secret, so everyone knows about this. That is the source of much
629	637	confusion. It's better to pretend that the internal format is some unknown
630	638	encoding, and that you always have to encode and decode explicitly.
631	639
632	640	=end original
633	641
634	642	Perl の内部表現の一つはたまたま UTF-8 です。
635	643	残念ながら、Perl は秘密を守れないので、このことはみんな知っています。
636	644	これが多くの混乱の源です。
637	645	内部表現は何か分からないエンコーディングで、常に明示的にエンコードと
638	646	デコードが必要ということにしておいた方がよいです。
639	647
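To see why the flag says nothing about the text itself, the sketch below compares a string with an upgraded copy of it. C<utf8::is_utf8> appears here strictly as a debugging aid, in line with the advice above.

```perl
use strict;
use warnings;

my $native   = "caf\x{e9}";
my $upgraded = $native;
utf8::upgrade($upgraded);    # changes the internal representation only

printf "equal: %s, same length: %s\n",
    ($native eq $upgraded ? 'yes' : 'no'),
    (length($native) == length($upgraded) ? 'yes' : 'no');

# Debugging only: the flag differs although the text is identical.
printf "flag on native: %s, flag on upgraded: %s\n",
    (utf8::is_utf8($native)   ? 'on' : 'off'),
    (utf8::is_utf8($upgraded) ? 'on' : 'off');
```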
640	648	=head2 What about the C<use bytes> pragma?
641	649
642	650	(C<use bytes> プラグマって何?)
643	651
644	652	=begin original
645	653
646	654	Don't use it. It makes no sense to deal with bytes in a text string, and it
647	655	makes no sense to deal with characters in a byte string. Do the proper
648	656	conversions (by decoding/encoding), and things will work out well: you get
649	657	character counts for decoded data, and byte counts for encoded data.
650	658
651	659	=end original
652	660
653	661	これは使わないでください。
654	662	テキスト文字列をバイト単位で扱うことに意味はありませんし、
655	663	バイト文字列を文字単位で扱うことには意味はありません。
656	664	適切な変換(デコードかエンコード)を行えば、物事はうまくいきます:
657	665	デコードしたデータの文字数を得られますし、エンコードしたデータのバイト数を
658	666	得られます。
659	667
660	668	=begin original
661	669
662	670	C<use bytes> is usually a failed attempt to do something useful. Just forget
663	671	about it.
664	672
665	673	=end original
666	674
667	675	C<use bytes> は何か有用なことをしようとするためには間違った方法です。
668	676	これのことは単に忘れてください。
669	677
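Character counts and byte counts both come from plain C<length>, applied to the right kind of string. A minimal sketch, assuming UTF-8 as the target encoding:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text  = "\x{65e5}\x{672c}\x{8a9e}";    # decoded text: three characters
my $bytes = encode('UTF-8', $text);        # encoded data: nine bytes

my $char_count = length $text;
my $byte_count = length $bytes;

print "$char_count characters, $byte_count bytes\n";
```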
670	678	=head2 What about the C<use encoding> pragma?
671	679
672	680	(C<use encoding> プラグマって何?)
673	681
674	682	=begin original
675	683
676	684	Don't use it. Unfortunately, it assumes that the programmer's environment and
677	685	that of the user will use the same encoding. It will use the same encoding for
678	686	the source code and for STDIN and STDOUT. When a program is copied to another
679	687	machine, the source code does not change, but the STDIO environment might.
680	688
681	689	=end original
682	690
683	691	これは使わないでください。
684	692	残念ながら、これはプログラマの環境とユーザーの環境が同じエンコーディングを使うと仮定します。
685	693	これはソースコードと STDIN や STDOUT で同じエンコーディングを使います。
686	694	プログラムが他のマシンにコピーされると、ソースコードは変わりませんが、
687	695	STDIO 環境は変わるかもしれません。
688	696
689	697	=begin original
690	698
691	699	If you need non-ASCII characters in your source code, make it a UTF-8 encoded
692	700	file and C<use utf8>.
693	701
694	702	=end original
695	703
696	704	もしソースコードに非 ASCII 文字が必要なら、ファイルを UTF-8 で
697	705	エンコードして、C<use utf8> を使ってください。
698	706
699	707	=begin original
700	708
701	709	If you need to set the encoding for STDIN, STDOUT, and STDERR, for example
702	710	based on the user's locale, C<use open>.
703	711
704	712	=end original
705	713
706	714	もし STDIN, STDOUT, STDERR のエンコーディングを、例えばユーザーのロケールに
707	715	合わせてセットする必要があるなら、C<use open> してください。
708	716
709	717	=head2 What is the difference between C<:encoding> and C<:utf8>?
710	718
711	719	(C<:encoding> と C<:utf8> の違いは?)
712	720
713	721	=begin original
714	722
715	723	Because UTF-8 is one of Perl's internal formats, you can often just skip the
716	724	encoding or decoding step, and manipulate the UTF8 flag directly.
717	725
718	726	=end original
719	727
720	728	UTF-8 は Perl の内部形式のひとつなので、しばしばエンコードやデコードの
721	729	手順を省略して、UTF8 フラグを直接操作できます。
722	730
723	731	=begin original
724	732
725	733	Instead of C<:encoding(UTF-8)>, you can simply use C<:utf8>, which skips the
726	734	encoding step if the data was already represented as UTF8 internally. This is
727	735	widely accepted as good behavior when you're writing, but it can be dangerous
728	736	when reading, because it causes internal inconsistency when you have invalid
729	737	byte sequences. Using C<:utf8> for input can sometimes result in security
730	738	breaches, so please use C<:encoding(UTF-8)> instead.
731	739
732	740	=end original
733	741
734	742	C<:encoding(UTF-8)> の代わりに単に C<:utf8> を使うことで、もしデータが
735	743	内部で既に UTF8 で表現されていれば、エンコードの手順を省略します。
736	744	これは、書き込むときにはよい振る舞いであると広く受け入れられていますが、
737	745	読み込むときには危険があります; なぜなら不正なバイト列を受け取ると
738	746	内部矛盾を引き起こすからです。
739	747	入力に C<:utf8> を使うとセキュリティ侵害を引き起こす可能性があるので、
740	748	どうか代わりに C<:encoding(UTF-8)> を使ってください。
741	749
742	750	=begin original
743	751
744	752	Instead of C<decode> and C<encode>, you could use C<_utf8_on> and C<_utf8_off>,
745	753	but this is considered bad style. Especially C<_utf8_on> can be dangerous, for
746	754	the same reason that C<:utf8> can.
747	755
748	756	=end original
749	757
750	758	C<decode> と C<encode> の代わりに、C<_utf8_on> と C<_utf8_off> を
751	759	使えますが、これは悪いスタイルと考えられています。
752	760	特に C<_utf8_on> は、C<:utf8> と同じ理由で危険です。
753	761
754	762	=begin original
755	763
756		There are some shortcuts for oneliners; ~~see C<-C> in L<perlrun>.~~
	764	There are some shortcuts for oneliners;
	765	see L<-C\|perlrun/-C [numberE<sol>list]> in L<perlrun>.
757	766
758	767	=end original
759	768
760		一行野郎のための省略形があります; L<perlrun> の ~~C<-C> を参照してください。~~
	769	一行野郎のための省略形があります; L<perlrun> の
	770	L<-C\|perlrun/-C [numberE<sol>list]> を参照してください。
761	771
762	772	=head2 What's the difference between C<UTF-8> and C<utf8>?
763	773
764	774	(C<UTF-8> と C<utf8> の違いは?)
765	775
766	776	=begin original
767	777
768	778	C<UTF-8> is the official standard. C<utf8> is Perl's way of being liberal in
769	779	what it accepts. If you have to communicate with things that aren't so liberal,
770	780	you may want to consider using C<UTF-8>. If you have to communicate with things
771	781	that are too liberal, you may have to use C<utf8>. The full explanation is in
772	782	L<Encode>.
773	783
774	784	=end original
775	785
776	786	C<UTF-8> は公式な標準です。
777	787	C<utf8> は、何を受け入れるかに関して自由な Perl のやり方です。
778	788	もしそれほど自由でないものと対話する必要があるなら、
779	789	C<UTF-8> を使うことを考えたくなるかもしれません。
780	790	自由すぎるものと対話する必要があるなら、C<utf8> を
781	791	使わなければならないかもしれません。
782	792	完全な説明は L<Encode> にあります。
783	793
784	794	=begin original
785	795
786	796	C<UTF-8> is internally known as C<utf-8-strict>. The tutorial uses UTF-8
787	797	consistently, even where utf8 is actually used internally, because the
788	798	distinction can be hard to make, and is mostly irrelevant.
789	799
790	800	=end original
791	801
792	802	C<UTF-8> は内部では C<utf-8-strict> として知られます。
793	803	チュートリアルでは、たとえ内部では実際には utf8 が使われる場合でも
794	804	一貫して UTF-8 を使っています; なぜなら区別をつけるのは難しく、ほとんど
795	805	無意味だからです。
796	806
797	807	=begin original
798	808
799	809	For example, utf8 can be used for code points that don't exist in Unicode, like
800	810	9999999, but if you encode that to UTF-8, you get a substitution character (by
801	811	default; see L<Encode/"Handling Malformed Data"> for more ways of dealing with
802	812	this.)
803	813
804	814	=end original
805	815
806	816	例えば utf8 は、9999999 のような、Unicode に存在しない符号位置も使えますが、
807	817	これを UTF-8 でエンコードすると、代替文字を得ることになります(これは
808	818	デフォルトの場合です; これを扱う他の方法については
809	819	L<Encode/"Handling Malformed Data"> を参照してください。)
810	820
811	821	=begin original
812	822
813	823	Okay, if you insist: the "internal format" is utf8, not UTF-8. (When it's not
814	824	some other encoding.)
815	825
816	826	=end original
817	827
818	828	わかりました、どうしてもと言うのなら:「内部形式」は utf8 であって、
819	829	UTF-8 ではありません。
820	830	(もしその他のエンコーディングでないのなら。)
821	831
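The two names can be told apart by asking C<Encode> what each one resolves to. This sketch only inspects the encoding objects and does not encode anything.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Each lookup returns an encoding object; ->name gives its canonical name.
my $strict_name = find_encoding('UTF-8')->name;
my $lax_name    = find_encoding('utf8')->name;

print "UTF-8 resolves to $strict_name, utf8 resolves to $lax_name\n";
```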
822	832	=head2 I lost track; what encoding is the internal format really?
823	833
824	834	(迷子になりました; 実際のところ内部形式のエンコーディングは何?)
825	835
826	836	=begin original
827	837
828	838	It's good that you lost track, because you shouldn't depend on the internal
829	839	format being any specific encoding. But since you asked: by default, the
830	840	internal format is either ISO-8859-1 (latin-1), or utf8, depending on the
831	841	history of the string. On EBCDIC platforms, this may even be different.
832	842
833	843	=end original
834	844
835	845	迷子になったのはよいことです; なぜなら内部形式が特定のエンコーディングで
836	846	あることに依存するべきではないからです。
837	847	しかし聞かれたので答えましょう: デフォルトでは、内部形式は
838	848	ISO-8859-1 (latin-1) か utf8 で、どちらになるかは文字列の歴史に
839	849	依存します。
840	850	EBCDIC プラットフォームでは、これは異なっているかもしれません。
841	851
842	852	=begin original
843	853
844	854	Perl knows how it stored the string internally, and will use that knowledge
845	855	when you C<encode>. In other words: don't try to find out what the internal
846	856	encoding for a certain string is, but instead just encode it into the encoding
847	857	that you want.
848	858
849	859	=end original
850	860
851	861	Perl は文字列が内部でどのように保管されているかを知っていて、この知識を
852	862	C<encode> するときに使います。
853	863	言い換えると: 特定の文字列の内部エンコーディングが何かを
854	864	調べようとしてはいけません; 代わりに、単に望みのエンコーディングに
855	865	エンコードしてください。
856	866
857	867	=head1 AUTHOR
858	868
859	869	Juerd Waalboer <#####@juerd.nl>
860	870
861	871	=head1 SEE ALSO
862	872
863	873	L<perlunicode>, L<perluniintro>, L<Encode>
864	874
865	875	=begin meta
866	876
867	877	Translate: SHIRAKATA Kentaro <argrath@ub32.org> (5.10.0-)
868	878	Status: completed
869	879
870	880	=end meta
