Differences between perlunifaq 5.38.0 and 5.24.1

1	1
2		=encoding utf8
	2	=encoding euc-jp
3	3
4	4	=head1 NAME
5	5
6	6	=begin original
7	7
8	8	perlunifaq - Perl Unicode FAQ
9	9
10	10	=end original
11	11
12	12	perlunifaq - Perl Unicode FAQ
13	13
14	14	=head1 Q and A
15	15
16	16	=begin original
17	17
18	18	This is a list of questions and answers about Unicode in Perl, intended to be
19	19	read after L<perlunitut>.
20	20
21	21	=end original
22	22
23	23	これは、L<perlunitut> の後で読むことを想定した、Perl での Unicode に関する
24	24	質問と答えの一覧です。
25	25
26	26	=head2 perlunitut isn't really a Unicode tutorial, is it?
27	27
28	28	(perlunitut は実際には Unicode チュートリアルじゃないんじゃないの?)
29	29
30	30	=begin original
31	31
32	32	No, and this isn't really a Unicode FAQ.
33	33
34	34	=end original
35	35
36	36	はい、違います; そしてこれは実際には Unicode FAQ ではありません。
37	37
38	38	=begin original
39	39
40	40	Perl has an abstracted interface for all supported character encodings, so this
41	41	is actually a generic C<Encode> tutorial and C<Encode> FAQ. But many people
42	42	think that Unicode is special and magical, and I didn't want to disappoint
43	43	them, so I decided to call the document a Unicode tutorial.
44	44
45	45	=end original
46	46
47	47	Perl は対応している全ての文字エンコーディングへの抽象インターフェースを
48	48	持っているので、実際には汎用の C<Encode> チュートリアルと
49	49	C<Encode> FAQ です。
50	50	しかし、多くの人々が、Unicode は特別でマジカルなものだと考えていて、
51	51	私は彼らを失望させたくなかったので、そのドキュメントを
52	52	Unicode チュートリアルと呼ぶことに決めました。
53	53
54	54	=head2 What character encodings does Perl support?
55	55
56	56	(Perl が対応している文字エンコーディングは何?)
57	57
58	58	=begin original
59	59
60	60	To find out which character encodings your Perl supports, run:
61	61
62	62	=end original
63	63
64	64	Perl がどの文字エンコーディングに対応しているかを見つけるには、以下を
65	65	実行してください:
66	66
67	67	perl -MEncode -le "print for Encode->encodings(':all')"
68	68
69	69	=head2 Which version of perl should I use?
70	70
71	71	(どのバージョンの perl を使うべき?)
72	72
73	73	=begin original
74	74
75	75	Well, if you can, upgrade to the most recent, but certainly C<5.8.1> or newer.
76	76	The tutorial and FAQ assume the latest release.
77	77
78	78	=end original
79	79
80	80	うーん、もし可能なら、最新にアップグレードしてください; 但し、確実に
81	81	C<5.8.1> 以降にはしてください。
82	82	チュートリアルと FAQ は最新リリースを仮定しています。
83	83
84	84	=begin original
85	85
86	86	You should also check your modules, and upgrade them if necessary. For example,
87	87	HTML::Entities requires version >= 1.32 to function correctly, even though the
88	88	changelog is silent about this.
89	89
90	90	=end original
91	91
92	92	モジュールもチェックして、もし必要ならアップグレードするべきです。
93	93	例えば HTML::Entities は、changelog は何も触れていませんが、正しく
94	94	動作するためにはバージョン >= 1.32 が必要です。
95	95
96	96	=head2 What about binary data, like images?
97	97
98	98	(イメージのようなバイナリデータはどうするの?)
99	99
100	100	=begin original
101	101
102	102	Well, apart from a bare C<binmode $fh>, you shouldn't treat them specially.
103	103	(The binmode is needed because otherwise Perl may convert line endings on Win32
104	104	systems.)
105	105
106	106	=end original
107	107
108	108	うーん、生の C<binmode $fh> を別として、特別に扱う必要はないはずです。
109	109	(Win32 システムで Perl が行端を変更しないようにするために、binmode が
110	110	必要です。)
111	111
112	112	=begin original
113	113
114	114	Be careful, though, to never combine text strings with binary strings. If you
115	115	need text in a binary stream, encode your text strings first using the
116	116	appropriate encoding, then join them with binary strings. See also: "What if I
117	117	don't encode?".
118	118
119	119	=end original
120	120
121	121	但し、決してテキスト文字列とバイナリ文字列を結合しないように
122	122	注意してください。
123	123	もしバイナリストリームにテキストが必要なら、まずテキスト文字列を適切な
124	124	エンコーディングを使ってエンコードして、それをバイナリ文字列と
125	125	結合してください。
126	126	L<"What if I don't encode?"> も参照してください。
127	127
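As a minimal sketch of the rule above: the record layout here (a 4-byte big-endian length prefix followed by a UTF-8 payload) and all variable names are assumptions made up for illustration, not something the FAQ prescribes. The text is encoded first, and only the resulting bytes are joined with the binary header.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical record layout: 4-byte big-endian length, then UTF-8 payload.
my $text    = "caf\x{e9} \x{263A}";       # text string (characters)
my $payload = encode('UTF-8', $text);     # binary string (bytes)

# Both operands are byte strings now, so joining them is safe.
my $record = pack('N', length $payload) . $payload;
```

The length prefix counts bytes of the encoded payload, which is only meaningful because the text was encoded before C<length> was called on it.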
128	128	=head2 When should I decode or encode?
129	129
130	130	(デコードやエンコードはいつ行うべき?)
131	131
132	132	=begin original
133	133
134	134	Whenever you're communicating text with anything that is external to your perl
135	135	process, like a database, a text file, a socket, or another program. Even if
136	136	the thing you're communicating with is also written in Perl.
137	137
138	138	=end original
139	139
140	140	データベース、テキストファイル、ソケット、他のプログラムといった、自分の
141	141	perl プロセスの外側にある何かとテキストを通信するときはいつでも、です。
142	142	通信の相手が Perl で書かれている場合も同じです。
143	143
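The "decode on the way in, encode on the way out" habit can be sketched as follows. The incoming bytes are hard-coded here as a stand-in for data read from a socket or file, and UTF-8 is assumed as the external encoding on both sides.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from outside the process (assumed to be UTF-8).
my $incoming_bytes = "na\xC3\xAFve";

my $text     = decode('UTF-8', $incoming_bytes);  # decode at the boundary
my $shouted  = uc $text;                          # work on characters
my $outgoing = encode('UTF-8', $shouted);         # encode before sending
```

Everything between the two boundary calls operates on characters, so C<length $text> is 5 even though 6 bytes came in.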
144	144	=head2 What if I don't decode?
145	145
146	146	(デコードしないとどうなるの?)
147	147
148	148	=begin original
149	149
150	150	Whenever your encoded, binary string is used together with a text string, Perl
151	151	will assume that your binary string was encoded with ISO-8859-1, also known as
152	152	latin-1. If it wasn't latin-1, then your data is unpleasantly converted. For
153	153	example, if it was UTF-8, the individual bytes of multibyte characters are seen
154	154	as separate characters, and then again converted to UTF-8. Such double encoding
155	155	can be compared to double HTML encoding (C<&amp;gt;>), or double URI encoding
156	156	(C<%253E>).
157	157
158	158	=end original
159	159
160	160	エンコードされたバイナリ文字列をテキスト文字列と一緒に使ったときはいつでも、
161	161	Perl はバイナリ文字列が ISO-8859-1 またの名を latin-1 と仮定します。
162	162	もしこれが latin-1 でなかった場合、データは不愉快な形に変換されます。
163	163	例えば、もしデータが UTF-8 だった場合、マルチバイト文字のそれぞれのバイトが
164	164	文字として扱われ、それから再び UTF-8 に変換されます。
165	165	このような二重エンコードは二重 HTML エンコーディング (C<&amp;gt;>) や
166	166	二重 URI エンコーディング (C<%253E>) と比較できます。
167	167
168	168	=begin original
169	169
170	170	This silent implicit decoding is known as "upgrading". That may sound
171	171	positive, but it's best to avoid it.
172	172
173	173	=end original
174	174
175	175	この、暗黙のうちに行われるデコードは「昇格」("upgrading")と呼ばれます。
176	176	これは前向きなことに聞こえるかもしれませんが、避けるのが最良です。
177	177
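A small sketch of the double encoding described above. The variable names are illustrative; the byte string stands for UTF-8 data that was read but never decoded.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $utf8_bytes = "\xC3\xA9";   # UTF-8 encoding of U+00E9, never decoded
my $text       = "\x{263A}";   # a genuine text string

# Concatenation treats each byte of $utf8_bytes as a latin-1 character.
my $mixed          = $utf8_bytes . $text;
my $double_encoded = encode('UTF-8', $mixed);

# The fix: decode first, then combine, then encode once.
my $correct = encode('UTF-8', decode('UTF-8', $utf8_bytes) . $text);
```

In the first case the two original bytes come out as four, because each was treated as its own character and encoded again.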
178	178	=head2 What if I don't encode?
179	179
180	180	(エンコードしないとどうなるの?)
181	181
182	182	=begin original
183	183
184		It depends on what you output and how you output it.
	184	Your text string will be sent using the bytes in Perl's internal format. In
	185	some cases, Perl will warn you that you're doing something wrong, with a
	186	friendly warning:
185	187
186	188	=end original
187	189
188		何をどうやって出力するかによります。
	190	テキスト文字列は Perl の内部形式のバイト列を使って送信されます。
	191	いくつかの場合では、Perl は何かが間違っていることを、親切なメッセージで
	192	警告します:
189	193
190		=head3 Output via a filehandle
	194	Wide character in print at example.pl line 2.
191	195
192		=over
193
194	196	=begin original
195	197
196		=item * If the string's characters are all code point 255 or lower, Perl
197		outputs bytes that match those code points. This is what happens with encoded
198		strings. It can also, though, happen with unencoded strings that happen to be
199		all code point 255 or lower.
200
201		=end original
202
203		=item * 文字列の文字の符号位置が全て 255 以下の場合、Perl は
204		その符号位置に一致するバイトを出力します。
205		これはエンコードされた文字列の時に起きることです。
206		しかし、たまたま全ての符号位置が 255 以下のエンコードされていない文字列でも
207		起きます。
208
209		=begin original
210
211		=item * Otherwise, Perl outputs the string encoded as UTF-8. This only happens
212		with strings you neglected to encode. Since that should not happen, Perl also
213		throws a "wide character" warning in this case.
214
215		=end original
216
217		=item * さもなければ、Perl は UTF-8 としてエンコードされた文字列を出力します。
218		これはあなたがエンコードを怠った文字列にのみ起きます。
219		これは起きるべきではないので、Perl はこの場合
220		"wide character" 警告も投げます。
221
222		=back
223
224		=head3 Other output mechanisms (e.g., C<exec>, C<chdir>, ..)
225
226		(その他の出力機構 (例えば C<exec>, C<chdir>, ..))
227
228		=begin original
229
230		Your text string will be sent using the bytes in Perl's internal format.
231
232		=end original
233
234		テキスト文字列は Perl の内部フォーマットのバイトを使って送ります。
235
236		=begin original
237
238	198	Because the internal format is often UTF-8, these bugs are hard to spot,
239	199	because UTF-8 is usually the encoding you wanted! But don't be lazy, and don't
240	200	use the fact that Perl's internal format is UTF-8 to your advantage. Encode
241	201	explicitly to avoid weird bugs, and to show to maintenance programmers that you
242	202	thought this through.
243	203
244	204	=end original
245	205
246	206	内部形式はしばしば UTF-8 なので、このバグは発見しにくいです; なぜなら
247	207	あなたがほしいのは普通 UTF-8 だからです!
248	208	しかし、手を抜かないでください; そして Perl の内部形式が UTF-8 であることを
249	209	利用しようとしないでください。
250	210	奇妙なバグを防ぐため、そして保守プログラマに対してあなたが何を考えたかを
251	211	示すために、明示的にエンコードしてください。
252	212
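To see the behaviour in isolation, the following sketch prints an unencoded text string to a handle that has no encoding layer. An in-memory handle stands in for STDOUT or a file so the result can be inspected, and the warning is captured instead of being printed; both choices are assumptions made for the demonstration.

```perl
use strict;
use warnings;

my @warnings;
my $buffer = '';
{
    # Collect warnings instead of sending them to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, @_ };

    # In-memory handle without any encoding layer.
    open my $fh, '>', \$buffer or die "Cannot open in-memory handle: $!";
    print {$fh} "\x{20AC}";   # EURO SIGN: code point above 255, not encoded
    close $fh or die "Cannot close in-memory handle: $!";
}
```

After this runs, C<@warnings> holds a "Wide character in print" message and C<$buffer> holds the bytes of Perl's internal representation of the character, which is exactly the accidental output the answer warns about.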
253	213	=head2 Is there a way to automatically decode or encode?
254	214
255	215	(自動的にデコードやエンコードする方法はある?)
256	216
257	217	=begin original
258	218
259	219	If all data that comes from a certain handle is encoded in exactly the same
260	220	way, you can tell the PerlIO system to automatically decode everything, with
261	221	the C<encoding> layer. If you do this, you can't accidentally forget to decode
262	222	or encode anymore, on things that use the layered handle.
263	223
264	224	=end original
265	225
266	226	もし、あるハンドルから来る全てのデータが正確に同じ方法で
267	227	エンコードされているなら、C<encoding> 層を使って、 PerlIO システムに自動的に
268	228	全てをデコードするように伝えることができます。
269	229	これを行えば、この層のハンドルを使っている限り、うっかりデコードや
270	230	エンコードを忘れることはありません。
271	231
272	232	=begin original
273	233
274	234	You can provide this layer when C<open>ing the file:
275	235
276	236	=end original
277	237
278	238	ファイルを C<open> するときにこの層を指定することができます:
279	239
280	240	open my $fh, '>:encoding(UTF-8)', $filename; # auto encoding on write
281	241	open my $fh, '<:encoding(UTF-8)', $filename; # auto decoding on read
282	242
283	243	=begin original
284	244
285	245	Or if you already have an open filehandle:
286	246
287	247	=end original
288	248
289	249	あるいは既にオープンしているファイルハンドルがあるなら:
290	250
291	251	binmode $fh, ':encoding(UTF-8)';
292	252
293	253	=begin original
294	254
295	255	Some database drivers for DBI can also automatically encode and decode, but
296	256	that is sometimes limited to the UTF-8 encoding.
297	257
298	258	=end original
299	259
300	260	DBI のデータベースドライバのいくつかも、エンコードとデコードを自動的に
301	261	行いますが、ときどきこれは UTF-8 エンコーディングに制限されています。
302	262
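A self-contained round trip through the C<encoding> layer might look like this. An in-memory scalar is used in place of a real file (an assumption made to keep the sketch runnable anywhere); with a filename the calls are the same.

```perl
use strict;
use warnings;

my $text = "\x{3053}\x{3093}\x{306B}\x{3061}\x{306F}";   # five characters

# Writing: the layer encodes on the way out.
my $storage = '';
open my $out, '>:encoding(UTF-8)', \$storage or die "open for write: $!";
print {$out} $text;
close $out or die "close after write: $!";

# Reading: the layer decodes on the way in.
open my $in, '<:encoding(UTF-8)', \$storage or die "open for read: $!";
my $round_trip = do { local $/; <$in> };
close $in or die "close after read: $!";
```

The program itself never calls C<encode> or C<decode>; C<$storage> contains bytes, while C<$round_trip> is a text string again.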
303	263	=head2 What if I don't know which encoding was used?
304	264
305	265	(どのエンコーディングが使われているかわからないときは?)
306	266
307	267	=begin original
308	268
309	269	Do whatever you can to find out, and if you have to: guess. (Don't forget to
310	270	document your guess with a comment.)
311	271
312	272	=end original
313	273
314	274	なんとかして見つけるか、もし必要なら、推測してください。
315	275	(どう推測したかをコメントとして文書化するのを忘れないでください。)
316	276
317	277	=begin original
318	278
319	279	You could open the document in a web browser, and change the character set or
320	280	character encoding until you can visually confirm that all characters look the
321	281	way they should.
322	282
323	283	=end original
324	284
325	285	ドキュメントを web ブラウザで開いて、全ての文字があるべき形であることを
326	286	視覚的に確認できるまで文字集合や文字エンコーディングを変更する方法も
327	287	あります。
328	288
329	289	=begin original
330	290
331	291	There is no way to reliably detect the encoding automatically, so if people
332	292	keep sending you data without charset indication, you may have to educate them.
333	293
334	294	=end original
335	295
336	296	エンコーディングを自動的に検出するための信頼性のある方法はないので、
337	297	もし人々があなたに文字集合の指示なしにデータを送り続けるなら、彼らを
338	298	教育する必要があるかもしれません。
339	299
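If guessing is unavoidable, one hedged approach is to try a short list of candidate encodings in order and keep the first one that decodes without errors. C<guess_decode> below is a hypothetical helper written for this sketch, not part of Encode, and the candidate order is itself a guess that should be documented. Since every byte sequence is valid ISO-8859-1, that candidate always succeeds and has to come last.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the first candidate that decodes cleanly,
# together with the decoded text. Returns an empty list if none fits.
sub guess_decode {
    my ($octets, @candidates) = @_;
    for my $name (@candidates) {
        my $text = eval {
            decode($name, $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        };
        return ($name, $text) if defined $text;
    }
    return;
}

# Guess: the data is UTF-8 if it decodes strictly, otherwise ISO-8859-1.
my ($enc_a, $text_a) = guess_decode("caf\xC3\xA9", 'UTF-8', 'ISO-8859-1');
my ($enc_b, $text_b) = guess_decode("caf\xE9",     'UTF-8', 'ISO-8859-1');
```

This does not detect the encoding reliably; it only rules out candidates that are provably wrong for the given bytes.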
340	300	=head2 Can I use Unicode in my Perl sources?
341	301
342	302	(Perl のソースコードに Unicode は使える?)
343	303
344	304	=begin original
345	305
346	306	Yes, you can! If your sources are UTF-8 encoded, you can indicate that with the
347	307	C<use utf8> pragma.
348	308
349	309	=end original
350	310
351	311	はい、できます!
352	312	ソースコードが UTF-8 でエンコードされているなら、C<use utf8> プラグマを
353	313	使ってそれを示すことができます。
354	314
355	315	use utf8;
356	316
357	317	=begin original
358	318
359	319	This doesn't do anything to your input, or to your output. It only influences
360	320	the way your sources are read. You can use Unicode in string literals, in
361	321	identifiers (but they still have to be "word characters" according to C<\w>),
362	322	and even in custom delimiters.
363	323
364	324	=end original
365	325
366	326	これは入出力に対しては何も行いません。
367	327	ソースを読み込む方法のみに影響を与えます。
368	328	文字列リテラル、識別子(しかし C<\w> に従った「単語文字」である必要が
369	329	あります)、そして独自デリミタにすら Unicode が使えます。
370	330
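The effect on string literals can be sketched by writing the same literal with and without the pragma. This assumes the source file itself is saved as UTF-8; the variable names are illustrative.

```perl
use strict;
use warnings;

# Without the pragma, the literal is read as the two bytes of its UTF-8 form.
my $without_pragma = "é";

# Within the pragma's lexical scope, the same literal is one character.
my $with_pragma = do { use utf8; "é" };

my $bytes_seen = length $without_pragma;
my $chars_seen = length $with_pragma;
```

Nothing here touches input or output; printing C<$with_pragma> still requires encoding it explicitly or using an encoding layer.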
371	331	=head2 Data::Dumper doesn't restore the UTF8 flag; is it broken?
372	332
373	333	(Data::Dumper は UTF8 フラグを復元しません; これは壊れてるの?)
374	334
375	335	=begin original
376	336
377	337	No, Data::Dumper's Unicode abilities are as they should be. There have been
378	338	some complaints that it should restore the UTF8 flag when the data is read
379	339	again with C<eval>. However, you should really not look at the flag, and
380	340	nothing indicates that Data::Dumper should break this rule.
381	341
382	342	=end original
383	343
384	344	いいえ、Data::Dumper の Unicode 能力は、あるべき形であります。
385	345	C<eval> で再びデータを読み込むとき、UTF8 フラグを復元するべきだという
386	346	苦情が来ることがあります。
387	347	しかし、実際にはフラグを見るべきではないですし、Data::Dumper がこの規則を
388	348	破っていることを示すものは何もありません。
389	349
390	350	=begin original
391	351
392	352	Here's what happens: when Perl reads in a string literal, it sticks to 8 bit
393	353	encoding as long as it can. (But perhaps originally it was internally encoded
394	354	as UTF-8, when you dumped it.) When it has to give that up because other
395	355	characters are added to the text string, it silently upgrades the string to
396	356	UTF-8.
397	357
398	358	=end original
399	359
400	360	起きているのは以下のようなことです: Perl が文字列リテラルを読み込むとき、
401	361	可能な限り長く 8 ビットエンコーディングにこだわります。
402	362	(しかしおそらく、これをダンプしたときには内部では UTF-8 でエンコード
403	363	されていました。)
404	364	それ以外の文字をテキスト文字列に追加するためにこれを諦めなければならない
405	365	とき、Perl は暗黙のうちに文字列を UTF-8 に昇格させます。
406	366
407	367	=begin original
408	368
409	369	If you properly encode your strings for output, none of this is of your
410	370	concern, and you can just C<eval> dumped data as always.
411	371
412	372	=end original
413	373
414	374	出力用の文字列を適切にエンコードしていれば、これについてあなたは何も
415	375	心配することはなく、いつも通りにダンプしたデータを C<eval> できます。
416	376
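A short round trip illustrates why the flag does not matter. The string is deliberately upgraded before dumping so that its internal representation is UTF-8; C<$Data::Dumper::Terse> is set only to make the dump directly usable as an expression.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Terse = 1;     # dump the bare value, without "$VAR1 = "

my $original = "caf\x{e9}";
utf8::upgrade($original);     # force the UTF-8 internal representation

my $restored = eval Dumper($original);
die "eval of dumped data failed: $@" if $@;

# Compare the text, not the internal flag.
my $same_text = $restored eq $original ? 1 : 0;
```

Whether C<$restored> ends up stored in 8 bits or as UTF-8 internally, C<eq> compares characters, so the data survives the round trip.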
417	377	=head2 Why do regex character classes sometimes match only in the ASCII range?
418	378
419	379	(なぜ正規表現文字クラスは時々 ASCII の範囲にしかマッチしないの?)
420	380
421	381	=begin original
422	382
423	383	Starting in Perl 5.14 (and partially in Perl 5.12), just put a
424	384	C<use feature 'unicode_strings'> near the beginning of your program.
425	385	Within its lexical scope you shouldn't have this problem. It also is
426	386	automatically enabled under C<use feature ':5.12'> or C<use v5.12> or
427	387	using C<-E> on the command line for Perl 5.12 or higher.
428	388
429	389	=end original
430	390
431	391	Perl 5.14 から (そして部分的に Perl 5.12 から、) 単にプログラムの先頭付近に
432	392	C<use feature 'unicode_strings'> を書いてください。
433	393	このレキシカルスコープ内ではこの問題は発生しないはずです。
434	394	これはまた C<use feature ':5.12'> または C<use v5.12> が有効か、Perl 5.12
435	395	以降でコマンドラインで C<-E> を使っていると自動的に有効になります。
436	396
437	397	=begin original
438	398
439	399	The rationale for requiring this is to not break older programs that
440	400	rely on the way things worked before Unicode came along. Those older
441	401	programs knew only about the ASCII character set, and so may not work
442	402	properly for additional characters. When a string is encoded in UTF-8,
443	403	Perl assumes that the program is prepared to deal with Unicode, but when
444	404	the string isn't, Perl assumes that only ASCII
445	405	is wanted, and so characters outside the ASCII range
446	406	aren't recognized as what they would be in Unicode.
447	407	C<use feature 'unicode_strings'> tells Perl to treat all characters as
448	408	Unicode, whether the string is encoded in UTF-8 or not, thus avoiding
449	409	the problem.
450	410
451	411	=end original
452	412
453	413	これが必要な理論的根拠は、Unicode がやってくる前に動作する方法に
454	414	依存している古いプログラムを壊さないことです。
455	415	このような古いプログラムは ASCII 文字集合のみを知っているので、追加の
456	416	文字については正しく動作しないかも知れません。
457	417	文字列が UTF-8 でエンコードされている場合、Perl はプログラムが Unicode を
458	418	扱えるように準備されていると仮定しますが、そうでない場合、Perl は
459	419	ASCII のみが求められていると仮定するので、非 ASCII 文字は Unicode での
460	420	文字としては認識されません。
461	421	C<use feature 'unicode_strings'> は Perl に、文字が UTF-8 で
462	422	エンコードされているかどうかにかかわらず全ての文字を Unicode として
463	423	扱うように知らせて、この問題を回避します。
464	424
465	425	=begin original
466	426
467	427	However, on earlier Perls, or if you pass strings to subroutines outside
468	428	the feature's scope, you can force Unicode rules by changing the
469	429	encoding to UTF-8 by doing C<utf8::upgrade($string)>. This can be used
470	430	safely on any string, as it checks and does not change strings that have
471	431	already been upgraded.
472	432
473	433	=end original
474	434
475	435	しかし、以前の Perl であったり、この機能のスコープの外側のサブルーチンに
476	436	文字列を渡した場合、C<utf8::upgrade($string)> とすることでエンコーディングを
477	437	UTF-8 にすることで Unicode の規則を強制できます。
478	438	これは既に昇格している文字列は変更しないので、どのような文字列に対しても
479	439	安全に用いることができます。
480	440
481	441	=begin original
482	442
483	443	For a more detailed discussion, see L<Unicode::Semantics> on CPAN.
484	444
485	445	=end original
486	446
487	447	さらなる詳細な議論については、CPAN の L<Unicode::Semantics> を
488	448	参照してください。
489	449
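The difference can be sketched with a single non-ASCII character, U+00E9, stored in Perl's 8-bit representation. This assumes Perl 5.14 or later and a script that does not otherwise enable C<unicode_strings> (for example via C<use v5.12>) or a locale.

```perl
use strict;
use warnings;

my $e_acute = "\xE9";   # U+00E9 in the 8-bit internal representation

# Outside the feature's scope: ASCII-only rules for this string.
my $plain_match = $e_acute =~ /\w/ ? 1 : 0;
my $plain_upper = uc $e_acute;

# Inside the feature's lexical scope: Unicode rules.
my ($scoped_match, $scoped_upper);
{
    use feature 'unicode_strings';
    $scoped_match = $e_acute =~ /\w/ ? 1 : 0;
    $scoped_upper = uc $e_acute;
}

# Alternative where the feature is not in effect: upgrade the string.
my $upgraded = $e_acute;
utf8::upgrade($upgraded);
my $upgraded_match = $upgraded =~ /\w/ ? 1 : 0;
```

The same character fails to match C<\w> and is left alone by C<uc> in the first case, but is handled as a letter in the other two. The C<uc> lines also illustrate the uppercase and lowercase question that follows.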
490	450	=head2 Why do some characters not uppercase or lowercase correctly?
491	451
492	452	(なぜいくつかの文字は正しく大文字や小文字にならないの?)
493	453
494	454	=begin original
495	455
496	456	See the answer to the previous question.
497	457
498	458	=end original
499	459
500	460	前述の質問の答えを参照してください。
501	461
502	462	=head2 How can I determine if a string is a text string or a binary string?
503	463
504	464	(文字列がテキスト文字列かバイナリ文字列かを決定するには?)
505	465
506	466	=begin original
507	467
508	468	You can't. Some use the UTF8 flag for this, but that's misuse, and makes well
509	469	behaved modules like Data::Dumper look bad. The flag is useless for this
510	470	purpose, because it's off when an 8 bit encoding (by default ISO-8859-1) is
511	471	used to store the string.
512	472
513	473	=end original
514	474
515	475	それはできません。
516	476	このために UTF8 フラグを使う人もいますが、これは誤用で、Data::Dumper のように
517	477	正しく振る舞うモジュールをおかしくします。
518	478	このフラグはこの目的のためには使えません; なぜなら文字列の保管に 8 ビット
519	479	エンコーディングが使われている場合 (デフォルトでは ISO-8859-1 です)、
520	480	オフだからです。
521	481
522	482	=begin original
523	483
524	484	This is something you, the programmer, have to keep track of; sorry. You could
525	485	consider adopting a kind of "Hungarian notation" to help with this.
526	486
527	487	=end original
528	488
529	489	これはプログラマであるあなたが把握しておく必要があることです; ごめんなさい。
530	490	これを助けるために、「ハンガリアン記法」のようなものの採用を
531	491	検討することもできます。
532	492
533	493	=head2 How do I convert from encoding FOO to encoding BAR?
534	494
535	495	(エンコーディング FOO からエンコーディング BAR に変換するには?)
536	496
537	497	=begin original
538	498
539	499	By first converting the FOO-encoded byte string to a text string, and then the
540	500	text string to a BAR-encoded byte string:
541	501
542	502	=end original
543	503
544	504	まず FOO でエンコードされたバイト文字列をテキスト文字列に変換し、
545	505	それからテキスト文字列を BAR エンコードされたバイト文字列に変換します:
546	506
547	507	my $text_string = decode('FOO', $foo_string);
548	508	my $bar_string = encode('BAR', $text_string);
549	509
550	510	=begin original
551	511
552	512	or by skipping the text string part, and going directly from one binary
553	513	encoding to the other:
554	514
555	515	=end original
556	516
557	517	あるいは、テキスト文字列の部分を飛ばして、あるバイナリエンコーディングから
558	518	他のものへ直接変換します:
559	519
560	520	use Encode qw(from_to);
561	521	from_to($string, 'FOO', 'BAR'); # changes contents of $string
562	522
563	523	=begin original
564	524
565	525	or by letting automatic decoding and encoding do all the work:
566	526
567	527	=end original
568	528
569	529	あるいは、自動でデコードとエンコードをさせることで全ての作業を行います:
570	530
571	531	open my $foofh, '<:encoding(FOO)', 'example.foo.txt';
572	532	open my $barfh, '>:encoding(BAR)', 'example.bar.txt';
573	533	print { $barfh } $_ while <$foofh>;
574	534
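With concrete encodings substituted for the FOO and BAR placeholders (ISO-8859-1 and UTF-8 are chosen here purely as an example), the first two approaches can be sketched and compared like this:

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

my $latin1_bytes = "caf\xE9";   # "cafe" with e-acute, encoded as ISO-8859-1

# Two explicit steps: bytes -> text -> bytes.
my $text       = decode('ISO-8859-1', $latin1_bytes);
my $utf8_bytes = encode('UTF-8', $text);

# One step, in place; work on a copy to keep the original bytes.
my $in_place = $latin1_bytes;
from_to($in_place, 'ISO-8859-1', 'UTF-8');
```

Both routes produce the same byte string. The two-step form is the one to prefer when the text needs any processing between decoding and encoding.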
575	535	=head2 What are C<decode_utf8> and C<encode_utf8>?
576	536
577	537	(C<decode_utf8> と C<encode_utf8> って何?)
578	538
579	539	=begin original
580	540
581	541	These are alternate syntaxes for C<decode('utf8', ...)> and C<encode('utf8',
582		...)>. Do not use these functions for data exchange. Instead use
	542	...)>.
583		C<decode('UTF-8', ...)> and C<encode('UTF-8', ...)>; see
584		L</What's the difference between UTF-8 and utf8?> below.
585	543
586	544	=end original
587	545
588	546	これらは C<decode('utf8', ...)> および C<encode('utf8', ...)> のもう一つの
589	547	文法です。
590		これらの関数をデータ交換に使わないでください。
591		代わりに C<decode('UTF-8', ...)> と C<encode('UTF-8', ...)> を使ってください;
592		後述する L</What's the difference between UTF-8 and utf8?> を
593		参照してください。
594	548
595	549	=head2 What is a "wide character"?
596	550
597	551	(「ワイド文字」って何?)
598	552
599	553	=begin original
600	554
601	555	This is a term used for characters occupying more than one byte.
602	556
603	557	=end original
604	558
605	559	これは、1 バイトで収まらない文字という意味で使われる用語です。
606	560
607	561	=begin original
608	562
609	563	The Perl warning "Wide character in ..." is caused by such a character.
610	564	With no specified encoding layer, Perl tries to
611	565	fit things into a single byte. When it can't, it
612	566	emits this warning (if warnings are enabled), and uses UTF-8 encoded data
613	567	instead.
614	568
615	569	=end original
616	570
617	571	Perl の警告 "Wide character in ..." はそのような文字によって引き起こされます。
618	572	エンコーディング層が指定されていない場合、Perl はそれを単一のバイトに
619	573	納めようとします。
620	574	これができないと、(警告が有効なら)この警告が出力され、代わりに UTF-8 で
621	575	エンコードされたデータを使います。
622	576
623	577	=begin original
624	578
625	579	To avoid this warning and to avoid having different output encodings in a single
626	580	stream, always specify an encoding explicitly, for example with a PerlIO layer:
627	581
628	582	=end original
629	583
630	584	この警告を回避し、一つのストリームに異なった出力エンコーディングが
631	585	出力されることを回避するには、常に明示的にエンコーディングを指定してください;
632	586	例えば PerlIO 層を使って:
633	587
634	588	binmode STDOUT, ":encoding(UTF-8)";
635	589
636	590	=head1 INTERNALS
637	591
638	592	(内部構造)
639	593
640	594	=head2 What is "the UTF8 flag"?
641	595
642	596	(「UTF8 フラグ」って何?)
643	597
644	598	=begin original
645	599
646	600	Please, unless you're hacking the internals, or debugging weirdness, don't
647	601	think about the UTF8 flag at all. That means that you very probably shouldn't
648	602	use C<is_utf8>, C<_utf8_on> or C<_utf8_off> at all.
649	603
650	604	=end original
651	605
652	606	内部をハックしようとしているか、変なものをデバッグしようとしているのでない
653	607	限り、どうか UTF8 フラグのことは一切考えないでください。
654	608	これは、まず間違いなく C<is_utf8>, C<_utf8_on>, C<_utf8_off> を
655	609	一切使うべきでないことを意味します。
656	610
657	611	=begin original
658	612
659	613	The UTF8 flag, also called SvUTF8, is an internal flag that indicates that the
660	614	current internal representation is UTF-8. Without the flag, it is assumed to be
661	615	ISO-8859-1. Perl converts between these automatically. (Actually Perl usually
662	616	assumes the representation is ASCII; see L</Why do regex character classes
663	617	sometimes match only in the ASCII range?> above.)
664	618
665	619	=end original
666	620
667	621	UTF8 フラグ(SvUTF8 とも呼ばれます)は、現在の内部表現が UTF-8 であることを
668	622	示す内部フラグです。
669	623	このフラグがない場合、ISO-8859-1 と仮定します。
670	624	Perl はこれらを自動的に変換します。
671	625	(実際のところ Perl は普通表現が ASCII であると仮定します; 上述の L</Why do
672	626	regex character classes sometimes match only in the ASCII range?> を
673	627	参照してください。)
674	628
675	629	=begin original
676	630
677	631	One of Perl's internal formats happens to be UTF-8. Unfortunately, Perl can't
678	632	keep a secret, so everyone knows about this. That is the source of much
679	633	confusion. It's better to pretend that the internal format is some unknown
680	634	encoding, and that you always have to encode and decode explicitly.
681	635
682	636	=end original
683	637
684	638	Perl の内部表現の一つはたまたま UTF-8 です。
685	639	残念ながら、Perl は秘密を守れないので、このことはみんな知っています。
686	640	これが多くの混乱の源です。
687	641	内部表現は何か分からないエンコーディングで、常に明示的にエンコードと
688	642	デコードが必要ということにしておいた方がよいです。
689	643
690	644	=head2 What about the C<use bytes> pragma?
691	645
692	646	(C<use bytes> プラグマって何?)
693	647
694	648	=begin original
695	649
696	650	Don't use it. It makes no sense to deal with bytes in a text string, and it
697	651	makes no sense to deal with characters in a byte string. Do the proper
698	652	conversions (by decoding/encoding), and things will work out well: you get
699	653	character counts for decoded data, and byte counts for encoded data.
700	654
701	655	=end original
702	656
703	657	これは使わないでください。
704	658	テキスト文字列をバイト単位で扱うことに意味はありませんし、
705	659	バイト文字列を文字単位で扱うことには意味はありません。
706	660	適切な変換(デコードかエンコード)を行えば、物事はうまくいきます:
707	661	デコードしたデータの文字数を得られますし、エンコードしたデータのバイト数を
708	662	得られます。
709	663
710	664	=begin original
711	665
712	666	C<use bytes> is usually a failed attempt to do something useful. Just forget
713	667	about it.
714	668
715	669	=end original
716	670
717	671	C<use bytes> は何か有用なことをしようとするためには間違った方法です。
718	672	これのことは単に忘れてください。
719	673
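A brief sketch of the point about counts, without C<use bytes>: C<length> reports characters for the decoded string and bytes for the encoded one. UTF-8 is assumed as the target encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text  = "\x{65E5}\x{672C}\x{8A9E}";   # three characters
my $bytes = encode('UTF-8', $text);       # the same text as UTF-8 bytes

my $char_count = length $text;    # characters in the decoded string
my $byte_count = length $bytes;   # bytes in the encoded string
```

If a byte count is needed, for example for a Content-Length header, taking C<length> of the encoded string gives it directly.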
720	674	=head2 What about the C<use encoding> pragma?
721	675
722	676	(C<use encoding> プラグマって何?)
723	677
724	678	=begin original
725	679
726	680	Don't use it. Unfortunately, it assumes that the programmer's environment and
727	681	that of the user will use the same encoding. It will use the same encoding for
728	682	the source code and for STDIN and STDOUT. When a program is copied to another
729	683	machine, the source code does not change, but the STDIO environment might.
730	684
731	685	=end original
732	686
733	687	これは使わないでください。
734	688	残念ながら、これはプログラマの環境とユーザーの環境が同じであると仮定します。
735	689	これはソースコードと STDIN や STDOUT で同じエンコーディングを使います。
736	690	プログラムが他のマシンにコピーされると、ソースコードは変わりませんが、
737	691	STDIO 環境は変わるかもしれません。
738	692
739	693	=begin original
740	694
741	695	If you need non-ASCII characters in your source code, make it a UTF-8 encoded
742	696	file and C<use utf8>.
743	697
744	698	=end original
745	699
746	700	もしソースコードに非 ASCII 文字が必要なら、ファイルを UTF-8 で
747	701	エンコードして、C<use utf8> を使ってください。
748	702
749	703	=begin original
750	704
751	705	If you need to set the encoding for STDIN, STDOUT, and STDERR, for example
752	706	based on the user's locale, C<use open>.
753	707
754	708	=end original
755	709
756	710	もし STDIN, STDOUT, STDERR のエンコーディングを、例えばユーザーのロケールに
757	711	合わせてセットする必要があるなら、C<use open> してください。
758	712
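As a rough sketch of the C<use open> alternative, the following sets UTF-8 as the default layer and applies it to the standard handles. A fixed encoding is used here so the effect is predictable; the C<open> pragma also documents a C<:locale> form for following the user's locale. C<PerlIO::get_layers> is used only to inspect the result.

```perl
use strict;
use warnings;

# Default layers for handles opened in this scope, and, because of :std,
# for STDIN, STDOUT and STDERR as well.
use open qw(:std :encoding(UTF-8));

# Inspect which layers are now on STDOUT.
my @stdout_layers   = PerlIO::get_layers(STDOUT);
my $has_encoding    = grep { /^encoding\(/ } @stdout_layers;
```

Unlike C<use encoding>, this says nothing about how the source file itself is encoded; that remains the job of C<use utf8>.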
759	713	=head2 What is the difference between C<:encoding> and C<:utf8>?
760	714
761	715	(C<:encoding> と C<:utf8> の違いは?)
762	716
763	717	=begin original
764	718
765	719	Because UTF-8 is one of Perl's internal formats, you can often just skip the
766	720	encoding or decoding step, and manipulate the UTF8 flag directly.
767	721
768	722	=end original
769	723
770	724	UTF-8 は Perl の内部形式のひとつなので、しばしばエンコードやデコードの
771	725	手順を省略して、UTF8 フラグを直接操作できます。
772	726
773	727	=begin original
774	728
775	729	Instead of C<:encoding(UTF-8)>, you can simply use C<:utf8>, which skips the
776	730	encoding step if the data was already represented as UTF8 internally. This is
777	731	widely accepted as good behavior when you're writing, but it can be dangerous
778	732	when reading, because it causes internal inconsistency when you have invalid
779	733	byte sequences. Using C<:utf8> for input can sometimes result in security
780	734	breaches, so please use C<:encoding(UTF-8)> instead.
781	735
782	736	=end original
783	737
784	738	C<:encoding(UTF-8)> の代わりに単に C<:utf8> を使うことで、もしデータが
785	739	内部で既に UTF8 で表現されていれば、エンコードの手順を省略します。
786	740	これは、書き込むときにはよい振る舞いであると広く受け入れられていますが、
787	741	読み込むときには危険があります; なぜなら不正なバイト列を受け取ると
788	742	内部矛盾を引き起こすからです。
789	743	入力に C<:utf8> を使うとセキュリティ侵害を引き起こす可能性があるので、
790	744	どうか代わりに C<:encoding(UTF-8)> を使ってください。
791	745
792	746	=begin original
793	747
794	748	Instead of C<decode> and C<encode>, you could use C<_utf8_on> and C<_utf8_off>,
795	749	but this is considered bad style. Especially C<_utf8_on> can be dangerous, for
796	750	the same reason that C<:utf8> can.
797	751
798	752	=end original
799	753
800	754	C<decode> と C<encode> の代わりに、C<_utf8_on> と C<_utf8_off> を
801	755	使えますが、これは悪いスタイルと考えられています。
802	756	特に C<_utf8_on> は、C<:utf8> と同じ理由で危険です。
803	757
804	758	=begin original
805	759
806	760	There are some shortcuts for oneliners;
807		see L<-C in perlrun|perlrun/-C [numberE<sol>list]>.
	761	see L<-C|perlrun/-C [numberE<sol>list]> in L<perlrun>.
808	762
809	763	=end original
810	764
811	765	一行野郎のための省略形があります; L<perlrun> の
812		L<-C in perlrun|perlrun/-C [numberE<sol>list]> を参照してください。
	766	L<-C|perlrun/-C [numberE<sol>list]> を参照してください。
813	767
814	768	=head2 What's the difference between C<UTF-8> and C<utf8>?
815	769
816	770	(C<UTF-8> と C<utf8> の違いは?)
817	771
818	772	=begin original
819	773
820	774	C<UTF-8> is the official standard. C<utf8> is Perl's way of being liberal in
821	775	what it accepts. If you have to communicate with things that aren't so liberal,
822	776	you may want to consider using C<UTF-8>. If you have to communicate with things
823	777	that are too liberal, you may have to use C<utf8>. The full explanation is in
824		L<Encode/"UTF-8 vs. utf8 vs. UTF8">.
	778	L<Encode>.
825	779
826	780	=end original
827	781
828	782	C<UTF-8> は公式な標準です。
829	783	C<utf8> は、何を受け入れるかに関して自由な Perl のやり方です。
830	784	もしそれほど自由でないものと対話する必要があるなら、
831	785	C<UTF-8> を使うことを考えたくなるかもしれません。
832	786	自由すぎるものと対話する必要があるなら、C<utf8> を
833	787	使わなければならないかもしれません。
834		完全な説明は L<Encode/"UTF-8 vs. utf8 vs. UTF8"> にあります。
	788	完全な説明は L<Encode> にあります。
835	789
836	790	=begin original
837	791
838	792	C<UTF-8> is internally known as C<utf-8-strict>. The tutorial uses UTF-8
839	793	consistently, even where utf8 is actually used internally, because the
840	794	distinction can be hard to make, and is mostly irrelevant.
841	795
842	796	=end original
843	797
844	798	C<UTF-8> は内部では C<utf-8-strict> として知られます。
845	799	チュートリアルでは、たとえ内部では実際には utf8 が使われる場合でも
846	800	一貫して UTF-8 を使っています; なぜなら区別をつけるのは難しく、ほとんど
847	801	無意味だからです。
848	802
849	803	=begin original
850	804
851	805	For example, utf8 can be used for code points that don't exist in Unicode, like
852	806	9999999, but if you encode that to UTF-8, you get a substitution character (by
853	807	default; see L<Encode/"Handling Malformed Data"> for more ways of dealing with
854	808	this.)
855	809
856	810	=end original
857	811
858	812	例えば utf8 は、9999999 のような、Unicode に存在しない符号位置も使えますが、
859	813	これを UTF-8 でエンコードすると、代替文字を得ることになります(これは
860	814	デフォルトの場合です; これを扱う他の方法については
861	815	L<Encode/"Handling Malformed Data"> を参照してください。)
862	816
863	817	=begin original
864	818
865	819	Okay, if you insist: the "internal format" is utf8, not UTF-8. (When it's not
866	820	some other encoding.)
867	821
868	822	=end original
869	823
870	824	わかりました、どうしてもと言うのなら:「内部形式」は utf8 であって、
871	825	UTF-8 ではありません。
872	826	(もしその他のエンコーディングでないのなら。)
873	827
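The 9999999 example from this answer can be tried out with a sketch like the following. It relies on Encode's default handling of unmappable characters (no CHECK argument); variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $beyond_unicode = chr(9999999);   # not a Unicode code point

my $lax    = encode('utf8',  $beyond_unicode);   # Perl's liberal variant
my $strict = encode('UTF-8', $beyond_unicode);   # the official standard

# U+FFFD REPLACEMENT CHARACTER as UTF-8 bytes, for comparison.
my $replacement = encode('UTF-8', "\x{FFFD}");
```

The liberal encoding keeps the out-of-range code point as a longer byte sequence, while the strict one substitutes the replacement character instead.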
874	828	=head2 I lost track; what encoding is the internal format really?
875	829
876	830	(迷子になりました; 実際のところ内部形式のエンコーディングは何?)
877	831
878	832	=begin original
879	833
880	834	It's good that you lost track, because you shouldn't depend on the internal
881	835	format being any specific encoding. But since you asked: by default, the
882	836	internal format is either ISO-8859-1 (latin-1), or utf8, depending on the
883	837	history of the string. On EBCDIC platforms, this may even be different.
884	838
885	839	=end original
886	840
887	841	迷子になったのはよいことです; なぜなら内部形式が特定のエンコーディングで
888	842	あることに依存するべきではないからです。
889	843	しかし聞かれたので答えましょう: デフォルトでは、内部形式は
890	844	ISO-8859-1 (latin-1) か utf8 で、どちらになるかは文字列の歴史に
891	845	依存します。
892	846	EBCDIC プラットフォームでは、これは異なっているかもしれません。
893	847
894	848	=begin original
895	849
896	850	Perl knows how it stored the string internally, and will use that knowledge
897	851	when you C<encode>. In other words: don't try to find out what the internal
898	852	encoding for a certain string is, but instead just encode it into the encoding
899	853	that you want.
900	854
901	855	=end original
902	856
903	857	Perl は文字列が内部でどのように保管されているかを知っていて、この知識を
904	858	C<encode> するときに使います。
905	859	言い換えると: 特定の文字列の内部エンコーディングが何かを
906	860	調べようとしてはいけません; 代わりに、単に望みのエンコーディングに
907	861	エンコードしてください。
908	862
909	863	=head1 AUTHOR
910	864
911	865	Juerd Waalboer <#####@juerd.nl>
912	866
913	867	=head1 SEE ALSO
914	868
915	869	L<perlunicode>, L<perluniintro>, L<Encode>
916	870
917	871	=begin meta
918	872
919	873	Translate: SHIRAKATA Kentaro <argrath@ub32.org> (5.10.0-)
920	874	Status: completed
921	875
922	876	=end meta
