Diff of perlunifaq between 5.28.0 and 5.10.1

1	1
2	2	=encoding euc-jp
3	3
4	4	=head1 NAME
5	5
6	6	=begin original
7	7
8	8	perlunifaq - Perl Unicode FAQ
9	9
10	10	=end original
11	11
12	12	perlunifaq - Perl Unicode FAQ
13	13
14	14	=head1 Q and A
15	15
16	16	=begin original
17	17
18	18	This is a list of questions and answers about Unicode in Perl, intended to be
19	19	read after L<perlunitut>.
20	20
21	21	=end original
22	22
23	23	これは、L<perlunitut> の後で読むことを想定した、Perl での Unicode に関する
24	24	質問と答えの一覧です。
25	25
26	26	=head2 perlunitut isn't really a Unicode tutorial, is it?
27	27
28	28	(perlunitut は実際には Unicode チュートリアルじゃないんじゃないの?)
29	29
30	30	=begin original
31	31
32	32	No, and this isn't really a Unicode FAQ.
33	33
34	34	=end original
35	35
36	36	はい、違います; そしてこれは実際には Unicode FAQ ではありません。
37	37
38	38	=begin original
39	39
40		Perl has an abstracted interface for all supported character encodings, so this
	40	Perl has an abstracted interface for all supported character encodings, so they
41	41	is actually a generic C<Encode> tutorial and C<Encode> FAQ. But many people
42	42	think that Unicode is special and magical, and I didn't want to disappoint
43	43	them, so I decided to call the document a Unicode tutorial.
44	44
45	45	=end original
46	46
47	47	Perl は対応している全ての文字エンコーディングへの抽象インターフェースを
48	48	持っているので、実際には汎用の C<Encode> チュートリアルと
49	49	C<Encode> FAQ です。
50	50	しかし、多くの人々が、Unicode は特別でマジカルなものだと考えていて、
51	51	私は彼らを失望させたくなかったので、そのドキュメントを
52	52	Unicode チュートリアルと呼ぶことに決めました。
53	53
54	54	=head2 What character encodings does Perl support?
55	55
56	56	(Perl が対応している文字エンコーディングは何?)
57	57
58	58	=begin original
59	59
60	60	To find out which character encodings your Perl supports, run:
61	61
62	62	=end original
63	63
64	64	Perl がどの文字エンコーディングに対応しているかを見つけるには、以下を
65	65	実行してください:
66	66
67	67	perl -MEncode -le "print for Encode->encodings(':all')"
68	68
69	69	=head2 Which version of perl should I use?
70	70
71	71	(どのバージョンの perl を使うべき?)
72	72
73	73	=begin original
74	74
75	75	Well, if you can, upgrade to the most recent, but certainly C<5.8.1> or newer.
76		The tutorial and FAQ assume the latest release.
	76	The tutorial and FAQ are based on the status quo as of C<5.8.8>.
77	77
78	78	=end original
79	79
80	80	うーん、もし可能なら、最新にアップグレードしてください; 但し、確実に
81	81	C<5.8.1> 以降にはしてください。
82		チュートリアルと FAQ は最新リリースを仮定しています。
	82	チュートリアルと FAQ は C<5.8.8> の状態を基にしています。
83	83
84	84	=begin original
85	85
86	86	You should also check your modules, and upgrade them if necessary. For example,
87	87	HTML::Entities requires version >= 1.32 to function correctly, even though the
88	88	changelog is silent about this.
89	89
90	90	=end original
91	91
92	92	モジュールもチェックして、もし必要ならアップグレードするべきです。
93	93	例えば HTML::Entities は、changelog は何も触れていませんが、正しく
94	94	動作するためにはバージョン >= 1.32 が必要です。
95	95
96	96	=head2 What about binary data, like images?
97	97
98	98	(イメージのようなバイナリデータはどうするの?)
99	99
100	100	=begin original
101	101
102	102	Well, apart from a bare C<binmode $fh>, you shouldn't treat them specially.
103	103	(The binmode is needed because otherwise Perl may convert line endings on Win32
104	104	systems.)
105	105
106	106	=end original
107	107
108	108	うーん、生の C<binmode $fh> を別として、特別に扱う必要はないはずです。
109	109	(Win32 システムで Perl が行端を変更しないようにするために、binmode が
110	110	必要です。)
111	111
112	112	=begin original
113	113
114	114	Be careful, though, to never combine text strings with binary strings. If you
115	115	need text in a binary stream, encode your text strings first using the
116	116	appropriate encoding, then join them with binary strings. See also: "What if I
117	117	don't encode?".
118	118
119	119	=end original
120	120
121	121	但し、決してテキスト文字列とバイナリ文字列を結合しないように
122	122	注意してください。
123	123	もしバイナリストリームにテキストが必要なら、まずテキスト文字列を適切な
124	124	エンコーディングを使ってエンコードして、それをバイナリ文字列と
125	125	結合してください。
126	126	L<"What if I don't encode?"> も参照してください。
127	127
128	128	=head2 When should I decode or encode?
129	129
130	130	(デコードやエンコードはいつ行うべき?)
131	131
132	132	=begin original
133	133
134	134	Whenever you're communicating text with anything that is external to your perl
135	135	process, like a database, a text file, a socket, or another program. Even if
136	136	the thing you're communicating with is also written in Perl.
137	137
138	138	=end original
139	139
140	140	データベース、テキストファイル、ソケット、他のプログラムといった、自分の
141	141	perl プロセスの外側にある何かとテキストを通信するときはいつでも、です。
142	142	通信の相手が Perl で書かれている場合も同じです。
143	143
144	144	=head2 What if I don't decode?
145	145
146	146	(デコードしないとどうなるの?)
147	147
148	148	=begin original
149	149
150	150	Whenever your encoded, binary string is used together with a text string, Perl
151	151	will assume that your binary string was encoded with ISO-8859-1, also known as
152	152	latin-1. If it wasn't latin-1, then your data is unpleasantly converted. For
153	153	example, if it was UTF-8, the individual bytes of multibyte characters are seen
154	154	as separate characters, and then again converted to UTF-8. Such double encoding
155	155	can be compared to double HTML encoding (C<&gt;>), or double URI encoding
156	156	(C<%253E>).
157	157
158	158	=end original
159	159
160	160	エンコードされたバイナリ文字列をテキスト文字列と一緒に使ったときはいつでも、
161	161	Perl はバイナリ文字列が ISO-8859-1 またの名を latin-1 と仮定します。
162	162	もしこれが latin-1 でなかった場合、データは不愉快な形に変換されます。
163	163	例えば、もしデータが UTF-8 だった場合、マルチバイト文字のそれぞれのバイトが
164	164	文字として扱われ、それから再び UTF-8 に変換されます。
165	165	このような二重エンコードは二重 HTML エンコーディング (C<&gt;>) や
166	166	二重 URI エンコーディング (C<%253E>) と比較できます。
167	167
168	168	=begin original
169	169
170	170	This silent implicit decoding is known as "upgrading". That may sound
171	171	positive, but it's best to avoid it.
172	172
173	173	=end original
174	174
175	175	この、暗黙のうちに行われるデコードは「昇格」("upgrading")と呼ばれます。
176	176	これは前向きなことに聞こえるかもしれませんが、避けるのが最良です。
177	177
178	178	=head2 What if I don't encode?
179	179
180	180	(エンコードしないとどうなるの?)
181	181
182	182	=begin original
183	183
184	184	Your text string will be sent using the bytes in Perl's internal format. In
185	185	some cases, Perl will warn you that you're doing something wrong, with a
186	186	friendly warning:
187	187
188	188	=end original
189	189
190	190	テキスト文字列は Perl の内部形式のバイト列を使って送信されます。
191	191	いくつかの場合では、Perl は何かが間違っていることを、親切なメッセージで
192	192	警告します:
193	193
194	194	Wide character in print at example.pl line 2.
195	195
196	196	=begin original
197	197
198	198	Because the internal format is often UTF-8, these bugs are hard to spot,
199	199	because UTF-8 is usually the encoding you wanted! But don't be lazy, and don't
200	200	use the fact that Perl's internal format is UTF-8 to your advantage. Encode
201	201	explicitly to avoid weird bugs, and to show to maintenance programmers that you
202	202	thought this through.
203	203
204	204	=end original
205	205
206	206	内部形式はしばしば UTF-8 なので、このバグは発見しにくいです; なぜなら
207	207	あなたがほしいのは普通 UTF-8 だからです!
208	208	しかし、手を抜かないでください; そして Perl の内部形式が UTF-8 であることを
209	209	利用しようとしないでください。
210	210	奇妙なバグを防ぐため、そして保守プログラマに対してあなたが何を考えたかを
211	211	示すために、明示的にエンコードしてください。
212	212
213	213	=head2 Is there a way to automatically decode or encode?
214	214
215	215	(自動的にデコードやエンコードする方法はある?)
216	216
217	217	=begin original
218	218
219	219	If all data that comes from a certain handle is encoded in exactly the same
220	220	way, you can tell the PerlIO system to automatically decode everything, with
221	221	the C<encoding> layer. If you do this, you can't accidentally forget to decode
222	222	or encode anymore, on things that use the layered handle.
223	223
224	224	=end original
225	225
226	226	もし、あるハンドルから来る全てのデータが正確に同じ方法で
227	227	エンコードされているなら、C<encoding> 層を使って、 PerlIO システムに自動的に
228	228	全てをデコードするように伝えることができます。
229	229	これを行えば、この層のハンドルを使っている限り、うっかりデコードや
230	230	エンコードを忘れることはありません。
231	231
232	232	=begin original
233	233
234	234	You can provide this layer when C<open>ing the file:
235	235
236	236	=end original
237	237
238	238	ファイルを C<open> するときにこの層を指定することができます:
239	239
240		open my $fh, '>:encoding(UTF-8)', $filename; # auto encoding on write
	240	open my $fh, '>:encoding(UTF-8)', $filename; # auto encoding on write
241		open my $fh, '<:encoding(UTF-8)', $filename; # auto decoding on read
	241	open my $fh, '<:encoding(UTF-8)', $filename; # auto decoding on read
242	242
243	243	=begin original
244	244
245	245	Or if you already have an open filehandle:
246	246
247	247	=end original
248	248
249	249	あるいは既にオープンしているファイルハンドルがあるなら:
250	250
251		binmode $fh, ':encoding(UTF-8)';
	251	binmode $fh, ':encoding(UTF-8)';
252	252
253	253	=begin original
254	254
255	255	Some database drivers for DBI can also automatically encode and decode, but
256	256	that is sometimes limited to the UTF-8 encoding.
257	257
258	258	=end original
259	259
260	260	DBI のデータベースドライバのいくつかも、エンコードとデコードを自動的に
261	261	行いますが、ときどきこれは UTF-8 エンコーディングに制限されています。
262	262
263	263	=head2 What if I don't know which encoding was used?
264	264
265	265	(どのエンコーディングが使われているかわからないときは?)
266	266
267	267	=begin original
268	268
269	269	Do whatever you can to find out, and if you have to: guess. (Don't forget to
270	270	document your guess with a comment.)
271	271
272	272	=end original
273	273
274	274	なんとかして見つけるか、もし必要なら、推測してください。
275	275	(どう推測したかをコメントとして文書化するのを忘れないでください。)
276	276
277	277	=begin original
278	278
279	279	You could open the document in a web browser, and change the character set or
280	280	character encoding until you can visually confirm that all characters look the
281	281	way they should.
282	282
283	283	=end original
284	284
285	285	ドキュメントを web ブラウザで開いて、全ての文字があるべき形であることを
286	286	視覚的に確認できるまで文字集合や文字エンコーディングを変更する方法も
287	287	あります。
288	288
289	289	=begin original
290	290
291	291	There is no way to reliably detect the encoding automatically, so if people
292	292	keep sending you data without charset indication, you may have to educate them.
293	293
294	294	=end original
295	295
296	296	エンコーディングを自動的に検出するための信頼性のある方法はないので、
297	297	もし人々があなたに文字集合の指示なしにデータを送り続けるなら、彼らを
298	298	教育する必要があるかもしれません。
299	299
300	300	=head2 Can I use Unicode in my Perl sources?
301	301
302	302	(Perl のソースコードに Unicode は使える?)
303	303
304	304	=begin original
305	305
306	306	Yes, you can! If your sources are UTF-8 encoded, you can indicate that with the
307	307	C<use utf8> pragma.
308	308
309	309	=end original
310	310
311	311	はい、できます!
312	312	ソースコードが UTF-8 でエンコードされているなら、C<use utf8> プラグマを
313	313	使ってそれを示すことができます。
314	314
315	315	use utf8;
316	316
317	317	=begin original
318	318
319	319	This doesn't do anything to your input, or to your output. It only influences
320	320	the way your sources are read. You can use Unicode in string literals, in
321	321	identifiers (but they still have to be "word characters" according to C<\w>),
322	322	and even in custom delimiters.
323	323
324	324	=end original
325	325
326	326	これは入出力に対しては何も行いません。
327	327	ソースを読み込む方法のみに影響を与えます。
328	328	文字列リテラル、識別子(しかし C<\w> に従った「単語文字」である必要が
329	329	あります)、そして独自デリミタにすら Unicode が使えます。
330	330
331	331	=head2 Data::Dumper doesn't restore the UTF8 flag; is it broken?
332	332
333	333	(Data::Dumper は UTF8 フラグを復元しません; これは壊れてるの?)
334	334
335	335	=begin original
336	336
337	337	No, Data::Dumper's Unicode abilities are as they should be. There have been
338	338	some complaints that it should restore the UTF8 flag when the data is read
339	339	again with C<eval>. However, you should really not look at the flag, and
340	340	nothing indicates that Data::Dumper should break this rule.
341	341
342	342	=end original
343	343
344	344	いいえ、Data::Dumper の Unicode 能力は、あるべき形であります。
345	345	C<eval> で再びデータを読み込むとき、UTF8 フラグを復元するべきだという
346	346	苦情が来ることがあります。
347	347	しかし、実際にはフラグを見るべきではないですし、Data::Dumper がこの規則を
348	348	破っていることを示すものは何もありません。
349	349
350	350	=begin original
351	351
352	352	Here's what happens: when Perl reads in a string literal, it sticks to 8 bit
353	353	encoding as long as it can. (But perhaps originally it was internally encoded
354	354	as UTF-8, when you dumped it.) When it has to give that up because other
355	355	characters are added to the text string, it silently upgrades the string to
356	356	UTF-8.
357	357
358	358	=end original
359	359
360	360	起きているのは以下のようなことです: Perl が文字列リテラルを読み込むとき、
361	361	可能な限り長く 8 ビットエンコーディングにこだわります。
362	362	(しかしおそらく、これをダンプしたときには内部では UTF-8 でエンコード
363	363	されていました。)
364	364	それ以外の文字をテキスト文字列に追加するためにこれを諦めなければならない
365	365	とき、Perl は暗黙のうちに文字列を UTF-8 に昇格させます。
366	366
367	367	=begin original
368	368
369	369	If you properly encode your strings for output, none of this is of your
370	370	concern, and you can just C<eval> dumped data as always.
371	371
372	372	=end original
373	373
374	374	出力用の文字列を適切にエンコードしていれば、これについてあなたは何も
375	375	心配することはなく、いつも通りにダンプしたデータを C<eval> できます。
376	376
377	377	=head2 Why do regex character classes sometimes match only in the ASCII range?
378	378
379	379	(なぜ正規表現文字クラスは時々 ASCII の範囲にしかマッチしないの?)
380	380
	381	=head2 Why do some characters not uppercase or lowercase correctly?
	382
	383	(なぜいくつかの文字は正しく大文字や小文字にならないの?)
	384
381	385	=begin original
382	386
383		Starting in Perl 5.14 (and partially in Perl 5.12), just put a
	387	It seemed like a good idea at the time, to keep the semantics the same for
384		C<use feature 'unicode_strings'> near the beginning of your program.
	388	standard strings, when Perl got Unicode support. While it might be repaired
385		Within its lexical scope you shouldn't have this problem. It also is
	389	in the future, we now have to deal with the fact that Perl treats equal
386		automatically enabled under C<use feature ':5.12'> or C<use v5.12> or
	390	strings differently, depending on the internal state.
387		using C<-E> on the command line for Perl 5.12 or higher.
388	391
389	392	=end original
390	393
391		Perl 5.14 から (そして部分的に Perl 5.12 から、) 単にプログラムの先頭付近に
	394	Perl が Unicode 対応になった時点では、これは標準文字列と同じ意味論を
392		C<use feature 'unicode_strings'> を書いてください。
	395	維持するのにいい考えだと思われました。
393		このレキシカルスコープ内ではこの問題は発生しないはずです。
	396	一方、これは将来修正されるかもしれないので、Perl が同じ文字列を内部状態に
394		これはまた C<use feature ':5.12'> または C<use v5.12> が有効か、Perl 5.12
	397	よって異なる扱いをするという事実に対応する必要が出てきました。
395		以降でコマンドラインで C<-E> を使っていると自動的に有効になります。
396	398
397	399	=begin original
398	400
399		The rationale for requiring this is to not break older programs that
	401	Affected are C<uc>, C<lc>, C<ucfirst>, C<lcfirst>, C<\U>, C<\L>, C<\u>, C<\l>,
400		rely on the way things worked before Unicode came along. Those older
	402	C<\d>, C<\s>, C<\w>, C<\D>, C<\S>, C<\W>, C</.../i>, C<(?i:...)>,
401		programs knew only about the ASCII character set, and so may not work
	403	C</[[:posix:]]/>, and C<quotemeta> (though this last should not cause any real
402		properly for additional characters. When a string is encoded in UTF-8,
	404	problems).
403		Perl assumes that the program is prepared to deal with Unicode, but when
404		the string isn't, Perl assumes that only ASCII
405		is wanted, and so those characters that are not ASCII
406		characters aren't recognized as to what they would be in Unicode.
407		C<use feature 'unicode_strings'> tells Perl to treat all characters as
408		Unicode, whether the string is encoded in UTF-8 or not, thus avoiding
409		the problem.
410	405
411	406	=end original
412	407
413		これが必要な理論的根拠は、Unicode がやってくる前に動作する方法に
	408	影響を受けるのは C<uc>, C<lc>, C<ucfirst>, C<lcfirst>, C<\U>, C<\L>, C<\u>, C<\l>,
414		依存している古いプログラムを壊さないことです。
	409	C<\d>, C<\s>, C<\w>, C<\D>, C<\S>, C<\W>, C</.../i>, C<(?i:...)>,
415		このような古いプログラムは ASCII 文字集合のみを知っているので、追加の
	410	C</[[:posix:]]/>, C<quotemeta> です (しかし最後のものは実際には何の問題も
416		文字については正しく動作しないかも知れません。
	411	起こさないはずです)。
417		Perl はプログラムが Unicode を扱えるように準備されていると仮定しますが、
418		文字列がそうでなかった場合、Perl は (EBCDIC プラットフォームでなければ)
419		ASCII のみが求められていると仮定するので、非 ASCII 文字は Unicode に
420		するべきものとして認識しません。
421		C<use feature 'unicode_strings'> は Perl に、文字が UTF-8 で
422		エンコードされているかどうかにかかわらず全ての文字を Unicode として
423		扱うように知らせて、この問題を回避します。
424	412
425	413	=begin original
426	414
427		However, on earlier Perls, or if you pass strings to subroutines outside
	415	To force Unicode semantics, you can upgrade the internal representation to
428		the feature's scope, you can force Unicode rules by changing the
	416	by doing C<utf8::upgrade($string)>. This can be used
429		encoding to UTF-8 by doing C<utf8::upgrade($string)>. This can be used
430	417	safely on any string, as it checks and does not change strings that have
431	418	already been upgraded.
432	419
433	420	=end original
434	421
435		しかし、以前の Perl であったり、この機能のスコープの外側のサブルーチンに
	422	Unicode の意味論を強制するために、C<utf8::upgrade($string)> とすることで
436		文字列を渡した場合、C<utf8::upgrade($string)> とすることでエンコーディングを
	423	内部表現を昇格できます。
437		UTF-8 にすることで Unicode の規則を強制できます。
438	424	これは既に昇格している文字列は変更しないので、どのような文字列に対しても
439	425	安全に用いることができます。
440	426
441	427	=begin original
442	428
443	429	For a more detailed discussion, see L<Unicode::Semantics> on CPAN.
444	430
445	431	=end original
446	432
447	433	さらなる詳細な議論については、CPAN の L<Unicode::Semantics> を
448	434	参照してください。
449	435
450		=head2 Why do some characters not uppercase or lowercase correctly?
451
452		(なぜいくつかの文字は正しく大文字や小文字にならないの?)
453
454		=begin original
455
456		See the answer to the previous question.
457
458		=end original
459
460		前述の質問の答えを参照してください。
461
462	436	=head2 How can I determine if a string is a text string or a binary string?
463	437
464	438	(文字列がテキスト文字列かバイナリ文字列かを決定するには?)
465	439
466	440	=begin original
467	441
468	442	You can't. Some use the UTF8 flag for this, but that's misuse, and makes well
469	443	behaved modules like Data::Dumper look bad. The flag is useless for this
470	444	purpose, because it's off when an 8 bit encoding (by default ISO-8859-1) is
471	445	used to store the string.
472	446
473	447	=end original
474	448
475	449	それはできません。
476	450	このために UTF8 フラグを使う人もいますが、これは誤用で、Data::Dumper のように
477	451	正しく振る舞うモジュールをおかしくします。
478	452	このフラグはこの目的のためには使えません; なぜなら文字列の保管に 8 ビット
479	453	エンコーディングが使われている場合 (デフォルトでは ISO-8859-1 です)、
480	454	オフだからです。
481	455
482	456	=begin original
483	457
484	458	This is something you, the programmer, has to keep track of; sorry. You could
485	459	consider adopting a kind of "Hungarian notation" to help with this.
486	460
487	461	=end original
488	462
489	463	把握しておく必要があるプログラマに言えることはこれです; ごめんなさい。
490	464	これを助けるために、「ハンガリアン記法」のようなものの採用を
491	465	検討することもできます。
492	466
493	467	=head2 How do I convert from encoding FOO to encoding BAR?
494	468
495	469	(エンコーディング FOO からエンコーディング BAR に変換するには?)
496	470
497	471	=begin original
498	472
499	473	By first converting the FOO-encoded byte string to a text string, and then the
500	474	text string to a BAR-encoded byte string:
501	475
502	476	=end original
503	477
504	478	まず FOO でエンコードされたバイト文字列をテキスト文字列に変換し、
505	479	それからテキスト文字列を BAR エンコードされたバイト文字列に変換します:
506	480
507	481	my $text_string = decode('FOO', $foo_string);
508	482	my $bar_string = encode('BAR', $text_string);
509	483
510	484	=begin original
511	485
512	486	or by skipping the text string part, and going directly from one binary
513	487	encoding to the other:
514	488
515	489	=end original
516	490
517	491	あるいは、テキスト文字列の部分を飛ばして、あるバイナリエンコーディングから
518	492	他のものへ直接変換します:
519	493
520	494	use Encode qw(from_to);
521	495	from_to($string, 'FOO', 'BAR'); # changes contents of $string
522	496
523	497	=begin original
524	498
525	499	or by letting automatic decoding and encoding do all the work:
526	500
527	501	=end original
528	502
529	503	あるいは、自動でデコードとエンコードをさせることで全ての作業を行います:
530	504
531	505	open my $foofh, '<:encoding(FOO)', 'example.foo.txt';
532	506	open my $barfh, '>:encoding(BAR)', 'example.bar.txt';
533	507	print { $barfh } $_ while <$foofh>;
534	508
535	509	=head2 What are C<decode_utf8> and C<encode_utf8>?
536	510
537	511	(C<decode_utf8> と C<encode_utf8> って何?)
538	512
539	513	=begin original
540	514
541	515	These are alternate syntaxes for C<decode('utf8', ...)> and C<encode('utf8',
542		...)>. Do not use these functions for data exchange. Instead use
	516	...)>.
543		C<decode('UTF-8', ...)> and C<encode('UTF-8', ...)>; see
544		L</What's the difference between UTF-8 and utf8?> below.
545	517
546	518	=end original
547	519
548	520	これらは C<decode('utf8', ...)> および C<encode('utf8', ...)> のもう一つの
549	521	文法です。
550		これらの関数をデータ交換に使わないでください。
551		代わりに C<decode('UTF-8', ...)> と C<encode('UTF-8', ...)> を使ってください;
552		後述する L</What's the difference between UTF-8 and utf8?> を
553		参照してください。
554	522
555	523	=head2 What is a "wide character"?
556	524
557	525	(「ワイド文字」って何?)
558	526
559	527	=begin original
560	528
561		This is a term used for characters occupying more than one byte.
	529	This is a term used both for characters with an ordinal value greater than 127,
	530	characters with an ordinal value greater than 255, or any character occupying
	531	than one byte, depending on the context.
562	532
563	533	=end original
564	534
565		これは、1 バイトで収まらない文字という意味で使われる用語です。
	535	これは文脈に依存して、 127 より大きい序数を持つ文字、255 より大きい序数を
	536	持つ文字、1 バイトで収まらない文字、のいずれかの意味で使われる用語です。
566	537
567	538	=begin original
568	539
569		The Perl warning "Wide character in ..." is caused by such a character.
	540	The Perl warning "Wide character in ..." is caused by a character with an
570		With no specified encoding layer, Perl tries to
	541	ordinal value greater than 255. With no specified encoding layer, Perl tries to
571		fit things into a single byte. When it can't, it
	542	fit things in ISO-8859-1 for backward compatibility reasons. When it can't, it
572		emits this warning (if warnings are enabled), and uses UTF-8 encoded data
	543	emits this warning (if warnings are enabled), and outputs UTF-8 encoded data
573	544	instead.
574	545
575	546	=end original
576	547
577		Perl の警告 "Wide character in ..." はそのような文字によって引き起こされます。
	548	Perl の警告 "Wide character in ..." は 255 より大きい序数を持つ文字によって
578		エンコーディング層が指定されていない場合、Perl はそれを単一のバイトに
	549	引き起こされます。
579		納めようとします。
	550	エンコーディング層が指定されていない場合、Perl は過去互換性の理由によって
	551	文字を ISO-8859-1 に合わせようとします。
580	552	これができないと、(警告が有効なら)この警告が出力され、代わりに UTF-8 で
581		エンコードされたデータを使います。
	553	エンコードされたデータが出力されます。
582	554
583	555	=begin original
584	556
585	557	To avoid this warning and to avoid having different output encodings in a single
586	558	stream, always specify an encoding explicitly, for example with a PerlIO layer:
587	559
588	560	=end original
589	561
590	562	この警告を回避し、一つのストリームに異なった出力エンコーディングが
591	563	出力されることを回避するには、常に明示的にエンコーディングを指定してください;
592	564	例えば PerlIO 層を使って:
593	565
594	566	binmode STDOUT, ":encoding(UTF-8)";
595	567
596	568	=head1 INTERNALS
597	569
598	570	(内部構造)
599	571
600	572	=head2 What is "the UTF8 flag"?
601	573
602	574	(「UTF8 フラグ」って何?)
603	575
604	576	=begin original
605	577
606	578	Please, unless you're hacking the internals, or debugging weirdness, don't
607	579	think about the UTF8 flag at all. That means that you very probably shouldn't
608	580	use C<is_utf8>, C<_utf8_on> or C<_utf8_off> at all.
609	581
610	582	=end original
611	583
612	584	内部をハックしようとしているか、変なものをデバッグしようとしているのでない
613	585	限り、どうか UTF8 フラグのことは一切考えないでください。
614	586	これは、まず間違いなく C<is_utf8>, C<_utf8_on>, C<_utf8_off> を
615	587	一切使うべきでないことを意味します。
616	588
617	589	=begin original
618	590
619	591	The UTF8 flag, also called SvUTF8, is an internal flag that indicates that the
620	592	current internal representation is UTF-8. Without the flag, it is assumed to be
621		ISO-8859-1. Perl converts between these automatically. (Actually Perl usually
	593	ISO-8859-1. Perl converts between these automatically.
622		assumes the representation is ASCII; see L</Why do regex character classes
623		sometimes match only in the ASCII range?> above.)
624	594
625	595	=end original
626	596
627	597	UTF8 フラグ(SvUTF8 とも呼ばれます)は、現在の内部表現が UTF-8 であることを
628	598	示す内部フラグです。
629	599	このフラグがない場合、ISO-8859-1 と仮定します。
630	600	Perl はこれらを自動的に変換します。
631		(実際のところ Perl は普通表現が ASCII であると仮定します; 上述の L</Why do
632		regex character classes sometimes match only in the ASCII range?> を
633		参照してください。)
634	601
635	602	=begin original
636	603
637	604	One of Perl's internal formats happens to be UTF-8. Unfortunately, Perl can't
638	605	keep a secret, so everyone knows about this. That is the source of much
639	606	confusion. It's better to pretend that the internal format is some unknown
640	607	encoding, and that you always have to encode and decode explicitly.
641	608
642	609	=end original
643	610
644	611	Perl の内部表現の一つはたまたま UTF-8 です。
645	612	残念ながら、Perl は秘密を守れないので、このことはみんな知っています。
646	613	これが多くの混乱の源です。
647	614	内部表現は何か分からないエンコーディングで、常に明示的にエンコードと
648	615	デコードが必要ということにしておいた方がよいです。
649	616
650	617	=head2 What about the C<use bytes> pragma?
651	618
652	619	(C<use bytes> プラグマって何?)
653	620
654	621	=begin original
655	622
656	623	Don't use it. It makes no sense to deal with bytes in a text string, and it
657	624	makes no sense to deal with characters in a byte string. Do the proper
658	625	conversions (by decoding/encoding), and things will work out well: you get
659	626	character counts for decoded data, and byte counts for encoded data.
660	627
661	628	=end original
662	629
663	630	これは使わないでください。
664	631	テキスト文字列をバイト単位で扱うことに意味はありませんし、
665	632	バイト文字列を文字単位で扱うことには意味はありません。
666	633	適切な変換(デコードかエンコード)を行えば、物事はうまくいきます:
667	634	デコードしたデータの文字数を得られますし、エンコードしたデータのバイト数を
668	635	得られます。
669	636
670	637	=begin original
671	638
672	639	C<use bytes> is usually a failed attempt to do something useful. Just forget
673	640	about it.
674	641
675	642	=end original
676	643
677	644	C<use bytes> は何か有用なことをしようとするためには間違った方法です。
678	645	これのことは単に忘れてください。
679	646
680	647	=head2 What about the C<use encoding> pragma?
681	648
682	649	(C<use encoding> プラグマって何?)
683	650
684	651	=begin original
685	652
686	653	Don't use it. Unfortunately, it assumes that the programmer's environment and
687	654	that of the user will use the same encoding. It will use the same encoding for
688	655	the source code and for STDIN and STDOUT. When a program is copied to another
689	656	machine, the source code does not change, but the STDIO environment might.
690	657
691	658	=end original
692	659
693	660	これは使わないでください。
694	661	残念ながら、これはプログラマの環境とユーザーの環境が同じであると仮定します。
695	662	これはソースコードと STDIN や STDOUT で同じエンコーディングを使います。
696	663	プログラムが他のマシンにコピーされると、ソースコードは変わりませんが、
697	664	STDIO 環境は変わるかもしれません。
698	665
699	666	=begin original
700	667
701	668	If you need non-ASCII characters in your source code, make it a UTF-8 encoded
702	669	file and C<use utf8>.
703	670
704	671	=end original
705	672
706	673	もしソースコードに非 ASCII 文字が必要なら、ファイルを UTF-8 で
707	674	エンコードして、C<use utf8> を使ってください。
708	675
709	676	=begin original
710	677
711	678	If you need to set the encoding for STDIN, STDOUT, and STDERR, for example
712	679	based on the user's locale, C<use open>.
713	680
714	681	=end original
715	682
716	683	もし STDIN, STDOUT, STDERR のエンコーディングを、例えばユーザーのロケールに
717	684	合わせてセットする必要があるなら、C<use open> してください。
718	685
719	686	=head2 What is the difference between C<:encoding> and C<:utf8>?
720	687
721	688	(C<:encoding> と C<:utf8> の違いは?)
722	689
723	690	=begin original
724	691
725	692	Because UTF-8 is one of Perl's internal formats, you can often just skip the
726	693	encoding or decoding step, and manipulate the UTF8 flag directly.
727	694
728	695	=end original
729	696
730	697	UTF-8 は Perl の内部形式のひとつなので、しばしばエンコードやデコードの
731	698	手順を省略して、UTF8 フラグを直接操作できます。
732	699
733	700	=begin original
734	701
735	702	Instead of C<:encoding(UTF-8)>, you can simply use C<:utf8>, which skips the
736	703	encoding step if the data was already represented as UTF8 internally. This is
737	704	widely accepted as good behavior when you're writing, but it can be dangerous
738	705	when reading, because it causes internal inconsistency when you have invalid
739	706	byte sequences. Using C<:utf8> for input can sometimes result in security
740	707	breaches, so please use C<:encoding(UTF-8)> instead.
741	708
742	709	=end original
743	710
744	711	C<:encoding(UTF-8)> の代わりに単に C<:utf8> を使うことで、もしデータが
745	712	内部で既に UTF8 で表現されていれば、エンコードの手順を省略します。
746	713	これは、書き込むときにはよい振る舞いであると広く受け入れられていますが、
747	714	読み込むときには危険があります; なぜなら不正なバイト列を受け取ると
748	715	内部矛盾を引き起こすからです。
749	716	入力に C<:utf8> を使うとセキュリティ侵害を引き起こす可能性があるので、
750	717	どうか代わりに C<:encoding(UTF-8)> を使ってください。
751	718
752	719	=begin original
753	720
754	721	Instead of C<decode> and C<encode>, you could use C<_utf8_on> and C<_utf8_off>,
755	722	but this is considered bad style. Especially C<_utf8_on> can be dangerous, for
756	723	the same reason that C<:utf8> can.
757	724
758	725	=end original
759	726
760	727	C<decode> と C<encode> の代わりに、C<_utf8_on> と C<_utf8_off> を
761	728	使えますが、これは悪いスタイルと考えられています。
762	729	特に C<_utf8_on> は、C<:utf8> と同じ理由で危険です。
763	730
764	731	=begin original
765	732
766		There are some shortcuts for oneliners;
	733	There are some shortcuts for oneliners; see C<-C> in L<perlrun>.
767		see L<-C\|perlrun/-C [numberE<sol>list]> in L<perlrun>.
768	734
769	735	=end original
770	736
771		一行野郎のための省略形があります; L<perlrun> の
	737	一行野郎のための省略形があります; L<perlrun> の C<-C> を参照してください。
772		L<-C\|perlrun/-C [numberE<sol>list]> を参照してください。
773	738
774	739	=head2 What's the difference between C<UTF-8> and C<utf8>?
775	740
776	741	(C<UTF-8> と C<utf8> の違いは?)
777	742
778	743	=begin original
779	744
780	745	C<UTF-8> is the official standard. C<utf8> is Perl's way of being liberal in
781	746	what it accepts. If you have to communicate with things that aren't so liberal,
782	747	you may want to consider using C<UTF-8>. If you have to communicate with things
783	748	that are too liberal, you may have to use C<utf8>. The full explanation is in
784		L<Encode/"UTF-8 vs. utf8 vs. UTF8">.
	749	L<Encode>.
785	750
786	751	=end original
787	752
788	753	C<UTF-8> は公式な標準です。
789	754	C<utf8> は、何を受け入れるかに関して自由な Perl のやり方です。
790	755	もしそれほど自由でないものと対話する必要があるなら、
791	756	C<UTF-8> を使うことを考えたくなるかもしれません。
792	757	自由すぎるものと対話する必要があるなら、C<utf8> を
793	758	使わなければならないかもしれません。
794		完全な説明は L<Encode/"UTF-8 vs. utf8 vs. UTF8"> にあります。
	759	完全な説明は L<Encode> にあります。
795	760
796	761	=begin original
797	762
798	763	C<UTF-8> is internally known as C<utf-8-strict>. The tutorial uses UTF-8
799	764	consistently, even where utf8 is actually used internally, because the
800	765	distinction can be hard to make, and is mostly irrelevant.
801	766
802	767	=end original
803	768
804	769	C<UTF-8> は内部では C<utf-8-strict> として知られます。
805	770	チュートリアルでは、たとえ内部では実際には utf8 が使われる場合でも
806	771	一貫して UTF-8 を使っています; なぜなら区別をつけるのは難しく、ほとんど
807	772	無意味だからです。
808	773
809	774	=begin original
810	775
811	776	For example, utf8 can be used for code points that don't exist in Unicode, like
812	777	9999999, but if you encode that to UTF-8, you get a substitution character (by
813	778	default; see L<Encode/"Handling Malformed Data"> for more ways of dealing with
814	779	this.)
815	780
816	781	=end original
817	782
818	783	例えば utf8 は、9999999 のような、Unicode に存在しない符号位置も使えますが、
819	784	これを UTF-8 でエンコードすると、代替文字を得ることになります(これは
820	785	デフォルトの場合です; これを扱う他の方法については
821	786	L<Encode/"Handling Malformed Data"> を参照してください。)
822	787
823	788	=begin original
824	789
825	790	Okay, if you insist: the "internal format" is utf8, not UTF-8. (When it's not
826	791	some other encoding.)
827	792
828	793	=end original
829	794
830	795	わかりました、どうしてもと言うのなら:「内部形式」は utf8 であって、
831	796	UTF-8 ではありません。
832	797	(もしその他のエンコーディングでないのなら。)
833	798
834	799	=head2 I lost track; what encoding is the internal format really?
835	800
836	801	(迷子になりました; 実際のところ内部形式のエンコーディングは何?)
837	802
838	803	=begin original
839	804
840	805	It's good that you lost track, because you shouldn't depend on the internal
841	806	format being any specific encoding. But since you asked: by default, the
842	807	internal format is either ISO-8859-1 (latin-1), or utf8, depending on the
843	808	history of the string. On EBCDIC platforms, this may be different even.
844	809
845	810	=end original
846	811
847	812	迷子になったのはよいことです; なぜなら内部形式が特定のエンコーディングで
848	813	あることに依存するべきではないからです。
849	814	しかし聞かれたので答えましょう: デフォルトでは、内部形式は
850	815	ISO-8859-1 (latin-1) か utf8 で、どちらになるかは文字列の歴史に
851	816	依存します。
852	817	EBCDIC プラットフォームでは、これは異なっているかもしれません。
853	818
854	819	=begin original
855	820
856	821	Perl knows how it stored the string internally, and will use that knowledge
857	822	when you C<encode>. In other words: don't try to find out what the internal
858	823	encoding for a certain string is, but instead just encode it into the encoding
859	824	that you want.
860	825
861	826	=end original
862	827
863	828	Perl は文字列が内部でどのように保管されているかを知っていて、この知識を
864	829	C<エンコードする> ときに使います。
865	830	言い換えると: 特定の文字列の内部エンコーディングが何かを
866	831	調べようとしてはいけません; 代わりに、単に望みのエンコーディングに
867	832	エンコードしてください。
868	833
869	834	=head1 AUTHOR
870	835
871	836	Juerd Waalboer <#####@juerd.nl>
872	837
873	838	=head1 SEE ALSO
874	839
875	840	L<perlunicode>, L<perluniintro>, L<Encode>
876	841
877	842	=begin meta
878	843
879		Translate: SHIRAKATA Kentaro <argrath@ub32.org> (5.10.0-)
	844	Translate: Kentaro Shirakata <argrath@ub32.org> (5.10.0-)
880	845	Status: completed
881	846
882	847	=end meta
