=encoding euc-jp

=head1 NAME

=begin original

Unicode::String - String of Unicode characters (UTF-16BE)

=end original

Unicode::String - Unicode 文字の文字列 (UTF16-BE) 

=head1 SYNOPSIS

 use Unicode::String qw(utf8 latin1 utf16be);

 $u = utf8("string");
 $u = latin1("string");
 $u = utf16be("\0s\0t\0r\0i\0n\0g");

 print $u->utf32be;   # 4 byte characters
 print $u->utf16le;   # 2 byte characters + surrogates
 print $u->utf8;      # 1-4 byte characters

=head1 DESCRIPTION

=begin original

A C<Unicode::String> object represents a sequence of Unicode
characters.  Methods are provided to convert between various external
formats (encodings) and C<Unicode::String> objects, and methods are
provided for common string manipulations.

=end original

C<Unicode::String> オブジェクトは Unicode 文字の並びを表現します。
様々な外部フォーマット(エンコーディング) と
C<Unicode::String> オブジェクトの変換を行うメソッドが提供され、また
基本的な文字列操作のためのメソッドも提供されます。

=begin original

The functions utf32be(), utf32le(), utf16be(), utf16le(), utf8(),
utf7(), latin1(), uhex(), uchr() can be imported from the
C<Unicode::String> module and will work as constructors initializing
strings of the corresponding encoding.

=end original

utf32be(), utf32le(), utf16be(), utf16le(), utf8(), utf7(), latin1(),
uhex(), uchr() といった関数は、C<Unicode::String> モジュールから
インポートでき、対応するエンコーディングの文字列を初期化するための
コンストラクタとしても動作します。

=begin original

The C<Unicode::String> objects overload various operators, which means
that they in most cases can be treated like plain strings.

=end original

C<Unicode::String> オブジェクトは様々な演算子をオーバーロードしているので、
ほとんどの場合で通常の文字列と同じように扱うことができます。

=begin original

Internally a C<Unicode::String> object is represented by a string of 2
byte numbers in network byte order (big-endian). This representation
is not visible by the API provided, but it might be useful to know in
order to predict the efficiency of the provided methods.

=end original

内部的には C<Unicode::String> オブジェクトはネットワークバイト順序
(ビッグエンディアン)で並んだ 2 バイトの文字列です。
この表現は提供されている API からは見えませんが、提供されているメソッドの
効率を予測するためには知っておくと有用でしょう。

=head2 METHODS

(メソッド)

=head2 Class methods

(クラスメソッド)

=begin original

The following class methods are available:

=end original

以下のクラスメソッドが利用可能です:

=over 4

=item Unicode::String->stringify_as

=item Unicode::String->stringify_as( $enc )

=begin original

This method is used to specify which encoding will be used when
C<Unicode::String> objects are implicitly converted to and from plain
strings.

=end original

このメソッドは、 通常の文字列と C<Unicode::String> との間で暗黙的な変換が
行われるときに使用されるエンコーディングを指定するものです。 

=begin original

If an argument is provided it sets the current encoding.  The argument
should have one of the following: "ucs4", "utf32", "utf32be",
"utf32le", "ucs2", "utf16", "utf16be", "utf16le", "utf8", "utf7",
"latin1" or "hex".  The default is "utf8".

=end original

If an argument is provided it sets the current encoding.  The argument
should have one of the following: "ucs4", "utf32", "utf32be",
"utf32le", "ucs2", "utf16", "utf16be", "utf16le", "utf8", "utf7",
"latin1" or "hex".  The default is "utf8".
(TBT)

=begin original

The stringify_as() method returns a reference to the current encoding
function.

=end original

The stringify_as() method returns a reference to the current encoding
function.
(TBT)

=item $us = Unicode::String->new

=item $us = Unicode::String->new( $initial_value )

=begin original

This is the object constructor.  Without argument, it creates an empty
C<Unicode::String> object.  If an $initial_value argument is given, it
is decoded according to the specified stringify_as() encoding, UTF-8
by default.

=end original

これはオブジェクトコンストラクタです。
引数が与えられなかった場合、このメソッドは空の
C<Unicode::String> オブジェクトを返します。
$initial_value 引数が与えられた場合には、
stringify_as() エンコーディングの指定に従ってデコードが行われます;
デフォルトは UTF-8 です。

=begin original

In general it is recommended to import and use one of the encoding
specific constructor functions instead of invoking this method.

=end original

In general it is recommended to import and use one of the encoding
specific constructor functions instead of invoking this method.
(TBT)

=back

=head2 Encoding methods

=begin original

These methods get or set the value of the C<Unicode::String> object by
passing strings in the corresponding encoding.  If a new value is
passed as argument it will set the value of the C<Unicode::String>,
and the previous value is returned.  If no argument is passed then the
current value is returned.

=end original

These methods get or set the value of the C<Unicode::String> object by
passing strings in the corresponding encoding.  If a new value is
passed as argument it will set the value of the C<Unicode::String>,
and the previous value is returned.  If no argument is passed then the
current value is returned.
(TBT)

=begin original

To illustrate the encodings we show how the 2 character sample string
of "E<micro>m" (micro meter) is encoded for each one.

=end original

To illustrate the encodings we show how the 2 character sample string
of "E<micro>m" (micro meter) is encoded for each one.
(TBT)

=over 4

=item $us->utf32be

=item $us->utf32be( $newval )

=begin original

The string passed should be in the UTF-32 encoding with bytes in big
endian order.  The sample "E<micro>m" is "\0\0\0\xB5\0\0\0m" in this encoding.

=end original

The string passed should be in the UTF-32 encoding with bytes in big
endian order.  The sample "E<micro>m" is "\0\0\0\xB5\0\0\0m" in this encoding.
(TBT)

=begin original

Alternative names for this method are utf32() and ucs4().

=end original

Alternative names for this method are utf32() and ucs4().
(TBT)

=item $us->utf32le

=item $us->utf32le( $newval )

=begin original

The string passed should be in the UTF-32 encoding with bytes in little
endian order.  The sample "E<micro>m" is is "\xB5\0\0\0m\0\0\0" in this encoding.

=end original

The string passed should be in the UTF-32 encoding with bytes in little
endian order.  The sample "E<micro>m" is is "\xB5\0\0\0m\0\0\0" in this encoding.
(TBT)

=item $us->utf16be

=item $us->utf16be( $newval )

=begin original

The string passed should be in the UTF-16 encoding with bytes in big
endian order. The sample "E<micro>m" is "\0\xB5\0m" in this encoding.

=end original

The string passed should be in the UTF-16 encoding with bytes in big
endian order. The sample "E<micro>m" is "\0\xB5\0m" in this encoding.
(TBT)

=begin original

Alternative names for this method are utf16() and ucs2().

=end original

Alternative names for this method are utf16() and ucs2().
(TBT)

=begin original

If the string passed to utf16be() starts with the Unicode byte order
mark in little endian order, the result is as if utf16le() was called
instead.

=end original

If the string passed to utf16be() starts with the Unicode byte order
mark in little endian order, the result is as if utf16le() was called
instead.
(TBT)

=item $us->utf16le

=item $us->utf16le( $newval )

=begin original

The string passed should be in the UTF-16 encoding with bytes in
little endian order.  The sample "E<micro>m" is is "\xB5\0m\0" in this
encoding.  This is the encoding used by the Microsoft Windows API.

=end original

The string passed should be in the UTF-16 encoding with bytes in
little endian order.  The sample "E<micro>m" is is "\xB5\0m\0" in this
encoding.  This is the encoding used by the Microsoft Windows API.
(TBT)

=begin original

If the string passed to utf16le() starts with the Unicode byte order
mark in big endian order, the result is as if utf16le() was called
instead.

=end original

If the string passed to utf16le() starts with the Unicode byte order
mark in big endian order, the result is as if utf16le() was called
instead.
(TBT)

=item $us->utf8

=item $us->utf8( $newval )

=begin original

The string passed should be in the UTF-8 encoding. The sample "E<micro>m" is
"\xC2\xB5m" in this encoding.

=end original

渡される文字列は UTF-7 エンコーディングです。
例として、"E<micro>m" はこのエンコーディングでは "\xC2\xB5m" となります。

=item $us->utf7

=item $us->utf7( $newval )

=begin original

The string passed should be in the UTF-7 encoding. The sample "E<micro>m" is
"+ALU-m" in this encoding.

=end original

渡される文字列は UTF-7 エンコーディングです。
例として、"E<micro>m" はこのエンコーディングでは "+ALU-m" となります。

=begin original

The UTF-7 encoding only use plain US-ASCII characters for the
encoding.  This makes it safe for transport through 8-bit stripping
protocols.  Characters outside the US-ASCII range are base64-encoded
and '+' is used as an escape character.  The UTF-7 encoding is
described in RFC 1642.

=end original

UTF-7エンコーディングは、プレーンな US-ASCII 文字だけを用いる
エンコーディングです。
これは 8 ビット目を落としてしまうような転送に対しても安全なものにします。
US-ASCII の範囲にない文字は base64 エンコードされ、
エスケープ文字として '+' が使われます。
UTF-7 エンコーディングは RFC 1642 に記述されています。 

=begin original

If the (global) variable $Unicode::String::UTF7_OPTIONAL_DIRECT_CHARS
is TRUE, then a wider range of characters are encoded as themselves.
It is even TRUE by default.  The characters affected by this are:

=end original

(グローバル)変数 $Unicode::String::UTF7_OPTIONAL_DIRECT_CHARS が
真である場合には、文字エンコードの対象となる範囲が広がります。
デフォルトでは真にセットされています。
影響を受ける文字は以下の通りです:

   ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }

=item $us->latin1

=item $us->latin1( $newval )

=begin original

The string passed should be in the ISO-8859-1 encoding. The sample "E<micro>m" is
"\xB5m" in this encoding.

=end original

渡される文字列は ISO-8859-1 エンコーディングです。
例として、"E<micro>m" はこのエンコーディングでは "\xB5m" となります。

=begin original

Characters outside the "\x00" .. "\xFF" range are simply removed from
the return value of the latin1() method.  If you want more control
over the mapping from Unicode to ISO-8859-1, use the C<Unicode::Map8>
class.  This is also the way to deal with other 8-bit character sets.

=end original

latin1() メソッドの返り値から、"\x00" から "\xFF" の範囲にない文字は
単に削除されます。
Unicode から ISO-8859-1 のマッピングをもっと自分で制御したいというのであれば、
C<Unicode::Map8> クラスを使用してください。
このクラスは他の 8 ビット文字クラスを扱うときも同様です。 

=item $us->hex

=item $us->hex( $newval )

=begin original

The string passed should be plain ASCII where each Unicode character
is represented by the "U+XXXX" string and separated by a single space
character.  The "U+" prefix is optional when setting the value.  The
sample "E<micro>m" is "U+00b5 U+006d" in this encoding.

=end original

渡される文字列は、それぞれの文字が "U+XXXX" で表現され、
一つの空白で区切られる、プレーンな ASCII です。
値を設定する場合は "U+" 接頭辞は省略可能です。
例として、"E<micro>m" はこのエンコーディングでは "U+00b5 U+006d" となります。

=back

=head2 String Operations

=begin original

The following methods are available:

=end original

以下のメソッドが使用可能です:

=over 4

=item $us->as_string

=begin original

Converts a C<Unicode::String> to a plain string according to the
setting of stringify_as().  The default stringify_as() encoding is
"utf8".

=end original

C<Unicode::String> を stringify_as() の設定にしたがってプレーンな文字列に
変換します。
デフォルトの stringify_as() エンコーディングは "utf8" です。 

=item $us->as_num

=begin original

Converts a C<Unicode::String> to a number.  Currently only the digits
in the range 0x30 .. 0x39 are recognized.  The plan is to eventually
support all Unicode digit characters.

=end original

C<Unicode::String> を数値に変換します。
現時点では、0x30 から 0x39 までのみが数字として認識されます。
計画では、全ての Unicode の数字文字をサポートする予定です。 

=item $us->as_bool

=begin original

Converts a C<Unicode::String> to a boolean value.  Only the empty
string is FALSE.  A string consisting of only the character U+0030 is
considered TRUE, even if Perl consider "0" to be FALSE.

=end original

C<Unicode::String> を真偽値に変換します。
空文字列のみが FALSE となります。
Perl が "0" を FALSE とみなすにも関わらず、U+0030 という文字のみの
文字列は TRUE と判定されます。 

=item $us->repeat( $count )

=begin original

Returns a new C<Unicode::String> where the content of $us is repeated
$count times.  This operation is also overloaded as:

=end original

$us の内容を $count 回繰り返した新しい C<Unicode::String> を返します。
この操作はまた、以下のようにオーバーロードされています:

  $us x $count

=item $us->concat( $other_string )

=begin original

Concatenates the string $us and the string $other_string.  If
$other_string is not an C<Unicode::String> object, then it is first
passed to the Unicode::String->new constructor function.  This
operation is also overloaded as:

=end original

文字列 $us と文字列 $other_string を連結します。
$other_string が C<Unicode::String> オブジェクトでない場合には、
まず最初に Unicode::string->new コンストラクタに渡されます。
この操作はまた、以下のようにオーバーロードされています:

  $us . $other_string


=item $us->append( $other_string )

=begin original

Appends the string $other_string to the value of $us.  If
$other_string is not an C<Unicode::String> object, then it is first
passed to the Unicode::String->new constructor function.  This
operation is also overloaded as:

=end original

文字列 $other_string を $us に追加します。
$other_string が C<Unicode::String> オブジェクトでなかった場合には、
追加に先立って Unicode::String->new コンストラクタに
$other_string が渡されます。
この操作はまた、以下のようにオーバーロードされています:

  $us .= $other_string

=item $us->copy

=begin original

Returns a copy of the current C<Unicode::String> object.  This
operation is overloaded as the assignment operator.

=end original

C<Unicode::String> のカレントオブジェクトのコピーを返します。
この操作は代入演算子としてオーバーロードされています。

=item $us->length

=begin original

Returns the length of the C<Unicode::String>.  Surrogate pairs are
still counted as 2.

=end original

C<Unicode::String> の長さを返します。
サロゲートペアの長さは 2 として数えられます。 

=item $us->byteswap

=begin original

This method will swap the bytes in the internal representation of the
C<Unicode::String> object.

=end original

This method will swap the bytes in the internal representation of the
C<Unicode::String> object.
(TBT)

=begin original

Unicode reserve the character U+FEFF character as a byte order mark.
This works because the swapped character, U+FFFE, is reserved to not
be valid.  For strings that have the byte order mark as the first
character, we can guaranty to get the byte order right with the
following code:

=end original

Unicode reserve the character U+FEFF character as a byte order mark.
This works because the swapped character, U+FFFE, is reserved to not
be valid.  For strings that have the byte order mark as the first
character, we can guaranty to get the byte order right with the
following code:
(TBT)

   $ustr->byteswap if $ustr->ord == 0xFFFE;

=item $us->unpack

=begin original

Returns a list of integers each representing an UCS-2 character code.

=end original

それぞれが UCS-2 文字コードを表している整数値のリストを返します。

=item $us->pack( @uchr )

=begin original

Sets the value of $us as a sequence of UCS-2 characters with the
characters codes given as parameter.

=end original

$us の値として、引数で渡された UCS-2 文字の並びをセットします。 

=item $us->ord

=begin original

Returns the character code of the first character in $us.  The ord()
method deals with surrogate pairs, which gives us a result-range of
0x0 .. 0x10FFFF.  If the $us string is empty, undef is returned.

=end original

$us の最初の文字の文字コードを返します。
ord()メソッドはサロゲートペアにも対応していて、0x0 から 0x10FFFF の範囲で
結果を返します。
$us が空文字列であった場合には undef が返されます。 

=item $us->chr( $code )

=begin original

Sets the value of $us to be a string containing the character assigned
code $code.  The argument $code must be an integer in the range 0x0
.. 0x10FFFF.  If the code is greater than 0xFFFF then a surrogate pair
created.

=end original

$us に $code で指示された文字から構成される文字列をセットします。
引数 $code は 0x0 から 0x10FFFF の間の整数でなければなりません。
コードが 0xFFFF よりも大きい場合にはサロゲートペアが生成されます。 

=item $us->name

=begin original

In scalar context returns the official Unicode name of the first
character in $us.  In array context returns the name of all characters
in $us.  Also see L<Unicode::CharName>.

=end original

スカラコンテキストでは $us の最初の文字の公式 Unicode 名を返します。
配列コンテキストでは $us の全ての文字の名前を返します。
L<Unicode::CharName> も参照してください。 

=item $us->substr( $offset )

=item $us->substr( $offset, $length )

=item $us->substr( $offset, $length, $subst )

=begin original

Returns a sub-string of $us.  Works similar to the builtin substr()
function.

=end original

$us の部分文字列を返します。
組み込みの substr() 関数と同様です。

=item $us->index( $other )

=item $us->index( $other, $pos )

=begin original

Locates the position of $other within $us, possibly starting the
search at position $pos.

=end original

$us 中での $other の位置です; $pos を起点として検索することも可能です。 

=item $us->chop

=begin original

Chops off the last character of $us and returns it (as a
C<Unicode::String> object).

=end original

$us の最後の文字を切り取って、それを
(C<Unicode::String> オブジェクトとして)返します。 

=back

=head1 FUNCTIONS

(関数)

=begin original

The following functions are provided.  None of these are exported by default.

=end original

以下の関数が提供されます。
デフォルトでエクスポートされる関数はありません。

=over 4

=item byteswap2( $str, ... )

=begin original

This function will swap 2 and 2 bytes in the strings passed as
arguments.  If this function is called in void context,
then it will modify its arguments in-place.  Otherwise, the swapped
strings are returned.

=end original

This function will swap 2 and 2 bytes in the strings passed as
arguments.  If this function is called in void context,
then it will modify its arguments in-place.  Otherwise, the swapped
strings are returned.
(TBT)

=item byteswap4( $str, ... )

=begin original

The byteswap4 function works similar to byteswap2, but will reverse
the order of 4 and 4 bytes.

=end original

The byteswap4 function works similar to byteswap2, but will reverse
the order of 4 and 4 bytes.
(TBT)

=item latin1( $str )

=item utf7( $str )

=item utf8( $str )

=item utf16le( $str )

=item utf16be( $str )

=item utf32le( $str )

=item utf32be( $str )

=begin original

Constructor functions for the various Unicode encodings.  These return
new C<Unicode::String> objects.  The provided argument should be
encoded correspondingly.

=end original

Constructor functions for the various Unicode encodings.  These return
new C<Unicode::String> objects.  The provided argument should be
encoded correspondingly.
(TBT)

=item uhex( $str )

=begin original

Constructs a new C<Unicode::String> object from a string of hex
values.  See hex() method above for description of the format.

=end original

Constructs a new C<Unicode::String> object from a string of hex
values.  See hex() method above for description of the format.
(TBT)

=item uchar( $num )

=begin original

Constructs a new one character C<Unicode::String> object from a
Unicode character code.  This works similar to perl's builtin chr()
function.

=end original

Constructs a new one character C<Unicode::String> object from a
Unicode character code.  This works similar to perl's builtin chr()
function.
(TBT)

=back

=head1 SEE ALSO

L<Unicode::CharName>,
L<Unicode::Map8>

L<http://www.unicode.org/>

L<perlunicode>

=head1 COPYRIGHT

Copyright 1997-2000,2005 Gisle Aas.

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

=begin meta

Created: KIMURA Koichi (2.02)
Updated: Kentaro Shirakata <argrath@ub32.org> (2.09)

=end meta

=cut