The Perl built-in function quotemeta

perl-5.42.0

quotemeta EXPR

quotemeta

Returns the value of EXPR with all the ASCII non-"word" characters backslashed. (That is, all ASCII characters not matching /[A-Za-z_0-9]/ will be preceded by a backslash in the returned string, regardless of any locale settings.) This is the internal function implementing the \Q escape in double-quoted strings. (See below for the behavior on non-ASCII code points.)
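
As a minimal illustration of the ASCII rule above (the sample string is made up for this sketch), word characters pass through unchanged and every other ASCII character gains a leading backslash:

```perl
use strict;
use warnings;

# Letters, digits, and underscore are left alone; the dot, star,
# space, and parentheses each get a backslash in front.
my $quoted = quotemeta('a.b*c_1 (x)');
print "$quoted\n";    # prints a\.b\*c_1\ \(x\)

# \Q...\E in a double-quoted string gives the same result.
my $via_escape = "\Qa.b*c_1 (x)\E";
print "same\n" if $via_escape eq $quoted;
```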

If EXPR is omitted, uses $_.
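
A short sketch of the $_ default, using two made-up input strings:

```perl
use strict;
use warnings;

my @quoted;
for ('1+1', 'a|b') {
    push @quoted, quotemeta;    # no argument, so it operates on $_
}
print join(' ', @quoted), "\n";    # prints 1\+1 a\|b
```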

The motivation behind this is to make all characters in EXPR match their literal selves. Otherwise any metacharacters in it could trigger their "magic" matching behaviors. The characters this function has been applied to are said to be "quoted" or "escaped".

quotemeta (and \Q ... \E) are useful when interpolating strings into regular expressions, because by default an interpolated variable will be considered a mini-regular expression. For example:

    my $sentence = 'The quick brown fox jumped over the lazy dog';
    my $substring = 'quick.*?fox';
    $sentence =~ s{$substring}{big bad wolf};

This causes $sentence to become 'The big bad wolf jumped over...'.

On the other hand:

    my $sentence = 'The quick brown fox jumped over the lazy dog';
    my $substring = 'quick.*?fox';
    $sentence =~ s{\Q$substring\E}{big bad wolf};

Or:

    my $sentence = 'The quick brown fox jumped over the lazy dog';
    my $substring = 'quick.*?fox';
    my $quoted_substring = quotemeta($substring);
    $sentence =~ s{$quoted_substring}{big bad wolf};

Both of these leave the sentence as is. Normally, when accepting literal string input from the user, quotemeta or \Q must be used.
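
The following sketch shows the kind of situation where this matters. The helper count_literal is a hypothetical name introduced only for this example; it counts how often a user-supplied string occurs literally in some text:

```perl
use strict;
use warnings;

# count_literal is a hypothetical helper, not part of Perl.
# \Q...\E makes every character of $needle match only itself.
sub count_literal {
    my ($text, $needle) = @_;
    my $count = () = $text =~ /\Q$needle\E/g;
    return $count;
}

my $text = 'a.b axb a.b';
print count_literal($text, 'a.b'), "\n";    # prints 2

# Without quoting, the dot matches any character, so "axb" counts too.
my $unquoted_count = () = $text =~ /a.b/g;
print "$unquoted_count\n";                  # prints 3
```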

Beware that if you put literal backslashes (those not inside interpolated variables) between \Q and \E, double-quotish backslash interpolation may lead to confusing results. If you need to use literal backslashes within \Q...\E, consult "Gory details of parsing quoted constructs" in perlop.

Because the result of "\Q STRING \E" has all metacharacters quoted, there is no way to insert a literal $ or @ inside a \Q\E pair. If protected by \, $ will be quoted to become "\\\$"; if not, it is interpreted as the start of an interpolated scalar.

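One possible way around this limitation, sketched here with a made-up string, is to keep the text in a single-quoted string (so nothing is interpolated) and pass it through quotemeta before using it in a pattern:

```perl
use strict;
use warnings;

# Single quotes keep $ and @ literal, so nothing is interpolated
# before quotemeta sees the string.
my $literal = 'cost: $5 @home';
my $quoted  = quotemeta($literal);
print "$quoted\n";    # prints cost\:\ \$5\ \@home

# The quoted form matches the original text character for character.
my $matched = 'total cost: $5 @home' =~ /$quoted/ ? 1 : 0;
print "matched\n" if $matched;
```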
In Perl v5.14, all non-ASCII characters are quoted in non-UTF-8-encoded strings, but not quoted in UTF-8 strings.

Starting in Perl v5.16, Perl adopted a Unicode-defined strategy for quoting non-ASCII characters; the quoting of ASCII characters is unchanged.

Also unchanged is the quoting of non-UTF-8 strings when outside the scope of a use feature 'unicode_strings', which is to quote all characters in the upper Latin1 range. This provides complete backwards compatibility for old programs which do not use Unicode. (Note that unicode_strings is automatically enabled within the scope of a use v5.12 or greater.)
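
The effect of unicode_strings can be observed with a small sketch like the one below. It assumes Perl v5.16 or later on an ASCII platform, no use locale in effect, and uses U+00E9 (e with acute) as an arbitrary upper Latin1 character:

```perl
use strict;
use warnings;

my $e_acute = "\xE9";    # one character, stored as a non-UTF-8 string

# Outside unicode_strings: every upper Latin1 character is quoted.
my $legacy = quotemeta($e_acute);

my $unicode;
{
    use feature 'unicode_strings';
    # Inside the feature's scope the Unicode rules apply, and U+00E9
    # has none of the properties that trigger quoting.
    $unicode = quotemeta($e_acute);
}

printf "legacy length: %d, unicode_strings length: %d\n",
    length($legacy), length($unicode);
```

Comparing lengths rather than printing the strings avoids depending on the terminal's encoding.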

Within the scope of use locale, all non-ASCII Latin1 code points are quoted whether the string is encoded as UTF-8 or not. As mentioned above, locale does not affect the quoting of ASCII-range characters. This protects against those locales where characters such as "|" are considered to be word characters.

Otherwise, Perl quotes non-ASCII characters using an adaptation from Unicode (see https://www.unicode.org/reports/tr31/). The only code points that are quoted are those that have any of the Unicode properties: Pattern_Syntax, Pattern_White_Space, White_Space, Default_Ignorable_Code_Point, or General_Category=Control.
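
To make the rule concrete, here is a small sketch with three code points chosen for illustration: U+2190 has Pattern_Syntax, U+2028 has Pattern_White_Space, and U+3042 has none of the listed properties. It assumes Perl v5.16 or later with no use locale in effect:

```perl
use strict;
use warnings;

my @samples = (
    [ 'U+2190 LEFTWARDS ARROW',   "\x{2190}" ],
    [ 'U+2028 LINE SEPARATOR',    "\x{2028}" ],
    [ 'U+3042 HIRAGANA LETTER A', "\x{3042}" ],
);

my %was_quoted;
for my $sample (@samples) {
    my ($name, $char) = @$sample;
    # A quoted character comes back as a backslash followed by itself.
    $was_quoted{$name} = quotemeta($char) eq "\\$char" ? 1 : 0;
    printf "%-26s %s\n", $name, $was_quoted{$name} ? 'quoted' : 'not quoted';
}
```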

Of these properties, the two important ones are Pattern_Syntax and Pattern_White_Space. They have been set up by Unicode for exactly this purpose of deciding which characters in a regular expression pattern should be quoted. No character that can be in an identifier has these properties.

Perl promises that if we ever add regular expression pattern metacharacters to the dozen already defined (\ | ( ) [ { ^ $ * + ? .), we will only use ones that have the Pattern_Syntax property. Perl also promises that if we ever add characters that are considered to be white space in regular expressions (currently mostly affected by /x), they will all have the Pattern_White_Space property.
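
As a quick check of the dozen metacharacters listed above, the following sketch confirms that quotemeta backslashes each of them:

```perl
use strict;
use warnings;

# The twelve metacharacters named in the text.
my @meta = ('\\', '|', '(', ')', '[', '{', '^', '$', '*', '+', '?', '.');

# Collect any character whose quoted form is not "backslash + itself".
my @unquoted = grep { quotemeta($_) ne "\\$_" } @meta;

print @unquoted
    ? "not quoted: @unquoted\n"
    : "all twelve are backslashed\n";
```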

Unicode promises that the set of code points that have these two properties will never change, so something that is not quoted in v5.16 will never need to be quoted in any future Perl release. (Not all the code points that match Pattern_Syntax have actually had characters assigned to them; so there is room to grow, but they are quoted whether assigned or not. Perl, of course, would never use an unassigned code point as an actual metacharacter.)

Quoting characters that have the other 3 properties is done to enhance the readability of the regular expression and not because they actually need to be quoted for regular expression purposes (characters with the White_Space property are likely to be indistinguishable on the page or screen from those with the Pattern_White_Space property; and the other two properties contain non-printing characters).