Translation of the Perl built-in function m

perl-5.40.0

m/PATTERN/msixpodualngc

Searches a string for a pattern match, and in scalar context returns true if it succeeds, false if it fails. If no string is specified via the =~ or !~ operator, the $_ string is searched. (The string specified with =~ need not be an lvalue--it may be the result of an expression evaluation, but remember the =~ binds rather tightly.) See also perlre.
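
As a minimal sketch of the binding rules described above (the variable names and sample strings are made up for illustration), the following shows a match against $_, a match bound with =~ and !~, and a match against the result of an expression:

```perl
use strict;
use warnings;

# With no =~ or !~, the match is applied to $_.
local $_ = "Version: 5.40";
my $matched_default = /Version/ ? 1 : 0;

# =~ binds the match to an explicit string; !~ negates the result.
my $line       = "hello world";
my $has_world  = ($line =~ /world/) ? 1 : 0;
my $lacks_perl = ($line !~ /perl/)  ? 1 : 0;

# The left side need not be an lvalue; any expression works.
# The inner parentheses matter because =~ binds more tightly than
# the concatenation operator.
my $from_expr = (("foo" . "bar") =~ /^foobar$/) ? 1 : 0;

print "$matched_default $has_world $lacks_perl $from_expr\n";
```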

Options are as described in qr// above; in addition, the following match process modifiers are available:

 g  Match globally, i.e., find all occurrences.
 c  Do not reset search position on a failed match when /g is
    in effect.

If "/" is the delimiter then the initial m is optional. With the m you can use any pair of non-whitespace (ASCII) characters as delimiters. This is particularly useful for matching path names that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is the delimiter, then a match-only-once rule applies, described in m?PATTERN? below. If "'" (single quote) is the delimiter, no variable interpolation is performed on the PATTERN. When using a delimiter character valid in an identifier, whitespace is required after the m.
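
The delimiter rules can be illustrated with a short sketch. The path and the variable names below are hypothetical, chosen only to show the escaping difference and the effect of single-quote delimiters:

```perl
use strict;
use warnings;

my $path = "/usr/spool/uucp/log";

# With "/" as the delimiter, every slash in the pattern needs a backslash.
my $with_slashes = ($path =~ /^\/usr\/spool\/uucp/) ? 1 : 0;

# With m, another delimiter avoids the escaping; bracketing pairs also work.
my $with_hash   = ($path =~ m#^/usr/spool/uucp#) ? 1 : 0;
my $with_braces = ($path =~ m{^/usr/spool/uucp}) ? 1 : 0;

# Ordinary delimiters interpolate $name, so this looks for the text "uucp".
my $name         = "uucp";
my $interpolated = ($path =~ m/$name/) ? 1 : 0;

# Single-quote delimiters do not interpolate: the pattern is an
# end-of-string anchor followed by the letters "name", which cannot match.
my $literal = ($path =~ m'$name') ? 1 : 0;

print "$with_slashes $with_hash $with_braces $interpolated $literal\n";
```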

PATTERN may contain variables, which will be interpolated every time the pattern search is evaluated, except for when the delimiter is a single quote. (Note that $(, $), and $| are not interpolated because they look like end-of-string tests.) Perl will not recompile the pattern unless an interpolated variable that it contains changes. You can force Perl to skip the test and never recompile by adding a /o (which stands for "once") after the trailing delimiter. Once upon a time, Perl would recompile regular expressions unnecessarily, and this modifier was useful to tell it not to do so, in the interests of speed. But now, the only reasons to use /o are one of:

The variables are thousands of characters long and you know that they don't change, and you need to wring out the last little bit of speed by having Perl skip testing for that. (There is a maintenance penalty for doing this, as mentioning /o constitutes a promise that you won't change the variables in the pattern. If you do change them, Perl won't even notice.)

you want the pattern to use the initial values of the variables regardless of whether they change or not. (But there are saner ways of accomplishing this than using /o.)
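
The text does not say which alternatives it has in mind. One possible approach, shown here as a sketch and assuming that precompiling with qr// is acceptable, is to build the pattern object once so that it keeps the value the variable had at that moment:

```perl
use strict;
use warnings;

my $word = "foo";

# qr// compiles the pattern now, using the current value of $word.
my $frozen = qr/^$word$/;

# Later changes to $word do not affect the compiled pattern object.
$word = "bar";

my $frozen_foo = ("foo" =~ $frozen) ? 1 : 0;
my $frozen_bar = ("bar" =~ $frozen) ? 1 : 0;

# A plain interpolated match sees the current value instead.
my $live_bar = ("bar" =~ /^$word$/) ? 1 : 0;

print "$frozen_foo $frozen_bar $live_bar\n";
```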

If the pattern contains embedded code, such as

    use re 'eval';
    $code = 'foo(?{ $x })';
    /$code/

then perl will recompile each time, even though the pattern string hasn't changed, to ensure that the current value of $x is seen each time. Use /o if you want to avoid this.

The bottom line is that using /o is almost never a good idea.

(The empty pattern //)

If the PATTERN evaluates to the empty string, the last successfully matched regular expression in the current dynamic scope is used instead (see also "Scoping Rules of Regex Variables" in perlvar). In this case, only the g and c flags on the empty pattern are honored; the other flags are taken from the original pattern. If no match has previously succeeded, this will (silently) act instead as a genuine empty pattern (which will always match). Using a user supplied string as a pattern has the risk that if the string is empty that it triggers the "last successful match" behavior, which can be very confusing. In such cases you are recommended to replace m/$pattern/ with m/(?:$pattern)/ to avoid this behavior.
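
The pitfall and the recommended workaround can be sketched as follows. Here $pattern stands in for a hypothetical user-supplied string that happens to be empty:

```perl
use strict;
use warnings;

my $pattern = "";   # hypothetical user input that turned out to be empty

# A successful match; /b/ is now the last successfully matched pattern.
"abc" =~ /b/ or die "setup match failed";

# The empty interpolated pattern silently reuses /b/, so "xyz" does not match.
my $surprising = ("xyz" =~ /$pattern/) ? 1 : 0;

# Wrapped in (?:...), the pattern is a genuine empty pattern and always matches.
my $expected = ("xyz" =~ /(?:$pattern)/) ? 1 : 0;

print "$surprising $expected\n";
```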

The last successful pattern may be accessed as a variable via ${^LAST_SUCCESSFUL_PATTERN}. Matching against it, or the empty pattern should have the same effect, with the exception that when there is no last successful pattern the empty pattern will silently match, whereas using the ${^LAST_SUCCESSFUL_PATTERN} variable will produce undefined warnings (if warnings are enabled). You can check defined(${^LAST_SUCCESSFUL_PATTERN}) to test if there is a "last successful match" in the current scope.

Note that it's possible to confuse Perl into thinking // (the empty regex) is really // (the defined-or operator). Perl is usually pretty good about this, but some pathological cases might trigger this, such as $x/// (is that ($x) / (//) or $x // /?) and print $fh // (print $fh(// or print($fh //?). In all of these examples, Perl will assume you meant defined-or. If you meant the empty regex, just use parentheses or spaces to disambiguate, or even prefix the empty regex with an m (so // becomes m//).

(Matching in list context)

If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, that is, ($1, $2, $3...) (Note that here $1 etc. are also set). When there are no parentheses in the pattern, the return value is the list (1) for success. With or without parentheses, an empty list is returned upon failure.
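
A small sketch of the three cases just described, using a made-up input string:

```perl
use strict;
use warnings;

my $foo = "alpha beta gamma delta";

# With capture groups, list context returns ($1, $2, $3).
my @fields = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/);

# Without capture groups, a successful match returns the list (1).
my @no_groups = ($foo =~ /beta/);

# On failure the list is empty, with or without capture groups.
my @failed = ($foo =~ /^(\d+)/);

print scalar(@fields), " ", scalar(@no_groups), " ", scalar(@failed), "\n";
```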

Examples:

 open(TTY, "+</dev/tty")
    || die "can't access /dev/tty: $!";

 <TTY> =~ /^y/i && foo();       # do foo if desired

 if (/Version: *([0-9.]*)/) { $version = $1; }

 next if m#^/usr/spool/uucp#;

 # poor man's grep
 $arg = shift;
 while (<>) {
    print if /$arg/;
 }

 if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/)) {
     # $F1, $F2, and $Etc now hold the three fields
 }

This last example splits $foo into the first two words and the remainder of the line, and assigns those three fields to $F1, $F2, and $Etc. The conditional is true if any variables were assigned; that is, if the pattern matched.

The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
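
To make the list-context behavior concrete, here is a sketch with an invented input string. It shows the result without capture groups, the flattened result with two groups per match, and how that flat list can be turned into a hash:

```perl
use strict;
use warnings;

my $text = "a=1, b=22, c=333";

# No capture groups: one element per match, as if the whole pattern
# were parenthesized.
my @numbers = ($text =~ /\d+/g);

# Two capture groups: the groups of every match, flattened into one list.
my @pairs = ($text =~ /(\w)=(\d+)/g);

# The flat key/value list can initialize a hash directly.
my %value_of = @pairs;

print "@numbers\n";
print "@pairs\n";
```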

In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see "pos" in perlfunc. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (for example, m//gc). Modifying the target string also resets the search position.
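
The following sketch, with a made-up string, walks through the search position: it records pos() after each successful match, shows that the final failed match resets it, and then shows /c preserving it:

```perl
use strict;
use warnings;

my $str = "aXbXc";

# Each iteration finds the next X; pos() is the offset just past the match.
my @positions;
while ($str =~ /X/g) {
    push @positions, pos($str);
}

# The failed match that ended the loop reset the search position.
my $after_failure = pos($str);

# With /c, a failed match leaves the search position where it was.
$str =~ /X/gc or die "expected to find the first X";
my $found_z = ($str =~ /Z/gc) ? 1 : 0;
my $kept    = pos($str);

print "@positions\n";
```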

(The \G assertion)

You can intermix m//g matches with m/\G.../g, where \G is a zero-width assertion that matches the exact position where the previous m//g, if any, left off. Without the /g modifier, the \G assertion still anchors at pos() as it was at the start of the operation (see "pos" in perlfunc), but the match is of course only attempted once. Using \G without /g on a target string that has not previously had a /g match applied to it is the same as using the \A assertion to match the beginning of the string. Note also that, currently, \G is only properly supported when anchored at the very beginning of the pattern.
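
A sketch of \G used without /g, on an invented string: before any /g match it behaves like \A, and after one it anchors at the saved position without changing it:

```perl
use strict;
use warnings;

my $s = "key=value";

# No /g match has been applied to $s yet, so \G behaves like \A.
my $at_start     = ($s =~ /\Gkey/)   ? 1 : 0;
my $not_at_start = ($s =~ /\Gvalue/) ? 1 : 0;

# A /g match leaves the search position just past the "=".
$s =~ /=/g or die "expected to find =";

# Without /g, \G still anchors at that position, and the match is tried once.
my $anchored = ($s =~ /\Gvalue/) ? 1 : 0;

# A match without /g does not move the search position.
my $pos_after = pos($s);

print "$at_start $not_at_start $anchored $pos_after\n";
```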

Examples:

    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    local $/ = "";
    while ($paragraph = <>) {
        while ($paragraph =~ /\p{Ll}['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    say $sentences;

Here's another way to check for sentences in a paragraph:

 my $sentence_rx = qr{
    (?: (?<= ^ ) | (?<= \s ) )  # after start-of-string or
                                # whitespace
    \p{Lu}                      # capital letter
    .*?                         # a bunch of anything
    (?<= \S )                   # that ends in non-
                                # whitespace
    (?<! \b [DMS]r  )           # but isn't a common abbr.
    (?<! \b Mrs )
    (?<! \b Sra )
    (?<! \b St  )
    [.?!]                       # followed by a sentence
                                # ender
    (?= $ | \s )                # in front of end-of-string
                                # or whitespace
 }sx;
 local $/ = "";
 while (my $paragraph = <>) {
    say "NEW PARAGRAPH";
    my $count = 0;
    while ($paragraph =~ /($sentence_rx)/g) {
        printf "\tgot sentence %d: <%s>\n", ++$count, $1;
    }
 }

Here's how to use m//gc with \G:

    $_ = "ppooqppqq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/gc; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/gc; print "', pos=", pos, "\n";
    }
    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;

The last example should print:

    1: 'oo', pos=4
    2: 'q', pos=5
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=8
    3: '', pos=8
    Final: 'q', pos=8

Notice that the final match matched q instead of p, which a match without the \G anchor would have done. Also note that the final match did not update pos. pos is only updated on a /g match. If the final match did indeed match p, it's a good bet that you're running an ancient (pre-5.6.0) version of Perl.

A useful idiom for lex-like scanners is /\G.../gc. You can combine several regexps like this to process a string part-by-part, doing different actions depending on which regexp matched. Each regexp tries to match where the previous one leaves off.

 $_ = <<'EOL';
    $url = URI::URL->new( "http://example.com/" );
    die if $url eq "xXx";
 EOL

 LOOP: {
     print(" digits"),       redo LOOP if /\G\d+\b[,.;]?\s*/gc;
     print(" lowercase"),    redo LOOP
                                    if /\G\p{Ll}+\b[,.;]?\s*/gc;
     print(" UPPERCASE"),    redo LOOP
                                    if /\G\p{Lu}+\b[,.;]?\s*/gc;
     print(" Capitalized"),  redo LOOP
                              if /\G\p{Lu}\p{Ll}+\b[,.;]?\s*/gc;
     print(" MiXeD"),        redo LOOP if /\G\pL+\b[,.;]?\s*/gc;
     print(" alphanumeric"), redo LOOP
                            if /\G[\p{Alpha}\pN]+\b[,.;]?\s*/gc;
     print(" line-noise"),   redo LOOP if /\G\W+/gc;
     print ". That's all!\n";
 }

Here is the output (split into several lines):

 line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
 line-noise lowercase line-noise lowercase line-noise lowercase
 lowercase line-noise lowercase lowercase line-noise lowercase
 lowercase line-noise MiXeD line-noise. That's all!