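NAME

perlreref - Perl Regular Expressions Reference

DESCRIPTION

This is a quick reference to Perl's regular expressions. For full information see perlre and perlop, as well as the "SEE ALSO" section in this document.

OPERATORS

=~ determines to which variable the regex is applied. In its absence, $_ is used.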

$var =~ /foo/;

!~ determines to which variable the regex is applied, and negates the result of the match; it returns false if the match succeeds, and true if it fails.

$var !~ /foo/;
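
As a minimal sketch of the two binding operators and of the $_ default, the following may help. The variable names ($has_foo, $no_bar, @hits) are illustrative only.

```perl
use strict;
use warnings;

my $var = "seafood";

# =~ binds the match to $var; !~ binds and negates the result.
my $has_foo = ($var =~ /foo/) ? 1 : 0;
my $no_bar  = ($var !~ /bar/) ? 1 : 0;

# With no binding operator the match is applied to $_.
my @hits;
for ("foo", "bar", "food") {
    push @hits, $_ if /foo/;
}

print "has_foo=$has_foo no_bar=$no_bar hits=@hits\n";
# prints: has_foo=1 no_bar=1 hits=foo food
```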

m/pattern/msixpogcdualn searches a string for a pattern match, applying the given options.

m  Multiline mode - ^ and $ match internal lines
s  match as a Single line - . matches \n
i  case-Insensitive
x  eXtended legibility - free whitespace and comments
p  Preserve a copy of the matched string -
   ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
o  compile pattern Once
g  Global - all occurrences
c  don't reset pos on failed matches when using /g
a  restrict \d, \s, \w and [:posix:] to match ASCII only
aa (two a's) as /a, and under /i ASCII never matches non-ASCII
l  match according to current locale
u  match according to Unicode rules
d  match according to native rules unless something indicates
   Unicode
n  Non-capture mode. Don't let () fill in $1, $2, etc...

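If "pattern" is an empty string, the last successfully matched regex is used. Delimiters other than "/" may be used for both this operator and the following ones. The leading m can be omitted if the delimiter is "/".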
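
A short sketch of several modifiers working together, and of an alternate delimiter, might look like this. The sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "Alpha=1\nbeta=22\nGAMMA=333\n";

# /x permits whitespace and comments, /m lets ^ and $ match at each line,
# /i ignores case, and /g in list context returns every capture.
my @pairs = $text =~ m{
    ^ ([a-z]+)   # key at the start of a line
    =
    (\d+) $      # value up to the end of the line
}gimx;

# Any delimiter may replace '/' as long as the leading m is written out.
my $count = () = $text =~ m!=!g;

print "@pairs\n";   # prints: Alpha 1 beta 22 GAMMA 333
print "$count\n";   # prints: 3
```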

qr/pattern/msixpodualn lets you store a regex in a variable, or pass one around. Modifiers are as for m//, and are stored within the regex.
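
The sketch below shows one way a qr// object might be stored, interpolated, and passed around. The helper count_matches is hypothetical, not part of Perl.

```perl
use strict;
use warnings;

# qr// compiles the pattern; the /i modifier travels with the compiled regex.
my $word = qr/perl/i;

# A compiled regex can be used alone or interpolated into a larger pattern.
my $alone    = ("I like PERL" =~ $word)             ? 1 : 0;
my $embedded = ("Perl rocks"  =~ /^$word\s+rocks$/) ? 1 : 0;

# Hypothetical helper: counts how many strings match the regex it is given.
sub count_matches {
    my ($re, @strings) = @_;
    return scalar grep { $_ =~ $re } @strings;
}
my $n = count_matches($word, "perl", "python", "PerlMonks");

print "$alone $embedded $n\n";   # prints: 1 1 2
```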

s/pattern/replacement/msixpogcedual substitutes matches of "pattern" with "replacement". Modifiers are as for m//, with two additions:

e  Evaluate 'replacement' as an expression
r  Return substitution and leave the original string untouched.

"e" may be specified multiple times. "replacement" is interpreted as a double-quoted string unless a single quote (') is the delimiter.
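
To illustrate /e, /r, and single-quote delimiters, here is a small sketch. It assumes Perl 5.14 or later for /r, and the sample string is invented.

```perl
use strict;
use warnings;

my $s = "price: 10, tax: 2";

# /e evaluates the replacement as Perl code; /g applies it to every match.
(my $doubled = $s) =~ s/(\d+)/$1 * 2/ge;

# /r returns the result and leaves $s untouched.
my $masked = $s =~ s/\d/#/gr;

# With single-quote delimiters the replacement is not interpolated,
# so the two characters '$1' are inserted literally.
(my $literal = $s) =~ s'\d+'$1'g;

print "$doubled\n";   # prints: price: 20, tax: 4
print "$masked\n";    # prints: price: ##, tax: #
print "$literal\n";   # prints: price: $1, tax: $1
```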

m?pattern? is like m/pattern/ but matches only once. No alternate delimiters can be used. Must be reset with reset().
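
The match-once behaviour can be sketched as follows. The subroutine first_foo is a hypothetical wrapper used so that the same m?...? operator is executed repeatedly.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether its argument matches, using m?...?.
sub first_foo {
    my ($str) = @_;
    my $hit = ($str =~ m?foo?) ? 1 : 0;
    return $hit;
}

# Only the first call succeeds, although every string contains "foo".
my @before = map { first_foo($_) } ("foo", "foo", "foo");

# With no argument, reset re-arms the m?? searches of the current package.
reset;
my $after = first_foo("foo");

print "@before $after\n";   # prints: 1 0 0 1
```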

SYNTAX

\       Escapes the character immediately following it
.       Matches any single character except a newline (unless /s is
          used)
^       Matches at the beginning of the string (or line, if /m is used)
$       Matches at the end of the string (or line, if /m is used)
*       Matches the preceding element 0 or more times
+       Matches the preceding element 1 or more times
?       Matches the preceding element 0 or 1 times
{...}   Specifies a range of occurrences for the element preceding it
[...]   Matches any one of the characters contained within the brackets
(...)   Groups subexpressions for capturing to $1, $2...
(?:...) Groups subexpressions without capturing (cluster)
|       Matches either the subexpression preceding or following it
\g1 or \g{1}, \g2 ...    Matches the text from the Nth group
\1, \2, \3 ...           Matches the text from the Nth group
\g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
\g{name}     Named backreference
\k<name>     Named backreference
\k'name'     Named backreference
(?P=name)    Named backreference (python syntax)
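
As an illustration of the backreference forms listed above, this sketch finds a doubled word three ways and uses a non-capturing cluster. The sample strings are invented.

```perl
use strict;
use warnings;

my $text = "this is is a test";

my ($numbered) = $text =~ /\b(\w+)\s+\1\b/;          # \1: first group
my ($relative) = $text =~ /\b(\w+)\s+\g{-1}\b/;      # \g{-1}: previous group
my ($named)    = $text =~ /\b(?<w>\w+)\s+\k<w>\b/;   # \k<w>: group named "w"

# (?:...) groups the alternation without creating a capture,
# so the only capture here is the optional trailing "s".
my ($plural) = "30 seconds" =~ /^\d+\s+(?:second|minute)(s?)$/;

print "$numbered $relative $named $plural\n";   # prints: is is is s
```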

ESCAPE SEQUENCES

These work as in normal strings.

\a       Alarm (beep)
\e       Escape
\f       Formfeed
\n       Newline
\r       Carriage return
\t       Tab
\037     Char whose ordinal is the 3 octal digits, max \777
\o{2307} Char whose ordinal is the octal number, unrestricted
\x7f     Char whose ordinal is the 2 hex digits, max \xFF
\x{263a} Char whose ordinal is the hex number, unrestricted
\cx      Control-x
\N{name} A named Unicode character or character sequence
\N{U+263D} A Unicode character by hex ordinal

\l  Lowercase next character
\u  Titlecase next character
\L  Lowercase until \E
\U  Uppercase until \E
\F  Foldcase until \E
\Q  Disable pattern metacharacters until \E
\E  End modification

For Titlecase, see "Titlecase".
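
A brief sketch of the case-modifying and quoting escapes, plus a few character escapes, is given below. Variable names and sample data are illustrative.

```perl
use strict;
use warnings;

my $name = "john SMITH";

# \L lowercases up to \E, and \u titlecases the first character.
my $tidy = "\u\L$name\E!";

# \Q...\E disables metacharacters in an interpolated variable.
my $literal  = "a.b*c";
my $quoted   = ("aXbbbc" =~ /\Q$literal\E/) ? 1 : 0;
my $unquoted = ("aXbbbc" =~ /$literal/)     ? 1 : 0;

# Octal, hex, control, and tab escapes each name a single character.
my @ords = map { ord($_) } ("\037", "\x7f", "\x{263a}", "\cA", "\t");

print "$tidy\n";                 # prints: John smith!
print "$quoted $unquoted\n";     # prints: 0 1
print "@ords\n";                 # prints: 31 127 9786 1 9
```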

This one works differently from normal strings:

\b  An assertion, not backspace, except in a character class

CHARACTER CLASSES

[amy]    Match 'a', 'm' or 'y'
[f-j]    Dash specifies "range"
[f-j-]   Dash escaped or at start or end means 'dash'
[^f-j]   Caret indicates "match any character _except_ these"

The following sequences (except \N) work within or without a character class. The first six are locale aware, all are Unicode aware. See perllocale and perlunicode for details.

\d      A digit
\D      A nondigit
\w      A word character
\W      A non-word character
\s      A whitespace character
\S      A non-whitespace character
\h      A horizontal whitespace
\H      A non horizontal whitespace
\N      A non newline (when not followed by '{NAME}';
        not valid in a character class; equivalent to [^\n]; it's
        like '.' without /s modifier)
\v      A vertical whitespace
\V      A non vertical whitespace
\R      A generic newline           (?>\x0D\x0A|\v)

\pP     Match P-named (Unicode) property
\p{...} Match Unicode property with name longer than 1 character
\PP     Match non-P
\P{...} Match lack of Unicode property with name longer than 1 char
\X      Match Unicode extended grapheme cluster
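
The following sketch contrasts Unicode-aware and ASCII-only matching of \w and shows \p{...}, \R, and \N in use. It assumes Perl 5.14 or later for the /u and /a modifiers; the sample string (the word "Groesse" spelled with o-umlaut and sharp s) is written with hex escapes to keep the source ASCII.

```perl
use strict;
use warnings;

my $str = "Gr\x{F6}\x{DF}e 42\r\nline2";

my @unicode_words = $str =~ /(\w+)/gu;   # /u: the two accented letters count as \w
my @ascii_words   = $str =~ /(\w+)/ga;   # /a: \w is restricted to ASCII
my @upper         = $str =~ /(\p{Lu})/g; # uppercase letters by Unicode property
my @lines         = split /\R/, $str;    # \R treats CR LF as one newline
my @runs          = "a\nb" =~ /(\N+)/g;  # \N matches anything but a newline

printf "%d %d %d %d %d\n",
    scalar(@unicode_words), scalar(@ascii_words),
    scalar(@upper), scalar(@lines), scalar(@runs);
# prints: 3 4 1 2 2
```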

POSIX character classes and their Unicode and Perl equivalents:

           ASCII-         Full-
  POSIX    range          range    backslash
[[:...:]]  \p{...}        \p{...}   sequence    Description

-----------------------------------------------------------------------
alnum   PosixAlnum       XPosixAlnum            'alpha' plus 'digit'
alpha   PosixAlpha       XPosixAlpha            Alphabetic characters
ascii   ASCII                                   Any ASCII character
blank   PosixBlank       XPosixBlank   \h       Horizontal whitespace;
                                                  full-range also
                                                  written as
                                                  \p{HorizSpace} (GNU
                                                  extension)
cntrl   PosixCntrl       XPosixCntrl            Control characters
digit   PosixDigit       XPosixDigit   \d       Decimal digits
graph   PosixGraph       XPosixGraph            'alnum' plus 'punct'
lower   PosixLower       XPosixLower            Lowercase characters
print   PosixPrint       XPosixPrint            'graph' plus 'space',
                                                  but not any Controls
punct   PosixPunct       XPosixPunct            Punctuation and Symbols
                                                  in ASCII-range; just
                                                  punct outside it
space   PosixSpace       XPosixSpace   \s       Whitespace
upper   PosixUpper       XPosixUpper            Uppercase characters
word    PosixWord        XPosixWord    \w       'alnum' + Unicode marks
                                                   + connectors, like
                                                   '_' (Perl extension)
xdigit  ASCII_Hex_Digit  XPosixXDigit           Hexadecimal digit,
                                                   ASCII-range is
                                                   [0-9A-Fa-f]

Also, there are various synonyms, such as \p{Alpha} for \p{XPosixAlpha}; all are listed in "Properties accessible through \p{} and \P{}" in perluniprops.

Within a character class:

  POSIX      traditional   Unicode
[:digit:]       \d        \p{Digit}
[:^digit:]      \D        \P{Digit}
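
For an ASCII-only input, the three spellings in the table above select the same characters, as this small sketch suggests. The sample string is invented.

```perl
use strict;
use warnings;

my $s = "a1-b22";

# POSIX classes are only valid inside a bracketed character class.
my $posix = join "", $s =~ /[[:digit:]]/g;
my $trad  = join "", $s =~ /\d/g;
my $uni   = join "", $s =~ /\p{Digit}/g;

# The negated forms select everything else.
my $not_posix = join "", $s =~ /[[:^digit:]]/g;
my $not_trad  = join "", $s =~ /\D/g;

print "$posix $trad $uni $not_posix $not_trad\n";
# prints: 122 122 122 a-b a-b
```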

ANCHORS

All are zero-width assertions.

^  Match string start (or line, if /m is used)
$  Match string end (or line, if /m is used) or before newline
\b{} Match boundary of type specified within the braces
\B{} Match wherever \b{} doesn't match
\b Match word boundary (between \w and \W)
\B Match except at word boundary (between \w and \w or \W and \W)
\A Match string start (regardless of /m)
\Z Match string end (before optional newline)
\z Match absolute string end
\G Match where previous m//g left off
\K Keep the stuff left of the \K, don't include it in $&
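
The sketch below exercises several of these anchors. The tokenizer loop is one common way of using \G with /gc and is only an illustration; the token labels NUM and OP are made up.

```perl
use strict;
use warnings;

my $s = "one\ntwo\n";

my @per_line = $s =~ /^(\w+)$/mg;          # /m: ^ and $ match at every line
my $cap_Z    = ($s =~ /two\Z/)  ? 1 : 0;   # \Z tolerates the final newline
my $low_z    = ($s =~ /two\z/)  ? 1 : 0;   # \z demands the absolute end
my $cap_A    = ($s =~ /\Atwo/m) ? 1 : 0;   # \A ignores /m
my $caret    = ($s =~ /^two/m)  ? 1 : 0;   # ^ honours /m

# \K keeps "key=" out of the matched text, so only "old" is replaced.
(my $kept = "key=old") =~ s/key=\Kold/new/;

# \G resumes where the previous /g match stopped; /c keeps pos on failure.
my $in = "12+345";
my @tokens;
pos($in) = 0;
while (pos($in) < length $in) {
    if    ($in =~ /\G(\d+)/gc)   { push @tokens, "NUM:$1" }
    elsif ($in =~ /\G([+*-])/gc) { push @tokens, "OP:$1" }
    else                         { last }
}

print "@per_line | $cap_Z $low_z $cap_A $caret | $kept | @tokens\n";
# prints: one two | 1 0 0 1 | key=new | NUM:12 OP:+ NUM:345
```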

QUANTIFIERS

Quantifiers are greedy by default and match the longest leftmost.

Maximal Minimal Possessive Allowed range
------- ------- ---------- -------------
{n,m}   {n,m}?  {n,m}+     Must occur at least n times
                           but no more than m times
{n,}    {n,}?   {n,}+      Must occur at least n times
{,n}    {,n}?   {,n}+      Must occur at most n times
{n}     {n}?    {n}+       Must occur exactly n times
*       *?      *+         0 or more times (same as {0,})
+       +?      ++         1 or more times (same as {1,})
?       ??      ?+         0 or 1 time (same as {0,1})

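The possessive forms (new in Perl 5.10) prevent backtracking: what gets matched by a pattern with a possessive quantifier will not be backtracked into, even if that causes the whole match to fail.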
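
The difference between the maximal, minimal, and possessive forms can be sketched as follows. The sample strings are invented.

```perl
use strict;
use warnings;

my $html = "<b>bold</b><i>it</i>";

my ($greedy) = $html =~ /(<.+>)/;    # maximal: runs to the last ">"
my ($lazy)   = $html =~ /(<.+?>)/;   # minimal: stops at the first ">"

my ($max) = "aaaa" =~ /^(a{2,3})/;   # takes as many as allowed
my ($min) = "aaaa" =~ /^(a{2,3}?)/;  # takes as few as allowed

# a+ can give back one "a" so the final "a" matches; a++ never gives back.
my $normal     = ("aaa" =~ /a+a/)  ? 1 : 0;
my $possessive = ("aaa" =~ /a++a/) ? 1 : 0;

print "$greedy\n";                        # prints: <b>bold</b><i>it</i>
print "$lazy $max $min\n";                # prints: <b> aaa aa
print "$normal $possessive\n";            # prints: 1 0
```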

EXTENDED CONSTRUCTS

(?#text)          A comment
(?:...)           Groups subexpressions without capturing (cluster)
(?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
(?=...)           Zero-width positive lookahead assertion
(*pla:...)        Same, starting in 5.32; experimentally in 5.28
(*positive_lookahead:...) Same, same versions as *pla
(?!...)           Zero-width negative lookahead assertion
(*nla:...)        Same, starting in 5.32; experimentally in 5.28
(*negative_lookahead:...) Same, same versions as *nla
(?<=...)          Zero-width positive lookbehind assertion
(*plb:...)        Same, starting in 5.32; experimentally in 5.28
(*positive_lookbehind:...) Same, same versions as *plb
(?<!...)          Zero-width negative lookbehind assertion
(*nlb:...)        Same, starting in 5.32; experimentally in 5.28
(*negative_lookbehind:...) Same, same versions as *nlb
(?>...)           Grab what we can, prohibit backtracking
(*atomic:...)     Same, starting in 5.32; experimentally in 5.28
(?|...)           Branch reset
(?<name>...)      Named capture
(?'name'...)      Named capture
(?P<name>...)     Named capture (python syntax)
(?[...])          Extended bracketed character class
(?{ code })       Embedded code, return value becomes $^R
(??{ code })      Dynamic regex, return value used as regex
(?N)              Recurse into subpattern number N
(?-N), (?+N)      Recurse into Nth previous/next subpattern
(?R), (?0)        Recurse at the beginning of the whole pattern
(?&name)          Recurse into a named subpattern
(?P>name)         Recurse into a named subpattern (python syntax)
(?(cond)yes|no)
(?(cond)yes)      Conditional expression, where "(cond)" can be:
                  (?=pat)   lookahead; also (*pla:pat)
                            (*positive_lookahead:pat)
                  (?!pat)   negative lookahead; also (*nla:pat)
                            (*negative_lookahead:pat)
                  (?<=pat)  lookbehind; also (*plb:pat)
                            (*positive_lookbehind:pat)
                  (?<!pat)  negative lookbehind; also (*nlb:pat)
                            (*negative_lookbehind:pat)
                  (N)       subpattern N has matched something
                  (<name>)  named subpattern has matched something
                  ('name')  named subpattern has matched something
                  (?{code}) code condition
                  (R)       true if recursing
                  (RN)      true if recursing into Nth subpattern
                  (R&name)  true if recursing into named subpattern
                  (DEFINE)  always false, no no-pattern allowed
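
A few of the constructs above are sketched here: named capture, lookarounds, branch reset, recursion, and a conditional. All patterns and sample strings are illustrative, and the sketch assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Named captures are available through %+ after a successful match.
my %date;
if ("2024-05-17" =~ /(?<y>\d{4})-(?<m>\d\d)-(?<d>\d\d)/) {
    %date = %+;
}

# Lookarounds assert context without consuming it.
my @px     = "10px 20em 30px"  =~ /(\d+)(?=px)/g;     # digits before "px"
my @priced = 'a $5 b #7 c $12' =~ /(?<=\$)(\d+)/g;    # digits after "$"
my ($rest) = "foobar foobaz"   =~ /foo(?!bar)(\w+)/;  # "foo" not before "bar"

# Branch reset: both alternatives fill $1.
my @ids = map { /(?|id=(\d+)|#(\d+))/ ? $1 : () } ("id=7", "#42");

# (?1) recurses into group 1 to match nested parentheses.
my $balanced = qr/^(\((?:[^()]++|(?1))*\))$/;
my @nested   = grep { $_ =~ $balanced } ("(a(b)c)", "(a(b)", "()");

# Conditional: require ">" only if the optional "<" was captured.
my $tag  = qr/^(<)?\w+(?(1)>)$/;
my @tags = grep { $_ =~ $tag } ("<b>", "b", "<b", "b>");

print "$date{y}/$date{m}/$date{d}\n";   # prints: 2024/05/17
print "@px | @priced | $rest\n";        # prints: 10 30 | 5 12 | baz
print "@ids | @nested | @tags\n";       # prints: 7 42 | (a(b)c) () | <b> b
```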

VARIABLES

$_    Default variable for operators to use

$`    Everything prior to matched string
$&    Entire matched string
$'    Everything after the matched string

${^PREMATCH}   Everything prior to matched string
${^MATCH}      Entire matched string
${^POSTMATCH}  Everything after the matched string

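Note to those still using Perl 5.18 or earlier: the use of $`, $& or $' will slow down all regex use within your program. Consult perlvar for @- to see equivalent expressions that won't cause slowdown. See also Devel::SawAmpersand. Starting with Perl 5.10, you can also use the equivalent variables ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}, but for them to be defined, you have to specify the /p (preserve) modifier on your regular expression. In Perl 5.20, the use of $`, $& and $' makes no speed difference.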

$1, $2 ...  hold the Xth captured expr
$+    Last parenthesized pattern match
$^N   Holds the most recently closed capture
$^R   Holds the result of the last (?{...}) expr
@-    Offsets of starts of groups. $-[0] holds start of whole match
@+    Offsets of ends of groups. $+[0] holds end of whole match
%+    Named capture groups
%-    Named capture groups, as array refs

Captured groups are numbered according to their opening parenthesis.
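
The sketch below reads several of these variables after a single match and shows how nested groups are numbered. The hash %seen and its keys are illustrative names; the values are copied out inside the block because the match variables are scoped to it.

```perl
use strict;
use warnings;

my $s = "say: key = value!";
my (%seen, @offsets);

if ($s =~ /(?<k>\w+)\s*=\s*(?<v>\w+)/p) {
    %seen = (
        pre        => ${^PREMATCH},
        match      => ${^MATCH},
        post       => ${^POSTMATCH},
        last_paren => $+,
        k          => $+{k},
        # @- and @+ hold offsets, so substr can rebuild $2.
        v          => substr($s, $-[2], $+[2] - $-[2]),
    );
    @offsets = ($-[0], $+[0]);
}

# Groups are numbered by the position of their opening parenthesis.
my @nested = "ab" =~ /((a)(b))/;

print "[$seen{pre}] [$seen{match}] [$seen{post}]\n";   # prints: [say: ] [key = value] [!]
print "$seen{k} $seen{v} $seen{last_paren}\n";         # prints: key value value
print "@offsets | @nested\n";                          # prints: 5 16 | ab a b
```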

FUNCTIONS

lc          Lowercase a string
lcfirst     Lowercase first char of a string
uc          Uppercase a string
ucfirst     Titlecase first char of a string
fc          Foldcase a string

pos         Return or set current match position
quotemeta   Quote metacharacters
reset       Reset m?pattern? status
study       Analyze string for optimizing matching

split       Use a regex to split a string into parts

The first five of these are like the escape sequences \L, \l, \U, \u, and \F. For Titlecase, see "Titlecase"; for Foldcase, see "Foldcase".
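
A short sketch of some of these functions in use follows. The sample strings are invented.

```perl
use strict;
use warnings;

my @cased = (lc("PeRl"), lcfirst("PERL"), uc("perl"), ucfirst("perl"));

# quotemeta backslashes every non-word character, like \Q...\E.
my $safe  = quotemeta("1+1=2");
my $found = ("2+1+1=2" =~ /$safe/) ? 1 : 0;

# pos reports where the last scalar-context /g match ended.
my $str   = "aXbXc";
my $ok    = $str =~ /X/g;
my $where = pos($str);

# split uses a regex as the separator.
my @parts = split /\s*,\s*/, "a , b,c";

print "@cased\n";                 # prints: perl pERL PERL Perl
print "$safe $found $where\n";    # prints: 1\+1\=2 1 2
print "@parts\n";                 # prints: a b c
```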

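TERMINOLOGY

Titlecase

Unicode concept which most often is equal to uppercase, but for certain characters like the German "sharp s" there is a difference.

Foldcase

Unicode form that is useful when comparing strings regardless of case, as certain characters have complex one-to-many case mappings. Primarily a variant of lowercase.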
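
Both terms can be illustrated with the German sharp s (U+00DF), written here as a hex escape. This is a sketch assuming Perl 5.16 or later, where the fc function is available through the "fc" feature.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);

my $sharp = "\x{DF}";   # LATIN SMALL LETTER SHARP S

# Uppercase and titlecase differ for this character.
my $upper_is_SS = (uc($sharp)      eq "SS") ? 1 : 0;
my $title_is_Ss = (ucfirst($sharp) eq "Ss") ? 1 : 0;

# Lowercasing does not make the two spellings equal, but folding does.
my $word     = "Stra\x{DF}e";
my $lc_equal = (lc($word) eq lc("STRASSE")) ? 1 : 0;
my $fc_equal = (fc($word) eq fc("STRASSE")) ? 1 : 0;

print "$upper_is_SS $title_is_Ss $lc_equal $fc_equal\n";   # prints: 1 1 0 1
```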

AUTHOR

Iain Truskett. Updated by the Perl 5 Porters.

This document may be distributed under the same terms as Perl itself.

SEE ALSO

THANKS

David P.C. Wollmann, Richard Soderberg, Sean M. Burke, Tom Christiansen, Jim Cromie, and Jeffrey Goff for useful advice.