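perlreref - Perl Regular Expressions Reference
This is a quick reference to Perl's regular expressions. For full information see perlre and perlop, as well as the "See also" list at the end of this document.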
=~
Determines which variable the regex is applied to. If it is absent, $_ is used.
$var =~ /foo/;
!~
Determines which variable the regex is applied to and negates the result of the match: it returns false if the match succeeds and true if it fails.
$var !~ /foo/;
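A minimal sketch of the two binding operators and of the $_ default. The variable names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $var = "some food";

# =~ binds the pattern to $var; the match succeeds because "foo" occurs in it.
my $has_foo = ($var =~ /foo/) ? 1 : 0;

# !~ negates the result: true only when the pattern does NOT match.
my $lacks_bar = ($var !~ /bar/) ? 1 : 0;

# With no binding operator, the match is applied to $_.
my $default_hit;
for ("foo fighters") {
    $default_hit = /foo/ ? 1 : 0;
}

print "$has_foo $lacks_bar $default_hit\n";   # prints "1 1 1"
```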
m/pattern/msixpogcdualn
Searches a string for a pattern match, applying the given options.
m Multiline mode - ^ and $ match internal lines
s match as a Single line - . matches \n
i case-Insensitive
x eXtended legibility - free whitespace and comments
p Preserve a copy of the matched string -
${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
o compile pattern Once
g Global - all occurrences
c don't reset pos on failed matches when using /g
a restrict \d, \s, \w and [:posix:] to match ASCII only
aa (two a's) also, under /i, ASCII characters never match non-ASCII ones
l match according to current locale
u match according to Unicode rules
d match according to native rules unless something indicates
Unicode
n Non-capture mode. Don't let () fill in $1, $2, etc...
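If 'pattern' is an empty string, the last successfully matched regex is used. Delimiters other than '/' may be used for both this operator and the following ones. The leading m can be omitted if the delimiter is '/'.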
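The modifiers are easier to judge side by side. The following sketch, on an invented two-line string, combines /i, /g, /m and /x and uses a delimiter other than '/'.

```perl
use strict;
use warnings;

my $text = "Alpha beta\nALPHA gamma\n";

# /i ignores case; /g in list context returns every match.
my @alphas = $text =~ /alpha/ig;          # ("Alpha", "ALPHA")

# /m lets ^ match at the start of each internal line.
my @line_starts = $text =~ /^(\w+)/mg;    # ("Alpha", "ALPHA")

# /x allows whitespace and comments inside the pattern;
# m{...} shows a delimiter other than "/".
my ($second) = $text =~ m{
    ^ \w+      # first word of the string
    \s+ (\w+)  # capture the second word
}x;                                       # "beta"

print scalar(@alphas), " @line_starts $second\n";   # prints "2 Alpha ALPHA beta"
```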
qr/pattern/msixpodualn
Lets you store a regex in a variable, or pass one around. The modifiers are the same as for m// and are stored within the regex.
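As an illustration of storing and passing a regex, here is a small sketch. The helper count_matches is hypothetical and not part of Perl.

```perl
use strict;
use warnings;

# The /i modifier is stored inside the compiled regex object.
my $word = qr/perl/i;

# A qr// object can be interpolated into a larger pattern...
my $is_versioned = "Perl 5.36" =~ /^$word\s+\d+\.\d+$/ ? 1 : 0;

# ...or passed to a function like any other scalar.
# count_matches is a made-up helper for this example.
sub count_matches {
    my ($re, @strings) = @_;
    return scalar grep { $_ =~ $re } @strings;
}

my $count = count_matches($word, "PERL", "python", "perlre");

print "$is_versioned $count\n";   # prints "1 2"
```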
s/pattern/replacement/msixpogcedual
Substitutes matches of 'pattern' with 'replacement'. The modifiers are the same as for m//, with two additions:
e Evaluate 'replacement' as an expression
r Return substitution and leave the original string untouched.
'e' may be specified multiple times. 'replacement' is interpreted as a double-quoted string unless a single quote (') is the delimiter.
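A short sketch of the two extra substitution modifiers, using an invented input string. Note that /r needs Perl 5.14 or later.

```perl
use strict;
use warnings;

my $price = "cost: 10 and 25";

# /r returns the modified copy and leaves $price untouched.
my $upper = $price =~ s/cost/COST/r;

# /e evaluates the replacement as Perl code; /g applies it to every match.
(my $doubled = $price) =~ s/(\d+)/$1 * 2/ge;

# Without /r, s/// modifies in place and returns the number of substitutions.
my $copy = $price;
my $n = ($copy =~ s/\d+/N/g);

print "$upper | $doubled | $copy | $n\n";
# prints "COST: 10 and 25 | cost: 20 and 50 | cost: N and N | 2"
```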
m?pattern?
Like m/pattern/, but matches only once. No alternate delimiters can be used. Must be reset with reset().
\ Escapes the character immediately following it
. Matches any single character except a newline (unless /s is
used)
^ Matches at the beginning of the string (or line, if /m is used)
$ Matches at the end of the string (or line, if /m is used)
* Matches the preceding element 0 or more times
+ Matches the preceding element 1 or more times
? Matches the preceding element 0 or 1 times
{...} Specifies a range of occurrences for the element preceding it
[...] Matches any one of the characters contained within the brackets
(...) Groups subexpressions for capturing to $1, $2...
(?:...) Groups subexpressions without capturing (cluster)
| Matches either the subexpression preceding or following it
\g1 or \g{1}, \g2 ... Matches the text from the Nth group
\1, \2, \3 ... Matches the text from the Nth group
\g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
\g{name} Named backreference
\k<name> Named backreference
\k'name' Named backreference
(?P=name) Named backreference (python syntax)
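The backreference forms listed above can be sketched as follows; the sample strings and the group names q and body are invented.

```perl
use strict;
use warnings;

my $text = "this is is a test test";

# \1 refers back to whatever group 1 captured: here, a doubled word.
my @doubled = $text =~ /\b(\w+) \1\b/g;     # ("is", "test")

# \g{-1} is a relative backreference to the most recently opened group.
my $rel = "abab" =~ /(ab)\g{-1}/ ? 1 : 0;   # 1

# \k<name> refers to a named group: the closing quote must equal the opening one.
my $inner;
if (q{say "hi" now} =~ /(?<q>["']) (?<body>.*?) \k<q>/x) {
    $inner = $+{body};                      # "hi"
}

print "@doubled $rel $inner\n";   # prints "is test 1 hi"
```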
These work as in normal strings.
\a Alarm (beep)
\e Escape
\f Formfeed
\n Newline
\r Carriage return
\t Tab
\037 Char whose ordinal is the 3 octal digits, max \777
\o{2307} Char whose ordinal is the octal number, unrestricted
\x7f Char whose ordinal is the 2 hex digits, max \xFF
\x{263a} Char whose ordinal is the hex number, unrestricted
\cx Control-x
\N{name} A named Unicode character or character sequence
\N{U+263D} A Unicode character by hex ordinal
\l Lowercase next character
\u Titlecase next character
\L Lowercase until \E
\U Uppercase until \E
\F Foldcase until \E
\Q Disable pattern metacharacters until \E
\E End modification
For Titlecase, see "Titlecase".
This one works differently from normal strings:
\b An assertion, not backspace, except in a character class
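A brief sketch of the case-modifying and quoting escapes from the list above. The values are invented for the example.

```perl
use strict;
use warnings;

my $name = "mARY";

# \u titlecases the next character, \L lowercases up to \E.
my $nice = "\u\L$name\E!";                        # "Mary!"

# \Q...\E quotes metacharacters so that input is matched literally.
my $literal = "a.c";
my $strict = "abc" =~ /^\Q$literal\E$/ ? 1 : 0;   # 0: "." is literal
my $loose  = "abc" =~ /^$literal$/     ? 1 : 0;   # 1: "." matches "b"

# \x{...} and \N{U+...} name the same character by code point.
my $same = "\x{263A}" eq "\N{U+263A}" ? 1 : 0;    # 1

print "$nice $strict $loose $same\n";   # prints "Mary! 0 1 1"
```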
[amy] Match 'a', 'm' or 'y'
[f-j] Dash specifies "range"
[f-j-] Dash escaped or at start or end means 'dash'
[^f-j] Caret indicates "match any character _except_ these"
The following sequences (except \N) work within or without a character class. The first six are locale aware; all are Unicode aware. See perllocale and perlunicode for details.
\d A digit
\D A nondigit
\w A word character
\W A non-word character
\s A whitespace character
\S A non-whitespace character
\h A horizontal whitespace
\H A non horizontal whitespace
\N A non newline (when not followed by '{NAME}';
not valid in a character class; equivalent to [^\n]; it's
like '.' without /s modifier)
\v A vertical whitespace
\V A non vertical whitespace
\R A generic newline (?>\v|\x0D\x0A)
\pP Match P-named (Unicode) property
\p{...} Match Unicode property with name longer than 1 character
\PP Match non-P
\P{...} Match lack of Unicode property with name longer than 1 char
\X Match Unicode extended grapheme cluster
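To make the Unicode awareness of these sequences concrete, the sketch below tests a few of them on characters given by code point. The chosen characters are only examples.

```perl
use strict;
use warnings;

my $arabic_three = "\x{663}";   # ARABIC-INDIC DIGIT THREE

# \d is Unicode aware; /a restricts it to ASCII digits.
my $unicode_digit = $arabic_three =~ /^\d$/  ? 1 : 0;   # 1
my $ascii_digit   = $arabic_three =~ /^\d$/a ? 1 : 0;   # 0

# \p{...} matches by Unicode property: U+0394 is GREEK CAPITAL LETTER DELTA.
my $is_upper = "\x{394}" =~ /^\p{Lu}$/ ? 1 : 0;         # 1

# \R matches any generic newline, including the two-character CRLF.
my @lines = split /\R/, "one\r\ntwo\nthree";            # 3 fields

# \h is horizontal whitespace only; \s would also match the newline.
my $h_count = () = "a \t\nb" =~ /\h/g;                  # 2

print "$unicode_digit $ascii_digit $is_upper ", scalar(@lines), " $h_count\n";
# prints "1 0 1 3 2"
```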
POSIX character classes and their Unicode and Perl equivalents:
           ASCII-           Full-
POSIX      range            range         backslash
[[:...:]]  \p{...}          \p{...}       sequence   Description
---------------------------------------------------------------------------
alnum      PosixAlnum       XPosixAlnum              'alpha' plus 'digit'
alpha      PosixAlpha       XPosixAlpha              Alphabetic characters
ascii      ASCII                                     Any ASCII character
blank      PosixBlank       XPosixBlank   \h         Horizontal whitespace;
                                                     full-range also
                                                     written as
                                                     \p{HorizSpace} (GNU
                                                     extension)
cntrl      PosixCntrl       XPosixCntrl              Control characters
digit      PosixDigit       XPosixDigit   \d         Decimal digits
graph      PosixGraph       XPosixGraph              'alnum' plus 'punct'
lower      PosixLower       XPosixLower              Lowercase characters
print      PosixPrint       XPosixPrint              'graph' plus 'space',
                                                     but not any Controls
punct      PosixPunct       XPosixPunct              Punctuation and Symbols
                                                     in ASCII-range; just
                                                     punct outside it
space      PosixSpace       XPosixSpace   \s         Whitespace
upper      PosixUpper       XPosixUpper              Uppercase characters
word       PosixWord        XPosixWord    \w         'alnum' + Unicode marks
                                                     + connectors, like
                                                     '_' (Perl extension)
xdigit     ASCII_Hex_Digit  XPosixXDigit             Hexadecimal digit,
                                                     ASCII-range is
                                                     [0-9A-Fa-f]
Also, there are various synonyms, such as \p{Alpha} for \p{XPosixAlpha}; all are listed in "Properties accessible through \p{} and \P{}" in perluniprops.
Within a character class:
POSIX traditional Unicode
[:digit:] \d \p{Digit}
[:^digit:] \D \P{Digit}
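POSIX classes are written inside a bracketed character class, which the following sketch illustrates on an invented identifier.

```perl
use strict;
use warnings;

my $id = "user_42";

# A POSIX class such as [:alpha:] must sit inside an outer [...].
my $ok = $id =~ /^[[:alpha:]_][[:alnum:]_]*$/ ? 1 : 0;   # 1

# [:^digit:] negates the class, like \D.
my ($prefix) = $id =~ /^([[:^digit:]]+)/;                # "user_"

# [:xdigit:] covers 0-9, A-F and a-f.
my $hexish = "0xFF" =~ /^0x[[:xdigit:]]+$/ ? 1 : 0;      # 1

print "$ok $prefix $hexish\n";   # prints "1 user_ 1"
```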
All are zero-width assertions.
^ Match string start (or line, if /m is used)
$ Match string end (or line, if /m is used) or before newline
\b{} Match boundary of type specified within the braces
\B{} Match wherever \b{} doesn't match
\b Match word boundary (between \w and \W)
\B Match except at word boundary (between \w and \w or \W and \W)
\A Match string start (regardless of /m)
\Z Match string end (before optional newline)
\z Match absolute string end
\G Match where previous m//g left off
\K Keep the stuff left of the \K, don't include it in $&
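The differences between some of these anchors are subtle, so here is a small sketch on made-up strings.

```perl
use strict;
use warnings;

my $s = "foo\n";

my $Z = $s =~ /foo\Z/ ? 1 : 0;   # 1: \Z allows one trailing newline
my $z = $s =~ /foo\z/ ? 1 : 0;   # 0: \z is the absolute end of the string

# \b matches between a \w and a \W character, so "concat" is skipped.
my @words = "cat concat cat." =~ /\bcat\b/g;      # 2 matches

# \K excludes what was matched before it from the text being replaced.
(my $masked = "id=12345") =~ s/id=\K\d+/*****/;   # "id=*****"

# \G anchors at the position where the previous /g match ended,
# so this counts only the leading run of "a" characters.
my $input = "aab";
my $run = 0;
$run++ while $input =~ /\Ga/g;                    # 2

print "$Z $z ", scalar(@words), " $masked $run\n";   # prints "1 0 2 id=***** 2"
```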
Quantifiers are greedy by default: at the leftmost position where a match is possible, they match as much as they can.
Maximal  Minimal  Possessive  Allowed range
-------  -------  ----------  -------------
{n,m}    {n,m}?   {n,m}+      Must occur at least n times
                              but no more than m times
{n,}     {n,}?    {n,}+       Must occur at least n times
{,n}     {,n}?    {,n}+       Must occur at most n times
{n}      {n}?     {n}+        Must occur exactly n times
*        *?       *+          0 or more times (same as {0,})
+        +?       ++          1 or more times (same as {1,})
?        ??       ?+          0 or 1 time (same as {0,1})
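The possessive forms (new in Perl 5.10) prevent backtracking: what gets matched by a pattern with a possessive quantifier will not be backtracked into, even if that causes the whole match to fail.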
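The three quantifier flavours behave differently on the same input. A minimal sketch, with an invented HTML-like string:

```perl
use strict;
use warnings;

my $html = "<b>bold</b><i>it</i>";

# Greedy .+ runs to the last ">"; minimal .+? stops at the first one.
my ($greedy) = $html =~ /<(.+)>/;     # "b>bold</b><i>it</i"
my ($lazy)   = $html =~ /<(.+?)>/;    # "b"

# The possessive a++ keeps every "a" it matched, so nothing is left
# for the final "a" and the whole match fails.
my $normal     = "aaa" =~ /^a+a$/  ? 1 : 0;   # 1
my $possessive = "aaa" =~ /^a++a$/ ? 1 : 0;   # 0

# {n} and {n,m} bound the repetition count.
my $bounded = "2024-01" =~ /^\d{4}-\d{1,2}$/ ? 1 : 0;   # 1

print "$greedy\n$lazy $normal $possessive $bounded\n";
```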
(?#text) A comment
(?:...) Groups subexpressions without capturing (cluster)
(?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
(?=...) Zero-width positive lookahead assertion
(*pla:...) Same, starting in 5.32; experimentally in 5.28
(*positive_lookahead:...) Same, same versions as *pla
(?!...) Zero-width negative lookahead assertion
(*nla:...) Same, starting in 5.32; experimentally in 5.28
(*negative_lookahead:...) Same, same versions as *nla
(?<=...) Zero-width positive lookbehind assertion
(*plb:...) Same, starting in 5.32; experimentally in 5.28
(*positive_lookbehind:...) Same, same versions as *plb
(?<!...) Zero-width negative lookbehind assertion
(*nlb:...) Same, starting in 5.32; experimentally in 5.28
(*negative_lookbehind:...) Same, same versions as *nlb
(?>...) Grab what we can, prohibit backtracking
(*atomic:...) Same, starting in 5.32; experimentally in 5.28
(?|...) Branch reset
(?<name>...) Named capture
(?'name'...) Named capture
(?P<name>...) Named capture (python syntax)
(?[...]) Extended bracketed character class
(?{ code }) Embedded code, return value becomes $^R
(??{ code }) Dynamic regex, return value used as regex
(?N) Recurse into subpattern number N
(?-N), (?+N) Recurse into Nth previous/next subpattern
(?R), (?0) Recurse at the beginning of the whole pattern
(?&name) Recurse into a named subpattern
(?P>name) Recurse into a named subpattern (python syntax)
(?(cond)yes|no)
(?(cond)yes) Conditional expression, where "(cond)" can be:
    (?=pat)    lookahead; also (*pla:pat)
               (*positive_lookahead:pat)
    (?!pat)    negative lookahead; also (*nla:pat)
               (*negative_lookahead:pat)
    (?<=pat)   lookbehind; also (*plb:pat)
               (*positive_lookbehind:pat)
    (?<!pat)   negative lookbehind; also (*nlb:pat)
               (*negative_lookbehind:pat)
    (N)        subpattern N has matched something
    (<name>)   named subpattern has matched something
    ('name')   named subpattern has matched something
    (?{code})  code condition
    (R)        true if recursing
    (RN)       true if recursing into Nth subpattern
    (R&name)   true if recursing into named subpattern
    (DEFINE)   always false; no 'no' branch is allowed
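A few of the extended constructs above, sketched on invented inputs: lookaround, named capture, branch reset and a conditional.

```perl
use strict;
use warnings;

# Lookahead: require a digit and a lowercase letter somewhere in the string.
my $strong = "abc123" =~ /^(?=.*\d)(?=.*[a-z]).{6,}$/ ? 1 : 0;   # 1

# Lookbehind: digits preceded by "$", without matching the "$" itself.
my ($amount) = 'total $42' =~ /(?<=\$)(\d+)/;                    # 42

# Named capture, read back through %+.
my %date;
if ("2024-05-17" =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/) {
    %date = %+;
}

# Branch reset: both alternatives store their capture in $1.
my @keys;
for my $pair ("a=1", "b: 2") {
    push @keys, $1 if $pair =~ /(?|(\w+)=\w+|(\w+):\s*\w+)/;
}

# Conditional: require a closing ")" only if group 1 captured an opening "(".
my $re = qr/^(\()?\d+(?(1)\))$/;
my @balanced = map { /$re/ ? 1 : 0 } ("(12)", "12", "(12");      # 1 1 0

print "$strong $amount $date{year} @keys @balanced\n";
# prints "1 42 2024 a b 1 1 0"
```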
$_ Default variable for operators to use
$` Everything prior to the matched string
$& Entire matched string
$' Everything after the matched string
${^PREMATCH} Everything prior to the matched string
${^MATCH} Entire matched string
${^POSTMATCH} Everything after the matched string
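Note to those still using Perl 5.18 or earlier: the use of $`, $& or $' will slow down all regex use within your program. Consult perlvar for @- to see equivalent expressions that won't cause a slowdown. See also Devel::SawAmpersand. Starting with Perl 5.10, you can also use the equivalent variables ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}, but for them to be defined, you have to specify the /p (preserve) modifier on your regular expression. As of Perl 5.20, the use of $`, $& and $' makes no speed difference.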
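The two alternatives mentioned in the note can be sketched like this, on an invented string: the /p variables, and the same pieces computed from @- and @+.

```perl
use strict;
use warnings;

my $s = "hello brave world";

# With /p, the three ${^...} variables are defined after a successful match.
my ($pre, $match, $post);
if ($s =~ /brave/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

# The same pieces computed from the offsets of the whole match.
my ($pre2, $match2, $post2);
if ($s =~ /brave/) {
    $pre2   = substr($s, 0, $-[0]);
    $match2 = substr($s, $-[0], $+[0] - $-[0]);
    $post2  = substr($s, $+[0]);
}

print "[$pre][$match][$post]\n";   # prints "[hello ][brave][ world]"
```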
$1, $2 ... Hold the Nth captured expression
$+ Last parenthesized pattern match
$^N Holds the most recently closed capture
$^R Holds the result of the last (?{...}) expr
@- Offsets of starts of groups. $-[0] holds start of whole match
@+ Offsets of ends of groups. $+[0] holds end of whole match
%+ Named capture groups
%- Named capture groups, as array refs
Captured groups are numbered according to their opening paren.
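A sketch of how the capture variables relate to each other, using nested groups and a repeated group name. The input and the name c are invented.

```perl
use strict;
use warnings;

my $s = "key=value";
my @info;
if ($s =~ /((\w+)=(\w+))/) {
    # Groups are numbered by their opening parenthesis:
    # $1 is the outer group, $2 and $3 are the inner ones.
    @info = ($1, $2, $3);
    push @info, $+;             # last parenthesized match: same as $3 here
    push @info, $-[3], $+[3];   # start and end offsets of group 3
}

# %- keeps every group with a given name, as an array reference;
# $+{c} holds only the leftmost defined one.
my (@all, $first);
if ("ab" =~ /(?<c>a)(?<c>b)/) {
    @all   = @{ $-{c} };        # ("a", "b")
    $first = $+{c};             # "a"
}

print "@info | @all | $first\n";
# prints "key=value key value value 4 9 | a b | a"
```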
lc Lowercase a string
lcfirst Lowercase first char of a string
uc Uppercase a string
ucfirst Titlecase first char of a string
fc Foldcase a string
pos Return or set current match position
quotemeta Quote metacharacters
reset Reset m?pattern? status
study Analyze string for optimizing matching
split Use a regex to split a string into parts
The first five of these are like the escape sequences \L, \l, \U, \u, and \F. For Titlecase, see "Titlecase"; for Foldcase, see "Foldcase".
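Titlecase: a Unicode concept which most often is equal to uppercase, but for certain characters, like the German "sharp s", there is a difference.
Foldcase: a Unicode form that is useful when comparing strings regardless of case, as certain characters have complex one-to-many case mappings. Primarily a variant of lowercase.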
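A short sketch of several of the functions listed above. The Greek sigma pair is chosen as an example where foldcase and lowercase give different answers. fc needs Perl 5.16 or later and the 'fc' feature.

```perl
use strict;
use warnings;
use feature 'fc';

# U+03A3 is capital sigma, U+03C2 is final small sigma. Both fold to
# U+03C3, but lc leaves the final sigma as it is.
my $sigma_fc = fc("\x{3A3}") eq fc("\x{3C2}") ? 1 : 0;   # 1
my $sigma_lc = lc("\x{3A3}") eq lc("\x{3C2}") ? 1 : 0;   # 0

my $title = ucfirst(lc("pERL"));             # "Perl"

# quotemeta escapes metacharacters, like \Q...\E inside a pattern.
my $quoted = quotemeta("a.b");               # a\.b

# split uses a regex as the separator.
my @fields = split /\s*,\s*/, "a , b,c";     # ("a", "b", "c")

# pos reports where the last /g match on a string ended.
my $str = "aXbXc";
my $found = $str =~ /X/g;
my $where = pos($str);                       # 2

print "$sigma_fc $sigma_lc $title $quoted @fields $where\n";
# prints "1 0 Perl a\.b a b c 2"
```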
Iain Truskett. Updated by the Perl 5 Porters.
This document may be distributed under the same terms as Perl itself.
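perlretut for a tutorial on regular expressions.
perlrequick for a rapid tutorial.
perlre for more details.
perlvar for details on the variables.
perlop for details on the operators.
perlfunc for details on the functions.
perlfaq6 for FAQs on regular expressions.
perlrebackslash for a reference on backslash sequences.
perlrecharclass for a reference on character classes.
The re module to alter behaviour and aid debugging.
perluniintro, perlunicode, charnames and perllocale for details on regexes and internationalisation.
Mastering Regular Expressions by Jeffrey Friedl (http://oreilly.com/catalog/9780596528126/) for a thorough grounding and reference on the topic.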
Thanks to David P.C. Wollmann, Richard Soderberg, Sean M. Burke, Tom Christiansen, Jim Cromie and Jeffrey Goff for useful advice.