COMMENT(!MOD!zsh/re2
Interface to the RE2 regular expression library.
!MOD!)
cindex(regular expressions)
cindex(re2)
The tt(zsh/re2) module provides regular expression handling using the
RE2 library.
This engine assumes UTF-8 strings by default, and zsh never disables this.
Canonical documentation for the syntax accepted by this regular expression
engine can be found at:
uref(https://github.com/google/re2/wiki/Syntax)
The tt(zsh/re2) module makes available some commands and test conditions.
Regular expressions can be pre-compiled and given explicit names; these
are not shell variables and do not share a namespace with them. There
is currently no mechanism to enumerate them.
The supported commands are:
startitem()
findex(re2_compile)
item(tt(re2_compile) COMMENT(TODO: [ tt(-R) var(NAME) ]) [ tt(-acilwLP) ] var(REGEX))(
Compiles an RE2-syntax regular expression, defaulting to case-sensitive.
COMMENT(TODO: Option tt(-R) stores the regular expression with the given name,
instead of in anonymous global state.)
Option tt(-L) will interpret the pattern as a literal, not a regex.
Option tt(-P) will enable POSIX syntax instead of the full language.
Option tt(-a) will force the pattern to be anchored.
Option tt(-c) will re-enable Perl class support in POSIX mode.
Option tt(-i) will compile a case-insensitive pattern.
Option tt(-l) will use a longest-match rather than a first-match algorithm
for selecting which branch matches.
Option tt(-w) will re-enable Perl word-boundary support in POSIX mode.
)
enditem()
startitem()
findex(re2_match)
item(tt(re2_match) [ tt(-v) var(var) ] [ tt(-a) var(arr) ] \
COMMENT(TODO:[ tt(-R) var(REGNAME) ]|)[ tt(-P) var(PATTERN) ] var(string))(
Matches a regular expression against the supplied string, storing matches in
variables.
Returns success if var(string) matches the tested regular expression.
Without option+COMMENT(TODO: s tt(-R) or) tt(-P), tt(re2_match) will match against the implicit current regular
expression object, which must have been compiled with tt(re2_compile).
COMMENT(TODO: Option tt(-R) will use the regular expression with the given name.)
Option tt(-P) will take a regular expression as a parameter and compile and
use it, without changing the implicit current regular expression object as
set by calling tt(re2_compile).
Without a successful match, no variables are modified, even those explicitly
specified.
Upon successful match: the entire matched portion of var(string) is stored in
the var(var) of option tt(-v) if given, else in tt(MATCH); any captured
sub-expressions are stored in the array var(arr) of option tt(-a) if given,
else in tt(match).
No offset variables are currently set; this may change in a future release
of zsh.
)
enditem()
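The two builtins described above can be combined roughly as follows. This is
a minimal sketch, assuming a zsh build that provides the tt(zsh/re2) module;
the function name, the variable names and the sample strings are hypothetical
and chosen only for illustration.

```shell
# Sketch only: assumes a zsh build that provides the zsh/re2 module.
# re2_demo, whole and parts are illustrative names, not part of the module.
re2_demo() {
  typeset whole
  typeset -a parts

  # Load the module; report and stop if this shell does not provide it.
  zmodload zsh/re2 2>/dev/null || { echo "zsh/re2 unavailable"; return 1; }

  # Compile a case-insensitive pattern into the implicit current regex object.
  re2_compile -i '^(\w+)@(\w+)\.org$' || return 2

  # Match against the compiled regex: the whole match goes to $whole and
  # the captured sub-expressions go to the array $parts.
  if re2_match -v whole -a parts 'User@Example.org'; then
    echo "matched: $whole"
    echo "captures: ${parts[*]}"
  else
    echo "no match"
  fi

  # One-shot form: -P compiles and uses a pattern for this call only,
  # leaving the regex set by re2_compile untouched.
  re2_match -P '^[0-9]+$' '12345' && echo "digits only"
}
```

Calling tt(re2_demo) prints the match details when the module is present and
a short notice otherwise. Declaring the result variables with tt(typeset)
inside the function keeps them from leaking into the caller's scope.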
The supported test conditions are:
startitem()
findex(re2-match)
item(var(expr) tt(-re2-match) var(regex))(
Matches a string against an RE2 regular expression.
Upon successful match, the
matched portion of the string will normally be placed in the tt(MATCH)
variable. If there are any capturing parentheses within the regex, then
the tt(match) array variable will contain those.
If the match is not successful, then the variables will not be altered.
In addition, the tt(MBEGIN) and tt(MEND) variables are updated to point
to the offsets within var(expr) for the beginning and end of the matched
text, with the tt(mbegin) and tt(mend) arrays holding the beginning and
end of each substring matched.
If tt(BASH_REMATCH) is set, then the array tt(BASH_REMATCH) will be set
instead of all of the other variables.
The tt(NO_CASE_MATCH) option may be used to make matching case-insensitive.
For finer-grained control, use the tt(re2_match) builtin.
)
enditem()
startitem()
findex(re2-match-posix)
item(var(expr) tt(-re2-match-posix) var(regex))(
Matches as per tt(-re2-match) but configuring the RE2 engine to use
POSIX syntax.
)
enditem()
startitem()
findex(re2-match-posixperl)
item(var(expr) tt(-re2-match-posixperl) var(regex))(
Matches as per tt(-re2-match) but configuring the RE2 engine to use
POSIX syntax, with the Perl classes and word-boundary extensions re-enabled
too.
This thus adds support for:
tt(\d), tt(\s), tt(\w), tt(\D), tt(\S), tt(\W), tt(\b), and tt(\B).
)
enditem()
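A similar configuration can probably be reached through the builtins by
combining the tt(-P), tt(-c) and tt(-w) options of tt(re2_compile). The
sketch below assumes that this combination behaves like
tt(-re2-match-posixperl), which the text above suggests but does not state;
the function name and the sample string are hypothetical.

```shell
# Sketch only: assumes zsh/re2 is available and that -P -c -w together
# mirror the -re2-match-posixperl condition (an assumption, not confirmed).
re2_posixperl_demo() {
  typeset word

  zmodload zsh/re2 2>/dev/null || { echo "zsh/re2 unavailable"; return 1; }

  # -P selects POSIX syntax; -c re-enables Perl classes such as \d,
  # and -w re-enables the \b and \B word-boundary assertions.
  re2_compile -P -c -w '\b\d+\b' || return 2

  # The match is not anchored (no -a), so the number is found mid-string.
  re2_match -v word 'order 66 shipped' && echo "number: $word"
}
```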
startitem()
findex(re2-match-longest)
item(var(expr) tt(-re2-match-longest) var(regex))(
Matches as per tt(-re2-match) but configuring the RE2 engine to find
the longest match, instead of the first match.
For example, the test
example([[ abb -re2-match-longest ^a+LPAR()b|bb+RPAR() ]])
matches using the right-hand branch, and thus matches tt(abb), where
tt(-re2-match) would instead match only tt(ab).
)
enditem()
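The difference between first-match and longest-match selection can also be
observed with the builtins, using the tt(-l) option of tt(re2_compile). This
is a minimal sketch, assuming the tt(zsh/re2) module is available and that
tt(-l) selects the same algorithm as tt(-re2-match-longest); the function and
variable names are hypothetical.

```shell
# Sketch only: assumes zsh/re2 is available; names are illustrative.
re2_longest_demo() {
  typeset first longest

  zmodload zsh/re2 2>/dev/null || { echo "zsh/re2 unavailable"; return 1; }

  # Default: first-match selection, so the left branch of the alternation wins.
  re2_compile '^a+(b|bb+)' && re2_match -v first abb

  # -l: longest-match selection, so the right branch wins.
  re2_compile -l '^a+(b|bb+)' && re2_match -v longest abb

  echo "first-match: $first"
  echo "longest-match: $longest"
}
```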