* [PATCH] Add zsh/re2 module with conditions
@ 2016-09-08 4:15 Phil Pennock
2016-09-08 13:56 ` [PATCH] re2: fix clean-up path; fix two comments Phil Pennock
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Phil Pennock @ 2016-09-08 4:15 UTC
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 24032 bytes --]
Folks,
I tend to get automatically kicked off the -workers list by ezmlm
because I reject mails which are self-declared as spam, so please CC
replies to me. Also: my commit-bit is currently
surrendered-for-safekeeping because I've not been doing much with Zsh, so
someone else will need to merge this, if it's accepted.
RE2 is a regular expression library, written in C++, from Google. It
offers most of the features of PCRE, excluding those which can't be
handled without backtracking. It's BSD-licensed. This patch adds the
zsh/re2 module. It uses the `cre2` library to provide C-language bindings.
At this point, I haven't done anything about rebinding =~ to handle
this. It's purely new infix-operators based on words. I'm thinking
perhaps something along the lines of $zsh_reop_modules=(regex), with
`setopt rematch_pcre` becoming a compatibility interface that acts as
though `pcre` were prepended to that list and
zsh_reop_modules=(pcre regex)
having the same effect. Then I could use `zsh_reop_modules=(re2 regex)`.
Does this seem sane? Anyone have better suggestions? I do want to have
=~ able to use this module, but the current work stands alone and should
be merge-able as-is.
Is there particular interest in having command-forms too? There's no
"study" concept, but I suppose compiling a hairy regexp only once might
be good in some situations (but why use shell for those?)
This has been tested on MacOS 10.10.5.
My ulterior motive is that I want "better than zsh/regex" available by
default on MacOS, where Apple build without GPL modules for the system
Zsh. I hope that by offering this option, Apple's engineers might
incorporate this one day and I can be happier. :)
I've also pushed this code to a GitHub repo, philpennock/zsh-code on the
re2 branch: https://github.com/philpennock/zsh-code/tree/re2
Tested with re2 20160901 installed via Brew, cre2 installed via:
git clone https://github.com/marcomaggi/cre2
cd cre2
LIBTOOLIZE=glibtoolize sh ./autogen.sh
CXX=g++-6 CC=gcc-6 ./configure --prefix=/opt/regexps
make doc/stamp-vti
make
make install
and Zsh configured with:
CPPFLAGS=-I/opt/regexps/include LDFLAGS=-L/opt/regexps/lib \
./configure --prefix=/opt/zsh-devel --enable-pcre --enable-re2 \
--enable-cap --enable-multibyte --enable-zsh-secure-free \
--with-tcsetpgrp --enable-etcdir=/etc
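A quick pre-flight check before running configure can save a failed build. The sketch below is not part of the patch: `check_cre2_prefix` is a hypothetical helper, and it assumes the cre2 install put `cre2.h` under `include/` and some `libcre2.*` file under `lib/` of the chosen prefix. It is written for POSIX sh rather than zsh, because zsh's default NOMATCH option treats an unmatched glob as an error.

```shell
# Hypothetical helper, not part of the patch: report whether a prefix looks
# like it holds a cre2 install (the header plus some libcre2.* library file).
check_cre2_prefix() {
    prefix=$1
    if [ ! -f "$prefix/include/cre2.h" ]; then
        echo "missing $prefix/include/cre2.h" >&2
        return 1
    fi
    for lib in "$prefix"/lib/libcre2.*; do
        # In sh an unmatched glob stays literal, so test that the file exists.
        if [ -e "$lib" ]; then
            return 0
        fi
    done
    echo "no libcre2.* under $prefix/lib" >&2
    return 1
}

# Usage: check the prefix used above before running ./configure.
check_cre2_prefix /opt/regexps || echo "cre2 not found under /opt/regexps"
```

This only checks that the files exist; the configure tests in the patch remain the real check that the libraries link.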
Feedback welcome.
(Oh, I can't spell "tough", it seems; deferring fix for now).
Regards,
-Phil
----------------------------8< git patch >8-----------------------------
Add support for Google's BSD-licensed RE2 library, via the cre2
C-language bindings (also BSD-licensed).
Guard with --enable-re2 for now.
Adds 4 infix conditions. Currently no commands, no support for changing
how =~ binds.
Includes tests & docs
---
Doc/Makefile.in | 2 +-
Doc/Zsh/mod_re2.yo | 65 +++++++++++
INSTALL | 8 ++
Src/Modules/re2.c | 324 ++++++++++++++++++++++++++++++++++++++++++++++++++++
Src/Modules/re2.mdd | 5 +
Test/V11re2.ztst | 170 +++++++++++++++++++++++++++
configure.ac | 14 +++
7 files changed, 587 insertions(+), 1 deletion(-)
create mode 100644 Doc/Zsh/mod_re2.yo
create mode 100644 Src/Modules/re2.c
create mode 100644 Src/Modules/re2.mdd
create mode 100644 Test/V11re2.ztst
diff --git a/Doc/Makefile.in b/Doc/Makefile.in
index 2752096..8c00876 100644
--- a/Doc/Makefile.in
+++ b/Doc/Makefile.in
@@ -65,7 +65,7 @@ Zsh/mod_datetime.yo Zsh/mod_db_gdbm.yo Zsh/mod_deltochar.yo \
Zsh/mod_example.yo Zsh/mod_files.yo Zsh/mod_langinfo.yo \
Zsh/mod_mapfile.yo Zsh/mod_mathfunc.yo Zsh/mod_newuser.yo \
Zsh/mod_parameter.yo Zsh/mod_pcre.yo Zsh/mod_private.yo \
-Zsh/mod_regex.yo Zsh/mod_sched.yo Zsh/mod_socket.yo \
+Zsh/mod_re2.yo Zsh/mod_regex.yo Zsh/mod_sched.yo Zsh/mod_socket.yo \
Zsh/mod_stat.yo Zsh/mod_system.yo Zsh/mod_tcp.yo \
Zsh/mod_termcap.yo Zsh/mod_terminfo.yo \
Zsh/mod_zftp.yo Zsh/mod_zle.yo Zsh/mod_zleparameter.yo \
diff --git a/Doc/Zsh/mod_re2.yo b/Doc/Zsh/mod_re2.yo
new file mode 100644
index 0000000..5527440
--- /dev/null
+++ b/Doc/Zsh/mod_re2.yo
@@ -0,0 +1,65 @@
+COMMENT(!MOD!zsh/re2
+Interface to the RE2 regular expression library.
+!MOD!)
+cindex(regular expressions)
+cindex(re2)
+The tt(zsh/re2) module makes available the following test conditions:
+
+startitem()
+findex(re2-match)
+item(var(expr) tt(-re2-match) var(regex))(
+Matches a string against an RE2 regular expression.
+On a successful match, the
+matched portion of the string will normally be placed in the tt(MATCH)
+variable. If there are any capturing parentheses within the regex, then
+the tt(match) array variable will contain those.
+If the match is not successful, then the variables will not be altered.
+
+In addition, the tt(MBEGIN) and tt(MEND) variables are updated to point
+to the offsets within var(expr) for the beginning and end of the matched
+text, with the tt(mbegin) and tt(mend) arrays holding the beginning and
+end of each substring matched.
+
+If tt(BASH_REMATCH) is set, then the array tt(BASH_REMATCH) will be set
+instead of all of the other variables.
+
+Canonical documentation for the syntax accepted by this regular expression
+engine can be found at:
+uref(https://github.com/google/re2/wiki/Syntax)
+)
+enditem()
+
+startitem()
+findex(re2-match-posix)
+item(var(expr) tt(-re2-match-posix) var(regex))(
+Matches as per tt(-re2-match) but configuring the RE2 engine to use
+POSIX syntax.
+)
+enditem()
+
+startitem()
+findex(re2-match-posixperl)
+item(var(expr) tt(-re2-match-posixperl) var(regex))(
+Matches as per tt(-re2-match) but configuring the RE2 engine to use
+POSIX syntax, with the Perl classes and word-boundary extensions re-enabled
+too.
+
+This thus adds support for:
+tt(\d), tt(\s), tt(\w), tt(\D), tt(\S), tt(\W), tt(\b), and tt(\B).
+)
+enditem()
+
+startitem()
+findex(re2-match-longest)
+item(var(expr) tt(-re2-match-longest) var(regex))(
+Matches as per tt(-re2-match) but configuring the RE2 engine to find
+the longest match, instead of the first.
+
+For example, given
+
+example([[ abb -re2-match-longest ^a+LPAR()b|bb+RPAR() ]])
+
+This will match the right-branch, thus tt(abb), where tt(-re2-match) would
+instead match only tt(ab).
+)
+enditem()
diff --git a/INSTALL b/INSTALL
index 99895bd..887dd8e 100644
--- a/INSTALL
+++ b/INSTALL
@@ -558,6 +558,14 @@ only be searched for if the option --enable-pcre is passed to configure.
(Future versions of the shell may have a better fix for this problem.)
+--enable-re2:
+
+The RE2 library is written in C++, so a C-library shim layer is needed for
+use by Zsh. We use https://github.com/marcomaggi/cre2 for this, which is
+currently at version 0.3.1. Both re2 and cre2 need to be installed for
+this option to successfully enable the zsh/re2 module. The Zsh
+functionality is currently experimental.
+
--enable-cap:
This searches for POSIX capabilities; if found, the `cap' library
diff --git a/Src/Modules/re2.c b/Src/Modules/re2.c
new file mode 100644
index 0000000..e542723
--- /dev/null
+++ b/Src/Modules/re2.c
@@ -0,0 +1,324 @@
+/*
+ * re2.c
+ *
+ * This file is part of zsh, the Z shell.
+ *
+ * Copyright (c) 2016 Phil Pennock
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, without written agreement and without
+ * license or royalty fees, to use, copy, modify, and distribute this
+ * software and to distribute modified versions of this software for any
+ * purpose, provided that the above copyright notice and the following
+ * two paragraphs appear in all copies of this software.
+ *
+ * In no event shall Phil Pennock or the Zsh Development Group be liable
+ * to any party for direct, indirect, special, incidental, or consequential
+ * damages arising out of the use of this software and its documentation,
+ * even if Phil Pennock and the Zsh Development Group have been advised of
+ * the possibility of such damage.
+ *
+ * Phil Pennock and the Zsh Development Group specifically disclaim any
+ * warranties, including, but not limited to, the implied warranties of
+ * merchantability and fitness for a particular purpose. The software
+ * provided hereunder is on an "as is" basis, and Phil Pennock and the
+ * Zsh Development Group have no obligation to provide maintenance,
+ * support, updates, enhancements, or modifications.
+ *
+ */
+
+/* This is heavily based upon my earlier regex module, with Peter's fixes
+ * for the tought stuff I had skipped / gotten wrong. */
+
+#include "re2.mdh"
+#include "re2.pro"
+
+/*
+ * re2 itself is a C++ library; zsh needs C language bindings.
+ * These come from <https://github.com/marcomaggi/cre2>.
+ */
+#include <cre2.h>
+
+/* the conditions we support */
+#define ZRE2_COND_RE2 0
+#define ZRE2_COND_POSIX 1
+#define ZRE2_COND_POSIXPERL 2
+#define ZRE2_COND_LONGEST 3
+
+/**/
+static int
+zcond_re2_match(char **a, int id)
+{
+ cre2_regexp_t *rex;
+ cre2_options_t *opt;
+ cre2_string_t *m, *matches = NULL;
+ char *lhstr, *lhstr_zshmeta, *rhre, *rhre_zshmeta;
+ char **result_array, **x;
+ char *s;
+ char **mbegin, **mend, **bptr, **eptr;
+ size_t matchessz = 0;
+ int return_value, ncaptures, matched, nelem, start, n, indexing_base;
+ int remaining_len, charlen;
+ zlong offs;
+
+ return_value = 0; /* 1 => matched successfully */
+
+ lhstr_zshmeta = cond_str(a,0,0);
+ rhre_zshmeta = cond_str(a,1,0);
+ lhstr = ztrdup(lhstr_zshmeta);
+ unmetafy(lhstr, NULL);
+ rhre = ztrdup(rhre_zshmeta);
+ unmetafy(rhre, NULL);
+
+ opt = cre2_opt_new();
+ if (!opt) {
+ zwarn("re2 opt memory allocation failure");
+ goto CLEANUP_UNMETAONLY;
+ }
+ /* nb: we can set encoding here; re2 assumes UTF-8 by default */
+ cre2_opt_set_log_errors(opt, 0); /* don't hit stderr by default */
+ if (!isset(CASEMATCH)) {
+ cre2_opt_set_case_sensitive(opt, 0);
+ }
+
+ /* "The following options are only consulted when POSIX syntax is enabled;
+ * when POSIX syntax is disabled: these features are always enabled and
+ * cannot be turned off."
+ * Seems hard to mis-parse, but I did. Okay, Perl classes \d,\w and friends
+ * always on normally, can _also_ be enabled in POSIX mode. */
+
+ switch (id) {
+ case ZRE2_COND_RE2:
+ /* nothing to do, this is default */
+ break;
+ case ZRE2_COND_POSIX:
+ cre2_opt_set_posix_syntax(opt, 1);
+ break;
+ case ZRE2_COND_POSIXPERL:
+ cre2_opt_set_posix_syntax(opt, 1);
+ /* we enable Perl classes (\d, \s, \w, \D, \S, \W)
+ * and boundaries/not (\b \B) */
+ cre2_opt_set_perl_classes(opt, 1);
+ cre2_opt_set_word_boundary(opt, 1);
+ break;
+ case ZRE2_COND_LONGEST:
+ cre2_opt_set_longest_match(opt, 1);
+ break;
+ default:
+ DPUTS(1, "bad re2 option");
+ goto CLEANUP_UNMETAONLY;
+ }
+
+ rex = cre2_new(rhre, strlen(rhre), opt);
+ if (!rex) {
+ zwarn("re2 regular expression memory allocation failure");
+ goto CLEANUP_OPT;
+ }
+ if (cre2_error_code(rex)) {
+ zwarn("re2 regexp compilation failed: %s", cre2_error_string(rex));
+ goto CLEANUP;
+ }
+
+ ncaptures = cre2_num_capturing_groups(rex);
+ /* the nmatch for cre2_match follows the usual pattern of index 0 holding
+ * the entire matched substring, index 1 holding the first capturing
+ * sub-expression, etc. So we need ncaptures+1 elements. */
+ matchessz = (ncaptures + 1) * sizeof(cre2_string_t);
+ matches = zalloc(matchessz);
+
+ matched = cre2_match(rex,
+ lhstr, strlen(lhstr), /* text to match against */
+ 0, strlen(lhstr), /* substring of text to consider */
+ CRE2_UNANCHORED, /* user should explicitly anchor */
+ matches, (ncaptures+1));
+ if (!matched)
+ goto CLEANUP;
+ return_value = 1;
+
+ /* We have a match, we will return success, we have array of cre2_string_t
+ * items, each with .data and .length fields pointing into the matched text,
+ * all in unmetafied format.
+ *
+ * We need to collect the results, put together various arrays and offset
+ * variables, while respecting options to change the array set, the indexing
+ * of that array and everything else that 26 years of history has endowed
+ * upon us. */
+ /* option BASHREMATCH set:
+ * set $BASH_REMATCH instead of $MATCH/$match
+ * entire matched portion in index 0 (useful with option KSH_ARRAYS)
+ * option _not_ set:
+ * $MATCH scalar gets entire string
+ * $match array gets substrings
+ * $MBEGIN $MEND scalars get offsets of entire match
+ * $mbegin $mend arrays get offsets of substrings
+ * all of the offsets depend upon KSHARRAYS to determine indexing!
+ */
+
+ if (isset(BASHREMATCH)) {
+ start = 0;
+ nelem = ncaptures + 1;
+ } else {
+ start = 1;
+ nelem = ncaptures;
+ }
+ result_array = NULL;
+ if (nelem) {
+ result_array = x = (char **) zalloc(sizeof(char *) * (nelem + 1));
+ for (m = matches + start, n = start; n <= ncaptures; ++n, ++m, ++x) {
+ /* .data is (const char *), metafy can modify in-place so takes
+ * (char *) but doesn't modify given META_DUP, so safe to drop
+ * the const. */
+ *x = metafy((char *)m->data, m->length, META_DUP);
+ }
+ *x = NULL;
+ }
+
+ if (isset(BASHREMATCH)) {
+ setaparam("BASH_REMATCH", result_array);
+ goto CLEANUP;
+ }
+
+ indexing_base = isset(KSHARRAYS) ? 0 : 1;
+
+ setsparam("MATCH", metafy((char *)matches[0].data, matches[0].length, META_DUP));
+ /* count characters before the match */
+ s = lhstr;
+ remaining_len = matches[0].data - lhstr;
+ offs = 0;
+ MB_CHARINIT();
+ while (remaining_len) {
+ offs++;
+ charlen = MB_CHARLEN(s, remaining_len);
+ s += charlen;
+ remaining_len -= charlen;
+ }
+ setiparam("MBEGIN", offs + indexing_base);
+ /* then the characters within the match */
+ remaining_len = matches[0].length;
+ while (remaining_len) {
+ offs++;
+ charlen = MB_CHARLEN(s, remaining_len);
+ s += charlen;
+ remaining_len -= charlen;
+ }
+ /* zsh ${foo[a,b]} is inclusive of end-points, [a,b] not [a,b) */
+ setiparam("MEND", offs + indexing_base - 1);
+ if (!nelem) {
+ goto CLEANUP;
+ }
+
+ bptr = mbegin = (char **)zalloc(sizeof(char *)*(nelem+1));
+ eptr = mend = (char **)zalloc(sizeof(char *)*(nelem+1));
+ for (m = matches + start, n = 0;
+ n < nelem;
+ ++n, ++m, ++bptr, ++eptr)
+ {
+ char buf[DIGBUFSIZE];
+ if (m->data == NULL) {
+ /* FIXME: have assumed this is the API for non-matching substrings; confirm! */
+ *bptr = ztrdup("-1");
+ *eptr = ztrdup("-1");
+ continue;
+ }
+ s = lhstr;
+ remaining_len = m->data - lhstr;
+ offs = 0;
+ /* Find the start offset */
+ MB_CHARINIT();
+ while (remaining_len) {
+ offs++;
+ charlen = MB_CHARLEN(s, remaining_len);
+ s += charlen;
+ remaining_len -= charlen;
+ }
+ convbase(buf, offs + indexing_base, 10);
+ *bptr = ztrdup(buf);
+ /* Continue to the end offset */
+ remaining_len = m->length;
+ while (remaining_len) {
+ offs++;
+ charlen = MB_CHARLEN(s, remaining_len);
+ s += charlen;
+ remaining_len -= charlen;
+ }
+ convbase(buf, offs + indexing_base - 1, 10);
+ *eptr = ztrdup(buf);
+ }
+ *bptr = *eptr = NULL;
+
+ setaparam("match", result_array);
+ setaparam("mbegin", mbegin);
+ setaparam("mend", mend);
+
+CLEANUP:
+ if (matches)
+ zfree(matches, matchessz);
+ cre2_delete(rex);
+CLEANUP_OPT:
+ cre2_opt_delete(opt);
+CLEANUP_UNMETAONLY:
+ free(lhstr);
+ free(rhre);
+ return return_value;
+}
+
+
+static struct conddef cotab[] = {
+ CONDDEF("re2-match", CONDF_INFIX, zcond_re2_match, 0, 0, ZRE2_COND_RE2),
+ CONDDEF("re2-match-posix", CONDF_INFIX, zcond_re2_match, 0, 0, ZRE2_COND_POSIX),
+ CONDDEF("re2-match-posixperl", CONDF_INFIX, zcond_re2_match, 0, 0, ZRE2_COND_POSIXPERL),
+ CONDDEF("re2-match-longest", CONDF_INFIX, zcond_re2_match, 0, 0, ZRE2_COND_LONGEST),
+};
+
+
+static struct features module_features = {
+ NULL, 0,
+ cotab, sizeof(cotab)/sizeof(*cotab),
+ NULL, 0,
+ NULL, 0,
+ 0
+};
+
+
+/**/
+int
+setup_(UNUSED(Module m))
+{
+ return 0;
+}
+
+/**/
+int
+features_(Module m, char ***features)
+{
+ *features = featuresarray(m, &module_features);
+ return 0;
+}
+
+/**/
+int
+enables_(Module m, int **enables)
+{
+ return handlefeatures(m, &module_features, enables);
+}
+
+/**/
+int
+boot_(UNUSED(Module m))
+{
+ return 0;
+}
+
+/**/
+int
+cleanup_(Module m)
+{
+ return setfeatureenables(m, &module_features, NULL);
+}
+
+/**/
+int
+finish_(UNUSED(Module m))
+{
+ return 0;
+}
diff --git a/Src/Modules/re2.mdd b/Src/Modules/re2.mdd
new file mode 100644
index 0000000..b20838c
--- /dev/null
+++ b/Src/Modules/re2.mdd
@@ -0,0 +1,5 @@
+name=zsh/re2
+link='if test "x$enable_re2" = xyes && test "x$ac_cv_lib_cre2_cre2_version_string" = xyes; then echo dynamic; else echo no; fi'
+load=no
+
+objects="re2.o"
diff --git a/Test/V11re2.ztst b/Test/V11re2.ztst
new file mode 100644
index 0000000..d6e327c
--- /dev/null
+++ b/Test/V11re2.ztst
@@ -0,0 +1,170 @@
+%prep
+
+ if ! zmodload -F zsh/re2 C:re2-match 2>/dev/null
+ then
+ ZTST_unimplemented="the zsh/re2 module is not available"
+ return 0
+ fi
+# Load the rest of the builtins
+ zmodload zsh/re2
+ ##FIXME#setopt rematch_pcre
+# Find a UTF-8 locale.
+ setopt multibyte
+# Don't let LC_* override our choice of locale.
+ unset -m LC_\*
+ mb_ok=
+ langs=(en_{US,GB}.{UTF-,utf}8 en.UTF-8
+ $(locale -a 2>/dev/null | egrep 'utf8|UTF-8'))
+ for LANG in $langs; do
+ if [[ é = ? ]]; then
+ mb_ok=1
+ break;
+ fi
+ done
+ if [[ -z $mb_ok ]]; then
+ ZTST_unimplemented="no UTF-8 locale or multibyte mode is not implemented"
+ else
+ print -u $ZTST_fd Testing RE2 multibyte with locale $LANG
+ mkdir multibyte.tmp && cd multibyte.tmp
+ fi
+
+%test
+
+ [[ 'foo→bar' -re2-match .([^[:ascii:]]). ]]
+ print $MATCH
+ print $match[1]
+0:Basic non-ASCII regexp matching
+>o→b
+>→
+
+ [[ alphabeta -re2-match a([^a]+)a ]]
+ echo "$? basic"
+ print $MATCH
+ print $match[1]
+ [[ ! alphabeta -re2-match a(.+)a ]]
+ echo "$? negated op"
+ [[ alphabeta -re2-match ^b ]]
+ echo "$? failed match"
+# default matches on first, then takes longest substring
+# -longest keeps looking
+ [[ abb -re2-match a(b|bb) ]]
+ echo "$? first .${MATCH}.${match[1]}."
+ [[ abb -re2-match-longest a(b|bb) ]]
+ echo "$? longest .${MATCH}.${match[1]}."
+ [[ alphabeta -re2-match ab ]]; echo "$? unanchored"
+ [[ alphabeta -re2-match ^ab ]]; echo "$? anchored"
+ [[ alphabeta -re2-match '^a(\w+)a$' ]]
+ echo "$? perl class used"
+ echo ".${MATCH}. .${match[1]}."
+ [[ alphabeta -re2-match-posix '^a(\w+)a$' ]]
+ echo "$? POSIX-mode, should inhibit Perl class"
+ [[ alphabeta -re2-match-posixperl '^a(\w+)a$' ]]
+ echo "$? POSIX-mode with Perl classes enabled .${match[1]}."
+ unset MATCH match
+ [[ alphabeta -re2-match ^a([^a]+)a([^a]+)a$ ]]
+ echo "$? matched, set vars"
+ echo ".$MATCH. ${#MATCH}"
+ echo ".${(j:|:)match[*]}."
+ unset MATCH match
+ [[ alphabeta -re2-match fr(.+)d ]]
+ echo "$? unmatched, not setting MATCH/match"
+ echo ".$MATCH. ${#MATCH}"
+ echo ".${(j:|:)match[*]}."
+0:Basic matching & result codes
+>0 basic
+>alpha
+>lph
+>1 negated op
+>1 failed match
+>0 first .ab.b.
+>0 longest .abb.bb.
+>0 unanchored
+>1 anchored
+>0 perl class used
+>.alphabeta. .lphabet.
+>1 POSIX-mode, should inhibit Perl class
+>0 POSIX-mode with Perl classes enabled .lphabet.
+>0 matched, set vars
+>.alphabeta. 9
+>.lph|bet.
+>1 unmatched, not setting MATCH/match
+>.. 0
+>..
+
+ m() {
+ unset MATCH MBEGIN MEND match mbegin mend
+ [[ $2 -re2-match $3 ]]
+ print $? $1: m:${MATCH}: ma:${(j:|:)match}: MBEGIN=$MBEGIN MEND=$MEND mbegin="(${mbegin[*]})" mend="(${mend[*]})"
+ }
+ data='alpha beta gamma delta'
+ m uncapturing $data '\b\w+\b'
+ m capturing $data '\b(\w+)\b'
+ m 'capture 2' $data '\b(\w+)\s+(\w+)\b'
+ m 'capture repeat' $data '\b(?:(\w+)\s+)+(\w+)\b'
+0:Beginning and end testing
+>0 uncapturing: m:alpha: ma:: MBEGIN=1 MEND=5 mbegin=() mend=()
+>0 capturing: m:alpha: ma:alpha: MBEGIN=1 MEND=5 mbegin=(1) mend=(5)
+>0 capture 2: m:alpha beta: ma:alpha|beta: MBEGIN=1 MEND=10 mbegin=(1 7) mend=(5 10)
+>0 capture repeat: m:alpha beta gamma delta: ma:gamma|delta: MBEGIN=1 MEND=22 mbegin=(12 18) mend=(16 22)
+
+
+ unset match mend
+ s=$'\u00a0'
+ [[ $s -re2-match '^.$' ]] && print OK
+ [[ A${s}B -re2-match .(.). && $match[1] == $s ]] && print OK
+ [[ A${s}${s}B -re2-match A([^[:ascii:]]*)B && $mend[1] == 3 ]] && print OK
+ unset s
+0:Raw IMETA characters in input string
+>OK
+>OK
+>OK
+
+ [[ foo -re2-match f.+ ]] ; print $?
+ [[ foo -re2-match x.+ ]] ; print $?
+ [[ ! foo -re2-match f.+ ]] ; print $?
+ [[ ! foo -re2-match x.+ ]] ; print $?
+ [[ foo -re2-match f.+ && bar -re2-match b.+ ]] ; print $?
+ [[ foo -re2-match x.+ && bar -re2-match b.+ ]] ; print $?
+ [[ foo -re2-match f.+ && bar -re2-match x.+ ]] ; print $?
+ [[ ! foo -re2-match f.+ && bar -re2-match b.+ ]] ; print $?
+ [[ foo -re2-match f.+ && ! bar -re2-match b.+ ]] ; print $?
+ [[ ! ( foo -re2-match f.+ && bar -re2-match b.+ ) ]] ; print $?
+ [[ ! foo -re2-match x.+ && bar -re2-match b.+ ]] ; print $?
+ [[ foo -re2-match x.+ && ! bar -re2-match b.+ ]] ; print $?
+ [[ ! ( foo -re2-match x.+ && bar -re2-match b.+ ) ]] ; print $?
+0:Regex result inversion detection
+>0
+>1
+>1
+>0
+>0
+>1
+>1
+>1
+>1
+>1
+>0
+>1
+>0
+
+# Subshell because crash on failure
+ ( [[ test.txt -re2-match '^(.*_)?(test)' ]]
+ echo $match[2] )
+0:regression for segmentation fault (pcre, dup for re2), workers/38307
+>test
+
+ setopt BASH_REMATCH KSH_ARRAYS
+ unset MATCH MBEGIN MEND match mbegin mend BASH_REMATCH
+ [[ alphabeta -re2-match '^a([^a]+)(a)([^a]+)a$' ]]
+ echo "$? bash_rematch"
+ echo "m:${MATCH}: ma:${(j:|:)match}:"
+ echo MBEGIN=$MBEGIN MEND=$MEND mbegin="(${mbegin[*]})" mend="(${mend[*]})"
+ echo "BASH_REMATCH=[${(j:, :)BASH_REMATCH[@]}]"
+ echo "[0]=${BASH_REMATCH[0]} [1]=${BASH_REMATCH[1]}"
+0:bash_rematch works
+>0 bash_rematch
+>m:: ma::
+>MBEGIN= MEND= mbegin=() mend=()
+>BASH_REMATCH=[alphabeta, lph, a, bet]
+>[0]=alphabeta [1]=lph
+
diff --git a/configure.ac b/configure.ac
index 0e0bd53..9c23691 100644
--- a/configure.ac
+++ b/configure.ac
@@ -442,6 +442,11 @@ AC_ARG_ENABLE(pcre,
AC_HELP_STRING([--enable-pcre],
[enable the search for the pcre library (may create run-time library dependencies)]))
+dnl Do you want to look for re2 support?
+AC_ARG_ENABLE(re2,
+AC_HELP_STRING([--enable-re2],
+[enable the search for cre2 C-language bindings and re2 library]))
+
dnl Do you want to look for capability support?
AC_ARG_ENABLE(cap,
AC_HELP_STRING([--enable-cap],
@@ -683,6 +688,15 @@ if test "x$ac_cv_prog_PCRECONF" = xpcre-config; then
fi
fi
+if test x$enable_re2 = xyes; then
+AC_CHECK_LIB([re2],[main],,
+ [AC_MSG_FAILURE([test for RE2 library failed])])
+AC_CHECK_LIB([cre2],[cre2_version_string],,
+ [AC_MSG_FAILURE([test for CRE2 library failed])])
+AC_CHECK_HEADERS([cre2.h],,
+ [AC_MSG_ERROR([test for CRE2 header failed])])
+fi
+
AC_CHECK_HEADERS(sys/time.h sys/times.h sys/select.h termcap.h termio.h \
termios.h sys/param.h sys/filio.h string.h memory.h \
limits.h fcntl.h libc.h sys/utsname.h sys/resource.h \
--
2.10.0
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 801 bytes --]
* [PATCH] re2: fix clean-up path; fix two comments
2016-09-08 4:15 [PATCH] Add zsh/re2 module with conditions Phil Pennock
@ 2016-09-08 13:56 ` Phil Pennock
2016-09-08 21:14 ` [PATCH] Add zsh/re2 module with conditions Oliver Kiddle
[not found] ` <20160908144203.GA28545@fujitsu.shahaf.local2>
2 siblings, 0 replies; 8+ messages in thread
From: Phil Pennock @ 2016-09-08 13:56 UTC
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 2125 bytes --]
On 2016-09-08 at 00:15 -0400, Phil Pennock wrote:
> I've also pushed this code to a GitHub repo, philpennock/zsh-code on the
> re2 branch: https://github.com/philpennock/zsh-code/tree/re2
This change is there too.
> (Oh, I can't spell "tough", it seems; deferring fix for now).
Fixed. Also fixed a bug described just below in the patch body, and
swapped a FIXME comment for a TODO, referencing whatever future work
changes =~ binding. (Feedback on that idea, outlined in previous mail,
appreciated!)
-Phil
----------------------------8< git patch >8-----------------------------
The clean-up path is for an internal function being passed an id which
it can't handle, but the ids come from this file, so it's protection
against coding mistakes in future extension. In that hypothetical case,
we'd leak the memory of one RE2 opt object each time the matching
function was called in the unhandled id-profile.
Also clean up two comments.
---
Src/Modules/re2.c | 4 ++--
Test/V11re2.ztst | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/Src/Modules/re2.c b/Src/Modules/re2.c
index e542723..f6a5283 100644
--- a/Src/Modules/re2.c
+++ b/Src/Modules/re2.c
@@ -28,7 +28,7 @@
*/
/* This is heavily based upon my earlier regex module, with Peter's fixes
- * for the tought stuff I had skipped / gotten wrong. */
+ * for the tougher stuff I had skipped / gotten wrong. */
#include "re2.mdh"
#include "re2.pro"
@@ -106,7 +106,7 @@ zcond_re2_match(char **a, int id)
break;
default:
DPUTS(1, "bad re2 option");
- goto CLEANUP_UNMETAONLY;
+ goto CLEANUP_OPT;
}
rex = cre2_new(rhre, strlen(rhre), opt);
diff --git a/Test/V11re2.ztst b/Test/V11re2.ztst
index d6e327c..823a5ef 100644
--- a/Test/V11re2.ztst
+++ b/Test/V11re2.ztst
@@ -7,7 +7,7 @@
fi
# Load the rest of the builtins
zmodload zsh/re2
- ##FIXME#setopt rematch_pcre
+ # TODO: use future mechanism to switch =~ to use re2 and test =~ too
# Find a UTF-8 locale.
setopt multibyte
# Don't let LC_* override our choice of locale.
--
2.10.0
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 801 bytes --]
* Re: [PATCH] Add zsh/re2 module with conditions
2016-09-08 4:15 [PATCH] Add zsh/re2 module with conditions Phil Pennock
2016-09-08 13:56 ` [PATCH] re2: fix clean-up path; fix two comments Phil Pennock
@ 2016-09-08 21:14 ` Oliver Kiddle
2016-09-08 21:48 ` Phil Pennock
[not found] ` <20160908144203.GA28545@fujitsu.shahaf.local2>
2 siblings, 1 reply; 8+ messages in thread
From: Oliver Kiddle @ 2016-09-08 21:14 UTC
To: zsh-workers; +Cc: Phil Pennock
Phil Pennock wrote:
> At this point, I haven't done anything about rebinding =~ to handle
> this. It's purely new infix-operators based on words. I'm thinking
> perhaps something along the lines of $zsh_reop_modules=(regex), with
> `setopt rematch_pcre` becoming a compatibility interface that acts as
> though `pcre` were prepended to that list and
>
> zsh_reop_modules=(pcre regex)
>
> having the same effect. Then I could use `zsh_reop_modules=(re2 regex)`.
> Does this seem sane? Anyone have better suggestions? I do want to have
If the first listed module in the array has control of =~, what is
the meaning of subsequent ones?
How about perhaps using a module alias so you would do, e.g.
zmodload -A zsh/default/regex=zsh/re2
Oliver
* Re: [PATCH] Add zsh/re2 module with conditions
2016-09-08 21:14 ` [PATCH] Add zsh/re2 module with conditions Oliver Kiddle
@ 2016-09-08 21:48 ` Phil Pennock
0 siblings, 0 replies; 8+ messages in thread
From: Phil Pennock @ 2016-09-08 21:48 UTC
To: Oliver Kiddle; +Cc: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 1240 bytes --]
On 2016-09-08 at 23:14 +0200, Oliver Kiddle wrote:
> Phil Pennock wrote:
> > At this point, I haven't done anything about rebinding =~ to handle
> > this. It's purely new infix-operators based on words. I'm thinking
> > perhaps something along the lines of $zsh_reop_modules=(regex), with
> > `setopt rematch_pcre` becoming a compatibility interface that acts as
> > though `pcre` were prepended to that list and
> >
> > zsh_reop_modules=(pcre regex)
> >
> > having the same effect. Then I could use `zsh_reop_modules=(re2 regex)`.
> > Does this seem sane? Anyone have better suggestions? I do want to have
>
> If the first listed module in the array has control of =~, what is
> the meaning of subsequent ones?
Ignored, as long as the first one could be loaded.
The first loadable one gets =~
It's bound and tied at that point.
If the variable is re-assigned to, the shell would try again to work
through the list.
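The lookup described above could be sketched roughly as follows. This is an illustration only, not code from the patch or from zsh: `pick_reop_module`, `module_loadable` and `AVAILABLE_MODULES` are hypothetical names, and the real mechanism would presumably try to load each module rather than consult a list.

```shell
# Hypothetical sketch of the proposed lookup; neither function exists in zsh.
# module_loadable stands in for "can this module be loaded?"; here it just
# consults a space-separated list in $AVAILABLE_MODULES.
module_loadable() {
    case " $AVAILABLE_MODULES " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the first loadable module from the arguments; later entries are
# only reached when earlier ones cannot be loaded.
pick_reop_module() {
    for mod in "$@"; do
        if module_loadable "$mod"; then
            printf '%s\n' "$mod"
            return 0
        fi
    done
    return 1
}

AVAILABLE_MODULES="regex pcre"
pick_reop_module re2 regex    # re2 is unavailable here, so this prints "regex"
```

Re-running the function after the list changes corresponds to the shell working through the list again when the variable is re-assigned.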
> How about perhaps using a module alias so you would do, e.g.
> zmodload -A zsh/default/regex=zsh/re2
Would probably need to be more than that, to be able to alias explicit
features. It's C: infix-conditionals which need to be grabbed, a
different one from each module.
-Phil
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 801 bytes --]
[parent not found: <20160908144203.GA28545@fujitsu.shahaf.local2>]
end of thread, other threads:[~2016-09-14 18:47 UTC | newest]
Thread overview: 8+ messages
2016-09-08 4:15 [PATCH] Add zsh/re2 module with conditions Phil Pennock
2016-09-08 13:56 ` [PATCH] re2: fix clean-up path; fix two comments Phil Pennock
2016-09-08 21:14 ` [PATCH] Add zsh/re2 module with conditions Oliver Kiddle
2016-09-08 21:48 ` Phil Pennock
[not found] ` <20160908144203.GA28545@fujitsu.shahaf.local2>
[not found] ` <20160908204737.GA12164@breadbox.private.spodhuis.org>
[not found] ` <20160908211643.GA4432@fujitsu.shahaf.local2>
[not found] ` <20160909005557.GB12371@breadbox.private.spodhuis.org>
[not found] ` <20160909045739.GA6623@fujitsu.shahaf.local2>
[not found] ` <20160910010456.GA85981@tower.spodhuis.org>
[not found] ` <20160910190924.GB4045@fujitsu.shahaf.local2>
2016-09-11 19:23 ` zsh/re2 : avoid until further notice Phil Pennock
2016-09-11 19:27 ` Phil Pennock
2016-09-12 3:50 ` Daniel Shahaf
2016-09-14 18:47 ` Phil Pennock