X-Seq: 39234
Date: Thu, 8 Sep 2016 00:15:57 -0400
From: Phil Pennock
To: zsh-workers@zsh.org
Subject: [PATCH] Add zsh/re2 module with conditions
Message-ID: <20160908041556.GA8401@breadbox.private.spodhuis.org>
Mail-Followup-To: zsh-workers@zsh.org

Folks,

I tend to get automatically kicked off the -workers list by ezmlm
because I reject mails which are self-declared as spam, so please CC
replies to me.  Also: my commit-bit is currently
surrendered-for-safekeeping because I've not been doing much with Zsh,
so someone else will need to merge this, if it's accepted.

RE2 is a regular expression library, written in C++, from Google.  It
offers most of the features of PCRE, excluding those which can't be
handled without backtracking.  It's BSD-licensed.

This patch adds the zsh/re2 module.  It uses the `cre2` library to have
C-language bindings.

At this point, I haven't done anything about rebinding =~ to handle
this.  It's purely new infix-operators based on words.

I'm thinking perhaps something along the lines of
$zsh_reop_modules=(regex), with `setopt rematch_pcre` becoming a
compatibility interface that acts as though `pcre` were prepended to
that list and zsh_reop_modules=(pcre regex) having the same effect.
Then I could use `zsh_reop_modules=(re2 regex)`.  Does this seem sane?
Anyone have better suggestions?  I do want to have =~ able to use this
module, but the current work stands alone and should be merge-able
as-is.

Is there particular interest in having command-forms too?  There's no
"study" concept, but I suppose compiling a hairy regexp only once might
be good in some situations (but why use shell for those?)

This has been tested on MacOS 10.10.5.  My ulterior motive is that I
want "better than zsh/regex" available by default on MacOS, where Apple
build without GPL modules for the system Zsh.  I hope that by offering
this option, Apple's engineers might incorporate this one day and I can
be happier.
:)

I've also pushed this code to a GitHub repo, philpennock/zsh-code on
the re2 branch: https://github.com/philpennock/zsh-code/tree/re2

Tested with re2 20160901 installed via Brew, cre2 installed via:

    git clone https://github.com/marcomaggi/cre2
    cd cre2
    LIBTOOLIZE=glibtoolize sh ./autogen.sh
    CXX=g++-6 CC=gcc-6 ./configure --prefix=/opt/regexps
    make doc/stamp-vti
    make
    make install

and Zsh configured with:

    CPPFLAGS=-I/opt/regexps/include LDFLAGS=-L/opt/regexps/lib \
      ./configure --prefix=/opt/zsh-devel --enable-pcre --enable-re2 \
      --enable-cap --enable-multibyte --enable-zsh-secure-free \
      --with-tcsetpgrp --enable-etcdir=/etc

Feedback welcome.  (Oh, I can't spell "tough", it seems; deferring fix
for now).

Regards,
-Phil

----------------------------8< git patch >8-----------------------------

Add support for Google's BSD-licensed RE2 library, via the cre2
C-language bindings (also BSD-licensed).

Guard with --enable-re2 for now.  Adds 4 infix conditions.  Currently no
commands, no support for changing how =~ binds.
Includes tests & docs
---
 Doc/Makefile.in     |   2 +-
 Doc/Zsh/mod_re2.yo  |  65 +++++++++++
 INSTALL             |   8 ++
 Src/Modules/re2.c   | 324 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 Src/Modules/re2.mdd |   5 +
 Test/V11re2.ztst    | 170 +++++++++++++++++++++++++++
 configure.ac        |  14 +++
 7 files changed, 587 insertions(+), 1 deletion(-)
 create mode 100644 Doc/Zsh/mod_re2.yo
 create mode 100644 Src/Modules/re2.c
 create mode 100644 Src/Modules/re2.mdd
 create mode 100644 Test/V11re2.ztst

diff --git a/Doc/Makefile.in b/Doc/Makefile.in
index 2752096..8c00876 100644
--- a/Doc/Makefile.in
+++ b/Doc/Makefile.in
@@ -65,7 +65,7 @@ Zsh/mod_datetime.yo Zsh/mod_db_gdbm.yo Zsh/mod_deltochar.yo \
 Zsh/mod_example.yo Zsh/mod_files.yo Zsh/mod_langinfo.yo \
 Zsh/mod_mapfile.yo Zsh/mod_mathfunc.yo Zsh/mod_newuser.yo \
 Zsh/mod_parameter.yo Zsh/mod_pcre.yo Zsh/mod_private.yo \
-Zsh/mod_regex.yo Zsh/mod_sched.yo Zsh/mod_socket.yo \
+Zsh/mod_re2.yo Zsh/mod_regex.yo Zsh/mod_sched.yo Zsh/mod_socket.yo \
 Zsh/mod_stat.yo Zsh/mod_system.yo Zsh/mod_tcp.yo \
 Zsh/mod_termcap.yo Zsh/mod_terminfo.yo \
 Zsh/mod_zftp.yo Zsh/mod_zle.yo Zsh/mod_zleparameter.yo \
diff --git a/Doc/Zsh/mod_re2.yo b/Doc/Zsh/mod_re2.yo
new file mode 100644
index 0000000..5527440
--- /dev/null
+++ b/Doc/Zsh/mod_re2.yo
@@ -0,0 +1,65 @@
+COMMENT(!MOD!zsh/re2
+Interface to the RE2 regular expression library.
+!MOD!)
+cindex(regular expressions)
+cindex(re2)
+The tt(zsh/re2) module makes available the following test conditions:
+
+startitem()
+findex(re2-match)
+item(var(expr) tt(-re2-match) var(regex))(
+Matches a string against an RE2 regular expression.
+On a successful match, the
+matched portion of the string will normally be placed in the tt(MATCH)
+variable.  If there are any capturing parentheses within the regex, then
+the tt(match) array variable will contain those.
+If the match is not successful, then the variables will not be altered.
+
+In addition, the tt(MBEGIN) and tt(MEND) variables are updated to point
+to the offsets within var(expr) for the beginning and end of the matched
+text, with the tt(mbegin) and tt(mend) arrays holding the beginning and
+end of each substring matched.
+
+If the option tt(BASH_REMATCH) is set, then the array tt(BASH_REMATCH) will
+be set instead of all of the other variables.
+
+Canonical documentation for the syntax accepted by this regular expression
+engine can be found at:
+uref(https://github.com/google/re2/wiki/Syntax)
+)
+enditem()
+
+startitem()
+findex(re2-match-posix)
+item(var(expr) tt(-re2-match-posix) var(regex))(
+Matches as per tt(-re2-match) but configuring the RE2 engine to use
+POSIX syntax.
+)
+enditem()
+
+startitem()
+findex(re2-match-posixperl)
+item(var(expr) tt(-re2-match-posixperl) var(regex))(
+Matches as per tt(-re2-match) but configuring the RE2 engine to use
+POSIX syntax, with the Perl classes and word-boundary extensions re-enabled
+too.
+
+This thus adds support for:
+tt(\d), tt(\s), tt(\w), tt(\D), tt(\S), tt(\W), tt(\b), and tt(\B).
+)
+enditem()
+
+startitem()
+findex(re2-match-longest)
+item(var(expr) tt(-re2-match-longest) var(regex))(
+Matches as per tt(-re2-match) but configuring the RE2 engine to find
+the longest match, instead of the first match.
+
+For example, given
+
+example([[ abb -re2-match-longest ^a+LPAR()b|bb+RPAR() ]])
+
+This will match the right-branch, thus tt(abb), where tt(-re2-match) would
+instead match only tt(ab).
+)
+enditem()
diff --git a/INSTALL b/INSTALL
index 99895bd..887dd8e 100644
--- a/INSTALL
+++ b/INSTALL
@@ -558,6 +558,14 @@ only be searched for if the option --enable-pcre is passed to configure.
 
 (Future versions of the shell may have a better fix for this problem.)
 
+--enable-re2:
+
+The RE2 library is written in C++, so a C-library shim layer is needed for
+use by Zsh.  We use https://github.com/marcomaggi/cre2 for this, which is
+currently at version 0.3.1.  Both re2 and cre2 need to be installed for
+this option to successfully enable the zsh/re2 module.  The Zsh
+functionality is currently experimental.
+
 --enable-cap:
 
 This searches for POSIX capabilities; if found, the `cap' library
diff --git a/Src/Modules/re2.c b/Src/Modules/re2.c
new file mode 100644
index 0000000..e542723
--- /dev/null
+++ b/Src/Modules/re2.c
@@ -0,0 +1,324 @@
+/*
+ * re2.c
+ *
+ * This file is part of zsh, the Z shell.
+ *
+ * Copyright (c) 2016 Phil Pennock
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, without written agreement and without
+ * license or royalty fees, to use, copy, modify, and distribute this
+ * software and to distribute modified versions of this software for any
+ * purpose, provided that the above copyright notice and the following
+ * two paragraphs appear in all copies of this software.
+ *
+ * In no event shall Phil Pennock or the Zsh Development Group be liable
+ * to any party for direct, indirect, special, incidental, or consequential
+ * damages arising out of the use of this software and its documentation,
+ * even if Phil Pennock and the Zsh Development Group have been advised of
+ * the possibility of such damage.
+ *
+ * Phil Pennock and the Zsh Development Group specifically disclaim any
+ * warranties, including, but not limited to, the implied warranties of
+ * merchantability and fitness for a particular purpose.  The software
+ * provided hereunder is on an "as is" basis, and Phil Pennock and the
+ * Zsh Development Group have no obligation to provide maintenance,
+ * support, updates, enhancements, or modifications.
+ *
+ */
+
+/* This is heavily based upon my earlier regex module, with Peter's fixes
+ * for the tought stuff I had skipped / gotten wrong. */
+
+#include "re2.mdh"
+#include "re2.pro"
+
+/*
+ * re2 itself is a C++ library; zsh needs C language bindings.
+ * These come from <https://github.com/marcomaggi/cre2>.
+ */
+#include <cre2.h>
+
+/* the conditions we support */
+#define ZRE2_COND_RE2        0
+#define ZRE2_COND_POSIX      1
+#define ZRE2_COND_POSIXPERL  2
+#define ZRE2_COND_LONGEST    3
+
+/**/
+static int
+zcond_re2_match(char **a, int id)
+{
+    cre2_regexp_t *rex;
+    cre2_options_t *opt;
+    cre2_string_t *m, *matches = NULL;
+    char *lhstr, *lhstr_zshmeta, *rhre, *rhre_zshmeta;
+    char **result_array, **x;
+    char *s;
+    char **mbegin, **mend, **bptr, **eptr;
+    size_t matchessz = 0;
+    int return_value, ncaptures, matched, nelem, start, n, indexing_base;
+    int remaining_len, charlen;
+    zlong offs;
+
+    return_value = 0; /* 1 => matched successfully */
+
+    lhstr_zshmeta = cond_str(a,0,0);
+    rhre_zshmeta = cond_str(a,1,0);
+    lhstr = ztrdup(lhstr_zshmeta);
+    unmetafy(lhstr, NULL);
+    rhre = ztrdup(rhre_zshmeta);
+    unmetafy(rhre, NULL);
+
+    opt = cre2_opt_new();
+    if (!opt) {
+        zwarn("re2 opt memory allocation failure");
+        goto CLEANUP_UNMETAONLY;
+    }
+    /* nb: we can set encoding here; re2 assumes UTF-8 by default */
+    cre2_opt_set_log_errors(opt, 0); /* don't hit stderr by default */
+    if (!isset(CASEMATCH)) {
+        cre2_opt_set_case_sensitive(opt, 0);
+    }
+
+    /* "The following options are only consulted when POSIX syntax is enabled;
+     * when POSIX syntax is disabled: these features are always enabled and
+     * cannot be turned off."
+     * Seems hard to mis-parse, but I did.  Okay, Perl classes \d,\w and friends
+     * always on normally, can _also_ be enabled in POSIX mode.
+     */
+
+    switch (id) {
+    case ZRE2_COND_RE2:
+        /* nothing to do, this is default */
+        break;
+    case ZRE2_COND_POSIX:
+        cre2_opt_set_posix_syntax(opt, 1);
+        break;
+    case ZRE2_COND_POSIXPERL:
+        cre2_opt_set_posix_syntax(opt, 1);
+        /* we enable Perl classes (\d, \s, \w, \D, \S, \W)
+         * and boundaries/not (\b \B) */
+        cre2_opt_set_perl_classes(opt, 1);
+        cre2_opt_set_word_boundary(opt, 1);
+        break;
+    case ZRE2_COND_LONGEST:
+        cre2_opt_set_longest_match(opt, 1);
+        break;
+    default:
+        DPUTS(1, "bad re2 option");
+        goto CLEANUP_OPT;
+    }
+
+    rex = cre2_new(rhre, strlen(rhre), opt);
+    if (!rex) {
+        zwarn("re2 regular expression memory allocation failure");
+        goto CLEANUP_OPT;
+    }
+    if (cre2_error_code(rex)) {
+        zwarn("re2 regexp compilation failed: %s", cre2_error_string(rex));
+        goto CLEANUP;
+    }
+
+    ncaptures = cre2_num_capturing_groups(rex);
+    /* the nmatch for cre2_match follows the usual pattern of index 0 holding
+     * the entire matched substring, index 1 holding the first capturing
+     * sub-expression, etc.  So we need ncaptures+1 elements. */
+    matchessz = (ncaptures + 1) * sizeof(cre2_string_t);
+    matches = zalloc(matchessz);
+
+    matched = cre2_match(rex,
+                         lhstr, strlen(lhstr), /* text to match against */
+                         0, strlen(lhstr),     /* substring of text to consider */
+                         CRE2_UNANCHORED,      /* user should explicitly anchor */
+                         matches, (ncaptures+1));
+    if (!matched)
+        goto CLEANUP;
+    return_value = 1;
+
+    /* We have a match, we will return success, we have array of cre2_string_t
+     * items, each with .data and .length fields pointing into the matched text,
+     * all in unmetafied format.
+     *
+     * We need to collect the results, put together various arrays and offset
+     * variables, while respecting options to change the array set, the indexing
+     * of that array and everything else that 26 years of history has endowed
+     * upon us.
+     */
+    /* option BASHREMATCH set:
+     *    set $BASH_REMATCH instead of $MATCH/$match
+     *    entire matched portion in index 0 (useful with option KSH_ARRAYS)
+     * option _not_ set:
+     *    $MATCH scalar gets entire string
+     *    $match array gets substrings
+     *    $MBEGIN $MEND scalars get offsets of entire match
+     *    $mbegin $mend arrays get offsets of substrings
+     *    all of the offsets depend upon KSHARRAYS to determine indexing!
+     */
+
+    if (isset(BASHREMATCH)) {
+        start = 0;
+        nelem = ncaptures + 1;
+    } else {
+        start = 1;
+        nelem = ncaptures;
+    }
+    result_array = NULL;
+    if (nelem) {
+        result_array = x = (char **) zalloc(sizeof(char *) * (nelem + 1));
+        for (m = matches + start, n = start; n <= ncaptures; ++n, ++m, ++x) {
+            /* .data is (const char *), metafy can modify in-place so takes
+             * (char *) but doesn't modify given META_DUP, so safe to drop
+             * the const. */
+            *x = metafy((char *)m->data, m->length, META_DUP);
+        }
+        *x = NULL;
+    }
+
+    if (isset(BASHREMATCH)) {
+        setaparam("BASH_REMATCH", result_array);
+        goto CLEANUP;
+    }
+
+    indexing_base = isset(KSHARRAYS) ? 0 : 1;
+
+    setsparam("MATCH", metafy((char *)matches[0].data, matches[0].length, META_DUP));
+    /* count characters before the match */
+    s = lhstr;
+    remaining_len = matches[0].data - lhstr;
+    offs = 0;
+    MB_CHARINIT();
+    while (remaining_len) {
+        offs++;
+        charlen = MB_CHARLEN(s, remaining_len);
+        s += charlen;
+        remaining_len -= charlen;
+    }
+    setiparam("MBEGIN", offs + indexing_base);
+    /* then the characters within the match */
+    remaining_len = matches[0].length;
+    while (remaining_len) {
+        offs++;
+        charlen = MB_CHARLEN(s, remaining_len);
+        s += charlen;
+        remaining_len -= charlen;
+    }
+    /* zsh ${foo[a,b]} is inclusive of end-points, [a,b] not [a,b) */
+    setiparam("MEND", offs + indexing_base - 1);
+    if (!nelem) {
+        goto CLEANUP;
+    }
+
+    bptr = mbegin = (char **)zalloc(sizeof(char *)*(nelem+1));
+    eptr = mend = (char **)zalloc(sizeof(char *)*(nelem+1));
+    for (m = matches + start, n = 0;
+         n < nelem;
+         ++n, ++m, ++bptr, ++eptr)
+    {
+        char buf[DIGBUFSIZE];
+        if (m->data == NULL) {
+            /* FIXME: have assumed this is the API for non-matching substrings; confirm!
+             */
+            *bptr = ztrdup("-1");
+            *eptr = ztrdup("-1");
+            continue;
+        }
+        s = lhstr;
+        remaining_len = m->data - lhstr;
+        offs = 0;
+        /* Find the start offset */
+        MB_CHARINIT();
+        while (remaining_len) {
+            offs++;
+            charlen = MB_CHARLEN(s, remaining_len);
+            s += charlen;
+            remaining_len -= charlen;
+        }
+        convbase(buf, offs + indexing_base, 10);
+        *bptr = ztrdup(buf);
+        /* Continue to the end offset */
+        remaining_len = m->length;
+        while (remaining_len) {
+            offs++;
+            charlen = MB_CHARLEN(s, remaining_len);
+            s += charlen;
+            remaining_len -= charlen;
+        }
+        convbase(buf, offs + indexing_base - 1, 10);
+        *eptr = ztrdup(buf);
+    }
+    *bptr = *eptr = NULL;
+
+    setaparam("match", result_array);
+    setaparam("mbegin", mbegin);
+    setaparam("mend", mend);
+
+CLEANUP:
+    if (matches)
+        zfree(matches, matchessz);
+    cre2_delete(rex);
+CLEANUP_OPT:
+    cre2_opt_delete(opt);
+CLEANUP_UNMETAONLY:
+    free(lhstr);
+    free(rhre);
+    return return_value;
+}
+
+
+static struct conddef cotab[] = {
+    CONDDEF("re2-match", CONDF_INFIX, zcond_re2_match, 0, 0, ZRE2_COND_RE2),
+    CONDDEF("re2-match-posix", CONDF_INFIX, zcond_re2_match, 0, 0, ZRE2_COND_POSIX),
+    CONDDEF("re2-match-posixperl", CONDF_INFIX, zcond_re2_match, 0, 0, ZRE2_COND_POSIXPERL),
+    CONDDEF("re2-match-longest", CONDF_INFIX, zcond_re2_match, 0, 0, ZRE2_COND_LONGEST),
+};
+
+
+static struct features module_features = {
+    NULL, 0,
+    cotab, sizeof(cotab)/sizeof(*cotab),
+    NULL, 0,
+    NULL, 0,
+    0
+};
+
+
+/**/
+int
+setup_(UNUSED(Module m))
+{
+    return 0;
+}
+
+/**/
+int
+features_(Module m, char ***features)
+{
+    *features = featuresarray(m, &module_features);
+    return 0;
+}
+
+/**/
+int
+enables_(Module m, int **enables)
+{
+    return handlefeatures(m, &module_features, enables);
+}
+
+/**/
+int
+boot_(UNUSED(Module m))
+{
+    return 0;
+}
+
+/**/
+int
+cleanup_(Module m)
+{
+    return setfeatureenables(m, &module_features, NULL);
+}
+
+/**/
+int
+finish_(UNUSED(Module m))
+{
+    return 0;
+}
diff --git a/Src/Modules/re2.mdd b/Src/Modules/re2.mdd
new file mode 100644
index 0000000..b20838c
--- /dev/null
+++ b/Src/Modules/re2.mdd
@@ -0,0 +1,5 @@
+name=zsh/re2
+link='if test "x$enable_re2" = xyes && test "x$ac_cv_lib_cre2_cre2_version_string" = xyes; then echo dynamic; else echo no; fi'
+load=no
+
+objects="re2.o"
diff --git a/Test/V11re2.ztst b/Test/V11re2.ztst
new file mode 100644
index 0000000..d6e327c
--- /dev/null
+++ b/Test/V11re2.ztst
@@ -0,0 +1,170 @@
+%prep
+
+  if ! zmodload -F zsh/re2 C:re2-match 2>/dev/null
+  then
+    ZTST_unimplemented="the zsh/re2 module is not available"
+    return 0
+  fi
+# Load the rest of the builtins
+  zmodload zsh/re2
+  ##FIXME#setopt rematch_pcre
+# Find a UTF-8 locale.
+  setopt multibyte
+# Don't let LC_* override our choice of locale.
+  unset -m LC_\*
+  mb_ok=
+  langs=(en_{US,GB}.{UTF-,utf}8 en.UTF-8
+         $(locale -a 2>/dev/null | egrep 'utf8|UTF-8'))
+  for LANG in $langs; do
+    if [[ é = ? ]]; then
+      mb_ok=1
+      break;
+    fi
+  done
+  if [[ -z $mb_ok ]]; then
+    ZTST_unimplemented="no UTF-8 locale or multibyte mode is not implemented"
+  else
+    print -u $ZTST_fd Testing RE2 multibyte with locale $LANG
+    mkdir multibyte.tmp && cd multibyte.tmp
+  fi
+
+%test
+
+  [[ 'foo→bar' -re2-match .([^[:ascii:]]). ]]
+  print $MATCH
+  print $match[1]
+0:Basic non-ASCII regexp matching
+>o→b
+>→
+
+  [[ alphabeta -re2-match a([^a]+)a ]]
+  echo "$? basic"
+  print $MATCH
+  print $match[1]
+  [[ ! alphabeta -re2-match a(.+)a ]]
+  echo "$? negated op"
+  [[ alphabeta -re2-match ^b ]]
+  echo "$? failed match"
+# default matches on first, then takes longest substring
+# -longest keeps looking
+  [[ abb -re2-match a(b|bb) ]]
+  echo "$? first .${MATCH}.${match[1]}."
+  [[ abb -re2-match-longest a(b|bb) ]]
+  echo "$? longest .${MATCH}.${match[1]}."
+  [[ alphabeta -re2-match ab ]]; echo "$? unanchored"
+  [[ alphabeta -re2-match ^ab ]]; echo "$? anchored"
+  [[ alphabeta -re2-match '^a(\w+)a$' ]]
+  echo "$? perl class used"
+  echo ".${MATCH}. .${match[1]}."
+  [[ alphabeta -re2-match-posix '^a(\w+)a$' ]]
+  echo "$? POSIX-mode, should inhibit Perl class"
+  [[ alphabeta -re2-match-posixperl '^a(\w+)a$' ]]
+  echo "$? POSIX-mode with Perl classes enabled .${match[1]}."
+  unset MATCH match
+  [[ alphabeta -re2-match ^a([^a]+)a([^a]+)a$ ]]
+  echo "$? matched, set vars"
+  echo ".$MATCH. ${#MATCH}"
+  echo ".${(j:|:)match[*]}."
+  unset MATCH match
+  [[ alphabeta -re2-match fr(.+)d ]]
+  echo "$? unmatched, not setting MATCH/match"
+  echo ".$MATCH. ${#MATCH}"
+  echo ".${(j:|:)match[*]}."
+0:Basic matching & result codes
+>0 basic
+>alpha
+>lph
+>1 negated op
+>1 failed match
+>0 first .ab.b.
+>0 longest .abb.bb.
+>0 unanchored
+>1 anchored
+>0 perl class used
+>.alphabeta. .lphabet.
+>1 POSIX-mode, should inhibit Perl class
+>0 POSIX-mode with Perl classes enabled .lphabet.
+>0 matched, set vars
+>.alphabeta. 9
+>.lph|bet.
+>1 unmatched, not setting MATCH/match
+>.. 0
+>..
+
+  m() {
+    unset MATCH MBEGIN MEND match mbegin mend
+    [[ $2 -re2-match $3 ]]
+    print $? $1: m:${MATCH}: ma:${(j:|:)match}: MBEGIN=$MBEGIN MEND=$MEND mbegin="(${mbegin[*]})" mend="(${mend[*]})"
+  }
+  data='alpha beta gamma delta'
+  m uncapturing $data '\b\w+\b'
+  m capturing $data '\b(\w+)\b'
+  m 'capture 2' $data '\b(\w+)\s+(\w+)\b'
+  m 'capture repeat' $data '\b(?:(\w+)\s+)+(\w+)\b'
+0:Beginning and end testing
+>0 uncapturing: m:alpha: ma:: MBEGIN=1 MEND=5 mbegin=() mend=()
+>0 capturing: m:alpha: ma:alpha: MBEGIN=1 MEND=5 mbegin=(1) mend=(5)
+>0 capture 2: m:alpha beta: ma:alpha|beta: MBEGIN=1 MEND=10 mbegin=(1 7) mend=(5 10)
+>0 capture repeat: m:alpha beta gamma delta: ma:gamma|delta: MBEGIN=1 MEND=22 mbegin=(12 18) mend=(16 22)
+
+
+  unset match mend
+  s=$'\u00a0'
+  [[ $s -re2-match '^.$' ]] && print OK
+  [[ A${s}B -re2-match .(.). && $match[1] == $s ]] && print OK
+  [[ A${s}${s}B -re2-match A([^[:ascii:]]*)B && $mend[1] == 3 ]] && print OK
+  unset s
+0:Raw IMETA characters in input string
+>OK
+>OK
+>OK
+
+  [[ foo -re2-match f.+ ]] ; print $?
+  [[ foo -re2-match x.+ ]] ; print $?
+  [[ ! foo -re2-match f.+ ]] ; print $?
+  [[ ! foo -re2-match x.+ ]] ; print $?
+  [[ foo -re2-match f.+ && bar -re2-match b.+ ]] ; print $?
+  [[ foo -re2-match x.+ && bar -re2-match b.+ ]] ; print $?
+  [[ foo -re2-match f.+ && bar -re2-match x.+ ]] ; print $?
+  [[ ! foo -re2-match f.+ && bar -re2-match b.+ ]] ; print $?
+  [[ foo -re2-match f.+ && ! bar -re2-match b.+ ]] ; print $?
+  [[ ! ( foo -re2-match f.+ && bar -re2-match b.+ ) ]] ; print $?
+  [[ ! foo -re2-match x.+ && bar -re2-match b.+ ]] ; print $?
+  [[ foo -re2-match x.+ && ! bar -re2-match b.+ ]] ; print $?
+  [[ ! ( foo -re2-match x.+ && bar -re2-match b.+ ) ]] ; print $?
+0:Regex result inversion detection
+>0
+>1
+>1
+>0
+>0
+>1
+>1
+>1
+>1
+>1
+>0
+>1
+>0
+
+# Subshell because crash on failure
+  ( [[ test.txt -re2-match '^(.*_)?(test)' ]]
+    echo $match[2] )
+0:regression for segmentation fault (pcre, dup for re2), workers/38307
+>test
+
+  setopt BASH_REMATCH KSH_ARRAYS
+  unset MATCH MBEGIN MEND match mbegin mend BASH_REMATCH
+  [[ alphabeta -re2-match '^a([^a]+)(a)([^a]+)a$' ]]
+  echo "$? bash_rematch"
+  echo "m:${MATCH}: ma:${(j:|:)match}:"
+  echo MBEGIN=$MBEGIN MEND=$MEND mbegin="(${mbegin[*]})" mend="(${mend[*]})"
+  echo "BASH_REMATCH=[${(j:, :)BASH_REMATCH[@]}]"
+  echo "[0]=${BASH_REMATCH[0]} [1]=${BASH_REMATCH[1]}"
+0:bash_rematch works
+>0 bash_rematch
+>m:: ma::
+>MBEGIN= MEND= mbegin=() mend=()
+>BASH_REMATCH=[alphabeta, lph, a, bet]
+>[0]=alphabeta [1]=lph
+
diff --git a/configure.ac b/configure.ac
index 0e0bd53..9c23691 100644
--- a/configure.ac
+++ b/configure.ac
@@ -442,6 +442,11 @@ AC_ARG_ENABLE(pcre,
 AC_HELP_STRING([--enable-pcre],
 [enable the search for the pcre library (may create run-time library dependencies)]))
 
+dnl Do you want to look for re2 support?
+AC_ARG_ENABLE(re2,
+AC_HELP_STRING([--enable-re2],
+[enable the search for cre2 C-language bindings and re2 library]))
+
 dnl Do you want to look for capability support?
 AC_ARG_ENABLE(cap,
 AC_HELP_STRING([--enable-cap],
@@ -683,6 +688,15 @@ if test "x$ac_cv_prog_PCRECONF" = xpcre-config; then
   fi
 fi
 
+if test x$enable_re2 = xyes; then
+AC_CHECK_LIB([re2],[main],,
+  [AC_MSG_FAILURE([test for RE2 library failed])])
+AC_CHECK_LIB([cre2],[cre2_version_string],,
+  [AC_MSG_FAILURE([test for CRE2 library failed])])
+AC_CHECK_HEADERS([cre2.h],,
+  [AC_MSG_ERROR([test for CRE2 header failed])])
+fi
+
 AC_CHECK_HEADERS(sys/time.h sys/times.h sys/select.h termcap.h termio.h \
                  termios.h sys/param.h sys/filio.h string.h memory.h \
                  limits.h fcntl.h libc.h sys/utsname.h sys/resource.h \
-- 
2.10.0
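
For anyone reviewing the MBEGIN/MEND arithmetic in zcond_re2_match, the
byte-offset to character-offset conversion can be sketched on its own
roughly as follows.  This is not part of the patch: utf8_charlen,
count_chars and match_offsets are hypothetical names invented here, and
the hand-rolled UTF-8 length function merely stands in for zsh's
locale-aware MB_CHARLEN(), so the sketch assumes valid UTF-8 input.  The
functions are meant to be called from a small test program of your own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for MB_CHARLEN(): byte length of the UTF-8
 * sequence starting at *s, clamped to the bytes that remain.
 * Assumes valid UTF-8; stray continuation bytes count as one byte. */
static size_t
utf8_charlen(const char *s, size_t remaining)
{
    unsigned char c = (unsigned char)*s;
    size_t len = 1;

    if (c >= 0xF0)
        len = 4;
    else if (c >= 0xE0)
        len = 3;
    else if (c >= 0xC0)
        len = 2;
    return len > remaining ? remaining : len;
}

/* Count the characters contained in the first nbytes bytes of s. */
static long
count_chars(const char *s, size_t nbytes)
{
    long n = 0;

    while (nbytes) {
        size_t len = utf8_charlen(s, nbytes);
        s += len;
        nbytes -= len;
        n++;
    }
    return n;
}

/* Translate a match given as a byte range into character offsets.
 * indexing_base is 1 normally and 0 when KSH_ARRAYS is in effect.
 * The end offset is inclusive, as in zsh's ${foo[a,b]}. */
void
match_offsets(const char *text, size_t byte_start, size_t byte_len,
              int indexing_base, long *begin, long *end)
{
    long before, within;

    assert(byte_start + byte_len <= strlen(text));
    before = count_chars(text, byte_start);
    within = count_chars(text + byte_start, byte_len);
    *begin = before + indexing_base;
    *end = before + within + indexing_base - 1;
}
```

With the text 'alpha beta gamma delta', a match starting at byte 6 with
length 4 ("beta") and an indexing base of 1 gives begin 7 and end 10,
which agrees with the second mbegin/mend pair expected by the
'capture 2' test in the patch.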