zsh-workers
* Re: PATCH: zsh/regex and =~
@ 2007-05-02 14:49 Daniel Qarras
  2007-05-02 16:36 ` Peter Stephenson
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Qarras @ 2007-05-02 14:49 UTC (permalink / raw)
  To: zsh-workers

Hi,

FWIW, I just tried latest zsh-CVS as /bin/sh on Fedora Core 6 and now
with the =~ patch it works all ok as a bash replacement!

Only thing missing being support for echo $"Starting foo:" style
messages (used to localize the message) but that's not much of a
problem.

Thanks,


* Re: PATCH: zsh/regex and =~
  2007-05-02 14:49 PATCH: zsh/regex and =~ Daniel Qarras
@ 2007-05-02 16:36 ` Peter Stephenson
  2007-05-02 16:54   ` Bart Schaefer
  2007-05-02 17:12   ` Andrey Borzenkov
  0 siblings, 2 replies; 16+ messages in thread
From: Peter Stephenson @ 2007-05-02 16:36 UTC (permalink / raw)
  To: zsh-workers

Daniel Qarras <dqarras@yahoo.com> wrote:
> FWIW, I just tried latest zsh-CVS as /bin/sh on Fedora Core 6 and now
> with the =~ patch it works all ok as a bash replacement!

Thanks, interesting.

> Only thing missing being support for echo $"Starting foo:" style
> messages (used to localize the message) but that's not much of a
> problem.

I don't even know how that works; it doesn't appear to be excessively
well documented.

pws


To access the latest news from CSR copy this link into a web browser:  http://www.csr.com/email_sig.php

To get further information regarding CSR, please visit our Investor Relations page at http://ir.csr.com/csr/about/overview



* Re: PATCH: zsh/regex and =~
  2007-05-02 16:36 ` Peter Stephenson
@ 2007-05-02 16:54   ` Bart Schaefer
  2007-05-02 17:18     ` Andrey Borzenkov
  2007-05-02 17:12   ` Andrey Borzenkov
  1 sibling, 1 reply; 16+ messages in thread
From: Bart Schaefer @ 2007-05-02 16:54 UTC (permalink / raw)
  To: zsh-workers

On May 2,  5:36pm, Peter Stephenson wrote:
}
} > Only thing missing being support for echo $"Starting foo:" style
} > messages (used to localize the message) but that's not much of a
} > problem.
} 
} I don't even know how that works; it doesn't appear to be excessively
} well documented.

I'm sure it relies on GNU gettext.

$"Your message here" is equivalent to $(gettext -s "Your message here"),
as far as I can tell, though it calls the library internally rather than
executing it as an external command.
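
For comparison, the explicit library call that $"..." is described as making can be sketched in C. This is a minimal sketch, assuming a system whose libc provides <libintl.h>; the helper names translate() and is_translated() and the text domain passed to them are hypothetical and do not come from bash or zsh.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: look up msgid in an explicitly named text domain.
 * Unlike $"...", the caller chooses the message catalog here. */
static const char *
translate(const char *domain, const char *msgid)
{
    assert(domain != NULL && msgid != NULL);
    setlocale(LC_ALL, "");      /* honour LANG / LC_MESSAGES */
    /* dgettext() returns msgid unchanged when no translation is found. */
    return dgettext(domain, msgid);
}

/* Hypothetical helper: 1 if a catalog supplied a different string. */
static int
is_translated(const char *domain, const char *msgid)
{
    return strcmp(translate(domain, msgid), msgid) != 0;
}
```

When no catalog is installed for the named domain, translate() simply hands back the original message, so an untranslated script still prints something sensible.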



* Re: PATCH: zsh/regex and =~
  2007-05-02 16:36 ` Peter Stephenson
  2007-05-02 16:54   ` Bart Schaefer
@ 2007-05-02 17:12   ` Andrey Borzenkov
  1 sibling, 0 replies; 16+ messages in thread
From: Andrey Borzenkov @ 2007-05-02 17:12 UTC (permalink / raw)
  To: zsh-workers


On Wednesday 02 May 2007, Peter Stephenson wrote:
> Daniel Qarras <dqarras@yahoo.com> wrote:
>
> > Only thing missing being support for echo $"Starting foo:" style
> > messages (used to localize the message) but that's not much of a
> > problem.
>
> I don't even know how that works; it doesn't appear to be excessively
> well documented.
>

Also, the gettext documentation rather advises against using it and suggests
using explicit gettext & Co. calls instead:

   The security holes of `$"..."' come from the fact that after looking
up the translation of the string, `bash' processes it like it processes
any double-quoted string: dollar and backquote processing, like `eval'
does.

With obvious implications.

Of course, one possibility could be to make the result fully quoted and immune
to any further processing. OTOH it probably still breaks in nested evals.


* Re: PATCH: zsh/regex and =~
  2007-05-02 16:54   ` Bart Schaefer
@ 2007-05-02 17:18     ` Andrey Borzenkov
  2007-05-02 17:32       ` Peter Stephenson
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Borzenkov @ 2007-05-02 17:18 UTC (permalink / raw)
  To: zsh-workers


On Wednesday 02 May 2007, Bart Schaefer wrote:
> On May 2,  5:36pm, Peter Stephenson wrote:
> }
> } > Only thing missing being support for echo $"Starting foo:" style
> } > messages (used to localize the message) but that's not much of a
> } > problem.
> }
> } I don't even know how that works; it doesn't appear to be excessively
> } well documented.
>
> I'm sure it relies on GNU gettext.
>

This is unspecified. On systems supporting gettext it is likely to be that; it
may well be catgets (if my memory serves me right) or whatever.

> $"Your message here" is equivalent to $(gettext -s "Your message here"),
> as far as I can tell, though it calls the library internally rather than
> executing it as an external command.
>

One problem is I do not see any portable way to specify message catalog when 
using $"...". You are unlikely to be interested in internal bash messages.


* Re: PATCH: zsh/regex and =~
  2007-05-02 17:18     ` Andrey Borzenkov
@ 2007-05-02 17:32       ` Peter Stephenson
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Stephenson @ 2007-05-02 17:32 UTC (permalink / raw)
  To: Zsh hackers list

Andrey Borzenkov <arvidjaar@newmail.ru> wrote:
> One problem is I do not see any portable way to specify message
> catalog when using $"...". You are unlikely to be interested in
> internal bash messages.

Right, that's what's really at the back of my mind.  It seems to be
mostly used for system stuff, but I don't see any way of specifying
this at the system level rather than in the shell.  However, I'm still
a bit confused anyway about what comes from where.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070



* Re: PATCH: zsh/regex and =~
  2007-04-28  7:56 ` Phil Pennock
                     ` (3 preceding siblings ...)
  2007-05-01 21:59   ` Peter Stephenson
@ 2007-05-29  8:56   ` Phil Pennock
  4 siblings, 0 replies; 16+ messages in thread
From: Phil Pennock @ 2007-05-29  8:56 UTC (permalink / raw)
  To: zsh-workers

On 2007-04-28 at 00:56 -0700, Phil Pennock wrote:
> Index: Src/parse.c
> ===================================================================
> RCS file: /cvsroot/zsh/zsh/Src/parse.c,v
> retrieving revision 1.64
> diff -p -u -r1.64 parse.c
> --- Src/parse.c	23 Apr 2007 17:24:23 -0000	1.64
> +++ Src/parse.c	28 Apr 2007 07:42:52 -0000
> @@ -2124,6 +2124,12 @@ par_cond_triple(char *a, char *b, char *
>  	ecstr(a);
>  	ecstr(c);
>  	ecadd(ecnpats++);
> +    } else if ((b[0] == Equals || b[0] == '=') &&
> +               (b[1] == '~' || b[1] == Tilde) && ~b[2]) {
> +	ecadd(WCB_COND(COND_REGEX, 0));
> +	ecstr(a);
> +	ecstr(c);
> +	ecadd(ecnpats++);
>      } else if (b[0] == '-') {
>  	if ((t0 = get_cond_num(b + 1)) > -1) {
>  	    ecadd(WCB_COND(t0 + COND_NT, 0));

*blush*

Uhm, the third character of the sequence comprising the =~ operator
needs to be a NUL, which should be tested with a logical negation, not a
bitwise negation.

I'd wonder what I was thinking but apparently I wasn't thinking.

Could someone with commit access please fix that to be !b[2] ?
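
The difference between the two negations can be seen in a cut-down version of the test. This is only a sketch: the function names are hypothetical, and zsh's tokenised forms (Equals, Tilde) from the real parser are left out.

```c
#include <assert.h>
#include <stddef.h>

/* As in the original patch: ~b[2] is a bitwise complement.  For a NUL
 * byte it yields -1 and for 'x' it yields -121; both are non-zero, so
 * the third character is effectively never checked. */
static int
is_regex_op_buggy(const char *b)
{
    assert(b != NULL);
    return b[0] == '=' && b[1] == '~' && ~b[2];
}

/* With the requested fix: !b[2] is true only when b[2] is the NUL
 * terminator, i.e. when the operator is exactly "=~". */
static int
is_regex_op_fixed(const char *b)
{
    assert(b != NULL);
    return b[0] == '=' && b[1] == '~' && !b[2];
}
```

Given the operator string "=~x", is_regex_op_buggy() accepts it while is_regex_op_fixed() rejects it; both accept a plain "=~".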

Thanks,
-Phil :^( who only noticed whilst debugging an updated viewvc install



* Re: PATCH: zsh/regex and =~
  2007-05-02  0:11     ` Phil Pennock
@ 2007-05-02  2:53       ` Bart Schaefer
  0 siblings, 0 replies; 16+ messages in thread
From: Bart Schaefer @ 2007-05-02  2:53 UTC (permalink / raw)
  To: zsh-workers

On May 1,  5:11pm, Phil Pennock wrote:
}
} > I'll commit this even if there are quibbles, to establish either a stick
} > in the ground or a line in the sand.  (I asked what the difference
} > was and was told one was horizontal and the other vertical.)
} 
} You can tell the time by the stick in the ground, on a sunny day, if you
} draw lines in the sand to mark the hours.

A stick in the ground marks your point of farthest advance, from which
you are unwilling to retreat.  A line in the sand marks the limit to
which you intend to allow your adversary to advance.  As usual, hard
upright things signify aggression, and soft recumbent things, defense.
Further exploration of these metaphors is left as an exercise for the
reader, lest even more of my zsh mail end up in Gmail's spam folder.


(Yes, after months since the last incident, zsh mail is being junked by
google again.  Sigh.)



* Re: PATCH: zsh/regex and =~
  2007-05-01 21:59   ` Peter Stephenson
@ 2007-05-02  0:11     ` Phil Pennock
  2007-05-02  2:53       ` Bart Schaefer
  0 siblings, 1 reply; 16+ messages in thread
From: Phil Pennock @ 2007-05-02  0:11 UTC (permalink / raw)
  To: zsh-workers

On 2007-05-01 at 22:59 +0100, Peter Stephenson wrote:
> I didn't put RE_MATCH_PCRE on by default; it seems to me to be a
> user choice and having it depend on the shell emulation is more
> confusing than useful.  Similarly, I've made the condition code
> behave identically (apart, obviously, from the module =~ uses)
> whether or not the option is set; if the module you didn't
> ask for is not available, you get an error message rather than
> a different sort of regular expression.

Makes sense.

> I've fixed the doc for =~ and tweaked at least one other minor typo.

English really is my first language.  *cough*
Similarly, regex.c:
 /* if you want Basic syntax, make it an alternative options */

s/options/option/

> I've added the option NO_CASE_MATCH to zsh/regex handling (like bash).
> I don't know enough about the PCRE library to decide whether it's
> sensible to have the same effect there, but if it is that's fine by me.

It is.  If you look at bin_pcre_compile(), you can see that if the -i
option is passed, it ORs in the bitflag PCRE_CASELESS.  Both
bin_pcre_compile() and cond_pcre_match() need the obvious:
  if (!isset(CASEMATCH)) pcre_opts |= PCRE_CASELESS;
at a place of your choosing, before the call to pcre_compile().

You _can_ achieve the same thing using embedded options:

% [[ FoO =~ ^(?i)f([aeiou]+) ]] && print -l $MATCH $match
FoO
oO

but I see no reason to not be as compatible as possible in interpreting
basic behavioural options.  Aside: the embedded options parsing means
that the default stringification by ruby of a regexp type can be
directly parsed by zsh's pcre_compile, a happy accident for me which I
noticed yesterday.

% zrb '/^f([aeiou]+)/' ; print $zrb_value
(?-mix:^f([aeiou]+))
% pcre_compile $zrb_value ; pcre_match faui && print $match
aui

Neat.  :^)

> I've added some debugging code to test for a bad id passed to
> the regex-match handler.  This doesn't do a heck of a lot at
> the moment, but the case statement was looking a bit lonely
> with just one entry.

Doh.  Yes, of course, not something I normally skip.  Thanks.

> I'll commit this even if there are quibbles, to establish either a stick
> in the ground or a line in the sand.  (I asked what the difference
> was and was told one was horizontal and the other vertical.)

You can tell the time by the stick in the ground, on a sunny day, if you
draw lines in the sand to mark the hours.

Doing so accurately obviously requires control of both the horizontal
and the vertical.  I've reached the outer limit of what I'll say on this
topic.



* Re: PATCH: zsh/regex and =~
  2007-04-28  7:56 ` Phil Pennock
                     ` (2 preceding siblings ...)
  2007-04-29 15:16   ` Peter Stephenson
@ 2007-05-01 21:59   ` Peter Stephenson
  2007-05-02  0:11     ` Phil Pennock
  2007-05-29  8:56   ` Phil Pennock
  4 siblings, 1 reply; 16+ messages in thread
From: Peter Stephenson @ 2007-05-01 21:59 UTC (permalink / raw)
  To: zsh-workers

On Sat, 28 Apr 2007 00:56:35 -0700
Phil Pennock <zsh-workers+phil.pennock@spodhuis.org> wrote:
> The attached patch and files, which includes documentation, adds a new
> loadable module, zsh/regex.  I've not examined widechar issues and which
> regex libraries actually do handle these.  I've not looked at linkage
> issues on platforms where regex (the POSIX interface, not regexp) is not
> a part of libc.
> 
> This also includes my previous =~ work, replacing the previous patch.
> I'm not sure that auto-unsetting REMATCH_PCRE is a good idea, so invite
> comments; also as to which should be the default value; I suppose that
> if pcre is not the default, then the warning can be put back in ...

Here's what I've ended up with after putting the patches together and
tweaking them.  The tweaks include (some of these deal with points
above):

I didn't put RE_MATCH_PCRE on by default; it seems to me to be a
user choice and having it depend on the shell emulation is more
confusing than useful.  Similarly, I've made the condition code
behave identically (apart, obviously, from the module =~ uses)
whether or not the option is set; if the module you didn't
ask for is not available, you get an error message rather than
a different sort of regular expression.
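
The selection can be sketched in isolation, following the COND_REGEX case in the Src/cond.c hunk of this patch. The function names here are hypothetical, and snprintf() stands in for the sprintf() into a 13-byte buffer that the patch uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pick the module according to the option; rematch_pcre stands in for
 * isset(REMATCHPCRE) in the patch. */
static const char *
regex_module(int rematch_pcre)
{
    return rematch_pcre ? "zsh/pcre" : "zsh/regex";
}

/* Build the condition name handed to the module lookup.  Skipping four
 * characters drops the "zsh/" prefix, giving "-pcre-match" or
 * "-regex-match"; the longer one is 12 characters plus the NUL. */
static void
regex_cond_name(char *buf, size_t buflen, const char *modname)
{
    assert(strncmp(modname, "zsh/", 4) == 0);
    snprintf(buf, buflen, "-%s-match", modname + 4);
}
```

Either way the same condition code runs afterwards; only the name of the infix condition, and so the module that handles it, changes.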

I've fixed the doc for =~ and tweaked at least one other minor typo.

I've added the option NO_CASE_MATCH to zsh/regex handling (like bash).
I don't know enough about the PCRE library to decide whether it's
sensible to have the same effect there, but if it is that's fine by me.
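
On the POSIX side, this kind of option maps naturally onto the REG_ICASE compile flag. The following is a rough sketch with a hypothetical helper, ere_match(), and is not the code in regex.c; the case_match argument stands in for the state of the CASE_MATCH option.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 on a match, 0 on no match, -1 if the pattern does not
 * compile.  When case_match is 0 (NO_CASE_MATCH), REG_ICASE is added. */
static int
ere_match(const char *pattern, const char *subject, int case_match)
{
    regex_t re;
    int flags = REG_EXTENDED | REG_NOSUB;
    int rc;

    assert(pattern != NULL && subject != NULL);
    if (!case_match)
        flags |= REG_ICASE;
    if (regcomp(&re, pattern, flags) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With the pattern ^f([aeiou]+) and the string FoO, the helper reports a match only when case_match is 0.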

I've made regex.mdd only compile regex.c based on locating all
four POSIX functions.  This is the best way of handling conditional
support for modules: if it's not supported, there's nothing there
at all, so anything that tests will find out straight away it's
not supported and the disk isn't cluttered with unusable junk.

I've added some debugging code to test for a bad id passed to
the regex-match handler.  This doesn't do a heck of a lot at
the moment, but the case statement was looking a bit lonely
with just one entry.

regex.c is substantially Phil's work so he's mentioned in the copyright.
(As the licence makes clear, no one has actually transferred their
copyright anyway, so if there's ever any argy-bargy it has to be
sorted out line by line or even character by character---the copyrights
at the top of the files are rather less legally meaningful than they look.)


I'll commit this even if there are quibbles, to establish either a stick
in the ground or a line in the sand.  (I asked what the difference
was and was told one was horizontal and the other vertical.)

Index: configure.ac
===================================================================
RCS file: /cvsroot/zsh/zsh/configure.ac,v
retrieving revision 1.61
diff -u -r1.61 configure.ac
--- configure.ac	5 Jan 2007 13:58:04 -0000	1.61
+++ configure.ac	1 May 2007 21:54:50 -0000
@@ -1135,7 +1135,8 @@
 	       erand48 open_memstream \
 	       wctomb iconv \
 	       grantpt unlockpt ptsname \
-	       htons ntohs)
+	       htons ntohs \
+	       regcomp regexec regerror regfree)
 AC_FUNC_STRCOLL
 
 if test x$enable_cap = xyes; then
Index: Doc/Makefile.in
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Makefile.in,v
retrieving revision 1.35
diff -u -r1.35 Makefile.in
--- Doc/Makefile.in	17 Dec 2006 16:02:02 -0000	1.35
+++ Doc/Makefile.in	1 May 2007 21:54:52 -0000
@@ -61,7 +61,7 @@
 Zsh/mod_datetime.yo Zsh/mod_deltochar.yo \
 Zsh/mod_example.yo Zsh/mod_files.yo \
 Zsh/mod_mapfile.yo Zsh/mod_mathfunc.yo Zsh/mod_newuser.yo \
-Zsh/mod_parameter.yo Zsh/mod_pcre.yo \
+Zsh/mod_parameter.yo Zsh/mod_pcre.yo Zsh/mod_regex.yo \
 Zsh/mod_sched.yo Zsh/mod_socket.yo \
 Zsh/mod_stat.yo  Zsh/mod_system.yo Zsh/mod_tcp.yo \
 Zsh/mod_termcap.yo Zsh/mod_terminfo.yo \
Index: Doc/Zsh/cond.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/cond.yo,v
retrieving revision 1.3
diff -u -r1.3 cond.yo
--- Doc/Zsh/cond.yo	22 May 2000 15:01:35 -0000	1.3
+++ Doc/Zsh/cond.yo	1 May 2007 21:54:52 -0000
@@ -109,6 +109,20 @@
 item(var(string) tt(!=) var(pattern))(
 true if var(string) does not match var(pattern).
 )
+item(var(string) tt(=~) var(regexp))(
+true if var(string) matches the regular expression
+var(regexp).  If the option tt(RE_MATCH_PCRE) is set
+var(regexp) is tested as a PCRE regular expression using
+the tt(zsh/pcre) module, else it is tested as a POSIX
+regular expression using the tt(zsh/regex) module.
+If the option tt(BASH_REMATCH) is set the array
+tt(BASH_REMATCH) is set to the substring that matched the pattern
+followed by the substrings that matched parenthesised
+subexpressions within the pattern; otherwise, the scalar parameter
+tt(MATCH) is set to the substring that matched the pattern and
+the array tt(match) to the substrings that matched parenthesised
+subexpressions.
+)
 item(var(string1) tt(<) var(string2))(
 true if var(string1) comes before var(string2)
 based on ASCII value of their characters.
Index: Doc/Zsh/mod_pcre.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/mod_pcre.yo,v
retrieving revision 1.5
diff -u -r1.5 mod_pcre.yo
--- Doc/Zsh/mod_pcre.yo	20 Jun 2004 22:47:18 -0000	1.5
+++ Doc/Zsh/mod_pcre.yo	1 May 2007 21:54:52 -0000
@@ -22,14 +22,17 @@
 matching.
 )
 findex(pcre_match)
-item(tt(pcre_match) [ tt(-a) var(arr) ] var(string))(
+item(tt(pcre_match) [ tt(-v) var(var) ] [ tt(-a) var(arr) ] var(string))(
 Returns successfully if tt(string) matches the previously-compiled
 PCRE.
 
 If the expression captures substrings within parentheses,
 tt(pcre_match) will set the array var($match) to those
 substrings, unless the tt(-a) option is given, in which
-case it will set the array var(arr).
+case it will set the array var(arr).  Similarly, the variable
+var(MATCH) will be set to the entire matched portion of the
+string, unless the tt(-v) option is given, in which case the variable
+var(var) will be set.
 )
 enditem()
 
Index: Doc/Zsh/options.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/options.yo,v
retrieving revision 1.53
diff -u -r1.53 options.yo
--- Doc/Zsh/options.yo	5 Mar 2007 17:35:18 -0000	1.53
+++ Doc/Zsh/options.yo	1 May 2007 21:54:52 -0000
@@ -319,6 +319,13 @@
 can match the directory tt(CVS) owing to the presence of the globbing flag
 (unless the option tt(BARE_GLOB_QUAL) is unset).
 )
+pindex(CASE_MATCH)
+cindex(case-insensitive regular expression matches, option)
+cindex(regular expressions, case-insensitive matching, option)
+item(tt(CASE_MATCH) <D>)(
+Make regular expressions using the tt(zsh/regex) module (including
+matches with tt(=~)) sensitive to case.
+)
 pindex(CSH_NULL_GLOB)
 cindex(csh, null globbing style)
 cindex(null globbing style, csh)
@@ -478,6 +485,15 @@
 `var(fooabar foobbar foocbar)' instead of the default
 `var(fooa b cbar)'.
 )
+pindex(REMATCH_PCRE)
+cindex(regexp, PCRE)
+cindex(PCRE, regexp)
+item(tt(REMATCH_PCRE) <Z>)(
+If set, regular expression matching with the tt(=~) operator will use
+Perl-Compatible Regular Expressions from the PCRE library, if available.
+If not set, regular expressions will use the extended regexp syntax
+provided by the system libraries.
+)
 pindex(SH_GLOB)
 cindex(sh, globbing style)
 cindex(globbing style, sh)
@@ -1131,6 +1147,20 @@
 
 subsect(Shell Emulation)
 startitem()
+pindex(BASH_REMATCH)
+cindex(bash, BASH_REMATCH variable)
+cindex(regexp, bash BASH_REMATCH variable)
+item(tt(BASH_REMATCH))(
+When set, matches performed with the tt(=~) operator will set the
+tt(BASH_REMATCH) array variable, instead of the default tt(MATCH) and
+tt(match) variables.  The first element of the tt(BASH_REMATCH) array
+will contain the entire matched text and subsequent elements will contain
+extracted substrings.  This option makes more sense when tt(KSH_ARRAYS) is
+also set, so that the entire matched portion is stored at index 0 and the
+first substring is at index 1.  Without this option, the tt(MATCH) variable
+contains the entire matched text and the tt(match) array variable contains
+substrings.
+)
 pindex(BSD_ECHO)
 cindex(echo, BSD compatible)
 item(tt(BSD_ECHO) <S>)(
Index: Src/cond.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/cond.c,v
retrieving revision 1.8
diff -u -r1.8 cond.c
--- Src/cond.c	30 May 2006 22:35:03 -0000	1.8
+++ Src/cond.c	1 May 2007 21:54:52 -0000
@@ -34,7 +34,7 @@
 
 static char *condstr[COND_MOD] = {
     "!", "&&", "||", "==", "!=", "<", ">", "-nt", "-ot", "-ef", "-eq",
-    "-ne", "-lt", "-gt", "-le", "-ge"
+    "-ne", "-lt", "-gt", "-le", "-ge", "=~"
 };
 
 /*
@@ -53,14 +53,14 @@
 evalcond(Estate state, char *fromtest)
 {
     struct stat *st;
-    char *left, *right;
+    char *left, *right, *overridename, overridebuf[13];
     Wordcode pcode;
     wordcode code;
     int ctype, htok = 0, ret;
 
  rec:
 
-    left = right = NULL;
+    left = right = overridename = NULL;
     pcode = state->pc++;
     code = *pcode;
     ctype = WC_COND_TYPE(code);
@@ -92,13 +92,28 @@
 	    state->pc = pcode + (WC_COND_SKIP(code) + 1);
 	    return ret;
 	}
+    case COND_REGEX:
+	{
+	    char *modname = isset(REMATCHPCRE) ? "zsh/pcre" : "zsh/regex";
+	    if (!load_module_silence(modname, 1)) {
+		zwarnnam(fromtest, "%s not available for regex",
+			 modname);
+		return 2;
+	    }
+	    sprintf(overridename = overridebuf, "-%s-match", modname+4);
+	    ctype = COND_MODI;
+	}
+	/*FALLTHROUGH*/
     case COND_MOD:
     case COND_MODI:
 	{
 	    Conddef cd;
-	    char *name = ecgetstr(state, EC_NODUP, NULL), **strs;
+	    char *name = overridename;
+	    char **strs;
 	    int l = WC_COND_SKIP(code);
 
+	    if (name == NULL)
+		name = ecgetstr(state, EC_NODUP, NULL);
 	    if (ctype == COND_MOD)
 		strs = ecgetarr(state, l, EC_DUP, NULL);
 	    else {
@@ -139,7 +154,8 @@
 		    return !cd->handler(strs, cd->condid);
 		} else {
 		    zwarnnam(fromtest,
-			     "unrecognized condition: `%s'", name);
+			     "unrecognized condition: `%s'",
+			     name ? name : "<null>");
 		}
 	    }
 	    /* module not found, error */
Index: Src/options.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/options.c,v
retrieving revision 1.35
diff -u -r1.35 options.c
--- Src/options.c	15 Mar 2007 15:16:58 -0000	1.35
+++ Src/options.c	1 May 2007 21:54:54 -0000
@@ -88,11 +88,13 @@
 {{NULL, "banghist",	      OPT_NONBOURNE},		 BANGHIST},
 {{NULL, "bareglobqual",       OPT_EMULATE|OPT_ZSH},      BAREGLOBQUAL},
 {{NULL, "bashautolist",	      0},                        BASHAUTOLIST},
+{{NULL, "bashrematch",	      0},			 BASHREMATCH},
 {{NULL, "beep",		      OPT_ALL},			 BEEP},
 {{NULL, "bgnice",	      OPT_EMULATE|OPT_NONBOURNE},BGNICE},
 {{NULL, "braceccl",	      OPT_EMULATE},		 BRACECCL},
 {{NULL, "bsdecho",	      OPT_EMULATE|OPT_SH},	 BSDECHO},
 {{NULL, "caseglob",	      OPT_ALL},			 CASEGLOB},
+{{NULL, "casematch",	      OPT_ALL},			 CASEMATCH},
 {{NULL, "cbases",	      0},			 CBASES},
 {{NULL, "cdablevars",	      OPT_EMULATE},		 CDABLEVARS},
 {{NULL, "chasedots",	      OPT_EMULATE},		 CHASEDOTS},
@@ -201,6 +203,7 @@
 {{NULL, "rcquotes",	      OPT_EMULATE},		 RCQUOTES},
 {{NULL, "rcs",		      OPT_ALL},			 RCS},
 {{NULL, "recexact",	      0},			 RECEXACT},
+{{NULL, "rematchpcre",	      0},			 REMATCHPCRE},
 {{NULL, "restricted",	      OPT_SPECIAL},		 RESTRICTED},
 {{NULL, "rmstarsilent",	      OPT_BOURNE},		 RMSTARSILENT},
 {{NULL, "rmstarwait",	      0},			 RMSTARWAIT},
Index: Src/parse.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/parse.c,v
retrieving revision 1.64
diff -u -r1.64 parse.c
--- Src/parse.c	23 Apr 2007 17:24:23 -0000	1.64
+++ Src/parse.c	1 May 2007 21:54:54 -0000
@@ -2124,6 +2124,12 @@
 	ecstr(a);
 	ecstr(c);
 	ecadd(ecnpats++);
+    } else if ((b[0] == Equals || b[0] == '=') &&
+               (b[1] == '~' || b[1] == Tilde) && ~b[2]) {
+	ecadd(WCB_COND(COND_REGEX, 0));
+	ecstr(a);
+	ecstr(c);
+	ecadd(ecnpats++);
     } else if (b[0] == '-') {
 	if ((t0 = get_cond_num(b + 1)) > -1) {
 	    ecadd(WCB_COND(t0 + COND_NT, 0));
Index: Src/text.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/text.c,v
retrieving revision 1.19
diff -u -r1.19 text.c
--- Src/text.c	23 Apr 2007 15:24:00 -0000	1.19
+++ Src/text.c	1 May 2007 21:54:54 -0000
@@ -640,7 +640,7 @@
 	    {
 		static char *c1[] = {
 		    "=", "!=", "<", ">", "-nt", "-ot", "-ef", "-eq",
-		    "-ne", "-lt", "-gt", "-le", "-ge"
+		    "-ne", "-lt", "-gt", "-le", "-ge", "=~"
 		};
 
 		int ctype;
@@ -724,7 +724,7 @@
 			}
 			break;
 		    default:
-			if (ctype <= COND_GE) {
+			if (ctype < COND_MOD) {
 			    /* Binary test: `a = b' etc. */
 			    taddstr(ecgetstr(state, EC_NODUP, NULL));
 			    taddstr(" ");
Index: Src/zsh.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/zsh.h,v
retrieving revision 1.112
diff -u -r1.112 zsh.h
--- Src/zsh.h	29 Mar 2007 21:35:39 -0000	1.112
+++ Src/zsh.h	1 May 2007 21:54:57 -0000
@@ -519,8 +519,9 @@
 #define COND_GT    13
 #define COND_LE    14
 #define COND_GE    15
-#define COND_MOD   16
-#define COND_MODI  17
+#define COND_REGEX 16
+#define COND_MOD   17
+#define COND_MODI  18
 
 typedef int (*CondHandler) _((char **, int));
 
@@ -1588,11 +1589,13 @@
     BANGHIST,
     BAREGLOBQUAL,
     BASHAUTOLIST,
+    BASHREMATCH,
     BEEP,
     BGNICE,
     BRACECCL,
     BSDECHO,
     CASEGLOB,
+    CASEMATCH,
     CBASES,
     CDABLEVARS,
     CHASEDOTS,
@@ -1695,6 +1698,7 @@
     RCQUOTES,
     RCS,
     RECEXACT,
+    REMATCHPCRE,
     RESTRICTED,
     RMSTARSILENT,
     RMSTARWAIT,
Index: Src/Modules/pcre.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Modules/pcre.c,v
retrieving revision 1.11
diff -u -r1.11 pcre.c
--- Src/Modules/pcre.c	5 Apr 2007 16:20:15 -0000	1.11
+++ Src/Modules/pcre.c	1 May 2007 21:54:57 -0000
@@ -3,7 +3,7 @@
  *
  * This file is part of zsh, the Z shell.
  *
- * Copyright (c) 2001, 2002, 2003, 2004 Clint Adams
+ * Copyright (c) 2001, 2002, 2003, 2004, 2007 Clint Adams
  * All rights reserved.
  *
  * Permission is hereby granted, without written agreement and without
@@ -42,6 +42,37 @@
 
 /**/
 static int
+zpcre_utf8_enabled(void)
+{
+#if defined(MULTIBYTE_SUPPORT) && defined(HAVE_NL_LANGINFO) && defined(CODESET)
+    static int have_utf8_pcre = -1;
+
+    /* value can toggle based on MULTIBYTE, so don't
+     * be too eager with caching */
+    if (have_utf8_pcre < -1)
+	return 0;
+
+    if (!isset(MULTIBYTE))
+	return 0;
+
+    if ((have_utf8_pcre == -1) &&
+        (!strcmp(nl_langinfo(CODESET), "UTF-8"))) {
+
+	if (pcre_config(PCRE_CONFIG_UTF8, &have_utf8_pcre))
+	    have_utf8_pcre = -2; /* erk, failed to ask */
+    }
+
+    if (have_utf8_pcre < 0)
+	return 0;
+    return have_utf8_pcre;
+
+#else
+    return 0;
+#endif
+}
+
+/**/
+static int
 bin_pcre_compile(char *nam, char **args, Options ops, UNUSED(int func))
 {
     int pcre_opts = 0, pcre_errptr;
@@ -52,8 +83,14 @@
     if(OPT_ISSET(ops,'m')) pcre_opts |= PCRE_MULTILINE;
     if(OPT_ISSET(ops,'x')) pcre_opts |= PCRE_EXTENDED;
     
+    if (zpcre_utf8_enabled())
+	pcre_opts |= PCRE_UTF8;
+
     pcre_hints = NULL;  /* Is this necessary? */
     
+    if (pcre_pattern)
+	pcre_free(pcre_pattern);
+
     pcre_pattern = pcre_compile(*args, pcre_opts, &pcre_error, &pcre_errptr, NULL);
     
     if (pcre_pattern == NULL)
@@ -100,37 +137,52 @@
 
 /**/
 static int
-zpcre_get_substrings(char *arg, int *ovec, int ret, char *receptacle)
+zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar, char *substravar, int matchedinarr)
 {
-    char **captures, **matches;
+    char **captures, **match_all, **matches;
+    int capture_start = 1;
 
-	if(!pcre_get_substring_list(arg, ovec, ret, (const char ***)&captures)) {
-	    
-	    matches = zarrdup(&captures[1]); /* first one would be entire string */
-	    if (receptacle == NULL)
-		setaparam("match", matches);
-	    else
-		setaparam(receptacle, matches);
-	    
-	    pcre_free_substring_list((const char **)captures);
-	}
+    if (matchedinarr)
+	capture_start = 0;
+    if (matchvar == NULL)
+	matchvar = "MATCH";
+    if (substravar == NULL)
+	substravar = "match";
+
+    /* captures[0] will be entire matched string, [1] first substring */
+    if(!pcre_get_substring_list(arg, ovec, ret, (const char ***)&captures)) {
+	match_all = ztrdup(captures[0]);
+	setsparam(matchvar, match_all);
+	matches = zarrdup(&captures[capture_start]);
+	setaparam(substravar, matches);
+	pcre_free_substring_list((const char **)captures);
+    }
 
-	return 0;
+    return 0;
 }
 
 /**/
 static int
 bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
 {
-    int ret, capcount, *ovec, ovecsize;
+    int ret, capcount, *ovec, ovecsize, c;
+    char *matched_portion = NULL;
     char *receptacle = NULL;
+    int return_value = 1;
+
+    if (pcre_pattern == NULL) {
+	zwarnnam(nam, "no pattern has been compiled");
+	return 1;
+    }
     
-    if(OPT_ISSET(ops,'a')) {
-	receptacle = *args++;
-	if(!*args) {
-	    zwarnnam(nam, "not enough arguments");
-	    return 1;
-	}
+    if(OPT_HASARG(ops,c='a')) {
+	receptacle = OPT_ARG(ops,c);
+    }
+    if(OPT_HASARG(ops,c='v')) {
+	matched_portion = OPT_ARG(ops,c);
+    }
+    if(!*args) {
+	zwarnnam(nam, "not enough arguments");
     }
     
     if ((ret = pcre_fullinfo(pcre_pattern, pcre_hints, PCRE_INFO_CAPTURECOUNT, &capcount)))
@@ -144,18 +196,20 @@
     
     ret = pcre_exec(pcre_pattern, pcre_hints, *args, strlen(*args), 0, 0, ovec, ovecsize);
     
-    if (ret==0) return 0;
-    else if (ret==PCRE_ERROR_NOMATCH) return 1; /* no match */
+    if (ret==0) return_value = 0;
+    else if (ret==PCRE_ERROR_NOMATCH) /* no match */;
     else if (ret>0) {
-	zpcre_get_substrings(*args, ovec, ret, receptacle);
-	return 0;
+	zpcre_get_substrings(*args, ovec, ret, matched_portion, receptacle, 0);
+	return_value = 0;
     }
     else {
 	zwarnnam(nam, "error in pcre_exec");
-	return 1;
     }
     
-    return 1;
+    if (ovec)
+	zfree(ovec, ovecsize*sizeof(int));
+
+    return return_value;
 }
 
 /**/
@@ -164,33 +218,63 @@
 {
     pcre *pcre_pat;
     const char *pcre_err;
-    char *lhstr, *rhre;
+    char *lhstr, *rhre, *avar=NULL;
     int r = 0, pcre_opts = 0, pcre_errptr, capcnt, *ov, ovsize;
+    int return_value = 0;
+
+    if (zpcre_utf8_enabled())
+	pcre_opts |= PCRE_UTF8;
 
     lhstr = cond_str(a,0,0);
     rhre = cond_str(a,1,0);
+    pcre_pat = ov = NULL;
+
+    if (isset(BASHREMATCH))
+	avar="BASH_REMATCH";
 
     switch(id) {
 	 case CPCRE_PLAIN:
-		 pcre_pat = pcre_compile(rhre, pcre_opts, &pcre_err, &pcre_errptr, NULL);
-                 pcre_fullinfo(pcre_pat, NULL, PCRE_INFO_CAPTURECOUNT, &capcnt);
-    		 ovsize = (capcnt+1)*3;
-		 ov = zalloc(ovsize*sizeof(int));
-    		 r = pcre_exec(pcre_pat, NULL, lhstr, strlen(lhstr), 0, 0, ov, ovsize);
-    		if (r==0) return 1;
+		pcre_pat = pcre_compile(rhre, pcre_opts, &pcre_err, &pcre_errptr, NULL);
+		if (pcre_pat == NULL) {
+		    zwarn("failed to compile regexp /%s/: %s", rhre, pcre_err);
+		    break;
+		}
+                pcre_fullinfo(pcre_pat, NULL, PCRE_INFO_CAPTURECOUNT, &capcnt);
+    		ovsize = (capcnt+1)*3;
+		ov = zalloc(ovsize*sizeof(int));
+    		r = pcre_exec(pcre_pat, NULL, lhstr, strlen(lhstr), 0, 0, ov, ovsize);
+		/* r < 0 => error; r==0 match but not enough size in ov
+		 * r > 0 => (r-1) substrings found; r==1 => no substrings
+		 */
+    		if (r==0) {
+		    zwarn("reportable zsh problem: pcre_exec() returned 0");
+		    return_value = 1;
+		    break;
+		}
 	        else if (r==PCRE_ERROR_NOMATCH) return 0; /* no match */
+		else if (r<0) {
+		    zwarn("pcre_exec() error: %d", r);
+		    break;
+		}
                 else if (r>0) {
-		    zpcre_get_substrings(lhstr, ov, r, NULL);
-		    return 1;
+		    zpcre_get_substrings(lhstr, ov, r, NULL, avar, isset(BASHREMATCH));
+		    return_value = 1;
+		    break;
 		}
 		break;
     }
 
-    return 0;
+    if (pcre_pat)
+	pcre_free(pcre_pat);
+    if (ov)
+	zfree(ov, ovsize*sizeof(int));
+
+    return return_value;
 }
 
 static struct conddef cotab[] = {
     CONDDEF("pcre-match", CONDF_INFIX, cond_pcre_match, 0, 0, CPCRE_PLAIN)
+    /* CONDDEF can register =~ but it won't be found */
 };
 
 /**/
@@ -206,7 +290,7 @@
 static struct builtin bintab[] = {
     BUILTIN("pcre_compile", 0, bin_pcre_compile, 1, 1, 0, "aimx",  NULL),
     BUILTIN("pcre_study",   0, bin_pcre_study,   0, 0, 0, NULL,    NULL),
-    BUILTIN("pcre_match",   0, bin_pcre_match,   1, 2, 0, "a",    NULL)
+    BUILTIN("pcre_match",   0, bin_pcre_match,   1, 1, 0, "a:v:",    NULL)
 };
 
 
Index: Src/Modules/regex.c
===================================================================
RCS file: Src/Modules/regex.c
diff -N Src/Modules/regex.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ Src/Modules/regex.c	1 May 2007 21:54:57 -0000
@@ -0,0 +1,161 @@
+/*
+ * regex.c
+ *
+ * This file is part of zsh, the Z shell.
+ *
+ * Copyright (c) 2007 Phil Pennock
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, without written agreement and without
+ * license or royalty fees, to use, copy, modify, and distribute this
+ * software and to distribute modified versions of this software for any
+ * purpose, provided that the above copyright notice and the following
+ * two paragraphs appear in all copies of this software.
+ *
+ * In no event shall Phil Pennock or the Zsh Development Group be liable
+ * to any party for direct, indirect, special, incidental, or consequential
+ * damages arising out of the use of this software and its documentation,
+ * even if Phil Pennock and the Zsh Development Group have been advised of
+ * the possibility of such damage.
+ *
+ * Phil Pennock and the Zsh Development Group specifically disclaim any
+ * warranties, including, but not limited to, the implied warranties of
+ * merchantability and fitness for a particular purpose.  The software
+ * provided hereunder is on an "as is" basis, and Phil Pennock and the
+ * Zsh Development Group have no obligation to provide maintenance,
+ * support, updates, enhancements, or modifications.
+ *
+ */
+
+#include "regex.mdh"
+#include "regex.pro"
+
+#include <regex.h>
+
+/* we default to a vaguely modern syntax and set of capabilities */
+#define ZREGEX_EXTENDED 0
+/* if you want Basic syntax, make it an alternative option */
+
+static void
+zregex_regerrwarn(int r, regex_t *re, char *msg)
+{
+    char *errbuf;
+    size_t errbufsz;
+
+    errbufsz = regerror(r, re, NULL, 0);
+    errbuf = zalloc(errbufsz*sizeof(char));
+    regerror(r, re, errbuf, errbufsz);
+    zwarn("%s: %s", msg, errbuf);
+    zfree(errbuf, errbufsz);
+}
+
+/**/
+static int
+zcond_regex_match(char **a, int id)
+{
+    regex_t re;
+    regmatch_t *m, *matches = NULL;
+    size_t matchessz;
+    char *lhstr, *rhre, *s, **arr, **x;
+    int r, n, return_value, rcflags, reflags, nelem, start;
+
+    lhstr = cond_str(a,0,0);
+    rhre = cond_str(a,1,0);
+    rcflags = reflags = 0;
+    return_value = 0; /* 1 => matched successfully */
+
+    switch(id) {
+    case ZREGEX_EXTENDED:
+	rcflags |= REG_EXTENDED;
+	if (!isset(CASEMATCH))
+	    rcflags |= REG_ICASE;
+	r = regcomp(&re, rhre, rcflags);
+	if (r) {
+	    zregex_regerrwarn(r, &re, "failed to compile regex");
+	    break;
+	}
+	/* re.re_nsub is number of parenthesized groups, we also need
+	 * 1 for the 0 offset, which is the entire matched portion
+	 */
+	if (re.re_nsub < 0) {
+	    zwarn("INTERNAL ERROR: regcomp() returned "
+		    "negative subpattern count %d", re.re_nsub);
+	    break;
+	}
+	matchessz = (re.re_nsub + 1) * sizeof(regmatch_t);
+	matches = zalloc(matchessz);
+	r = regexec(&re, lhstr, re.re_nsub+1, matches, reflags);
+	if (r == REG_NOMATCH) /**/;
+	else if (r == 0) {
+	    return_value = 1;
+	    if (isset(BASHREMATCH)) {
+		start = 0;
+		nelem = re.re_nsub + 1;
+	    } else {
+		start = 1;
+		nelem = re.re_nsub;
+	    }
+	    arr = NULL; /* bogus gcc warning of used uninitialised */
+	    /* entire matched portion + re_nsub substrings + NULL */
+	    if (nelem) {
+		arr = x = (char **) zalloc(sizeof(char *) * (nelem + 1));
+		for (m = matches + start, n = start; n <= re.re_nsub; ++n, ++m, ++x) {
+		    *x = ztrduppfx(lhstr + m->rm_so, m->rm_eo - m->rm_so);
+		}
+		*x = NULL;
+	    }
+	    if (isset(BASHREMATCH)) {
+		setaparam("BASH_REMATCH", arr);
+	    } else {
+		m = matches;
+		s = ztrduppfx(lhstr + m->rm_so, m->rm_eo - m->rm_so);
+		setsparam("MATCH", s);
+		if (nelem)
+		    setaparam("match", arr);
+	    }
+	}
+	else zregex_regerrwarn(r, &re, "regex matching error");
+	break;
+    default:
+	DPUTS(1, "bad regex option");
+	break;
+    }
+
+    if (matches)
+	zfree(matches, matchessz);
+    regfree(&re);
+    return return_value;
+}
+
+static struct conddef cotab[] = {
+    CONDDEF("regex-match", CONDF_INFIX, zcond_regex_match, 0, 0, ZREGEX_EXTENDED)
+};
+
+/**/
+int
+setup_(UNUSED(Module m))
+{
+    return 0;
+}
+
+/**/
+int
+boot_(Module m)
+{
+    return !addconddefs(m->nam, cotab, sizeof(cotab)/sizeof(*cotab));
+}
+
+/**/
+int
+cleanup_(Module m)
+{
+    deleteconddefs(m->nam, cotab, sizeof(cotab)/sizeof(*cotab));
+    return 0;
+}
+
+/**/
+int
+finish_(UNUSED(Module m))
+{
+    return 0;
+}
Index: Src/Modules/regex.mdd
===================================================================
RCS file: Src/Modules/regex.mdd
diff -N Src/Modules/regex.mdd
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ Src/Modules/regex.mdd	1 May 2007 21:54:57 -0000
@@ -0,0 +1,10 @@
+name=zsh/regex
+link=`if test x$ac_cv_func_regcomp = xyes && \
+         test x$ac_cv_func_regexec = xyes && \
+         test x$ac_cv_func_regerror = xyes && \
+         test x$ac_cv_func_regfree = xyes; then echo dynamic; else echo no; fi`
+load=no
+
+autobins=""
+
+objects="regex.o"

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/



* Re: PATCH: zsh/regex and =~
  2007-04-29 15:28     ` Peter Stephenson
@ 2007-04-29 19:17       ` Phil Pennock
  0 siblings, 0 replies; 16+ messages in thread
From: Phil Pennock @ 2007-04-29 19:17 UTC (permalink / raw)
  To: zsh-workers

On 2007-04-29 at 16:28 +0100, Peter Stephenson wrote:
> Having looked further, I see that's the effect of what you've done (with
> other stuff)... however, (unless I missed something in one of the
> subsequent patches) the documentation for =~ still claims it's (always)
> based on pcre.

Meh.  I knew I was missing something.
Sorry.



* Re: PATCH: zsh/regex and =~
  2007-04-29 15:16   ` Peter Stephenson
@ 2007-04-29 15:28     ` Peter Stephenson
  2007-04-29 19:17       ` Phil Pennock
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Stephenson @ 2007-04-29 15:28 UTC (permalink / raw)
  To: zsh-workers

Peter Stephenson wrote:
> Phil Pennock wrote:
> > This also includes my previous =~ work, replacing the previous patch.
> 
> As I said, I want to do =~ with standard POSIX regular expressions and
> no add-ons.  I've already got this working.

Having looked further, I see that's the effect of what you've done (with
other stuff)... however, (unless I missed something in one of the
subsequent patches) the documentation for =~ still claims it's (always)
based on pcre.

There's no harm in the regular expression stuff being a separate module,
but if it's going to be it might as well be loadable if and only
if configure detects the POSIX regexp stuff.  That's easy---I can
patch that later.

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/



* Re: PATCH: zsh/regex and =~
  2007-04-28  7:56 ` Phil Pennock
  2007-04-28  8:20   ` Phil Pennock
  2007-04-29  0:51   ` Phil Pennock
@ 2007-04-29 15:16   ` Peter Stephenson
  2007-04-29 15:28     ` Peter Stephenson
  2007-05-01 21:59   ` Peter Stephenson
  2007-05-29  8:56   ` Phil Pennock
  4 siblings, 1 reply; 16+ messages in thread
From: Peter Stephenson @ 2007-04-29 15:16 UTC (permalink / raw)
  To: zsh-workers

Phil Pennock wrote:
> This also includes my previous =~ work, replacing the previous patch.

As I said, I want to do =~ with standard POSIX regular expressions and
no add-ons.  I've already got this working.

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/



* Re: PATCH: zsh/regex and =~
  2007-04-28  7:56 ` Phil Pennock
  2007-04-28  8:20   ` Phil Pennock
@ 2007-04-29  0:51   ` Phil Pennock
  2007-04-29 15:16   ` Peter Stephenson
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Phil Pennock @ 2007-04-29  0:51 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 973 bytes --]

On 2007-04-28 at 00:56 -0700, Phil Pennock wrote:
> The attached patch and files, which include documentation, add a new
> loadable module, zsh/regex.  I've not examined widechar issues and which
> regex libraries actually do handle these.  I've not looked at linkage
> issues on platforms where regex (the POSIX interface, not regexp) is not
> a part of libc.

I noticed a gcc complaint that a variable might be used uninitialised;
this was bogus, but understandable.  The first patch below fixes it.
The second patch is a spelling correction in the docs.  Alternatively,
it might be a sign that the option needs to be renamed ...

arr is only initialised if nelem is non-zero.  Later there are two
references to it.  The second is guarded by "if (nelem)"; the first is
guarded by "if (isset(BASHREMATCH))".  If BASHREMATCH is set, nelem is
always at least 1, assuming re.re_nsub is never negative, which it
isn't.  The patch also checks this for sheer paranoia's sake.
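
The nelem argument can be illustrated with a minimal sketch.  The helper
name capture_layout() is hypothetical and does not appear in the patch;
it only mirrors the start/nelem arithmetic from zcond_regex_match(),
with the option state passed in as a plain int.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not in the patch): given the subpattern count
 * reported by regcomp() and whether BASH_REMATCH-style output is wanted,
 * set *start to the first regmatch_t slot to copy and return the number
 * of array elements to store.
 */
static size_t
capture_layout(size_t nsub, int bash_rematch, size_t *start)
{
    assert(start != NULL);
    if (bash_rematch) {
        *start = 0;       /* slot 0 (the whole match) goes into the array */
        return nsub + 1;  /* so at least one element, even with no groups */
    }
    *start = 1;           /* the whole match is stored separately */
    return nsub;          /* may be 0, in which case no array is built */
}
```

With bash_rematch set the result is never 0, which is why the array is
always allocated on that path.  POSIX declares re_nsub as a size_t, so
on a conforming system the "< 0" check can never fire.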

-Phil

[-- Attachment #2: zsh-regex-bogus-unused.patch --]
[-- Type: text/x-diff, Size: 880 bytes --]

--- Src/Modules/regex.c.old	Sat Apr 28 17:40:12 2007
+++ Src/Modules/regex.c	Sat Apr 28 17:43:27 2007
@@ -75,6 +75,11 @@ zcond_regex_match(char **a, int id)
 	/* re.re_nsub is number of parenthesized groups, we also need
 	 * 1 for the 0 offset, which is the entire matched portion
 	 */
+	if (re.re_nsub < 0) {
+	    zwarn("INTERNAL ERROR: regcomp() returned "
+		    "negative subpattern count %d", re.re_nsub);
+	    break;
+	}
 	matchessz = (re.re_nsub + 1) * sizeof(regmatch_t);
 	matches = zalloc(matchessz);
 	r = regexec(&re, lhstr, re.re_nsub+1, matches, reflags);
@@ -88,6 +93,7 @@ zcond_regex_match(char **a, int id)
 		start = 1;
 		nelem = re.re_nsub;
 	    }
+	    arr = NULL; /* bogus gcc warning of used uninitialised */
 	    /* entire matched portion + re_nsub substrings + NULL */
 	    if (nelem) {
 		arr = x = (char **) zalloc(sizeof(char *) * (nelem + 1));

[-- Attachment #3: zsh-regex-speling.patch --]
[-- Type: text/x-diff, Size: 513 bytes --]

--- Doc/Zsh/mod_regex.yo.old	Sat Apr 28 17:45:57 2007
+++ Doc/Zsh/mod_regex.yo	Sat Apr 28 17:45:35 2007
@@ -15,7 +15,7 @@
 
 [[ alphabetical -regex-match ^a([^a]+)a([^a]+)a ]] && print -l $MATCH X $match
 
-If tt(REGMATCH_PCRE) is not set, then the tt(=~) operator will automatically
+If tt(REMATCH_PCRE) is not set, then the tt(=~) operator will automatically
 load this module as needed and will invoke the tt(-regex-match) operator.
 
 If tt(BASH_REMATCH) is set, then tt($BASH_REMATCH) will be set instead of


* Re: PATCH: zsh/regex and =~
  2007-04-28  7:56 ` Phil Pennock
@ 2007-04-28  8:20   ` Phil Pennock
  2007-04-29  0:51   ` Phil Pennock
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Phil Pennock @ 2007-04-28  8:20 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 327 bytes --]

On 2007-04-28 at 00:56 -0700, Phil Pennock wrote:
> The attached patch and files, which include documentation,

*sigh*  I say these things, I tinker with documentation for the options
etc, but do I remember to write documentation for the new module?

See attached, relative to the source tree as patched by the previous
mail.

[-- Attachment #2: zsh-regex-doc.patch --]
[-- Type: text/x-diff, Size: 1703 bytes --]

diff -pNur zsh-regexp/Doc/Makefile.in zsh-regex-doc/Doc/Makefile.in
--- zsh-regexp/Doc/Makefile.in	Sun Dec 17 08:02:02 2006
+++ zsh-regex-doc/Doc/Makefile.in	Sat Apr 28 01:08:36 2007
@@ -61,7 +61,7 @@ Zsh/mod_computil.yo \
 Zsh/mod_datetime.yo Zsh/mod_deltochar.yo \
 Zsh/mod_example.yo Zsh/mod_files.yo \
 Zsh/mod_mapfile.yo Zsh/mod_mathfunc.yo Zsh/mod_newuser.yo \
-Zsh/mod_parameter.yo Zsh/mod_pcre.yo \
+Zsh/mod_parameter.yo Zsh/mod_pcre.yo Zsh/mod_regex.yo \
 Zsh/mod_sched.yo Zsh/mod_socket.yo \
 Zsh/mod_stat.yo  Zsh/mod_system.yo Zsh/mod_tcp.yo \
 Zsh/mod_termcap.yo Zsh/mod_terminfo.yo \
diff -pNur zsh-regexp/Doc/Zsh/mod_regex.yo zsh-regex-doc/Doc/Zsh/mod_regex.yo
--- zsh-regexp/Doc/Zsh/mod_regex.yo	Wed Dec 31 16:00:00 1969
+++ zsh-regex-doc/Doc/Zsh/mod_regex.yo	Sat Apr 28 01:15:20 2007
@@ -0,0 +1,24 @@
+COMMENT(!MOD!zsh/regex
+Interface to the POSIX regex library.
+!MOD!)
+cindex(regular expressions, REGEX)
+The tt(zsh/regex) module makes available the following test condition:
+startitem()
+findex(regex-match)
+item(expr tt(-regex-match) regex)(
+Matches a string against a POSIX extended regular expression.
+The matched portion of the string will normally be placed in the tt($MATCH)
+variable.  If there are any capturing parentheses within the regex, then
+the tt($match) array variable will contain those.
+
+For example,
+
+[[ alphabetical -regex-match ^a([^a]+)a([^a]+)a ]] && print -l $MATCH X $match
+
+If tt(REGMATCH_PCRE) is not set, then the tt(=~) operator will automatically
+load this module as needed and will invoke the tt(-regex-match) operator.
+
+If tt(BASH_REMATCH) is set, then tt($BASH_REMATCH) will be set instead of
+tt($MATCH) and tt($match).
+)
+enditem()


* PATCH: zsh/regex and =~
@ 2007-04-28  7:56 ` Phil Pennock
  2007-04-28  8:20   ` Phil Pennock
                     ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Phil Pennock @ 2007-04-28  7:56 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]

[ Sorry for not having one diff which creates new files, but "cvs diff -N"
  seems to be ignoring that little 'N' ]

The attached patch and files, which include documentation, add a new
loadable module, zsh/regex.  I've not examined widechar issues and which
regex libraries actually do handle these.  I've not looked at linkage
issues on platforms where regex (the POSIX interface, not regexp) is not
a part of libc.

This also includes my previous =~ work, replacing the previous patch.
I'm not sure that auto-unsetting REMATCH_PCRE is a good idea, so I invite
comments, including on what the default value should be; I suppose that
if pcre is not the default, then the warning can be put back in ...

My only test platform has been freebsd/amd64.

I've also cleaned up various memory leaks in zsh/pcre.
zsh/pcre now also sets $MATCH, not just $match.

I went with having $BASH_REMATCH be set instead of, rather than in
addition to, $MATCH and $match.  I'm again very open to persuasion here.
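
The difference between the two layouts can be sketched with the plain
POSIX regex API.  This is an illustration under stated assumptions, not
the module's code: collect_matches() and dup_span() are hypothetical
names, malloc() stands in for zsh's allocators, and the include_whole
flag stands in for the BASH_REMATCH option.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes starting at s into a fresh NUL-terminated string. */
static char *
dup_span(const char *s, size_t len)
{
    char *p = malloc(len + 1);
    if (p != NULL) {
        memcpy(p, s, len);
        p[len] = '\0';
    }
    return p;
}

/*
 * Hypothetical helper, not taken from the patch.  Matches str against
 * the POSIX extended regex pattern and returns a NULL-terminated array
 * of newly allocated strings, or NULL on no match, on error, or when
 * there is nothing to store.  If include_whole is set, element 0 is
 * the entire matched portion (the BASH_REMATCH layout); otherwise the
 * array holds only the parenthesised substrings (the $match layout).
 * Allocation failures of individual strings are not handled here.
 */
static char **
collect_matches(const char *str, const char *pattern, int include_whole)
{
    regex_t re;
    regmatch_t *m;
    char **arr = NULL;
    size_t start = include_whole ? 0 : 1;
    size_t nelem, i;

    assert(str != NULL && pattern != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    nelem = re.re_nsub + 1 - start;
    m = malloc((re.re_nsub + 1) * sizeof(*m));
    if (m != NULL && nelem > 0 &&
        regexec(&re, str, re.re_nsub + 1, m, 0) == 0 &&
        (arr = calloc(nelem + 1, sizeof(*arr))) != NULL) {
        for (i = 0; i < nelem; i++) {
            const regmatch_t *g = &m[start + i];
            /* a group that did not participate has rm_so == -1 */
            arr[i] = (g->rm_so < 0) ? dup_span("", 0)
                   : dup_span(str + g->rm_so, (size_t)(g->rm_eo - g->rm_so));
        }
        arr[nelem] = NULL;
    }
    free(m);
    regfree(&re);
    return arr;
}
```

For the string "alphabetical" and the regex from the documentation
example, the include_whole layout puts the whole match at index 0 and
the two substrings after it; the other layout holds just the two
substrings.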

Oh, and the copyright notice in regex.c seems a bit disjointed, with
multiple names.  What's the copyright policy on newly contributed files?

zsh/regex provides the -regex-match conditional operator; the knowledge
of -regex-match and -pcre-match remains in cond.c with the COND_REGEX
handling for =~.
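
As a rough sketch of the POSIX side of this, the following shows the
two-call regerror() pattern for reporting a compile failure, which is
the same approach zregex_regerrwarn() takes in regex.c.  The helper name
regex_compile_error() is hypothetical, and malloc() is used here in
place of zsh's own allocators.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: try to compile pattern as a POSIX extended
 * regex.  Returns NULL if it compiles, otherwise a malloc'd copy of
 * the library's error message, which the caller frees.  (If malloc
 * itself fails the result is also NULL; a real caller would want to
 * distinguish that case.)
 */
static char *
regex_compile_error(const char *pattern)
{
    regex_t re;
    char *buf;
    size_t need;
    int r;

    assert(pattern != NULL);
    r = regcomp(&re, pattern, REG_EXTENDED);
    if (r == 0) {
        regfree(&re);
        return NULL;
    }
    /* With a zero-sized buffer, regerror() only reports the size
     * needed, including the terminating NUL. */
    need = regerror(r, &re, NULL, 0);
    buf = malloc(need);
    if (buf != NULL)
        regerror(r, &re, buf, need);
    return buf;
}
```

The wording of the message is up to the C library, so callers should
treat it as opaque text for display only.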

Also, I've decided that I much prefer the PCRE API to the POSIX regex
API.  :-)  I'm off to drink more wine to recover.

-Phil

[-- Attachment #2: regex-both.patch --]
[-- Type: text/x-diff, Size: 16747 bytes --]

Index: Doc/Zsh/cond.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/cond.yo,v
retrieving revision 1.3
diff -p -u -r1.3 cond.yo
--- Doc/Zsh/cond.yo	22 May 2000 15:01:35 -0000	1.3
+++ Doc/Zsh/cond.yo	28 Apr 2007 07:42:51 -0000
@@ -109,6 +109,11 @@ backward compatibility and should be con
 item(var(string) tt(!=) var(pattern))(
 true if var(string) does not match var(pattern).
 )
+item(var(string) tt(=~) var(regexp))(
+true if var(string) matches the PCRE regular expression
+var(regexp).  Requires the tt(zsh/pcre) module to be present,
+which is a compile-time option.
+)
 item(var(string1) tt(<) var(string2))(
 true if var(string1) comes before var(string2)
 based on ASCII value of their characters.
Index: Doc/Zsh/mod_pcre.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/mod_pcre.yo,v
retrieving revision 1.5
diff -p -u -r1.5 mod_pcre.yo
--- Doc/Zsh/mod_pcre.yo	20 Jun 2004 22:47:18 -0000	1.5
+++ Doc/Zsh/mod_pcre.yo	28 Apr 2007 07:42:51 -0000
@@ -22,14 +22,17 @@ Studies the previously-compiled PCRE whi
 matching.
 )
 findex(pcre_match)
-item(tt(pcre_match) [ tt(-a) var(arr) ] var(string))(
+item(tt(pcre_match) [ tt(-v) var(var) ] [ tt(-a) var(arr) ] var(string))(
 Returns successfully if tt(string) matches the previously-compiled
 PCRE.
 
 If the expression captures substrings within parentheses,
 tt(pcre_match) will set the array var($match) to those
 substrings, unless the tt(-a) option is given, in which
-case it will set the array var(arr).
+case it will set the array var(arr).  Similarly, the variable
+var($MATCH) will be set to the entire matched portion of the
+string, unless the tt(-v) option is given, in which case it will
+set the variable var(var).
 )
 enditem()
 
Index: Doc/Zsh/options.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/options.yo,v
retrieving revision 1.53
diff -p -u -r1.53 options.yo
--- Doc/Zsh/options.yo	5 Mar 2007 17:35:18 -0000	1.53
+++ Doc/Zsh/options.yo	28 Apr 2007 07:42:51 -0000
@@ -478,6 +478,19 @@ var(xx) is set to tt(LPAR())var(a b c)tt
 `var(fooabar foobbar foocbar)' instead of the default
 `var(fooa b cbar)'.
 )
+pindex(REMATCH_PCRE)
+cindex(regexp, PCRE)
+cindex(PCRE, regexp)
+item(tt(REMATCH_PCRE) <Z>)(
+If set, regular expression matching with the tt(=~) operator will use
+Perl-Compatible Regular Expressions from the PCRE library, if available.
+If not set, regular expressions will use the extended regexp syntax
+provided by the system libraries.
+Experimental:
+When zsh is invoked as tt(zsh), this option is initially set, but may be
+unset if the tt(zsh/pcre) module can not be loaded.  This behaviour, as
+well as the default status, is subject to change.
+)
 pindex(SH_GLOB)
 cindex(sh, globbing style)
 cindex(globbing style, sh)
@@ -1131,6 +1144,20 @@ enditem()
 
 subsect(Shell Emulation)
 startitem()
+pindex(BASH_REMATCH)
+cindex(bash, BASH_REMATCH variable)
+cindex(regexp, bash BASH_REMATCH variable)
+item(tt(BASH_REMATCH))(
+When set, matches performed with the tt(=~) operator will set the
+tt(BASH_REMATCH) array variable, instead of the default tt(MATCH) and
+tt(match) variables.  The first element of the tt(BASH_REMATCH) array
+will contain the entire matched text and subsequent elements will contain
+extracted substrings.  This option makes more sense when tt(KSH_ARRAYS) is
+also set, so that the entire matched portion is stored at index 0 and the
+first substring is at index 1.  Without this option, the tt(MATCH) variable
+contains the entire matched text and the tt(match) array variable will
+contain the substrings.
+)
 pindex(BSD_ECHO)
 cindex(echo, BSD compatible)
 item(tt(BSD_ECHO) <S>)(
Index: Src/cond.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/cond.c,v
retrieving revision 1.8
diff -p -u -r1.8 cond.c
--- Src/cond.c	30 May 2006 22:35:03 -0000	1.8
+++ Src/cond.c	28 Apr 2007 07:42:51 -0000
@@ -34,7 +34,7 @@ int tracingcond;
 
 static char *condstr[COND_MOD] = {
     "!", "&&", "||", "==", "!=", "<", ">", "-nt", "-ot", "-ef", "-eq",
-    "-ne", "-lt", "-gt", "-le", "-ge"
+    "-ne", "-lt", "-gt", "-le", "-ge", "=~"
 };
 
 /*
@@ -53,14 +53,14 @@ int
 evalcond(Estate state, char *fromtest)
 {
     struct stat *st;
-    char *left, *right;
+    char *left, *right, *overridename;
     Wordcode pcode;
     wordcode code;
     int ctype, htok = 0, ret;
 
  rec:
 
-    left = right = NULL;
+    left = right = overridename = NULL;
     pcode = state->pc++;
     code = *pcode;
     ctype = WC_COND_TYPE(code);
@@ -92,13 +92,42 @@ evalcond(Estate state, char *fromtest)
 	    state->pc = pcode + (WC_COND_SKIP(code) + 1);
 	    return ret;
 	}
+    case COND_REGEX:
+	{
+	    int loaded = 0;
+	    if (isset(REMATCHPCRE)) {
+		loaded = load_module_silence("zsh/pcre", 1);
+		if (loaded) {
+		    overridename = "-pcre-match";
+		} else {
+		    dosetopt(REMATCHPCRE, 0, 1);
+#if 0
+		    zwarnnam(fromtest, "zsh/pcre not available for regex");
+		    return 2;
+#endif
+		}
+	    }
+	    if (!loaded) {
+		loaded = load_module_silence("zsh/regex", 1);
+		if (loaded) {
+		    overridename = "-regex-match";
+		} else {
+		    zwarnnam(fromtest, "zsh/regex not available for regex");
+		    return 2;
+		}
+	    }
+	    ctype = COND_MODI;
+	}
     case COND_MOD:
     case COND_MODI:
 	{
 	    Conddef cd;
-	    char *name = ecgetstr(state, EC_NODUP, NULL), **strs;
+	    char *name = overridename;
+	    char **strs;
 	    int l = WC_COND_SKIP(code);
 
+	    if (name == NULL)
+		name = ecgetstr(state, EC_NODUP, NULL);
 	    if (ctype == COND_MOD)
 		strs = ecgetarr(state, l, EC_DUP, NULL);
 	    else {
@@ -139,7 +168,8 @@ evalcond(Estate state, char *fromtest)
 		    return !cd->handler(strs, cd->condid);
 		} else {
 		    zwarnnam(fromtest,
-			     "unrecognized condition: `%s'", name);
+			     "unrecognized condition: `%s'",
+			     name ? name : "<null>");
 		}
 	    }
 	    /* module not found, error */
Index: Src/options.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/options.c,v
retrieving revision 1.35
diff -p -u -r1.35 options.c
--- Src/options.c	15 Mar 2007 15:16:58 -0000	1.35
+++ Src/options.c	28 Apr 2007 07:42:51 -0000
@@ -88,6 +88,7 @@ static struct optname optns[] = {
 {{NULL, "banghist",	      OPT_NONBOURNE},		 BANGHIST},
 {{NULL, "bareglobqual",       OPT_EMULATE|OPT_ZSH},      BAREGLOBQUAL},
 {{NULL, "bashautolist",	      0},                        BASHAUTOLIST},
+{{NULL, "bashrematch",	      0},			 BASHREMATCH},
 {{NULL, "beep",		      OPT_ALL},			 BEEP},
 {{NULL, "bgnice",	      OPT_EMULATE|OPT_NONBOURNE},BGNICE},
 {{NULL, "braceccl",	      OPT_EMULATE},		 BRACECCL},
@@ -201,6 +202,7 @@ static struct optname optns[] = {
 {{NULL, "rcquotes",	      OPT_EMULATE},		 RCQUOTES},
 {{NULL, "rcs",		      OPT_ALL},			 RCS},
 {{NULL, "recexact",	      0},			 RECEXACT},
+{{NULL, "rematchpcre",	      OPT_ZSH},			 REMATCHPCRE},
 {{NULL, "restricted",	      OPT_SPECIAL},		 RESTRICTED},
 {{NULL, "rmstarsilent",	      OPT_BOURNE},		 RMSTARSILENT},
 {{NULL, "rmstarwait",	      0},			 RMSTARWAIT},
Index: Src/parse.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/parse.c,v
retrieving revision 1.64
diff -p -u -r1.64 parse.c
--- Src/parse.c	23 Apr 2007 17:24:23 -0000	1.64
+++ Src/parse.c	28 Apr 2007 07:42:52 -0000
@@ -2124,6 +2124,12 @@ par_cond_triple(char *a, char *b, char *
 	ecstr(a);
 	ecstr(c);
 	ecadd(ecnpats++);
+    } else if ((b[0] == Equals || b[0] == '=') &&
+               (b[1] == '~' || b[1] == Tilde) && ~b[2]) {
+	ecadd(WCB_COND(COND_REGEX, 0));
+	ecstr(a);
+	ecstr(c);
+	ecadd(ecnpats++);
     } else if (b[0] == '-') {
 	if ((t0 = get_cond_num(b + 1)) > -1) {
 	    ecadd(WCB_COND(t0 + COND_NT, 0));
Index: Src/text.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/text.c,v
retrieving revision 1.19
diff -p -u -r1.19 text.c
--- Src/text.c	23 Apr 2007 15:24:00 -0000	1.19
+++ Src/text.c	28 Apr 2007 07:42:52 -0000
@@ -640,7 +640,7 @@ gettext2(Estate state)
 	    {
 		static char *c1[] = {
 		    "=", "!=", "<", ">", "-nt", "-ot", "-ef", "-eq",
-		    "-ne", "-lt", "-gt", "-le", "-ge"
+		    "-ne", "-lt", "-gt", "-le", "-ge", "=~"
 		};
 
 		int ctype;
@@ -724,7 +724,7 @@ gettext2(Estate state)
 			}
 			break;
 		    default:
-			if (ctype <= COND_GE) {
+			if (ctype < COND_MOD) {
 			    /* Binary test: `a = b' etc. */
 			    taddstr(ecgetstr(state, EC_NODUP, NULL));
 			    taddstr(" ");
Index: Src/zsh.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/zsh.h,v
retrieving revision 1.112
diff -p -u -r1.112 zsh.h
--- Src/zsh.h	29 Mar 2007 21:35:39 -0000	1.112
+++ Src/zsh.h	28 Apr 2007 07:42:53 -0000
@@ -519,8 +519,9 @@ struct timedfn {
 #define COND_GT    13
 #define COND_LE    14
 #define COND_GE    15
-#define COND_MOD   16
-#define COND_MODI  17
+#define COND_REGEX 16
+#define COND_MOD   17
+#define COND_MODI  18
 
 typedef int (*CondHandler) _((char **, int));
 
@@ -1588,6 +1589,7 @@ enum {
     BANGHIST,
     BAREGLOBQUAL,
     BASHAUTOLIST,
+    BASHREMATCH,
     BEEP,
     BGNICE,
     BRACECCL,
@@ -1695,6 +1697,7 @@ enum {
     RCQUOTES,
     RCS,
     RECEXACT,
+    REMATCHPCRE,
     RESTRICTED,
     RMSTARSILENT,
     RMSTARWAIT,
Index: Src/Modules/pcre.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Modules/pcre.c,v
retrieving revision 1.11
diff -p -u -r1.11 pcre.c
--- Src/Modules/pcre.c	5 Apr 2007 16:20:15 -0000	1.11
+++ Src/Modules/pcre.c	28 Apr 2007 07:42:53 -0000
@@ -3,7 +3,7 @@
  *
  * This file is part of zsh, the Z shell.
  *
- * Copyright (c) 2001, 2002, 2003, 2004 Clint Adams
+ * Copyright (c) 2001, 2002, 2003, 2004, 2007 Clint Adams
  * All rights reserved.
  *
  * Permission is hereby granted, without written agreement and without
@@ -42,6 +42,37 @@ static pcre_extra *pcre_hints;
 
 /**/
 static int
+zpcre_utf8_enabled(void)
+{
+#if defined(MULTIBYTE_SUPPORT) && defined(HAVE_NL_LANGINFO) && defined(CODESET)
+    static int have_utf8_pcre = -1;
+
+    /* value can toggle based on MULTIBYTE, so don't
+     * be too eager with caching */
+    if (have_utf8_pcre < -1)
+	return 0;
+
+    if (!isset(MULTIBYTE))
+	return 0;
+
+    if ((have_utf8_pcre == -1) &&
+        (!strcmp(nl_langinfo(CODESET), "UTF-8"))) {
+
+	if (pcre_config(PCRE_CONFIG_UTF8, &have_utf8_pcre))
+	    have_utf8_pcre = -2; /* erk, failed to ask */
+    }
+
+    if (have_utf8_pcre < 0)
+	return 0;
+    return have_utf8_pcre;
+
+#else
+    return 0;
+#endif
+}
+
+/**/
+static int
 bin_pcre_compile(char *nam, char **args, Options ops, UNUSED(int func))
 {
     int pcre_opts = 0, pcre_errptr;
@@ -52,8 +83,14 @@ bin_pcre_compile(char *nam, char **args,
     if(OPT_ISSET(ops,'m')) pcre_opts |= PCRE_MULTILINE;
     if(OPT_ISSET(ops,'x')) pcre_opts |= PCRE_EXTENDED;
     
+    if (zpcre_utf8_enabled())
+	pcre_opts |= PCRE_UTF8;
+
     pcre_hints = NULL;  /* Is this necessary? */
     
+    if (pcre_pattern)
+	pcre_free(pcre_pattern);
+
     pcre_pattern = pcre_compile(*args, pcre_opts, &pcre_error, &pcre_errptr, NULL);
     
     if (pcre_pattern == NULL)
@@ -100,37 +137,52 @@ bin_pcre_study(char *nam, UNUSED(char **
 
 /**/
 static int
-zpcre_get_substrings(char *arg, int *ovec, int ret, char *receptacle)
+zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar, char *substravar, int matchedinarr)
 {
-    char **captures, **matches;
+    char **captures, **match_all, **matches;
+    int capture_start = 1;
 
-	if(!pcre_get_substring_list(arg, ovec, ret, (const char ***)&captures)) {
-	    
-	    matches = zarrdup(&captures[1]); /* first one would be entire string */
-	    if (receptacle == NULL)
-		setaparam("match", matches);
-	    else
-		setaparam(receptacle, matches);
-	    
-	    pcre_free_substring_list((const char **)captures);
-	}
+    if (matchedinarr)
+	capture_start = 0;
+    if (matchvar == NULL)
+	matchvar = "MATCH";
+    if (substravar == NULL)
+	substravar = "match";
+
+    /* captures[0] will be entire matched string, [1] first substring */
+    if(!pcre_get_substring_list(arg, ovec, ret, (const char ***)&captures)) {
+	match_all = ztrdup(captures[0]);
+	setsparam(matchvar, match_all);
+	matches = zarrdup(&captures[capture_start]);
+	setaparam(substravar, matches);
+	pcre_free_substring_list((const char **)captures);
+    }
 
-	return 0;
+    return 0;
 }
 
 /**/
 static int
 bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
 {
-    int ret, capcount, *ovec, ovecsize;
+    int ret, capcount, *ovec, ovecsize, c;
+    char *matched_portion = NULL;
     char *receptacle = NULL;
+    int return_value = 1;
+
+    if (pcre_pattern == NULL) {
+	zwarnnam(nam, "no pattern has been compiled");
+	return 1;
+    }
     
-    if(OPT_ISSET(ops,'a')) {
-	receptacle = *args++;
-	if(!*args) {
-	    zwarnnam(nam, "not enough arguments");
-	    return 1;
-	}
+    if(OPT_HASARG(ops,c='a')) {
+	receptacle = OPT_ARG(ops,c);
+    }
+    if(OPT_HASARG(ops,c='v')) {
+	matched_portion = OPT_ARG(ops,c);
+    }
+    if(!*args) {
+	zwarnnam(nam, "not enough arguments");
     }
     
     if ((ret = pcre_fullinfo(pcre_pattern, pcre_hints, PCRE_INFO_CAPTURECOUNT, &capcount)))
@@ -144,18 +196,20 @@ bin_pcre_match(char *nam, char **args, O
     
     ret = pcre_exec(pcre_pattern, pcre_hints, *args, strlen(*args), 0, 0, ovec, ovecsize);
     
-    if (ret==0) return 0;
-    else if (ret==PCRE_ERROR_NOMATCH) return 1; /* no match */
+    if (ret==0) return_value = 0;
+    else if (ret==PCRE_ERROR_NOMATCH) /* no match */;
     else if (ret>0) {
-	zpcre_get_substrings(*args, ovec, ret, receptacle);
-	return 0;
+	zpcre_get_substrings(*args, ovec, ret, matched_portion, receptacle, 0);
+	return_value = 0;
     }
     else {
 	zwarnnam(nam, "error in pcre_exec");
-	return 1;
     }
     
-    return 1;
+    if (ovec)
+	zfree(ovec, ovecsize*sizeof(int));
+
+    return return_value;
 }
 
 /**/
@@ -164,33 +218,63 @@ cond_pcre_match(char **a, int id)
 {
     pcre *pcre_pat;
     const char *pcre_err;
-    char *lhstr, *rhre;
+    char *lhstr, *rhre, *avar=NULL;
     int r = 0, pcre_opts = 0, pcre_errptr, capcnt, *ov, ovsize;
+    int return_value = 0;
+
+    if (zpcre_utf8_enabled())
+	pcre_opts |= PCRE_UTF8;
 
     lhstr = cond_str(a,0,0);
     rhre = cond_str(a,1,0);
+    pcre_pat = ov = NULL;
+
+    if (isset(BASHREMATCH))
+	avar="BASH_REMATCH";
 
     switch(id) {
 	 case CPCRE_PLAIN:
-		 pcre_pat = pcre_compile(rhre, pcre_opts, &pcre_err, &pcre_errptr, NULL);
-                 pcre_fullinfo(pcre_pat, NULL, PCRE_INFO_CAPTURECOUNT, &capcnt);
-    		 ovsize = (capcnt+1)*3;
-		 ov = zalloc(ovsize*sizeof(int));
-    		 r = pcre_exec(pcre_pat, NULL, lhstr, strlen(lhstr), 0, 0, ov, ovsize);
-    		if (r==0) return 1;
+		pcre_pat = pcre_compile(rhre, pcre_opts, &pcre_err, &pcre_errptr, NULL);
+		if (pcre_pat == NULL) {
+		    zwarn("failed to compile regexp /%s/: %s", rhre, pcre_err);
+		    break;
+		}
+                pcre_fullinfo(pcre_pat, NULL, PCRE_INFO_CAPTURECOUNT, &capcnt);
+    		ovsize = (capcnt+1)*3;
+		ov = zalloc(ovsize*sizeof(int));
+    		r = pcre_exec(pcre_pat, NULL, lhstr, strlen(lhstr), 0, 0, ov, ovsize);
+		/* r < 0 => error; r==0 match but not enough size in ov
+		 * r > 0 => (r-1) substrings found; r==1 => no substrings
+		 */
+    		if (r==0) {
+		    zwarn("reportable zsh problem: pcre_exec() returned 0");
+		    return_value = 1;
+		    break;
+		}
-	        else if (r==PCRE_ERROR_NOMATCH) return 0; /* no match */
+		else if (r==PCRE_ERROR_NOMATCH) break; /* no match; free pcre_pat and ov below */
+		else if (r<0) {
+		    zwarn("pcre_exec() error: %d", r);
+		    break;
+		}
                 else if (r>0) {
-		    zpcre_get_substrings(lhstr, ov, r, NULL);
-		    return 1;
+		    zpcre_get_substrings(lhstr, ov, r, NULL, avar, isset(BASHREMATCH));
+		    return_value = 1;
+		    break;
 		}
 		break;
     }
 
-    return 0;
+    if (pcre_pat)
+	pcre_free(pcre_pat);
+    if (ov)
+	zfree(ov, ovsize*sizeof(int));
+
+    return return_value;
 }
 
 static struct conddef cotab[] = {
     CONDDEF("pcre-match", CONDF_INFIX, cond_pcre_match, 0, 0, CPCRE_PLAIN)
+    /* CONDDEF can register =~ but it won't be found */
 };
 
 /**/
@@ -206,7 +290,7 @@ static struct conddef cotab[] = {
 static struct builtin bintab[] = {
     BUILTIN("pcre_compile", 0, bin_pcre_compile, 1, 1, 0, "aimx",  NULL),
     BUILTIN("pcre_study",   0, bin_pcre_study,   0, 0, 0, NULL,    NULL),
-    BUILTIN("pcre_match",   0, bin_pcre_match,   1, 2, 0, "a",    NULL)
+    BUILTIN("pcre_match",   0, bin_pcre_match,   1, 1, 0, "a:v:",    NULL)
 };
 
 
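The comment in cond_pcre_match() above describes how pcre_exec() reports
captures: the offset vector is sized as (capcnt+1)*3 ints, and on success it
holds start/end offset pairs, pair 0 being the whole match. As a rough,
PCRE-free sketch of what a helper such as zpcre_get_substrings() has to do
with those pairs (the names ovec_size and ovec_substring are hypothetical and
not part of the patch; plain malloc() stands in for zsh's allocators):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Size an offset vector the way the patch does: (captures + 1) * 3 ints.
 * Only the first two thirds carry offset pairs; PCRE uses the rest as
 * workspace. */
static int ovec_size(int capture_count)
{
    assert(capture_count >= 0);
    return (capture_count + 1) * 3;
}

/*
 * Copy group `idx` out of `subject`, using the offset pair stored at
 * ovec[2*idx] and ovec[2*idx + 1].
 * Returns a malloc'd string, or NULL if the group did not take part in
 * the match (negative offsets) or if allocation fails.
 */
static char *ovec_substring(const char *subject, const int *ovec, int idx)
{
    int start, end;
    char *out;

    assert(subject != NULL && ovec != NULL && idx >= 0);
    start = ovec[2 * idx];
    end = ovec[2 * idx + 1];
    if (start < 0 || end < start)
        return NULL;

    out = malloc((size_t)(end - start) + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, subject + start, (size_t)(end - start));
    out[end - start] = '\0';
    return out;
}
```

The caller owns the returned strings and releases them with free(). Indexing
by idx up to (r-1), where r is the positive return value of pcre_exec(),
matches the "r > 0 => (r-1) substrings found" note in the patch.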

[-- Attachment #3: regex.mdd --]
[-- Type: text/plain, Size: 68 bytes --]

name=zsh/regex
link=dynamic
load=no

autobins=""

objects="regex.o"

[-- Attachment #4: regex.c --]
[-- Type: text/x-csrc, Size: 3879 bytes --]

/*
 * regex.c
 *
 * This file is part of zsh, the Z shell.
 *
 * Copyright (c) 2007 Phil Pennock
 * All Rights Reserved.
 *
 * Permission is hereby granted, without written agreement and without
 * license or royalty fees, to use, copy, modify, and distribute this
 * software and to distribute modified versions of this software for any
 * purpose, provided that the above copyright notice and the following
 * two paragraphs appear in all copies of this software.
 *
 * In no event shall Clint Adams or the Zsh Development Group be liable
 * to any party for direct, indirect, special, incidental, or consequential
 * damages arising out of the use of this software and its documentation,
 * even if Andrew Main and the Zsh Development Group have been advised of
 * the possibility of such damage.
 *
 * Clint Adams and the Zsh Development Group specifically disclaim any
 * warranties, including, but not limited to, the implied warranties of
 * merchantability and fitness for a particular purpose.  The software
 * provided hereunder is on an "as is" basis, and Andrew Main and the
 * Zsh Development Group have no obligation to provide maintenance,
 * support, updates, enhancements, or modifications.
 *
 */

#include "regex.mdh"
#include "regex.pro"

#include <regex.h>

/* we default to a vaguely modern syntax and set of capabilities */
#define ZREGEX_EXTENDED 0
/* if you want Basic syntax, make it an alternative option */

static void
zregex_regerrwarn(int r, regex_t *re, char *msg)
{
    char *errbuf;
    size_t errbufsz;

    errbufsz = regerror(r, re, NULL, 0);
    errbuf = zalloc(errbufsz*sizeof(char));
    regerror(r, re, errbuf, errbufsz);
    zwarn("%s: %s", msg, errbuf);
    zfree(errbuf, errbufsz);
}

/**/
static int
zcond_regex_match(char **a, int id)
{
    regex_t re;
    regmatch_t *m, *matches = NULL;
    size_t matchessz;
    char *lhstr, *rhre, *s, **arr, **x;
    int r, n, return_value, rcflags, reflags, nelem, start;

    lhstr = cond_str(a,0,0);
    rhre = cond_str(a,1,0);
    rcflags = reflags = 0;
    return_value = 0; /* 1 => matched successfully */

    switch(id) {
    case ZREGEX_EXTENDED:
	rcflags |= REG_EXTENDED;
	r = regcomp(&re, rhre, rcflags);
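	if (r) {
	    zregex_regerrwarn(r, &re, "failed to compile regex");
	    /* re is undefined after a failed regcomp(); nothing has been
	     * allocated yet, so return without reaching regfree() */
	    return 0;
	}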
	/* re.re_nsub is number of parenthesized groups, we also need
	 * 1 for the 0 offset, which is the entire matched portion
	 */
	matchessz = (re.re_nsub + 1) * sizeof(regmatch_t);
	matches = zalloc(matchessz);
	r = regexec(&re, lhstr, re.re_nsub+1, matches, reflags);
	if (r == REG_NOMATCH) /* no match: return_value stays 0 */;
	else if (r == 0) {
	    return_value = 1;
	    if (isset(BASHREMATCH)) {
		start = 0;
		nelem = re.re_nsub + 1;
	    } else {
		start = 1;
		nelem = re.re_nsub;
	    }
	    /* entire matched portion + re_nsub substrings + NULL */
	    if (nelem) {
		arr = x = (char **) zalloc(sizeof(char *) * (nelem + 1));
		for (m = matches + start, n = start; n <= re.re_nsub; ++n, ++m, ++x) {
		    *x = ztrduppfx(lhstr + m->rm_so, m->rm_eo - m->rm_so);
		}
		*x = NULL;
	    }
	    if (isset(BASHREMATCH)) {
		setaparam("BASH_REMATCH", arr);
	    } else {
		m = matches;
		s = ztrduppfx(lhstr + m->rm_so, m->rm_eo - m->rm_so);
		setsparam("MATCH", s);
		if (nelem)
		    setaparam("match", arr);
	    }
	}
	else zregex_regerrwarn(r, &re, "regex matching error");
	break;
    }

    if (matches)
	zfree(matches, matchessz);
    regfree(&re);
    return return_value;
}

static struct conddef cotab[] = {
    CONDDEF("regex-match", CONDF_INFIX, zcond_regex_match, 0, 0, ZREGEX_EXTENDED)
};

/**/
int
setup_(UNUSED(Module m))
{
    return 0;
}

/**/
int
boot_(Module m)
{
    return !addconddefs(m->nam, cotab, sizeof(cotab)/sizeof(*cotab));
}

/**/
int
cleanup_(Module m)
{
    deleteconddefs(m->nam, cotab, sizeof(cotab)/sizeof(*cotab));
    return 0;
}

/**/
int
finish_(UNUSED(Module m))
{
    return 0;
}
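
For readers who want to try the matching logic of zcond_regex_match() outside
zsh, here is a minimal standalone sketch using only POSIX <regex.h>. It is not
the module's code: regex_groups is a hypothetical name, malloc()/free() replace
zalloc()/zfree()/ztrduppfx(), and the results go into a caller-supplied array
instead of the MATCH, match or BASH_REMATCH parameters.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Match `subject` against the extended regex `pattern`.
 * On a match, store up to `max` malloc'd strings in `out` (index 0 is
 * the whole match, 1..n the parenthesized groups; a group that did not
 * take part is stored as NULL) and return how many slots were filled.
 * Return 0 for no match and -1 for a compile, exec or allocation error.
 */
static int regex_groups(const char *pattern, const char *subject,
                        char **out, size_t max)
{
    regex_t re;
    regmatch_t *m;
    size_t n, count, len;
    int r, result = -1;

    assert(pattern != NULL && subject != NULL && out != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;              /* re is not valid here: no regfree() */

    count = re.re_nsub + 1;     /* groups plus the whole match */
    m = malloc(count * sizeof(*m));
    if (m != NULL) {
        r = regexec(&re, subject, count, m, 0);
        if (r == REG_NOMATCH) {
            result = 0;
        } else if (r == 0) {
            if (count > max)
                count = max;
            for (n = 0; n < count; n++) {
                out[n] = NULL;
                if (m[n].rm_so < 0)
                    continue;   /* group did not participate */
                len = (size_t)(m[n].rm_eo - m[n].rm_so);
                out[n] = malloc(len + 1);
                if (out[n] != NULL) {
                    memcpy(out[n], subject + m[n].rm_so, len);
                    out[n][len] = '\0';
                }
            }
            result = (int)count;
        }
        free(m);
    }
    regfree(&re);
    return result;
}
```

Slot 0 corresponds to what the module puts in MATCH (or BASH_REMATCH[0] when
BASHREMATCH is set), and slots 1 and up to the elements of the match array.
The caller frees each non-NULL entry.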


end of thread, other threads:[~2007-05-29  8:56 UTC | newest]

Thread overview: 16+ messages
2007-05-02 14:49 PATCH: zsh/regex and =~ Daniel Qarras
2007-05-02 16:36 ` Peter Stephenson
2007-05-02 16:54   ` Bart Schaefer
2007-05-02 17:18     ` Andrey Borzenkov
2007-05-02 17:32       ` Peter Stephenson
2007-05-02 17:12   ` Andrey Borzenkov
     [not found] <zsh-workers+phil.pennock@spodhuis.org>
2007-04-28  7:56 ` Phil Pennock
2007-04-28  8:20   ` Phil Pennock
2007-04-29  0:51   ` Phil Pennock
2007-04-29 15:16   ` Peter Stephenson
2007-04-29 15:28     ` Peter Stephenson
2007-04-29 19:17       ` Phil Pennock
2007-05-01 21:59   ` Peter Stephenson
2007-05-02  0:11     ` Phil Pennock
2007-05-02  2:53       ` Bart Schaefer
2007-05-29  8:56   ` Phil Pennock
