zsh-workers
* Re: cp file with a filter
       [not found]     ` <20100117205117.568632c9@pws-pc>
  2010-01-17 21:30       ` cp file with a filter Peter Stephenson
@ 2010-01-17 22:13       ` Bart Schaefer
  1 sibling, 0 replies; 4+ messages in thread
From: Bart Schaefer @ 2010-01-17 22:13 UTC
  To: zsh-workers

On Jan 17,  8:51pm, Peter Stephenson wrote:
}
} +`${var[$MBEGIN,$MEND]}' is identical to `$MATCH'.

You need only ${var[MBEGIN,MEND]} there; array subscripts are evaluated
in math context ...



* Re: cp file with a filter
  2010-01-17 22:18         ` Bart Schaefer
@ 2010-01-18  9:56           ` Peter Stephenson
  0 siblings, 0 replies; 4+ messages in thread
From: Peter Stephenson @ 2010-01-18  9:56 UTC
  To: Zsh hackers list

Bart Schaefer wrote:
> On Jan 17,  9:30pm, Peter Stephenson wrote:
> }
> } Talking of annoying inconsistencies, this updates the patch to work when
> } pcre is used for regex matching and fixes the pcre behaviour (and
> } documents the behaviour for both cases)
> 
> Was there supposed to be a patch for Doc/Zsh/mod_pcre.yo included here?
> There was none.

No.  The "=~" interface, whether you're using pcre or POSIX regular
expressions, is separate from the original PCRE interface and is
completely documented in cond.yo.

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK





* Re: cp file with a filter
  2010-01-17 21:30       ` cp file with a filter Peter Stephenson
@ 2010-01-17 22:18         ` Bart Schaefer
  2010-01-18  9:56           ` Peter Stephenson
  0 siblings, 1 reply; 4+ messages in thread
From: Bart Schaefer @ 2010-01-17 22:18 UTC
  To: Zsh hackers list

On Jan 17,  9:30pm, Peter Stephenson wrote:
}
} Talking of annoying inconsistencies, this updates the patch to work when
} pcre is used for regex matching and fixes the pcre behaviour (and
} documents the behaviour for both cases)

Was there supposed to be a patch for Doc/Zsh/mod_pcre.yo included here?
There was none.



* Re: cp file with a filter
       [not found]     ` <20100117205117.568632c9@pws-pc>
@ 2010-01-17 21:30       ` Peter Stephenson
  2010-01-17 22:18         ` Bart Schaefer
  2010-01-17 22:13       ` Bart Schaefer
  1 sibling, 1 reply; 4+ messages in thread
From: Peter Stephenson @ 2010-01-17 21:30 UTC
  To: Zsh hackers list

Peter Stephenson wrote:
> If you're dead set on regular expressions you can use the [[ ... =~
> ... ]] syntax, but unfortunately I've just noticed this is a bit broken
> for substitutions since although it sets the variable MATCH it doesn't
> set the variables MBEGIN and MEND, which both is annoyingly inconsistent
> with variable substitution and makes it hard to decide which bit of the
> line you're replacing.  The following fixes that omission.  It's an
> exercise for the reader to use this to replace the part of the history line
> from $MBEGIN to $MEND.
> 
> Comments on the patch should go to zsh-workers.

Talking of annoying inconsistencies, this updates the patch to work when
pcre is used for regex matching and fixes the pcre behaviour (and
documents the behaviour for both cases) so that the array variables
aren't set in =~ matching if there were no parenthesised subexpressions.

Index: Doc/Zsh/cond.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/cond.yo,v
retrieving revision 1.6
diff -u -r1.6 cond.yo
--- Doc/Zsh/cond.yo	15 Jan 2009 09:49:06 -0000	1.6
+++ Doc/Zsh/cond.yo	17 Jan 2010 21:26:03 -0000
@@ -117,13 +117,28 @@
 extended regular expression using the tt(zsh/regex) module.
 Upon successful match, some variables will be updated; no variables
 are changed if the matching fails.
+
+If the option tt(BASH_REMATCH) is not set the scalar parameter
+tt(MATCH) is set to the substring that matched the pattern and
+the integer parameters tt(MBEGIN) and tt(MEND) to the index of the start
+and end, respectively, of the match in var(string), such that if
+var(string) is contained in variable tt(var) the expression
+`${var[$MBEGIN,$MEND]}' is identical to `$MATCH'.  The setting
+of the option tt(KSH_ARRAYS) is respected.  Likewise, the array
+tt(match) is set to the substrings that matched parenthesised
+subexpressions and the arrays tt(mbegin) and tt(mend) to the indices of
+the start and end positions, respectively, of the substrings within
+var(string).  For example, if the string `tt(a short string)' is matched
+against the regular expression `tt(s(...)t)', then (assuming the option
+tt(KSH_ARRAYS) is not set) tt(MATCH), tt(MBEGIN)
+and tt(MEND) are `tt(short)', 3 and 7, respectively, while tt(match),
+tt(mbegin) and tt(mend) are single entry arrays containing
+the strings `tt(hor)', `tt(4)' and `tt(6)', respectively.
+
 If the option tt(BASH_REMATCH) is set the array
 tt(BASH_REMATCH) is set to the substring that matched the pattern
 followed by the substrings that matched parenthesised
-subexpressions within the pattern; otherwise, the scalar parameter
-tt(MATCH) is set to the substring that matched the pattern and
-and the array tt(match) to the substrings that matched parenthesised
-subexpressions.
+subexpressions within the pattern.
 )
 item(var(string1) tt(<) var(string2))(
 true if var(string1) comes before var(string2)
Index: Src/Modules/pcre.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Modules/pcre.c,v
retrieving revision 1.16
diff -u -r1.16 pcre.c
--- Src/Modules/pcre.c	25 Mar 2009 11:29:14 -0000	1.16
+++ Src/Modules/pcre.c	17 Jan 2010 21:26:03 -0000
@@ -138,8 +138,9 @@
 
 /**/
 static int
-zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar, char *substravar, 
-    int want_offset_pair, int matchedinarr)
+zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar,
+		     char *substravar, int want_offset_pair, int matchedinarr,
+		     int want_begin_end)
 {
     char **captures, *match_all, **matches;
     char offset_all[50];
@@ -154,6 +155,7 @@
     
     /* captures[0] will be entire matched string, [1] first substring */
     if (!pcre_get_substring_list(arg, ovec, ret, (const char ***)&captures)) {
+	int nelem = arrlen(captures)-1;
 	/* Set to the offsets of the complete match */
 	if (want_offset_pair) {
 	    sprintf(offset_all, "%d %d", ovec[0], ovec[1]);
@@ -161,8 +163,70 @@
 	}
 	match_all = ztrdup(captures[0]);
 	setsparam(matchvar, match_all);
-	matches = zarrdup(&captures[capture_start]);
-	setaparam(substravar, matches);
+	/*
+	 * If we're setting match, mbegin, mend we only do
+	 * so if there were parenthesised matches, for consistency
+	 * (cf. regex.c).
+	 */
+	if (!want_begin_end || nelem) {
+	    matches = zarrdup(&captures[capture_start]);
+	    setaparam(substravar, matches);
+	}
+
+	if (want_begin_end) {
+	    char *ptr = arg;
+	    zlong offs = 0;
+
+	    /* Count the characters before the match */
+	    MB_METACHARINIT();
+	    while (ptr < arg + ovec[0]) {
+		offs++;
+		ptr += MB_METACHARLEN(ptr);
+	    }
+	    setiparam("MBEGIN", offs + !isset(KSHARRAYS));
+	    /* Add on the characters in the match */
+	    while (ptr < arg + ovec[1]) {
+		offs++;
+		ptr += MB_METACHARLEN(ptr);
+	    }
+	    setiparam("MEND", offs + !isset(KSHARRAYS) - 1);
+	    if (nelem) {
+		char **mbegin, **mend, **bptr, **eptr;
+		int i, *ipair;
+
+		bptr = mbegin = (char **)zalloc((nelem+1) * sizeof(char *));
+		eptr = mend = (char **)zalloc((nelem+1) * sizeof(char *));
+
+		for (ipair = ovec + 2, i = 0;
+		     i < nelem;
+		     ipair += 2, i++, bptr++, eptr++)
+		{
+		    char buf[DIGBUFSIZE];
+		    ptr = arg;
+		    offs = 0;
+		    /* Find the start offset */
+		    MB_METACHARINIT();
+		    while (ptr < arg + ipair[0]) {
+			offs++;
+			ptr += MB_METACHARLEN(ptr);
+		    }
+		    convbase(buf, offs + !isset(KSHARRAYS), 10);
+		    *bptr = ztrdup(buf);
+		    /* Continue to the end offset */
+		    while (ptr < arg + ipair[1]) {
+			offs++;
+			ptr += MB_METACHARLEN(ptr);
+		    }
+		    convbase(buf, offs + !isset(KSHARRAYS) - 1, 10);
+		    *eptr = ztrdup(buf);
+		}
+		*bptr = *eptr = NULL;
+
+		setaparam("mbegin", mbegin);
+		setaparam("mend", mend);
+	    }
+	}
+
 	pcre_free_substring_list((const char **)captures);
     }
 
@@ -238,7 +302,8 @@
     if (ret==0) return_value = 0;
     else if (ret==PCRE_ERROR_NOMATCH) /* no match */;
     else if (ret>0) {
-	zpcre_get_substrings(*args, ovec, ret, matched_portion, receptacle, want_offset_pair, 0);
+	zpcre_get_substrings(*args, ovec, ret, matched_portion, receptacle,
+			     want_offset_pair, 0, 0);
 	return_value = 0;
     }
     else {
@@ -297,7 +362,9 @@
 		    break;
 		}
                 else if (r>0) {
-		    zpcre_get_substrings(lhstr, ov, r, NULL, avar, 0, isset(BASHREMATCH));
+		    zpcre_get_substrings(lhstr, ov, r, NULL, avar, 0,
+					 isset(BASHREMATCH),
+					 !isset(BASHREMATCH));
 		    return_value = 1;
 		    break;
 		}
Index: Src/Modules/regex.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Modules/regex.c,v
retrieving revision 1.5
diff -u -r1.5 regex.c
--- Src/Modules/regex.c	19 Jan 2009 08:26:21 -0000	1.5
+++ Src/Modules/regex.c	17 Jan 2010 21:26:03 -0000
@@ -108,11 +108,65 @@
 	    if (isset(BASHREMATCH)) {
 		setaparam("BASH_REMATCH", arr);
 	    } else {
+		zlong offs;
+		char *ptr;
+
 		m = matches;
 		s = ztrduppfx(lhstr + m->rm_so, m->rm_eo - m->rm_so);
 		setsparam("MATCH", s);
-		if (nelem)
+		/*
+		 * Count the characters before the match.
+		 */
+		ptr = lhstr;
+		offs = 0;
+		MB_METACHARINIT();
+		while (ptr < lhstr + m->rm_so) {
+		    offs++;
+		    ptr += MB_METACHARLEN(ptr);
+		}
+		setiparam("MBEGIN", offs + !isset(KSHARRAYS));
+		/*
+		 * Add on the characters in the match.
+		 */
+		while (ptr < lhstr + m->rm_eo) {
+		    offs++;
+		    ptr += MB_METACHARLEN(ptr);
+		}
+		setiparam("MEND", offs + !isset(KSHARRAYS) - 1);
+		if (nelem) {
+		    char **mbegin, **mend, **bptr, **eptr;
+		    bptr = mbegin = (char **)zalloc((nelem+1) * sizeof(char *));
+		    eptr = mend = (char **)zalloc((nelem+1) * sizeof(char *));
+
+		    for (m = matches + start, n = start;
+			 n <= (int)re.re_nsub;
+			 ++n, ++m, ++bptr, ++eptr)
+		    {
+			char buf[DIGBUFSIZE];
+			ptr = lhstr;
+			offs = 0;
+			/* Find the start offset */
+			MB_METACHARINIT();
+			while (ptr < lhstr + m->rm_so) {
+			    offs++;
+			    ptr += MB_METACHARLEN(ptr);
+			}
+			convbase(buf, offs + !isset(KSHARRAYS), 10);
+			*bptr = ztrdup(buf);
+			/* Continue to the end offset */
+			while (ptr < lhstr + m->rm_eo) {
+			    offs++;
+			    ptr += MB_METACHARLEN(ptr);
+			}
+			convbase(buf, offs + !isset(KSHARRAYS) - 1, 10);
+			*eptr = ztrdup(buf);
+		    }
+		    *bptr = *eptr = NULL;
+
 		    setaparam("match", arr);
+		    setaparam("mbegin", mbegin);
+		    setaparam("mend", mend);
+		}
 	    }
 	}
 	else
Index: Test/C02cond.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/C02cond.ztst,v
retrieving revision 1.23
diff -u -r1.23 C02cond.ztst
--- Test/C02cond.ztst	26 Nov 2008 10:50:07 -0000	1.23
+++ Test/C02cond.ztst	17 Jan 2010 21:26:03 -0000
@@ -251,6 +251,39 @@
   fi
 0:regex tests shouldn't crash
 
+  if zmodload -i zsh/regex 2>/dev/null; then
+    string="this has stuff in it"
+    bad_regex=0
+    if [[ $string =~ "h([a-z]*) s([a-z]*) " ]]; then
+      if [[ "$MATCH $MBEGIN $MEND" != "has stuff  6 15" ]]; then
+	print -r "regex variables MATCH MBEGIN MEND:
+  '$MATCH $MBEGIN $MEND'
+  should be:
+  'has stuff  6 15'" >&2
+        bad_regex=1
+      else
+	results=("as 7 8" "tuff 11 14")
+	for i in 1 2; do
+	  if [[ "$match[$i] $mbegin[$i] $mend[$i]" != $results[i] ]]; then
+	    print -r "regex variables match[$i] mbegin[$i] mend[$i]:
+  '$match[$i] $mbegin[$i] $mend[$i]'
+  should be
+  '$results[$i]'" >&2
+	    break
+	  fi
+	done
+      fi
+    else
+      print -r "regex failed to match '$string'" >&2
+    fi
+    (( bad_regex )) || print OK
+  else
+    # if it didn't load, tough, but not a test error
+    print OK
+  fi
+0:MATCH, MBEGIN, MEND, match, mbegin, mend
+>OK
+
 %clean
   # This works around a bug in rm -f in some versions of Cygwin
   chmod 644 unmodish
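
For illustration only, and not part of the patch: the index convention
documented in the cond.yo hunk can be sketched as a standalone C
program fragment using POSIX regcomp()/regexec().  The names
count_chars(), match_spans() and struct span are hypothetical, and the
character counting assumes UTF-8 input rather than going through zsh's
MB_METACHARLEN().  The begin index is the number of characters before
the match plus one, the end index is inclusive, and both drop by one
when KSH_ARRAYS-style indexing is requested.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical result type: inclusive character indices of one match. */
struct span {
    long begin;
    long end;
};

/*
 * Count the characters in the first nbytes of s.  Assumption: s is
 * UTF-8 (or plain ASCII), so a character starts at every byte that is
 * not a continuation byte (10xxxxxx).  The patch itself steps through
 * the string with MB_METACHARLEN() instead.
 */
static long count_chars(const char *s, size_t nbytes)
{
    long n = 0;
    size_t i;

    for (i = 0; i < nbytes; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/*
 * Match the POSIX extended regex `pattern' against `str'.
 * spans[0] receives the indices of the whole match (MBEGIN/MEND),
 * spans[1..] those of the parenthesised subexpressions (mbegin/mend).
 * ksh_arrays selects 0-based indexing, as KSH_ARRAYS does.
 * Returns the number of spans filled in, 0 if there was no match,
 * or -1 if the pattern did not compile.
 */
static int match_spans(const char *str, const char *pattern, int ksh_arrays,
                       struct span *spans, size_t nspans)
{
    regex_t re;
    regmatch_t m[8];
    size_t n, i;
    long base = ksh_arrays ? 0 : 1;

    assert(str && pattern && spans && nspans > 0);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* Whole match plus subexpressions, capped by both buffers. */
    n = re.re_nsub + 1;
    if (n > sizeof(m) / sizeof(m[0]))
        n = sizeof(m) / sizeof(m[0]);
    if (n > nspans)
        n = nspans;
    if (regexec(&re, str, n, m, 0) != 0) {
        regfree(&re);
        return 0;
    }
    for (i = 0; i < n; i++) {
        if (m[i].rm_so < 0) {
            /* This subexpression did not take part in the match. */
            spans[i].begin = spans[i].end = -1;
            continue;
        }
        /* Characters before the match, shifted to the index base. */
        spans[i].begin = count_chars(str, (size_t)m[i].rm_so) + base;
        /* rm_eo is one past the end, so subtract one for inclusive. */
        spans[i].end = count_chars(str, (size_t)m[i].rm_eo) + base - 1;
    }
    regfree(&re);
    return (int)n;
}
```

Called with the string "a short string", the pattern "s(...)t" and
ksh_arrays set to 0, this fills spans[0] with 3 and 7 and spans[1] with
4 and 6, which agrees with the example in the documentation text.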


-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/



Thread overview: 4+ messages
     [not found] <Xns9D00DA47D5E30zzappergmailcom@80.91.229.13>
     [not found] ` <20100114225814.GB24626@dyuven.local>
     [not found]   ` <20100115171759.1099b9a0@abz8111lx.eurac.edu>
     [not found]     ` <20100117205117.568632c9@pws-pc>
2010-01-17 21:30       ` cp file with a filter Peter Stephenson
2010-01-17 22:18         ` Bart Schaefer
2010-01-18  9:56           ` Peter Stephenson
2010-01-17 22:13       ` Bart Schaefer
