zsh-workers
* pcre_match() option "-n" broken under zsh 5.0.6
From: Sess @ 2014-09-07  5:39 UTC
  To: zsh-workers


zsh 5.0.6 appears to have broken pcre_match() with respect to option "-n".
As a minimal example, running a slightly embellished variant of the example
from "man zshmodules" yields an infinite loop:

    % string="The following zip codes: 78884 90210 99513"
    % pcre_compile -m "\d{5}"
    % accum=()
    % pcre_match -b -- $string
    % while [[ $? -eq 0 ]] do
    .     print "match: $MATCH; ZPCRE_OP: $ZPCRE_OP"
    .     b=($=ZPCRE_OP)
    .     accum+=$MATCH
    .     pcre_match -b -n $b[2] -- $string
    . done
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
                    .
                    .
                    .

The new behaviour seems rather... unproductive.

Under zsh 5.0.5, the same example successfully terminates with standard
output:

    match: 78884; ZPCRE_OP: 25 30
    match: 90210; ZPCRE_OP: 31 36
    match: 99513; ZPCRE_OP: 37 42

A unit test might prove helpful here.

While on the topic, it might also be helpful to note that the manpage
documentation for pcre_match() appears to be incorrect. It reads:

    For example, a ZPCRE_OP set to "32 45" indicates that the matched
    portion began on byte offset 32 and ended on byte offset 44. Here, byte
    offset position 45 is the position directly after the matched portion.

But that isn't the case. The first word of ZPCRE_OP is the offset of the
byte preceding the first byte of the matched substring, while the second
word of ZPCRE_OP is the offset of the last byte of the matched substring --
the diametric opposite. Hence, such documentation should read:

    For example, a ZPCRE_OP set to "32 45" indicates that the matched
    portion began on byte offset 33 and ended on byte offset 45. Here, byte
    offset position 32 is the position directly before the matched portion.

Given that, one would assume that the line "pcre_match -b -n $b[2] -- $string"
in the manpage example is also incorrect. Specifically, since "$b[2]" is
the offset of the last byte of the prior match, passing that offset to
option "-n" should force pcre_match() to begin searching one byte earlier
than intended.

But that isn't the case. pcre_match() searches correctly, as can be verified
by replacing "\d{5}" with "\d{2}" in the example. This implies that option
"-n" begins searching at the byte following the passed byte offset (rather
than at that offset), so the option is also incorrectly documented. It
reads:

    A -n option starts searching for a match from the byte offset position
    in string.

Correcting for clarity and grammar, that should read:

    If the -n option is given, a match will be searched for starting at the
    byte following the passed byte offset in the string.

In any case, thanks all for the continued grit, fortitude, and hard shell
work.

Humbly yours,
Cecil


* Re: pcre_match() option "-n" broken under zsh 5.0.6
From: Bart Schaefer @ 2014-09-07 17:43 UTC
  To: Sess, zsh-workers

On Sep 7,  1:39am, Sess wrote:
} 
} zsh 5.0.6 appears to have broken pcre_match() with respect to option "-n".

Damn.  Misplaced paren.

Here is the fix with a regression test (the multibyte characters in the
preceding test may cause the patch not to apply from the email).

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index 040a33f..2393cd1 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -289,7 +289,7 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
 	matched_portion = OPT_ARG(ops,c);
     }
     if(OPT_HASARG(ops,c='n')) { /* The offset position to start the search, in bytes. */
-	if ((offset_start = getposint(OPT_ARG(ops,c), nam) < 0))
+	if ((offset_start = getposint(OPT_ARG(ops,c), nam)) < 0)
 	    return 1;
     }
     /* For the entire match, 'Return' the offset byte positions instead of the matched string */
diff --git a/Test/V07pcre.ztst b/Test/V07pcre.ztst
index f5b05de..3a65331 100644
--- a/Test/V07pcre.ztst
+++ b/Test/V07pcre.ztst
@@ -108,3 +108,12 @@
 >1
 >0 xo→t →t
 >0 Xo→t →t
+
+  string="The following zip codes: 78884 90210 99513"
+  pcre_compile -m "\d{5}"
+  pcre_match -b -- $string && print "$MATCH; ZPCRE_OP: $ZPCRE_OP"
+  pcre_match -b -n $ZPCRE_OP[(w)2] -- $string || print failed
+  print "$MATCH; ZPCRE_OP: $ZPCRE_OP"
+0:pcre_match -b and pcre_match -n
+>78884; ZPCRE_OP: 25 30
+>90210; ZPCRE_OP: 31 36


Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/