zsh-workers
From: Sess <leycec@gmail.com>
To: zsh-workers@zsh.org
Subject: pcre_match() option "-n" broken under zsh 5.0.6
Date: Sun, 7 Sep 2014 01:39:05 -0400
Message-ID: <CAJJ24mY+f9G3MYoaNotb-Yuu16gpqKHxPTi60E1YmL2dQcFu0Q@mail.gmail.com>

zsh 5.0.6 appears to have broken option "-n" of pcre_match(). As a minimal
example, running a slightly embellished variant of the "man zshmodules"
example yields an infinite loop:

    % string="The following zip codes: 78884 90210 99513"
    % pcre_compile -m "\d{5}"
    % accum=()
    % pcre_match -b -- $string
    % while [[ $? -eq 0 ]] do
    .     print "match: $MATCH; ZPCRE_OP: $ZPCRE_OP"
    .     b=($=ZPCRE_OP)
    .     accum+=$MATCH
    .     pcre_match -b -n $b[2] -- $string
    . done
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
                    .
                    .
                    .

The new behaviour seems rather... unproductive.

Under zsh 5.0.5, the same example successfully terminates with standard
output:

    match: 78884; ZPCRE_OP: 25 30
    match: 90210; ZPCRE_OP: 31 36
    match: 99513; ZPCRE_OP: 37 42

A unit test might prove helpful here.

While on the topic, it might also be helpful to note that the manpage
documentation for pcre_match() is rather incorrect. It reads:

    For example, a ZPCRE_OP set to "32 45" indicates that the matched
    portion began on byte offset 32 and ended on byte offset 44. Here, byte
    offset position 45 is the position directly after the matched portion.

But that isn't the case. The first word of ZPCRE_OP is the offset of the
byte preceding the first byte of the matched substring, while the second
word of ZPCRE_OP is the offset of the last byte of the matched substring --
the diametric opposite. Hence, such documentation should read:

    For example, a ZPCRE_OP set to "32 45" indicates that the matched
    portion began on byte offset 33 and ended on byte offset 45. Here, byte
    offset position 32 is the position directly before the matched portion.
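
As a rough check of that reading against the output above, here is a small
sketch in portable shell. It assumes bytes are counted from 1, as zsh
subscripts count them; slice_by_op is a hypothetical helper written for this
illustration, not part of zsh or its pcre module.

```shell
# Hypothetical helper (not part of zsh): print the bytes of string $1
# denoted by a ZPCRE_OP pair "$2 $3", counting bytes from 1. The first
# word is treated as the byte before the match, the second word as the
# last byte of the match.
slice_by_op() {
    printf '%s\n' "$1" | cut -b "$(( $2 + 1 ))-$3"
}

string="The following zip codes: 78884 90210 99513"

slice_by_op "$string" 25 30    # prints 78884
slice_by_op "$string" 31 36    # prints 90210
slice_by_op "$string" 37 42    # prints 99513
```

Each ZPCRE_OP pair reported by the 5.0.5 run recovers exactly the
corresponding $MATCH under this reading.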

Given that, one would assume line "pcre_match -b -n $b[2] -- $string" of
the manpage example to also be incorrect. Specifically, since "$b[2]" is
the offset of the last byte of the prior match, passing such offset to
option "-n" should force pcre_match() to begin searching one byte earlier
than intended.

But that isn't the case. pcre_match() searches correctly, as can be verified
by replacing "\d{5}" with "\d{2}" in that example. This implies that option
"-n" begins searching at the byte following the passed byte offset (rather
than at that offset), and hence that this option is also incorrectly
documented. It reads:

    A -n option starts searching for a match from the byte offset position
    in string.

Correcting for clarity and grammar, that should read:

    If the -n option is given, a match will be searched for starting at the
    byte following the passed byte offset in the string.
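
To make the proposed wording concrete, the sketch below shows which part of
the example string remains to be searched after the first match, again
assuming bytes are counted from 1. rest_after is a hypothetical helper for
illustration only; it does not call pcre_match and says nothing about how
zsh implements "-n" internally.

```shell
# Hypothetical helper (not part of zsh): print the bytes of string $1
# that follow byte offset $2, counting bytes from 1. Under the proposed
# wording, this is the region "pcre_match -n $2" would search.
rest_after() {
    printf '%s\n' "$1" | cut -b "$(( $2 + 1 ))-"
}

string="The following zip codes: 78884 90210 99513"

# The first match reported ZPCRE_OP "25 30", so $b[2] is 30.
rest_after "$string" 30    # prints " 90210 99513" (with a leading space)
```

The remainder starts immediately after the last digit of 78884, so no byte
of the prior match is scanned again and none is skipped.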

In any case, thanks all for the continued grit, fortitude, and hard shell
work.

Humbly yours,
Cecil
