From: Peter Stephenson <p.w.stephenson@ntlworld.com>
To: Johan Grande <nahoj@crans.org>, zsh-workers@zsh.org
Subject: Re: Pattern bug on (a*|)~^(*b)
Date: Mon, 31 Jul 2023 16:21:46 +0100 (BST) [thread overview]
Message-ID: <57958231.1555405.1690816906480@mail.virginmedia.com> (raw)
In-Reply-To: <138967397.1507519.1690803366587@mail.virginmedia.com>
> On 31/07/2023 12:36 Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> > On 25/07/2023 14:19 Johan Grande <nahoj@crans.org> wrote:
> > In zsh 5.8.1 (x86_64-ubuntu-linux-gnu) with extended_glob,
> >
> > [[ "ab" = (|a*)~^(*b) ]]
> >
> > incorrectly (unless I'm mistaken) returns 1.
>
> The fixes I can think of are either to find a point at which to undo the sync
> record for the match that has failed after the event or, perhaps better as it's
> a local change even though it does more work, reset any previous matches before
> the sync point on the next alternative when we mark that as having matched
> (we will only do that if the previous branch failed --- remember [irony
> intended] that a branch succeeds only if all following patterns match too).
I think the latter is good enough. There's no sign of a slowdown with this
fix in D02glob.ztst, so any resulting problems will have to be complicated
things with nested branches and probably zero-length matches --- not all
of which are well optimised anyway. I could be wrong, but I think we'll
have to take it on the chin.
Anyway, I couldn't think of any easy way around that, despite some investigation,
so as this does fix the immediate problem, we can go with it. It's the sort
of self-contained fix I like.
pws
diff --git a/Src/pattern.c b/Src/pattern.c
index 3edda1772..2a1a514fb 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -2987,14 +2987,15 @@ patmatch(Upat prog)
case P_EXCSYNC:
/* See the P_EXCLUDE code below for where syncptr comes from */
{
- unsigned char *syncptr;
+ unsigned char *syncstart, *syncptr, *ptr;
Upat after;
after = P_OPERAND(scan);
DPUTS(!P_ISEXCLUDE(after),
"BUG: EXCSYNC not followed by EXCLUDE.");
DPUTS(!P_OPERAND(after)->p,
"BUG: EXCSYNC not handled by EXCLUDE");
- syncptr = P_OPERAND(after)->p + (patinput - patinstart);
+ syncstart = P_OPERAND(after)->p;
+ syncptr = syncstart + (patinput - patinstart);
/*
* If we already matched from here, this time we fail.
* See WBRANCH code for story about error count.
@@ -3009,6 +3010,23 @@ patmatch(Upat prog)
* failed anyway.
*/
*syncptr = errsfound + 1;
+ /*
+ * Because of backtracking, any match before this point
+ * can't apply to the current branch we're on so is now
+ * a failure --- this can happen if, on a previous
+ * branch, we initially marked a success before failing
+ * on a later part of the pattern after marking up the
+ * P_EXCSYNC (even an end anchor will have this effect).
+ * To make sure we record the current match point
+ * correctly, mark those down now.
+ *
+ * This might have side effects on the efficiency of
+ * pathological cases involving nested branches. To
+ * fix that we'd probably need to record matches on
+ * different branches separately.
+ */
+ for (ptr = syncstart; ptr < syncptr; ++ptr)
+ *ptr = 0;
}
break;
case P_EXCEND:
diff --git a/Test/D02glob.ztst b/Test/D02glob.ztst
index 850a535e5..4d88e5c27 100644
--- a/Test/D02glob.ztst
+++ b/Test/D02glob.ztst
@@ -817,6 +817,32 @@
*>*/glob.tmp/(flip|flop)
*>*/glob.tmp/(flip|flop)/trailing/components
+# The following set test an obscure problem with branches followed by
+# exclusions that shows up when the exclusion matches against
+# something other than the complete test string, hence the complicated
+# double negative.
+ [[ ab = (|a*)~^(*b) ]]
+0:Regression test for exclusion after branches: empty first alternative
+
+ [[ ab = (b|a*)~^(*b) ]]
+0:Regression test for exclusion after branches: non-empty first alternative
+
+ [[ ab = (b*|a*)~^(*b) ]]
+0:Regression test for exclusion after branches: full length first alternative
+
+# Corresponding tests where the exclusion should succeed, so the
+# match fails. It's hard to know how to provoke bugs here...
+ [[ abc = (|a*)~^(*b) ]]
+1:Regression test for exclusion after branches: failure case 1
+
+ [[ abc = (b|a*)~^(*b) ]]
+1:Regression test for exclusion after branches: failure case 2
+
+ [[ abc = (b*|a*)~^(*b) ]]
+1:Regression test for exclusion after branches: failure case 3
+
+# Careful: extendedglob off from this point.
+
unsetopt extendedglob
print -r -- ${(*)=${(@s.+.):-A+B}/(#b)(?)/-${(L)match[1]} ${match[1]}}
0:the '*' qualfier enables extended_glob for pattern matching
prev parent reply other threads:[~2023-07-31 15:22 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-25 13:19 Johan Grande
2023-07-25 18:35 ` Bart Schaefer
2023-07-25 18:47 ` Johan Grande
2023-07-28 1:02 ` Bart Schaefer
2023-07-28 6:41 ` Stephane Chazelas
2023-07-29 1:35 ` Bart Schaefer
2023-08-01 13:19 ` Johan Grande
2023-08-01 13:30 ` Peter Stephenson
2023-08-01 13:46 ` Johan Grande
2023-08-02 8:31 ` Johan Grande
2023-08-02 9:37 ` Peter Stephenson
2023-07-31 11:36 ` Peter Stephenson
2023-07-31 15:21 ` Peter Stephenson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57958231.1555405.1690816906480@mail.virginmedia.com \
--to=p.w.stephenson@ntlworld.com \
--cc=nahoj@crans.org \
--cc=zsh-workers@zsh.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/zsh/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).