From: Oliver Kiddle <opk@zsh.org>
To: Bart Schaefer <schaefer@brasslantern.com>
Cc: chris0e3@gmail.com, Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: [PATCH?] Re: [BUG] `$match` is haunting my regex’s trailing, optional, capture
Date: Tue, 12 Dec 2023 00:49:50 +0100
Message-ID: <34734-1702338590.864931@1x0T.Klos.9utN>
In-Reply-To: <CAH+w=7bSrq8p8-LNbn-M-Fkigo1GP3S=5+uXho5zw3bJxXBbBQ@mail.gmail.com>
Bart Schaefer wrote:
> On Fri, Dec 8, 2023 at 10:23 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Fri, Dec 8, 2023 at 9:14 PM <chris0e3@gmail.com> wrote:
> >
> > setopt rematch_pcre
> > [[ 'REQUIRE. OPT' =~ 'REQUIRE.(\s*OPT)?' ]] && printf '\tA. ‹%s›\n' $match
> > [[ 'REQUIRE.' =~ 'REQUIRE.(\s*OPT)?' ]] && printf '\tB. ‹%s›\n'
Without rematchpcre, and with \s changed to a plain space, this sets
match=( '' ), which seems the most logical result to me.
> Is "unset match" OK here? There doesn't seem to be an obvious way to
> distinguish "there are capture expressions, but none matched anything" from
> "there were no capture expressions". Maybe Oliver has a better clue.
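pcre2_get_ovector_count() gives the number of ovector pairs; for match
data created with pcre2_match_data_create_from_pattern(), that is one
more than the highest-numbered capture group in the pattern, since pair 0
holds the whole match. The following:
[[ 'REQUIRE.1' =~ 'REQUIRE.(\s*O(P)T)?(1)' ]]
results in match=( '' '' 1 ), so adding empty elements at the end too is
consistent with that. pcre2_match's return status is one more than the
number of the highest capture group that was set (pair 0 counts, so a
match that sets no groups returns 1).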
I didn't find anything in the documentation to confirm that later
elements of the ovector are initialised empty, but they do appear to be.
If you get garbage instead of empty elements, that'll be the cause.
Oliver
diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index e48ae3ae5..a49d1a307 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -391,6 +391,8 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
pcre_mdata = pcre2_match_data_create_from_pattern(pcre_pattern, NULL);
ret = pcre2_match(pcre_pattern, (PCRE2_SPTR) plaintext, subject_len,
offset_start, 0, pcre_mdata, mcontext);
+ if (ret > 0)
+ ret = pcre2_get_ovector_count(pcre_mdata);
}
if (ret==0) return_value = 0;
@@ -479,7 +481,8 @@ cond_pcre_match(char **a, int id)
break;
}
else if (r>0) {
- zpcre_get_substrings(pcre_pat, lhstr_plain, pcre_mdata, r, svar, avar,
+ uint32_t ovec_count = pcre2_get_ovector_count(pcre_mdata);
+ zpcre_get_substrings(pcre_pat, lhstr_plain, pcre_mdata, ovec_count, svar, avar,
".pcre.match", 0, isset(BASHREMATCH), !isset(BASHREMATCH));
return_value = 1;
break;
Thread overview: 6+ messages
2023-12-09  5:14 chris0e3
2023-12-09  6:23 Bart Schaefer
2023-12-09 20:54 [PATCH?] Bart Schaefer
2023-12-11 23:49 Oliver Kiddle [this message]
2023-12-12  1:38 Bart Schaefer
2024-01-25 22:14 Oliver Kiddle