From mboxrd@z Thu Jan 1 00:00:00 1970
From: Oliver Kiddle
To: Bart Schaefer
cc: chris0e3@gmail.com, Zsh hackers list
Subject: Re: [PATCH?] Re: [BUG] `$match` is haunting my regex’s trailing, optional, capture
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-ID: <34733.1702338590.1@hydra>
Content-Transfer-Encoding: 8bit
Date: Tue, 12 Dec 2023 00:49:50 +0100
Message-ID: <34734-1702338590.864931@1x0T.Klos.9utN>
X-Seq: 52405
X-Loop: zsh-workers@zsh.org
Errors-To: zsh-workers-owner@zsh.org
Precedence: list
Precedence: bulk
Sender: zsh-workers-request@zsh.org
X-no-archive: yes

Bart Schaefer wrote:
> On Fri, Dec 8, 2023 at 10:23 PM Bart Schaefer <schaefer@brasslantern.com>
> wrote:
> > On Fri, Dec 8, 2023 at 9:14 PM <chris0e3@gmail.com> wrote:
> >
> >   setopt rematch_pcre
> >   [[ 'REQUIRE. OPT' =~ 'REQUIRE.(\s*OPT)?' ]] && printf '\tA. ‹%s›\n' $match
> >   [[ 'REQUIRE.'     =~ 'REQUIRE.(\s*OPT)?' ]] && printf '\tB. ‹%s›\n'

Without rematchpcre and with \s changed to just a space, this will set
match=( '' ) which is what would seem most logical to me.

> Is "unset match" OK here?  There doesn't seem to be an obvious way to
> distinguish "there are capture expressions, but none matched anything" from
> "there were no capture expressions".  Maybe Oliver has a better clue.

pcre2_get_ovector_count() will give how many capture expressions the
pattern contains. The following:

  [[ 'REQUIRE.1' =~ 'REQUIRE.(\s*O(P)T)?(1)' ]]

results in match=( '' '' 1 ). So adding empty elements at the end too is
consistent with that.

pcre2_match's return status tells us the last capture element that was
set. I didn't find anything in the documentation to confirm that later
elements of the ovector will have been initialised empty but they do
appear to be. If you get garbage instead of empty elements, that'll be
the cause.
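To make the difference between the two counts concrete, here is a minimal standalone sketch. It does not call PCRE2 and is not the zsh code: sketch_fill_match and SKETCH_UNSET are hypothetical names, and the ovector is modelled as a plain array of (start, end) offset pairs. It assumes that unset groups carry the PCRE2_UNSET value (~(PCRE2_SIZE)0 in pcre2.h) in both offsets, which is the behaviour described above as observed but unconfirmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for PCRE2_UNSET, which pcre2.h defines as ~(PCRE2_SIZE)0. */
#define SKETCH_UNSET (~(size_t)0)

/*
 * Hypothetical helper, not the zsh function: copy capture groups
 * 1 .. ngroups-1 from an ovector-style array of (start, end) pairs
 * into out[], writing "" for any group whose offsets are unset.
 * Group 0 (the whole match) is skipped, as it is for $match.
 * Returns the number of elements written.
 */
size_t
sketch_fill_match(const char *subject, const size_t *ovec, size_t ngroups,
                  char out[][32], size_t outmax)
{
    size_t n = 0;

    assert(subject != NULL && ovec != NULL);
    for (size_t i = 1; i < ngroups && n < outmax; i++, n++) {
        size_t start = ovec[2 * i];
        size_t end = ovec[2 * i + 1];

        out[n][0] = '\0';               /* unset group becomes '' */
        if (start != SKETCH_UNSET && end != SKETCH_UNSET && end >= start) {
            size_t len = end - start;
            if (len > 31)
                len = 31;               /* truncate to the fixed buffer */
            memcpy(out[n], subject + start, len);
            out[n][len] = '\0';
        }
    }
    return n;
}
```

For 'REQUIRE.' against 'REQUIRE.(\s*OPT)?', pcre2_match returns 1 because only group 0 is set, so passing that value as ngroups yields no elements at all. Passing the ovector count of 2 yields the single empty element, i.e. match=( '' ).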
Oliver

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index e48ae3ae5..a49d1a307 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -391,6 +391,8 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
 	pcre_mdata = pcre2_match_data_create_from_pattern(pcre_pattern, NULL);
 	ret = pcre2_match(pcre_pattern, (PCRE2_SPTR) plaintext, subject_len,
 		offset_start, 0, pcre_mdata, mcontext);
+	if (ret > 0)
+	    ret = pcre2_get_ovector_count(pcre_mdata);
     }
 
     if (ret==0) return_value = 0;
@@ -479,7 +481,8 @@ cond_pcre_match(char **a, int id)
 		    break;
 		}
                 else if (r>0) {
-		    zpcre_get_substrings(pcre_pat, lhstr_plain, pcre_mdata, r, svar, avar,
+		    uint32_t ovec_count = pcre2_get_ovector_count(pcre_mdata);
+		    zpcre_get_substrings(pcre_pat, lhstr_plain, pcre_mdata, ovec_count, svar, avar,
 			    ".pcre.match", 0, isset(BASHREMATCH), !isset(BASHREMATCH));
 		    return_value = 1;
 		    break;