List-Id: Zsh Users List
X-Seq: 18966
From: Bart Schaefer
Message-id: <140719152116.ZM13686@torch.brasslantern.com>
Date: Sat, 19 Jul 2014 15:21:16 -0700
In-reply-to: <20140719121937.GN12213@bunkus.org>
Comments: In reply to Moritz Bunkus "case-insensitivity of =~ operator" (Jul 19, 2:19pm)
References: <20140719121937.GN12213@bunkus.org> <20140719123158.GO12213@bunkus.org> <20140719123620.GP12213@bunkus.org> <20140719123645.GQ12213@bunkus.org>
In-reply-to: <20140719123158.GO12213@bunkus.org>
Comments: In reply to Moritz Bunkus "inconsistency in empty argument matching =~/pcre_match" (Jul 19, 2:31pm)
In-reply-to: <20140719123620.GP12213@bunkus.org>
Comments: In reply to Moritz Bunkus "Re: case-insensitivity of =~ operator" (Jul 19, 2:36pm)
In-reply-to: <20140719123645.GQ12213@bunkus.org>
Comments: In reply to Moritz Bunkus "Re: inconsistency in empty argument matching =~/pcre_match" (Jul 19, 2:36pm)
To: Moritz Bunkus, zsh-users@zsh.org
Subject: Several PCRE module oddities

Replying to all of these PCRE-related items all together ...

Preliminary remark: In none of your test scripts do I see you setting
the RE_MATCH_PCRE option. You need to set that, in addition to loading
the zsh/pcre module, or [[ str =~ pat ]] continues to use zsh/regex. I
think that explains your "zsh -f" behavior confusion (see below).
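
As a minimal sketch of the two-step setup described above (the
ZSH_VERSION guard and the pcre_enabled variable are illustrative
additions, not something from the original test scripts):

```shell
pcre_enabled=no

# zmodload and setopt are zsh builtins, so only attempt this under zsh.
if [ -n "${ZSH_VERSION-}" ]; then
    # Step 1: load the module that provides pcre_compile and pcre_match.
    # Step 2: make [[ str =~ pat ]] use PCRE instead of zsh/regex.
    if zmodload zsh/pcre && setopt RE_MATCH_PCRE; then
        pcre_enabled=yes
    fi
fi

echo "PCRE used for =~: $pcre_enabled"
```

Loading the module alone would leave pcre_enabled at "no" in spirit:
the =~ operator keeps using zsh/regex until the option is also set.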

On Jul 19, 2:19pm, Moritz Bunkus wrote:
}
} today I stumbled across this paragraph in zsh's info documentation
} again. Citing "Description of Options":
}
} > CASE_MATCH
} >      Make regular expressions using the zsh/regex module (including
} >      matches with =~) sensitive to case.
}
} It does not apply to =~ if the zsh/pcre module is loaded.

See the first and third hunks of the patch below, though I suppose we
should get general agreement on whether it should work this way, because
there's no way to turn it off on a per-pattern basis (unlike turning it
*on* with "pcre_compile -i").

Also of course if you pcre_compile with one setting of CASE_MATCH and
then change it before calling pcre_match, you get the behavior from
compile time, so that ought to be explicitly documented. That doesn't
apply to the inline condition operator, which recompiles every time it's
used.

On Jul 19, 2:31pm, Moritz Bunkus wrote:
}
} line=
} if [[ $line =~ '^$' ]] print is empty case 1
}
} pcre_compile '^$'
} pcre_match "$line" && print is empty case 2
} pcre_match $line && print is empty case 3
} ----------------------------------------
}
} 1. =~ matches as expected
}
} 2. pcre_match "$line" does NOT match and doesn't emit an error message
}
} 3. pcre_match $line does NOT match either and emits an error message
}
} This is not only inconsistent but also simply wrong. Both 2. and
} 3. should match, and 2. shouldn't emit an error message.

Actually only (2) is strange here; see the second hunk of the patch
below (there may be a better way to fix this).

As for examples (1) and (3): [[ $line =~ ^$ ]] is a special syntactic
construct which treats $line (unquoted parameter reference) as a token
before expanding the value. You also don't need the single-quotes around
^$ for this reason.

The calls to pcre_match, on the other hand, are normal shell commands,
which means the parameter references are expanded, and unquoted
references that expand to nothing are completely removed from the
argument list, before the command is even invoked.
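
The difference between the quoted and unquoted calls can be reproduced
without PCRE at all. A minimal sketch, using a hypothetical helper
count_args that is not part of the original test script:

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

line=

# Quoted: the empty string survives expansion as one (empty) argument.
quoted_count=$(count_args "$line")

# Unquoted: the empty expansion is dropped before the command runs.
unquoted_count=$(count_args $line)

echo "quoted: $quoted_count, unquoted: $unquoted_count"
# prints "quoted: 1, unquoted: 0"
```

The same removal happens to the pcre_match $line call in case (3).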
So "not enough arguments" is exactly as expected, and completely
consistent with other shell commands.

On Jul 19, 2:36pm, Moritz Bunkus wrote:
}
} This case gets even weirder. The previous output I posted was gathered
} from zsh running with my normal RC files. The output actually differs if
} run with -f:
}
} Meaning without any RCs case 2 matches, too!

This is because you haven't set the RE_MATCH_PCRE option, so zsh/regex
is being used for case 2.

Here's the patch.

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index cb9f8ef..2333438 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -87,6 +87,8 @@ bin_pcre_compile(char *nam, char **args, Options ops, UNUSED(int func))
     if (zpcre_utf8_enabled())
         pcre_opts |= PCRE_UTF8;
+    if (!isset(CASEMATCH))
+        pcre_opts |= PCRE_CASELESS;
     pcre_hints = NULL;  /* Is this necessary? */
@@ -311,7 +313,8 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
         unmetafy(plaintext, NULL);
         subject_len = (int)strlen(plaintext);
-        if (offset_start < 0 || offset_start >= subject_len)
+        if (offset_start < 0 ||
+            (subject_len ? offset_start >= subject_len : offset_start > 0))
             ret = PCRE_ERROR_NOMATCH;
         else
             ret = pcre_exec(pcre_pattern, pcre_hints, plaintext, subject_len, offset_start, 0, ovec, ovecsize);
@@ -345,6 +348,8 @@ cond_pcre_match(char **a, int id)
     if (zpcre_utf8_enabled())
         pcre_opts |= PCRE_UTF8;
+    if (!isset(CASEMATCH))
+        pcre_opts |= PCRE_CASELESS;
     lhstr = cond_str(a,0,0);
     rhre = cond_str(a,1,0);