* case-insensitivity of =~ operator
@ 2014-07-19 12:19 Moritz Bunkus
2014-07-19 12:31 ` inconsistency in empty argument matching =~/pcre_match Moritz Bunkus
2014-07-19 12:36 ` case-insensitivity of =~ operator Moritz Bunkus
0 siblings, 2 replies; 9+ messages in thread
From: Moritz Bunkus @ 2014-07-19 12:19 UTC (permalink / raw)
To: zsh-users
Hey,
today I stumbled across this paragraph in zsh's info documentation
again. Citing »Description of Options«:
> CASE_MATCH <D>
> Make regular expressions using the zsh/regex module (including
> matches with =~) sensitive to case.
It does not apply to =~ if the zsh/pcre module is loaded. Test script:
----------------------------------------
#!/bin/zsh
export LC_ALL=C
zmodload zsh/pcre
setopt no_case_match
line=Hello
if [[ $line =~ He ]] print match case 1
if [[ $line =~ he ]] print match case 2
if [[ $line -pcre-match He ]] print match case 3
if [[ $line -pcre-match he ]] print match case 4
pcre_compile He
pcre_match $line && print match case 5
pcre_compile -i He
pcre_match $line && print match case 6
----------------------------------------
Output:
----------------------------------------
match case 1
match case 3
match case 5
match case 6
----------------------------------------
Yes, the paragraph above states zsh/regex and not zsh/pcre, but as
zsh/pcre's presence can be… well… it's hard(er) for scripts to detect.
What I'm asking for are two things:
1. Make the paragraph above a bit clearer. An example of what would work
for me:
> Make regular expressions using the zsh/regex module (including
> matches with =~ if the zsh/pcre module is not loaded) sensitive to
> case.
2. Provide an option for making matches with zsh/pcre including
pcre_match and =~ case-insensitive.
Alternatively, extend CASE_MATCH to work on the =~ provided by zsh/pcre,
too, and update the info section accordingly. This might make more sense
considering that pcre_match must be used with pcre_compile in tandem and
pcre_compile provides -i already.
Thanks for your consideration.
Kind regards,
mosu
* inconsistency in empty argument matching =~/pcre_match
@ 2014-07-19 12:31 ` Moritz Bunkus
2014-07-19 12:36 ` Moritz Bunkus
0 siblings, 1 reply; 9+ messages in thread
From: Moritz Bunkus @ 2014-07-19 12:31 UTC (permalink / raw)
To: zsh-users
Hey,
if I load zsh/pcre and try to match empty lines then =~ and pcre_match
will behave differently. Consider this test script:
----------------------------------------
#!/bin/zsh
zmodload zsh/pcre
line=
if [[ $line =~ '^$' ]] print is empty case 1
pcre_compile '^$'
pcre_match "$line" && print is empty case 2
pcre_match $line && print is empty case 3
----------------------------------------
The output is:
----------------------------------------
is empty case 1
./test.sh:pcre_match:10: not enough arguments
----------------------------------------
Meaning:
1. =~ matches as expected
2. pcre_match "$line" does NOT match and doesn't emit an error message
3. pcre_match $line does NOT match either and emits an error message
This is not only inconsistent but also simply wrong. Both 2. and 3.
should match, and 2. shouldn't emit an error message.
Kind regards,
mosu
* Re: inconsistency in empty argument matching =~/pcre_match
2014-07-19 12:31 ` inconsistency in empty argument matching =~/pcre_match Moritz Bunkus
@ 2014-07-19 12:36 ` Moritz Bunkus
2014-07-19 22:21 ` Several PCRE module oddities Bart Schaefer
0 siblings, 1 reply; 9+ messages in thread
From: Moritz Bunkus @ 2014-07-19 12:36 UTC (permalink / raw)
To: zsh-users
Hey,
forgot to mention: zsh 5.0.5, output is from running 'zsh -f ./test.sh'
Kind regards,
mosu
* Several PCRE module oddities
2014-07-19 12:36 ` Moritz Bunkus
@ 2014-07-19 22:21 ` Bart Schaefer
2014-07-20 7:40 ` Moritz Bunkus
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Bart Schaefer @ 2014-07-19 22:21 UTC (permalink / raw)
To: Moritz Bunkus, zsh-users
Replying to all of these PCRE-related items all together ...
Preliminary remark: In none of your test scripts do I see you setting
the RE_MATCH_PCRE option. You need to set that, in addition to loading
the zsh/pcre module, or [[ str =~ pat ]] continues to use zsh/regex.
I think that explains your "zsh -f" behavior confusion (see below).
On Jul 19, 2:19pm, Moritz Bunkus wrote:
}
} today I stumbled across this paragraph in zsh's info documentation
} again. Citing "Description of Options":
}
} > CASE_MATCH <D>
} > Make regular expressions using the zsh/regex module (including
} > matches with =~) sensitive to case.
}
} It does not apply to =~ if the zsh/pcre module is loaded.
See the first and third hunks of the patch below, though I suppose we
should get general agreement on whether it should work this way, because
there's no way to turn it off on a per-pattern basis (unlike turning it
*on* with "pcre_compile -i").
Also of course if you pcre_compile with one setting of CASE_MATCH and
then change it before calling pcre_match, you get the behavior from
compile time, so that ought to be explicitly documented. That doesn't
apply to the inline condition operator, which recompiles every time
it's used.
On Jul 19, 2:31pm, Moritz Bunkus wrote:
}
} line=
} if [[ $line =~ '^$' ]] print is empty case 1
}
} pcre_compile '^$'
} pcre_match "$line" && print is empty case 2
} pcre_match $line && print is empty case 3
} ----------------------------------------
}
} 1. =~ matches as expected
}
} 2. pcre_match "$line" does NOT match and doesn't emit an error message
}
} 3. pcre_match $line does NOT match either and emits an error message
}
} This is not only inconsistent but also simply wrong. Both 2. and
} 3. should match, and 2. shouldn't emit an error message.
Actually only (2) is strange here, see second hunk of patch below (there
may be a better way to fix this).
As for examples (1) and (3): [[ $line =~ ^$ ]] is a special syntactic
construct which treats $line (unquoted parameter reference) as a token
before expanding the value. You also don't need the single-quotes around
^$ for this reason.
The calls to pcre_match, on the other hand, are normal shell commands,
which means the parameter references are expanded and unquoted values
are completely removed from the argument list, before the command is
even invoked. So "not enough arguments" is exactly as expected, and
completely consistent with other shell commands.
On Jul 19, 2:36pm, Moritz Bunkus wrote:
}
} This case gets even weirder. The previous output I posted was gathered
} from zsh running with my normal RC files. The output actually differs if
} run with -f:
}
} Meaning without any RCs case 2 matches, too!
This is because you haven't set the RE_MATCH_PCRE option, so zsh/regex
is being used for case 2.
Here's the patch.
diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index cb9f8ef..2333438 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -87,6 +87,8 @@ bin_pcre_compile(char *nam, char **args, Options ops, UNUSED(int func))
     if (zpcre_utf8_enabled())
         pcre_opts |= PCRE_UTF8;
+    if (!isset(CASEMATCH))
+        pcre_opts |= PCRE_CASELESS;
     pcre_hints = NULL; /* Is this necessary? */
@@ -311,7 +313,8 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
     unmetafy(plaintext, NULL);
     subject_len = (int)strlen(plaintext);
-    if (offset_start < 0 || offset_start >= subject_len)
+    if (offset_start < 0 ||
+        (subject_len ? offset_start >= subject_len : offset_start > 0))
         ret = PCRE_ERROR_NOMATCH;
     else
         ret = pcre_exec(pcre_pattern, pcre_hints, plaintext, subject_len, offset_start, 0, ovec, ovecsize);
@@ -345,6 +348,8 @@ cond_pcre_match(char **a, int id)
     if (zpcre_utf8_enabled())
         pcre_opts |= PCRE_UTF8;
+    if (!isset(CASEMATCH))
+        pcre_opts |= PCRE_CASELESS;
     lhstr = cond_str(a,0,0);
     rhre = cond_str(a,1,0);
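The argument-removal behaviour described for ordinary commands is not specific to pcre_match and can be observed with any command. A minimal sketch, assuming a helper function named count_args that is made up for this illustration (it is not part of zsh):

```shell
#!/bin/sh
# count_args is a hypothetical helper for illustration only:
# it prints how many arguments it received.
count_args() {
    echo "$#"
}

line=

# Unquoted: the empty expansion is removed from the argument list,
# so the command is invoked with no arguments at all.
unquoted=$(count_args $line)

# Quoted: the empty string survives as one (empty) argument.
quoted=$(count_args "$line")

echo "unquoted: $unquoted, quoted: $quoted"
```

This is why `pcre_match $line` with an empty `line` reports "not enough arguments", while `pcre_match "$line"` at least reaches the matching code with an empty subject.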
* Re: Several PCRE module oddities
2014-07-19 22:21 ` Several PCRE module oddities Bart Schaefer
@ 2014-07-20 7:40 ` Moritz Bunkus
2014-07-20 10:24 ` Roman Neuhauser
2014-07-20 16:14 ` Bart Schaefer
2 siblings, 0 replies; 9+ messages in thread
From: Moritz Bunkus @ 2014-07-20 7:40 UTC (permalink / raw)
To: Bart Schaefer; +Cc: zsh-users
Hey,
thanks. With your patch everything's much more consistent.
> Preliminary remark: In none of your test scripts do I see you setting
> the RE_MATCH_PCRE option.
Doh, stupid me. Of course. I do have RE_MATCH_PCRE set in my RCs.
Kind regards,
mosu
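Since the engine behind =~ depends on the RE_MATCH_PCRE option rather than on whether zsh/pcre happens to be loaded, a script can check the option before relying on PCRE syntax. A rough sketch, assuming a helper named regex_engine that is invented here (it is not a zsh builtin); it only inspects the option and does not verify that the zsh/pcre module is actually available:

```shell
#!/bin/sh
# regex_engine is a hypothetical helper (not a zsh builtin). It prints
# which engine [[ str =~ pat ]] is expected to use in the current shell.
regex_engine() {
    if [ -z "${ZSH_VERSION:-}" ]; then
        # Not running under zsh; the option does not exist here.
        echo "not-zsh"
    elif [[ -o re_match_pcre ]]; then
        # RE_MATCH_PCRE is set: =~ is expected to go through zsh/pcre.
        echo "pcre"
    else
        # Option unset: =~ uses zsh/regex, even if zsh/pcre is loaded.
        echo "regex"
    fi
}

regex_engine
```

Run with `zsh -f`, this should report `regex` unless the script itself calls `setopt re_match_pcre`, which matches the difference seen between the RC-file and `-f` runs in this thread.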
* Re: Several PCRE module oddities
2014-07-19 22:21 ` Several PCRE module oddities Bart Schaefer
2014-07-20 7:40 ` Moritz Bunkus
@ 2014-07-20 10:24 ` Roman Neuhauser
2014-07-20 16:14 ` Bart Schaefer
2 siblings, 0 replies; 9+ messages in thread
From: Roman Neuhauser @ 2014-07-20 10:24 UTC (permalink / raw)
To: Bart Schaefer; +Cc: zsh-users
# schaefer@brasslantern.com / 2014-07-19 15:21:16 -0700:
> - if (offset_start < 0 || offset_start >= subject_len)
> + if (offset_start < 0 ||
> + (subject_len ? offset_start >= subject_len : offset_start > 0))
> ret = PCRE_ERROR_NOMATCH;
> else
> ret = pcre_exec(pcre_pattern, pcre_hints, plaintext, subject_len, offset_start, 0, ovec, ovecsize);
maybe the slightly shorter version?
> - if (offset_start < 0 || offset_start >= (subject_len ? subject_len : 1))
--
roman
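The two forms of the condition should agree for integer offsets, since with an empty subject "offset_start > 0" and "offset_start >= 1" reject the same values. A small sketch that checks this over a handful of values; the function names reject_long and reject_short are made up for this illustration and only mirror the two C conditions quoted above:

```shell
#!/bin/sh
# Both functions succeed (status 0) when the offset should be rejected
# as "no match". Arguments: $1 = offset_start, $2 = subject_len.

# Mirrors: offset_start < 0 ||
#          (subject_len ? offset_start >= subject_len : offset_start > 0)
reject_long() {
    if [ "$1" -lt 0 ]; then
        return 0
    fi
    if [ "$2" -ne 0 ]; then
        [ "$1" -ge "$2" ]
    else
        [ "$1" -gt 0 ]
    fi
}

# Mirrors: offset_start < 0 ||
#          offset_start >= (subject_len ? subject_len : 1)
reject_short() {
    limit=$2
    if [ "$limit" -eq 0 ]; then
        limit=1
    fi
    [ "$1" -lt 0 ] || [ "$1" -ge "$limit" ]
}

# Compare the two predicates for several lengths and offsets.
mismatches=0
for len in 0 1 2 3; do
    for off in -1 0 1 2 3 4; do
        if reject_long "$off" "$len"; then a=0; else a=1; fi
        if reject_short "$off" "$len"; then b=0; else b=1; fi
        [ "$a" -eq "$b" ] || mismatches=$((mismatches + 1))
    done
done
echo "mismatches: $mismatches"
```

In both forms an empty subject with offset 0 is no longer rejected up front, so the match attempt proceeds to pcre_exec.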
* Re: Several PCRE module oddities
2014-07-19 22:21 ` Several PCRE module oddities Bart Schaefer
2014-07-20 7:40 ` Moritz Bunkus
2014-07-20 10:24 ` Roman Neuhauser
@ 2014-07-20 16:14 ` Bart Schaefer
2014-07-20 17:19 ` Peter Stephenson
2 siblings, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 2014-07-20 16:14 UTC (permalink / raw)
To: zsh-users
On Jul 19, 3:21pm, Bart Schaefer wrote:
}
} See the first and third hunks of the patch below, though I suppose we
} should get general agreement on whether it should work this way, because
} there's no way to turn it off on a per-pattern basis (unlike turning it
} *on* with "pcre_compile -i").
Upon reflection I think NO_CASE_MATCH should apply to =~ but not to
pcre_compile, because the latter has the -i option. (Thus discard the
first hunk of the patch from users/18966.)
I will follow up to zsh-workers with an updated patch.
* Re: Several PCRE module oddities
2014-07-20 16:14 ` Bart Schaefer
@ 2014-07-20 17:19 ` Peter Stephenson
0 siblings, 0 replies; 9+ messages in thread
From: Peter Stephenson @ 2014-07-20 17:19 UTC (permalink / raw)
To: zsh-users
On Sun, 20 Jul 2014 09:14:21 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Jul 19, 3:21pm, Bart Schaefer wrote:
> }
> } See the first and third hunks of the patch below, though I suppose we
> } should get general agreement on whether it should work this way, because
> } there's no way to turn it off on a per-pattern basis (unlike turning it
> } *on* with "pcre_compile -i").
>
> Upon reflection I think NO_CASE_MATCH should apply to =~ but not to
> pcre_compile, because the latter has the -i option. (Thus discard the
> first hunk of the patch from users/18966.)
Yes, that sounds right. NO_CASE_MATCH is for the normal RE match
operator, whatever happens to implement it.
pws
* Re: case-insensitivity of =~ operator
2014-07-19 12:19 case-insensitivity of =~ operator Moritz Bunkus
2014-07-19 12:31 ` inconsistency in empty argument matching =~/pcre_match Moritz Bunkus
@ 2014-07-19 12:36 ` Moritz Bunkus
1 sibling, 0 replies; 9+ messages in thread
From: Moritz Bunkus @ 2014-07-19 12:36 UTC (permalink / raw)
To: zsh-users
Hey,
forgot to mention: this is zsh 5.0.5.
This case gets even weirder. The previous output I posted was gathered
from zsh running with my normal RC files. The output actually differs if
run with -f:
----------------------------------------
match case 1
match case 2
match case 3
match case 5
match case 6
----------------------------------------
Meaning without any RCs case 2 matches, too! This is all very, very
inconsistent…
Still the same test script:
----------------------------------------
#!/bin/zsh
export LC_ALL=C
zmodload zsh/pcre
setopt no_case_match
line=Hello
if [[ $line =~ He ]] print match case 1
if [[ $line =~ he ]] print match case 2
if [[ $line -pcre-match He ]] print match case 3
if [[ $line -pcre-match he ]] print match case 4
pcre_compile He
pcre_match $line && print match case 5
pcre_compile -i He
pcre_match $line && print match case 6
----------------------------------------
Kind regards,
mosu