zsh-users
* case-insensitivity of =~ operator
@ 2014-07-19 12:19 Moritz Bunkus
  2014-07-19 12:31 ` inconsistency in empty argument matching =~/pcre_match Moritz Bunkus
  2014-07-19 12:36 ` case-insensitivity of =~ operator Moritz Bunkus
  0 siblings, 2 replies; 9+ messages in thread
From: Moritz Bunkus @ 2014-07-19 12:19 UTC (permalink / raw)
  To: zsh-users

Hey,

today I stumbled across this paragraph in zsh's info documentation
again. Citing »Description of Options«:

> CASE_MATCH <D>
>      Make regular expressions using the zsh/regex module (including
>      matches with =~) sensitive to case.

It does not apply to =~ if the zsh/pcre module is loaded. Test script:

----------------------------------------
#!/bin/zsh

export LC_ALL=C

zmodload zsh/pcre
setopt no_case_match

line=Hello
if [[ $line =~ He ]] print match case 1
if [[ $line =~ he ]] print match case 2
if [[ $line -pcre-match He ]] print match case 3
if [[ $line -pcre-match he ]] print match case 4

pcre_compile He
pcre_match $line && print match case 5

pcre_compile -i He
pcre_match $line && print match case 6
----------------------------------------

Output:

----------------------------------------
match case 1
match case 3
match case 5
match case 6
----------------------------------------

Yes, the paragraph above says zsh/regex and not zsh/pcre, but
zsh/pcre's presence is… well… hard(er) for scripts to detect.

What I'm asking for are two things:

1. Make the paragraph above a bit clearer. An example of what would work
   for me:

   > Make regular expressions using the zsh/regex module (including
   > matches with =~ if the zsh/pcre module is not loaded) sensitive to
   > case.

2. Provide an option for making matches with zsh/pcre, including
   pcre_match and =~, case-insensitive.

Alternatively, extend CASE_MATCH to work on the =~ provided by zsh/pcre,
too, and update the info section accordingly. This might make more sense
considering that pcre_match must be used in tandem with pcre_compile, and
pcre_compile already provides -i.

Thanks for your consideration.

Kind regards,
mosu

* inconsistency in empty argument matching =~/pcre_match
@ 2014-07-19 12:31 ` Moritz Bunkus
  2014-07-19 12:36   ` Moritz Bunkus
  0 siblings, 1 reply; 9+ messages in thread
From: Moritz Bunkus @ 2014-07-19 12:31 UTC (permalink / raw)
  To: zsh-users

Hey,

if I load zsh/pcre and try to match empty lines then =~ and pcre_match
will behave differently. Consider this test script:

----------------------------------------
#!/bin/zsh

zmodload zsh/pcre

line=
if [[ $line =~ '^$' ]] print is empty case 1

pcre_compile '^$'
pcre_match "$line" && print is empty case 2
pcre_match $line   && print is empty case 3
----------------------------------------

The output is:

----------------------------------------
is empty case 1
./test.sh:pcre_match:10: not enough arguments
----------------------------------------

Meaning:

1. =~ matches as expected

2. pcre_match "$line" does NOT match and doesn't emit an error message

3. pcre_match $line does NOT match either and emits an error message

This is not only inconsistent but also simply wrong. Both 2. and
3. should match, and 3. shouldn't emit an error message.

Kind regards,
mosu

* Re: case-insensitivity of =~ operator
  2014-07-19 12:19 case-insensitivity of =~ operator Moritz Bunkus
  2014-07-19 12:31 ` inconsistency in empty argument matching =~/pcre_match Moritz Bunkus
@ 2014-07-19 12:36 ` Moritz Bunkus
  1 sibling, 0 replies; 9+ messages in thread
From: Moritz Bunkus @ 2014-07-19 12:36 UTC (permalink / raw)
  To: zsh-users

Hey,

forgot to mention: this is zsh 5.0.5.

This case gets even weirder. The previous output I posted was gathered
from zsh running with my normal RC files. The output actually differs if
run with -f:

----------------------------------------
match case 1
match case 2
match case 3
match case 5
match case 6
----------------------------------------

Meaning without any RCs case 2 matches, too!

This is all very, very inconsistent…

Still the same test script:

----------------------------------------
#!/bin/zsh

export LC_ALL=C

zmodload zsh/pcre
setopt no_case_match

line=Hello
if [[ $line =~ He ]] print match case 1
if [[ $line =~ he ]] print match case 2
if [[ $line -pcre-match He ]] print match case 3
if [[ $line -pcre-match he ]] print match case 4

pcre_compile He
pcre_match $line && print match case 5

pcre_compile -i He
pcre_match $line && print match case 6
----------------------------------------

Kind regards,
mosu

* Re: inconsistency in empty argument matching =~/pcre_match
  2014-07-19 12:31 ` inconsistency in empty argument matching =~/pcre_match Moritz Bunkus
@ 2014-07-19 12:36   ` Moritz Bunkus
  2014-07-19 22:21     ` Several PCRE module oddities Bart Schaefer
  0 siblings, 1 reply; 9+ messages in thread
From: Moritz Bunkus @ 2014-07-19 12:36 UTC (permalink / raw)
  To: zsh-users

Hey,

forgot to mention: zsh 5.0.5, output is from running 'zsh -f ./test.sh'

Kind regards,
mosu

* Several PCRE module oddities
  2014-07-19 12:36   ` Moritz Bunkus
@ 2014-07-19 22:21     ` Bart Schaefer
  2014-07-20  7:40       ` Moritz Bunkus
                         ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Bart Schaefer @ 2014-07-19 22:21 UTC (permalink / raw)
  To: Moritz Bunkus, zsh-users

Replying to all of these PCRE-related items together ...

Preliminary remark:  In none of your test scripts do I see you setting
the RE_MATCH_PCRE option.  You need to set that, in addition to loading
the zsh/pcre module, or [[ str =~ pat ]] continues to use zsh/regex.  I
think that explains your "zsh -f" behavior confusion (see below).
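
A minimal sketch of that setup step, for scripts that want =~ to go
through PCRE regardless of the user's RC files.  The helper name
use_pcre_for_match is made up for illustration and is not part of zsh;
the ZSH_VERSION guard is only there so the snippet is harmless when run
by another shell.

```shell
# use_pcre_for_match is a hypothetical helper name, not part of zsh.
use_pcre_for_match() {
  # zsh/pcre and RE_MATCH_PCRE only exist in zsh.
  [ -n "${ZSH_VERSION:-}" ] || return 1
  # Fails if this zsh was built without PCRE support.
  zmodload zsh/pcre 2>/dev/null || return 1
  # Without this option, [[ str =~ pat ]] keeps using zsh/regex.
  setopt re_match_pcre
}

if use_pcre_for_match; then
  echo "=~ switched to PCRE"
else
  echo "=~ not switched to PCRE"
fi
```

Loading the module alone is not enough; the option is what changes
which engine =~ uses.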

On Jul 19,  2:19pm, Moritz Bunkus wrote:
}
} today I stumbled across this paragraph in zsh's info documentation
} again. Citing "Description of Options":
} 
} > CASE_MATCH <D>
} >      Make regular expressions using the zsh/regex module (including
} >      matches with =~) sensitive to case.
} 
} It does not apply to =~ if the zsh/pcre module is loaded.

See the first and third hunks of the patch below, though I suppose we
should get general agreement on whether it should work this way, because
there's no way to turn it off on a per-pattern basis (unlike turning it
*on* with "pcre_compile -i").

Also of course if you pcre_compile with one setting of CASE_MATCH and then
change it before calling pcre_match, you get the behavior from compile
time, so that ought to be explicitly documented.  That doesn't apply to
the inline condition operator, which recompiles every time it's used.


On Jul 19,  2:31pm, Moritz Bunkus wrote:
}
} line=
} if [[ $line =~ '^$' ]] print is empty case 1
} 
} pcre_compile '^$'
} pcre_match "$line" && print is empty case 2
} pcre_match $line   && print is empty case 3
} ----------------------------------------
} 
} 1. =~ matches as expected
} 
} 2. pcre_match "$line" does NOT match and doesn't emit an error message
} 
} 3. pcre_match $line does NOT match either and emits an error message
} 
} This is not only inconsistent but also simply wrong. Both 2. and
} 3. should match, and 3. shouldn't emit an error message.

Actually only (2) is strange here, see second hunk of patch below (there
may be a better way to fix this).  As for examples (1) and (3):

[[ $line =~ ^$ ]] is a special syntactic construct which treats $line
(unquoted parameter reference) as a token before expanding the value.
You also don't need the single-quotes around ^$ for this reason.

The calls to pcre_match, on the other hand, are normal shell commands,
which means the parameter references are expanded, and unquoted values
that expand to nothing are removed from the argument list entirely,
before the command is even invoked.  So "not enough arguments" is exactly
as expected, and completely consistent with other shell commands.
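
The argument removal is easy to see without pcre_match at all.  This is
a small sketch; count_args is a made-up helper that only reports how
many arguments it was given.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
  echo "$#"
}

line=

count_args $line      # unquoted empty expansion is dropped: prints 0
count_args "$line"    # quoted empty expansion is kept as one empty argument: prints 1
```

So the unquoted call reaches the command with no subject string at all,
while the quoted call passes one empty string.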


On Jul 19,  2:36pm, Moritz Bunkus wrote:
}
} This case gets even weirder. The previous output I posted was gathered
} from zsh running with my normal RC files. The output actually differs if
} run with -f:
} 
} Meaning without any RCs case 2 matches, too!

This is because you haven't set the RE_MATCH_PCRE option, so zsh/regex
is being used for case 2.


Here's the patch.

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index cb9f8ef..2333438 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -87,6 +87,8 @@ bin_pcre_compile(char *nam, char **args, Options ops, UNUSED(int func))
     
     if (zpcre_utf8_enabled())
 	pcre_opts |= PCRE_UTF8;
+    if (!isset(CASEMATCH))
+	pcre_opts |= PCRE_CASELESS;
 
     pcre_hints = NULL;  /* Is this necessary? */
     
@@ -311,7 +313,8 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
     unmetafy(plaintext, NULL);
     subject_len = (int)strlen(plaintext);
 
-    if (offset_start < 0 || offset_start >= subject_len)
+    if (offset_start < 0 ||
+	(subject_len ? offset_start >= subject_len : offset_start > 0))
 	ret = PCRE_ERROR_NOMATCH;
     else
 	ret = pcre_exec(pcre_pattern, pcre_hints, plaintext, subject_len, offset_start, 0, ovec, ovecsize);
@@ -345,6 +348,8 @@ cond_pcre_match(char **a, int id)
 
     if (zpcre_utf8_enabled())
 	pcre_opts |= PCRE_UTF8;
+    if (!isset(CASEMATCH))
+	pcre_opts |= PCRE_CASELESS;
 
     lhstr = cond_str(a,0,0);
     rhre = cond_str(a,1,0);
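
To make the effect of the second hunk concrete, here is a shell
transliteration of just the bounds check, not the real C code.  The
function names are invented for illustration, and the example assumes
pcre_match is called with a start offset of 0.

```shell
# $1 = offset_start, $2 = subject_len
# Each check succeeds (status 0) when the offset is rejected, i.e. when
# the C code would set ret = PCRE_ERROR_NOMATCH without calling pcre_exec.

rejects_before_patch() {
  [ "$1" -lt 0 ] || [ "$1" -ge "$2" ]
}

rejects_after_patch() {
  if [ "$1" -lt 0 ]; then
    return 0
  elif [ "$2" -ne 0 ]; then
    [ "$1" -ge "$2" ]
  else
    [ "$1" -gt 0 ]
  fi
}

# verdict is a hypothetical helper that prints the outcome of one check.
verdict() {
  if "$@"; then echo "rejected"; else echo "passed to pcre_exec"; fi
}

# Empty subject, offset 0: the situation of pcre_match "$line" with line empty.
echo "before patch: $(verdict rejects_before_patch 0 0)"
echo "after patch:  $(verdict rejects_after_patch 0 0)"
```

Before the patch an empty subject is rejected up front because
0 >= 0; afterwards it is handed to pcre_exec, which can then match '^$'.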


* Re: Several PCRE module oddities
  2014-07-19 22:21     ` Several PCRE module oddities Bart Schaefer
@ 2014-07-20  7:40       ` Moritz Bunkus
  2014-07-20 10:24       ` Roman Neuhauser
  2014-07-20 16:14       ` Bart Schaefer
  2 siblings, 0 replies; 9+ messages in thread
From: Moritz Bunkus @ 2014-07-20  7:40 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-users

Hey,

thanks. With your patch everything's much more consistent.

> Preliminary remark:  In none of your test scripts do I see you setting
> the RE_MATCH_PCRE option.

Doh, stupid me. Of course. I do have RE_MATCH_PCRE set in my RCs.

Kind regards,
mosu

* Re: Several PCRE module oddities
  2014-07-19 22:21     ` Several PCRE module oddities Bart Schaefer
  2014-07-20  7:40       ` Moritz Bunkus
@ 2014-07-20 10:24       ` Roman Neuhauser
  2014-07-20 16:14       ` Bart Schaefer
  2 siblings, 0 replies; 9+ messages in thread
From: Roman Neuhauser @ 2014-07-20 10:24 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-users

# schaefer@brasslantern.com / 2014-07-19 15:21:16 -0700:
> -    if (offset_start < 0 || offset_start >= subject_len)
> +    if (offset_start < 0 ||
> +	(subject_len ? offset_start >= subject_len : offset_start > 0))
>  	ret = PCRE_ERROR_NOMATCH;
>      else
>  	ret = pcre_exec(pcre_pattern, pcre_hints, plaintext, subject_len, offset_start, 0, ovec, ovecsize);

maybe the slightly shorter version?

+    if (offset_start < 0 || offset_start >= (subject_len ? subject_len : 1))
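
A quick way to check that the shorter condition rejects exactly the
same offsets as the patched one is to compare both over a small grid.
This is a shell transliteration of the two C conditions with invented
function names, not code from pcre.c.

```shell
# $1 = offset_start, $2 = subject_len; status 0 means "reject, no match".

# Condition from the patch:
#   offset_start < 0 ||
#   (subject_len ? offset_start >= subject_len : offset_start > 0)
rejects_patch() {
  if [ "$1" -lt 0 ]; then return 0; fi
  if [ "$2" -ne 0 ]; then [ "$1" -ge "$2" ]; else [ "$1" -gt 0 ]; fi
}

# Shorter condition:
#   offset_start < 0 || offset_start >= (subject_len ? subject_len : 1)
rejects_shorter() {
  limit=$2
  [ "$limit" -ne 0 ] || limit=1
  [ "$1" -lt 0 ] || [ "$1" -ge "$limit" ]
}

# Count the (offset, length) pairs on which the two forms disagree.
compare_forms() {
  mismatches=0
  for len in 0 1 2 3; do
    for off in -1 0 1 2 3 4; do
      if rejects_patch "$off" "$len"; then a=1; else a=0; fi
      if rejects_shorter "$off" "$len"; then b=1; else b=0; fi
      [ "$a" -eq "$b" ] || mismatches=$((mismatches + 1))
    done
  done
  echo "$mismatches"
}

echo "mismatches: $(compare_forms)"
```

For integer offsets the two agree: with an empty subject,
offset_start >= 1 is the same test as offset_start > 0.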

-- 
roman


* Re: Several PCRE module oddities
  2014-07-19 22:21     ` Several PCRE module oddities Bart Schaefer
  2014-07-20  7:40       ` Moritz Bunkus
  2014-07-20 10:24       ` Roman Neuhauser
@ 2014-07-20 16:14       ` Bart Schaefer
  2014-07-20 17:19         ` Peter Stephenson
  2 siblings, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 2014-07-20 16:14 UTC (permalink / raw)
  To: zsh-users

On Jul 19,  3:21pm, Bart Schaefer wrote:
}
} See the first and third hunks of the patch below, though I suppose we
} should get general agreement on whether it should work this way, because
} there's no way to turn it off on a per-pattern basis (unlike turning it
} *on* with "pcre_compile -i").

Upon reflection I think NO_CASE_MATCH should apply to =~ but not to
pcre_compile, because the latter has the -i option.  (Thus discard the
first hunk of the patch from users/18966.)

I will follow up to zsh-workers with an updated patch.


* Re: Several PCRE module oddities
  2014-07-20 16:14       ` Bart Schaefer
@ 2014-07-20 17:19         ` Peter Stephenson
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Stephenson @ 2014-07-20 17:19 UTC (permalink / raw)
  To: zsh-users

On Sun, 20 Jul 2014 09:14:21 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Jul 19,  3:21pm, Bart Schaefer wrote:
> }
> } See the first and third hunks of the patch below, though I suppose we
> } should get general agreement on whether it should work this way, because
> } there's no way to turn it off on a per-pattern basis (unlike turning it
> } *on* with "pcre_compile -i").
> 
> Upon reflection I think NO_CASE_MATCH should apply to =~ but not to
> pcre_compile, because the latter has the -i option.  (Thus discard the
> first hunk of the patch from users/18966.)

Yes, that sounds right.  NO_CASE_MATCH is for the normal RE match
operator, whatever happens to implement it.

pws

