zsh-users
* [[ 'abcde' =~ (#i)Bcd ]]
@ 2022-11-07 21:10 Ray Andrews
  2022-11-07 21:26 ` Roman Perepelitsa
  2022-11-08 17:40 ` Phil Pennock
  0 siblings, 2 replies; 13+ messages in thread
From: Ray Andrews @ 2022-11-07 21:10 UTC (permalink / raw)
  To: Zsh Users


[[ 'abcde' =~ 'bcd' ]] && echo match1
[[ 'abcde' = (#i)ABcde ]] && echo match2
[[ 'abcde' =~ (#i)Bcd ]] && echo match3
[[ 'bcd' =~ 'abcde' ]] && echo match4

... I get match 1 and match 2.  I  understand not getting match 4 
because '=~' is not bi-directional, the latter value must be a subset of 
the former.  But why don't I get match 3? It seems to break no rules to 
make 'Bcd' case insensitive and then find it within 'abcde'.  Is there a 
workaround?




* Re: [[ 'abcde' =~ (#i)Bcd ]]
  2022-11-07 21:10 [[ 'abcde' =~ (#i)Bcd ]] Ray Andrews
@ 2022-11-07 21:26 ` Roman Perepelitsa
  2022-11-07 21:47   ` Ray Andrews
  2022-11-07 21:50   ` Lawrence Velázquez
  2022-11-08 17:40 ` Phil Pennock
  1 sibling, 2 replies; 13+ messages in thread
From: Roman Perepelitsa @ 2022-11-07 21:26 UTC (permalink / raw)
  To: Ray Andrews; +Cc: Zsh Users

On Mon, Nov 7, 2022 at 10:11 PM Ray Andrews <rayandrews@eastlink.ca> wrote:
>
>
> [[ 'abcde' =~ 'bcd' ]] && echo match1
> [[ 'abcde' = (#i)ABcde ]] && echo match2
> [[ 'abcde' =~ (#i)Bcd ]] && echo match3
> [[ 'bcd' =~ 'abcde' ]] && echo match4
>
> ... I get match 1 and match 2.  I  understand not getting match 4
> because '=~' is not bi-directional, the latter value must be a subset of
> the former.  But why don't I get match 3?

Does it surprise you that this also doesn't match?

    [[ 'a' =~ (#i)A ]]

(#i) only works with pattern matching. For regex the easiest
workaround is to convert left-hand-side to lowercase:

    foo=XaBcX
    [[ ${(L)foo} =~ abc ]] && echo match

Another option is to use zsh/pcre module. See
https://zsh.sourceforge.io/Doc/Release/Zsh-Modules.html#The-zsh_002fpcre-Module.

In this specific case it's better to use pattern matching of course:

    [[ $foo == (#i)*abc* ]] && echo match

I find it extremely rare in practice that I need a regex match in zsh.

Roman.



* Re: [[ 'abcde' =~ (#i)Bcd ]]
  2022-11-07 21:26 ` Roman Perepelitsa
@ 2022-11-07 21:47   ` Ray Andrews
  2022-11-07 22:15     ` Lawrence Velázquez
  2022-11-07 21:50   ` Lawrence Velázquez
  1 sibling, 1 reply; 13+ messages in thread
From: Ray Andrews @ 2022-11-07 21:47 UTC (permalink / raw)
  To: zsh-users


On 2022-11-07 13:26, Roman Perepelitsa wrote:
> [[ 'abcde' =~ (#i)Bcd ]] && echo match3

> (#i) only works with pattern matching.

But isn't that a pattern match?

[[ 'abcde' = (#i)ABcde ]] && echo match2

... that seems happy so it would seem that wildcards aren't required.

> In this specific case it's better to use pattern matching of course:
>
>      [[ $foo == (#i)*abc* ]] && echo match
>
That's what puzzles me.  I expect:

[[ $foo == (#i)*abc* ]] && echo match

and:

[[ $foo =~ (#i)abc ]] && echo match

... to be exactly the same. If not, why not? Actually there are several workarounds but still I'd expect that to work too.





* Re: [[ 'abcde' =~ (#i)Bcd ]]
  2022-11-07 21:26 ` Roman Perepelitsa
  2022-11-07 21:47   ` Ray Andrews
@ 2022-11-07 21:50   ` Lawrence Velázquez
  2022-11-08  2:05     ` Ray Andrews
  1 sibling, 1 reply; 13+ messages in thread
From: Lawrence Velázquez @ 2022-11-07 21:50 UTC (permalink / raw)
  To: Ray Andrews; +Cc: Roman Perepelitsa, zsh-users

On Mon, Nov 7, 2022, at 4:26 PM, Roman Perepelitsa wrote:
> (#i) only works with pattern matching. For regex the easiest
> workaround is to convert left-hand-side to lowercase:
>
>     foo=XaBcX
>     [[ ${(L)foo} =~ abc ]] && echo match
>
> Another option is to use zsh/pcre module. See
> https://zsh.sourceforge.io/Doc/Release/Zsh-Modules.html#The-zsh_002fpcre-Module.

If you're not using zsh/pcre, yet another option is to disable
CASE_MATCH.  It's a bit drastic, though.

You may also be able to use some nonstandard extensions defined by
your host system's regex library, but that would make your script
highly dependent on said library.

> I find it extremely rare in practice that I need a regex match in zsh.

I concur with Roman.  I doubt you actually need case-insensitive
regex.

-- 
vq



* Re: [[ 'abcde' =~ (#i)Bcd ]]
  2022-11-07 21:47   ` Ray Andrews
@ 2022-11-07 22:15     ` Lawrence Velázquez
  2022-11-08  1:57       ` Ray Andrews
  0 siblings, 1 reply; 13+ messages in thread
From: Lawrence Velázquez @ 2022-11-07 22:15 UTC (permalink / raw)
  To: Ray Andrews; +Cc: zsh-users

On Mon, Nov 7, 2022, at 4:47 PM, Ray Andrews wrote:
> On 2022-11-07 13:26, Roman Perepelitsa wrote:
>> [[ 'abcde' =~ (#i)Bcd ]] && echo match3
>
>> (#i) only works with pattern matching.
>
> But isn't that a pattern match?

No.  It is a regular expression match.  When discussing shells and
adjacent tools, "pattern" almost always implies the syntax used for
filename generation, or an extension thereof.  (I only say "almost"
as a hedge.)


> [[ 'abcde' = (#i)ABcde ]] && echo match2
>
> ... that seems happy so it would seem that wildcards aren't required.

They are required if you want a partial-length match.

	% [[ abcde = (#i)ABcde ]]; print $?
	0
	% [[ abcde = (#i)Bcd ]]; print $?
	1
	% [[ abcde = (#i)*Bcd* ]]; print $?
	0
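
For comparison, here is a small sketch of the same idea from the regex
side: a glob has to cover the whole string, while a regex may match
anywhere unless it is anchored.  show_matches is a hypothetical helper
name used only for illustration.

```shell
show_matches() {
  local s=$1 re
  # Glob patterns are compared against the whole string.
  [[ $s = bcd ]]   && echo "glob, bare: match"    || echo "glob, bare: no match"
  [[ $s = *bcd* ]] && echo "glob, wrapped: match" || echo "glob, wrapped: no match"
  # Regexes may match anywhere unless anchored with ^ and $.
  re='bcd'
  [[ $s =~ $re ]]  && echo "regex, bare: match"     || echo "regex, bare: no match"
  re='^bcd$'
  [[ $s =~ $re ]]  && echo "regex, anchored: match" || echo "regex, anchored: no match"
}

show_matches abcde
```

So an unanchored regex behaves like a glob wrapped in `*...*`, and an
anchored regex behaves like a bare glob.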


>> In this specific case it's better to use pattern matching of course:
>>
>>      [[ $foo == (#i)*abc* ]] && echo match
>>
> That's what puzzles me.  I expect:
>
> [[ $foo == (#i)*abc* ]] && echo match
>
> and:
>
> [[ $foo =~ (#i)abc ]] && echo match
>
> ... to be exactly the same. If not, why not?

Globbing flags such as (#i) only work with globs.

Regular expression matching is done using an external library (either
PCRE or the host regex library).  These libraries can hardly be
expected to understand zsh globbing flags.
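
To see what the regex library makes of it, the sketch below treats
"(#i)Bcd" as an ordinary regular expression: a group holding the two
literal characters "#" and "i", followed by "Bcd".  The helper name
check is made up for the example.

```shell
re='(#i)Bcd'

check() {
  # $1 = subject string; report whether it matches $re as a regex
  if [[ $1 =~ $re ]]; then
    echo "match: $1"
  else
    echo "no match: $1"
  fi
}

check 'abcde'      # no match: abcde
check 'x#iBcdx'    # match: x#iBcdx
```

The second call matches only because the subject really contains the
text "#iBcd"; nothing about the match is case insensitive.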


-- 
vq



* Re: [[ 'abcde' =~ (#i)Bcd ]]
  2022-11-07 22:15     ` Lawrence Velázquez
@ 2022-11-08  1:57       ` Ray Andrews
  0 siblings, 0 replies; 13+ messages in thread
From: Ray Andrews @ 2022-11-08  1:57 UTC (permalink / raw)
  To: zsh-users


On 2022-11-07 14:15, Lawrence Velázquez wrote:
> Globbing flags such as (#i) only work with globs.
> Regular expression matching is done using an external library (either
> PCRE or the host regex library).  These libraries can hardly be
> expected to understand zsh globbing flags.

Ah!  So that's not even zsh's native opinion on the subject. There is a 
forgivable confusion there since filename globbing and pattern matching 
look so similar.    I often wonder when and where zsh relies on other 
libraries and programs.  This is a very good example of that sort of 
thing.  It short circuits any whining I might be tempted to do since 
it's not even zsh code.  Thanks, this is the sort of deep answer that 
parts many clouds.  I need to look to regex syntax for any answers I 
might want.




* Re: [[ 'abcde' =~ (#i)Bcd ]]
  2022-11-07 21:50   ` Lawrence Velázquez
@ 2022-11-08  2:05     ` Ray Andrews
  2022-11-08  8:19       ` Roman Perepelitsa
  0 siblings, 1 reply; 13+ messages in thread
From: Ray Andrews @ 2022-11-08  2:05 UTC (permalink / raw)
  To: zsh-users


On 2022-11-07 13:50, Lawrence Velázquez wrote:
> I concur with Roman. I doubt you actually need case-insensitive
> regex.
>
I'm happy with what I've got working at the moment tho you guys would 
probably improve it.  Pardon my personal jargon but:

local vvar=$( basename $cc[$aa] 2> /dev/null )

if   [[ "$scope_msg" = 'BROAD' && $vvar = (#i)*$filter* ]]; then
elif [[ "$scope_msg" = 'Case INsensitive TAME' && $vvar:u = $filter:u ]]; then
elif [[ "$scope_msg" = 'Case Sensitive WILD' && $vvar =~ $filter ]]; then
elif [[ "$scope_msg" = 'EXACT' && $vvar = $filter ]]; then
else cc[$aa]=
fi

... the function lets me search for directories with automatic 
wildcards and/or case sensitivity, or both, or neither.  The four 
combinations seem well handled above.  The construction is still clumsy, 
I'll fix it shortly.





* Re: [[ 'abcde' =~ (#i)Bcd ]]
  2022-11-08  2:05     ` Ray Andrews
@ 2022-11-08  8:19       ` Roman Perepelitsa
  2022-11-08 13:32         ` Ray Andrews
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Perepelitsa @ 2022-11-08  8:19 UTC (permalink / raw)
  To: Ray Andrews; +Cc: zsh-users

On Tue, Nov 8, 2022 at 3:06 AM Ray Andrews <rayandrews@eastlink.ca> wrote:
>
>
> On 2022-11-07 13:50, Lawrence Velázquez wrote:
> > I concur with Roman. I doubt you actually need case-insensitive
> > regex.
> >
> I'm happy with what I've got working at the moment tho you guys would
> probably improve it.  Pardon my personal jargon but:
>
> local vvar=$( basename $cc[$aa] 2> /dev/null )

There is a zsh way for this:

  local vvar=${cc[$aa]:t}

"t" is short for tail. There is also "h" for head.

> if   [[ "$scope_msg" = 'BROAD' && $vvar = (#i)*$filter* ]]; then
> elif [[ "$scope_msg" = 'Case INsensitive TAME' && $vvar:u = $filter:u ]]; then
> elif [[ "$scope_msg" = 'Case Sensitive WILD' && $vvar =~ $filter ]]; then
> elif [[ "$scope_msg" = 'EXACT' && $vvar = $filter ]]; then
> else cc[$aa]=
> fi

Here WILD suggests a wildcard (a.k.a. glob, a.k.a. pattern) match, but
the code is doing a regex match. If your intention is to perform a
wildcard/glob/pattern match, do this:

    [[ $vvar == $~filter ]]

Or, if you want to always perform a partial match:

    [[ $vvar == *$~filter* ]]

Other cases in your if-else chain also look suspiciously
non-orthogonal. The orthogonal bits of matching are:

1. Pattern matching or regex?
2. Case sensitive or not?
3. Partial or full?

There are a total of 8 combinations. If you drop regex (which you
probably want to do), it leaves 4 combinations.

    [[ $data == $~pattern ]]  # case sensitive, full
    [[ $data == (#i)$~pattern ]]  # case insensitive, full
    [[ $data == *$~pattern* ]]  # case sensitive, partial
    [[ $data == (#i)*$~pattern* ]]  # case insensitive, partial

Note that you don't need to quote $data here (although you can, if you
prefer to do it for stylistic reasons).

> ... the function lets me search for directories with automatic
> wildcards and/or case sensitivity  or both or neither.

There might be a better way to do this which would take advantage of
**/*. It's hard to say without knowing what you are trying to achieve.

Roman.



* Re: [[ 'abcde' =~ (#i)Bcd ]]
  2022-11-08  8:19       ` Roman Perepelitsa
@ 2022-11-08 13:32         ` Ray Andrews
  2022-11-08 14:37           ` Roman Perepelitsa
  0 siblings, 1 reply; 13+ messages in thread
From: Ray Andrews @ 2022-11-08 13:32 UTC (permalink / raw)
  To: zsh-users



>> local vvar=$( basename $cc[$aa] 2> /dev/null )
> There is a zsh way for this:
>
>    local vvar=${cc[$aa]:t}
>
> "t" is short for tail. There is also "h" for head.

Thanks, yes.  I knew zsh could do it; the use of basename was just a 
fill-in.  Anyway you did the work for me just there.  I was going to 
pattern match, but as usual zsh has a better way.


> Here WILD suggests a wildcard (a.k.a. glob, a.k.a. pattern) match, but
> the code is doing a regex match. If your intention is to perform a
> wildcard/glob/pattern match, do this:

Thing is that I need both.  Sometimes I'm searching for directories in a 
saved list, sometimes searching out there in the real world of globbing 
the filesystem.  What I showed was the search in the saved list.  My 
directory stack is file based, universal and persistent sorta like the 
history list but sometimes I want to go looking out on the FS too.  So 
yeah, 8 combinations :(  You'd think it might be four since in the mind 
it feels like a text search in both situations.


>
> There might be a better way to do this which would take advantage of
> **/*. It's hard to say without knowing what you are trying to achieve.

It's a directory 'cd' from my personal stack sent to Sebastian's 
n_list() for graphical selection.  I can't live without it.  But I 
decided to add live 'cd' from the entire filesystem, filtered via 
arguments with the same four combinations.  As you anticipated, I ran 
into the mud: I expected the syntax for the four combinations to be the 
same in both situations, but the latter is 'live globbing' whereas the 
former is just pattern matching against the lines of a file, so they 
are chalk and cheese.  It seems to be working but there's always the 
next gotcha:

     1 /aWorking/Zsh/Source/Wk 0 $ . c; c ,a zsh

     Searching entire system for directories matching "zsh" (BROAD):

... gives this n_list() screen:

-------------------------------------------------------------------------------------------------------

: Most recently visited directories matching "zsh" (BROAD):

/aWorking/Backup/Zsh
/aWorking/Zsh-55555
/aWorking/Zsh
/usr/share/zsh
/aWorking/garbageZSH
/aWorking/Zsh/Zsh-5.8
/usr/share/doc/zsh-common

: System wide directories matching "zsh" (BROAD):

/aMisc/Backup-root-2022-10-11/.thunderbird/i3n1gea2.Default User/Mail/Local Folders/ZSH.sbd
/aWorking/Backup/Zsh
/aWorking/Backup/Zsh/Zsh-5.8
/aWorking/Backup/Zsh/Zsh-5.8/share/zsh
/aWorking/garbageZSH
/aWorking/Zsh
/aWorking/Zsh-55555
/aWorking/Zsh/Zsh-5.8
/aWorking/Zsh/Zsh-5.8/share/zsh
/etc/zzsh
/root/.thunderbird/i3n1gea2.Default User/Mail/Local Folders/ZSH.sbd
/usr/lib/x86_64-linux-gnu/zsh
/usr/lib/x86_64-linux-gnu/zsh/5.8/zsh
/usr/local/share/zsh
/usr/share/doc/zsh
/usr/share/doc/zsh-common
/usr/share/zsh
/usr/share/zsh/functions/Completion/Zsh

-------------------------------------------------------------------------------------------------------------

... cursor up,  cursor down, pick a directory, press ENTER and you're 
there automagically.

Or I can demand an exact search (no card sharping, no advice on how to 
be insensitive):


1 /aWorking/Zsh/Source/Wk 0 $ . c; c ,Xa zsh
Searching entire system for directories matching "zsh" (EXACT):

------------------------------------------------------------------------------------------------------------------

: Most recently visited directories matching "zsh" (EXACT):

/usr/share/zsh

: System wide directories matching "zsh" (EXACT):

/aWorking/Backup/Zsh/Zsh-5.8/share/zsh
/aWorking/Zsh/Zsh-5.8/share/zsh
/usr/lib/x86_64-linux-gnu/zsh
/usr/lib/x86_64-linux-gnu/zsh/5.8/zsh
/usr/local/share/zsh
/usr/share/doc/zsh
/usr/share/zsh

--------------------------------------------------------------------------------------------------------------------

... so far, so good.






* Re: [[ 'abcde' =~ (#i)Bcd ]]
  2022-11-08 14:37           ` Roman Perepelitsa
@ 2022-11-08 14:30             ` Ray Andrews
  0 siblings, 0 replies; 13+ messages in thread
From: Ray Andrews @ 2022-11-08 14:30 UTC (permalink / raw)
  To: zsh-users


On 2022-11-08 06:37, Roman Perepelitsa wrote:
>
> Can you give an example of a use case where you are using a regex and
> cannot use a pattern instead? None of the examples you already listed
> would qualify.

Let me chew over what I've learned just yesterday and today and see how 
it shakes out then backatcha.  It's huge just conceptualizing the 
difference.  As I said, I've tended to think of globbing and regex and 
pattern matching as more or less the same thing.  :(





* Re: [[ 'abcde' =~ (#i)Bcd ]]
  2022-11-08 13:32         ` Ray Andrews
@ 2022-11-08 14:37           ` Roman Perepelitsa
  2022-11-08 14:30             ` Ray Andrews
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Perepelitsa @ 2022-11-08 14:37 UTC (permalink / raw)
  To: Ray Andrews; +Cc: zsh-users

On Tue, Nov 8, 2022 at 2:32 PM Ray Andrews <rayandrews@eastlink.ca> wrote:
>
> > Here WILD suggests a wildcard (a.k.a. glob, a.k.a. pattern) match, but
> > the code is doing a regex match.
>
> Thing is that I need both.

Can you give an example of a use case where you are using a regex and
cannot use a pattern instead? None of the examples you already listed
would qualify.

Roman.



* Re: [[ 'abcde' =~ (#i)Bcd ]]
  2022-11-07 21:10 [[ 'abcde' =~ (#i)Bcd ]] Ray Andrews
  2022-11-07 21:26 ` Roman Perepelitsa
@ 2022-11-08 17:40 ` Phil Pennock
  2022-11-08 18:43   ` Ray Andrews
  1 sibling, 1 reply; 13+ messages in thread
From: Phil Pennock @ 2022-11-08 17:40 UTC (permalink / raw)
  To: zsh-users

On 2022-11-07 at 13:10 -0800, Ray Andrews wrote:
> [[ 'abcde' =~ 'bcd' ]] && echo match1
> [[ 'abcde' = (#i)ABcde ]] && echo match2
> [[ 'abcde' =~ (#i)Bcd ]] && echo match3
> [[ 'bcd' =~ 'abcde' ]] && echo match4
> 
> ... I get match 1 and match 2.  I  understand not getting match 4 because
> '=~' is not bi-directional, the latter value must be a subset of the
> former.  But why don't I get match 3? It seems to break no rules to make
> 'Bcd' case insensitive and then find it within 'abcde'.  Is there a
> workaround?

 *  =   : equivalent to "==", string comparison with globs supported
 *  =~  : regular expression match, syntax from Perl, used in bash
 *  -regex-match : explicit POSIX ERE match (zsh/regex module)
 *  -pcre-match : explicit PCRE match (zsh/pcre module)

In zsh, = and == came first, then -pcre-match.
The =~ operator from Perl was added to bash and I added support to zsh,
and wrote the zsh/regex module so that _by default_ zsh would be
compatible with bash.

Using `setopt rematch_pcre` will switch =~ from bash-compatible to using
PCRE, Perl Compatible Regular Expressions, so much closer to the
original =~.

The downside of PCRE in zsh is that for licensing reasons, not all
distributions include it.  Zsh itself is BSD-licensed, PCRE is not.

If PCRE is available, then:

  [[ 'abcde' =~ (?i)Bcd ]] && echo match3

Use `man pcrepattern` and look at "INTERNAL OPTION SETTING" to see how
(?something) turns on options, with 'i' being PCRE_CASELESS.  This
syntax matches Perl.

If PCRE is not available, then you are stuck with ERE syntax
(see `man 7 regex`) and you'll have to be a lot more explicit.  So
probably better to find ways to use zsh glob pattern matching instead of
regular expressions.
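
As a rough illustration of "more explicit", plain ERE can spell out
both cases of each letter with bracket expressions.  The variable name
re is arbitrary.

```shell
# Each bracket expression accepts either case of one letter.
re='[Bb][Cc][Dd]'

[[ 'abcde' =~ $re ]] && echo match3
```

That gets unwieldy quickly for longer strings, which is another reason
to prefer (#i) with a glob pattern where it will do.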

-Phil



* Re: [[ 'abcde' =~ (#i)Bcd ]]
  2022-11-08 17:40 ` Phil Pennock
@ 2022-11-08 18:43   ` Ray Andrews
  0 siblings, 0 replies; 13+ messages in thread
From: Ray Andrews @ 2022-11-08 18:43 UTC (permalink / raw)
  To: zsh-users


On 2022-11-08 09:40, Phil Pennock wrote:
>
> Using `setopt rematch_pcre` will switch =~ from bash-compatible to using
> PCRE, Perl Compatible Regular Expressions, so much closer to the
> original =~.

So complicated!  Not just zsh as she is, but all that history and the 
variations on the theme.  I've always wondered what PCRE is, now I 
know.  Being even vaguely informed about all this stuff is useful tho, 
it puts all issues in their cultural context.  Things might not be as 
clean and clear as one might wish.




