input foo, output '[F|f][O|o][O|o]'?


* input foo, output '[F|f][O|o][O|o]'?
From: TJ Luoma @ 2013-07-01 17:59 UTC (permalink / raw)
  To: Zsh-Users List

Before I reinvent the wheel, I thought I'd ask if someone already had 
(or knew of) a way to take a string of characters and output a 'case 
insensitive' regex version.

For example, if I input 'CrashPlan' I'd want to get out 
[C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n]

(Input will usually be ASCII letters, with an occasional number and 
perhaps the occasional '-' or '_' but doesn't need to handle anything 
more complex than that.)

I tried Google but found it pretty impossible to make a good query for 
something like this.

TjL


* Re: input foo, output '[F|f][O|o][O|o]'?
From: ZyX @ 2013-07-01 18:44 UTC (permalink / raw)
  To: TJ Luoma; +Cc: Zsh-Users List

I do not think you will find a way to do this. Every regex engine I know of (more precisely, every program that uses one) supports a way to set case sensitivity for the whole regular expression, and some support toggling it for part of an expression. In grep this is the -i switch; in vim it is either /i or \c/\C; in sed it is /i; PCRE and perl additionally have the (?i) "atom" (on top of other ways of toggling the behavior, which depend heavily on the program embedding PCRE; for perl this is the usual /i flag). Even zsh globs support (#i).

By the way, which regex engine is your output for? Every one I am aware of parses "[N|n]" as "any one of three characters: N, n, or pipe".

I also suspect an XY problem here: what do you need such a conversion for? You should consider lowercasing the tested string if nothing like -i is available.
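
As a minimal sketch of the lowercasing approach, assuming only ASCII input: fold both strings with tr and compare them. The function name same_ignoring_case is made up for this illustration, and the code sticks to portable shell so it should behave the same in zsh and other POSIX-style shells.

```shell
# same_ignoring_case is a hypothetical helper, not an existing command.
# It succeeds when its two arguments are equal once both are lowercased.
# The body runs in a subshell ( ... ) so the variables stay local.
same_ignoring_case() (
    a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "$a" = "$b" ]
)

if same_ignoring_case "cRAShpLAn" "CrashPlan"; then
    echo "matched"
else
    echo "no match"
fi
```

This only helps when you control the comparison yourself; it does nothing for a program that takes a pattern and offers no way to preprocess the tested string.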



* Re: input foo, output '[F|f][O|o][O|o]'?
From: TJ Luoma @ 2013-07-02  0:58 UTC (permalink / raw)
  To: ZyX; +Cc: Zsh-Users List


On 1 Jul 2013, at 14:44, ZyX wrote:

> I do not think you will find a way to do this. All regex engines 
> (precisely, programs that are using them) I know support a way to set 
> case sensitivity for the whole regular expression and some support 
> toggling this for a part of regular expression. In grep this is an -i 
> switch, for vim it is either /i or \c/\C, for sed this is /i, for PCRE 
> and perl this is additionally (?i) "atom" (additionally to other ways 
> of toggling the behavior which are highly dependent on programs 
> embedding PCRE; for perl this is usual /i flag). Even zsh globs do 
> support (#i).

Phil Pennock's version worked great:

% foo=CrashPlan
% for c in ${(s::)foo}; do print -n "[${(U)c}|${(L)c}]";done; print
[C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n]
%


> By the way, what regex engine is your output for? Any I am aware of 
> parse "[N|n]" as "either one of three characters: N, n, or pipe".

Really? I can think of several that support it. Maybe it's because I'm 
old enough to remember when a lot of these utilities didn't have 'ignore 
case'

% echo "foo\nbar\nbat" | egrep -v '[F|f]'
bar
bat

% echo "foo\nbar\nbat" | sed 's#[F|f][O|o][O|o]#XXX#g'
XXX
bar
bat

You can also use it for matching case/esac :

case "$i" in
	[C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n])
			echo "matched crashplan"
	;;

	*)
			echo "No Match"
	;;

esac


> I am also assuming XY problem here: what for do you need such 
> conversion? You should consider lowercasing the tested string if 
> nothing like -i is available.

It's for use with the AddDescription directive for .htaccess which (from 
what I understand) takes its case sensitivity from the underlying 
filesystem when matching filenames. There's no "ignore case" flag or 
anything else that I can use with it, so my only option (at least, the 
only one I can think of) is the one that I suggested. For example, if I 
wanted to add this for any files which start with 'BBEdit' (case 
insensitive) this is what I'd need to use:

	AddDescription "<a href='http://barebones.com/bbedit'>A text editor that doesn't suck</a>" [B|b][B|b][E|e][D|d][I|i][T|t]*

I verified that it works, but typing that stuff manually is tedious and 
highly error prone, which made it the perfect place for a shell script 
:-)

TjL



* Re: input foo, output '[F|f][O|o][O|o]'?
From: Lawrence Velázquez @ 2013-07-02  1:30 UTC (permalink / raw)
  To: TJ Luoma; +Cc: zsh-users

On Jul 1, 2013, at 8:58 PM, TJ Luoma <luomat@gmail.com> wrote:

> On 1 Jul 2013, at 14:44, ZyX wrote:
> 
>> By the way, what regex engine is your output for? Any I am aware of parse "[N|n]" as "either one of three characters: N, n, or pipe".
> 
> Really? I can think of several that support it. Maybe it's because I'm old enough to remember when a lot of these utilities didn't have 'ignore case'
> 
> % echo "foo\nbar\nbat" | egrep -v '[F|f]'
> bar
> bat
> 
> % echo "foo\nbar\nbat" | sed 's#[F|f][O|o][O|o]#XXX#g'
> XXX
> bar
> bat

These regexes do not match what you think they match. "[F|f]" does match both lowercase and uppercase F but, as ZyX said, it also matches pipe characters:

    % echo "foo\nbar\nbaz\nbar|||baz" | egrep -v '[F|f]'
    bar
    baz

    % echo "foo\nbar\nbaz\nbar|||baz" | sed 's/[F|f][O|o][O|o]/XXX/g'
    XXX
    bar
    baz
    barXXXbaz

You are probably thinking of "(F|f)"; you should just use "[Ff]".
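
To see the three forms side by side, here is a small sketch that counts matching lines with grep -E. The sample input is invented for the illustration, and printf is used instead of echo so the newlines do not depend on the shell's echo behavior.

```shell
# Three input lines: one with 'f', one with 'F', one with pipes but no F/f.
input=$(printf '%s\n' 'foo' 'Foo' 'bar|||baz')

# Bracket expression containing a pipe: the pipe is a literal member,
# so the pipe-only line is counted as well.
printf '%s\n' "$input" | grep -E -c '[F|f]'

# Alternation in a group: only the lines containing F or f are counted.
printf '%s\n' "$input" | grep -E -c '(F|f)'

# Plain bracket expression: same result as the alternation, and shorter.
printf '%s\n' "$input" | grep -E -c '[Ff]'
```

The first command should report one more matching line than the other two, because of the line that contains only pipes.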

> You can also use it for matching case/esac :
> 
> case "$i" in
> 	[C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n])
> 			echo "matched crashplan"
> 	;;
> 
> 	*)
> 			echo "No Match"
> 	;;
> 
> esac

Same misconception here; pipes in patterns' bracket expressions have no special meaning. You can remove all the pipes from your example, and it would still work.

    % case "crasHPLan" in
    case>           [Cc][Rr][Aa][Ss][Hh][Pp][Ll][Aa][Nn]) echo "matched";;
    case>           *) echo "did not match";;
    case> esac
    matched

vq


* Re: input foo, output '[F|f][O|o][O|o]'?
From: Benjamin R. Haskell @ 2013-07-02  1:53 UTC (permalink / raw)
  To: TJ Luoma; +Cc: ZyX, Zsh-Users List

On Mon, 1 Jul 2013, TJ Luoma wrote:

>
> On 1 Jul 2013, at 14:44, ZyX wrote:
>
>> [...]
>> By the way, what regex engine is your output for? Any I am aware of 
>> parse "[N|n]" as "either one of three characters: N, n, or pipe".
>
> Really? I can think of several that support it. Maybe it's because I'm old 
> enough to remember when a lot of these utilities didn't have 'ignore case'

I think you've missed the point of "N, n, or pipe".  [F|f] matches upper 
'F' or lower 'f', but also the character '|'.  You seem to be 
conflating:

[xyz] - 'x' or 'y' or 'z'

with:

(x|y|z) - 'x' or 'y' or 'z'

The '|' doesn't mean 'or' within square brackets.  It means the literal 
character: '|'.


> % echo "foo\nbar\nbat" | egrep -v '[F|f]'
> bar
> bat

% echo "foo\nb|r\nbat" | egrep -v '[F|f]'
bat
(It rejected 'b|r', because it contains '|', even though it doesn't 
contain 'F' or 'f')


> % echo "foo\nbar\nbat" | sed 's#[F|f][O|o][O|o]#XXX#g'
> XXX
> bar
> bat

% echo "f|o\nbar\nbat" | sed 's#[F|f][O|o][O|o]#XXX#g'
XXX
bar
bat
(It changed 'f|o' to 'XXX', despite '|' not being 'O' or 'o')


> You can also use it for matching case/esac :
>
> case "$i" in
> 	[C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n])
> 			echo "matched crashplan"
> 	;;
>
> 	*)
> 			echo "No Match"
> 	;;
>
> esac

(Using a smaller example:)

i='|||'

case "$i" in
   [F|f][O|o][O|o]) echo matched foo ;;
   *) echo no match ;;
esac

will echo:
matched foo

You really just want:

[Cc][Rr][Aa][Ss][Hh][Pp][Ll][Aa][Nn])



* Re: input foo, output '[F|f][O|o][O|o]'?
From: TJ Luoma @ 2013-07-02  4:24 UTC (permalink / raw)
  To: Benjamin R. Haskell; +Cc: ZyX, Zsh-Users List


On 1 Jul 2013, at 21:53, Benjamin R. Haskell wrote:

> I think you've missed the point of "N, n, or pipe".  [F|f] matches 
> upper 'F' or lower 'f', but also the character '|'.  You seem to be 
> conflating:
>
> [xyz] - 'x' or 'y' or 'z'
>
> with:
>
> (x|y|z) - 'x' or 'y' or 'z'
>
> The '|' doesn't mean 'or' within square brackets.  It means the 
> literal character: '|'.

Oh FFS… yes… sorry… that's exactly what I was doing. Sorry about 
that. I kept reading what I was thinking about instead of what I'd 
written.

*sigh*

(It's been a long week for it to be only just past Monday :-)

TjL



* Re: input foo, output '[F|f][O|o][O|o]'?
From: ZyX @ 2013-07-02  3:51 UTC (permalink / raw)
  To: TJ Luoma; +Cc: Zsh-Users List



02.07.13, 04:58, "TJ Luoma" <luomat@gmail.com>":
> 
> 
> On 1 Jul 2013, at 14:44, ZyX wrote:
> 
> > I do not think you will find a way to do this. All regex engines 
> > (precisely, programs that are using them) I know support a way to set 
> > case sensitivity for the whole regular expression and some support 
> > toggling this for a part of regular expression. In grep this is an -i 
> > switch, for vim it is either /i or \c/\C, for sed this is /i, for PCRE 
> > and perl this is additionally (?i) "atom" (additionally to other ways 
> > of toggling the behavior which are highly dependent on programs 
> > embedding PCRE; for perl this is usual /i flag). Even zsh globs do 
> > support (#i).
> 
> Phil Pennock's version worked great:
> 
> % foo=CrashPlan
> % for c in ${(s::)foo}; do print -n "[${(U)c}|${(L)c}]";done; print
> [C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n]
> %

This is reinventing the wheel. You asked for a standard way; you will not find one.

> 
> 
> > By the way, what regex engine is your output for? Any I am aware of 
> > parse "[N|n]" as "either one of three characters: N, n, or pipe".
> 
> Really? I can think of several that support it. Maybe it's because I'm 
> old enough to remember when a lot of these utilities didn't have 'ignore 
> case'
> 
> % echo "foo\nbar\nbat" | egrep -v '[F|f]'
> bar
> bat
> 
> % echo "foo\nbar\nbat" | sed 's#[F|f][O|o][O|o]#XXX#g'
> XXX
> bar
> bat
> 
> You can also use it for matching case/esac :
> 
> case "$i" in
> 	[C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n])
> 			echo "matched crashplan"
> 	;;
> 
> 	*)
> 			echo "No Match"
> 	;;
> 
> esac

That does not disprove it. Check it with a pipe symbol in the input.

> 
> 
> > I am also assuming XY problem here: what for do you need such 
> > conversion? You should consider lowercasing the tested string if 
> > nothing like -i is available.
> 
> It's for use with the AddDescription directive for .htaccess which (from 
> what I understand) takes its case sensitivity from the underlying 
> filesystem when matching filenames. There's no "ignore case" flag or 
> anything else that I can use with it, so my only option (at least, the 
> only one I can think of) is the one that I suggested. For example, if I 
> wanted to add this for any files which start with 'BBEdit' (case 
> insensitive) this is what I'd need to use:
> 
> 	AddDescription "<a href='http://barebones.com/bbedit'>A text editor 
> that doesn't suck</a>" [B|b][B|b][E|e][D|d][I|i][T|t]*
> 
> I verified that it works, but typing that stuff manually is tedious and 
> highly error prone, which made it the perfect place for a shell script 
> :-)

Unless AddDescription uses a different regex engine than FilesMatch, the answer is in the first result when searching for "htaccess case insensitive regex": (?i:pattern).



* Re: input foo, output '[F|f][O|o][O|o]'?
From: ZyX @ 2013-07-02  4:00 UTC (permalink / raw)
  To: TJ Luoma; +Cc: Zsh-Users List



02.07.13, 07:51, "ZyX" <kp-pav@yandex.ru>":
> 
> 
> 
> Unless AddDescription uses different regex engine then FilesMatch answer is in the first link if searching for "htaccess case insensitive regex": (?i:pattern).

It seems it does use a different regex engine.



* Re: input foo, output '[F|f][O|o][O|o]'?
From: Phil Pennock @ 2013-07-02 23:24 UTC (permalink / raw)
  To: ZyX; +Cc: TJ Luoma, Zsh-Users List

On 2013-07-02 at 08:00 +0400, ZyX wrote:
> > > Phil Pennock's version worked great:
> > > 
> > > % foo=CrashPlan
> > > % for c in ${(s::)foo}; do print -n "[${(U)c}|${(L)c}]";done; print
> > > [C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n]

Sorry, I was in a rush and missed that this was [A|B] which should of
course be written [AB] or (A|B) if the regexp language supports the
latter.  I should have caught that, instead of answering exactly what
was asked.
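
A corrected generator might look like the following sketch, which emits [Cc] rather than [C|c]. It is written in portable shell (parameter expansion plus tr) instead of the zsh-specific ${(s::)foo} form, so it should also run outside zsh. The function name ci_glob is hypothetical, and the sketch assumes the input described in the original question: ASCII letters, digits, '-' and '_'.

```shell
# ci_glob is a hypothetical name. It prints a pattern that matches its
# argument regardless of ASCII letter case, e.g. Ab -> [Aa][Bb].
# The body runs in a subshell ( ... ) so the variables stay local.
ci_glob() (
    rest=$1
    out=
    while [ -n "$rest" ]; do
        tail=${rest#?}           # everything after the first character
        c=${rest%"$tail"}        # the first character on its own
        rest=$tail
        u=$(printf '%s' "$c" | tr '[:lower:]' '[:upper:]')
        l=$(printf '%s' "$c" | tr '[:upper:]' '[:lower:]')
        if [ "$u" = "$l" ]; then
            out="${out}${c}"     # no case distinction (digit, '-', '_')
        else
            # ${out} is braced on purpose: zsh would read a bare $out[...]
            # as an array subscript.
            out="${out}[${u}${l}]"
        fi
    done
    printf '%s\n' "$out"
)

ci_glob CrashPlan
```

Characters that are special inside a pattern (such as '[' or '*') are passed through unescaped, so this is only suitable for the simple names described above.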

> > Unless AddDescription uses different regex engine then FilesMatch
> > answer is in the first link if searching for "htaccess case
> > insensitive regex": (?i:pattern).
> 
> It seems it does use different regex engine.

AddDescription does not take a regex.  It takes a filename pattern, or
what in shell is called a Glob.

It happens that some shells use [AB] as a glob pattern too, also to
introduce a character class, and that's why it works -- Apache supports
that syntax also.
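
A shell case statement uses the same kind of bracket expression, so it gives a rough way to try such a pattern before putting it into .htaccess. This is only an approximation, under the assumption that the shell and Apache agree on simple [..] and * patterns; the function name matches_foo and the sample filenames are made up for the illustration.

```shell
# matches_foo is a hypothetical helper: it succeeds when the name starts
# with "foo" in any mix of upper and lower case.
matches_foo() {
    case "$1" in
        [Ff][Oo][Oo]*) return 0 ;;
        *)             return 1 ;;
    esac
}

# 'f|o.txt' is included to show that, without pipes inside the brackets,
# a literal pipe in the name no longer matches.
for name in foo.txt FOO.TXT 'f|o.txt' bar.txt; do
    if matches_foo "$name"; then
        echo "$name: matched"
    else
        echo "$name: no match"
    fi
done
```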

Those places in Apache that do take regexps use the PCRE engine, the one
written by Philip Hazel for Exim, and which zsh also supports with
"zmodload zsh/pcre" (or setting the option to change =~ to use it,
"setopt rematch_pcre", which will auto-load that module when you first
use =~).

So those places in Apache which want regexps, you can test with zsh to
get a decent approximation, or use the pcretest(1) tool from the PCRE
distribution to get something designed to interactively test regexps
against inputs.

If you want to settle on that syntax, also consider installing the
pcregrep tool.  It's very nice to be able to relax and just use PCRE
syntax, even though the PCRE implementation is not as efficient as the
older tools (or the newer RE2 system).

-Phil


* Re: input foo, output '[F|f][O|o][O|o]'?
From: Phil Pennock @ 2013-07-01 19:37 UTC (permalink / raw)
  To: TJ Luoma; +Cc: Zsh-Users List

On 2013-07-01 at 13:59 -0400, TJ Luoma wrote:
> Before I reinvent the wheel, I thought I'd ask if someone already had 
> (or knew of) a way to take a string of characters and output a 'case 
> insensitive' regex version.
> 
> For example, if I input 'CrashPlan' I'd want to get out 
> [C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n]

% foo=CrashPlan
% for c in ${(s::)foo}; do print -n "[${(U)c}|${(L)c}]";done; print
[C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n]
%

If this is for use within zsh, then as ZyX suggests it's simpler, even
for regexp cases:

 setopt rematch_pcre
 [[ "cRAShpLAn" =~ (?i)$foo ]]

-Phil



* Re: input foo, output '[F|f][O|o][O|o]'?
From: TJ Luoma @ 2013-07-02  0:24 UTC (permalink / raw)
  To: Phil Pennock; +Cc: Zsh-Users List


On 1 Jul 2013, at 15:37, Phil Pennock wrote:

> % foo=CrashPlan
> % for c in ${(s::)foo}; do print -n "[${(U)c}|${(L)c}]";done; print
> [C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n]
> %

That works great.

Thanks!


> If this is for use within zsh, then as ZyX suggests it's simpler, even
> for regexp cases:
>
> setopt rematch_pcre
> [[ "cRAShpLAn" =~ (?i)$foo ]]

It isn't… but that's helpful for future reference.

TjL



* Re: input foo, output '[F|f][O|o][O|o]'?
From: Alex Satrapa @ 2013-07-01 23:51 UTC (permalink / raw)
  To: TJ Luoma; +Cc: Zsh-Users List

Since the topic is Regular Expressions, I will take the opportunity to recommend the O'Reilly book, "Mastering Regular Expressions" by Jeffrey Friedl. Even if you only work through the first few chapters (it provides examples for you to play with and learn), it will be worth the investment.

I am a very happy "student" (disciple, even) of Jeffrey Friedl's writing, and I have many O'Reilly books in my library.

Alex Satrapa



* Re: input foo, output '[F|f][O|o][O|o]'?
From: Kurtis Rader @ 2013-07-02  0:38 UTC (permalink / raw)
  To: Alex Satrapa; +Cc: TJ Luoma, Zsh-Users List


+1 to Alex's recommendation. The optimal answer depends on the capabilities of
the regex engine you're targeting. In zsh, for example, you can use the #i
modifier to make globs case insensitive. There's also the zsh/pcre module,
which provides Perl-compatible regex support, including case-insensitive
matches. In short, for any reasonable regex implementation there is no need
to resort to something as clumsy as what you propose. Keep in mind too that
for anything other than ASCII this is impossible, for all practical
purposes, to implement correctly using the strategy you had in mind.





-- 
Kurtis Rader
Caretaker of the exceptional canines Junior and Chino


* Re: input foo, output '[F|f][O|o][O|o]'?
From: TJ Luoma @ 2013-07-02  1:11 UTC (permalink / raw)
  To: Zsh-Users List

On 1 Jul 2013, at 20:38, Kurtis Rader wrote:

> In short, for any reasonable regex implementation there is no need to 
> resort to something as clumsy  as what you propose.

You would certainly be correct in the vast majority of cases. However, I 
happen to be dealing with one of the small percentages of exceptions, as 
the AddDescription implementation in Apache does not have a 'reasonable 
regex' implementation for this, as far as I have been able to tell.

> Keep in mind to that for anything other than ASCII this is impossible, 
> for all practical purposes, to implement correctly using the strategy 
> you had in mind.

Yup. But these are filenames on a web server and will only be ASCII.

Thanks again

TjL

