* Re: input foo, output '[F|f][O|o][O|o]'?
From: Lawrence Velázquez @ 2013-07-02 1:30 UTC
To: TJ Luoma; +Cc: zsh-users
On Jul 1, 2013, at 8:58 PM, TJ Luoma <luomat@gmail.com> wrote:
> On 1 Jul 2013, at 14:44, ZyX wrote:
>
>> By the way, what regex engine is your output for? Any I am aware of parse "[N|n]" as "either one of three characters: N, n, or pipe".
>
> Really? I can think of several that support it. Maybe it's because I'm old enough to remember when a lot of these utilities didn't have 'ignore case'
>
> % echo "foo\nbar\nbat" | egrep -v '[F|f]'
> bar
> bat
>
> % echo "foo\nbar\nbat" | sed 's#[F|f][O|o][O|o]#XXX#g'
> XXX
> bar
> bat
These regexes do not match what you think they match. Sure, "[F|f]" matches both lowercase and uppercase F, but as ZyX said, it also matches pipe characters:
% echo "foo\nbar\nbaz\nbar|||baz" | egrep -v '[F|f]'
bar
baz
% echo "foo\nbar\nbaz\nbar|||baz" | sed 's/[F|f][O|o][O|o]/XXX/g'
XXX
bar
baz
barXXXbaz
You are probably thinking of "(F|f)"; you should just use "[Ff]".
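A minimal sketch of the difference, assuming a POSIX `grep -E`; the `sample` helper and its input lines are made up for illustration. It runs the same input through the three-character class, the intended two-character class, and the alternation:

```sh
# Hypothetical sample input: two spellings of "foo", plus lines without any F.
sample() { printf '%s\n' foo Foo bar 'bar|||baz'; }

# [F|f] is a three-character class (F, f, or |), so the pipe line is dropped too.
sample | grep -E -v '[F|f]'
# prints: bar

# [Ff] is the intended two-character class; the pipe line survives.
sample | grep -E -v '[Ff]'
# prints: bar, then bar|||baz

# (F|f) is alternation; in an ERE it selects the same lines as [Ff].
sample | grep -E -v '(F|f)'
# prints: bar, then bar|||baz
```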
> You can also use it for matching case/esac :
>
> case "$i" in
> [C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n])
> echo "matched crashplan"
> ;;
>
> *)
> echo "No Match"
> ;;
>
> esac
Same misconception here; pipes in patterns' bracket expressions have no special meaning. You can remove all the pipes from your example, and it would still work.
% case "crasHPLan" in
case> [Cc][Rr][Aa][Ss][Hh][Pp][Ll][Aa][Nn]) echo "matched";;
case> *) echo "did not match";;
case> esac
matched
vq
* Re: input foo, output '[F|f][O|o][O|o]'?
From: Benjamin R. Haskell @ 2013-07-02 1:53 UTC
To: TJ Luoma; +Cc: ZyX, Zsh-Users List
On Mon, 1 Jul 2013, TJ Luoma wrote:
>
> On 1 Jul 2013, at 14:44, ZyX wrote:
>
>> [...]
>> By the way, what regex engine is your output for? Any I am aware of
>> parse "[N|n]" as "either one of three characters: N, n, or pipe".
>
> Really? I can think of several that support it. Maybe it's because I'm old
> enough to remember when a lot of these utilities didn't have 'ignore case'
I think you've missed the point of "N, n, or pipe". [F|f] matches upper
'F' or lower 'f', but also the character '|'. You seem to be
conflating:
[xyz] - 'x' or 'y' or 'z'
with:
(x|y|z) - 'x' or 'y' or 'z'
The '|' doesn't mean 'or' within square brackets. It means the literal
character: '|'.
> % echo "foo\nbar\nbat" | egrep -v '[F|f]'
> bar
> bat
% echo "foo\nb|r\nbat" | egrep -v '[F|f]'
bat
(It rejected 'b|r', because it contains '|', even though it doesn't
contain 'F' or 'f')
> % echo "foo\nbar\nbat" | sed 's#[F|f][O|o][O|o]#XXX#g'
> XXX
> bar
> bat
% echo "f|o\nbar\nbat" | sed 's#[F|f][O|o][O|o]#XXX#g'
XXX
bar
bat
(It changed 'f|o' to 'XXX', despite '|' not being 'O' or 'o')
> You can also use it for matching case/esac :
>
> case "$i" in
> [C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n])
> echo "matched crashplan"
> ;;
>
> *)
> echo "No Match"
> ;;
>
> esac
(Using a smaller example:)
i='|||'
case "$i" in
[F|f][O|o][O|o]) echo matched foo ;;
*) echo no match ;;
esac
will echo:
matched foo
You really just want:
[Cc][Rr][Aa][Ss][Hh][Pp][Ll][Aa][Nn])
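A possible alternative, sketched here under the assumption that a plain lowercase comparison is acceptable (the lowercasing idea is raised elsewhere in this thread): fold the value to lowercase once and match a single literal. The function name `matches_crashplan` is hypothetical.

```sh
# Returns 0 if $1 equals "crashplan" in any letter case, 1 otherwise.
# Note: "lowered" is a global variable; POSIX sh has no "local".
matches_crashplan() {
  lowered=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lowered" in
    crashplan) return 0 ;;
    *)         return 1 ;;
  esac
}

if matches_crashplan "crasHPLan"; then echo "matched crashplan"; fi
if ! matches_crashplan "|||||||||"; then echo "no match for pipes"; fi
```

This only helps where the script controls the string being tested; it does not apply to a configuration file that needs a pattern.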
* Re: input foo, output '[F|f][O|o][O|o]'?
From: TJ Luoma @ 2013-07-02 4:24 UTC
To: Benjamin R. Haskell; +Cc: ZyX, Zsh-Users List
On 1 Jul 2013, at 21:53, Benjamin R. Haskell wrote:
> I think you've missed the point of "N, n, or pipe". [F|f] matches
> upper 'F' or lower 'f', but also the character '|'. You seem to be
> conflating:
>
> [xyz] - 'x' or 'y' or 'z'
>
> with:
>
> (x|y|z) - 'x' or 'y' or 'z'
>
> The '|' doesn't mean 'or' within square brackets. It means the
> literal character: '|'.
Oh FFS… yes… sorry… that's exactly what I was doing. Sorry about
that. I kept reading what I was thinking about instead of what I'd
written.
*sigh*
(It's been a long week for it to be only just past Monday :-)
TjL
* Re: input foo, output '[F|f][O|o][O|o]'?
From: ZyX @ 2013-07-02 3:51 UTC
To: TJ Luoma; +Cc: Zsh-Users List
02.07.13, 04:58, "TJ Luoma" <luomat@gmail.com>:
>
>
> On 1 Jul 2013, at 14:44, ZyX wrote:
>
> > I do not think you will find a way to do this. All regex engines
> > (precisely, programs that are using them) I know support a way to set
> > case sensitivity for the whole regular expression and some support
> > toggling this for a part of regular expression. In grep this is an -i
> > switch, for vim it is either /i or \c/\C, for sed this is /i, for PCRE
> > and perl this is additionally (?i) "atom" (additionally to other ways
> > of toggling the behavior which are highly dependent on programs
> > embedding PCRE; for perl this is usual /i flag). Even zsh globs do
> > support (#i).
>
> Phil Pennock's version worked great:
>
> % foo=CrashPlan
> % for c in ${(s::)foo}; do print -n "[${(U)c}|${(L)c}]";done; print
> [C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n]
> %
This is reinventing the wheel. You asked for a standard way; you will not find one.
>
>
> > By the way, what regex engine is your output for? Any I am aware of
> > parse "[N|n]" as "either one of three characters: N, n, or pipe".
>
> Really? I can think of several that support it. Maybe it's because I'm
> old enough to remember when a lot of these utilities didn't have 'ignore
> case'
>
> % echo "foo\nbar\nbat" | egrep -v '[F|f]'
> bar
> bat
>
> % echo "foo\nbar\nbat" | sed 's#[F|f][O|o][O|o]#XXX#g'
> XXX
> bar
> bat
>
> You can also use it for matching case/esac :
>
> case "$i" in
> [C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n])
> echo "matched crashplan"
> ;;
>
> *)
> echo "No Match"
> ;;
>
> esac
That does not disprove it. Try it with a pipe symbol in the input.
>
>
> > I am also assuming XY problem here: what for do you need such
> > conversion? You should consider lowercasing the tested string if
> > nothing like -i is available.
>
> It's for use with the AddDescription directive for .htaccess which (from
> what I understand) takes its case sensitivity from the underlying
> filesystem when matching filenames. There's no "ignore case" flag or
> anything else that I can use with it, so my only option (at least, the
> only one I can think of) is the one that I suggested. For example, if I
> wanted to add this for any files which start with 'BBEdit' (case
> insensitive) this is what I'd need to use:
>
> AddDescription "<a href='http://barebones.com/bbedit'>A text editor
> that doesn't suck</a>" [B|b][B|b][E|e][D|d][I|i][T|t]*
>
> I verified that it works, but typing that stuff manually is tedious and
> highly error prone, which made it the perfect place for a shell script
> :-)
Unless AddDescription uses a different regex engine than FilesMatch, the answer is the first result when searching for "htaccess case insensitive regex": (?i:pattern).
* Re: input foo, output '[F|f][O|o][O|o]'?
From: ZyX @ 2013-07-02 4:00 UTC
To: TJ Luoma; +Cc: Zsh-Users List
02.07.13, 07:51, "ZyX" <kp-pav@yandex.ru>:
> [...]
> Unless AddDescription uses a different regex engine than FilesMatch, the answer is the first result when searching for "htaccess case insensitive regex": (?i:pattern).
It seems it does use a different regex engine.
* Re: input foo, output '[F|f][O|o][O|o]'?
From: Phil Pennock @ 2013-07-02 23:24 UTC
To: ZyX; +Cc: TJ Luoma, Zsh-Users List
On 2013-07-02 at 08:00 +0400, ZyX wrote:
> > > Phil Pennock's version worked great:
> > >
> > > % foo=CrashPlan
> > > % for c in ${(s::)foo}; do print -n "[${(U)c}|${(L)c}]";done; print
> > > [C|c][R|r][A|a][S|s][H|h][P|p][L|l][A|a][N|n]
Sorry, I was in a rush and missed that this was [A|B] which should of
course be written [AB] or (A|B) if the regexp language supports the
latter. I should have caught that, instead of answering exactly what
was asked.
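For reference, a hedged sketch of a corrected generator that emits [Xx] pairs without the stray pipe. It is written in portable sh rather than with the zsh expansion flags used in the quoted loop, so it should also run outside zsh. The function name `ci_glob` is made up here, and the sketch assumes ASCII input and does not escape glob metacharacters.

```sh
# Print a case-insensitive bracket pattern for the word given as $1.
# Assumptions: ASCII input; characters such as * or [ are copied unescaped.
# Note: the variables are global; POSIX sh has no "local".
ci_glob() {
  word=$1
  out=
  while [ -n "$word" ]; do
    rest=${word#?}            # everything after the first character
    c=${word%"$rest"}         # the first character itself
    upper=$(printf '%s' "$c" | tr '[:lower:]' '[:upper:]')
    lower=$(printf '%s' "$c" | tr '[:upper:]' '[:lower:]')
    if [ "$upper" = "$lower" ]; then
      out=${out}${c}                    # not a letter: copy as-is
    else
      out=${out}[${upper}${lower}]      # letter: two-character class
    fi
    word=$rest
  done
  printf '%s\n' "$out"
}

ci_glob CrashPlan
# prints: [Cc][Rr][Aa][Ss][Hh][Pp][Ll][Aa][Nn]
```

The braces in `${out}[...]` are deliberate: in zsh, a bare `$out[...]` would be parsed as a subscript.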
> > Unless AddDescription uses different regex engine then FilesMatch
> > answer is in the first link if searching for "htaccess case
> > insensitive regex": (?i:pattern).
>
> It seems it does use different regex engine.
AddDescription does not take a regex. It takes a filename pattern, or
what in shell is called a Glob.
It happens that some shells use [AB] as a glob pattern too, also to
introduce a character class, and that's why it works -- Apache supports
that syntax also.
Those places in Apache that do take regexps use the PCRE engine, the one
written by Philip Hazel for Exim, and which zsh also supports with
"zmodload zsh/pcre" (or setting the option to change =~ to use it,
"setopt rematch_pcre", which will auto-load that module when you first
use =~).
So for those places in Apache which want regexps, you can test with zsh
to get a decent approximation, or use the pcretest(1) tool from the PCRE
distribution, which is designed to interactively test regexps against
inputs.
If you want to settle on that syntax, also consider installing the
pcregrep tool. It's very nice to be able to relax and just use PCRE
syntax, even though the PCRE implementation is not as efficient as the
older tools (or the newer RE2 system).
-Phil