From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-workers-request@math.gatech.edu>
Received: (qmail 4244 invoked from network); 2 Nov 1998 17:30:55 -0000
Received: from math.gatech.edu (list@130.207.146.50)
  by ns1.primenet.com.au with SMTP; 2 Nov 1998 17:30:55 -0000
Received: (from list@localhost)
	by math.gatech.edu (8.9.1/8.9.1) id MAA21577;
	Mon, 2 Nov 1998 12:23:07 -0500 (EST)
Resent-Date: Mon, 2 Nov 1998 12:23:07 -0500 (EST)
Message-Id: <9811021707.AA24379@ibmth.df.unipi.it>
To: zsh-workers@math.gatech.edu (Zsh hackers list)
Subject: Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing 
In-Reply-To: ""Zefram""'s message of "Mon, 02 Nov 1998 09:21:30 NFT."
             <199811020921.JAA10113@diamond.tao.co.uk> 
Date: Mon, 02 Nov 1998 18:07:41 +0100
From: Peter Stephenson <pws@ibmth.df.unipi.it>
Resent-Message-ID: <"eiKNo2.0.4H5.wfUFs"@math>
Resent-From: zsh-workers@math.gatech.edu
X-Mailing-List: <zsh-workers@math.gatech.edu> archive/latest/4503
X-Loop: zsh-workers@math.gatech.edu
Precedence: list
Resent-Sender: zsh-workers-request@math.gatech.edu

"Zefram" wrote:
> Case insensitivity is
> a property of pattern matching, not filename generation.  Therefore the
> syntax to control case sensitivity should be part of the glob pattern
> syntax, rather than part of the glob qualifiers.  Preferably, it should
> be possible to localise case insensitivity to an arbitrary subpattern,
> rather than only to the entire pattern.
> 
> If someone comes up with a patch for case insensitive pattern matching
> of the form I have just described, I'll probably put it into the baseline.

I've got two possible implementations to propose (I have them both
working, the differences aren't so great).  Both are based on the way
it's done in perl 5:  the closure operator, in our case #, at the
start of a group signifies that flags follow.  This doesn't clash with
any existing syntax.  Obviously you need EXTENDED_GLOB set.

Syntax 1       Syntax 2
(#ifoo)bar     ((#i)foo)bar    match FOObar FoObar fOobar, not FOOBAR
bar(#ifoo)     bar(#i)foo      same with the bits the other way round
(#lfooBAR)     (#l)fooBAR      match FOOBAR FoOBAR fOoBAR, not foobar
(#ifoo(#cbar)) (#i)foo(#c)bar  same as first example; #c negates i or l

So in the first case, only the #X is the flag and grouping is normal,
while in the second case the whole of (#X) is the flag and doesn't
mark a separate group.  In both cases the effect stays until the end
of the nearest enclosing group.
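
To make the relationship between the two syntaxes concrete, here is a
rough Python sketch that rewrites a Syntax 1 pattern into an equivalent
Syntax 2 pattern by wrapping each flagged group explicitly.  The function
name syntax1_to_syntax2 is made up for illustration, and the sketch
assumes that every "(#" followed by i, l or c is a flag, which is the
same assumption Syntax 1 itself has to make.  The result is equivalent
to, though not always as short as, the right-hand column of the table.

```python
import re

def syntax1_to_syntax2(pattern):
    """Rewrite (#Xrest) as ((#X)rest), where X is one of i, l, c.

    In Syntax 1 the flag sits inside an ordinary group, so the group's
    parentheses are kept and the flag becomes a separate (#X) marker
    at the start of that group.
    """
    return re.sub(r"\(#([ilc])", r"((#\1)", pattern)

if __name__ == "__main__":
    for p in ["(#ifoo)bar", "bar(#ifoo)", "(#lfooBAR)", "(#ifoo(#cbar))"]:
        print(p, "->", syntax1_to_syntax2(p))
```

Because the flag only lasts to the end of the enclosing group, a form
such as bar((#i)foo) behaves the same as bar(#i)foo when nothing
follows it.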

#s (for significant) could be an alternative to #c; #l corresponds to
Sven's (f) qualifier, i.e. only lower case letters in the pattern
match case-insensitively in the target string.
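
As a way of checking the intended semantics of Syntax 2, the following
Python sketch translates a pattern into a Python regular expression.
It is only a model of the behaviour described above, not the shell's
matcher: the names translate and matches are invented here, and only
literals, *, ?, plain groups and the three flags are handled.

```python
import re

def translate(pattern):
    """Translate a Syntax 2 pattern into a Python regex string (sketch)."""
    out = []
    # One case mode per open group: "c" = sensitive, "i" = insensitive,
    # "l" = only lower-case pattern letters match either case.
    modes = ["c"]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith("(#", i) and pattern[i + 2:i + 4] in ("i)", "l)", "c)"):
            modes[-1] = pattern[i + 2]   # flag applies to the rest of this group
            i += 4
        elif ch == "(":
            out.append("(?:")
            modes.append(modes[-1])      # a new group inherits the current mode
            i += 1
        elif ch == ")":
            out.append(")")
            if len(modes) > 1:
                modes.pop()              # the flag's effect ends with its group
            i += 1
        elif ch == "*":
            out.append(".*")
            i += 1
        elif ch == "?":
            out.append(".")
            i += 1
        else:
            mode = modes[-1]
            fold = ch.isalpha() and (mode == "i" or (mode == "l" and ch.islower()))
            out.append("[" + ch.lower() + ch.upper() + "]" if fold else re.escape(ch))
            i += 1
    return "".join(out)

def matches(pattern, target):
    """Return True if the whole target string matches the pattern."""
    return re.fullmatch(translate(pattern), target) is not None

if __name__ == "__main__":
    print(matches("((#i)foo)bar", "FoObar"))   # True
    print(matches("((#i)foo)bar", "FOOBAR"))   # False
    print(matches("(#l)fooBAR", "foobar"))     # False
```

Keeping a stack of modes is one simple way to get "until the end of the
nearest enclosing group" without any special handling when a group
closes.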

I think I find the second version (which is also more perl-like) a
bit cleaner.  The only real bind with this is with KSH_GLOB, where the
second set of examples would have to become @(@(#i)foo)bar,
@(#l)fooBAR and @(#i)foo@(#c)bar.  (Actually I'm lying, because the
shell doesn't need the @ if it comes across the left parenthesis
before anything else, so you can drop the first @ in each case, but
this is deliberately undocumented.)

One point about this is that you need to turn on case-insensitivity at
any segment of the path where you need it:

/(#i)foo/(#i)bar       to match /FoO/BaR, /foo/BAR, /FOO/bar, ...

I think this is OK: it mirrors what the shell is really doing --- as
the file system is case sensitive, it has to do separate searches in
each directory.  Here the second syntax is definitely clearer.  If
someone wants to propose a way of turning on case-insensitivity for
all parts of the path --- which means doing globbing on every segment
so is slow --- I'll listen.
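
The per-segment behaviour can be modelled with a small Python sketch
that walks a nested dict standing in for a case-sensitive file system.
Everything here is hypothetical (the names segment_regex and glob_tree,
the dict layout, and the restriction to a leading (#i) followed by a
literal name); it is meant only to show why each segment needs its own
flag.

```python
import re

def segment_regex(segment):
    """Compile one path segment; a leading (#i) makes it case-insensitive."""
    if segment.startswith("(#i)"):
        return re.compile(re.escape(segment[4:]), re.IGNORECASE)
    return re.compile(re.escape(segment))

def glob_tree(tree, pattern):
    """Expand an absolute pattern against a nested dict of directory names."""
    found = [("", tree)]
    for segment in pattern.strip("/").split("/"):
        rx = segment_regex(segment)
        next_found = []
        # A separate search is done in every directory matched so far.
        for path, node in found:
            for name in sorted(node):
                if rx.fullmatch(name):
                    next_found.append((path + "/" + name, node[name]))
        found = next_found
    return [path for path, _ in found]

if __name__ == "__main__":
    tree = {
        "FoO": {"BaR": {}, "baz": {}},
        "foo": {"BAR": {}},
        "other": {"bar": {}},
    }
    print(glob_tree(tree, "/(#i)foo/(#i)bar"))
    print(glob_tree(tree, "/(#i)foo/bar"))
```

With the flag on only the first segment, the second segment is compared
exactly, so neither BaR nor BAR is found.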

I will post the patch if there's any positive response to either of these.

-- 
Peter Stephenson <pws@ibmth.df.unipi.it>       Tel: +39 050 844536
WWW:  http://www.ifh.de/~pws/
Dipartimento di Fisica, Via Buonarotti 2, 56100 Pisa, Italy