From: Peter Stephenson
To: zsh-workers@math.gatech.edu (Zsh hackers list)
Subject: Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
Date: Mon, 02 Nov 1998 18:07:41 +0100
Message-Id: <9811021707.AA24379@ibmth.df.unipi.it>
In-Reply-To: "Zefram"'s message of "Mon, 02 Nov 1998 09:21:30 NFT." <199811020921.JAA10113@diamond.tao.co.uk>
X-Mailing-List: <zsh-workers@math.gatech.edu> archive/latest/4503

"Zefram" wrote:
> Case insensitivity is a property of pattern matching, not filename
> generation.  Therefore the syntax to control case sensitivity should be
> part of the glob pattern syntax, rather than part of the glob qualifiers.
> Preferably, it should be possible to localise case insensitivity to an
> arbitrary subpattern, rather than only to the entire pattern.
>
> If someone comes up with a patch for case insensitive pattern matching
> of the form I have just described, I'll probably put it into the baseline.

I've got two possible implementations to propose (I have them both
working; the differences aren't so great).  Both are based on the way it's
done in perl 5: the closure operator, in our case #, at the start of a
group signifies that flags follow.  This doesn't clash with any existing
syntax.  Obviously you need EXTENDED_GLOB set.
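
For anyone who doesn't have the perl 5 convention in their head, here is a
rough sketch of the same idea using Python's re module, which borrows
perl's group-scoped flag syntax.  It is only an analogy: these are regular
expressions, not zsh glob patterns, and the names scoped, nested and
matches are made up for the illustration.

```python
import re

# (?i:...) turns on case-insensitivity for that group only;
# (?-i:...) turns it off again inside an enclosing insensitive group.
scoped = re.compile(r"(?i:foo)bar")        # "foo" any case, "bar" exact
nested = re.compile(r"(?i:foo(?-i:bar))")  # same effect, flag negated inside


def matches(pattern, text):
    """Return True if the whole of text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None


for word in ["FOObar", "FoObar", "fOobar", "FOOBAR"]:
    print(word, matches(scoped, word), matches(nested, word))
```

The point is only that the flag's effect stops at the end of the group it
was given in, which is the behaviour both proposals are after.
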

Syntax 1          Syntax 2
(#ifoo)bar        ((#i)foo)bar      match FOObar FoObar fOobar, not FOOBAR
bar(#ifoo)        bar(#i)foo        same with the bits the other way round
(#lfooBAR)        (#l)fooBAR        match FOOBAR FoOBAR fOoBAR, not foobar
(#ifoo(#cbar))    (#i)foo(#c)bar    same as first example; #c negates i or l

So in the first case, only the #X is the flag and grouping is normal,
while in the second case the whole of (#X) is the flag and doesn't mark a
separate group.  In both cases the effect stays until the end of the
nearest enclosing group.  #s (for significant) could be an alternative to
#c; #l corresponds to Sven's (f) qualifier, i.e. only lower case letters
in the pattern match case-insensitively in the target string.

I think I find the second version (which is also more perl-like) a bit
cleaner.  The only real bind with this is with KSH_GLOB, where the second
set of examples would have to become @(@(#i)foo)bar, @(#l)fooBAR and
@(#i)foo@(#c)bar.  (Actually I'm lying, because the shell doesn't need the
@ if it comes across the left parenthesis before anything else, so you
can drop the first @ in each case, but this is deliberately undocumented.)

One point about this is that you need to turn on case-insensitivity at
every segment of the path where you need it:

  /(#i)foo/(#i)bar

to match /FoO/BaR, /foo/BAR, /FOO/bar, ...  I think this is OK: it mirrors
what the shell is really doing.  As the file system is case sensitive, the
shell has to do a separate search in each directory.  Here the second
syntax is definitely clearer.  If someone wants to propose a way of
turning on case-insensitivity for all parts of the path (which means
doing globbing on every segment, so is slow) I'll listen.

I will post the patch if there's any positive response to either of these.

-- 
Peter Stephenson                            Tel: +39 050 844536
WWW:  http://www.ifh.de/~pws/
Dipartimento di Fisica, Via Buonarotti 2, 56100 Pisa, Italy