zsh-workers
* PATCH: 3.1.5 - (Sven) Case-insensitive globbing
@ 1998-10-31 10:14 Bart Schaefer
  1998-11-02  9:21 ` Zefram
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Schaefer @ 1998-10-31 10:14 UTC (permalink / raw)
  To: zsh-workers

Sven's patch for case-insensitive globbing, applied to 3.1.5.  This does NOT
include any of the enhanced compctl stuff that makes use of it.

Index: Doc/Zsh/expn.yo
===================================================================
diff -u -r1.1.1.2 -r1.8
--- expn.yo	1998/10/30 15:56:51	1.1.1.2
+++ expn.yo	1998/10/30 17:52:39	1.8
@@ -1010,6 +1011,14 @@
 item(tt(D))(
 sets the tt(GLOB_DOTS) option for the current pattern
 pindex(GLOB_DOTS, setting in pattern)
+)
+item(tt(f))(
+makes lower case letters in the pattern match themselves and the
+corresponding uppercase letter
+)
+item(tt(F))(
+makes all letters match themselves and their uppercase or lowercase
+counterpart
 )
 enditem()
 
Index: Src/glob.c
===================================================================
diff -u -r1.1.1.2 -r1.4
--- glob.c	1998/10/30 15:57:02	1.1.1.2
+++ glob.c	1998/10/30 17:52:44	1.4
@@ -81,6 +81,7 @@
 static int qualct, qualorct;
 static int range, amc, units;
 static int gf_nullglob, gf_markdirs, gf_noglobdots, gf_listtypes, gf_follow;
+static int gf_case = 0;
 
 /* Prefix, suffix for doing zle trickery */
 
@@ -868,6 +869,7 @@
     Complist q;				/* pattern after parsing         */
     char *ostr = (char *)getdata(np);	/* the pattern before the parser */
 					/* chops it up                   */
+    int gfc = 0;			/* case insensitive?             */
 
     MUSTUSEHEAP("glob");
     if (unset(GLOBOPT) || !haswilds(ostr)) {
@@ -1206,6 +1208,12 @@
 			    ++s;
 			data = qgetnum(&s);
 			break;
+		    case 'f':
+			gfc = 1;
+			break;
+		    case 'F':
+			gfc = 2;
+			break;
 
 		    default:
 			zerr("unknown file attribute", NULL, 0);
@@ -1262,7 +1270,9 @@
 
     /* The actual processing takes place here: matches go into  *
      * matchbuf.  This is the only top-level call to scanner(). */
+    gf_case = gfc;
     scanner(q);
+    gf_case = 0;
 
     /* Deal with failures to match depending on options */
     if (matchct)
@@ -2489,7 +2499,9 @@
 	    }
 	    continue;
 	}
-	if (*pptr == *pat) {
+	if (*pptr == *pat ||
+	    (gf_case == 1 ? (islower(*pat) && tuupper(*pat) == *pptr) :
+	     (gf_case == 2 ? (tulower(*pat) == tulower(*pptr)) : 0))) {
 	    /* just plain old characters */
 	    pptr++;
 	    pat++;

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-10-31 10:14 PATCH: 3.1.5 - (Sven) Case-insensitive globbing Bart Schaefer
@ 1998-11-02  9:21 ` Zefram
  1998-11-02 17:07   ` Peter Stephenson
  0 siblings, 1 reply; 21+ messages in thread
From: Zefram @ 1998-11-02  9:21 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

Bart Schaefer wrote:
>Sven's patch for case-insensitive globbing, applied to 3.1.5.

Ah yes, I meant to mention this when announcing the new version.
This patch is not going into the baseline as is.  Case insensitivity is
a property of pattern matching, not filename generation.  Therefore the
syntax to control case sensitivity should be part of the glob pattern
syntax, rather than part of the glob qualifiers.  Preferably, it should
be possible to localise case insensitivity to an arbitrary subpattern,
rather than only to the entire pattern.

If someone comes up with a patch for case insensitive pattern matching
of the form I have just described, I'll probably put it into the baseline.

-zefram



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-11-02  9:21 ` Zefram
@ 1998-11-02 17:07   ` Peter Stephenson
  1998-11-02 17:45     ` Bruce Stephens
  1998-11-02 18:06     ` Zefram
  0 siblings, 2 replies; 21+ messages in thread
From: Peter Stephenson @ 1998-11-02 17:07 UTC (permalink / raw)
  To: Zsh hackers list

"Zefram" wrote:
> Case insensitivity is
> a property of pattern matching, not filename generation.  Therefore the
> syntax to control case sensitivity should be part of the glob pattern
> syntax, rather than part of the glob qualifiers.  Preferably, it should
> be possible to localise case insensitivity to an arbitrary subpattern,
> rather than only to the entire pattern.
> 
> If someone comes up with a patch for case insensitive pattern matching
> of the form I have just described, I'll probably put it into the baseline.

I've got two possible implementations to propose (I have them both
working, the differences aren't so great).  Both are based on the way
it's done in perl 5:  the closure operator, in our case #, at the
start of a group signifies that flags follow.  This doesn't clash with
any existing syntax.  Obviously you need EXTENDED_GLOB set.

Syntax 1       Syntax 2
(#ifoo)bar     ((#i)foo)bar    match FOObar FoObar fOobar, not FOOBAR
bar(#ifoo)     bar(#i)foo      same with the bits the other way round
(#lfooBAR)     (#l)fooBAR      match FOOBAR FoOBAR fOoBAR, not foobar
(#ifoo(#cbar)) (#i)foo(#c)bar  same as first example; #c negates i or l

So in the first case, only the #X is the flag and grouping is normal,
while in the second case the whole of (#X) is the flag and doesn't
mark a separate group.  In both cases the effect stays until the end
of the nearest enclosing group.

#s (for significant) could be an alternative to #c; #l corresponds to
Sven's (f) qualifier, i.e. only lower case letters in the pattern
match case-insensitively in the target string.
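
As a rough illustration of the two kinds of case-insensitivity being
discussed, here is a minimal C sketch of the per-character test.  It
mirrors the comparison in Sven's glob.c hunk, but uses the standard
<ctype.h> functions rather than zsh's own macros; the enum names and
the literal_matches() helper are made up for this example and are not
part of any patch in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical mode names; the patch itself uses the bare values 0, 1, 2. */
enum case_mode { CASE_EXACT = 0, CASE_LOWER_ONLY = 1, CASE_BOTH = 2 };

/* Return 1 if pattern character p matches string character s under mode. */
int char_matches(unsigned char p, unsigned char s, enum case_mode mode)
{
    if (p == s)
        return 1;
    if (mode == CASE_LOWER_ONLY)        /* like (f) or #l: one-way */
        return islower(p) && toupper(p) == s;
    if (mode == CASE_BOTH)              /* like (F) or #i: both ways */
        return tolower(p) == tolower(s);
    return 0;
}

/* Match a purely literal pattern (no wildcards) against a whole string. */
int literal_matches(const char *pat, const char *str, enum case_mode mode)
{
    assert(pat != NULL && str != NULL);
    while (*pat && *str) {
        if (!char_matches((unsigned char)*pat, (unsigned char)*str, mode))
            return 0;
        pat++;
        str++;
    }
    return *pat == '\0' && *str == '\0';
}
```

With CASE_LOWER_ONLY, the pattern fooBAR accepts FOOBAR and FoOBAR but
rejects foobar, because the upper case BAR in the pattern only matches
itself.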

I think I find the second version (which is also more perl-like) a
bit cleaner.  The only real bind with this is with KSH_GLOB, where the
second set of examples would have to become @(@(#i)foo)bar,
@(#l)fooBAR and @(#i)foo@(#c)bar.  (Actually I'm lying, because the
shell doesn't need the @ if it comes across the left parenthesis
before anything else, so you can drop the first @ in each case, but
this is deliberately undocumented.)

One point about this is that you need to turn on case-insensitivity at
any segment of the path where you need it:

/(#i)foo/(#i)bar       to match /FoO/BaR, /foo/BAR, /FOO/bar, ...

I think this is OK: it mirrors what the shell is really doing --- as
the file system is case sensitive, it has to do separate searches in
each directory.  Here the second syntax is definitely clearer.  If
someone wants to propose a way of turning on case-insensitivity for
all parts of the path --- which means doing globbing on every segment
so is slow --- I'll listen.
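
The per-segment behaviour described above can be sketched in C as
follows.  This is only an illustration under simplifying assumptions:
it handles literal text plus the (#i), (#l) and (#c) flags of syntax 2,
ignores wildcards and grouping, and matches a whole path string rather
than searching directories.  The function names are hypothetical and
nothing here is taken from the actual patch.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Compare one pattern character with one string character.
 * mode 0: exact; 1: (#l), lower case pattern letters also match
 * upper case; 2: (#i), letters match either case. */
int flag_char_matches(unsigned char p, unsigned char s, int mode)
{
    if (p == s)
        return 1;
    if (mode == 1)
        return islower(p) && toupper(p) == s;
    if (mode == 2)
        return tolower(p) == tolower(s);
    return 0;
}

/* Match a literal path pattern containing (#i), (#l) and (#c) flags
 * against a path.  Flags are consumed rather than matched, and the
 * mode falls back to case-sensitive at every '/'. */
int path_matches(const char *pat, const char *path)
{
    int mode = 0;

    assert(pat != NULL && path != NULL);
    while (*pat) {
        if (pat[0] == '(' && pat[1] == '#' && pat[2] != '\0' &&
            pat[3] == ')' && strchr("ilc", pat[2]) != NULL) {
            mode = pat[2] == 'i' ? 2 : pat[2] == 'l' ? 1 : 0;
            pat += 4;
            continue;
        }
        if (*path == '\0' ||
            !flag_char_matches((unsigned char)*pat,
                               (unsigned char)*path, mode))
            return 0;
        if (*pat == '/')
            mode = 0;           /* a new segment starts case-sensitive */
        pat++;
        path++;
    }
    return *path == '\0';
}
```

Under these assumptions /(#i)foo/(#i)bar accepts /FoO/BaR, while
/(#i)foo/bar accepts /FOO/bar but not /FOO/BAR.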

I will post the patch if there's any positive response to either of these.

-- 
Peter Stephenson <pws@ibmth.df.unipi.it>       Tel: +39 050 844536
WWW:  http://www.ifh.de/~pws/
Dipartimento di Fisica, Via Buonarotti 2, 56100 Pisa, Italy



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-11-02 17:07   ` Peter Stephenson
@ 1998-11-02 17:45     ` Bruce Stephens
  1998-11-02 18:06     ` Zefram
  1 sibling, 0 replies; 21+ messages in thread
From: Bruce Stephens @ 1998-11-02 17:45 UTC (permalink / raw)
  To: Zsh hackers list

Peter Stephenson <pws@ibmth.df.unipi.it> writes:

> This doesn't clash with any existing syntax.  Obviously you need
> EXTENDED_GLOB set.
> 
> Syntax 1       Syntax 2
> (#ifoo)bar     ((#i)foo)bar    match FOObar FoObar fOobar, not FOOBAR
> bar(#ifoo)     bar(#i)foo      same with the bits the other way round
> (#lfooBAR)     (#l)fooBAR      match FOOBAR FoOBAR fOoBAR, not foobar
> (#ifoo(#cbar)) (#i)foo(#c)bar  same as first example; #c negates i or l
> 
> So in the first case, only the #X is the flag and grouping is normal,
> while in the second case the whole of (#X) is the flag and doesn't
> mark a separate group.  In both cases the effect stays until the end
> of the nearest enclosing group.
> 
> #s (for significant) could be an alternative to #c; #l corresponds to
> Sven's (f) qualifier, i.e. only lower case letters in the pattern
> match case-insensitively in the target string.
> 
> I think I find the second version (which is also more perl-like) a
> bit cleaner.

I prefer the second versions too.  What would really make it
compelling, of course, would be other flags that you might want to use
(when the first syntax could get tricky and ambiguous).  

I can only think of one candidate at present: ignore dots.  #d, say.
Then, a single pattern could match README, READ.ME, Read.Me and so on:
(#di)readme.

But my example is strained, I don't really suggest that it would be a
good idea.



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-11-02 17:07   ` Peter Stephenson
  1998-11-02 17:45     ` Bruce Stephens
@ 1998-11-02 18:06     ` Zefram
  1998-11-03  8:12       ` Sven Wischnowsky
  1 sibling, 1 reply; 21+ messages in thread
From: Zefram @ 1998-11-02 18:06 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

Peter Stephenson wrote:
>I've got two possible implementations to propose (I have them both
>working, the differences aren't so great).  Both are based on the way
>it's done in perl 5:  the closure operator, in our case #, at the
>start of a group signifies that flags follow.  This doesn't clash with
>any existing syntax.  Obviously you need EXTENDED_GLOB set.

Good.  I'd considered this extension mechanism, but for some reason
dismissed it as impractical -- maybe I was confused by the dependence
on EXTENDED_GLOB.

>Syntax 1       Syntax 2
>(#ifoo)bar     ((#i)foo)bar    match FOObar FoObar fOobar, not FOOBAR
>bar(#ifoo)     bar(#i)foo      same with the bits the other way round
>(#lfooBAR)     (#l)fooBAR      match FOOBAR FoOBAR fOoBAR, not foobar
>(#ifoo(#cbar)) (#i)foo(#c)bar  same as first example; #c negates i or l

Let's go for the more Perl-like syntax.  I think your syntax 1 is slightly
more logical, but the difference is minimal so I think it would be wise
to follow precedent.

>#s (for significant) could be an alternative to #c; #l corresponds to
>Sven's (f) qualifier, i.e. only lower case letters in the pattern
>match case-insensitively in the target string.

If there were just two senses to the flag, I'd argue for #i and #I (#I
being the opposite of #i).  In this case perhaps #i, #l and #I could be
used.  I'd prefer a better mnemonic for the one-way case insensitivity,
though.

>              The only real bind with this is with KSH_GLOB, where the
>second set of examples would have to become @(@(#i)foo)bar,
>@(#l)fooBAR and @(#i)foo@(#c)bar.

Considering the circumstances under which KSH_GLOB will be used,
I don't think that making it pleasant to mix with EXTENDED_GLOB is a
major consideration.

>shell doesn't need the @ if it comes across the left parenthesis
>before anything else, so you can drop the first @ in each case, but
>this is deliberately undocumented.)

It's not explicitly documented, but it's intentional, and should be
derivable from the documentation.  Basically, KSH_GLOB doesn't turn off
the effects of "("; it just makes certain characters special immediately
before a "(".

>One point about this is that you need to turn on case-insensitivity at
>any segment of the path where you need it:

I'm rather dubious about this.  () grouping doesn't have to be on a
component-by-component basis; I think these modifiers' effects should
last up to the end of the textual group, even if this spans multiple
pathname components.  It's the principle of least surprise.

-zefram



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
@ 1998-11-03  8:12       ` Sven Wischnowsky
  1998-11-03 12:22         ` Bruce Stephens
       [not found]         ` <MLIST_vbn269dkyw.fsf@snake.isode.com>
  0 siblings, 2 replies; 21+ messages in thread
From: Sven Wischnowsky @ 1998-11-03  8:12 UTC (permalink / raw)
  To: zsh-workers


Bruce Stephens wrote:

> 
> Peter Stephenson <pws@ibmth.df.unipi.it> writes:
> 
> > This doesn't clash with any existing syntax.  Obviously you need
> > EXTENDED_GLOB set.
> > 
> > Syntax 1       Syntax 2
> > (#ifoo)bar     ((#i)foo)bar    match FOObar FoObar fOobar, not FOOBAR
> > bar(#ifoo)     bar(#i)foo      same with the bits the other way round
> > (#lfooBAR)     (#l)fooBAR      match FOOBAR FoOBAR fOoBAR, not foobar
> > (#ifoo(#cbar)) (#i)foo(#c)bar  same as first example; #c negates i or l
> > 
> > So in the first case, only the #X is the flag and grouping is normal,
> > while in the second case the whole of (#X) is the flag and doesn't
> > mark a separate group.  In both cases the effect stays until the end
> > of the nearest enclosing group.
> > 
> > #s (for significant) could be an alternative to #c; #l corresponds to
> > Sven's (f) qualifier, i.e. only lower case letters in the pattern
> > match case-insensitively in the target string.
> > 
> > I think I find the second version (which is also more perl-like) a
> > bit cleaner.
> 
> I prefer the second versions too.  What would really make it
> compelling, of course, would be other flags that you might want to use
> (when the first syntax could get tricky and ambiguous).  
> 
> I can only think of one candidate at present: ignore dots.  #d, say.
> Then, a single pattern could match README, READ.ME, Read.Me and so on:
> (#di)readme.
> 
> But my example is strained, I don't really suggest that it would be a
> good idea.

I completely agree, there is a whole new set of globbing options
on the horizon ;-)

About the `options for the whole path' thing (which I would like to
have, too): why not use a generic approach, like the `^' and `-' glob
modifiers, i.e. `(#i)' works on the current path component, probably
only up to the next `(#...)' and `(#/i)' works on this and all
following components (until switched off again).

Bye
 Sven


--
Sven Wischnowsky                         wischnow@informatik.hu-berlin.de



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-11-03  8:12       ` Sven Wischnowsky
@ 1998-11-03 12:22         ` Bruce Stephens
  1998-11-03 12:47           ` Bruce Stephens
       [not found]         ` <MLIST_vbn269dkyw.fsf@snake.isode.com>
  1 sibling, 1 reply; 21+ messages in thread
From: Bruce Stephens @ 1998-11-03 12:22 UTC (permalink / raw)
  To: zsh-workers

Sven Wischnowsky <wischnow@informatik.hu-berlin.de> writes:

> Bruce Stephens wrote:

> > I can only think of one candidate at present: ignore dots.  #d, say.
> > Then, a single pattern could match README, READ.ME, Read.Me and so on:
> > (#di)readme.
> > 
> > But my example is strained, I don't really suggest that it would be a
> > good idea.
> 
> I completely agree, there is a whole new set of globbing options
> on the horizon ;-)
> 
> About the `options for the whole path' thing (which I would like to
> have, too): why not use a generic approach, like the `^' and `-' glob
> modifiers, i.e. `(#i)' works on the current path component, probably
> only up to the next `(#...)' and `(#/i)' works on this and all
> following components (until switched off again).

Yes, maybe.  I thought of another example: approximate matching.

Approximate matching could either use the auto-correct code, or could
use something like whatever agrep uses.  In the latter case, it would
have an optional integer parameter too, so "(#a1)readme" would match
"Readme" and "read.me", but to match "read", you'd need "(#a2)readme".

Hmm, maybe this could provide a way to configure the autocorrection
feature too?
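
To make the error counts concrete, here is a small C sketch that
treats the integer after #a as a maximum edit distance (insertions,
deletions and substitutions each cost one).  That reading is an
assumption, as is restricting the pattern to literal text; the code
uses the textbook dynamic-programming algorithm, not whatever agrep or
the auto-correct code does, and both function names are invented for
the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Edit distance between a and b: the minimum number of single-character
 * insertions, deletions and substitutions that turn a into b.
 * Returns (size_t)-1 if memory cannot be allocated. */
size_t edit_distance(const char *a, const char *b)
{
    size_t la, lb, i, j, result;
    size_t *row;

    assert(a != NULL && b != NULL);
    la = strlen(a);
    lb = strlen(b);
    row = malloc((lb + 1) * sizeof *row);
    if (row == NULL)
        return (size_t)-1;

    for (j = 0; j <= lb; j++)
        row[j] = j;                 /* distance from "" to b[0..j) */

    for (i = 1; i <= la; i++) {
        size_t diag = row[0];       /* previous row, column j-1 */
        row[0] = i;
        for (j = 1; j <= lb; j++) {
            size_t up = row[j];     /* previous row, column j */
            size_t best = diag + (a[i - 1] != b[j - 1]);
            if (up + 1 < best)
                best = up + 1;
            if (row[j - 1] + 1 < best)
                best = row[j - 1] + 1;
            row[j] = best;
            diag = up;
        }
    }
    result = row[lb];
    free(row);
    return result;
}

/* Hypothetical helper: would "(#aN)pattern" accept name, assuming the
 * pattern is literal text and N is max_errors? */
int approx_matches(const char *pattern, const char *name, size_t max_errors)
{
    size_t d = edit_distance(pattern, name);
    return d != (size_t)-1 && d <= max_errors;
}
```

With this definition "Readme" and "read.me" are each one edit away
from "readme", while "read" is two edits away, which agrees with the
(#a1) and (#a2) examples.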



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-11-03 12:22         ` Bruce Stephens
@ 1998-11-03 12:47           ` Bruce Stephens
  1998-11-03 15:01             ` Zefram
  0 siblings, 1 reply; 21+ messages in thread
From: Bruce Stephens @ 1998-11-03 12:47 UTC (permalink / raw)
  To: zsh-workers

Bruce Stephens <b.stephens@isode.com> writes:

> Yes, maybe.  I thought of another example: approximate matching.
> 
> Approximate matching could either use the auto-correct code, or could
> use something like whatever agrep uses.  In the latter case, it would
> have an optional integer parameter too, so "(#a1)readme" would match
> "Readme" and "read.me", but to match "read", you'd need "(#a2)readme".
> 
> Hmm, maybe this could provide a way to configure the autocorrection
> feature too?

This would provide useful implementation parts anyway---given some way
of specifying a function that could operate on a possibly misspelled
word, approximate globbing would provide a natural way of generating a
sensible alternative (or alternatives).



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-11-03 12:47           ` Bruce Stephens
@ 1998-11-03 15:01             ` Zefram
  1998-11-03 15:27               ` Bruce Stephens
  0 siblings, 1 reply; 21+ messages in thread
From: Zefram @ 1998-11-03 15:01 UTC (permalink / raw)
  To: zsh-workers

>Approximate matching could either use the auto-correct code, or could
>use something like whatever agrep uses.  In the latter case, it would
>have an optional integer parameter too, so "(#a1)readme" would match
>"Readme" and "read.me", but to match "read", you'd need "(#a2)readme".

Great idea.  If this can be implemented cleanly, it's in.  Let's be the
first shell with approximate pattern matching (and approximate globbing)
as a standard feature.

>Hmm, maybe this could provide a way to configure the autocorrection
>feature too?

Maybe.  The autocorrection could be implemented as a shell function,
which by default uses fuzzy globbing to generate corrections.  Is this
what you have in mind?

-zefram



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-11-03 15:01             ` Zefram
@ 1998-11-03 15:27               ` Bruce Stephens
  0 siblings, 0 replies; 21+ messages in thread
From: Bruce Stephens @ 1998-11-03 15:27 UTC (permalink / raw)
  To: zsh-workers

"Zefram" <zefram@tao.co.uk> writes:

> >Hmm, maybe this could provide a way to configure the autocorrection
> >feature too?
> 
> Maybe.  The autocorrection could be implemented as a shell function,
> which by default uses fuzzy globbing to generate corrections.  Is
> this what you have in mind?

Yes.  And, if autocorrection used completion-type widgets/functions
which knew their context, then they could allow different amounts of
fuzziness.

For example, there are lots of cases where I want to give filenames
which don't exist, so (#a1) would be appropriate, but I rarely want to
(interactively) expand a variable which doesn't exist, so it might
make sense to try harder to find a variable name which exists.  (This
isn't file globbing, of course.)

And such patterns might be useful as a last-resort in completion too.



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
       [not found]         ` <MLIST_vbn269dkyw.fsf@snake.isode.com>
@ 1998-11-03 18:09           ` Jarkko Hietaniemi
  1998-11-03 18:54             ` Zefram
  0 siblings, 1 reply; 21+ messages in thread
From: Jarkko Hietaniemi @ 1998-11-03 18:09 UTC (permalink / raw)
  To: Bruce Stephens; +Cc: wischnow, zsh-workers


: Approximate matching could either use the auto-correct code, or could
: use something like whatever agrep uses.  In the latter case, it would
: have an optional integer parameter too, so "(#a1)readme" would match
: "Readme" and "read.me", but to match "read", you'd need "(#a2)readme".

<unlurk>

Hey!  This is getting interesting...I may be able to offer my help to
the zsh community (after a long slumber).

It so happens that I have been slowly implementing from scratch the
approximate matching algorithm(s) used in agrep.  My intention has
been to encapsulate them into a nice library.  I tried doing that from
the agrep 2.04 (the version that comes with Glimpse, not the version
at ftp.cs.arizona.edu), but that's a little bit hard: agrep as it now
stands uses dozens of global variables, is not re-entrant, has certain
hard-coded limitations, et cetera.

There are several different algorithms used in agrep: Boyer-Moore,
vanilla regexp, vanilla approximate, approximate + regexp, multistring
search.  I intend only to implement the fuzzy ones and for filename
globbing the vanilla approximate algorithm should be enough.

I am implementing the library under the Artistic License (available
from the Perl distribution) so there should be no trouble including it
into zsh, right?

Of course, if you need the approximate matching code, like, today or
tomorrow, sorry, no can do, I am rather busy with other projects.  If
I really work on it, it will take a couple of weeks to implement and test
(benchmark and regress-test it against agrep).

Interested?

P.S.  To further confuse the issue of filename globbing: ta-dah, yet
another idea! :-) But first off: sorry if the following feature
already exists in zsh --- I haven't been following the development
(aka the feeping creaturism :-) of zsh that closely lately.

The feature: "union directories" aka "multiple working directories"
aka "virtual directories".  Instead of having just a single directory
to glob/expand filenames from, how about having several of them: "a
path of cwds".  Yes, this may be *really* confusing -- but I often
find myself at least temporarily wishing for something like that.
There are open issues like for example what to do when several
identically named files match: just take the first one / take all of
them / abort and tell that there are several possibilities / other?

-- 
$jhi++; # http://www.iki.fi/~jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-11-03 18:09           ` Jarkko Hietaniemi
@ 1998-11-03 18:54             ` Zefram
  1998-11-03 19:14               ` Jarkko Hietaniemi
  0 siblings, 1 reply; 21+ messages in thread
From: Zefram @ 1998-11-03 18:54 UTC (permalink / raw)
  To: Jarkko Hietaniemi; +Cc: b.stephens, wischnow, zsh-workers

Jarkko Hietaniemi wrote:
>I am implementing the library under the Artistic License (available
>from the Perl distribution) so there should be no trouble including it
>into zsh, right?

The Artistic License restricts the right to redistribute modified versions
more than the zsh license does.  Perl itself is actually released under
two licenses simultaneously, the Artistic License and the GPL; the GPL
gives total permission to redistribute modified versions, as does the
zsh license.  I would not be happy with part of the zsh distribution
having the more restrictive conditions of the Artistic License.

>Interested?

I'm concerned about how it would be integrated into zsh's globbing code.
I doubt that we can actually use an external library for fuzzy matching,
at least in the cases where the glob pattern contains more than just
literal text.  But others here should be able to give a more definitive
opinion on this.

>The feature: "union directories" aka "multiple working directories"
>aka "virtual directories".

That's an OS issue.  The shell really mustn't fiddle with the semantics of
the filesystem.  Not to mention that it *will* get people into trouble --
no one will expect "rm *" to affect more than one directory.

-zefram



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-11-03 18:54             ` Zefram
@ 1998-11-03 19:14               ` Jarkko Hietaniemi
  1998-11-03 19:27                 ` Zefram
  0 siblings, 1 reply; 21+ messages in thread
From: Jarkko Hietaniemi @ 1998-11-03 19:14 UTC (permalink / raw)
  To: Zefram; +Cc: Jarkko Hietaniemi, b.stephens, wischnow, zsh-workers


Zefram writes:
 > Jarkko Hietaniemi wrote:
 > >I am implementing the library under the Artistic License (available
 > >from the Perl distribution) so there should be no trouble including it
 > >into zsh, right?
 > 
 > The Artistic License restricts the right to redistribute modified versions
 > more than the zsh license does.  Perl itself is actually released under
 > two licenses simultaneously, the Artistic License and the GPL; the GPL
 > gives total permission to redistribute modified versions, as does the
 > zsh license.  I would not be happy with part of the zsh distribution
 > having the more restrictive conditions of the Artistic License.

That's not an actual problem.  I have no problems releasing it almost
under any free-ish license EXCEPT the GPL.  I don't like that one.

 > >Interested?
 > 
 > I'm concerned about how it would be integrated into zsh's globbing code.
 > I doubt that we can actually use an external library for fuzzy matching,
 > at least in the cases where the glob pattern contains more than just
 > literal text.  But others here should be able to give a more definitive
 > opinion on this.

Whatever.  I will implement mine as a generic library.  Do what you will.

 > >The feature: "union directories" aka "multiple working directories"
 > >aka "virtual directories".
 > 
 > That's an OS issue.  The shell really mustn't fiddle with the semantics of
 > the filesystem.  Not to mention that it *will* get people into trouble --

I would say the proposed case-blind option is very much fiddling with
the semantics of the filesystem.

 > no one will expect "rm *" to affect more than one directory.

I don't remember anyone saying this new option would be the default.

 > -zefram
-- 
$jhi++; # http://www.iki.fi/~jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-11-03 19:14               ` Jarkko Hietaniemi
@ 1998-11-03 19:27                 ` Zefram
  1998-11-03 19:36                   ` Jarkko Hietaniemi
  0 siblings, 1 reply; 21+ messages in thread
From: Zefram @ 1998-11-03 19:27 UTC (permalink / raw)
  To: jhi; +Cc: jhi, b.stephens, wischnow, zsh-workers

Jarkko Hietaniemi wrote:
>That's not an actual problem.  I have no problems releasing it almost
>under any free-ish license EXCEPT the GPL.  I don't like that one.

That's OK then.  Obviously, for code to go into the zsh distribution,
it would be most preferable for it to be released under the existing
zsh license.

>I would say the proposed case-blind option is very much fiddling with
>the semantics of the filesystem.

No, that's an additional option in glob patterns.  Doesn't affect the
apparent behaviour of the filesystem at all.  It only affects pattern
matching, and then only when one explicitly requests it (with (#i)).
One can already get multiple-directory globbing by writing a glob pattern
like (foo|bar)/baz.

-zefram



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-11-03 19:27                 ` Zefram
@ 1998-11-03 19:36                   ` Jarkko Hietaniemi
  1998-11-04 18:48                     ` Bart Schaefer
  1998-11-06  9:24                     ` Approximate matching Bart Schaefer
  0 siblings, 2 replies; 21+ messages in thread
From: Jarkko Hietaniemi @ 1998-11-03 19:36 UTC (permalink / raw)
  To: Zefram; +Cc: jhi, jhi, b.stephens, wischnow, zsh-workers


Zefram writes:
 > Jarkko Hietaniemi wrote:
 > >That's not an actual problem.  I have no problems releasing it almost
 > >under any free-ish license EXCEPT the GPL.  I don't like that one.
 > 
 > That's OK then.  Obviously, for code to go into the zsh distribution,
 > it would be most preferable for it to be released under the existing
 > zsh license.

But, but, but...I don't want my code to be married with zsh, neither....

Hmmm.  Because it's my code I can do several releases of it and as
many releases with whatever licenses I happen to fancy at the
particular moment in time...be polygamous.  So I guess I could make
separate "zsh-compliant" release(s) of the code, with zsh license and
everything.  When I finish the code in the first place, that is...I can
even try to build some rudimentary compatibility with the zsh globbing
code, if that helps -- and if I have the time.

 > >I would say the proposed case-blind option is very much fiddling with
 > >the semantics of the filesystem.
 > 
 > No, that's an additional option in glob patterns.  Doesn't affect the
 > apparent behaviour of the filesystem at all.  It only affects pattern
 > matching, and then only when one explicitly requests it (with (#i)).
 > One can already get multiple-directory globbing by writing a glob pattern
 > like (foo|bar)/baz.

See?  I said that I'm not up to speed with the zsh features...

And I still can pull another joker from my sleeve: cdpath is
definitely fiddling with the fs semantics :-)

 > -zefram
-- 
$jhi++; # http://www.iki.fi/~jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen



* Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
  1998-11-03 19:36                   ` Jarkko Hietaniemi
@ 1998-11-04 18:48                     ` Bart Schaefer
  1998-11-05  9:26                       ` PATCH: 3.1.5: Case-insensitive globbing (2) Peter Stephenson
  1998-11-06  9:24                     ` Approximate matching Bart Schaefer
  1 sibling, 1 reply; 21+ messages in thread
From: Bart Schaefer @ 1998-11-04 18:48 UTC (permalink / raw)
  To: Zsh hackers list

Skipping a lot of conversation here ...

On Nov 2,  6:07pm, Peter Stephenson wrote:
} Subject: Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
}
} "Zefram" wrote:
} > Case insensitivity is
} > a property of pattern matching, not filename generation.  Therefore the
} > syntax to control case sensitivity should be part of the glob pattern

I've decided that I agree with that.

} I've got two possible implementations to propose (I have them both
} working, the differences aren't so great).

I agree with the choice that's been made of which one to use.

On Nov 2,  6:06pm, Zefram wrote:
} Subject: Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
}
} If there were just two senses to the flag, I'd argue for #i and #I (#I
} being the opposite of #i).  In this case perhaps #i, #l and #I could be
} used.  I'd prefer a better mnemonic for the one-way case insensitivity,
} though.

I agree about the mnemonic for one-way, but I think what PWS did is fine
(based on the description at the top of the patch message, I haven't had
a chance to actually try it yet).

On Nov 3,  9:12am, Sven Wischnowsky wrote:
} Subject: Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
}
} About the `options for the whole path' thing (which I would like to
} have, too): why not use a generic approach, like the `^' and `-' glob
} modifiers, i.e. `(#i)' works on the current path component, probably
} only up to the next `(#...)' and `(#/i)' works on this and all
} following components (until switched off again).

I think that's getting a bit too complicated.

However, I'm curious how (#i) interacts with parentheses for grouping.
For example:

zsh% echo zsh-3.1.5/((#i)src/zle|doc/zsh)/make*

Where does the case-insensitivity stop in that expression?  It would be
logical for it to stop at either the vertical bar or the close paren.

BTW, is there going to be a parsing conflict between things like
((#i)src) and math expressions in (( ))?  How do you get the desired
glob behavior if there is?

I'm going to have to catch up on the autocorrect and approximate matching
part of this discussion later, I'm out of time right now.

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* PATCH: 3.1.5: Case-insensitive globbing (2)
  1998-11-04 18:48                     ` Bart Schaefer
@ 1998-11-05  9:26                       ` Peter Stephenson
  1998-11-05 18:15                         ` Bart Schaefer
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Stephenson @ 1998-11-05  9:26 UTC (permalink / raw)
  To: Zsh hackers list

(The necessity for the extra patch turned up while I was thinking
about these examples, see below.)

"Bart Schaefer" wrote:
> However, I'm curious how (#i) interacts with parentheses for grouping.
> For example:
> 
> zsh% echo zsh-3.1.5/((#i)src/zle|doc/zsh)/make*
> 
> Where does the case-insensitivity stop in that expression?  It would be
> logical for it to stop at either the vertical bar or the close paren.

To the best of my knowledge, zsh has never allowed grouping to cross
directories, i.e. not even (Src/Zle) is supported.  You'll find that
everything between the / and the next ) is ignored.  This is because
the path segment code, which is separate from the pattern matching
code, has never been rewritten to understand grouping.  So the
question has never (yet) arisen.  It's always seemed like a long job
to fix this, so I for one have never thought seriously about it.
Documenting it might be an idea.

In the case above,

% echo zsh-3.1.5/(#i){src/zle,doc/zsh}/(#I)make*

should work if you know there are files matching both branches of the
{...}.  Whoops, there's a minor bug:  the case-insensitivity set by (#i)
wasn't turned off by the (#I) before the `make' string.  The patch fixes that.
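
The one-character change in the patch (`|=' becoming `=') can be illustrated
with a simplified model.  This is only a sketch: the flag names and values
are hypothetical, and the real bookkeeping in glob.c is more involved.

```c
#include <assert.h>

/* Hypothetical flag bits; the real values in glob.c are not shown here. */
#define FLAG_IGNCASE 0x1u
#define FLAG_OTHER   0x2u

/* Old behaviour: OR the new flags in.  Bits already set stay set,   *
 * so a later request for "no flags" cannot clear FLAG_IGNCASE.      */
unsigned apply_flags_or(unsigned stat, unsigned addflags)
{
    return stat | addflags;
}

/* Patched behaviour: assign.  The segment ends up with exactly the  *
 * flags in force now, whatever it carried before.                   */
unsigned apply_flags_assign(unsigned stat, unsigned addflags)
{
    (void)stat; /* the previous value is deliberately discarded */
    return addflags;
}

/* Convenience check used to inspect the result. */
int ignores_case(unsigned stat)
{
    assert((stat & ~(FLAG_IGNCASE | FLAG_OTHER)) == 0);
    return (stat & FLAG_IGNCASE) != 0;
}
```

With a segment that already carries FLAG_IGNCASE and an incoming flag set of
0, the OR version still reports case-insensitivity while the assignment
version does not.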

> BTW, is there going to be a parsing conflict between things like
> ((#i)src) and math expressions in (( ))?  How do you get the desired
> glob behavior if there is?

The conflict only applies to words in command position, either on the
line itself or at the start of a $(...) where there is a confusion
with $((...)).  It turns out that ((#i)src) at the start of the line
is interpreted as starting two subshells, followed by a comment.

However, once again it looks like even the existing behaviour is a bit
counter-intuitive:

% /(bin|var)/false
zsh: permission denied: /

It seems zsh treats (and has always treated) left parentheses inside the
command word differently.  When you think about it, it's probably
obvious because it's expected to parse a whole command string at once,
so if it's sensitive to a `(' at the start it will be anywhere else in
the word.

So you can't have any form of grouping in the command word.  But is it
a good idea?  You don't know beforehand what command it's going to
run.

*** Src/glob.c.ci2	Thu Nov  5 10:20:04 1998
--- Src/glob.c	Thu Nov  5 10:17:58 1998
***************
*** 579,585 ****
  		return NULL;
  	    if (eptr == cstr) {
  		/* if no string yet, carry on and get one. */
! 		c->stat |= addflags;
  		cstr = pptr;
  		continue;
  	    }
--- 579,585 ----
  		return NULL;
  	    if (eptr == cstr) {
  		/* if no string yet, carry on and get one. */
! 		c->stat = addflags;
  		cstr = pptr;
  		continue;
  	    }
-- 
Peter Stephenson <pws@ibmth.df.unipi.it>       Tel: +39 050 844536
WWW:  http://www.ifh.de/~pws/
Dipartimento di Fisica, Via Buonarotti 2, 56100 Pisa, Italy



* Re: PATCH: 3.1.5: Case-insensitive globbing (2)
  1998-11-05  9:26                       ` PATCH: 3.1.5: Case-insensitive globbing (2) Peter Stephenson
@ 1998-11-05 18:15                         ` Bart Schaefer
  1998-11-06 11:01                           ` PATCH: 3.1.5: doc fix, was re: Case-insensitive globbing Peter Stephenson
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Schaefer @ 1998-11-05 18:15 UTC (permalink / raw)
  To: Zsh hackers list

On Nov 5, 10:26am, Peter Stephenson wrote:
} Subject: PATCH: 3.1.5: Case-insensitive globbing (2)
}
} "Bart Schaefer" wrote:
} > However, I'm curious how (#i) interacts with parentheses for grouping.
} > For example:
} > 
} > zsh% echo zsh-3.1.5/((#i)src/zle|doc/zsh)/make*
} > 
} > Where does the case-insensitivity stop in that expression?  It would be
} > logical for it to stop at either the vertical bar or the close paren.
} 
} To the best of my knowledge, zsh has never allowed grouping to cross
} directories, i.e. not even (Src/Zle) is supported.

Well, ok, then ... drop the / from the example:

zsh% echo zsh-3.1.5/((#i)src|doc)/make*

Should the case-insensitivity end at the )/ or not?  (I just got these
patches compiled, and presently it does stop at close of group, which I
think is good.)

} everything between the / and the next ) is ignored. [...]
} Documenting it might be an idea.

Yes.

} % /(bin|var)/false
} zsh: permission denied: /
} 
} It seems zsh treats (and has always treated) left parentheses inside the
} command word differently.

Hm.  I don't think grouping needs to work inside the command word, but
there should not be an implicit word break before the paren:

zagzig<7> echo(config.|stamp-)h
config.h stamp-h

Rather I'd expect to see some sort of parse error, or command-not-found.

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Approximate matching
  1998-11-03 19:36                   ` Jarkko Hietaniemi
  1998-11-04 18:48                     ` Bart Schaefer
@ 1998-11-06  9:24                     ` Bart Schaefer
  1 sibling, 0 replies; 21+ messages in thread
From: Bart Schaefer @ 1998-11-06  9:24 UTC (permalink / raw)
  To: zsh-workers

On Nov 3, 12:22pm, Bruce Stephens wrote:
} Subject: Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
}
} Approximate matching could either use the auto-correct code, or could
} use something like whatever agrep uses.  In the latter case, it would
} have an optional integer parameter too, so "(#a1)readme" would match
} "Readme" and "read.me", but to match "read", you'd need "(#a2)readme".
} 
} Hmm, maybe this could provide a way to configure the autocorrection
} feature too?
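
The error counts in that example are consistent with plain edit distance
(insertions, deletions and substitutions each cost one).  The sketch below
uses the textbook Levenshtein dynamic programme to check them; it is not the
algorithm agrep uses and not anything from zsh, and the function names and
the MAX_LEN limit are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_LEN 255 /* arbitrary limit for this sketch */

static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Levenshtein distance between a and b, using two rolling rows. */
int edit_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    int prev[MAX_LEN + 1], cur[MAX_LEN + 1];
    size_t i, j;

    assert(la <= MAX_LEN && lb <= MAX_LEN);

    /* Turning the empty string into b[0..j) takes j insertions. */
    for (j = 0; j <= lb; j++)
        prev[j] = (int)j;

    for (i = 1; i <= la; i++) {
        cur[0] = (int)i; /* i deletions to reach the empty string */
        for (j = 1; j <= lb; j++) {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = min3(prev[j] + 1,         /* delete a[i-1]   */
                          cur[j - 1] + 1,      /* insert b[j-1]   */
                          prev[j - 1] + cost); /* keep or replace */
        }
        memcpy(prev, cur, (lb + 1) * sizeof(int));
    }
    return prev[lb];
}

/* Would a pattern allowing max_errors errors accept the string s? */
int approx_match(const char *pat, const char *s, int max_errors)
{
    return edit_distance(pat, s) <= max_errors;
}
```

Under this measure "Readme" and "read.me" are each one edit away from
"readme", and "read" is two edits away, so it is rejected with a limit of 1
and accepted with a limit of 2.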

On Nov 3,  3:01pm, Zefram wrote:
} Subject: Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
}
} Maybe.  The autocorrection could be implemented as a shell function,
} which by default uses fuzzy globbing to generate corrections.  Is this
} what you have in mind?

I rambled on about this back in
	http://www.zsh.org/mla/workers/1996/msg01707.html
during a thread about tcsh-style correction (which can be invoked in a
completion-like manner).

It should be relatively simple to provide a completion interface to
spckword(), which could then be used as the default for this operation.
A fuzzy matching function could then be added as a user-defined widget
if desired (and spckword() made to call it, if we give it a predefined
widget name or some such).

On Nov 3,  8:09pm, Jarkko Hietaniemi wrote:
} Subject: Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
} 
} It so happens that I have been slowly implementing from scratch the
} approximate matching algorithm(s) used in agrep.  My intention has
} been to encapsulate them into a nice library. [...]

On Nov 3,  6:54pm, Zefram wrote:
} Subject: Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
}
} I'm concerned about how it would be integrated into zsh's globbing code.
} I doubt that we can actually use an external library for fuzzy matching,
} at least in the cases where the glob pattern contains more than just
} literal text.  But others here should be able to give a more definitive
} opinion on this.

I don't know all that much about zsh's globbing code, but I've written
both globbers and regex packages and in general they don't mix well.  I
doubt, for example, that we want to convert glob patterns into regexes
to pass them through an agrep library; and globbing usually outperforms
regular expressions on very short strings like file names.

That isn't to say that the fuzziness algorithms wouldn't be adaptable,
but it limits the kinds of library interfaces that would be useful to zsh.
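
For a sense of scale, a globber for just `*' and `?' fits in a few lines and
works directly on the pattern string, with no compilation step.  This is a
generic sketch with a hypothetical function name; it is not zsh's matcher and
it ignores brackets, grouping and everything else zsh supports.

```c
#include <assert.h>
#include <stddef.h>

/* Match s against pat, where `*' matches any run of characters      *
 * (including none) and `?' matches exactly one character.           *
 * Returns 1 on a match, 0 otherwise.                                */
int glob_match(const char *pat, const char *s)
{
    const char *star = NULL;  /* position of the most recent `*'     */
    const char *retry = NULL; /* where in s that `*' started to eat  */

    assert(pat != NULL && s != NULL);

    while (*s) {
        if (*pat == '*') {
            /* Remember the star; first try letting it match nothing. */
            star = pat++;
            retry = s;
        } else if (*pat == '?' || *pat == *s) {
            pat++;
            s++;
        } else if (star) {
            /* Mismatch: let the last star swallow one more character. */
            pat = star + 1;
            s = ++retry;
        } else {
            return 0;
        }
    }
    /* Subject exhausted: only trailing stars may remain in the pattern. */
    while (*pat == '*')
        pat++;
    return *pat == '\0';
}
```

A fuzzy-matching library that only accepts a regular expression or a literal
string could not be dropped into a loop like this one, which is the
interface problem described above.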

On Nov 3,  9:36pm, Jarkko Hietaniemi wrote:
} Subject: Re: PATCH: 3.1.5 - (Sven) Case-insensitive globbing
}
} And I still can pull another joker from my sleeve: cdpath is
} definitely fiddling with the fs semantics :-)

No, it isn't; not any more than searching for executables in $PATH is.

Of course, I wouldn't say that ANY of the things (most elided) that were
mentioned in this little side-discussion are actually "fiddling with fs
semantics."  Ever heard of conditional symbolic links?  The Sony NeWS OS,
a BSD 4.4 derivative that I was running for a while many years ago, had
symbolic links that could change what they pointed at based on the value
of an environment variable.  THAT is fiddling with the fs semantics.  It
also is probably the biggest security hole anybody ever introduced into
any unix platform.

Having * match files in multiple unrelated directories is fiddling with
a universal unix shell globbing mechanism in a similarly dangerous way,
but it would have to be implemented by actually generating the paths to
any files not in the current directory.  If you passed the result to
"echo" you'd see those paths.  If there were some magical way to hide
the paths yet still have external commands access the files, THEN it
would be "fiddling with fs semantics."

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* PATCH: 3.1.5: doc fix, was re: Case-insensitive globbing
  1998-11-05 18:15                         ` Bart Schaefer
@ 1998-11-06 11:01                           ` Peter Stephenson
  1998-11-06 13:43                             ` Bruce Stephens
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Stephenson @ 1998-11-06 11:01 UTC (permalink / raw)
  To: Zsh hackers list

"Bart Schaefer" wrote:
> zsh% echo zsh-3.1.5/((#i)src|doc)/make*
> 
> Should the case-insensitivity end at the )/ or not?  (I just got these
> patches compiled, and presently it does stop at close of group, which I
> think is good.)

This is supposed to be explicit in the documentation in the original
patch.

> } everything between the / and the next ) is ignored. [...]
> } Documenting it might be an idea.
> 
> Yes.

The patch below adds a few lines mentioning the limitation.

> Hm.  I don't think grouping needs to work inside the command word, but
> there should not be an implicit word break before the paren:
> 
> zagzig<7> echo(config.|stamp-)h
> config.h stamp-h
> 
> Rather I'd expect to see some sort of parse error, or command-not-found.

I suppose it's for people to be able to do things like `if(test...)'
without typing all those time-consuming spaces.

*** Doc/Zsh/expn.yo.group	Tue Nov  3 13:32:28 1998
--- Doc/Zsh/expn.yo	Fri Nov  6 11:54:52 1998
***************
*** 777,782 ****
--- 777,785 ----
  If the tt(KSH_GLOB) option is set, then a
  `tt(@)', `tt(*)', `tt(+)', `tt(?)' or `tt(!)' immediately preceding
  the `tt(LPAR())' is treated specially, as detailed below.
+ Note that grouping cannot currently extend over multiple directories:
+ a `tt(/)' separating a directory terminates processing of the current
+ group; processing resumes after the end of the group.
  )
  item(var(x)tt(|)var(y))(
  Matches either var(x) or var(y).

-- 
Peter Stephenson <pws@ibmth.df.unipi.it>       Tel: +39 050 844536
WWW:  http://www.ifh.de/~pws/
Dipartimento di Fisica, Via Buonarotti 2, 56100 Pisa, Italy



* Re: PATCH: 3.1.5: doc fix, was re: Case-insensitive globbing
  1998-11-06 11:01                           ` PATCH: 3.1.5: doc fix, was re: Case-insensitive globbing Peter Stephenson
@ 1998-11-06 13:43                             ` Bruce Stephens
  0 siblings, 0 replies; 21+ messages in thread
From: Bruce Stephens @ 1998-11-06 13:43 UTC (permalink / raw)
  To: Zsh hackers list

Peter Stephenson <pws@ibmth.df.unipi.it> writes:

> "Bart Schaefer" wrote:

> > Hm.  I don't think grouping needs to work inside the command word, but
> > there should not be an implicit word break before the paren:
> > 
> > zagzig<7> echo(config.|stamp-)h
> > config.h stamp-h
> > 
> > Rather I'd expect to see some sort of parse error, or command-not-found.
> 
> I suppose it's for people to be able to do things like `if(test...)'
> without typing all those time-consuming spaces.

Is this a feature?  I don't like it, if it is.  

Suppose I have a few commands like foo_baa, or foo.baa, but I can
never remember what case the baa is supposed to be.

I think it would be cool to be able to use case insensitive globbing
even on the command word.  It certainly feels ugly to have a special
case like this.  

I can see why you might want special cases for builtin bits of syntax
like if, while, etc., I suppose (just to avoid all that time-wasting
whitespace), but it still feels ugly.



Thread overview: 21+ messages
1998-10-31 10:14 PATCH: 3.1.5 - (Sven) Case-insensitive globbing Bart Schaefer
1998-11-02  9:21 ` Zefram
1998-11-02 17:07   ` Peter Stephenson
1998-11-02 17:45     ` Bruce Stephens
1998-11-02 18:06     ` Zefram
1998-11-03  8:12       ` Sven Wischnowsky
1998-11-03 12:22         ` Bruce Stephens
1998-11-03 12:47           ` Bruce Stephens
1998-11-03 15:01             ` Zefram
1998-11-03 15:27               ` Bruce Stephens
     [not found]         ` <MLIST_vbn269dkyw.fsf@snake.isode.com>
1998-11-03 18:09           ` Jarkko Hietaniemi
1998-11-03 18:54             ` Zefram
1998-11-03 19:14               ` Jarkko Hietaniemi
1998-11-03 19:27                 ` Zefram
1998-11-03 19:36                   ` Jarkko Hietaniemi
1998-11-04 18:48                     ` Bart Schaefer
1998-11-05  9:26                       ` PATCH: 3.1.5: Case-insensitive globbing (2) Peter Stephenson
1998-11-05 18:15                         ` Bart Schaefer
1998-11-06 11:01                           ` PATCH: 3.1.5: doc fix, was re: Case-insensitive globbing Peter Stephenson
1998-11-06 13:43                             ` Bruce Stephens
1998-11-06  9:24                     ` Approximate matching Bart Schaefer
