zsh-workers
* <n> == <n->?
@ 2000-04-03 23:09 Johan Sundström
  2000-04-04  1:16 ` Zefram
  0 siblings, 1 reply; 5+ messages in thread
From: Johan Sundström @ 2000-04-03 23:09 UTC
  To: zsh-workers

Hi!

When upgrading from zsh 3.1.6 to 3.1.6-dev-17 (as found in the Mandrake
zsh-3.1.6dev17-1mdk rpm), I was sad to notice that the glob behaviour of
the pattern <number> had changed to match what I previously could (and
still can) specify as <number->, that is, an open range of numbers, from
number onwards.

The old behaviour is, luckily, still available as <n-n> (although with
some extra pointless repetitive typing), but I rather liked the syntactic
simplicity of <n>, especially since <-n> or <n-> both are easy typers and
good for their purposes. <n> isn't useless, if (s)he who changed its 
behaviour thought so, since it matches the number n with any amount of
leading zeroes, a feature I have daily use for, when rummaging through
huge log directories, for instance.

Is this behaviour still there in the cvs version? On purpose? Any chance
of getting back the old style "syntactic sugar" pattern look?

/Johan Sundström, occasional zsh developer, frequent zsh user and deployer



* Re: <n> == <n->?
  2000-04-03 23:09 <n> == <n->? Johan Sundström
@ 2000-04-04  1:16 ` Zefram
  2000-04-04 17:00   ` Johan Sundström
  0 siblings, 1 reply; 5+ messages in thread
From: Zefram @ 2000-04-04  1:16 UTC
  To: Johan Sundström; +Cc: zsh-workers

Johan Sundström wrote:
>When upgrading from zsh 3.1.6 to 3.1.6-dev-17 (as found in the Mandrake
>zsh-3.1.6dev17-1mdk rpm), I was sad to notice that the glob behaviour of
>the pattern <number> had changed to something identical to what I had
>earlier (and still can) specified as <number->, that is, an open range of
>numbers, from number onwards.

Hmm.

I thought we'd decided, quite some time ago, that the numeric glob syntax
was going to require a "-", to minimise ambiguity with redirection.
This is, in fact, what zshexpn(1) shows.  However, that was when the
<> operator was being introduced, so perhaps that change was limited
to making "<>" be always a redirection rather than a glob operator,
requiring "<->" for globbing.

<fx: checks>

Actually, lex.c is more lenient than that.  Anything matching
/\<[-0-9]+\>/ is initially lexed as a string rather than as operators.
However, gettokstr() has some nasties here.  Although the above grammar
applies at the beginning of a word, gettokstr() makes no such check
in the middle of a word.  As far as it's concerned, anything matching
/\<[-0-9]/ is the start of a glob operator, and it'll keep adding to
the string (past whitespace and so on) until it finds the closing ">".
Try typing "echo a<1" (and compare against "echo <1").

To complete the set, tokenize() insists on /\<[0-9]*-[0-9]*\>/.  So it
looks like it's *intended* that the "-" be required, but the lexer just
isn't actually enforcing it.  The code that actually causes "<n>" to
be treated like "<n->" is in pattern.c: it sees that it has a starting
number but no ending number, and just doesn't distinguish the two cases.
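
To make the disagreement concrete, here is a minimal sketch of the two
grammars as standalone C predicates.  These are hypothetical helpers
written for illustration, not functions from lex.c or glob.c; each
returns the length of the matching prefix, or 0 if there is none.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not zsh source): length of a prefix of s that
 * matches the lenient word-initial rule /<[-0-9]+>/, or 0. */
size_t lenient_numglob(const char *s)
{
    size_t i = 1;

    if (s[0] != '<')
        return 0;
    while (isdigit((unsigned char)s[i]) || s[i] == '-')
        i++;
    /* need at least one digit or '-' before the closing '>' */
    return (i > 1 && s[i] == '>') ? i + 1 : 0;
}

/* Hypothetical helper (not zsh source): length of a prefix of s that
 * matches the strict rule /<[0-9]*-[0-9]*>/, or 0. */
size_t strict_numglob(const char *s)
{
    size_t i = 1;

    if (s[0] != '<')
        return 0;
    while (isdigit((unsigned char)s[i]))
        i++;
    if (s[i] != '-')            /* the "-" is mandatory */
        return 0;
    i++;
    while (isdigit((unsigned char)s[i]))
        i++;
    return (s[i] == '>') ? i + 1 : 0;
}
```

Under this model "<5>" and "<1-2-3>" pass the lenient check but fail the
strict one, while "<->" and "<1-2>" pass both, and "<>" passes neither.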

>                         <n> isn't useless, if (s)he who changed its 
>behaviour thought so, since it matches all the number n with any amount of
>leading zeroes, a feature I have daily use for, when rummaging through
>huge log directories, for instance.

"0#n" will do that (# = zero or more of the previous character).
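
As a rough illustration of what "0#n" accepts when it is the whole
pattern, the following C sketch checks for any number of "0" characters
followed by the literal digits of n.  The function name is made up for
this example and nothing here is taken from the zsh sources.

```c
#include <string.h>

/* Hypothetical model of the pattern 0#n: returns 1 if s consists of
 * zero or more '0' characters followed by exactly the string n. */
int matches_zeros_then(const char *s, const char *n)
{
    size_t ls = strlen(s), ln = strlen(n), i;

    /* s must end with n ... */
    if (ls < ln || strcmp(s + (ls - ln), n) != 0)
        return 0;
    /* ... and everything before that must be '0' */
    for (i = 0; i < ls - ln; i++)
        if (s[i] != '0')
            return 0;
    return 1;
}
```

With n = "7" this accepts "7" and "007" but rejects "17" and "0070".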

OK.  This patch (already in the repository) fixes the grammar
disagreements, making all the relevant places check for the
/\<[0-9]*-[0-9]*\>/ syntax.  "<n>" is consequently removed; you'll have
to use "0#n" or "<n-n>".  No doc change, since this is changing things
to match the documented behaviour.

On the way, I fixed the rather nasty bug that if a word started with
a digit followed by a numeric glob, the initial digit got swallowed.
(The digit was provisionally treated as a file descriptor number and
never got restored.)

Incidentally, Adam, in /home/groups/zsh/zsh, you've managed to set all
*regular* files to be sgid, rather than all directories.  Can we have
from Adam and Peter please a "chgrp -R zsh /home/groups/zsh; chmod -R
g+w,g-s /home/groups/zsh; chmod g+s /home/groups/zsh/**/*(/)".

-zefram

Index: ChangeLog
===================================================================
RCS file: /cvsroot/zsh/zsh/ChangeLog,v
retrieving revision 1.3
diff -c -r1.3 ChangeLog
*** ChangeLog	2000/04/02 17:37:34	1.3
--- ChangeLog	2000/04/04 01:11:25
***************
*** 1,3 ****
--- 1,9 ----
+ 2000-04-04  Andrew Main  <zefram@zsh.org>
+ 
+ 	* 10444: Src/lex.c, Src/pattern.c: Insist on proper syntax
+ 	for numeric globbing (with the "-").  Also fix the bug whereby
+ 	"echo 1<2-3>" would lose the "1".
+ 
  2000-04-02  Peter Stephenson  <pws@pwstephenson.fsnet.co.uk>
  
  	* pws: Config/version.mk: 3.1.6-dev-21.
Index: Src/lex.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/lex.c,v
retrieving revision 1.1.1.19
diff -c -r1.1.1.19 lex.c
*** Src/lex.c	2000/03/13 09:44:19	1.1.1.19
--- Src/lex.c	2000/04/04 01:11:29
***************
*** 569,575 ****
--- 569,612 ----
      return skipcomm();
  }
  
+ /* Check whether we're looking at valid numeric globbing syntax      *
+  * (/\<[0-9]*-[0-9]*\>/).  Call pointing just after the opening "<". *
+  * Leaves the input in the same place, returning 0 or 1.             */
+ 
  /**/
+ static int
+ isnumglob(void)
+ {
+     int c, ec = '-', ret = 0;
+     int tbs = 256, n = 0;
+     char *tbuf = (char *)zalloc(tbs);
+ 
+     while(1) {
+ 	c = hgetc();
+ 	if(lexstop) {
+ 	    lexstop = 0;
+ 	    break;
+ 	}
+ 	tbuf[n++] = c;
+ 	if(!idigit(c)) {
+ 	    if(c != ec)
+ 		break;
+ 	    if(ec == '>') {
+ 		ret = 1;
+ 		break;
+ 	    }
+ 	    ec = '>';
+ 	}
+ 	if(n == tbs)
+ 	    tbuf = (char *)realloc(tbuf, tbs *= 2);
+     }
+     while(n--)
+ 	hungetc(tbuf[n]);
+     zfree(tbuf, tbs);
+     return ret;
+ }
+ 
+ /**/
  int
  gettok(void)
  {
***************
*** 719,759 ****
  	if (!incmdpos && d == '(') {
  	    hungetc(d);
  	    lexstop = 0;
  	    break;
  	}
! 	if (d == '>')
  	    peek = INOUTANG;
- 	else if (idigit(d) || d == '-') {
- 	    int tbs = 256, n = 0, nc;
- 	    char *tbuf, *tbp, *ntb;
- 
- 	    tbuf = tbp = (char *)zalloc(tbs);
- 	    hungetc(d);
- 
- 	    while ((nc = hgetc()) && !lexstop) {
- 		if (!idigit(nc) && nc != '-')
- 		    break;
- 		*tbp++ = (char)nc;
- 		if (++n == tbs) {
- 		    ntb = (char *)realloc(tbuf, tbs *= 2);
- 		    tbp += ntb - tbuf;
- 		    tbuf = ntb;
- 		}
- 	    }
- 	    if (nc == '>' && !lexstop) {
- 		hungetc(nc);
- 		while (n--)
- 		    hungetc(*--tbp);
- 		zfree(tbuf, tbs);
- 		break;
- 	    }
- 	    if (nc && !lexstop)
- 		hungetc(nc);
- 	    lexstop = 0;
- 	    while (n--)
- 		hungetc(*--tbp);
- 	    zfree(tbuf, tbs);
- 	    peek = INANG;
  	} else if (d == '<') {
  	    int e = hgetc();
  
--- 756,770 ----
  	if (!incmdpos && d == '(') {
  	    hungetc(d);
  	    lexstop = 0;
+ 	    unpeekfd:
+ 	    if(peekfd != -1) {
+ 		hungetc(c);
+ 		c = '0' + peekfd;
+ 	    }
  	    break;
  	}
! 	if (d == '>') {
  	    peek = INOUTANG;
  	} else if (d == '<') {
  	    int e = hgetc();
  
***************
*** 770,781 ****
  		lexstop = 0;
  		peek = DINANG;
  	    }
! 	} else if (d == '&')
  	    peek = INANGAMP;
! 	else {
! 	    peek = INANG;
  	    hungetc(d);
! 	    lexstop = 0;
  	}
  	tokfd = peekfd;
  	return peek;
--- 781,793 ----
  		lexstop = 0;
  		peek = DINANG;
  	    }
! 	} else if (d == '&') {
  	    peek = INANGAMP;
! 	} else {
  	    hungetc(d);
! 	    if(isnumglob())
! 		goto unpeekfd;
! 	    peek = INANG;
  	}
  	tokfd = peekfd;
  	return peek;
***************
*** 783,789 ****
  	d = hgetc();
  	if (d == '(') {
  	    hungetc(d);
! 	    break;
  	} else if (d == '&') {
  	    d = hgetc();
  	    if (d == '!' || d == '|')
--- 795,801 ----
  	d = hgetc();
  	if (d == '(') {
  	    hungetc(d);
! 	    goto unpeekfd;
  	} else if (d == '&') {
  	    d = hgetc();
  	    if (d == '!' || d == '|')
***************
*** 1056,1084 ****
  	    if (isset(SHGLOB) && sub)
  		break;
  	    e = hgetc();
! 	    if (!(idigit(e) || e == '-' || (e == '(' && intpos))) {
! 		hungetc(e);
! 		lexstop = 0;
! 		if (in_brace_param || sub)
! 		    break;
! 		goto brk;
! 	    }
! 	    c = Inang;
! 	    if (e == '(') {
! 		add(c);
  		if (skipcomm()) {
  		    peek = LEXERR;
  		    goto brk;
  		}
  		c = Outpar;
! 	    } else {
! 		add(c);
! 		c = e;
! 		while (c != '>' && !lexstop)
! 		    add(c), c = hgetc();
  		c = Outang;
  	    }
! 	    break;
  	case LX2_EQUALS:
  	    if (intpos) {
  		e = hgetc();
--- 1068,1094 ----
  	    if (isset(SHGLOB) && sub)
  		break;
  	    e = hgetc();
! 	    if(e == '(' && intpos) {
! 		add(Inang);
  		if (skipcomm()) {
  		    peek = LEXERR;
  		    goto brk;
  		}
  		c = Outpar;
! 		break;
! 	    }
! 	    hungetc(e);
! 	    if(isnumglob()) {
! 		add(Inang);
! 		while ((c = hgetc()) != '>')
! 		    add(c);
  		c = Outang;
+ 		break;
  	    }
! 	    lexstop = 0;
! 	    if (in_brace_param || sub)
! 		break;
! 	    goto brk;
  	case LX2_EQUALS:
  	    if (intpos) {
  		e = hgetc();
Index: Src/pattern.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/pattern.c,v
retrieving revision 1.2
diff -c -r1.2 pattern.c
*** Src/pattern.c	2000/04/01 20:49:48	1.2
--- Src/pattern.c	2000/04/04 01:11:37
***************
*** 989,1002 ****
  		patparse = nptr;
  		len |= 1;
  	    }
! 	    if (*patparse == '-') {
! 		patparse++;
! 		if (idigit(*patparse)) {
! 		    to = (zrange_t) zstrtol((char *)patparse,
! 					      (char **)&nptr, 10);
! 		    patparse = nptr;
! 		    len |= 2;
! 		}
  	    }
  	    if (*patparse != Outang)
  		return 0;
--- 989,1001 ----
  		patparse = nptr;
  		len |= 1;
  	    }
! 	    DPUTS(*patparse != '-', "BUG: - missing from numeric glob");
! 	    patparse++;
! 	    if (idigit(*patparse)) {
! 		to = (zrange_t) zstrtol((char *)patparse,
! 					  (char **)&nptr, 10);
! 		patparse = nptr;
! 		len |= 2;
  	    }
  	    if (*patparse != Outang)
  		return 0;
END



* Re: <n> == <n->?
  2000-04-04  1:16 ` Zefram
@ 2000-04-04 17:00   ` Johan Sundström
  2000-04-04 19:32     ` Peter Stephenson
  2000-04-04 21:01     ` Zefram
  0 siblings, 2 replies; 5+ messages in thread
From: Johan Sundström @ 2000-04-04 17:00 UTC
  To: zsh-workers

On Tue, 4 Apr 2000, Zefram wrote:

> I thought we'd decided, quite some time ago, that the numeric glob syntax
> was going to require a "-", to minimise ambiguity with redirection.

Sounds reasonable. After some digging through zshmisc(1), I'd guess the
situation we're trying to protect from a possible ambiguity is "< word" --
am I right? I may be at a loss here, but I don't quite see where the
problem might arise. Could someone depict an example or two and how to
trigger the problem?

I guess some situation with a directory having files named '1>', '1', '01'
and/or similar, and trying some command with an argument of <1> might be
what you're getting at, but I haven't been able to figure out how or why.

> Actually, lex.c is more lenient than that.  Anything matching
> /\<[-0-9]+\>/ is initially lexed as a string rather than as operators.
> However, gettokstr() has some nasties here.  Although the above grammar
> applies at the beginning of a word, gettokstr() makes no such check
> in the middle of a word.  As far as it's concerned, anything matching
> /\<[-0-9]/ is the start of a glob operator, and it'll keep adding to
> the string (past whitespace and so on) until it finds the closing ">".
> Try typing "echo a<1" (and compare against "echo <1").

I noticed them being different (which showed better using cat than echo),
but I failed to understand how the first case tried to operate; to me, it
seemed like a broken effort at <<- or <<<, but then I guess I just didn't
understand what happened, so my guess isn't worth a lot. :-]

Either way, I'm not sure I see the impact of this on the case where the
word continues with a > and possibly more pattern matching. After all,
when I want redirection, I don't try my luck at inserting a < or > in
the middle of the current word I'm typing, and I haven't found anything in
the man pages supporting that behaviour either.

> "0#n" will do that (# = zero or more of the previous character).

As stated, I've been quite fond of <n>. Among other reasons, because
editing a past commandline from <n> to <-n> or <n-> was such a breeze. I'm 
familiar with #, but lazy as I am, I found the deprecated <n> syntax more 
typing friendly.

Another thing I forgot at first when on the subject: for quite some time
now, <x-y> has been ungreedy about its matches, to my disappointment. This
means that <1-2>* will match 1, 2, 10 through 29, and so on, instead of a
single, closed range, as at least I would hope when constructing such a
pattern. The <x-y> syntax is of course still useful, but it takes a whole
lot more work and narrowing-down to make the pattern as tight as wanted.
Any chance of getting back the greedy version as seen in, for example,
3.0.0? (I'm starting to sound like an old fart, here... ;-) Some
pattern-matching token for stating the level of greediness would of course
be a welcome addition as well, but that sounds like a whole lot more work.

Oh, and don't get me wrong about all this; I'm not complaining about how
zsh works or doesn't work, I'm just trying to make the best I can of the
situation. Zsh isn't my all-time favourite shell without reason; I'm very
fond of zsh's trend of always trying to make things as handy as they can
be, and whenever I can help, I do my best.

/Johan Sundström, fond of globbing and pattern matching



* Re: <n> == <n->?
  2000-04-04 17:00   ` Johan Sundström
@ 2000-04-04 19:32     ` Peter Stephenson
  2000-04-04 21:01     ` Zefram
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Stephenson @ 2000-04-04 19:32 UTC
  To: zsh-workers

Johan Sundström wrote:
> Another thing I forgot at first when on the subject: for quite some time
> now, <x-y> has been ungreedy about its matches, to my disappointment. This
> means that <1-2>* will match 1, 2, 10 through 29, and so on, instead of a
> single, closed range, as at leas I would hope when constructing such a
> pattern.

I thought about this when I changed it, and came to the conclusion that the
new behaviour was a simple matter of consistency with patterns.  *
guarantees to match anything at all, so 123potato is bound to match <1-2>*.
I don't think it's an option to make * not match numbers in this one case.
You probably don't need to be told all the workarounds; I suppose the
simplest is <1-2>[^0-9]*.
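
The effect can be modelled in a few lines of C.  This is only a sketch
of the behaviour described above, not the matcher in pattern.c: the
function name and the need_nondigit flag are invented for illustration,
and hi is assumed to be small enough that the running value cannot
overflow a long.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical model: does s match <lo-hi>* (need_nondigit == 0) or
 * <lo-hi>[^0-9]* (need_nondigit == 1)?  Every prefix of the leading
 * digit run is tried as the number, which is why "*" can absorb the
 * remaining digits. */
int range_then_star(const char *s, long lo, long hi, int need_nondigit)
{
    long val = 0;
    size_t i;

    for (i = 0; isdigit((unsigned char)s[i]); i++) {
        val = val * 10 + (s[i] - '0');
        if (val > hi)           /* longer prefixes only get larger */
            break;
        if (val >= lo) {
            char next = s[i + 1];

            if (!need_nondigit)
                return 1;       /* "*" matches whatever follows */
            if (next != '\0' && !isdigit((unsigned char)next))
                return 1;       /* [^0-9] needs one non-digit here */
        }
    }
    return 0;
}
```

In this model "123potato" matches <1-2>* through the prefix "1", but it
does not match <1-2>[^0-9]*, whereas "2potato" matches both.  Note that
the workaround also requires at least one character after the number, so
a bare "1" no longer matches.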

-- 
Peter Stephenson <pws@pwstephenson.fsnet.co.uk>
Work: pws@CambridgeSiliconRadio.com
Web: http://www.pwstephenson.fsnet.co.uk



* Re: <n> == <n->?
  2000-04-04 17:00   ` Johan Sundström
  2000-04-04 19:32     ` Peter Stephenson
@ 2000-04-04 21:01     ` Zefram
  1 sibling, 0 replies; 5+ messages in thread
From: Zefram @ 2000-04-04 21:01 UTC
  To: Johan Sundström; +Cc: zsh-workers

Johan Sundström wrote:
>Sounds reasonable. After some digging through zshmisc(1), I'd guess the
>situation we're trying to protect from a possible ambiguity is "< word" --
>am I right?

Right right.

>            I may be at a loss here, but I don't quite see where the
>problem might arise. Could someone depict an example or two and how to
>trigger the problem?

	cat <123-456>foo

zsh treats this as a command line with two words, one of which is a glob
pattern that will match "00234foo" and so on.  In Bourne shell syntax it
means the same thing that both zsh and sh understand by
	cat <123-456 >foo

which is a command line with one word and two redirections (input from
"123-456", output to "foo").  It's quite common to omit spaces before
and after redirection operators.

>I noticed them being different (which showed better using cat than echo),
>but I failed to understand how the first case tried to operate; to me, it
>seemed like a broken effort at <<- or <<<, but then I guess I just didn't
>understand what happened, so my guess isn't worth a lot. :-]

What actually happens if you type "echo a<1" (before my patch) is that
the lexer sees a word "echo", then sees another word that starts with
"a".  After the "<", it looks for a ">" to finish that glob operator;
it gets a newline, which is treated as part of the word, then it asks
for more input, still reading that word.  Try typing "x>y" as the second
input line to complete the word.

Completely broken behaviour, nothing to do with here documents.
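
A small C model of that pre-patch mid-word rule, for anyone who wants to
see the runaway scan in isolation.  It is a sketch based on the loop the
patch removes from gettokstr(), reduced to a plain string scan; the
function name and return convention are invented here.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical model of the old mid-word rule.  s points at a '<'
 * inside a word.  Returns 0 if the rule does not trigger, the number
 * of characters swallowed into the word (up to and including '>') if
 * a '>' is found, or -1 if the input runs out first, i.e. the lexer
 * would ask for another line. */
long old_midword_scan(const char *s)
{
    const char *close;

    if (s[0] != '<' || !(isdigit((unsigned char)s[1]) || s[1] == '-'))
        return 0;
    /* no further checks: whitespace and newlines are swallowed too */
    close = strchr(s + 1, '>');
    return close ? (long)(close - s) + 1 : -1;
}
```

Fed "<1\n" it reports that more input is needed; fed "<1\nx>y" it
swallows five characters, newline included, which is the behaviour seen
when "x>y" is typed as the second line.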

>Either way, I'm not sure I see the impact of this on the case where the
>word continues with a > and possibly more pattern matching. After all,
>when I want redirection, I don't try my luck at inserting a < or > in
>the middle of the current word I'm typing,

Quite.  The practical rarity of the syntax clash is the only reason that
we can get away with that as a glob syntax.  The reason for requiring the
"-" is to make the clash as small and as simply bounded as possible.

>                                           and I haven't found anything in
>the man pages supporting that behaviour either.

It's not clear on where whitespace is permitted.  Whitespace is allowed
but not required before and after redirection operators.

-zefram

