* <n> == <n->?
From: Johan Sundström @ 2000-04-03 23:09 UTC
To: zsh-workers

Hi!

When upgrading from zsh 3.1.6 to 3.1.6-dev-17 (as found in the Mandrake
zsh-3.1.6dev17-1mdk rpm), I was sad to notice that the glob behaviour of
the pattern <number> had changed to something identical to what I had
earlier (and still can) specified as <number->, that is, an open range of
numbers, from number onwards.

The old behaviour is, luckily, still available as <n-n> (although with
some extra pointless repetitive typing), but I rather liked the syntactic
simplicity of <n>, especially since <-n> and <n-> are both easy to type
and good for their purposes. <n> isn't useless, if (s)he who changed its
behaviour thought so, since it matches all the number n with any amount of
leading zeroes, a feature I have daily use for, when rummaging through
huge log directories, for instance.

Is this behaviour still there in the cvs version? On purpose? Any chance
of getting back the old style "syntactic sugar" pattern look?

/Johan Sundström, occasional zsh developer, frequent zsh user and deployer
* Re: <n> == <n->?
From: Zefram @ 2000-04-04 1:16 UTC
To: Johan Sundström; +Cc: zsh-workers

Johan Sundström wrote:
>When upgrading from zsh 3.1.6 to 3.1.6-dev-17 (as found in the Mandrake
>zsh-3.1.6dev17-1mdk rpm), I was sad to notice that the glob behaviour of
>the pattern <number> had changed to something identical to what I had
>earlier (and still can) specified as <number->, that is, an open range of
>numbers, from number onwards.

Hmm. I thought we'd decided, quite some time ago, that the numeric glob
syntax was going to require a "-", to minimise ambiguity with redirection.
This is, in fact, what zshexpn(1) shows. However, that was when the <>
operator was being introduced, so perhaps that change was limited to
making "<>" be always a redirection rather than a glob operator, requiring
"<->" for globbing.

<fx: checks>

Actually, lex.c is more lenient than that. Anything matching
/\<[-0-9]+\>/ is initially lexed as a string rather than as operators.
However, gettokstr() has some nasties here. Although the above grammar
applies at the beginning of a word, gettokstr() makes no such check
in the middle of a word. As far as it's concerned, anything matching
/\<[-0-9]/ is the start of a glob operator, and it'll keep adding to
the string (past whitespace and so on) until it finds the closing ">".
Try typing "echo a<1" (and compare against "echo <1").

To complete the set, tokenize() insists on /\<[0-9]*-[0-9]*\>/. So it
looks like it's *intended* that the "-" be required, but the lexer just
isn't actually enforcing it.

The code that actually causes "<n>" to be treated like "<n->" is in
pattern.c: it sees that it has a starting number but no ending number,
and just doesn't distinguish the two cases.

> <n> isn't useless, if (s)he who changed its
>behaviour thought so, since it matches all the number n with any amount of
>leading zeroes, a feature I have daily use for, when rummaging through
>huge log directories, for instance.

"0#n" will do that (# = zero or more of the previous character).

OK. This patch (already in the repository) fixes the grammar
disagreements, making all the relevant places check for the
/\<[0-9]*-[0-9]*\>/ syntax. "<n>" is consequently removed; you'll have
to use "0#n" or "<n-n>". No doc change, since this is changing things
to match the documented behaviour.

On the way, I fixed the rather nasty bug that if a word started with a
digit followed by a numeric glob, the initial digit got swallowed. (The
digit was provisionally treated as a file descriptor number and never
got restored.)

Incidentally, Adam, in /home/groups/zsh/zsh, you've managed to set all
*regular* files to be sgid, rather than all directories. Can we have
from Adam and Peter please a "chgrp -R zsh /home/groups/zsh; chmod -R
g+w,g-s /home/groups/zsh; chmod g+s /home/groups/zsh/**/*(/)".

-zefram

Index: ChangeLog
===================================================================
RCS file: /cvsroot/zsh/zsh/ChangeLog,v
retrieving revision 1.3
diff -c -r1.3 ChangeLog
*** ChangeLog   2000/04/02 17:37:34     1.3
--- ChangeLog   2000/04/04 01:11:25
***************
*** 1,3 ****
--- 1,9 ----
+ 2000-04-04  Andrew Main  <zefram@zsh.org>
+
+       * 10444: Src/lex.c, Src/pattern.c: Insist on proper syntax
+       for numeric globbing (with the "-").  Also fix the bug whereby
+       "echo 1<2-3>" would lose the "1".
+
  2000-04-02  Peter Stephenson  <pws@pwstephenson.fsnet.co.uk>

        * pws: Config/version.mk: 3.1.6-dev-21.
Index: Src/lex.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/lex.c,v
retrieving revision 1.1.1.19
diff -c -r1.1.1.19 lex.c
*** Src/lex.c   2000/03/13 09:44:19     1.1.1.19
--- Src/lex.c   2000/04/04 01:11:29
***************
*** 569,575 ****
--- 569,612 ----
      return skipcomm();
  }

+ /* Check whether we're looking at valid numeric globbing syntax      *
+  * (/\<[0-9]*-[0-9]*\>/).  Call pointing just after the opening "<". *
+  * Leaves the input in the same place, returning 0 or 1.             */
+
  /**/
+ static int
+ isnumglob(void)
+ {
+     int c, ec = '-', ret = 0;
+     int tbs = 256, n = 0;
+     char *tbuf = (char *)zalloc(tbs);
+
+     while(1) {
+         c = hgetc();
+         if(lexstop) {
+             lexstop = 0;
+             break;
+         }
+         tbuf[n++] = c;
+         if(!idigit(c)) {
+             if(c != ec)
+                 break;
+             if(ec == '>') {
+                 ret = 1;
+                 break;
+             }
+             ec = '>';
+         }
+         if(n == tbs)
+             tbuf = (char *)realloc(tbuf, tbs *= 2);
+     }
+     while(n--)
+         hungetc(tbuf[n]);
+     zfree(tbuf, tbs);
+     return ret;
+ }
+
+ /**/
  int
  gettok(void)
  {
***************
*** 719,759 ****
          if (!incmdpos && d == '(') {
              hungetc(d);
              lexstop = 0;
              break;
          }
!         if (d == '>')
              peek = INOUTANG;
-         else if (idigit(d) || d == '-') {
-             int tbs = 256, n = 0, nc;
-             char *tbuf, *tbp, *ntb;
-
-             tbuf = tbp = (char *)zalloc(tbs);
-             hungetc(d);
-
-             while ((nc = hgetc()) && !lexstop) {
-                 if (!idigit(nc) && nc != '-')
-                     break;
-                 *tbp++ = (char)nc;
-                 if (++n == tbs) {
-                     ntb = (char *)realloc(tbuf, tbs *= 2);
-                     tbp += ntb - tbuf;
-                     tbuf = ntb;
-                 }
-             }
-             if (nc == '>' && !lexstop) {
-                 hungetc(nc);
-                 while (n--)
-                     hungetc(*--tbp);
-                 zfree(tbuf, tbs);
-                 break;
-             }
-             if (nc && !lexstop)
-                 hungetc(nc);
-             lexstop = 0;
-             while (n--)
-                 hungetc(*--tbp);
-             zfree(tbuf, tbs);
-             peek = INANG;
          } else if (d == '<') {
              int e = hgetc();

--- 756,770 ----
          if (!incmdpos && d == '(') {
              hungetc(d);
              lexstop = 0;
+         unpeekfd:
+             if(peekfd != -1) {
+                 hungetc(c);
+                 c = '0' + peekfd;
+             }
              break;
          }
!         if (d == '>') {
              peek = INOUTANG;
          } else if (d == '<') {
              int e = hgetc();

***************
*** 770,781 ****
                  lexstop = 0;
                  peek = DINANG;
              }
!         } else if (d == '&')
              peek = INANGAMP;
!         else {
!             peek = INANG;
              hungetc(d);
!             lexstop = 0;
          }
          tokfd = peekfd;
          return peek;
--- 781,793 ----
                  lexstop = 0;
                  peek = DINANG;
              }
!         } else if (d == '&') {
              peek = INANGAMP;
!         } else {
              hungetc(d);
!             if(isnumglob())
!                 goto unpeekfd;
!             peek = INANG;
          }
          tokfd = peekfd;
          return peek;
***************
*** 783,789 ****
          d = hgetc();
          if (d == '(') {
              hungetc(d);
!             break;
          } else if (d == '&') {
              d = hgetc();
              if (d == '!' || d == '|')
--- 795,801 ----
          d = hgetc();
          if (d == '(') {
              hungetc(d);
!             goto unpeekfd;
          } else if (d == '&') {
              d = hgetc();
              if (d == '!' || d == '|')
***************
*** 1056,1084 ****
              if (isset(SHGLOB) && sub)
                  break;
              e = hgetc();
!             if (!(idigit(e) || e == '-' || (e == '(' && intpos))) {
!                 hungetc(e);
!                 lexstop = 0;
!                 if (in_brace_param || sub)
!                     break;
!                 goto brk;
!             }
!             c = Inang;
!             if (e == '(') {
!                 add(c);
                  if (skipcomm()) {
                      peek = LEXERR;
                      goto brk;
                  }
                  c = Outpar;
!             } else {
!                 add(c);
!                 c = e;
!                 while (c != '>' && !lexstop)
!                     add(c), c = hgetc();
                  c = Outang;
              }
!             break;
          case LX2_EQUALS:
              if (intpos) {
                  e = hgetc();
--- 1068,1094 ----
              if (isset(SHGLOB) && sub)
                  break;
              e = hgetc();
!             if(e == '(' && intpos) {
!                 add(Inang);
                  if (skipcomm()) {
                      peek = LEXERR;
                      goto brk;
                  }
                  c = Outpar;
!                 break;
!             }
!             hungetc(e);
!             if(isnumglob()) {
!                 add(Inang);
!                 while ((c = hgetc()) != '>')
!                     add(c);
                  c = Outang;
+                 break;
              }
!             lexstop = 0;
!             if (in_brace_param || sub)
!                 break;
!             goto brk;
          case LX2_EQUALS:
              if (intpos) {
                  e = hgetc();
Index: Src/pattern.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/pattern.c,v
retrieving revision 1.2
diff -c -r1.2 pattern.c
*** Src/pattern.c       2000/04/01 20:49:48     1.2
--- Src/pattern.c       2000/04/04 01:11:37
***************
*** 989,1002 ****
                  patparse = nptr;
                  len |= 1;
              }
!             if (*patparse == '-') {
!                 patparse++;
!                 if (idigit(*patparse)) {
!                     to = (zrange_t) zstrtol((char *)patparse,
!                                             (char **)&nptr, 10);
!                     patparse = nptr;
!                     len |= 2;
!                 }
              }
              if (*patparse != Outang)
                  return 0;
--- 989,1001 ----
                  patparse = nptr;
                  len |= 1;
              }
!             DPUTS(*patparse != '-', "BUG: - missing from numeric glob");
!             patparse++;
!             if (idigit(*patparse)) {
!                 to = (zrange_t) zstrtol((char *)patparse,
!                                         (char **)&nptr, 10);
!                 patparse = nptr;
!                 len |= 2;
              }
              if (*patparse != Outang)
                  return 0;
END
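For readers who want to experiment with the grammar outside the lexer, the check that isnumglob() performs can be sketched on a plain C string. This is only an illustration under stated assumptions: is_num_glob() is a hypothetical name that does not exist in zsh, and it reads from a string instead of going through hgetc()/hungetc(), so it needs no push-back buffer.

```c
#include <ctype.h>

/* Hypothetical helper, not part of zsh.  s points just after the opening
 * '<'.  Returns 1 if the text there has the form [0-9]*-[0-9]*> and 0
 * otherwise, using the same "expected character" idea as the patch. */
static int is_num_glob(const char *s)
{
    char expected = '-';        /* the only non-digit accepted next */

    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s))
            continue;           /* digits are fine on either side */
        if (*s != expected)
            return 0;           /* wrong punctuation: not a numeric glob */
        if (expected == '>')
            return 1;           /* saw '-' earlier and now the closing '>' */
        expected = '>';         /* just consumed the '-' */
    }
    return 0;                   /* input ended before the closing '>' */
}
```

Under this sketch "1-1>" and "->" are accepted while "1>" and ">" are rejected, which mirrors the statement in the message that "<n>" is removed and "<n-n>" must be used instead.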
* Re: <n> == <n->?
From: Johan Sundström @ 2000-04-04 17:00 UTC
To: zsh-workers

On Tue, 4 Apr 2000, Zefram wrote:
> I thought we'd decided, quite some time ago, that the numeric glob syntax
> was going to require a "-", to minimise ambiguity with redirection.

Sounds reasonable. After some digging through zshmisc(1), I'd guess the
situation we're trying to protect from a possible ambiguity is "< word" --
am I right? I may be at a loss here, but I don't quite see where the
problem might arise. Could someone depict an example or two and how to
trigger the problem? I guess some situation with a directory having files
named '1>', '1', '01' and/or similar, and trying some command with an
argument of <1> might be what you're getting at, but I haven't been able
to figure out how or why.

> Actually, lex.c is more lenient than that. Anything matching
> /\<[-0-9]+\>/ is initially lexed as a string rather than as operators.
> However, gettokstr() has some nasties here. Although the above grammar
> applies at the beginning of a word, gettokstr() makes no such check
> in the middle of a word. As far as it's concerned, anything matching
> /\<[-0-9]/ is the start of a glob operator, and it'll keep adding to
> the string (past whitespace and so on) until it finds the closing ">".
> Try typing "echo a<1" (and compare against "echo <1").

I noticed them being different (which showed better using cat than echo),
but I failed to understand how the first case tried to operate; to me, it
seemed like a broken effort at <<- or <<<, but then I guess I just didn't
understand what happened, so my guess isn't worth a lot. :-]

Either way, I'm not sure I see the impact of this on the case where the
word continues with a > and possibly more pattern matching. After all,
when I want redirection, I don't try my luck at inserting a < or > in
the middle of the current word I'm typing, and I haven't found anything in
the man pages supporting that behaviour either.

> "0#n" will do that (# = zero or more of the previous character).

As stated, I've been quite fond of <n>. Among other reasons, because
editing a past commandline from <n> to <-n> or <n-> was such a breeze.
I'm familiar with #, but lazy as I am, I found the deprecated <n> syntax
more typing friendly.

Another thing I forgot at first when on the subject: for quite some time
now, <x-y> has been ungreedy about its matches, to my disappointment. This
means that <1-2>* will match 1, 2, 10 through 29, and so on, instead of a
single, closed range, as at least I would hope when constructing such a
pattern. The <x-y> syntax is of course still useful, but it takes a whole
lot more work and narrowing-down to make the pattern as tight as wanted.
Any chance of getting back the greedy version as seen in, for example,
3.0.0? (I'm starting to sound like an old fart, here... ;-) Some
pattern-matching token for stating the level of greediness would of course
be a welcome addition as well, but that sounds like a whole lot more work.

Oh, and don't get me wrong about all this; I'm not complaining about how
zsh works or doesn't work, I'm just trying to make the best I can of the
situation. Zsh isn't my all-time favourite shell without reason; I'm very
fond of zsh's trend of always trying to make things as handy as they can
be, and whenever I can help, I do my best.

/Johan Sundström, fond of globbing and pattern matching
* Re: <n> == <n->?
From: Peter Stephenson @ 2000-04-04 19:32 UTC
To: zsh-workers

Johan Sundström wrote:
> Another thing I forgot at first when on the subject: for quite some time
> now, <x-y> has been ungreedy about its matches, to my disappointment. This
> means that <1-2>* will match 1, 2, 10 through 29, and so on, instead of a
> single, closed range, as at least I would hope when constructing such a
> pattern.

I thought about this when I changed it, and came to the conclusion that the
new behaviour was a simple matter of consistency with patterns. * guarantees
to match anything at all, so 123potato is bound to match <1-2>*. I don't
think it's an option to make * not match numbers in this one case. You
probably don't need to be told all the workarounds; I suppose the simplest
is <1-2>[^0-9]*.

-- 
Peter Stephenson <pws@pwstephenson.fsnet.co.uk>
Work: pws@CambridgeSiliconRadio.com
Web: http://www.pwstephenson.fsnet.co.uk
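The consistency argument above can be made concrete with a toy matcher. The sketch below is not zsh's pattern.c; both function names are hypothetical, and it assumes small non-negative bounds so that the running value cannot overflow a long. It tries every non-empty digit prefix of the string as the number and lets the rest of the pattern take whatever follows.

```c
#include <ctype.h>

/* Toy model of <lo-hi>* : some non-empty digit prefix must have a value
 * in [lo, hi]; the '*' then absorbs everything after it, digits included.
 * Hypothetical helper, assumes small non-negative bounds. */
static int matches_range_then_any(long lo, long hi, const char *s)
{
    long value = 0;

    for (; isdigit((unsigned char)*s); s++) {
        value = value * 10 + (*s - '0');
        if (value > hi)
            break;              /* longer prefixes can only be larger */
        if (value >= lo)
            return 1;           /* '*' matches the remainder, whatever it is */
    }
    return 0;
}

/* Toy model of <lo-hi>[^0-9]* : the character right after the number
 * must exist and must not be a digit. */
static int matches_range_then_nondigit(long lo, long hi, const char *s)
{
    long value = 0;

    for (; isdigit((unsigned char)*s); s++) {
        value = value * 10 + (*s - '0');
        if (value > hi)
            break;
        if (value >= lo && s[1] != '\0' && !isdigit((unsigned char)s[1]))
            return 1;
    }
    return 0;
}
```

With lo = 1 and hi = 2, the first function accepts "10", "29" and "123potato" because the prefix "1" or "2" is already in range. The second rejects all three and accepts "1x" or "2.log", but it also rejects a bare "1", since [^0-9] has to match one character.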
* Re: <n> == <n->?
From: Zefram @ 2000-04-04 21:01 UTC
To: Johan Sundström; +Cc: zsh-workers

Johan Sundström wrote:
>Sounds reasonable. After some digging through zshmisc(1), I'd guess the
>situation we're trying to protect from a possible ambiguity is "< word" --
>am I right?

Right right.

> I may be at a loss here, but I don't quite see where the
>problem might arise. Could someone depict an example or two and how to
>trigger the problem?

        cat <123-456>foo

zsh treats this as a command line with two words, one of which is a glob
pattern that will match "00234foo" and so on. Bourne shell syntax has
this meaning the same as the way both zsh and sh interpret

        cat <123-456 >foo

which is a command line with one word and two redirections (input from
"123-456", output to "foo"). It's quite common to omit spaces before
and after redirection operators.

>I noticed them being different (which showed better using cat than echo),
>but I failed to understand how the first case tried to operate; to me, it
>seemed like a broken effort at <<- or <<<, but then I guess I just didn't
>understand what happened, so my guess isn't worth a lot. :-]

What actually happens if you type "echo a<1" (before my patch) is that
the lexer sees a word "echo", then sees another word that starts with "a".
After the "<", it looks for a ">" to finish that glob operator; it gets
a newline, which is treated as part of the word, then it asks for more
input, still reading that word. Try typing "x>y" as the second input
line to complete the word. Completely broken behaviour, nothing to do
with here documents.

>Either way, I'm not sure I see the impact of this on the case where the
>word continues with a > and possibly more pattern matching. After all,
>when I want redirection, I don't try my luck at inserting a < or > in
>the middle of the current word I'm typing,

Quite. The practical rarity of the syntax clash is the only reason that
we can get away with that as a glob syntax. The reason for requiring the
"-" is to make the clash as small and as simply bounded as possible.

> and I haven't found anything in
>the man pages supporting that behaviour either.

It's not clear on where whitespace is permitted. Whitespace is allowed
but not required before and after redirection operators.

-zefram
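The "echo a<1" behaviour described above can be contrasted with the patched rule in a small string-based sketch. Both functions are hypothetical and only loosely modelled on the gettokstr() code removed and added by the patch: they ignore the "<(" process-substitution case and all lexer state, and they report how many characters after the "<" would be pulled into the glob operator.

```c
#include <ctype.h>
#include <string.h>

/* Pre-patch mid-word rule (illustrative): a digit or '-' after '<' is
 * enough to start a glob, and everything up to the next '>' is consumed,
 * newlines included.  Returns the number of characters consumed, 0 if
 * the '<' is not treated as a glob, or -1 if the input ends before a '>'
 * (the point at which the real lexer would ask for more input). */
static long old_midword_glob_span(const char *after_lt)
{
    const char *close;

    if (!isdigit((unsigned char)after_lt[0]) && after_lt[0] != '-')
        return 0;
    close = strchr(after_lt, '>');
    return close ? (long)(close - after_lt) + 1 : -1;
}

/* Post-patch rule (illustrative): the text must be [0-9]*-[0-9]*>
 * before anything is consumed; otherwise the '<' is left for
 * redirection handling. */
static long new_midword_glob_span(const char *after_lt)
{
    const char *p = after_lt;

    while (isdigit((unsigned char)*p))
        p++;
    if (*p != '-')
        return 0;
    p++;
    while (isdigit((unsigned char)*p))
        p++;
    if (*p != '>')
        return 0;
    return (long)(p - after_lt) + 1;
}
```

Feeding "1\nx>y" (the two input lines from the message, joined by a newline) to the first function consumes four characters across the line break, while the second consumes nothing. Both agree on "123-456>foo", where the eight characters up to and including the ">" belong to the glob.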