zsh-users
 help / color / mirror / code / Atom feed
* Bug in case stmt with '('
@ 1996-07-17 21:01 Morris M. Siegel
  1996-07-18 18:27 ` Bart Schaefer
  0 siblings, 1 reply; 17+ messages in thread
From: Morris M. Siegel @ 1996-07-17 21:01 UTC (permalink / raw)
  To: zsh-users

It seems that when a pattern in a case statement is preceded by the optional
'(', the parser is in the wrong state when parsing the statement following
the pattern.  In particular, an assignment such as 'var=value' is not
recognized; instead, zsh complains 'command not found: var=value'.

This problem exists in zsh-3.0-pre1 and -pre3 (I never built -pre2),
and it seems that -pre3 behaves even worse than -pre1:  the following
console listing (lines 63-65) shows that "zsh3p3 case3let.ksh"
induces the message "segmentation fault (core dumped)"!
(As far as I can tell, I built -pre1 and -pre3 the same way.)
Moreover (lines 73-75, with zsh-3.0-pre3 as the interactive shell),
". case3let.ksh" not only results in the message
	"case3let.ksh: command not found: gigo=case '(v)' [2]"
but the interpreter then apparently falls through to the next branch of
the case statement rather than exiting the case statement.  (This falling
through is also done by -pre1 but is not shown in the listing below.)

 1  $(0) 15:38:50 [39] head case*
 2  ==> case2let.ksh <==
 3  case v in
 4  #(v)    gigo="case '(v)'"
 5  #       echo $gigo;;
 6  v)      gigo="case 'v)'"
 7          echo $gigo;;
 8  *)      gigo="case '*)'"
 9          echo $gigo;;
10  esac
11
12  ==> case3let.ksh <==
13  case v in
14  (v)     gigo="case '(v)'"
15          echo $gigo;;
16  v)      gigo="case 'v)'"
17          echo $gigo;;
18  *)      gigo="case '*)'"
19          echo $gigo;;
20  esac
21  $(0) 15:38:55 [40] sh case2let.ksh # try Bourne shell
22  case 'v)'
23  $(0) 15:39:16 [41] r 2l=3l
24  sh case3let.ksh # try Bourne shell
25  case3let.ksh: syntax error at line 2: `(' unexpected
26  zsh: 10255 exit 2     sh case3let.ksh
27  $(2) 15:39:27 [42] ksh case2let.ksh # try Korn shell
28  case 'v)'
29  $(0) 15:39:46 [43] r 2l=3l
30  ksh case3let.ksh # try Korn shell
31  case '(v)'
32  $(0) 15:39:52 [44] zsh25 case2let.ksh # try 2.5.03
33  case 'v)'
34  $(0) 15:41:11 [45] r 2l=3l
35  zsh25 case3let.ksh # try 2.5.03
36  case3let.ksh: parse error near `gigo="case '(v)'"' [2]
37  case3let.ksh: parse error near `;;' [3]
38  case3let.ksh: parse error near `)' [4]
39  case3let.ksh: parse error near `;;' [5]
40  case3let.ksh: parse error near `)' [6]
41  case3let.ksh: parse error near `;;' [7]
42  case3let.ksh: parse error near `esac' [8]
43  $(0) 15:41:16 [46] zsh26 case2let.ksh # try 2.6-beta20
44  case 'v)'
45  $(0) 15:41:56 [47] r 2l=3l
46  zsh26 case3let.ksh # try 2.6-beta20
47  case3let.ksh: parse error near `gigo="case '(v)'"' [2]
48  case3let.ksh: parse error near `;;' [3]
49  case3let.ksh: parse error near `)' [4]
50  case3let.ksh: parse error near `;;' [5]
51  case3let.ksh: parse error near `)' [6]
52  case3let.ksh: parse error near `;;' [7]
53  case3let.ksh: parse error near `esac' [8]
54  $(0) 15:42:06 [48] zsh3p1 case2let.ksh # try zsh-3.0-pre1
55  case 'v)'
56  $(0) 15:42:45 [49] r 2l=3l
57  zsh3p1 case3let.ksh # try zsh-3.0-pre1
58  case3let.ksh: command not found: gigo=case '(v)' [2]
59
60  $(0) 15:42:50 [50] zsh3p3 case2let.ksh # try zsh-3.0-pre3
61  case 'v)'
62  $(0) 15:43:11 [51] r 2l=3l
63  zsh3p3 case3let.ksh # try zsh-3.0-pre3
64  case3let.ksh: command not found: gigo=case '(v)' [2]
65  zsh: 10287 segmentation fault (core dumped)  zsh3p3 case3let.ksh
66  $(139) 15:43:16 [52] ls -sailF core
67  937892978377 8377 -rw-r--r--  1 segal     8577456 Jul 17 15:43 core
68  $(0) 15:43:29 [53] echo $ZSH_VERSION
69  3.0-pre3
70  $(0) 15:43:41 [54] . case2let.ksh
71  case 'v)'
72  $(0) 15:43:50 [55] r 2l=3l
73  . case3let.ksh
74  case3let.ksh: command not found: gigo=case '(v)' [2]
75  case 'v)'
76  $(0) 15:44:01 [56]



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-17 21:01 Bug in case stmt with '(' Morris M. Siegel
@ 1996-07-18 18:27 ` Bart Schaefer
  1996-07-18 23:43   ` Zoltan Hidvegi
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Schaefer @ 1996-07-18 18:27 UTC (permalink / raw)
  To: Morris M. Siegel, zsh-users

On Jul 17,  5:01pm, Morris M. Siegel wrote:
> Subject: Bug in case stmt with '('
> It seems that when a pattern in a case statement is preceded by the optional
> '(', the parser is in the wrong state when parsing the statement following
> the pattern.

Yup.  Here's the problem:

To parse the case pattern, zsh attempts to lex a STRING token.  In old
case syntax, this stops before either the next "|" or the ")".  With the
fully-parenthesized syntax, it lexes the entire (...) expression.

Zsh then reads another token, and looks to see whether it's a "|" (in
which case it loops) or a ")" (in which case the parse is finished).
With the new syntax, neither of these is true, which means zsh has
lexed the next word (in this case, "gigo=...") instead of the OUTPAR.
Problem is, it needs to lex that word with incmdpos=1 to get an
ENVSTRING token, but instead it's lexing with incmdpos=0,incasepat=1
and getting a STRING token.

In the old syntax, where the token was OUTPAR, zsh sets incmdpos=1 and
then lexes again, getting ENVSTRING as it should.  When zsh notices that
the token is not OUTPAR, it attempts to fix the situation by skipping
that lex; but the damage has already been done.  Zsh needs to look for
the matched parens sooner.

There's one additional complication: zsh glob syntax uses parens, so zsh
has previously accepted this sort of thing:

    case $foo in
    (v|w)|x)	echo 'case (v|w)|x';;
    (y|z))	echo 'case (y|z)';;
    *)		echo 'default';;
    esac

That means it can't look only for matching parens, but also has to look
for matching parens followed by a closing paren or vertical bar.

I'm unsure of this patch, so the pre3 stuff is still there #ifdef OLD.
There may be a better way to deal with it.

*** Src/parse.c.orig	Mon Jul 15 00:07:21 1996
--- Src/parse.c	Thu Jul 18 11:12:14 1996
***************
*** 502,508 ****
--- 502,527 ----
  	    break;
  	}
  	str = tokstr;
+ #ifdef OLD
  	yylex();
+ #else
+ 	/* POSIX allows (foo*) patterns */
+ 	if (skipparens(Inpar, Outpar, &str) || *str) {
+ 	    /* Not fully parenthesized, look for a token */
+ 	    str = tokstr;
+ 	    yylex();
+ 	} else {
+ 	    /* Fully parenthesized, but zsh glob syntax permits
+ 	     * parens in (x|y) patterns, so check for that too. */
+ 	    int c;
+ 	    str = tokstr;
+ 	    c = hgetc();
+ 	    if (c == /*(*/ ')' || c == '|') {
+ 		hungetc(c);
+ 		yylex();
+ 	    }
+ 	}
+ #endif
  	while (tok == BAR) {
  	    char *str2;
  	    int sl = strlen(str);
***************
*** 527,532 ****
--- 546,552 ----
  	}
  	incasepat = 0;
  	incmdpos = 1;
+ #ifdef OLD
  	if (tok != OUTPAR) {
  	    /* POSIX allows (foo*) patterns */
  	    char *s = str;
***************
*** 534,539 ****
--- 554,560 ----
  	    if (skipparens(Inpar, Outpar, &s) || *s)
  		YYERRORV;
  	} else
+ #endif
  	    yylex();
  	addlinknode(pats, str);
  	addlinknode(lists, par_list());
***************



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-18 18:27 ` Bart Schaefer
@ 1996-07-18 23:43   ` Zoltan Hidvegi
  1996-07-19  2:23     ` Bart Schaefer
  1996-07-24 16:07     ` Morrie is off for July 25 Morris M. Siegel
  0 siblings, 2 replies; 17+ messages in thread
From: Zoltan Hidvegi @ 1996-07-18 23:43 UTC (permalink / raw)
  To: schaefer; +Cc: segal, zsh-users

Bart Schaefer wrote:
> > It seems that when a pattern in a case statement is preceded by the optional
> > '(', the parser is in the wrong state when parsing the statement following
> > the pattern. 

[...]

> That means it can't look only for matching parens, but also has to look
> for matching parens followed by a closing paren or vertical bar.
> 
> I'm unsure of this patch, so the pre3 stuff is still there #ifdef OLD.
> There may be a better way to deal with it.

Actually the patch below is a simpler solution for that.  An other
advantage of that is that hgetc()/hungetc() is not used which is
scientifically better since tokens should be recognized in lex.c and
parse.c should only use the tokens returned by lex.c.  The patch to lex.c
removes an unnecessary special case for `<' in a case pattern.  After that
the only effect of the incasepat variable is that it disables spell
checking.

Zoltan


*** Src/parse.c	1996/07/13 20:26:35	2.20
--- Src/parse.c	1996/07/18 23:32:54
***************
*** 502,512 ****
--- 502,516 ----
  	    break;
  	}
  	str = tokstr;
+ 	incasepat = 0;
+ 	incmdpos = 1;
  	yylex();
  	while (tok == BAR) {
  	    char *str2;
  	    int sl = strlen(str);
  
+ 	    incasepat = 1;
+ 	    incmdpos = 0;
  	    yylex();
  	    if (tok == OUTPAR) {
  		str2 = ncalloc(sl + 2);
***************
*** 514,519 ****
--- 518,525 ----
  		str2[sl] = Bar;
  		str2[sl+1] = '\0';
  		str = str2;
+ 		incasepat = 0;
+ 		incmdpos = 1;
  		break;
  	    }
  	    if (tok != STRING)
***************
*** 525,532 ****
  	    str = str2;
  	    yylex();
  	}
- 	incasepat = 0;
- 	incmdpos = 1;
  	if (tok != OUTPAR) {
  	    /* POSIX allows (foo*) patterns */
  	    char *s = str;
--- 531,536 ----
*** Src/lex.c	1996/07/13 20:26:35	2.35
--- Src/lex.c	1996/07/18 23:29:53
***************
*** 496,502 ****
  	return OUTPAR;
      case LX1_INANG:
  	d = hgetc();
! 	if ((!incmdpos && d == '(') || incasepat) {
  	    hungetc(d);
  	    lexstop = 0;
  	    break;
--- 496,502 ----
  	return OUTPAR;
      case LX1_INANG:
  	d = hgetc();
! 	if (!incmdpos && d == '(') {
  	    hungetc(d);
  	    lexstop = 0;
  	    break;



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-18 23:43   ` Zoltan Hidvegi
@ 1996-07-19  2:23     ` Bart Schaefer
  1996-07-19 16:00       ` Zoltan Hidvegi
  1996-07-24 16:07     ` Morrie is off for July 25 Morris M. Siegel
  1 sibling, 1 reply; 17+ messages in thread
From: Bart Schaefer @ 1996-07-19  2:23 UTC (permalink / raw)
  To: Zoltan Hidvegi; +Cc: zsh-users

On Jul 19,  1:43am, Zoltan Hidvegi wrote:
> Subject: Re: Bug in case stmt with '('
> Bart Schaefer wrote:
> 
> > That means it can't look only for matching parens, but also has to look
> > for matching parens followed by a closing paren or vertical bar.
> > 
> > I'm unsure of this patch, so the pre3 stuff is still there #ifdef OLD.
> > There may be a better way to deal with it.
> 
> Actually the patch below is a simpler solution for that.

Is it?  It handles (v|w)), but it doesn't handle this case:

	case v in
	(v|w)|x)   gigo="case '(v|w)|x)'"
		echo $gigo;;
	*)      gigo="case '*)'"
		echo $gigo;;
	esac

I backed out my patch and applied yours, and I still get:

	zsh: command not found: gigo=case '(v|w)|x)'

> An other
> advantage of that is that hgetc()/hungetc() is not used which is
> scientifically better since tokens should be recognized in lex.c

Yes, I know, but there are other cases in parse.c that lookahead with
hgetc().

> parse.c should only use the tokens returned by lex.c.  The patch to lex.c
> removes an unnecessary special case for `<' in a case pattern.  After that
> the only effect of the incasepat variable is that it disables spell
> checking.

That's fine, but doesn't address the problem.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-19  2:23     ` Bart Schaefer
@ 1996-07-19 16:00       ` Zoltan Hidvegi
  1996-07-21 22:53         ` Morris M. Siegel
  0 siblings, 1 reply; 17+ messages in thread
From: Zoltan Hidvegi @ 1996-07-19 16:00 UTC (permalink / raw)
  To: schaefer; +Cc: zsh-users

> > Actually the patch below is a simpler solution for that.
> 
> Is it?  It handles (v|w)), but it doesn't handle this case:
> 
> 	case v in
> 	(v|w)|x)   gigo="case '(v|w)|x)'"
> 		echo $gigo;;
> 	*)      gigo="case '*)'"
> 		echo $gigo;;
> 	esac

All right.  It was late at night when I wrote that.  Below is an other
attempt.  It is a patch against unmodified 3_0-pre3.

> > An other
> > advantage of that is that hgetc()/hungetc() is not used which is
> > scientifically better since tokens should be recognized in lex.c
> 
> Yes, I know, but there are other cases in parse.c that lookahead with
> hgetc().

hgetc/hungetc is only used in parse.c to discard the remaining line after a
parse error.  It is not used for parsing.  For parsing only the tokens
return by yylex() must be used.

Zoltan


*** Src/parse.c	1996/07/13 20:26:35	2.20
--- Src/parse.c	1996/07/19 15:38:54
***************
*** 502,540 ****
  	    break;
  	}
  	str = tokstr;
! 	yylex();
! 	while (tok == BAR) {
! 	    char *str2;
! 	    int sl = strlen(str);
! 
  	    yylex();
  	    if (tok == OUTPAR) {
  		str2 = ncalloc(sl + 2);
  		strcpy(str2, str);
  		str2[sl] = Bar;
  		str2[sl+1] = '\0';
  		str = str2;
  		break;
  	    }
- 	    if (tok != STRING)
- 		YYERRORV;
- 	    str2 = ncalloc(sl + strlen(tokstr) + 2);
- 	    strcpy(str2, str);
- 	    str2[sl] = Bar;
- 	    strcpy(str2 + sl + 1, tokstr);
- 	    str = str2;
- 	    yylex();
  	}
- 	incasepat = 0;
- 	incmdpos = 1;
- 	if (tok != OUTPAR) {
- 	    /* POSIX allows (foo*) patterns */
- 	    char *s = str;
- 
- 	    if (skipparens(Inpar, Outpar, &s) || *s)
- 		YYERRORV;
- 	} else
- 	    yylex();
  	addlinknode(pats, str);
  	addlinknode(lists, par_list());
  	n++;
--- 502,546 ----
  	    break;
  	}
  	str = tokstr;
! 	incasepat = 0;
! 	incmdpos = 1;
! 	for (;;) {
  	    yylex();
  	    if (tok == OUTPAR) {
+ 		incasepat = 0;
+ 		incmdpos = 1;
+ 		yylex();
+ 		break;
+ 	    } else if (tok == BAR) {
+ 		char *str2;
+ 		int sl = strlen(str);
+ 
+ 		incasepat = 1;
+ 		incmdpos = 0;
  		str2 = ncalloc(sl + 2);
  		strcpy(str2, str);
  		str2[sl] = Bar;
  		str2[sl+1] = '\0';
  		str = str2;
+ 	    } else if (tok == STRING) {
+ 		char *str2;
+ 		int sl = strlen(str);
+ 
+ 		if (str[sl - 1] != Bar)
+ 		    YYERRORV;
+ 		str2 = ncalloc(sl + strlen(tokstr) + 1);
+ 		strcpy(str2, str);
+ 		strcpy(str2 + sl, tokstr);
+ 		str = str2;
+ 	    } else {
+ 		/* POSIX allows (foo*) patterns */
+ 		char *s = str;
+ 
+ 		if (skipparens(Inpar, Outpar, &s) || *s)
+ 		    YYERRORV;
  		break;
  	    }
  	}
  	addlinknode(pats, str);
  	addlinknode(lists, par_list());
  	n++;



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-19 16:00       ` Zoltan Hidvegi
@ 1996-07-21 22:53         ` Morris M. Siegel
  1996-07-22  6:31           ` Bart Schaefer
  0 siblings, 1 reply; 17+ messages in thread
From: Morris M. Siegel @ 1996-07-21 22:53 UTC (permalink / raw)
  To: Zoltan Hidvegi, schaefer, zsh-users

Greetings:--

I don't have the time to examine how zsh's lex.c and parse.c work,
so I can't comment technically on Bart Schaefer and Zoltan Hidvegi's
respective patches.  However, the thought occurred to me that the
lexer and/or parser might be simplified if we revise our outlook on
the case-statement grammar.  Instead of the syntax of each case-
clause's prefix being

{A}	[(] pat-1 | ... | pat-n )

(which requires the troublesome analysis of deciding whether an
initial '(' is the optional '(' of the case-clause syntax or simply
a glob-type '(' belonging to pat-1 proper), why not extend the syntax
to be

{B1}	pat-1 | ... | pat-n [)]

?  Thus (presuming that normal zsh globbing is in effect), an
initial '(' would _always_ be a glob-type parenthesis (making
the entire case-clause prefix into one big grouped pat-1 that
is not followed by the optional trailing ')').

This should not break existing code, since currently a trailing
')' is always required.  Moreover, sh, ksh, and zsh all disallow
blanks in any pat-i, as the following console listing demonstrates,
so there is no valid case-clause (of the form, say, 'a b) c;;')
such that omitting the ')' would result in an ambiguous or misparsed
case-clause (such as 'a b c;;').

If one were reluctant to depart so far from traditional case-statement 
syntax as {B1} does, one could make the trailing ')' optional only
when pat-n itself ends in ')', and to stay even more compatible with
POSIX syntax, additionally require that n = 1.  However, zsh already has
so many extensions that I don't seem the harm in letting the trailing
')' always be optional:  each pat-i would be unambiguously terminated by
(possibly optional whitespace preceding) '|', ')', or the first token
of the case-clause body (which cannot be '|' or ')' in any case).

The above all presumes that '(' is a globbing character, which is
normally so.  If it isn't (such as when the NO_GLOB [or SH_GLOB ?]
option is in effect), then {B1}, although still necessary, is
insufficient, and must obviously be supplemented with the alternative

{B2}	( pat-1 | ... | pat-n )

.  In this case, the ')' is mandatory:  traditional case-clause syntax
requires us to tolerate an unmatched trailing ')', but there is no
reason to allow an unmatched leading '('.

The only remaining issue is how to describe case-prefix syntax in 
the zshmisc man pages.  It is unnecessary to mention the
implementation detail that when '(' is a globbing character,
the lexer or parser treats {B2} as a special case of {B1}.
To be rigorous, one could state that the case-prefix syntax 
is {B1} | {B2}.  However, this is somewhat tedious, and instead
it might be preferable to present the syntax simply as

{C}	[(] pat-1 | ... | pat-n [)]

and mention in the accompanying English-language text that if the
leading '(' is supplied then the trailing ')' is required as well.
__________________________________

By the way, my previous posting shows that with the buggy behavior
of zsh-3.0-pre{1,3}, when executing ". case3let.ksh", the interpreter
erroneously falls through to the next case-clause instead of exiting
the entire case-statement (see lines 68-75 of that posting's console
listing).  In addition, when executing "zsh case3let.ksh", version
3.0-pre3 (but not -pre1) gets a segmentation fault (see lines 62-67).
These would seem to be problems aside from the incorrect parsing
of the case-clause prefix.

-- Morrie Siegel
__________________________________

$(0) 13:08:37 {5} head case[45]sp.ksh
==> case4sp.ksh <==
case 'a z' in
#a z )	echo 'a z )' ;;
a\ z )	echo 'a\\ z )' ;;
a[ ]z )	echo 'a[ ]z )' ;;
a* )	echo 'a* )' ;;
* )	echo '* )' ;;
esac
==> case5sp.ksh <==
case 'a z' in
a z )	echo 'a z )' ;;
a\ z )	echo 'a\\ z )' ;;
a[ ]z )	echo 'a[ ]z )' ;;
a* )	echo 'a* )' ;;
* )	echo '* )' ;;
esac
$(0) 13:08:53 {6} /bin/sh case4sp.ksh
case4sp.ksh: syntax error at line 4: `]z' unexpected
zsh: 17070 exit 2     /bin/sh case4sp.ksh
$(2) 13:09:15 {7} r 4=5
/bin/sh case5sp.ksh
case5sp.ksh: syntax error at line 2: `z' unexpected
zsh: 17071 exit 2     /bin/sh case5sp.ksh
$(2) 13:09:22 {8} /bin/ksh case4sp.ksh
case4sp.ksh: syntax error at line 4 : `]z' unexpected
zsh: 17072 exit 2     /bin/ksh case4sp.ksh
$(2) 13:09:36 {9} r 4=5
/bin/ksh case5sp.ksh
case5sp.ksh: syntax error at line 2 : `z' unexpected
zsh: 17074 exit 2     /bin/ksh case5sp.ksh
$(2) 13:09:42 {10} zsh3p1 case4sp.ksh
case4sp.ksh: parse error near `]z' [4]
case4sp.ksh: parse error near `)' [5]
case4sp.ksh: parse error near `)' [6]
case4sp.ksh: parse error near `esac' [7]
$(0) 13:10:08 {11} r 4=5
zsh3p1 case5sp.ksh
case5sp.ksh: parse error near `z' [2]
case5sp.ksh: parse error near `)' [3]
case5sp.ksh: parse error near `)' [4]
case5sp.ksh: parse error near `)' [5]
case5sp.ksh: parse error near `)' [6]
case5sp.ksh: parse error near `esac' [7]
$(0) 13:10:44 {12} exit
__________________________________


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-21 22:53         ` Morris M. Siegel
@ 1996-07-22  6:31           ` Bart Schaefer
       [not found]             ` <schaefer@candle.brasslantern.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Schaefer @ 1996-07-22  6:31 UTC (permalink / raw)
  To: Zoltan Hidvegi, Morris M. Siegel, zsh-users

On Jul 21,  6:53pm, Morris M. Siegel wrote:
} Subject: Re: Bug in case stmt with '('
}
} I don't have the time to examine how zsh's lex.c and parse.c work,
} so I can't comment technically on Bart Schaefer and Zoltan Hidvegi's
} respective patches.

Zoltan's revised patch for this is the best one so far.  It's the way
I would have approached it had I a better understanding of the effects
of the `incmdpos' and `incasepat' state variables.  However, see below.

} However, the thought occurred to me that the
} lexer and/or parser might be simplified if we revise our outlook on
} the case-statement grammar.  Instead of the syntax of each case-
} clause's prefix being
} 
} {A}	[(] pat-1 | ... | pat-n )
} 
} (which requires the troublesome analysis of deciding whether an
} initial '(' is the optional '(' of the case-clause syntax or simply
} a glob-type '(' belonging to pat-1 proper),

It doesn't currently require that analysis, so the rest of this is
mostly for benefit of zsh-workers; zsh-users can quit reading now. :-)

} why not extend the syntax to be
} 
} {B1}	pat-1 | ... | pat-n [)]
} 
} ?  Thus (presuming that normal zsh globbing is in effect), an
} initial '(' would _always_ be a glob-type parenthesis (making
} the entire case-clause prefix into one big grouped pat-1 that
} is not followed by the optional trailing ')').

This is actually very close to how it already works.  In any case where
there is an opening paren, it *is* treated as a regular glob-type paren.
The (pre-Zoltan's-patch) code simply (1) looked for a close paren, and
(2) if it didn't find one, checked whether the previously-parsed glob
pattern in fact contained balanced parens.

The bug you reported happened becase "look for a close paren" was done
by calling the lexer with the "expect command tokens" (incmdpos) state
set wrong.

All Zoltan did was rearrange the code so that the lexer state is set
correctly before looking for the next token.  In fact, a bit later on
you describe *exactly* what the (post-Zoltan's-patch) code does:

} If one were reluctant to depart so far from traditional case-statement 
} syntax as {B1} does, one could make the trailing ')' optional only
} when pat-n itself ends in ')', and to stay even more compatible with
} POSIX syntax, additionally require that n = 1.

So your suggestion would amount to omitting the check for balanced
parens, thus assuming (in the event that a closing paren is not found)
that the glob pattern is correctly formed.  This doesn't simplify the
parser in any significant way, and it delays detection of some other
malformed glob patterns.

I think we should leave it POSIX-style.

However, there is a remaining bug in Zoltan's patch.

zagzig<8> case foo in
> (foo) echo yes;;
zsh: parse error near `echo'
zagzig<9> case foo in
foo) echo yes;; 
> esac
yes
zagzig<10> case foo in
> (foo)
> echo yes;;
> esac
yes
zagzig<11> case foo in
> (foo) foo=bar echo yes;;
> esac
yes

Apparently there still needs to be at least one token between the close
paren and the first word of the actual command.

} __________________________________
} 
} By the way, my previous posting shows that with the buggy behavior
} of zsh-3.0-pre{1,3}, when executing ". case3let.ksh", the interpreter
} erroneously falls through to the next case-clause instead of exiting
} the entire case-statement (see lines 68-75 of that posting's console
} listing).

Yes, this was true.  After Zoltan's patch, it is not true any longer,
and the bogus clause generates the appropriate syntax error, aborting
the case statement.

} In addition, when executing "zsh case3let.ksh", version
} 3.0-pre3 (but not -pre1) gets a segmentation fault (see lines 62-67).
} These would seem to be problems aside from the incorrect parsing
} of the case-clause prefix.

No.  As far as I can tell, the core dump was directly related to the
improper fall-through; that is, the same wrong code-path resulted in
both problems.  It was the difference in surrounding tokens that made
the effect be a core dump in one case and merely bad behavior in the
other.

-- 
Bart Schaefer                             Brass Lantern Enterprises
http://www.well.com/user/barts            http://www.nbn.com/people/lantern

New male in /home/schaefer:
>N  2 Justin William Schaefer  Sat May 11 03:43  53/4040  "Happy Birthday"


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
       [not found]             ` <schaefer@candle.brasslantern.com>
@ 1996-07-22 19:26               ` Morris M. Siegel
  1996-07-22 19:51                 ` Zoltan Hidvegi
  1996-07-22 22:31               ` Morris M. Siegel
  1 sibling, 1 reply; 17+ messages in thread
From: Morris M. Siegel @ 1996-07-22 19:26 UTC (permalink / raw)
  To: schaefer, Zoltan Hidvegi, zsh-users

> So your suggestion would amount to omitting the check for balanced
> parens, thus assuming (in the event that a closing paren is not found)
> that the glob pattern is correctly formed.  This doesn't simplify the
> parser in any significant way, and it delays detection of some other
> malformed glob patterns.

I'm ready to believe that my suggestion doesn't simplify the parser
significantly, but I'm _not_ suggesting to forego checking for balanced parens
-- on the contrary, I assume the parser would only accept well-formed glob
patterns.

	     | required paren
	     v
	"( x ) )"
	       ^
	       | optional paren
	       v
	 " x   )"

What I am suggesting is that the trailing ')' (which if present is UNbalanced)
not be required to be present.  I.e., each well-formed case-prefix pattern
(which may have internal parens or '|'s) is terminated by an [external]
'|', ')', or whitespace; if by whitespace, then the parser considers the
following token: if it is '|' or ')', then parsing of the case-prefix continues,
else the case-prefix is considered complete (the parser regards the whitespace
as equivalent to a trailing ')', if you will), and the token is taken to be
the initial token of the case-clause body.  Accordingly, I believe that
detection of malformed glob patterns would not at all be delayed.
Perhaps my approach might even make it easier to avoid the remaining problem
with Zoltan's current patch.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-22 19:26               ` Morris M. Siegel
@ 1996-07-22 19:51                 ` Zoltan Hidvegi
  1996-07-22 20:48                   ` Morris M. Siegel
  1996-07-22 21:23                   ` Zefram
  0 siblings, 2 replies; 17+ messages in thread
From: Zoltan Hidvegi @ 1996-07-22 19:51 UTC (permalink / raw)
  To: Morris M. Siegel; +Cc: schaefer, zsh-users

> I'm ready to believe that my suggestion doesn't simplify the parser

It would allow removing three lines.

[...]
> What I am suggesting is that the trailing ')' (which if present is UNbalanced)
> not be required to be present.  I.e., each well-formed case-prefix pattern
> (which may have internal parens or '|'s) is terminated by an [external]
> '|', ')', or whitespace; if by whitespace, then the parser considers the
> following token: if it is '|' or ')', then parsing of the case-prefix continues,
> else the case-prefix is considered complete (the parser regards the whitespace
> as equivalent to a trailing ')', if you will), and the token is taken to be
> the initial token of the case-clause body.  Accordingly, I believe that

There is no whitespace token.  Whitespace is a separator and not a token.
Note that

case foo in

f*    |     b*    ) echo yes;;

esac

is a valid case statement.  Autoconf use this kind of case in many places.
Allowing space around | makes the script more readable.  This means that
your new syntax would introduce an ambiguity:

case foo in
f* | echo echo yes;;
esac

The question is wether the first echo is part of the pattern of a command
and | denotes an empty pattern.  Of course you may say that if a pattern
ends in a trailing | it must be followed by a ).  It would really mean that
3 line simplification in the code but it would make the syntax more
complicated.  It think that case works pretty well after my patch and
Bart's latest patch.

Unfortunately there is still an incompatibility in case:

case foo in
( f* | b* ) echo yes
esac

should print yes but in zsh it works as

case foo in
\ f*\ |\ b*\ ) echo yes;;
esac

To fix this would be really difficult I think.

Zoltan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-22 19:51                 ` Zoltan Hidvegi
@ 1996-07-22 20:48                   ` Morris M. Siegel
  1996-07-22 21:27                     ` Zefram
  1996-07-22 21:51                     ` Bart Schaefer
  1996-07-22 21:23                   ` Zefram
  1 sibling, 2 replies; 17+ messages in thread
From: Morris M. Siegel @ 1996-07-22 20:48 UTC (permalink / raw)
  To: Zoltan Hidvegi, zsh-users; +Cc: schaefer

> There is no whitespace token.  Whitespace is a separator and not a token.

I know that whitespace is a separator, not a token.  My proposed description
of the parsing of a case-prefix did not distinguish lexical from syntactic
analysis.

> Allowing space around | makes the script more readable.

I agree, and did not even think of forbidding it (more on this below).

> This means that your new syntax would introduce an ambiguity:
> 
> case foo in
> f* | echo echo yes;;
> esac
> 
> The question is wether the first echo is part of the pattern of a command
> and | denotes an empty pattern.  Of course you may say that if a pattern
> ends in a trailing | it must be followed by a ).

I confess to having forgotten about the empty pattern.  Using null syntax
to implicitly denote the empty pattern in a script doesn't enhance readability
in any case, and I think zsh syntax would be safer and less confusing if the
empty pattern had to be denoted with the explicit syntax "()".

I would suggest that your above sample case statement be interpreted
as though it were

	case foo in
	f* | echo) echo yes;;
	esac

(following my proposed parsing of a case-prefix), and that the alternative
interpretation with the empty pattern would have to be written either as

	case foo in
	f* | ) echo echo yes;;
	esac

as you suggest (if implicit empty pattern are allowed), or better (with an
explicit empty pattern) as

	case foo in
	f* | () echo echo yes;;
	esac

> It ... would make the syntax more complicated.

I assume you mean the logical syntax (the grammer), not {lex,parse}.c .
If implicit empty patterns are disallowed, I think that the syntax becomes
less, not more, confusing.

> Unfortunately there is still an incompatibility in case:
> 
> case foo in
> ( f* | b* ) echo yes
> esac
> 
> should print yes but in zsh it works as
> 
> case foo in
> \ f*\ |\ b*\ ) echo yes;;
> esac
> 
> To fix this would be really difficult I think.

Perl5 enhances the readability of its regular expressions by introducing
the "/x" modifier, which makes unescaped whitespace syntactically insignificant.
Considering the fact that glob patterns in general are supposed to generate
file names, which usually do not contain blanks, zsh might also ignore
unescaped whitespace in glob patterns (in general, not just in 'case'
statements).  This would clearly solve this last-mentioned incompatibility,
allow blanks around '|' in case-prefixes (referred to above),
and in general improve readability.  Non-trivial whitespace could always
be escaped.  Since file names rarely contain blanks, this should not be
likely to break existing code; if you're worried about the possibility,
you could introduce a new zsh option (say) IGNORE_GLOB_WHITESPACE .

-- Morrie Siegel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-22 19:51                 ` Zoltan Hidvegi
  1996-07-22 20:48                   ` Morris M. Siegel
@ 1996-07-22 21:23                   ` Zefram
  1 sibling, 0 replies; 17+ messages in thread
From: Zefram @ 1996-07-22 21:23 UTC (permalink / raw)
  To: Zoltan Hidvegi; +Cc: segal, schaefer, zsh-users

>Unfortunately there is still an incompatibility in case:
>
>case foo in
>( f* | b* ) echo yes
>esac
>
>should print yes but in zsh it works as
>
>case foo in
>\ f*\ |\ b*\ ) echo yes;;
>esac

I was wondering whether that would be the case.  It's a serious problem.

>To fix this would be really difficult I think.

It could be easily fixed by modifying glob semantics such that unquoted
whitespace embedded in a pattern is ignored.  I doubt that any real
code relies on the current behaviour, which is in any case
undocumented.

-zefram


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-22 20:48                   ` Morris M. Siegel
@ 1996-07-22 21:27                     ` Zefram
  1996-07-22 21:56                       ` Morris M. Siegel
  1996-07-22 21:51                     ` Bart Schaefer
  1 sibling, 1 reply; 17+ messages in thread
From: Zefram @ 1996-07-22 21:27 UTC (permalink / raw)
  To: Morris M. Siegel; +Cc: hzoli, zsh-users, schaefer

>I confess to having forgotten about the empty pattern.  Using null syntax
>to implicitly denote the empty pattern in a script doesn't enhance readability
>in any case, and I think zsh syntax would be safer and less confusing if the
>empty pattern had to be denoted with the explicit syntax "()".

Does POSIX require that a completely empty pattern be allowed?  sh and
ksh don't like it, and zsh only allows it where there's a |.  I don't
see any problem with producing a parse error, and requiring an explicit
empty pattern.

-zefram


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-22 20:48                   ` Morris M. Siegel
  1996-07-22 21:27                     ` Zefram
@ 1996-07-22 21:51                     ` Bart Schaefer
  1 sibling, 0 replies; 17+ messages in thread
From: Bart Schaefer @ 1996-07-22 21:51 UTC (permalink / raw)
  To: Zoltan Hidvegi, Morris M. Siegel, zsh-users

On Jul 22,  9:51pm, Zoltan Hidvegi wrote:
} Subject: Re: Bug in case stmt with '('
}
} Unfortunately there is still an incompatibility in case:
} 
} case foo in
} ( f* | b* ) echo yes
} esac
} 
} should print yes but in zsh it works as
} 
} case foo in
} \ f*\ |\ b*\ ) echo yes;;
} esac
} 
} To fix this would be really difficult I think.

On Jul 22,  4:48pm, Morris M. Siegel wrote:
} Subject: Re: Bug in case stmt with '('
} 
} Considering the fact that glob patterns in general are supposed to generate
} file names, which usually do not contain blanks

I dispute that assertion.  Ever used a filesystem that's shared with a
Macintosh?

} zsh might also ignore
} unescaped whitespace in glob patterns (in general, not just in 'case'
} statements).

There's no such thing as "unescaped whitespace in glob patterns."  Zsh
lexes glob patterns as a STRING token.  By the time filename generation
gets around to interpreting the string as a glob pattern, any spaces
that are left MUST have been "escaped" somehow, to make it through the
lexer.  Putting parens around the pattern is one such possible quoting.

There is exactly one place where we know in advance that we're reading
a glob pattern:  case statements.  Hence I think the fix for this is
really quite simple:

*** Src/lex.c.0	Thu Jul 18 19:15:13 1996
--- Src/lex.c	Mon Jul 22 14:38:56 1996
***************
*** 638,643 ****
--- 638,647 ----
  	if (inblank(c) && !in_brace_param && !pct)
  	    act = LX2_BREAK;
  	else {
+ 	    if (incasepat && pct == 1 && !in_brace_param && iblank(c)) {
+ 		c = hgetc();
+ 		continue;
+ 	    }
  	    act = lexact2[STOUC(c)];
  	    c = lextok2[STOUC(c)];
  	}
***************

This works even for:

zsh% case "foo bar" in
> ((foo bar) | boing) echo yes;;
> esac
yes
zsh% case "foo bar" in
> "foo bar" | boing ) echo yes;;
> esac
yes
zsh% case "foo bar" in
> foo\ bar | boing ) echo yes;;
> esac
yes
zsh%

However, there's another bug lurking in filename generation.  Note:

zsh% setopt nobadpattern nonomatch
zsh% echo ( Make | buy )
zsh: no match


-- 
Bart Schaefer                             Brass Lantern Enterprises
http://www.well.com/user/barts            http://www.nbn.com/people/lantern

New male in /home/schaefer:
>N  2 Justin William Schaefer  Sat May 11 03:43  53/4040  "Happy Birthday"


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-22 21:27                     ` Zefram
@ 1996-07-22 21:56                       ` Morris M. Siegel
  0 siblings, 0 replies; 17+ messages in thread
From: Morris M. Siegel @ 1996-07-22 21:56 UTC (permalink / raw)
  To: Zefram; +Cc: hzoli, zsh-users, schaefer

On Jul 22, 10:27pm, Zefram wrote:
> Subject: Re: Bug in case stmt with '('
> >I confess to having forgotten about the empty pattern.  Using null syntax
> >to implicitly denote the empty pattern in a script doesn't enhance readability
> >in any case, and I think zsh syntax would be safer and less confusing if the
> >empty pattern had to be denoted with the explicit syntax "()".
> 
> Does POSIX require that a completely empty pattern be allowed?  sh and
> ksh don't like it, and zsh only allows it where there's a |.  I don't
> see any problem with producing a parse error, and requiring an explicit
> empty pattern.
> 
> -zefram
>-- End of excerpt from Zefram

To tell the truth, I have no idea what POSIX says about empty patterns.
It seems that your views on the subject coincide with mine.

-- Morrie Siegel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
       [not found]             ` <schaefer@candle.brasslantern.com>
  1996-07-22 19:26               ` Morris M. Siegel
@ 1996-07-22 22:31               ` Morris M. Siegel
  1996-07-22 23:37                 ` Bart Schaefer
  1 sibling, 1 reply; 17+ messages in thread
From: Morris M. Siegel @ 1996-07-22 22:31 UTC (permalink / raw)
  To: schaefer, Zoltan Hidvegi, zsh-users

On Jul 22,  2:51pm, Bart Schaefer wrote:
> Subject: Re: Bug in case stmt with '('
> On Jul 22,  4:48pm, Morris M. Siegel wrote:
> } Subject: Re: Bug in case stmt with '('
> } 
> } Considering the fact that glob patterns in general are supposed to generate
> } file names, which usually do not contain blanks
> 
> I dispute that assertion.  Ever used a filesystem that's shared with a
> Macintosh?

I confess not.  By the way, has zsh been ported to the Macintosh? or to
any non-Unix OS?

> } zsh might also ignore
> } unescaped whitespace in glob patterns (in general, not just in 'case'
> } statements).
> 
> There's no such thing as "unescaped whitespace in glob patterns."  Zsh
> lexes glob patterns as a STRING token.  By the time filename generation
> gets around to interpreting the string as a glob pattern, any spaces
> that are left MUST have been "escaped" somehow, to make it through the
> lexer.  Putting parens around the pattern is one such possible quoting.

Okay -- (not knowing how lex.c works puts me somewhat at a disadvantage) --
in light of what you say, then what I mean is that the whitespace would have
to be escaped with backslash or quotes to be non-trivial.  E.g., the glob
pattern
	( file*.* ~ file.c )
would be equivalent to
	(file*.*~file.c)
and
	(  )
would be the empty pattern, whereas
	(my\ file)
or
	'my file'
would each match the 7-character filename 'my file'.

-- Morrie Siegel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug in case stmt with '('
  1996-07-22 22:31               ` Morris M. Siegel
@ 1996-07-22 23:37                 ` Bart Schaefer
  0 siblings, 0 replies; 17+ messages in thread
From: Bart Schaefer @ 1996-07-22 23:37 UTC (permalink / raw)
  To: Zoltan Hidvegi, Morris M. Siegel, zsh-users

On Jul 22,  6:31pm, Morris M. Siegel wrote:
} Subject: Re: Bug in case stmt with '('
}
} > There's no such thing as "unescaped whitespace in glob patterns."  Zsh
} > lexes glob patterns as a STRING token.
} 
} Okay -- (not knowing how lex.c works puts me somewhat at a disadvantage) --
} in light of what you say, then what I mean is that the whitespace would have
} to be escaped with backslash or quotes to be non-trivial.

Unfortunately, glob patterns are not the only STRING tokens.  That means
that you can't strip out the "trivial" spaces when lexing, without messing
up other uses of those tokens.

} E.g., the glob pattern
} 	( file*.* ~ file.c )
} would be equivalent to
} 	(file*.*~file.c)

That's exactly what my `if (incasepat && ...)' patch does, except it only
does it in the pattern part of case statements.

Unfortunately, it breaks this sort of thing:

	case "foo bar" in
	(foo bar)|boing) echo yes;;
	esac

because you can't tell until you find the `|' beyond the first ')' that
you should not have been stripping out spaces when lexing inside the
earlier '('.

This works, though:

	case "foo bar" in
	((foo bar)|boing) echo yes;;
	esac


-- 
Bart Schaefer                             Brass Lantern Enterprises
http://www.well.com/user/barts            http://www.nbn.com/people/lantern

New male in /home/schaefer:
>N  2 Justin William Schaefer  Sat May 11 03:43  53/4040  "Happy Birthday"


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Morrie is off for July 25
  1996-07-18 23:43   ` Zoltan Hidvegi
  1996-07-19  2:23     ` Bart Schaefer
@ 1996-07-24 16:07     ` Morris M. Siegel
  1 sibling, 0 replies; 17+ messages in thread
From: Morris M. Siegel @ 1996-07-24 16:07 UTC (permalink / raw)
  To: Zoltan Hidvegi, schaefer, zsh-users; +Cc: comet

Dear Andy,

This is a short note to remind that (as I requested earlier and you agreed),
I'm taking off tomorrow, July 25.

-- Morrie


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~1996-07-24 16:14 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1996-07-17 21:01 Bug in case stmt with '(' Morris M. Siegel
1996-07-18 18:27 ` Bart Schaefer
1996-07-18 23:43   ` Zoltan Hidvegi
1996-07-19  2:23     ` Bart Schaefer
1996-07-19 16:00       ` Zoltan Hidvegi
1996-07-21 22:53         ` Morris M. Siegel
1996-07-22  6:31           ` Bart Schaefer
     [not found]             ` <schaefer@candle.brasslantern.com>
1996-07-22 19:26               ` Morris M. Siegel
1996-07-22 19:51                 ` Zoltan Hidvegi
1996-07-22 20:48                   ` Morris M. Siegel
1996-07-22 21:27                     ` Zefram
1996-07-22 21:56                       ` Morris M. Siegel
1996-07-22 21:51                     ` Bart Schaefer
1996-07-22 21:23                   ` Zefram
1996-07-22 22:31               ` Morris M. Siegel
1996-07-22 23:37                 ` Bart Schaefer
1996-07-24 16:07     ` Morrie is off for July 25 Morris M. Siegel

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).