From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from euclid.skiles.gatech.edu (list@euclid.skiles.gatech.edu [130.207.146.50])
	by melb.werple.net.au (8.7.5/8.7.3/2) with ESMTP id AAA06963
	for ; Wed, 24 Jul 1996 00:22:10 +1000 (EST)
Received: (from list@localhost)
	by euclid.skiles.gatech.edu (8.7.3/8.7.3) id KAA11937;
	Tue, 23 Jul 1996 10:09:05 -0400 (EDT)
Resent-Date: Tue, 23 Jul 1996 10:09:05 -0400 (EDT)
From: Zoltan Hidvegi
Message-Id: <199607231408.QAA11293@bolyai.cs.elte.hu>
Subject: Re: Bug in case stmt with '('
To: A.Main@dcs.warwick.ac.uk (Zefram)
Date: Tue, 23 Jul 1996 16:08:17 +0200 (MET DST)
Cc: segal@morgan.com, schaefer@nbn.com, zsh-workers@math.gatech.edu (Zsh workers list)
In-Reply-To: <17651.199607222123@stone.dcs.warwick.ac.uk> from Zefram at "Jul 22, 96 10:23:24 pm"
Organization: Dept. of Comp. Sci., Eotvos University, Budapest, Hungary
Phone: (36 1)2669833 ext: 2667, home phone: (36 1) 2752368
X-Mailer: ELM [version 2.4ME+ PL16 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Resent-Message-ID: <"lpmtq.0.Ow2.0qDzn"@euclid>
Resent-From: zsh-workers@math.gatech.edu
X-Mailing-List: archive/latest/1742
X-Loop: zsh-workers@math.gatech.edu
Precedence: list
Resent-Sender: zsh-workers-request@math.gatech.edu

> >Unfortunately there is still an incompatibility in case:
> >
> >case foo in
> >( f* | b* ) echo yes
> >esac
> >
> >should print yes but in zsh it works as
> >
> >case foo in
> >\ f*\ |\ b*\ ) echo yes;;
> >esac
>
> I was wondering whether that would be the case.  It's a serious problem.
>
> >To fix this would be really difficult I think.
>
> It could be easily fixed by modifying glob semantics such that unquoted
> whitespace embedded in a pattern is ignored.  I doubt that any real
> code relies on the current behaviour, which is in any case
> undocumented.

It is used in ${...%...} substitutions where a space stands for itself.  And
it is not even a zsh feature.
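To make the ${...%...} point concrete, here is a minimal sketch that should
behave the same in any POSIX-style shell.  The variable names and the sample
string are made up for the illustration.

```sh
# Hypothetical variable names, chosen only for this illustration.
file='report draft'

# The pattern after % is " *": a literal space followed by *.
# The shortest suffix matching it, " draft", is removed.
trimmed=${file% *}
echo "$trimmed"

# If the space were ignored, the pattern would be just "*".  Its shortest
# matching suffix is the empty string, so nothing would be removed.
untrimmed=${file%*}
echo "$untrimmed"
```

The two expansions differ only in the unquoted space, so a rule that drops
every unquoted space from a pattern would change the result of the first one.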
So unquoted whitespace should not always be ignored.  Ignoring unquoted
whitespace before | and ) and after ( and | seems to be a better solution,
but it is more difficult to implement.  Consistency is desirable, so patterns
should behave similarly in case statements, in ${...%...} substitutions
(double quoted or not), in arguments to builtins after -m, etc.

In a few places the code assumes that the lexer just tokenizes the input but
does not actually modify it, so omitting these spaces in the lexer is not the
best solution.  The best would be to handle it in glob.c, but there the
problem is how to distinguish ( foo\ | bar ) from ( foo | bar ).  It would
require a new token for a null-space (the word `token' is ambiguous in zsh:
it can be either a token returned by gettok or a character token for a
character in ztokens; here I use the second meaning).  A token for null-TAB
may also be necessary here.

The simplest solution is to convert every unquoted space and TAB which is
inside a globbing paren to a null-space or null-tab token.  Later, in glob.c,
a null-space or null-tab is either treated as space/tab, or discarded if it
is adjacent to |, comes after a `(' or comes before a `)' (or it may be
better to discard these after `)' and before `(' as well).  This seems to be
a quite simple solution.

On the other topic of empty patterns: it is true that POSIX does not allow
them.  Even previous zsh versions had problems with empty patterns in case
statements, so it probably does not cause any problems if we do not allow
them there.  That would simplify parsing, since it means that after a BAR a
STRING token must come in a case pattern.  But in other places empty patterns
are useful.  I often use (...|) for an optional match and this should not be
disallowed.

Zoltan

PS. I'm now moving this quite technical discussion to zsh-workers.