From: Peter Stephenson
To: Martijn Dekker, zsh-workers@zsh.org
Subject: Re: 'case' pattern matching bug with bracket expressions
Date: Thu, 14 May 2015 15:42:38 +0100
Message-id: <20150514154238.0e547ff0@pwslap01u.europe.root.pri>
In-reply-to: <55549FB2.80705@inlv.org>
References: <55549FB2.80705@inlv.org>
List-Id: Zsh Workers List
X-Seq: 35128
Organization: Samsung Cambridge Solution Centre

On Thu, 14 May 2015 14:14:26 +0100
Martijn Dekker wrote:
> While writing a cross-platform shell
> library I've come across a bug in
> the way zsh (in POSIX mode) matches patterns in 'case' statements that
> are at variance with other POSIX shells.
>
> Normally, zsh considers an empty bracket expression [] a bad pattern
> while other shells ([d]ash, bash, ksh) consider it a negative:
>
> case abc in ( [] ) echo yes ;; ( * ) echo no ;; esac
>
> Expected output: no
> Got output: zsh: bad pattern: []

This is the shell language being typically duplicitous and unhelpful.

"]" after a "[" indicates that the "]" is part of the set.  This is
normal; in bash as well as zsh:

[[ ']' = []] ]] && echo yes

outputs 'yes'.

However, as you've found out, other shells handle the case where there
isn't another ']' later.  Generally there's no harm in this, and in most
cases we could do this (the case below is harder).

Nonetheless, there's a real ambiguity here, so given this and the
following I'd definitely suggest not relying on it if you can avoid
doing so --- use something else to signify an empty string.

> The same thing does NOT produce an error, but a false positive (!), if
> an extra non-matching pattern with | is added:
>
> case abc in ( [] | *[!a-z]*) echo yes ;; ( * ) echo no ;; esac

This is the pattern:

'['           introducing bracketed expression
'] | *[!a-z'  characters inside
']'           end of bracketed expression
'*'           wildcard.

so it's a set including the character a followed by anything, and hence
matches.

I'm not really sure we *can* resolve this unambiguously the way you
want.  Is there something that forbids us from interpreting the pattern
that way?

The handling of ']' at the start is mandated, if I've followed all the
logic correctly --- POSIX 2007 Shell and Utilities 2.13.1 says:

  [
    If an open bracket introduces a bracket expression as in XBD RE
    Bracket Expression, except that the <exclamation-mark> character
    ( '!' ) shall replace the <circumflex> character ( '^' ) in its role
    in a non-matching list in the regular expression notation, it shall
    introduce a pattern bracket expression.
    A bracket expression starting with an unquoted <circumflex>
    character produces unspecified results. Otherwise, '[' shall match
    the character itself.

The language is a little turgid, but I think it's saying "unless you have
^ or [ just go with the RE rules in [section 9.3.5]".  9.3.5 (in regular
expressions) says, amongst a lot of other things:

    The <right-square-bracket> ( ']' ) shall lose its special meaning and
    represent itself in a bracket expression if it occurs first in the
    list (after an initial <circumflex> ( '^' ), if any)

That's a "shall".

I haven't read through the "case" doc so there may be some killer reason
why that " | " has to be a case separator and not part of a
square-bracketed expression.  But that would seem to imply some form of
hierarchical parsing in which those characters couldn't occur within a
pattern.

By the way, we don't handle all forms in 9.3.5, e.g. equivalence sets,
so saying "it works like REs" isn't a perfect answer for zsh, either.

pws
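
For illustration, the single-bracket-expression reading described in this
message can be tried in any POSIX-style sh.  This is only a minimal
sketch: the helper name "matches" is made up for the example, and passing
the pattern through a variable is an assumption used to guarantee that
the '|' cannot be taken as a case separator.  It does not show how any
particular shell parses the text when it is written directly in a 'case'
statement.

```shell
# matches STRING PATTERN: succeed if STRING matches the glob PATTERN.
# (Hypothetical helper.)  $2 is expanded unquoted so it is used as a
# pattern; because it arrives via a variable, a '|' inside it is just
# an ordinary character and never a separator between alternatives.
matches() {
    case $1 in
        ($2) return 0 ;;
        (*)  return 1 ;;
    esac
}

# A ']' directly after '[' is a member of the set.
matches ']' '[]]' && echo "']' matches '[]]'"

# Read as one bracket expression, '[] | *[!a-z]*' is the set
# { ']', ' ', '|', '*', '[', '!', a-z } followed by the wildcard '*'.
pat='[] | *[!a-z]*'
matches abc "$pat" && echo "abc matches: 'a' is in the set"
matches 1bc "$pat" || echo "1bc does not match: '1' is not in the set"
```

In native zsh mode an unquoted $2 is taken literally unless GLOB_SUBST is
set (as it is under "emulate sh") or the expansion is written $~2, so the
sketch is aimed at sh-compatible modes.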