From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from math.gatech.edu (euclid.skiles.gatech.edu [130.207.146.50]) by werple.net.au (8.7/8.7.1) with SMTP id FAA12711 for <mason@werple.mira.net.au>; Tue, 7 Nov 1995 05:33:23 +1100 (EST)
Received: by math.gatech.edu (5.x/SMI-SVR4)
	id AA07571; Mon, 6 Nov 1995 13:20:20 -0500
Resent-Date: Mon, 6 Nov 1995 18:25:09 +0100 (MET)
Old-Return-Path: <hzoli@cs.elte.hu>
From: Zoltan Hidvegi <hzoli@cs.elte.hu>
Message-Id: <199511061725.SAA08785@bolyai.cs.elte.hu>
Subject: Re: Expansion/quoting quirks
To: kaefer@aglaia.snafu.de (Thorsten Meinecke)
Date: Mon, 6 Nov 1995 18:25:09 +0100 (MET)
In-Reply-To: <m0tC4gW-00007BC@aglaia.snafu.DE> from "Thorsten Meinecke" at Nov 5, 95 02:00:27 pm
X-Mailer: ELM [version 2.4 PL24]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: hzoli@cs.elte.hu
Resent-Message-Id: <"M5HPm1.0.Ds1.Z7bdm"@euclid>
Resent-From: zsh-workers@math.gatech.edu
X-Mailing-List: <zsh-workers@math.gatech.edu> archive/latest/537
X-Loop: zsh-workers@math.gatech.edu
Precedence: list
Resent-Sender: zsh-workers-request@math.gatech.edu

Thorsten Meinecke wrote:
> echo `echo \\\\`    # broken in hzoli, and in vanilla zsh if invoked as (k)sh

Yes, that's really broken.  However, echo is not good for testing here since
shells differ in how they interpret escape sequences in echo.  It's better
to alias echo to 'print -r --'.  With zsh it is enough to set the bsdecho
option.  This bug appeared with the input patches from Peter.  The following
happens:  the \\\\ within `...` is parsed as Bnull\Bnull\.  The lexer is
called again with that and it thinks that the first \ quotes the Bnull.  The
last \ then causes a parse error.  Below is a patch to input.c to drop tokens
from the input.  These returned tokens caused some other bugs earlier and they
can be dangerous when a script contains some tokens.

> echo "$(echo \\\\)" # sh and ksh seem to differ here (bash would give `\\')

sh should give two backslashes.  The difference is probably in the escape
handling of echo in sh.
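
As a rough sketch of why echo muddies this test, the snippet below prints with
printf '%s\n' instead of echo, so the printing command interprets no escape
sequences.  show_arg is a hypothetical helper, not something from the original
report, and the comments assume a POSIX-style shell such as bash or dash
rather than a zsh with the bug described above.

```shell
# show_arg prints its first argument verbatim, with no escape processing.
show_arg() {
    printf '%s\n' "$1"
}

# Inside `...`, a backslash followed by a backslash is reduced to a single
# backslash before the inner command is parsed.  The inner command is
# therefore: show_arg \\   which passes one backslash.
old_style=`show_arg \\\\`

# Inside $(...), the text is parsed as written.  The inner command is
# show_arg \\\\   which passes two backslashes.
new_style=$(show_arg \\\\)

printf 'backticks: %s\n' "$old_style"
printf 'dollar-paren: %s\n' "$new_style"
```

With the escape handling of echo out of the picture, any remaining difference
between shells comes from how the command substitution itself is parsed.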

> nargs ${undef-"a b"}             # vanilla + hzoli: shouldn't split here

That's difficult.  sh_word_split splits the result of a parameter expansion.
Here the result is 'a b' which is split to 'a' 'b'.
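
For comparison, here is a minimal sketch of what a POSIX-style shell such as
bash or dash does with this construct.  The nargs defined here is only a
stand-in that prints its argument count (the real nargs is not shown in this
thread), and the behaviour described is an assumption about those shells, not
about zsh.

```shell
# Stand-in for nargs: print how many arguments were received.
nargs() {
    printf '%s\n' "$#"
}

unset undef

# The quotes inside the default word protect the space from field splitting,
# so a single argument is passed.
quoted_default=$(nargs ${undef-"a b"})

# Without the inner quotes the default word is split on the space.
unquoted_default=$(nargs ${undef-a b})

printf 'quoted: %s, unquoted: %s\n' "$quoted_default" "$unquoted_default"
```

This suggests that splitting has to respect the quoting of the word after the
-, rather than being applied to the flat result of the expansion.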

> #% argc=3, argv=( 'a b' '' 'c' )
> nargs ${undef-"$@"}              # hzoli: 'a b' shouldn't split into 'a' 'b'

Same as the previous example.

> #% argc=3, argv=( 'a b' '' 'c' )
> nargs "${undef-"$@"}"            # hzoli: zsh: closing brace expected

That's because the second " closes the first.  It would be easy to fix it.

My problem is that I do not know what the standard behaviour is here.  My
library does not have the relevant POSIX papers.

It would be important to know how to parse these things.  It seems that the
lexer should be called on the body of ${...-...}.  I'll try to fix these if
someone tells me what the standards say here.  I have ksh93.  May I assume
that ksh93 behaviour is the standard?

The most difficult part here is ${...##...}, where the body should be
interpreted as a pattern and the expanded body should be parsed again for
quotes.  E.g. foo='te\s\t' bar='\s\t' ; echo ${foo%%$bar} does not remove the
tail of foo since \ only escapes the s and t.  But foo='te"st"' bar='"??"'
echo ${foo%$bar} does remove the tail.
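
The two cases above can be sketched as follows, assuming a POSIX-style shell
such as bash or dash.  The names foo2 and bar2 and the quoted "$bar" variant
are added here only for contrast and are not part of the original examples.

```shell
foo='te\s\t'
bar='\s\t'

# Unquoted $bar: each backslash escapes the next pattern character, so the
# pattern means the two characters "st", which is not a suffix of foo.
unquoted_pat=${foo%%$bar}

# Quoted "$bar": the value is matched literally, backslashes included, so
# the tail \s\t is removed.
quoted_pat=${foo%%"$bar"}

foo2='te"st"'
bar2='"??"'

# Double quotes that come from the expansion of $bar2 are ordinary characters,
# while ? still matches any single character, so the tail "st" (with its
# quotes) is removed.
quote_chars=${foo2%$bar2}

printf '%s\n' "$unquoted_pat" "$quoted_pat" "$quote_chars"
```

So a backslash produced by the expansion is still treated as a pattern escape,
whereas a double quote produced by the expansion is not treated as quoting.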

Bye,

  Zoltan

diff -c Src/input.c~ Src/input.c
*** Src/input.c~	Sat Nov  4 09:47:43 1995
--- Src/input.c	Mon Nov  6 17:50:17 1995
***************
*** 109,115 ****
  	if (inbufleft) {
  	    inbufleft--;
  	    inbufct--;
! 	    return lastc = (unsigned)*inbufptr++;
  	}
  	/*
  	 * No characters in input buffer.
--- 109,118 ----
  	if (inbufleft) {
  	    inbufleft--;
  	    inbufct--;
! 	    lastc = (unsigned)*inbufptr++;
! 	    if (itok(lastc))
! 		continue;
! 	    return lastc;
  	}
  	/*
  	 * No characters in input buffer.