zsh-workers
* '<<-' here-documents oddity with line continuation
@ 2018-02-03 17:39 Martijn Dekker
  2018-02-09  7:01 ` Martijn Dekker
  0 siblings, 1 reply; 8+ messages in thread
From: Martijn Dekker @ 2018-02-03 17:39 UTC (permalink / raw)
  To: Zsh hackers list

zsh has an oddity with here-documents using the '<<-' operator.

(Note: below, <tab> represents a tab character, not the literal string
'<tab>'.)

POSIX says:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04
| If the redirection operator is "<<-", all leading <tab> characters
| shall be stripped from input lines and the line containing the
| trailing delimiter.

In a construct like

	cat <<-EOF
<tab>	one \
<tab>	two
<tab>	EOF

where the newline after "one \" is backslash-escaped (line
continuation), zsh outputs

one two

whereas all other shells (bash, dash, *ksh, yash, etc.) output
one <tab>two

Superficially, it looks like zsh is the only shell that actually
complies with POSIX, as it strips the leading <tab> characters from all
lines in the here-document, including lines that follow a line ending in
a backslash.

However, line continuation in POSIXy shells is parsed at a very early
stage, even before token recognition:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01
| A <backslash> that is not quoted shall preserve the literal value of
| the following character, with the exception of a <newline>. If a
| <newline> follows the <backslash>, the shell shall interpret this as
| line continuation. The <backslash> and <newline> shall be removed
| before splitting the input into tokens. Since the escaped <newline>
| is removed entirely from the input and is not replaced by any white
| space, it cannot serve as a token separator.

(One funny effect of this: reserved words such as 'while' or 'select'
are not recognised if any part of them is quoted, but they can still be
split over multiple lines using line continuation!)
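
A minimal sketch of that effect follows (the function name
split_keyword_demo is made up for illustration). It assumes a shell that
removes backslash-newline pairs before recognising reserved words, which
should hold for the POSIX shells discussed here. Note that the second half
of each split word has to start in column 0, since any leading blanks
would act as a token separator.

```shell
# Hypothetical demo: 'if' and 'fi' are each split across two physical
# lines with a backslash-newline, yet are still seen as reserved words.
split_keyword_demo() {
    i\
f true; then
        echo recognised
    f\
i
}

split_keyword_demo
```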

So it would seem logical that the definition of "input line" used by
POSIX for here-documents is based on lines resulting *after* parsing
line continuation. That would then keep the <tab>s from being stripped
from "continued" lines.

Here's a quick test script (compatible with all POSIX shells). It
outputs "zsh" on zsh and "ok" on all other shells.

tab=$(printf '\t')
lf=$(printf '\nX'); lf=${lf%X}
eval "foo=\$(cat <<-EOF${lf}${tab}1\\${lf}${tab}2${lf}${tab}EOF${lf})"
case $foo in
( 1${tab}2 ) echo ok ;;
( 12 )       echo zsh ;;
( * )        echo NEWBUG ;;
esac

Since zsh's behaviour looks sensible on the face of it, I'm reluctant to
call it a bug, but it is certainly an incompatibility and seems to be
non-compliant with POSIX. Maybe something to fix in emulation?

Thanks,

- M.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: '<<-' here-documents oddity with line continuation
  2018-02-03 17:39 '<<-' here-documents oddity with line continuation Martijn Dekker
@ 2018-02-09  7:01 ` Martijn Dekker
  2018-02-09  7:58   ` Martijn Dekker
  2018-02-09  9:24   ` Peter Stephenson
  0 siblings, 2 replies; 8+ messages in thread
From: Martijn Dekker @ 2018-02-09  7:01 UTC (permalink / raw)
  To: Zsh hackers list

Op 03-02-18 om 18:39 schreef Martijn Dekker:
> In a construct like
> 
> 	cat <<-EOF
> <tab>	one \
> <tab>	two
> <tab>	EOF
> 
> where the newline after "one \" is backslash-escaped (line
> continuation), zsh outputs
> 
> one two
> 
> whereas all other shells (bash, dash, *ksh, yash, etc.) output
> one <tab>two
[...]

While figuring out a patch for the above issue, I found another oddity
as well, that more clearly looks like a bug.

zsh:

cat <<EOF
foo\
EOF

output: foo (no final linefeed)

Every other shell:

cat <<EOF
foo\
EOF
EOF

output: fooEOF

The line continuation backslash after 'foo\' should make the first EOF
part of the same line as 'foo', thereby deactivating it as a terminator.
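
The expected behaviour can be checked with a sketch along these lines
(continued_terminator_demo is a hypothetical name, not part of any test
suite). On shells that join the continued line, the first EOF becomes
data and only the second one ends the here-document. On a zsh without a
fix, the first EOF would presumably end the here-document and the second
would be run as a command.

```shell
# Hypothetical demo: the backslash-newline after 'foo' joins the first
# EOF onto the 'foo' line, so only the second EOF is the terminator.
continued_terminator_demo() {
    cat <<EOF
foo\
EOF
EOF
}

case $(continued_terminator_demo 2>/dev/null) in
( fooEOF ) echo "continuation joined the first EOF to the data" ;;
( * )      echo "first EOF was treated as the terminator" ;;
esac
```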

In the next few days I hope to submit a patch that fixes both issues.

Meanwhile I would welcome opinions whether either or both of these
issues should be fixed unconditionally, or in emulation only -- and, if
the latter, what shell option to attach it to. POSIX_STRINGS maybe?

Thanks,

- M.



* Re: '<<-' here-documents oddity with line continuation
  2018-02-09  7:01 ` Martijn Dekker
@ 2018-02-09  7:58   ` Martijn Dekker
  2018-02-09  9:24   ` Peter Stephenson
  1 sibling, 0 replies; 8+ messages in thread
From: Martijn Dekker @ 2018-02-09  7:58 UTC (permalink / raw)
  To: Zsh hackers list

Op 09-02-18 om 08:01 schreef Martijn Dekker:
> In the next few days I hope to submit a patch that fixes both issues.
> 
> Meanwhile I would welcome opinions whether either or both of these
> issues should be fixed unconditionally, or in emulation only -- and, if
> the latter, what shell option to attach it to. POSIX_STRINGS maybe?

Here's a fairly trivial concept patch. I believe this makes zsh
here-documents act like other POSIX shells. If either or both fixes need
to be conditional upon emulation, an extra call to isset() should suffice.

- M.

diff --git a/Src/exec.c b/Src/exec.c
index c39680d..ca04b05 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -4351,7 +4351,7 @@ char *
 gethere(char **strp, int typ)
 {
     char *buf;
-    int bsiz, qt = 0, strip = 0;
+    int bsiz, qt = 0, strip = 0, linecont = 0;
     char *s, *t, *bptr, c;
     char *str = *strp;

@@ -4372,7 +4372,7 @@ gethere(char **strp, int typ)
     for (;;) {
 	t = bptr;

-	while ((c = hgetc()) == '\t' && strip)
+	while ((c = hgetc()) == '\t' && strip && !linecont)
 	    ;
 	for (;;) {
 	    if (bptr == buf + bsiz) {
@@ -4393,12 +4393,14 @@ gethere(char **strp, int typ)
 	    c = hgetc();
 	}
 	*bptr = '\0';
-	if (!strcmp(t, str))
+	if (!strcmp(t, str) && !linecont)
 	    break;
 	if (lexstop) {
 	    t = bptr;
 	    break;
 	}
+	if (!qt)
+	    linecont = (bptr > t && *(bptr - 1) == '\\');
 	*bptr++ = '\n';
     }
     *t = '\0';



* Re: '<<-' here-documents oddity with line continuation
  2018-02-09  7:01 ` Martijn Dekker
  2018-02-09  7:58   ` Martijn Dekker
@ 2018-02-09  9:24   ` Peter Stephenson
  2018-02-09 15:27     ` Stephane Chazelas
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Stephenson @ 2018-02-09  9:24 UTC (permalink / raw)
  To: Zsh hackers list

On Fri, 9 Feb 2018 08:01:41 +0100
Martijn Dekker <martijn@inlv.org> wrote:
> Meanwhile I would welcome opinions whether either or both of these
> issues should be fixed unconditionally, or in emulation only -- and, if
> the latter, what shell option to attach it to. POSIX_STRINGS maybe?

Thanks for the patch.

I think the question is whether existing users are more likely to be
relying on this behaviour, or simply finding it confusing.  I'd actually
hazard the latter --- line continuation is more useful if it's doing
something a bit more predictable --- so I'd be inclined just to include
the patch.  But I expect there are arguments the other way.

pws



* Re: '<<-' here-documents oddity with line continuation
  2018-02-09  9:24   ` Peter Stephenson
@ 2018-02-09 15:27     ` Stephane Chazelas
  2018-02-09 16:07       ` Martijn Dekker
  0 siblings, 1 reply; 8+ messages in thread
From: Stephane Chazelas @ 2018-02-09 15:27 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh hackers list

2018-02-09 09:24:27 +0000, Peter Stephenson:
> On Fri, 9 Feb 2018 08:01:41 +0100
> Martijn Dekker <martijn@inlv.org> wrote:
> > Meanwhile I would welcome opinions whether either or both of these
> > issues should be fixed unconditionally, or in emulation only -- and, if
> > the latter, what shell option to attach it to. POSIX_STRINGS maybe?
> 
> Thanks for the patch.
> 
> I think the question is whether existing users are more likely to be
> relying on this behaviour, or simply finding it confusing.  I'd actually
> hazard the latter --- line continuation is more useful if it's doing
> something a bit more predictable --- so I'd be inclined just to include
> the patch.  But I expect there are arguments the other way.
[...]

I agree it makes sense to treat it as a minor compatibility
bug fix.

Note that there's also:

cat << EOF
foo
E\
OF

which zsh does differently from other shells (and that nobody
would ever do).
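
A hedged sketch for trying this out (split_delimiter_demo is a made-up
name). The script text is passed through eval so that a shell which does
not accept the split terminator only misparses that one string rather
than the rest of the file. On shells that do accept it, the output should
be just "foo".

```shell
# Hypothetical demo: the terminating delimiter EOF is itself split
# across two lines with a backslash-newline. The single quotes keep the
# backslash-newline intact until eval parses the string.
split_delimiter_demo() {
    eval 'cat <<EOF
foo
E\
OF
'
}

split_delimiter_demo 2>/dev/null
```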

About order and token recognition, in reply to Martijn's initial
report, note that in:

echo "$(echo 'foo\
bar')"

Or

echo "$(cat << 'EOF'
foo\
bar
EOF
)"

The \<LF> is meant *not* to be treated as a line continuation.
bash fixed a bug recently for that (the latter used to output
"foobar", it's only fixed on the development branch).

So there has to be some level of tokenisation and parsing done
before line continuation is handled. It's not like in C where
it's done as one of the first steps of the pre-processing stage.
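
The two quoted cases can be sketched as below (both function names are
hypothetical). The functions are defined at top level rather than inside
a command substitution, so the sketch sidesteps the bash issue mentioned
above. printf is used instead of echo because some echo implementations
interpret backslashes themselves.

```shell
# Hypothetical demo: inside single quotes, and inside a here-document
# with a quoted delimiter, backslash-newline is ordinary data and must
# not be removed as a line continuation.
single_quote_demo() {
    printf '%s\n' 'foo\
bar'
}

quoted_heredoc_demo() {
    cat <<'EOF'
foo\
bar
EOF
}

single_quote_demo
quoted_heredoc_demo
```

Each function is expected to print two lines, the first ending in a
literal backslash.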



* Re: '<<-' here-documents oddity with line continuation
  2018-02-09 15:27     ` Stephane Chazelas
@ 2018-02-09 16:07       ` Martijn Dekker
  2018-02-09 18:19         ` Martijn Dekker
  0 siblings, 1 reply; 8+ messages in thread
From: Martijn Dekker @ 2018-02-09 16:07 UTC (permalink / raw)
  To: zsh-workers

Op 09-02-18 om 16:27 schreef Stephane Chazelas:
> Note that there's also:
> 
> cat << EOF
> foo
> E\
> OF
> 
> which zsh does differently from other shells (and that nobody
> would ever do).

IOW, all shells support line continuation within the terminating
delimiter except zsh. Eesh.

Somebody somewhere has probably done this. I'll see if I can rethink my
patch to fix this as well.

> About order and token recognition, in reply to Martijn's initial
> report, note that in:
> 
> echo "$(echo 'foo\
> bar')"
> 
> Or
> 
> echo "$(cat << 'EOF'
> foo\
> bar
> EOF
> )"
> 
> The \<LF> is meant *not* to be treated as a line continuation.
> bash fixed a bug recently for that (the latter used to output
> "foobar", it's only fixed on the development branch).

(For the record, it looks like zsh handles both correctly with or
without the patch.)

> So there has to be some level of tokenisation and parsing done
> before line continuation is handled. It's not like in C where
> it's done as one of the first steps of the pre-processing stage.

Yes, I did realise as I was figuring out this patch that I slightly got
the wrong idea in the initial report, as the gethere() conversion
function appears to depend on tokenisation having already happened.

- Martijn



* Re: '<<-' here-documents oddity with line continuation
  2018-02-09 16:07       ` Martijn Dekker
@ 2018-02-09 18:19         ` Martijn Dekker
  2018-02-12 10:07           ` Peter Stephenson
  0 siblings, 1 reply; 8+ messages in thread
From: Martijn Dekker @ 2018-02-09 18:19 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 901 bytes --]

Op 09-02-18 om 17:07 schreef Martijn Dekker:
> Op 09-02-18 om 16:27 schreef Stephane Chazelas:
>> Note that there's also:
>>
>> cat << EOF
>> foo
>> E\
>> OF
>>
>> which zsh does differently from other shells (and that nobody
>> would ever do).
> 
> IOW, all shells support line continuation within the terminating
> delimiter except zsh. Eesh.
> 
> Somebody somewhere has probably done this. I'll see if I can rethink my
> patch to fix this as well.

Stéphane helped me realise my whole approach to the patch was wrong. Of
course line continuation should be handled within the loop that parses a
line in the first place, and not in between parsing individual lines. No
extra flag is needed at all.
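
To illustrate the idea of joining continued lines while a line is being
read, here is a rough model in shell. It is not zsh's gethere() code and
read_heredoc_body is a made-up name. It assumes the '<<-' form with an
unquoted delimiter, reads the body from standard input, and does not try
to handle a backslash that is itself escaped.

```shell
# Hypothetical model of reading a '<<-' here-document body from stdin.
# Leading tabs are stripped once per logical line, continuation lines
# are joined before the delimiter comparison, and no state is carried
# over from one logical line to the next.
read_heredoc_body() {
    delim=$1
    tab=$(printf '\t')
    while IFS= read -r line; do
        # Strip leading tabs at the start of the logical line only.
        while [ "${line#"$tab"}" != "$line" ]; do
            line=${line#"$tab"}
        done
        # A trailing backslash joins the next physical line as-is,
        # so any tabs at its start are kept.
        while [ "${line%\\}" != "$line" ]; do
            IFS= read -r next || break
            line=${line%\\}$next
        done
        # Only a complete logical line can be the terminator.
        if [ "$line" = "$delim" ]; then
            break
        fi
        printf '%s\n' "$line"
    done
}

# The three cases from this thread, fed in as plain text:
printf '\t1\\\n\t2\n\tEOF\n' | read_heredoc_body EOF
printf 'foo\\\nEOF\nEOF\n'   | read_heredoc_body EOF
printf 'foo\nE\\\nOF\n'      | read_heredoc_body EOF
```

Because the join happens before the comparison with the delimiter, the
same loop covers the kept tab, the deactivated terminator, and the split
terminator.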

Here's take two, which should fix all three issues. I also added a test
case, and slightly edited another test case to make sure line
continuation is not parsed if the delimiter is quoted.

- Martijn

[-- Attachment #2: heredoc.patch --]
[-- Type: text/plain, Size: 2209 bytes --]

diff --git a/Src/exec.c b/Src/exec.c
index c39680d..e5c6455 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -4387,8 +4387,17 @@ gethere(char **strp, int typ)
 		bptr = buf + bsiz;
 		bsiz *= 2;
 	    }
-	    if (lexstop || c == '\n')
+	    if (lexstop)
 		break;
+	    if (c == '\n') {
+		if (!qt && bptr > t && *(bptr - 1) == '\\') {
+		    /* line continuation */
+		    bptr--;
+		    c = hgetc();
+		    continue;
+		} else
+		    break;
+	    }
 	    *bptr++ = c;
 	    c = hgetc();
 	}
diff --git a/Test/A04redirect.ztst b/Test/A04redirect.ztst
index b8105cf..ef7ddb2 100644
--- a/Test/A04redirect.ztst
+++ b/Test/A04redirect.ztst
@@ -114,7 +114,7 @@
   heretest() {
     print First line
     cat <<'    HERE'
-    $foo$foo met celeste  'but with extra'  "stuff to test quoting"
+    $foo$foo met celeste  'but with extra'  "stuff to test quoting"\
     HERE
     print Last line
   }
@@ -125,19 +125,57 @@
   heretest
 0:Re-evaluation of function output with here document, quoted
 >First line
->    $foo$foo met celeste  'but with extra'  "stuff to test quoting"
+>    $foo$foo met celeste  'but with extra'  "stuff to test quoting"\
 >Last line
 >First line
->    $foo$foo met celeste  'but with extra'  "stuff to test quoting"
+>    $foo$foo met celeste  'but with extra'  "stuff to test quoting"\
 >Last line
 >First line
->    $foo$foo met celeste  'but with extra'  "stuff to test quoting"
+>    $foo$foo met celeste  'but with extra'  "stuff to test quoting"\
 >Last line
 
   read -r line <<'  HERE'
   HERE
 1:No input, not even newline, from empty here document.
 
+  heretest() {
+    print First line
+    cat <<-HERE
+	$foo\
+	$foo
+	some\
+	stuff
+	to\
+  test
+	tab\stripping
+	HERE
+    print Last line
+  }
+  heretest
+  eval "$(functions heretest)"
+  heretest
+  eval "$(functions heretest)"
+  heretest
+0:Line continuation in here-document with unquoted delimiter
+>First line
+>bar	bar
+>some	stuff
+>to  test
+>tab\stripping
+>Last line
+>First line
+>bar	bar
+>some	stuff
+>to  test
+>tab\stripping
+>Last line
+>First line
+>bar	bar
+>some	stuff
+>to  test
+>tab\stripping
+>Last line
+
   #
   # exec tests: perform these in subshells so if they fail the
   # shell won't exit.


* Re: '<<-' here-documents oddity with line continuation
  2018-02-09 18:19         ` Martijn Dekker
@ 2018-02-12 10:07           ` Peter Stephenson
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Stephenson @ 2018-02-12 10:07 UTC (permalink / raw)
  To: zsh-workers

On Fri, 9 Feb 2018 19:19:52 +0100
Martijn Dekker <martijn@inlv.org> wrote:
> Here's take two, which should fix all three issues. I also added a test
> case, and slightly edited another test case to make sure line
> continuation is not parsed if the delimiter is quoted.

Thanks, it didn't look like there was going to be more discussion so
I've committed it.

pws

