From: Martijn Dekker
To: Zsh hackers list
Subject: '<<-' here-documents oddity with line continuation
Date: Sat, 3 Feb 2018 18:39:13 +0100
X-Seq: 42340

zsh has an oddity with here-documents using the '<<-' operator.
(Note: below, <tab> represents a tab character, not the literal string
'<tab>'.)

POSIX says:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04
| If the redirection operator is "<<-", all leading <tab> characters
| shall be stripped from input lines and the line containing the
| trailing delimiter.

In a construct like

    cat <<-EOF
    <tab>one \
    <tab>two
    <tab>EOF

where the newline after "one \" is backslash-escaped (line continuation),
zsh outputs

    one two

whereas all other shells (bash, dash, *ksh, yash, etc.)
output

    one <tab>two

Superficially, it looks like zsh is the only shell that actually complies
with POSIX, as it strips the leading <tab> characters from all lines in
the here-document, including lines that follow a line ending in a
backslash.

However, line continuation in POSIXy shells is parsed at a very early
stage, even before token recognition:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01
| A <backslash> that is not quoted shall preserve the literal value of
| the following character, with the exception of a <newline>. If a
| <newline> follows the <backslash>, the shell shall interpret this as
| line continuation. The <backslash> and <newline> shall be removed
| before splitting the input into tokens. Since the escaped <newline>
| is removed entirely from the input and is not replaced by any white
| space, it cannot serve as a token separator.

(One funny effect of this: reserved words such as 'while' or 'select'
are not recognised if any part of them is quoted, but they can still be
split over multiple lines using line continuation!)

So it would seem logical that the definition of "input line" used by
POSIX for here-documents is based on the lines that result *after* line
continuation has been processed. That would then keep the <tab>s from
being stripped from "continued" lines.

Here's a quick test script (compatible with all POSIX shells). It
outputs "zsh" on zsh and "ok" on all other shells.

    tab=$(printf '\t')
    lf=$(printf '\nX'); lf=${lf%X}
    eval "foo=\$(cat <<-EOF${lf}${tab}1\\${lf}${tab}2${lf}${tab}EOF${lf})"
    case $foo in
    ( 1${tab}2 )  echo ok ;;
    ( 12 )        echo zsh ;;
    ( * )         echo NEWBUG ;;
    esac

Since zsh's behaviour looks sensible on the face of it, I'm reluctant to
call it a bug, but it is certainly an incompatibility and seems to be
non-compliant with POSIX. Maybe something to fix in emulation?

Thanks,

- M.
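A minimal sketch of the parenthetical point about reserved words: because
<backslash><newline> is removed before the input is split into tokens, an
'if' split across two physical lines is still recognised as a reserved
word. This assumes a POSIX-conforming sh; the function name
demo_split_reserved_word is hypothetical and used only for illustration.

```sh
#!/bin/sh
# Hypothetical demo function, not part of any shell or test suite.
# The <backslash><newline> after "i" is removed before tokenisation, so
# the next line continues the same word and the shell sees the reserved
# word "if". The continuation line must start in column 0: any leading
# blank there would end up inside the joined input and split the word
# into "i" and "f".
demo_split_reserved_word() {
	i\
f true; then
		echo split-ok
	fi
}

demo_split_reserved_word   # prints: split-ok
```

Quoting any part of the word instead (e.g. 'i'f) would stop it being
recognised as a reserved word, which is the contrast the parenthetical
draws.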