zsh-workers
* indented heredocs
From: Dave Yost @ 2016-12-21 19:29 UTC
  To: zsh workers

Today we have this:

0 Wed 10:53:18 ~
204 Z% cat <<xx   
foo
bar
xx
foo
bar
0 Wed 10:53:33 ~
205 Z% 

Surely people have thought of this (Alternative 1):

0 Wed 10:53:53 ~
205 Z% cat <<xx   
  foo
  bar
  xx
foo
bar
0 Wed 10:53:53 ~
206 Z% 

but shells don’t do that.

I ran this idea by Steve Bourne and asked him why indenting was not allowed.

> I never considered the indent idea.  It's a good idea although I don't like the idea of post processing the temp file to remove the 
> leading white space.  I agree the way it is now is not easy to look at, and I can't think of a way to have the indent amount specified
> in advance of reading the document.


BTW, of historical interest, he also said he stole the heredoc idea from somebody else at Cambridge <https://www.youtube.com/watch?v=FI_bZhV7wpI#t=21m8s>.

I suggested this (Alternative 2), which he liked:

0 Wed 10:53:53 ~
206 Z% cat \
  <<xx   
  foo
  bar
  xx
foo
bar
0 Wed 10:54:10 ~
207 Z% 

He also suggested

> You could find another symbol after <  Right now <xxx is file <<yyy is heredoc <<< is string.  Other meta symbols
> are available which now would cause syntax error.


I don’t think that would help anything. If the parser doesn’t know how to do the new syntax with the existing << operator, you’ll get an error, and if the parser doesn’t know the new operator, you’ll get an error. Same difference.

I propose Alternative 2.


* Re: indented heredocs
From: Daniel Shahaf @ 2016-12-21 19:50 UTC
  To: Dave Yost; +Cc: zsh workers

Dave Yost wrote on Wed, Dec 21, 2016 at 11:29:17 -0800:
> Surely people have thought of this (Alternative 1):
> 

> 0 Wed 10:53:53 ~
> 205 Z% cat <<xx   
>   foo
>   bar
>   xx
> foo
> bar
> 0 Wed 10:53:53 ~
> 206 Z% 
> 
> but shells don’t do that.
> 

That's supported already:

       <<[-] word
              ⋮
              If <<- is used, then all leading tabs are stripped from word and
              from the document.

$ zsh -f
% cat <<-x
heredocd>       foo
heredocd>       bar
heredocd>       x
foo
bar
% 

Cheers,

Daniel


* Re: indented heredocs
From: Dave Yost @ 2016-12-21 20:38 UTC
  To: zsh workers

I should mention that <<- is defined to remove all leading tabs in the heredoc, so you get this:

0 Wed 12:34:44 yost DaveBook ~
215 Z% cat <<-xx
heredocd>               a
heredocd>       b
heredocd>       xx
a
b
0 Wed 12:35:35 yost DaveBook ~
216 Z% 

so if you intended the line containing “a” to have a leading tab, you will not get what you want. This is a misfeature IMO.
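
To make the tab handling explicit, here is a small sketch. It builds the script with printf so that the tabs are written as \t instead of depending on how this message is rendered. The function name demo_strip_tabs is made up for the illustration, and sh stands in for any POSIX-compatible shell.

```sh
# Feed a tiny script to a POSIX shell. The "a" line carries two tabs,
# the "b" line and the end marker carry one tab each.
demo_strip_tabs() {
  printf 'cat <<-xx\n\t\ta\n\tb\n\txx\n' | sh
}

demo_strip_tabs
```

Both lines come out flush left: `<<-` removes every leading tab, not only the amount in front of the end marker.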

Also: the examples in my original message don’t show the “heredocd>” prefix because I pasted the entire input into the terminal before hitting return.

Dave


* Re: indented heredocs
From: Bart Schaefer @ 2016-12-21 22:10 UTC
  To: Dave Yost; +Cc: zsh workers

On Wed, Dec 21, 2016 at 11:29 AM, Dave Yost <Dave@yost.com> wrote:
>
> Surely people have thought of this (Alternative 1):
>
> 0 Wed 10:53:53 ~
> 205 Z% cat <<xx
>   foo
>   bar
>   xx
> foo
> bar
> 0 Wed 10:53:53 ~
> 206 Z%
>
> but shells don’t do that.

[...]

> I suggested this (Alternative 2), which [Bourne] liked:
>
> 0 Wed 10:53:53 ~
> 206 Z% cat \
>   <<xx
>   foo
>   bar
>   xx
> foo
> bar
> 0 Wed 10:54:10 ~
> 207 Z%

I'm not thrilled with this idea because it gives special semantics to
backslash-newline (as well as to leading spaces before "<<") which do
not currently exist.  In existing syntax, backslash-newline can simply
be discarded without changing the meaning of the command line, I think
even before tokenization.

I would propose instead something similar (read on below) to this:

% cat <<-'  xx'
  foo
  bar
  xx
foo
bar
%

This explicitly quotes the leading space that is to be stripped, so
there is no parsing ambiguity, and it piggybacks on the existing <<-
syntax, merely changing the expected leading space from "all tabs" to
"the leading whitespace on the end marker".

> I don’t think that would help anything. If the parser doesn’t know how to do
> the new syntax with the existing << operator, you’ll get an error, and if the
> parser doesn’t know the new operator, you’ll get an error. Same difference.

It is a consideration that we might prefer that older shells choke on
the new syntax.  I think having them choke by failing to find the end
marker is rather worse than having them choke by failing to recognize
the operator -- something that wrongly appears to be the end marker
might appear later in the script if we go your "Alternative 2" route.

Taken literally, my example above would be accepted by an older shell
and processed without stripping the leading spaces.  If that's
unacceptable, we need a different (and currently invalid) replacement
for "<<-" (the only thing that comes to mind is "<<|" which seems a
bad choice).


* Re: indented heredocs
From: Bart Schaefer @ 2016-12-21 23:04 UTC
  To: zsh workers

On Wed, Dec 21, 2016 at 2:10 PM, Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> % cat <<-'  xx'

Chet Ramey reminds me (palm to forehead) that this already means to
quote the here-document content.  So, we'd need some other way of
quoting the leading whitespace.
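
For readers unfamiliar with that rule, a minimal sketch of what a quoted end marker means in shells today. The variable and function names are illustrative only.

```sh
name=world

# Unquoted end marker: parameter expansion happens inside the document.
expanded() {
cat <<EOF
hello $name
EOF
}

# Quoted end marker containing two spaces: the marker itself is "  xx",
# the content is taken literally, and <<- strips only tabs, so the two
# leading spaces survive.
literal() {
cat <<-'  xx'
  hello $name
  xx
}

expanded
literal
```

The second form is therefore already valid syntax: it yields the text unexpanded and with its leading spaces intact, which is why the quotes cannot be reused to declare an indent.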


* Re: indented heredocs
From: Philippe Troin @ 2016-12-22 19:04 UTC
  To: zsh workers

On Wed, 2016-12-21 at 15:04 -0800, Bart Schaefer wrote:
> On Wed, Dec 21, 2016 at 2:10 PM, Bart Schaefer
> <schaefer@brasslantern.com> wrote:
> >
> > % cat <<-'  xx'
> 
> Chet Ramey reminds me (palm to forehead) that this already means to
> quote the here-document content.  So, we'd need some other way of
> quoting the leading whitespace.

Isn't the whole thing overkill?
We already can handle the simple cases:
  cat <<EOD  will use the document as is
  cat <<-EOD will strip leading tabs

If you want anything fancier, you can always use sed:
  sed -e 's!^  !!' <<EOD
or even zsh itself:
  while IFS= read -r line; do print -r -- "${line#  }"; done <<EOD
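
Spelled out as complete commands, the two workarounds might look like the sketch below. The function names are hypothetical, and printf is used in place of zsh's print so that the same text also runs in other POSIX shells. Note that the end marker still has to sit in column 0, and that the loop needs `IFS=` so that read does not trim the leading whitespace itself before the two-space prefix is removed.

```sh
# Variant 1: sed removes exactly two leading spaces from each line.
strip_with_sed() {
  sed -e 's!^  !!' <<EOD
  foo
    bar
EOD
}

# Variant 2: pure-shell loop. IFS= keeps read from trimming leading
# whitespace; ${line#  } then drops one two-space prefix if present.
strip_with_loop() {
  while IFS= read -r line; do
    printf '%s\n' "${line#  }"
  done <<EOD
  foo
    bar
EOD
}

strip_with_sed
strip_with_loop
```

In both variants the extra two spaces in front of "bar" are kept, which is the relative indentation that `<<-` cannot preserve when tabs are used.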

Phil.



* Re: indented heredocs
From: Bart Schaefer @ 2016-12-22 21:10 UTC
  To: zsh workers

On Dec 22, 11:04am, Philippe Troin wrote:
}
} Isn't the whole thing overkill?

Maybe.

} If you want anything fancier, you can always use sed [...]
} or even zsh itself

Indeed, I'm sure this is more of an aesthetic consideration than
anything else -- neatly indented code and all that.  Having to throw
in an additional process or loop to achieve better readability in
the here-document adds ugliness and is somewhat obfuscatory.


* Re: indented heredocs
From: Vincent Lefevre @ 2016-12-29  9:56 UTC
  To: zsh-workers

On 2016-12-22 11:04:49 -0800, Philippe Troin wrote:
> We already can handle the simple cases:
>   cat <<EOD  will use the document as is
>   cat <<-EOD will strip leading tabs

Not everyone uses leading tabs. I think that getting the indent
prefix (as a sequence of whitespace) from the first line of the
heredoc would have been a better idea.
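
As an illustration only (no shell implements this natively), the idea can be approximated today with a filter. The sketch below assumes awk is available; the name dedent_by_first_line is invented for the example. It takes the run of spaces and tabs at the start of the first line as the prefix and removes that prefix from every line that starts with it.

```sh
dedent_by_first_line() {
  awk '
    NR == 1 {
      # Everything before the first non-whitespace character is the prefix.
      rest = $0
      sub(/^[ \t]+/, "", rest)
      prefix = substr($0, 1, length($0) - length(rest))
    }
    {
      if (prefix != "" && index($0, prefix) == 1)
        print substr($0, length(prefix) + 1)
      else
        print
    }
  '
}

dedent_by_first_line <<EOD
    foo
      bar
    baz
EOD
```

Here "foo" and "baz" lose their four spaces and "bar" keeps two, so relative indentation is preserved whether the prefix is made of spaces or tabs.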

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


* Re: indented heredocs
From: Nikolay Aleksandrovich Pavlov (ZyX) @ 2016-12-29 22:31 UTC
  To: Bart Schaefer, Dave Yost; +Cc: zsh workers



22.12.2016, 01:11, "Bart Schaefer" <schaefer@brasslantern.com>:
> On Wed, Dec 21, 2016 at 11:29 AM, Dave Yost <Dave@yost.com> wrote:
>>  Surely people have thought of this (Alternative 1):
>>
>>  0 Wed 10:53:53 ~
>>  205 Z% cat <<xx
>>    foo
>>    bar
>>    xx
>>  foo
>>  bar
>>  0 Wed 10:53:53 ~
>>  206 Z%
>>
>>  but shells don’t do that.
>
> [...]
>
>>  I suggested this (Alternative 2), which [Bourne] liked:
>>
>>  0 Wed 10:53:53 ~
>>  206 Z% cat \
>>    <<xx
>>    foo
>>    bar
>>    xx
>>  foo
>>  bar
>>  0 Wed 10:54:10 ~
>>  207 Z%
>
> I'm not thrilled with this idea because it gives special semantics to
> backslash-newline (as well as to leading spaces before "<<") which do
> not currently exist. In existing syntax, backslash-newline can simply
> be discarded without changing the meaning of the command line, I think
> even before tokenization.
>
> I would propose instead something similar (read on below) to this:
>
> % cat <<-' xx'
>   foo
>   bar
>   xx
> foo
> bar
> %
>
> This explicitly quotes the leading space that is to be stripped, so
> there is no parsing ambiguity, and it piggybacks on the existing <<-
> syntax, merely changing the expected leading space from "all tabs" to
> "the leading whitespace on the end marker".

This makes changing the indent rather tricky. YAML does better here: the amount of stripped indent is either determined by the first non-blank line (e.g.

```
 cat <<| EOF

   xx
    x
 EOF
```

will produce

```

xx
 x
```

because `xx` is the first non-blank line, with 3 leading spaces, while `x` has four, so the result is "\nxx\n\x20x") or is specified explicitly, relative to the indent of the line where the block scalar starts (e.g.

```
 cat <<|1 EOF

   xx
    x
 EOF
```

will produce

```

 xx
  x
```

because `cat` has a single space as indent, `xx` has 3, and it was requested that meaningful content start after 1 (`cat` indent) + 1 (the `1` before EOF) = 2 spaces, so the result is "\n\x20xx\n\x20\x20x", which has one more space of indent than in the previous example).

>
>>  I don’t think that would help anything. If the parser doesn’t know how to do
>>  the new syntax with the existing << operator, you’ll get an error, and if the
>>  parser doesn’t know the new operator, you’ll get an error. Same difference.
>
> It is a consideration that we might prefer that older shells choke on
> the new syntax. I think having them choke by failing to find the end
> marker is rather worse than having them choke by failing to recognize
> the operator -- something that wrongly appears to be the end marker
> might appear later in the script if we go your "Alternative 2" route.
>
> Taken literally, my example above would be accepted by an older shell
> and processed without stripping the leading spaces. If that's
> unacceptable, we need a different (and currently invalid) replacement
> for "<<-" (the only thing that comes to mind is "<<|" which seems a
> bad choice).

YAML uses `|` and `>` to start block scalars, which is why I used `|` above (`<<>` seems odd and may be confused with `<>`). I am not sure why this should be a bad choice: `|` already has different meanings in different contexts, though only three so far: pipe, or, and array subtraction (`${:|}`). The `-` used in `<<-` has many more meanings: negation/subtraction, stripping leading tabs, prepending `-` to `argv[0]` (i.e. running as a login shell in most cases), stdin, rest arguments separator (`echo - -E` outputs just `-E`, though I am not sure whether that is intentional; `--` in many commands definitely is), close (in `>& -`), range, default (in `${:-}`), dereference (in `*(-/)`), flags leader (in almost any command and also in `$-`).

---

The `sed`-based alternative is not good, for the same reason I would reject any explicitly added spaces. If we bother with this at all, it should satisfy the following requirements:

- Keep extra indent (or `<<-` would be mostly fine, though better something which also removes spaces).
- Allow easy reindenting with simple editor command that reindents (like `<{motion}` and `>{motion}` in Vim) without any additional actions (or `sed` would be mostly fine).
- Allow indenting the end marker as the user likes (or, at least, at the initial indent: one space in the examples). Basically I would treat `cat <<| EOF` as something like `{` or `do`, and `EOF` as `}` or `done`: semantically they are a literal block header and a literal block terminator, so `EOF` should have the same indent as `cat` and be *less* indented than the text, of which it is not a part.


* Re: indented heredocs
From: Bart Schaefer @ 2016-12-30  2:56 UTC
  To: zsh workers

On Dec 30,  1:31am, Nikolay Aleksandrovich Pavlov (ZyX) wrote:
}
} 
} 22.12.2016, 01:11, "Bart Schaefer" <schaefer@brasslantern.com>:
} > I would propose instead something similar (read on below) to this:
} >
} > % cat <<-' xx'

As has already been pointed out, this can't be used exactly as-is,
because quotes around the end marker already have semantics.

} This makes changing the indent rather tricky.

Well ... it means you have to both change the indent and declare that
you've changed it.  I wouldn't call that "tricky".

} YAML does better here: amount of stripped indent is either determined
} based on the first non-blank line [...]

This is at least feasible.  (Does "non-blank" mean "contains a character
that is not whitespace"?  What's whitespace?)

Would we want to strip leading space and tab, or e.g. leading $IFS (with
the probable exclusion of the set $'\f\n\r\v' in that case)?

} or is specified explicitly, relative to the indent of the line where
} block scalar starts

Now that latter I *would* call "tricky" -- a numeric count relative
to some other indent?  What if some of the leading whitespace is tabs?
Also if I read the rest of your explanation correctly, this would make
significant the leading whitespace before the command whose input is
being redirected, which is a non-starter.

} YAML uses `|` and `>` to start block scalars, that's why I used
} `|` above (`<<>` seems odd and may be confused with `<>`). Not
} sure why this should be a bad choice: `|` already has different
} meanings in different contexts

It seems a bad choice to me because of >| and >>| which have a very
different meaning.  If we were going to use either <| or <<| for some
special purpose, it feels as if there should be symmetry implied, as
with e.g. <& and >&.

Of course << and >> have already given up that sort of symmetry except
for one being input and one being output, so ...

This reminds me that both <<; and <<& also are currently bad syntax;
though "<<;" is probably an even worse choice than "<<|".  There is
at least precedent for combining one of "|" or "&" with redirection.


* Re: indented heredocs
From: Nikolay Aleksandrovich Pavlov (ZyX) @ 2016-12-31 18:11 UTC
  To: Bart Schaefer, zsh workers



30.12.2016, 05:57, "Bart Schaefer" <schaefer@brasslantern.com>:
> On Dec 30, 1:31am, Nikolay Aleksandrovich Pavlov (ZyX) wrote:
> }
> }
> } 22.12.2016, 01:11, "Bart Schaefer" <schaefer@brasslantern.com>:
> } > I would propose instead something similar (read on below) to this:
> } >
> } > % cat <<-' xx'
>
> As has already been pointed out, this can't be used exactly as-is,
> because quotes around the end marker already have semantics.
>
> } This makes changing the indent rather tricky.
>
> Well ... it means you have to both change the indent and declare that
> you've changed it. I wouldn't call that "tricky".

“Tricky” here means only “I can’t just use `V)>` in Vim”. Or “when doing refactoring it would be easy to miss necessary changes”.

>
> } YAML does better here: amount of stripped indent is either determined
> } based on the first non-blank line [...]
>
> This is at least feasible. (Does "non-blank" mean "contains a character
> that is not whitespace"? What's whitespace?)

I would suggest that whitespace mean “space or tab”, or more generally “anything that may be used for indentation or for separating command arguments” (I would not be surprised to hear that I missed that zsh allows some fancy Unicode characters as whitespace for indentation or for separating command arguments in Unicode locales).

>
> Would we want to strip leading space and tab, or e.g. leading $IFS (with
> the probable exclusion of the set $'\f\n\r\v' in that case)?

I would expect it to strip leading spaces and tabs (and to error out if a non-blank line lacks the required indent). Involving IFS is not needed: after all, it is currently not consulted for indentation or for separating command arguments in source code, only in some expansions and for `read`.

Example of an error: "cat <<| EOF\n\tabc\n\n    def\nEOF" (the indent of the third heredoc line matches neither the indent preceding the EOF marker nor the indent of the first non-blank line, which is \t).
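
A rough sketch of that rule as a filter, offered as an approximation rather than as the proposed implementation. It assumes awk is available, the name dedent_strict is invented, and the indent of the end marker is ignored since a filter never sees it. Blank lines pass through as empty lines, the first non-blank line fixes the prefix, and any later non-blank line that lacks the prefix is an error. Lines before the offending one have already been printed by then; a real implementation would presumably reject the whole document.

```sh
dedent_strict() {
  awk '
    # Lines holding only spaces and tabs are kept as empty lines.
    /^[ \t]*$/ { print ""; next }
    # The first non-blank line determines the prefix to strip.
    !seen {
      rest = $0
      sub(/^[ \t]+/, "", rest)
      prefix = substr($0, 1, length($0) - length(rest))
      seen = 1
    }
    # Any later non-blank line must start with exactly that prefix.
    substr($0, 1, length(prefix)) != prefix {
      print "line " NR ": indent does not match the first non-blank line" | "cat 1>&2"
      exit 1
    }
    { print substr($0, length(prefix) + 1) }
  '
}

dedent_strict <<EOD

   xx
    x
EOD
```

Feeding it a document whose first line is indented with a tab and whose third line is indented with four spaces makes the filter print a diagnostic on stderr and return a non-zero status.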

>
> } or is specified explicitly, relative to the indent of the line where
> } block scalar starts
>
> Now that latter I *would* call "tricky" -- a numeric count relative
> to some other indent? What if some of the leading whitespace is tabs?
> Also if I read the rest of your explanation correctly, this would make
> significant the leading whitespace before the command whose input is
> being redirected, which is a non-starter.

If this is implemented, then the heredoc's own indent should be space-only, while the “some other” indent is copied as-is. But I agree that this would be tricky and that it is not much needed. Code with such an explicit indent is also less readable: it is harder to determine what exactly the heredoc will produce if more than one line has extra indent (compared to the initial indent plus the requested number). I have not actually seen YAML documents with such block scalars.

>
> } YAML uses `|` and `>` to start block scalars, that's why I used
> } `|` above (`<<>` seems odd and may be confused with `<>`). Not
> } sure why this should be a bad choice: `|` already has different
> } meanings in different contexts
>
> It seems a bad choice to me because of >| and >>| which have a very
> different meaning. If we were going to use either <| or <<| for some
> special purpose, it feels as if there should be symmetry implied, as
> with e.g. <& and >&.
>
> Of course << and >> have already given up that sort of symmetry except
> for one being input and one being output, so ...
>
> This reminds me that both <<; and <<& also are currently bad syntax;
> though "<<;" is probably an even worse choice than "<<|". There is
> at least precedent for combining one of "|" or "&" with redirection.


Thread overview: 11+ messages
2016-12-21 19:29 indented heredocs Dave Yost
2016-12-21 19:50 ` Daniel Shahaf
2016-12-21 20:38 ` Dave Yost
2016-12-21 22:10 ` Bart Schaefer
2016-12-21 23:04   ` Bart Schaefer
2016-12-22 19:04     ` Philippe Troin
2016-12-22 21:10       ` Bart Schaefer
2016-12-29  9:56       ` Vincent Lefevre
2016-12-29 22:31   ` Nikolay Aleksandrovich Pavlov (ZyX)
2016-12-30  2:56     ` Bart Schaefer
2016-12-31 18:11       ` Nikolay Aleksandrovich Pavlov (ZyX)