zsh-workers
* [PATCH?] Nofork and removing newlines
@ 2024-03-05  5:52 Bart Schaefer
  2024-03-05  6:56 ` Stephane Chazelas
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Schaefer @ 2024-03-05  5:52 UTC (permalink / raw)
  To: Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 410 bytes --]

On Tue, Feb 27, 2024 at 12:53 PM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> The intent was to have ${ ... } act more like parameter substitution.
> It might be possible/reasonable to have ${ ... } strip newlines and
> "${ ... }" keep them, if that feels better.

The attached patch implements this, for purposes of discussion.  The
doc updates are much larger than the actual code change.
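
As a rough model of the proposed rule (not the actual implementation,
which is the C change in Src/subst.c further down), the decision can be
sketched in portable sh.  The function name expand_nofork, its arguments
and the result variable are all made up for illustration:

```shell
# Toy model of the proposed rule; every name here is hypothetical.
nl='
'

# expand_nofork OUTPUT QUOTED EMULATION
#   QUOTED is "quoted" or "unquoted"; EMULATION is "zsh" or anything else.
#   The value is left in $result rather than printed, so that trailing
#   newlines survive for inspection.
expand_nofork() {
  result=$1
  # Mirrors the patched condition: keep newlines only in native zsh
  # mode AND inside double quotes; otherwise strip all trailing ones.
  if [ "$3" != zsh ] || [ "$2" != quoted ]; then
    while :; do
      case $result in
        *"$nl") result=${result%"$nl"} ;;
        *) break ;;
      esac
    done
  fi
}

expand_nofork "STDOUT$nl$nl" unquoted zsh
printf 'unquoted: %s chars\n' "${#result}"
expand_nofork "STDOUT$nl$nl" quoted zsh
printf 'quoted:   %s chars\n' "${#result}"
```

The sketch only models the newline handling; it says nothing about word
splitting or about how the real lexer finds the substitution.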

[-- Attachment #2: nofork-nonewlines.txt --]
[-- Type: text/plain, Size: 3711 bytes --]

diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 183ca6e03..b77942697 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -1950,6 +1950,9 @@ the braces by whitespace, like `tt(${ )...tt( })', is replaced by its
 standard output.  Like `tt(${|)...tt(})' and unlike
 `tt($LPAR())...tt(RPAR())', the command executes in the current shell
 context with function local behaviors and does not create a subshell.
+Word splitting does not apply unless tt(SH_WORD_SPLIT) is set, but
+trailing newlines em(are) stripped unless the substitution is enclosed
+in double quotes.
 
 Note that because the `tt(${|)...tt(})' and `tt(${ )...tt( })' forms
 must be parsed at once as both string tokens and commands, all other
diff --git a/Etc/FAQ.yo b/Etc/FAQ.yo
index 4a86050e6..0515d2fca 100644
--- a/Etc/FAQ.yo
+++ b/Etc/FAQ.yo
@@ -1092,10 +1092,11 @@ sect(Comparisons of forking and non-forking command substitution)
   affects the caller.
 
   mytt($(command)) removes trailing newlines from the output of mytt(command)
-  when substituting, whereas mytt(${ command }) and its variants do not.
-  The latter is consistent with mytt(${|...}) from mksh but differs from
-  bash and ksh, so in emulation modes, newlines are stripped from command
-  output (not from tt(REPLY) assignments).
+  when substituting, as does mytt(${ command }) when not quoted.  Placing
+  double quotes around mytt("${ command }"), or using either mytt(${|...})
+  format, retains newlines.  The latter is consistent with mytt(${|...})
+  from mksh, but mytt("${ command }") differs from bash and ksh, so in
+  emulation modes, newlines stripped even from quoted command output.
 
   When not enclosed in double quotes, the expansion of mytt($(command)) is
   split on tt(IFS) into an array of words.  In contrast, and unlike both
diff --git a/Src/subst.c b/Src/subst.c
index 49f7336bb..785137357 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -2005,7 +2005,7 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 		int onoerrs = noerrs, rplylen;
 		noerrs = 2;
 		rplylen = zstuff(&cmdarg, rplytmp);
-		if (! EMULATION(EMULATE_ZSH)) {
+		if (! EMULATION(EMULATE_ZSH) || !qt) {
 		    /* bash and ksh strip trailing newlines here */
 		    while (rplylen > 0 && cmdarg[rplylen-1] == '\n')
 			rplylen--;
diff --git a/Test/D10nofork.ztst b/Test/D10nofork.ztst
index d6a5588df..1c6a30cb0 100644
--- a/Test/D10nofork.ztst
+++ b/Test/D10nofork.ztst
@@ -159,7 +159,7 @@ F:Why not use this error in the previous case as well?
 1:unbalanced braces, part 4+
 ?(eval):1: closing brace expected
 
-  purr ${ purr STDOUT }
+  purr "${ purr STDOUT }"
 0:capture stdout
 >STDOUT
 >
@@ -322,7 +322,7 @@ F:Fiddly here to get EOF past the test syntax
 0:here-string behavior
 >in a here string
 
-  <<<${ purr $'stdout as a here string' }
+  <<<"${ purr $'stdout as a here string' }"
 0:another capture stdout
 >stdout as a here string
 >
@@ -331,7 +331,7 @@ F:Fiddly here to get EOF past the test syntax
   wrap=${ purr "capture in environment assignment" } typeset -p wrap
 0:assignment context
 >typeset -g wrap='REPLY in environment assignment'
->typeset -g wrap=$'capture in environment assignment\n'
+>typeset -g wrap='capture in environment assignment'
 
 # Repeat return and exit tests with stdout capture
 
@@ -410,7 +410,7 @@ F:must do this before evaluating the next test block
 0:ignored braces, part 1
 >buried}
 
-  purr ${ purr ${REPLY:-buried}}}
+  purr "${ purr ${REPLY:-buried}}}"
 0:ignored braces, part 2
 >buried
 >}
@@ -418,7 +418,6 @@ F:must do this before evaluating the next test block
   purr ${ { echo nested ;} }
 0:ignored braces, part 3
 >nested
->
 
   purr ${ { echo nested } } DONE
 1:ignored braces, part 4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH?] Nofork and removing newlines
  2024-03-05  5:52 [PATCH?] Nofork and removing newlines Bart Schaefer
@ 2024-03-05  6:56 ` Stephane Chazelas
  2024-03-05 22:48   ` Bart Schaefer
  0 siblings, 1 reply; 29+ messages in thread
From: Stephane Chazelas @ 2024-03-05  6:56 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

2024-03-04 21:52:02 -0800, Bart Schaefer:
[...]
>    mytt($(command)) removes trailing newlines from the output of mytt(command)
> -  when substituting, whereas mytt(${ command }) and its variants do not.
> -  The latter is consistent with mytt(${|...}) from mksh but differs from
> -  bash and ksh, so in emulation modes, newlines are stripped from command
> -  output (not from tt(REPLY) assignments).
> +  when substituting, as does mytt(${ command }) when not quoted.  Placing
> +  double quotes around mytt("${ command }"), or using either mytt(${|...})
> +  format, retains newlines.  The latter is consistent with mytt(${|...})
> +  from mksh, but mytt("${ command }") differs from bash and ksh, so in
> +  emulation modes, newlines stripped even from quoted command output.
                              ^^^ typo missing "are".

To me ${ cmd; } being the non-forking version of $(...) should
behave like $(...) in that regard.

IMO, it's a bug in Bourne-like shells (and some others) that
$(...) removes *all* trailing newline characters, but removing
*one* is usually desired.

As in:

basename=$(basename -- "$file")

should remove the newline added by basename, but not the newline
characters that are found at the end of $file.
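
The effect can be reproduced with plain $(...) in any POSIX-like shell.
This is only a sketch; the path dir/name is an invented example and the
variable names are arbitrary:

```shell
# Portable sh; a literal newline is used instead of $'\n' so that
# this also runs in shells without $'...' quoting.
nl='
'
file="dir/name$nl"            # the file name itself ends in a newline

base=$(basename -- "$file")   # basename prints the name plus its own newline

# $(...) removed both newlines, so $base no longer matches the
# last component of $file.
printf 'length of $base: %s\n' "${#base}"
```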

In any case, I agree ${|cmd} should expand to the value of
$REPLY as-is and trimming newlines there would not make sense.

IIRC I already mentioned it here but maybe having a:

ZSH_CMDSUBST_TRIM=<extendedglobpattern> (defaulting to $'\n##'
for backward compatibility) could address the general issue with
cmdsubst trimming too many newlines (for both $(...) and ${...;}).

One would change it to
ZSH_CMDSUBST_TRIM=$'\n' to get a saner default, or
ZSH_CMDSUBST_TRIM= to not remove anything or
ZSH_CMDSUBST_TRIM=$'(\r|)\n' or
ZSH_CMDSUBST_TRIM='[[:space:]]##' to handle MSDOS line
delimiters or remove any whitespace.

>  
>    When not enclosed in double quotes, the expansion of mytt($(command)) is
>    split on tt(IFS) into an array of words.

unless called in non-list contexts such as in scalar variable
assignment or [[ $var ]] or case $var in...

See also:

$ ./Src/zsh -c 'a=( "${(s[:])${ getconf PATH }}" ); typeset -p a'
typeset -a a=( /bin $'/usr/bin\n' )

-- 
Stephane



* Re: [PATCH?] Nofork and removing newlines
  2024-03-05  6:56 ` Stephane Chazelas
@ 2024-03-05 22:48   ` Bart Schaefer
  2024-03-06 17:57     ` Stephane Chazelas
  2024-03-06 19:43     ` Stephane Chazelas
  0 siblings, 2 replies; 29+ messages in thread
From: Bart Schaefer @ 2024-03-05 22:48 UTC (permalink / raw)
  To: Zsh hackers list

On Mon, Mar 4, 2024 at 10:56 PM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> To me ${ cmd; } being the non-forking version of $(...) should
> behave like $(...) in that regard.

That's the starting point of this discussion, yes.

> IMO, it's a bug in Bourne-like shells (and some others) that
> $(...) removes *all* trailing newline characters, but removing
> *one* is usually desired.

Ignoring the many-vs.-one issue, the pivotal word here is "usually".
We can't change the behavior of $(...) but parameter expansions
already behave differently with respect to SH_WORD_SPLIT so we have
precedent for leeway on ${ ... }.  The suggested change would provide
$(...)-like behavior for the usual case and a simple way to keep the
newline(s) in the less-usual cases.

> IIRC I already mentioned it here but maybe having a:
>
> ZSH_CMDSUBST_TRIM=<extendedglobpattern>

This is both IMO way too complicated and also misses the point that
newline trimming or not ought to be easily switchable in the context
of a single expansion, not globally.

So when I started the thread about ${ ... } the consensus was that it
would be OK to always keep the newlines and if you don't want them in
a particular case, you can write
${${ command }%$'\n'}.

Since then it's been pointed out that a lot of uses of $(...) that
would be replace-able with ${ ... } will break if the newlines are not
stripped, and it's a bit of a pain to have to remember that nesting
all the time.  So the proposal made here has two goals:
1) Make it easy to replace many uses of $(...)
2) Make it easy to choose case-by-case whether to keep newlines
Thus
  ${ ... } strips newlines like $(...) for #1
  "${ ... }" keeps them for handling #2
and if you want full SH_WORD_SPLIT behavior you can still write
  ${=${ ... }}
which is shorter and easier than the %$'\n' thing and strips newlines too.
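
For comparison, the %$'\n' idiom removes exactly one trailing newline.
A minimal sketch on a plain variable, so that it does not depend on a
zsh build with ${ ... } support (the variable names are invented, and a
literal newline stands in for $'\n' to stay portable):

```shell
nl='
'
output="line$nl$nl"          # stand-in for command output ending in two newlines

# Same suffix removal as the outer ${...%$'\n'}: one newline goes, one stays.
one_trimmed=${output%"$nl"}
printf '%s chars before, %s after\n' "${#output}" "${#one_trimmed}"
```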

My strong inclination is to either go with this patch or leave it as
is.  The code change to implement this patch is literally two tokens.

Thanks for the doc proofread.



* Re: [PATCH?] Nofork and removing newlines
  2024-03-05 22:48   ` Bart Schaefer
@ 2024-03-06 17:57     ` Stephane Chazelas
  2024-03-06 19:45       ` Bart Schaefer
  2024-03-06 19:43     ` Stephane Chazelas
  1 sibling, 1 reply; 29+ messages in thread
From: Stephane Chazelas @ 2024-03-06 17:57 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

2024-03-05 14:48:00 -0800, Bart Schaefer:
> On Mon, Mar 4, 2024 at 10:56 PM Stephane Chazelas <stephane@chazelas.org> wrote:
> >
> > To me ${ cmd; } being the non-forking version of $(...) should
> > behave like $(...) in that regard.
> 
> That's the starting point of this discussion, yes.
> 
> > IMO, it's a bug in Bourne-like shells (and some others) that
> > $(...) removes *all* trailing newline characters, but removing
> > *one* is usually desired.
> 
> Ignoring the many-vs.-one issue, the pivotal word here is "usually".
> We can't change the behavior of $(...) but parameter expansions
> already behave differently with respect to SH_WORD_SPLIT so we have
> precedent for leeway on ${ ... }.  The suggested change would provide
> $(...)-like behavior for the usual case and a simple way to keep the
> newline(s) in the less-usual cases.

Sorry, I hadn't realised ${ cmd } also didn't do IFS-splitting,
so it is indeed departing a lot from command substitution and
assuming we don't care about keeping compatibility with
ksh93/mksh/bash, I agree the proposed behaviour makes sense and
it's useful to have a command substitution that doesn't trim all
newlines, so as you say I can do for my previous example:

basename="${${ basename -- "$file" }%$'\n'}"

To properly get the basename of $file with basename. (yes, I
know it's a bad example as we can also do basename=$file:t).

> > IIRC I already mentioned it here but maybe having a:
> >
> > ZSH_CMDSUBST_TRIM=<extendedglobpattern>
> 
> This is both IMO way too complicated and also misses the point that
> newline trimming or not ought to be easily switchable in the context
> of a single expansion, not globally.

The idea would be to allow users to fix command substitution
once and for all with ZSH_CMDSUBST_TRIM=$'\n'.

So things like:

basename=$(basename -- "$file")

become correct regardless of the value of $file without having
to resort to ugly workarounds.

set -o fixcmdsubstrnewlinetrimming

would work as well but be less versatile.

(I agree that in any case that's rather tangential to the
question of what to do with ${ ... })

> My strong inclination is to either go with this patch or leave it as
> is.  The code change to implement this patch is literally two tokens.

Either way or always removing all newlines or always removing one
newline or removing one newline when not quoted are fine with me.

-- 
Stephane



* Re: [PATCH?] Nofork and removing newlines
  2024-03-05 22:48   ` Bart Schaefer
  2024-03-06 17:57     ` Stephane Chazelas
@ 2024-03-06 19:43     ` Stephane Chazelas
  1 sibling, 0 replies; 29+ messages in thread
From: Stephane Chazelas @ 2024-03-06 19:43 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

2024-03-05 14:48:00 -0800, Bart Schaefer:
[...]
> The suggested change would provide
> $(...)-like behavior for the usual case and a simple way to keep the
> newline(s) in the less-usual cases.
[...]

For reference, some other shells that can keep trailing newlines
in command substitution:

rc:

  whole_output = ``(){cmd and its args}

fish:

  set whole_output (cmd and its args | string collect -aN)

  (-aN short for --allow-empty --no-trim-newlines).

  fish's command substitution ( (...) and also $(...) including
  inside "..." in recent versions) doesn't fork so is closer to
  ksh93's ${ cmd; } than ksh86's $(...)

POSIX/Korn-like shells:

  I'm sure everyone will have their own variant, but

  get_whole_output() {
    eval "
      $1"'=$(shift; "$@"; ret=$?; echo .; exit "$ret")
      set -- "$1" "$?"
      '"$1"'=${'"$1"'%.}
      return "$2"'
  }
  get_whole_output whole_output cmd and its args

  (bearing in mind that there aren't many characters besides .
  that you can use safely there, as it's important that its encoding
  can't be found in the encoding of other characters).
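
  A quick check of the helper above, as a sketch: the function is
  repeated so the block stands alone, and the sh -c command and the rc
  variable are just illustrative choices.

```shell
# Helper repeated from the message above so the example is self-contained.
get_whole_output() {
  eval "
    $1"'=$(shift; "$@"; ret=$?; echo .; exit "$ret")
    set -- "$1" "$?"
    '"$1"'=${'"$1"'%.}
    return "$2"'
}

nl='
'
# The command prints "a" followed by two newlines and exits with status 3.
get_whole_output whole_output sh -c 'printf "a\n\n"; exit 3'
rc=$?

# Both the trailing newlines and the exit status should be preserved.
printf 'chars captured: %s, exit status: %s\n' "${#whole_output}" "$rc"
```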

-- 
Stephane



* Re: [PATCH?] Nofork and removing newlines
  2024-03-06 17:57     ` Stephane Chazelas
@ 2024-03-06 19:45       ` Bart Schaefer
  2024-03-06 22:22         ` Mikael Magnusson
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Schaefer @ 2024-03-06 19:45 UTC (permalink / raw)
  To: Zsh hackers list

On Wed, Mar 6, 2024 at 9:57 AM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> Sorry, I hadn't realised ${ cmd } also didn't do IFS-splitting,
> so it is indeed departing a lot from command substitution and
> assuming we don't care about keeping compatibility with
> ksh93/mksh/bash, I agree the proposed behaviour makes sense

If SH_WORD_SPLIT is in fact set (as when emulating) then it is
applied, so that's the other-shell-compatibility path.

> Either way or always removing all newlines or always removing one
> newline or removing one newline when not quoted are fine with me.

Thanks.  Anyone else waiting to weigh in?



* Re: [PATCH?] Nofork and removing newlines
  2024-03-06 19:45       ` Bart Schaefer
@ 2024-03-06 22:22         ` Mikael Magnusson
  2024-03-06 22:42           ` Bart Schaefer
                             ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Mikael Magnusson @ 2024-03-06 22:22 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

On 3/6/24, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Wed, Mar 6, 2024 at 9:57 AM Stephane Chazelas <stephane@chazelas.org>
> wrote:
>>
>> Sorry, I hadn't realised ${ cmd } also didn't do IFS-splitting,
>> so it is indeed departing a lot from command substitution and
>> assuming we don't care about keeping compatibility with
>> ksh93/mksh/bash, I agree the proposed behaviour makes sense
>
> If SH_WORD_SPLIT is in fact set (as when emulating) then it is
> applied, so that's the other-shell-compatibility path.
>
>> Either way or always removing all newlines or always removing one
>> newline or removing one newline when not quoted are fine with me.
>
> Thanks.  Anyone else waiting to weigh in?

These are just some observations with no real conclusion probably.

1) $(foo) will optimize away an extra fork if foo is an external command
2) ${ foo } will fork the same amount of times as 1) if foo is
external and not at all if foo is a function.
If you write a function that prints stuff, it is presumably pretty
easy to just make it not print the extra newlines in the first place.
If foo calls some external command that prints a newline then I
suppose 1) and 2) are not super relevant arguments.
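
Apart from the fork count, the practical difference for a function is
whether its side effects reach the caller.  A small sketch of the
$(...) side of that, in portable sh (the function bump and the variable
counter are invented for the example):

```shell
counter=0
bump() {
  counter=$((counter + 1))   # side effect on a shell variable
  echo "bumped"
}

out=$(bump)                  # runs in a subshell: the increment is lost
printf 'after $(bump): counter=%s out=%s\n' "$counter" "$out"

bump >/dev/null              # runs in the current shell: the increment sticks
printf 'after direct call: counter=%s\n' "$counter"
```

Going by the documentation text in the patch, ${ bump } runs in the
current shell, so in this respect it should behave like the direct call
while still capturing the output.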

"${ foo}" and ${ foo} having the same wordsplitting behavior but only
differing in stripping newlines feels a bit magical and weird. I would
feel surprised if it did wordsplitting without shwordsplit since it is
an extension of the ${} syntax which doesn't do it.

We could in theory add some new () flag, T for trim is free eg,
${(T)${ foo}} is somewhat more ergonomic than ${${ foo}%$'\n'}

Is there some strong reason we could not allow ${(T) foo} btw? The
space is syntactically kind of similar to other stuff that does work
like ${(f)^param} and would save the extra ${}, but I didn't take a
look at the code yet.

-- 
Mikael Magnusson



* Re: [PATCH?] Nofork and removing newlines
  2024-03-06 22:22         ` Mikael Magnusson
@ 2024-03-06 22:42           ` Bart Schaefer
  2024-03-07  4:53           ` Bart Schaefer
  2024-03-07  6:52           ` Lawrence Velázquez
  2 siblings, 0 replies; 29+ messages in thread
From: Bart Schaefer @ 2024-03-06 22:42 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: Zsh hackers list

Have to go to an appointment so just one quick thing now:

On Wed, Mar 6, 2024 at 2:22 PM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> Is there some strong reason we could not allow ${(T) foo} btw?

"{ " (curly bracket followed by space) is recognized like a syntax
token.  Can't break it up by sticking an arbitrary chunk of flags in
parens in the middle of it.

> space is syntactically kind of similar to other stuff that does work

In that case "{" is the token and all the stuff following is parsed later.



* Re: [PATCH?] Nofork and removing newlines
  2024-03-06 22:22         ` Mikael Magnusson
  2024-03-06 22:42           ` Bart Schaefer
@ 2024-03-07  4:53           ` Bart Schaefer
  2024-03-07  7:02             ` Lawrence Velázquez
  2024-03-07  7:10             ` Stephane Chazelas
  2024-03-07  6:52           ` Lawrence Velázquez
  2 siblings, 2 replies; 29+ messages in thread
From: Bart Schaefer @ 2024-03-07  4:53 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: Zsh hackers list

On Wed, Mar 6, 2024 at 2:22 PM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> 1) $(foo) will optimize away an extra fork if foo is an external command
> 2) ${ foo } will fork the same amount of times as 1) if foo is
> external and not at all if foo is a function.

You're almost quoting the FAQ entry. :-)

> "${ foo}" and ${ foo} having the same wordsplitting behavior but only
> differing in stripping newlines feels a bit magical and weird.

One question (and sort of the point) is whether anyone would really
notice.  If you put it in quotes you're expecting a literal result,
and if you (for example) assign it unquoted to a scalar you're
expecting it to "just work" the way assigning $(foo) would.  It's a
bit unusual but it seems to preserve the principle of least surprise,
and it uses the least amount of extra syntax.

On the other hand I'm not highly invested in this.  In the absence of
this (no)quoting behavior, I've found I nearly always want ${=${ foo
}} or ${(f)${ foo }}, each of which gives exactly the same result with
or without trimming.

> We could in theory add some new () flag, T for trim is free eg,
> ${(T)${ foo}} is somewhat more ergonomic than ${${ foo}%$'\n'}

I admittedly (still pre-patch) have used (f) for this when I know
there's only one line of output.  I'm just struggling to think of
where else I would use a (T).

Returning to this other bit ...

On Wed, Mar 6, 2024 at 2:42 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Wed, Mar 6, 2024 at 2:22 PM Mikael Magnusson <mikachu@gmail.com> wrote:
> >
> > Is there some strong reason we could not allow ${(T) foo} btw?
>
> "{ " (curly bracket followed by space) is recognized like a syntax
> token.

Code-wise, a sequence starting with ${ (with or without the space) and
ending with } is lexed into a single STRING token.  (If it's inside
double quotes, the entire double-quoted thing is a STRING token, but
you can have nested quotes inside the dollar-brace inside the double
quotes, etc., so this has to work recursively, and so on.)  So the
lexer has to decide when it sees dollar-brace how to find the closing
brace.  Skipping over parameter flags before deciding to switch to
parsing something that looks like a function body might be possible,
but doesn't really fit into the structure of the lexer.  Deciding
based on the very next character (space or pipe for a command, or any
other for a parameter) makes it tractable.
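
The "decide on the very next character" rule can be pictured with a toy
classifier.  This is only an illustration in portable sh, not how the
zsh lexer is written, and classify_expansion is a made-up name:

```shell
# classify_expansion TEXT: hypothetical helper, not part of zsh.
classify_expansion() {
  case $1 in
    '${ '* | '${|'*) echo command ;;    # "${ " or "${|" starts a nofork command
    '${'*)           echo parameter ;;  # any other character after "${"
    *)               echo other ;;
  esac
}

classify_expansion '${ purr STDOUT }'
classify_expansion '${|REPLY=x}'
classify_expansion '${(f)foo}'
```

With flags in front, as in ${(T) foo}, the text no longer starts with
"${ ", so a one-character lookahead like this would classify it as a
parameter expansion.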

The lexical problem aside, when you get to the point of performing the
substitution, even if the command interpretation were deferred until
after all the flags are collected, it would still have to function
much like ${(flags)"$(cmdsubst)"} would, so it's a lot easier if it's
already structured as a nested substitution.



* Re: [PATCH?] Nofork and removing newlines
  2024-03-06 22:22         ` Mikael Magnusson
  2024-03-06 22:42           ` Bart Schaefer
  2024-03-07  4:53           ` Bart Schaefer
@ 2024-03-07  6:52           ` Lawrence Velázquez
  2024-03-07  8:26             ` Mikael Magnusson
  2 siblings, 1 reply; 29+ messages in thread
From: Lawrence Velázquez @ 2024-03-07  6:52 UTC (permalink / raw)
  To: Mikael Magnusson, Bart Schaefer; +Cc: zsh-workers

On Wed, Mar 6, 2024, at 5:22 PM, Mikael Magnusson wrote:
> "${ foo}" and ${ foo} having the same wordsplitting behavior but only
> differing in stripping newlines feels a bit magical and weird.

I agree.  Personally, I'm always surprised when quoting does anything
other than suppress splitting, globbing, and special characters in
patterns.  For instance, I can never remember this pitfall mentioned
in workers/52666, even though (I think) I understand why it happens:

	% print ${:-{}x}
	{}x
	% print "${:-{}x}"
	{x}

-- 
vq



* Re: [PATCH?] Nofork and removing newlines
  2024-03-07  4:53           ` Bart Schaefer
@ 2024-03-07  7:02             ` Lawrence Velázquez
  2024-03-07  8:09               ` ${<file} (Was: [PATCH?] Nofork and removing newlines) Stephane Chazelas
  2024-03-08  1:29               ` [PATCH?] Nofork and removing newlines Bart Schaefer
  2024-03-07  7:10             ` Stephane Chazelas
  1 sibling, 2 replies; 29+ messages in thread
From: Lawrence Velázquez @ 2024-03-07  7:02 UTC (permalink / raw)
  To: Bart Schaefer, Mikael Magnusson; +Cc: zsh-workers

On Wed, Mar 6, 2024, at 11:53 PM, Bart Schaefer wrote:
> On Wed, Mar 6, 2024 at 2:42 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>>
>> On Wed, Mar 6, 2024 at 2:22 PM Mikael Magnusson <mikachu@gmail.com> wrote:
>> >
>> > Is there some strong reason we could not allow ${(T) foo} btw?
>>
>> "{ " (curly bracket followed by space) is recognized like a syntax
>> token.
>
> Code-wise, a sequence starting with ${ (with or without the space) and
> ending with } is lexed into a single STRING token.  (If it's inside
> double quotes, the entire double-quoted thing is a STRING token, but
> you can have nested quotes inside the dollar-brace inside the double
> quotes, etc., so this has to work recursively, and so on.)  So the
> lexer has to decide when it sees dollar-brace how to find the closing
> brace.  Skipping over parameter flags before deciding to switch to
> parsing something that looks like a function body might be possible,
> but doesn't really fit into the structure of the lexer.  Deciding
> based on the very next character (space or pipe for a command, or any
> other for a parameter) makes it tractable.

Hm, would it be feasible to create an explicit LF-preserving form
using a different character (e.g., ${&cmd})?  If so, would it be
undesirable for some other reason?

(Sorry if you already said something ruling this out; I only had
time to quickly skim today's messages.)

-- 
vq



* Re: [PATCH?] Nofork and removing newlines
  2024-03-07  4:53           ` Bart Schaefer
  2024-03-07  7:02             ` Lawrence Velázquez
@ 2024-03-07  7:10             ` Stephane Chazelas
  2024-03-08  0:37               ` Bart Schaefer
  1 sibling, 1 reply; 29+ messages in thread
From: Stephane Chazelas @ 2024-03-07  7:10 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Mikael Magnusson, Zsh hackers list

2024-03-06 20:53:28 -0800, Bart Schaefer:
[...]
> > "${ foo}" and ${ foo} having the same wordsplitting behavior but only
> > differing in stripping newlines feels a bit magical and weird.
> 
> One question (and sort of the point) is whether anyone would really
> notice.  If you put it in quotes you're expecting a literal result,
> and if you (for example) assign it unquoted to a scalar you're
> expecting it to "just work" the way assigning $(foo) would.  It's a
> bit unusual but it seems to preserve the principle of least surprise,
> and it uses the least amount of extra syntax.
> 
> On the other hand I'm not highly invested in this.  In the absence of
> this (no)quoting behavior, I've found I nearly always want ${=${ foo
> }} or ${(f)${ foo }}, each of which gives exactly the same result with
> or without trimming.
[...]

For ${=${ foo }} that depends on whether $IFS contains a
(non-doubled) newline or not.

Without trimming:

$ IFS=:
$ printf '<%s>\n' ${=${ getconf PATH }}
</bin>
</usr/bin
>

$ IFS=$'\n\n'
$ printf '<%s>\n' ${=${ seq 3 }}
<1>
<2>
<3>
<>

For (f), see also:

$ printf '<%s>\n' "${(f@)${ print -l 'a b' '' 'c d' }}"
<a b>
<>
<c d>
<>

Like with IFS=$'\n\n', those are typically the cases where you
do want to preserve empty lines.

In both cases, trimming one (and only one) newline character
would lead to a better behaviour. One exception would be in:

lines=( "${(f@)${ print -l '' }}" )

Where you'd get no line instead of one empty line. Though at the moment, you get:

$ lines=( "${(f@)${ print -l '' }}" )
$ typeset -p lines
typeset -a lines=( '' '' )

(2 empty lines) which is not better.

We'd need to have a way to treat the separator as *delimiter*
instead (as POSIX requires for IFS splitting despite the S in
IFS; both "delimiting" and "separating" have their use).
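
To make the separator/delimiter distinction concrete, here is a sketch
in portable sh that only counts fields.  It does not use zsh's (f) flag
or IFS at all; count_fields and its mode names are invented for the
example:

```shell
nl='
'
# count_fields MODE TEXT: hypothetical helper.
#   MODE "separator": N newlines always give N+1 fields.
#   MODE "delimiter": each newline ends a field; only non-empty
#                     leftover text counts as one more field.
count_fields() {
  mode=$1 rest=$2 n=0
  while :; do
    case $rest in
      *"$nl"*) rest=${rest#*"$nl"}; n=$((n + 1)) ;;  # drop up to the first newline
      *) break ;;
    esac
  done
  if [ "$mode" = separator ] || [ -n "$rest" ]; then
    n=$((n + 1))
  fi
  echo "$n"
}

text="a b$nl${nl}c d$nl"      # the output of: print -l 'a b' '' 'c d'
count_fields separator "$text"
count_fields delimiter "$text"
```

In separator mode the final newline yields the extra empty field seen
in the (f) example above; in delimiter mode it just ends the last line.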

-- 
Stephane



* ${<file} (Was: [PATCH?] Nofork and removing newlines)
  2024-03-07  7:02             ` Lawrence Velázquez
@ 2024-03-07  8:09               ` Stephane Chazelas
  2024-03-08  1:29               ` [PATCH?] Nofork and removing newlines Bart Schaefer
  1 sibling, 0 replies; 29+ messages in thread
From: Stephane Chazelas @ 2024-03-07  8:09 UTC (permalink / raw)
  To: Lawrence Velázquez; +Cc: Bart Schaefer, Mikael Magnusson, zsh-workers

By the way, if ${ cmd } preserves trailing newlines, it would
be useful to also have ${<file} as a variant of $(<file) that
preserves trailing newlines (and remove the need for a zslurp).

"${ <file}" already does but that's via running $READNULLCMD so
that could be optimized.

ksh93 and mksh both support optimised ${ <file;} (also ${<file;}
in ksh93), but they do trim trailing newline characters so
AFAICT, they're no different from $(<file).

See also the

$(<<'EOF'
multi-line
text
EOF)

of mksh which actually skips the creation of the here-doc and is
in effect a form of multi-line quoting (though also trims
trailing newlines). Also works with:

${ <<'EOF'
multi-line
test
EOF
}

-- 
Stephane



* Re: [PATCH?] Nofork and removing newlines
  2024-03-07  6:52           ` Lawrence Velázquez
@ 2024-03-07  8:26             ` Mikael Magnusson
  2024-03-07 19:02               ` Bart Schaefer
  0 siblings, 1 reply; 29+ messages in thread
From: Mikael Magnusson @ 2024-03-07  8:26 UTC (permalink / raw)
  To: Lawrence Velázquez; +Cc: zsh-workers

On 3/7/24, Lawrence Velázquez <larryv@zsh.org> wrote:
> On Wed, Mar 6, 2024, at 5:22 PM, Mikael Magnusson wrote:
>> "${ foo}" and ${ foo} having the same wordsplitting behavior but only
>> differing in stripping newlines feels a bit magical and weird.
>
> I agree.  Personally, I'm always surprised when quoting does anything
> other than suppress splitting, globbing, and special characters in
> patterns.  For instance, I can never remember this pitfall mentioned
> in workers/52666, even though (I think) I understand why it happens:
>
> 	% print ${:-{}x}
> 	{}x
> 	% print "${:-{}x}"
> 	{x}

This is not really an effect of quoting per se, really it's just luck
that the unquoted form works. You'll notice that if you try print
"${:-}x}" without the quotes it will simply fail. Your example only
happens to pass the parsing stage because the braces are balanced
which they have no inherent reason to do in what is supposedly a
string literal. Because the parser "knows" about the balanced braces
in the unquoted case, it skips past the first } for closing the ${,
but in the quoted form the { is not special in any way, so the first }
does match the ${, and then the second } is just a literal } which is
then printed after the x.

The correct way to write it in both cases would be:
% print ${:-\{\}x}
{}x
% print "${:-{\}x}"
{}x

(you can \escape the { inside the quotes too if you want, but it has
no effect on the result).

-- 
Mikael Magnusson



* Re: [PATCH?] Nofork and removing newlines
  2024-03-07  8:26             ` Mikael Magnusson
@ 2024-03-07 19:02               ` Bart Schaefer
  2024-04-02  6:45                 ` Lawrence Velázquez
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Schaefer @ 2024-03-07 19:02 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: Lawrence Velázquez, zsh-workers

On Thu, Mar 7, 2024 at 12:26 AM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> On 3/7/24, Lawrence Velázquez <larryv@zsh.org> wrote:
> >
> >       % print ${:-{}x}
> >       {}x
> >       % print "${:-{}x}"
> >       {x}
>
> This is not really an effect of quoting per se, really it's just luck
> that the unquoted form works. [...] Your example only
> happens to pass the parsing stage because the braces are balanced
> which they have no inherent reason to do in what is supposedly a
> string literal.

It passes the balanced braces because this:

% print ${:-{a,b,c}x}
ax bx cx

And because this:

% print {}
{}

I'm leaving this in the same discussion thread because I just noticed
that ${|...} and ${ cmd } do not really respect the
IGNORE_CLOSE_BRACES option.  Setting that option changes handling of
unbalanced braces (and I'm not yet sure if it does so in a sensible
way) but does not force use of the semicolon e.g. in ${ cmd; } which
theoretically it should.  Is this worth trying to work in?



* Re: [PATCH?] Nofork and removing newlines
  2024-03-07  7:10             ` Stephane Chazelas
@ 2024-03-08  0:37               ` Bart Schaefer
  0 siblings, 0 replies; 29+ messages in thread
From: Bart Schaefer @ 2024-03-08  0:37 UTC (permalink / raw)
  To: Zsh hackers list

On Wed, Mar 6, 2024 at 11:10 PM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> For ${=${ foo }} that depends on whether $IFS contains a
> (non-doubled) newline or not.

True, but I think not really relevant, because nobody is (I hope)
going to globally set a strange IFS in their dotfiles and still expect
any normal behavior.

> For (f), see also:
>
> $ printf '<%s>\n' "${(f@)${ print -l 'a b' '' 'c d' }}"
> <a b>
> <>
> <c d>
> <>

That's because of the historic behavior of the (s::) flag where (f) is
(ps:\n:).  But as was pointed out elsewhere if you're not invoking an
external command you can control this from inside the substitution:

% printf '<%s>\n' "${(f@)${ print -nl 'a b' '' 'c d' }}"
<a b>
<>
<c d>
%

Which leans a little in the direction of never trimming rather than of
choosing how many to trim.

It does however reveal a drawback in the quoting proposal, in that
when nesting ${ ... } inside another quoted expansion there would be
no way to disable newline retention.
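
The extra empty element in the (f) example quoted above follows from
treating newline as a separator rather than a terminator.  A minimal
sketch in portable sh, not zsh's actual splitting code; the helper
name split_on_newline is an assumption made up for this illustration:

```shell
nl='
'

# Split $1 on newline used as a *separator*: text after the last
# newline always forms a field, even when that text is empty.
split_on_newline() {
    text=$1
    while :; do
        case $text in
            *"$nl"*)
                field=${text%%"$nl"*}   # everything before the first newline
                text=${text#*"$nl"}     # everything after it
                printf '<%s>\n' "$field"
                ;;
            *)
                printf '<%s>\n' "$text" # final field, possibly empty
                break
                ;;
        esac
    done
}

# Same data as "print -l 'a b' '' 'c d'", including the final newline.
split_on_newline "a b${nl}${nl}c d${nl}"
```

With the final newline present the last field printed is empty; without
it (the print -nl case) only three fields come out.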

> We'd need to have a way to treat the separator as *delimiter*

That would be a useful choice for (T) or some other new flag -- as in,
do NOT "trim" the separator when splitting -- but I don't see how it
helps decide whether to trim trailing newline(s) from ${ cmd } in the
first place, because in the delimiter case you'd want to keep them?

Just for grins ...

% : ${|reply|
  typeset -ga reply
  local -i i=1 MBEGIN MEND
  local -n MATCH='reply[i]'
  local pat=$'[^\n]#\n'
  : ${(*S)"${ print -l 'a b' '' 'c d' }"//(#m)($~pat)/$((i++))}
}
% typeset -p reply
typeset -a reply=( $'a b\n' $'\n' $'c d\n' )



* Re: [PATCH?] Nofork and removing newlines
  2024-03-07  7:02             ` Lawrence Velázquez
  2024-03-07  8:09               ` ${<file} (Was: [PATCH?] Nofork and removing newlines) Stephane Chazelas
@ 2024-03-08  1:29               ` Bart Schaefer
  2024-03-08 22:15                 ` Oliver Kiddle
  1 sibling, 1 reply; 29+ messages in thread
From: Bart Schaefer @ 2024-03-08  1:29 UTC (permalink / raw)
  To: zsh-workers

On Wed, Mar 6, 2024 at 11:02 PM Lawrence Velázquez <larryv@zsh.org> wrote:
>
> Hm, would it be feasible to create an explicit LF-preserving form
> using a different character (e.g., ${&cmd})?  If so, would it be
> undesirable for some other reason?

Other than that we've just about run out of characters?

${< should be reserved for reading a file, as already suggested
elsewhere (no, I'm not going to implement that yet, though it seems to
be an undocumented ksh93 feature).

${> might work, but it "looks wrong" to have a command instead of a
file to the right of the pointy end.

${& looks like you're running something asynchronously, or perhaps
changing a file descriptor.

Every other character already has another meaning in that position, as
far as I can tell.

There is one other possibility:  ${||command}, that is,
${|var|command} with an empty var name.  That's already passed through
the lexer, so it could be picked out at the necessary place in subst.c
(I think, haven't actually tried yet).  It looks a little odd, too,
given "||" usually means "or", but it's at least sort of logical to
treat "assign this output to nothing" as "return the output in place",
and the other ${|...} forms do preserve trailing newlines.



* Re: [PATCH?] Nofork and removing newlines
  2024-03-08  1:29               ` [PATCH?] Nofork and removing newlines Bart Schaefer
@ 2024-03-08 22:15                 ` Oliver Kiddle
  2024-03-08 23:28                   ` Bart Schaefer
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Kiddle @ 2024-03-08 22:15 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

Bart Schaefer wrote:
> ${< should be reserved for reading a file, as already suggested
> elsewhere (no, I'm not going to implement that yet, though it seems to
> be an undocumented ksh93 feature).

> ${> might work, but it "looks wrong" to have a command instead of a
> file to the right of the pointy end.

I agree. I'd sooner expect that to be running $NULLCMD redirected to a file.
Not that that would be even remotely useful.

> Every other character already has another meaning in that position, as
> far as I can tell.

It could be nice to have ${= cmd } as a shorter alternative to
${=${ cmd }} particularly if the default is to be newline preserving.
That would need to do word splitting but trailing IFS characters also
get removed so it would work for some cases.

> There is one other possibility:  ${||command}, that is,
> ${|var|command} with an empty var name.  That's already passed through
> the lexer, so it could be picked out at the necessary place in subst.c
> (I think, haven't actually tried yet).  It looks a little odd, too,
> given "||" usually means "or", but it's at least sort of logical to
> treat "assign this output to nothing" as "return the output in place",
> and the other ${|...} forms do preserve trailing newlines.

The logic does at least follow from the usage with a variable. One way
to avoid the resemblance to an "or" is if ${| |command} also works.
It could perhaps be combined so ${||<file} slurps a file unmodified.

Why does it print command not found errors for things like ${|=|:},
${|*|:} and ${|?|:}, I'd rather have $? than it globbing for a single
character file.

Oliver



* Re: [PATCH?] Nofork and removing newlines
  2024-03-08 22:15                 ` Oliver Kiddle
@ 2024-03-08 23:28                   ` Bart Schaefer
  2024-03-09 20:43                     ` Oliver Kiddle
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Schaefer @ 2024-03-08 23:28 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: zsh-workers

On Fri, Mar 8, 2024 at 2:15 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> Bart Schaefer wrote:
> > Every other character already has another meaning in that position, as
> > far as I can tell.
>
> It could be nice to have ${= cmd } as a shorter alternative to
> ${=${ cmd }}

Unfortunately the lexer needs to be able to do this with one-character
peek-ahead.  So it can't distinguish dollar-brace-equal-space from
dollar-brace-equal, and the latter has to be treated as a parameter
expansion.
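
The one-character limit can be pictured with a small sketch in
portable sh.  It is only a model of the decision, not the code in
lex.c; the function name classify_dollar_brace and the variable kind
are assumptions, and only "|" and space are treated as triggers (tab
is left out for brevity):

```shell
# Decide how the text after "${" is scanned, looking at one character only.
classify_dollar_brace() {
    case $1 in
        '|'* | ' '*) kind=command ;;    # ${|...} or ${ ... }
        *)           kind=parameter ;;  # everything else, including ${=...}
    esac
}

# Both of these begin with "=", so one character cannot tell them apart.
classify_dollar_brace '= cmd }'; printf '%s\n' "$kind"
classify_dollar_brace '=var}';   printf '%s\n' "$kind"
```

Both calls report "parameter", which is why ${= cmd } cannot be
recognized without looking further ahead.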

> > There is one other possibility:  ${||command}, that is,
> > ${|var|command} with an empty var name.
>
> The logic does at least follow from the usage with a variable. One way
> to avoid the resemblance to an "or" is if ${| |command} also works.

That might be possible.  Right now the lexer sees "${|" and branches
to scanning something that looks like a function body (closely
approximate to how $(command) scans ahead to the closing paren without
really "understanding" what it's skipping over).  That happens to not
care whether the next "|" is in a sensible position, just that it's
something that can be skipped while looking for the closing brace.

Then at the point of actual substitution, when there's a leading "|"
it looks for an identifier followed by another "|".  So you can't
write
  ... ${|paste|read} ...
and expect $REPLY to be set as the default by read, instead $paste
will be set (probably to nothing).  Anyway the upshot is it could
probably also look for whitespace followed by another "|" without
confusing anything.  Right now it just attempts to evaluate the
equivalent of { |command } which is a parse error.
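
A rough model of that substitution-time lookup, written in portable sh
rather than taken from subst.c.  The function split_reply_var and the
variables var and cmd are hypothetical, and "identifier" is simplified
here to a non-empty run of letters, digits and underscores (no
subscripts):

```shell
# $1 is the text between "${" and the closing "}", starting with "|".
split_reply_var() {
    rest=${1#'|'}          # drop the leading "|"
    name=${rest%%'|'*}     # candidate identifier before the next "|"
    var=REPLY cmd=$rest    # default: no variable named, so REPLY is used
    case $name in
        '' | *[!A-Za-z0-9_]*) ;;   # not an identifier: keep the default
        *)
            case $rest in
                "$name|"*) var=$name cmd=${rest#"$name|"} ;;
            esac
            ;;
    esac
}

split_reply_var '|paste|read'; printf '%s: %s\n' "$var" "$cmd"
split_reply_var '|=|:';        printf '%s: %s\n' "$var" "$cmd"
```

In this model ${|paste|read} names the variable paste and runs read,
while ${|=|:} falls through to running "=|:" as the command text.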

> It could perhaps be combined so ${||<file} slurps a file unmodified.

That's messy because you can write
  <file somecommand
and it means the same as
  somecommand <file
so again it's not enough to see "||<" ... we'd actually have to
special-case READNULLCMD or something.

> Why does it print command not found errors for things like ${|=|:},
> ${|*|:} and ${|?|:}, I'd rather have $? than it globbing for a single
> character file.

See above about the requirement for it to look like ${|ident|...}.
Since = * and ? are not identifiers, this is like writing { =|: } etc.
and you get the same errors.  All of the non-identifier special
parameters are read-only so it doesn't make sense to assign to them,
and the |ident| has to be assignable for the expansion to mean
anything, so why allow them in that position? Unless you're just going
for side-effects, but then why use the |var| form?



* Re: [PATCH?] Nofork and removing newlines
  2024-03-08 23:28                   ` Bart Schaefer
@ 2024-03-09 20:43                     ` Oliver Kiddle
  2024-03-10  6:11                       ` Bart Schaefer
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Kiddle @ 2024-03-09 20:43 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

Bart Schaefer wrote:
> See above about the requirement for it to look like ${|ident|...}.
> Since = * and ? are not identifiers, this is like writing { =|: } etc.

Ok, that makes sense. Thanks

> and you get the same errors.  All of the non-identifier special
> parameters are read-only so it doesn't make sense to assign to them,
> and the |ident| has to be assignable for the expansion to mean
> anything, so why allow them in that position? Unless you're just going
> for side-effects, but then why use the |var| form?

You may not be able to assign to it directly but I can think of uses
for $? (and perhaps also $!) if supported there. That is assuming $? is the
return status for the command running inside the expansion. Being an
identifier, $_ does work there, not that it's especially useful. $1, $2
etc also work.

Oliver



* Re: [PATCH?] Nofork and removing newlines
  2024-03-09 20:43                     ` Oliver Kiddle
@ 2024-03-10  6:11                       ` Bart Schaefer
  2024-03-12 17:54                         ` Bart Schaefer
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Schaefer @ 2024-03-10  6:11 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: zsh-workers

On Sat, Mar 9, 2024 at 12:44 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> Bart Schaefer wrote:
> > ... the |ident| has to be assignable for the expansion to mean
> > anything, so why allow them in that position?
>
> You may not be able to assign to it directly but I can think of uses
> for $? (and perhaps also $!) if supported there.

$? is also $status and ${|status|...} is fine.

% print ${|status| return 9}
9

Also:

% x=${ return 9 }
% echo $?
9

(Just like with $(exit 9).)
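
The $(exit 9) half of that comparison can be checked in any
POSIX-style shell.  A minimal sketch using only the forking form,
since ${ ... } needs a zsh build with nofork substitution; the
variable name rc is an assumption for illustration:

```shell
# A plain assignment takes its exit status from the command substitution.
if x=$(exit 9); then
    rc=0
else
    rc=$?      # status of the failed assignment above
fi
printf 'status=%s\n' "$rc"
```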

Pondering $! ... hm.



* Re: [PATCH?] Nofork and removing newlines
  2024-03-10  6:11                       ` Bart Schaefer
@ 2024-03-12 17:54                         ` Bart Schaefer
  2024-03-12 23:19                           ` Oliver Kiddle
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Schaefer @ 2024-03-12 17:54 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: zsh-workers

On Fri, Mar 8, 2024 at 2:15 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> Why does it print command not found errors for things like ${|=|:},
> ${|*|:} and ${|?|:}, I'd rather have $? than it globbing for a single
> character file.

Bart Schaefer wrote:
> See above about the requirement for it to look like ${|ident|...}.
> Since = * and ? are not identifiers, this is like writing { =|: } etc.

On Sat, Mar 9, 2024 at 12:44 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> You may not be able to assign to it directly but I can think of uses
> for $? (and perhaps also $!) if supported there.

On Sat, Mar 9, 2024 at 10:11 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> $? is also $status and ${|status|...} is fine.
>
> Pondering $! ... hm.

This can be done with e.g.

typeset -n bang=!
... ${|bang|...} ...

And that doesn't even run afoul of history expansion, though I would
not expect $! to be used that much in an interactive context.

However:

Returning to the original context here, we were talking about how to
make ${ ... } more newline-trimming-compatible with $(...) while still
providing a way to specify that newlines not be trimmed, and using
${||...} for the latter came up.

In thinking about ${|?|...} etc. I realized that there's no real
reason a set of non-identifier characters couldn't be allowed to
follow the first vertical bar.  It'd have to be simpler than just
tossing parameter expansion flags in there, but I could investigate
whether we could do things like ${|=|...} is the same as ${=${ ... }},
${|~|...} is ${~${ ... }}, etc.  That only saves 1 character, though,
and I'm not sure it's clearer.

It does mean, though, that we could use something like ${|<|...} for
non-trimming command substitution, instead of "empty" || meaning that.
Just from a "clean look" standpoint, though, I still like the quoting
approach better.

Separately, it's definitely possible to make zsh-mode ${ ... } trim
only one newline instead of all of them.



* Re: [PATCH?] Nofork and removing newlines
  2024-03-12 17:54                         ` Bart Schaefer
@ 2024-03-12 23:19                           ` Oliver Kiddle
  2024-03-13  4:13                             ` Bart Schaefer
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Kiddle @ 2024-03-12 23:19 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

Bart Schaefer wrote:
> On Fri, Mar 8, 2024 at 2:15 PM Oliver Kiddle <opk@zsh.org> wrote:
> >
> > Why does it print command not found errors for things like ${|=|:},
> > ${|*|:} and ${|?|:}, I'd rather have $? than it globbing for a single
>
> Bart Schaefer wrote:
> > See above about the requirement for it to look like ${|ident|...}.
> > Since = * and ? are not identifiers, this is like writing { =|: } etc.

Considering this explanation, it is apparent that allowing |ident| is
not fully compatible with mksh where ${|ls| cat -} runs ls.
Not that I think that matters as such. In usage, it is probably wise to
make a convention of always having a space before the command starts.
And this leads on to the later question as we probably don't want to
expand considerably on what is valid between the vertical bars.

> This can be done with e.g.
>
> typeset -n bang=!
> ... ${|bang|...} ...

Yes that works. Is nice to see namerefs coming up in nifty solutions. I
hadn't checked the code for what supporting ? / ! would involve. If
trivial why not, but I well understand not wanting to do anything that
involves the lexer.

> However:
>
> Returning to the original context here, we were talking about how to
> make ${ ... } more newline-trimming-compatible with $(...) while still
> providing a way to specify that newlines not be trimmed, and using
> ${||...} for the latter came up.
>
> In thinking about ${|?|...} etc. I realized that there's no real
> reason a set of non-identifier characters couldn't be allowed to
> follow the first vertical bar.  It'd have to be simpler than just
> tossing parameter expansion flags in there, but I could investigate
> whether we could do things like ${|=|...} is the same as ${=${ ... }},
> ${|~|...} is ${~${ ... }}, etc.  That only saves 1 character, though,
> and I'm not sure it's clearer.

Would that potentially also extend to something like ${|=var| ... }
That might look like a default value assignment to someone used to a
language where vertical bars delimit closure parameters. Coming within
the vertical bars the character has a closer syntactic attachment to the
variable implying a semantic attachment. If it is hard to support
${= ... } then not doing it at all is probably better.

Given that the ${|var| ... } form appears to create a function-like
scope, should var perhaps be auto-declared local for that scope and the
local value be substituted?

> It does mean, though, that we could use something like ${|<|...} for
> non-trimming command substitution, instead of "empty" || meaning that.
> Just from a "clean look" standpoint, though, I still like the quoting
> approach better.

The quoting approach is clean and logical and is probably my preferred
option. I was initially bothered by the lack of consistency with $(...)
(where quoting prevents word splitting) but it can be useful if the lack
of fork is not the only thing which makes ${ ... } different and because
of the syntactic resemblance, consistency with ${var} is perhaps more
important - it does word splitting based on the shwordsplit option.

> Separately, it's definitely possible to make zsh-mode ${ ... } trim
> only one newline instead of all of them.

Only one is probably the most useful. I would mostly associate the fact
that $(...) strips multiple with the fact that it does word splitting
and so drops repeated newlines (empty words) also from the middle.
Admittedly "$(...)" preserves empty words in the middle but still drops
those at the end.
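
The quoted "$(...)" behavior described here can be seen in any
POSIX-style shell.  A minimal sketch; the sentinel character x and the
variable names out and raw are assumptions chosen for the example:

```shell
nl='
'

# Inside double quotes the interior empty line survives,
# but every trailing newline is still removed.
out="$(printf 'a\n\nb\n\n\n')"
printf '<%s>\n' "$out"

# Common workaround to keep them: emit a sentinel, then strip it.
raw="$(printf 'a\n\nb\n\n\n'; printf x)"
raw=${raw%x}
printf 'length with trailing newlines kept: %s\n' "${#raw}"
```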

Oliver



* Re: [PATCH?] Nofork and removing newlines
  2024-03-12 23:19                           ` Oliver Kiddle
@ 2024-03-13  4:13                             ` Bart Schaefer
  2024-03-14 22:15                               ` Oliver Kiddle
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Schaefer @ 2024-03-13  4:13 UTC (permalink / raw)
  To: zsh-workers

On Tue, Mar 12, 2024 at 4:19 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> > Bart Schaefer wrote:
> > > See above about the requirement for it to look like ${|ident|...}.
> > > Since = * and ? are not identifiers, this is like writing { =|: } etc.
>
> Considering this explanation, it is apparent that allowing |ident| is
> not fully compatible with mksh where ${|ls| cat -} runs ls.

Hm, yes.  Although I wasn't really aiming for compatibility, rather
for borrowing the idea (via Sebastian's original attempt at it).  I
was also, I confess, a bit stuck on the idea that every case would look
like ${|REPLY=...} when of course piping to "read" etc. are also valid
ways to assign to REPLY.  How often would there be a command name with
no arguments in that position?

> And this leads on to the later question as we probably don't want to
> expand considerably on what is valid between the vertical bars.

I hesitate in suggesting this, but ... is there any existing case in
which "${{" is valid?  If not, I think I can change ${|var|...} to be
${{var}...} without too much violence (except to the doc, bleah).

> Yes that works. Is nice to see namerefs coming up in nifty solutions. I
> hadn't checked the code for what supporting ? / ! would involve.

Mostly it involves rejiggering valid_refname() to behave more like
itype_end(), if you mean supporting e.g. ${|?|...}.

> If trivial why not, but I well understand not wanting to do anything
> that involves the lexer.

That (and using {var} instead of |var|) would except for a single
conditional test all happen in subst.c, the lexer already skips ahead.

> Bart Schaefer wrote:
> > [...] I could investigate
> > whether we could do things like ${|=|...} is the same as ${=${ ... }},
> > ${|~|...} is ${~${ ... }}, etc.  That only saves 1 character, though,
> > and I'm not sure it's clearer.
>
> Would that potentially also extend to something like ${|=var| ... }

It could, yes.

> That might look like a default value assignment to someone

Would ${{=var}...} look better?  The doubled braces do give me pause.

> Given that the ${|var| ... } form appears to create a function-like
> scope, should var perhaps be auto-declared local for that scope and the
> local value be substituted?

I considered that but
(a) the implementation is messy, as the state of the parameter scope
has to be carried around subst.c a lot longer than with the single
known scalar "REPLY"
(b) it diverges even farther from the idea that REPLY is a
semi-special thing -- note that REPLY is automatically saved and
restored around ${|... REPLY=...}
(c) creating it local doesn't really add much that you can't do with
${ local value; ... } and
(d) part of the point was to be able to push the variable up to the
caller as a side effect, so you don't have to write
  value=${|value| ... value=...}
although I guess you do have to declare it somewhere so that's not
entirely helpful.

> The quoting approach is clean and logical and is probably my preferred
> option.  [...]  consistency with ${var} is perhaps more
> important - it does word splitting based on the shwordsplit option.

Thanks for the vote.

> > Separately, it's definitely possible to make zsh-mode ${ ... } trim
> > only one newline instead of all of them.
>
> Only one is probably the most useful. I would mostly associate the fact
> that $(...) strips multiple with the fact that it does word splitting

This is the code diff to make emulation trim all, ${ ... } trim one,
"${ ... }" trim none ... not re-doing the doc diff yet.

diff --git a/Src/subst.c b/Src/subst.c
index 49f7336bb..9d20a2d0e 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -1900,6 +1900,7 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
        /* The command string to be run by ${|...;} */
        char *cmdarg = NULL;
        size_t slen = 0;
+       int trim = (!EMULATION(EMULATE_ZSH)) ? 2 : !qt;
        inbrace = 1;
        s++;

@@ -2005,10 +2006,13 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
                int onoerrs = noerrs, rplylen;
                noerrs = 2;
                rplylen = zstuff(&cmdarg, rplytmp);
-               if (! EMULATION(EMULATE_ZSH)) {
+               if (trim) {
                    /* bash and ksh strip trailing newlines here */
-                   while (rplylen > 0 && cmdarg[rplylen-1] == '\n')
+                   while (rplylen > 0 && cmdarg[rplylen-1] == '\n') {
                        rplylen--;
+                       if (trim == 1)
+                           break;
+                   }
                    cmdarg[rplylen] = 0;
                }
                noerrs = onoerrs;
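
The three modes selected by the trim variable in the diff above (2 for
emulation, 1 for unquoted zsh mode, 0 for quoted) can be modeled in
portable sh.  This is a sketch of the loop's logic only, not a
replacement for the C code; trim_newlines and result are hypothetical
names:

```shell
nl='
'

# mode 2: strip all trailing newlines; mode 1: strip one; mode 0: strip none.
# The answer is left in $result, because capturing it with $(...) would
# itself strip the newlines being counted.
trim_newlines() {
    mode=$1 text=$2
    while [ "$mode" -gt 0 ]; do
        case $text in
            *"$nl") text=${text%"$nl"} ;;  # remove exactly one newline
            *) break ;;                    # nothing left to trim
        esac
        if [ "$mode" -eq 1 ]; then break; fi
    done
    result=$text
}

for m in 2 1 0; do
    trim_newlines "$m" "out${nl}${nl}"
    printf 'mode %s leaves %s characters\n' "$m" "${#result}"
done
```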



* Re: [PATCH?] Nofork and removing newlines
  2024-03-13  4:13                             ` Bart Schaefer
@ 2024-03-14 22:15                               ` Oliver Kiddle
  2024-03-15  8:42                                 ` Stephane Chazelas
  2024-03-27  7:05                                 ` Bart Schaefer
  0 siblings, 2 replies; 29+ messages in thread
From: Oliver Kiddle @ 2024-03-14 22:15 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

Bart Schaefer wrote:
> like ${|REPLY=...} when of course piping to "read" etc. are also valid
> ways to assign to REPLY.  How often would there be a command name with
> no arguments in that position?

Probably not all that often.

> I hesitate in suggesting this, but ... is there any existing case in
> which "${{" is valid?  If not, I think I can change ${|var|...} to be
> ${{var}...} without too much violence (except to the doc, bleah).

Inner `$' in nested parameter expansions are fairly superfluous in
general. ${|var|...} is closer to the REPLY default with ${|...} but
other than that, I marginally prefer ${{var}...}
Certainly if it does involve much violence, what we currently have is
working.

> > That might look like a default value assignment to someone
>
> Would ${{=var}...} look better?  The doubled braces do give me pause.

Not as good as ${={var}...} but probably better.

> This is the code diff to make emulation trim all, ${ ... } trim one,
> "${ ... }" trim none ... not re-doing the doc diff yet.

Looks good to me.

Oliver



* Re: [PATCH?] Nofork and removing newlines
  2024-03-14 22:15                               ` Oliver Kiddle
@ 2024-03-15  8:42                                 ` Stephane Chazelas
  2024-03-27  1:16                                   ` Bart Schaefer
  2024-03-27  7:05                                 ` Bart Schaefer
  1 sibling, 1 reply; 29+ messages in thread
From: Stephane Chazelas @ 2024-03-15  8:42 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Bart Schaefer, zsh-workers

I don't know if that could be done and it's probably too late
anyway, but I thought I might throw in the idea anyway.

What about, instead of adding ksh93's ${ cmd;} and mksh's
${|cmd} (in slightly diverging ways), we added just a |
expansion flag whereby:

${(||)any zsh code} would expand to the output of the code
without the fork and without the newline trimming.

${(|var|)any zsh code} would expand to the value of var as set
by the zsh code

Some advantages:
- the flags can be cumulated as usual. So you can have ${(||.s[:])getconf PATH}
  to split the output of getconf PATH ("." to trim one newline,
  ".." to trim all) for example.
- there's no extra rule as to how the expansion works and how it
  can be combined with others as it's the same syntax as other
  parameter expansions
- as it's different syntax, it removes the potential surprises
  when ${ cmd;}, ${|cmd} behave differently than in
  ksh93/mksh/bash

=============

Or (as a completely different idea), an alternative to
mksh's ${|cmd} and ${|var|cmd} could be written ${REPLY<cmd}
${var<cmd}.

That could be added as well as ${|cmd} if we wanted to add
${|cmd} for compatibility with mksh/bash.

Or we could add neither of ${ cmd;} and ${|cmd} and have
${REPLY<cmd} as the (non-splitting, non-trimming) equivalent of
${|cmd} and ${<cmd} as the (non-splitting, non-trimming)
equivalent of ${ cmd;} (though the latter would prevent adding
${ cmd; } in the future).

And still allow flags there as in ${(.s[:])<getconf PATH}

-- 
Stephane



* Re: [PATCH?] Nofork and removing newlines
  2024-03-15  8:42                                 ` Stephane Chazelas
@ 2024-03-27  1:16                                   ` Bart Schaefer
  0 siblings, 0 replies; 29+ messages in thread
From: Bart Schaefer @ 2024-03-27  1:16 UTC (permalink / raw)
  To: zsh-workers

Delayed reply as I was traveling last week.

On Fri, Mar 15, 2024 at 1:42 AM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> What about, instead of adding ksh93's ${ cmd;} and mksh's
> ${|cmd} (in slightly diverging ways), we added just a |
> expansion flag

As mentioned in a previous context, the problem with this approach is
that the lexing/parsing of a parameter reference and the
lexing/parsing of what amounts to a function body are very different.
Upon encountering dollar-brace-pipe or dollar-brace-whitespace (or in
forthcoming proposed change, dollar-brace-brace), we can immediately
switch to expect a series of commands.  This allows for one-character
lookahead, which works with hungetc(). If required first to consume
parameter flags or any string of multiple characters, the lexer can't
backtrack without some serious gyrations.  Even if the backtracking
were worked out, the proposed flag now has semantics that the lexer
has to understand in order to proceed after the close-paren, whereas
current parameter flags are just swept up uninterpreted at lexing and
left to paramsubst() to decode.

On top of this the lexer has to maintain the PS2 context stack, which
was one of the most difficult bits of implementing the switch to/from
expecting commands vs. expecting (possibly nested) parameter
substitutions.

> Some advantages:
> - the flags can be cumulated as usual. So you can have ${(||.s[:])getconf PATH}

That would make this entirely impractical for lexing purposes.

> - there's no extra rule as to how the expansion works and how it
>   can be combined with others as it's the same syntax as other
>   parameter expansions

Except it's still not, because the syntax after the flags and up to
the matching close brace is nothing like identifiers / subscripts /
nested parameters.

> - as it's different syntax, it removes the potential surprises
>   when ${ cmd;}, ${|cmd} behave differently than in
>   ksh93/mksh/bash

Possibly, but since they'll work very similarly when in emulation
modes, I think this is minor.

> Or (as a completely different idea), an alternative to
> mksh's ${|cmd} and ${|var|cmd} could be written ${REPLY<cmd}
> ${var<cmd}.

I suspect that wouldn't interact as well with nested substitutions
(although I guess it wouldn't differ that much from ${REPLY=...} in
that respect), and it has the appearance of reading from a file.  I
don't especially like ${|...} that way either as it looks more like
writing than reading, but we're not setting the precedent there.

Given druthers, I'd have done something with $(...) instead of ${...},
more like recognizing the "function" keyword so $(function { ... })
skips forking [ shorthand $(() { ... }) ] which could be done with
zero changes to the lexer/parser, but that already has conflicting
semantics with respect to [not] altering values in the current shell.

Patch to use ${{param} cmd} instead of ${|param| cmd} to follow in a bit.



* Re: [PATCH?] Nofork and removing newlines
  2024-03-14 22:15                               ` Oliver Kiddle
  2024-03-15  8:42                                 ` Stephane Chazelas
@ 2024-03-27  7:05                                 ` Bart Schaefer
  1 sibling, 0 replies; 29+ messages in thread
From: Bart Schaefer @ 2024-03-27  7:05 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 716 bytes --]

On Thu, Mar 14, 2024 at 3:15 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> Bart Schaefer wrote:
> > I hesitate in suggesting this, but ... is there any existing case in
> > which "${{" is valid?  If not, I think I can change ${|var|...} to be
> > ${{var}...} without too much violence (except to the doc, bleah).
>
> [...] I marginally prefer ${{var}...}
> Certainly if it does involve much violence, what we currently have is
> working.

It was slightly more violent than I expected, and consequently there
is probably some room for optimization, but the attached has it
working (minus Doc update as yet).

Following workers/52635 the extra "TEST COMPLETE" test in D10 is not
really needed any more.

[-- Attachment #2: nofork-doublecurly.txt --]
[-- Type: text/plain, Size: 6414 bytes --]

diff --git a/Src/lex.c b/Src/lex.c
index 31b130b07..700af2da1 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -1423,7 +1423,7 @@ gettokstr(int c, int sub)
 	if (lexstop)
 	    break;
 	if (!cmdsubst && in_brace_param && act == LX2_STRING &&
-	    (c == '|' || c == Bar || inblank(c))) {
+	    (c == '|' || c == Bar || c == '{' || c == Inbrace || inblank(c))) {
 	    cmdsubst = in_brace_param;
 	    cmdpush(CS_CURSH);
 	} else if (in_pattern == 2 && c != '/')
diff --git a/Src/subst.c b/Src/subst.c
index 9d20a2d0e..3764ed786 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -1898,11 +1898,10 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
      */
     if (c == Inbrace) {
 	/* The command string to be run by ${|...;} */
-	char *cmdarg = NULL;
+	char *cmdarg = NULL, *endvar = NULL, inchar = *++s;
 	size_t slen = 0;
 	int trim = (!EMULATION(EMULATE_ZSH)) ? 2 : !qt;
 	inbrace = 1;
-	s++;
 
         /* Short-path for the nofork command substitution ${|cmd;}
 	 * See other comments about kludges for why this is here.
@@ -1913,43 +1912,74 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
          * should not be part of command substitution in any case.
          * Use ${(U)${|cmd;}} as you would for ${(U)$(cmd;)}.
 	 */
-	if (*s == '|' || *s == Bar || inblank(*s)) {
+	if (inchar == '|' || inchar == Bar || inblank(inchar)) {
 	    char *outbracep = s;
 	    char sav = *s;
 	    *s = Inbrace;
 	    if (skipparens(Inbrace, Outbrace, &outbracep) == 0) {
 		slen = outbracep - s - 1;
 		if ((*s = sav) != Bar) {
+		    /* This tokenize() is important */
 		    sav = *outbracep;
 		    *outbracep = '\0';
 		    tokenize(s);
 		    *outbracep = sav;
 		}
 	    }
+	} else if (inchar == '{' || inchar == Inbrace) {
+	    char *outbracep;
+	    *s = Inbrace;
+
+	    if ((outbracep = itype_end(s+1, INAMESPC, 0))) {
+		if (*outbracep == Inbrack &&
+		    (outbracep = parse_subscript(++outbracep, 1, ']')))
+		    ++outbracep;
+	    }
+	    /* True for valid substitution, or we messed up in lex.c */
+	    if (outbracep && *outbracep == Outbrace) {
+		char outchar = inchar == Inbrace ? Outbrace : '}';
+		endvar = outbracep++;
+
+		/* Reached the first close brace, find the last */
+		*endvar = '|';	/* Almost anything but braces/brackets */
+		outbracep = s;
+		if (skipparens(Inbrace, outchar, &outbracep) == 0)
+		    *endvar = Outbrace;
+		else {	/* Never happens? */
+		    *endvar = outchar;
+		    outbracep = endvar + 1;
+		}
+		slen = outbracep - s - 1;
+		if (inchar != Inbrace) {
+		    char sav = *outbracep;
+		    *outbracep = '\0';
+		    tokenize(s);
+		    *outbracep = sav;
+		    outbracep[-1] = Outbrace;
+		}
+	    } else {
+		zerr("bad substitution");
+		return NULL;
+	    }
 	}
 	if (slen > 1) {
 	    char *outbracep = s + slen;
 	    if (*outbracep == Outbrace) {
-		if ((rplyvar = itype_end(s+1, INAMESPC, 0))) {
-		    if (*rplyvar == Inbrack &&
-			(rplyvar = parse_subscript(++rplyvar, 1, ']')))
-			++rplyvar;
-		}
-		if (rplyvar == s+1 && *rplyvar == Bar) {
-		    /* Is ${||...} a subtitution error or a syntax error?
+		if (endvar == s+1 && !inblank(*endvar)) {
+		    /* Is ${{}...} a substitution error or a syntax error?
 		    zerr("bad substitution");
 		    return NULL;
 		    */
 		    rplyvar = NULL;
 		}
-		if (rplyvar && *rplyvar == Bar) {
-		    cmdarg = dupstrpfx(rplyvar+1, outbracep-rplyvar-1);
-		    rplyvar = dupstrpfx(s+1,rplyvar-s-1);
+		if (endvar && *endvar == Outbrace) {
+		    cmdarg = dupstrpfx(endvar+1, outbracep-endvar-1);
+		    rplyvar = dupstrpfx(s+1,endvar-s-1);
 		} else {
 		    cmdarg = dupstrpfx(s+1, outbracep-s-1);
 		    rplyvar = "REPLY";
 		}
-		if (inblank(*s)) {
+		if (inblank(inchar)) {
 		    /*
 		     * Admittedly a hack.  Take advantage of the enforced
 		     * locality of REPLY and the semantics of $(<file) to
diff --git a/Test/D10nofork.ztst b/Test/D10nofork.ztst
index fc6b84613..0616cf9e9 100644
--- a/Test/D10nofork.ztst
+++ b/Test/D10nofork.ztst
@@ -14,6 +14,28 @@
 0:Basic substitution and REPLY scoping
 >INNER OUTER
 
+  reply=(x OUTER x)
+  purl ${{reply}reply=(\{ INNER \})} $reply
+0:Basic substitution, brace quoting, and array result
+>{
+>INNER
+>}
+>{
+>INNER
+>}
+
+  () {
+    setopt localoptions ignorebraces
+    purl ${{reply} reply=({ INNER })} $reply
+  }
+0:Basic substitution, ignorebraces, and array result
+>{
+>INNER
+>}
+>{
+>INNER
+>}
+
   purr ${| REPLY=first}:${| REPLY=second}:$REPLY
 0:re-scoping of REPLY in one statement
 >first:second:OUTER
@@ -229,7 +251,7 @@ F:Why not use this error in the previous case as well?
 >26
 
   unset reply
-  purl ${|reply| reply=(1 2 ${| REPLY=3 } 4) }
+  purl ${{reply} reply=(1 2 ${| REPLY=3 } 4) }
   typeset -p reply
 0:array behavior with global assignment
 >1
@@ -315,7 +337,7 @@ F:status of "print" should hide return
 
   unset zz
   outer=GLOBAL
-  purr "${|zz|
+  purr "${{zz}
    local outer=LOCAL
    zz=NONLOCAL
   } $outer $?"
@@ -453,6 +475,7 @@ F:must do this before evaluating the next test block
 1:ignored braces, part 4
 ?(eval):3: parse error near `}'
 
+  unsetopt ignorebraces
   # "break" blocks function calls in outer loop
   # Could use print, but that might get fixed
   repeat 3 do purr ${
@@ -467,11 +490,6 @@ F:must do this before evaluating the next test block
 ?1
 ?2
 
-  print -u $ZTST_fd ${ZTST_testname}: TEST COMPLETE
-0:make sure we got to the end
-F:some tests might silently break the test harness
-
 %clean
 
   unfunction purr purl
-  unsetopt ignorebraces
diff --git a/Test/V10private.ztst b/Test/V10private.ztst
index ed51316f3..26004a2dc 100644
--- a/Test/V10private.ztst
+++ b/Test/V10private.ztst
@@ -497,7 +497,7 @@ F:Better if caught in checkclobberparam() but exec.c doesn't know scope
  () {
    private z=outer
    print ${(t)z} $z
-   print ${| REPLY=${|z| z=nofork} }
+   print ${| REPLY=${{z} z=nofork} }
    print ${(t)z} $z
  }
 0:nofork may write to private in calling function
@@ -518,9 +518,9 @@ F:Better if caught in checkclobberparam() but exec.c doesn't know scope
  () {
    private z=outer
    print ${(t)z} $z
-   print ${|z|
+   print ${{z}
      private q
-     z=${|q| q=nofork}
+     z=${{q} q=nofork}
    }
    print ${(t)z} $z
  }
@@ -533,7 +533,7 @@ F:Better if caught in checkclobberparam() but exec.c doesn't know scope
    print ${|
      () { REPLY="{$q}" }
    }
-   print ${|q|
+   print ${{q}
      () { q=nofork }
    }
  }

* Re: [PATCH?] Nofork and removing newlines
  2024-03-07 19:02               ` Bart Schaefer
@ 2024-04-02  6:45                 ` Lawrence Velázquez
  0 siblings, 0 replies; 29+ messages in thread
From: Lawrence Velázquez @ 2024-04-02  6:45 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

On Thu, Mar 7, 2024, at 2:02 PM, Bart Schaefer wrote:
> I'm leaving this in the same discussion thread because I just noticed
> that ${|...} and ${ cmd } do not really respect the
> IGNORE_CLOSE_BRACES option.  Setting that option changes handling of
> unbalanced braces (and I'm not yet sure if it does so in a sensible
> way) but does not force use of the semicolon e.g. in ${ cmd; } which
> theoretically it should.  Is this worth trying to work in?

It would be greatly preferable if IGNORE_CLOSE_BRACES were respected,
so that we don't have yet another exception that has to be documented
and watched out for.  However, I can't opine on whether it'd be worth
doing, since I don't know how hard it'd be and won't be working on it
in any case.

(Sorry if this has already been settled; I'm just now catching up on
some older threads.)

-- 
vq


Thread overview: 29+ messages
2024-03-05  5:52 [PATCH?] Nofork and removing newlines Bart Schaefer
2024-03-05  6:56 ` Stephane Chazelas
2024-03-05 22:48   ` Bart Schaefer
2024-03-06 17:57     ` Stephane Chazelas
2024-03-06 19:45       ` Bart Schaefer
2024-03-06 22:22         ` Mikael Magnusson
2024-03-06 22:42           ` Bart Schaefer
2024-03-07  4:53           ` Bart Schaefer
2024-03-07  7:02             ` Lawrence Velázquez
2024-03-07  8:09               ` ${<file} (Was: [PATCH?] Nofork and removing newlines) Stephane Chazelas
2024-03-08  1:29               ` [PATCH?] Nofork and removing newlines Bart Schaefer
2024-03-08 22:15                 ` Oliver Kiddle
2024-03-08 23:28                   ` Bart Schaefer
2024-03-09 20:43                     ` Oliver Kiddle
2024-03-10  6:11                       ` Bart Schaefer
2024-03-12 17:54                         ` Bart Schaefer
2024-03-12 23:19                           ` Oliver Kiddle
2024-03-13  4:13                             ` Bart Schaefer
2024-03-14 22:15                               ` Oliver Kiddle
2024-03-15  8:42                                 ` Stephane Chazelas
2024-03-27  1:16                                   ` Bart Schaefer
2024-03-27  7:05                                 ` Bart Schaefer
2024-03-07  7:10             ` Stephane Chazelas
2024-03-08  0:37               ` Bart Schaefer
2024-03-07  6:52           ` Lawrence Velázquez
2024-03-07  8:26             ` Mikael Magnusson
2024-03-07 19:02               ` Bart Schaefer
2024-04-02  6:45                 ` Lawrence Velázquez
2024-03-06 19:43     ` Stephane Chazelas

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/