zsh-users
* ${^var} and word splitting
From: Stephane Chazelas @ 2014-11-24  9:56 UTC
  To: Zsh hackers list

$ a='  1 2   3  '
$ print -l $=a
1
2
3
$ print -l x$^=a
x
x1
x2
x3
x
$ print -l x${^${=a}}
x1
x2
x3


Why the extra "x" lines with x$^=a ?

Same for the (s:sep:) or (f) expansion flags.

-- 
Stephane


* Re: ${^var} and word splitting
From: Peter Stephenson @ 2014-11-24 11:12 UTC
  To: Zsh hackers list

On Mon, 24 Nov 2014 09:56:37 +0000
Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> $ a='  1 2   3  '
> $ print -l $=a
> 1
> 2
> 3
> $ print -l x$^=a
> x
> x1
> x2
> x3
> x
> $ print -l x${^${=a}}
> x1
> x2
> x3
> 
> 
> Why the extra "x" lines with x$^=a ?

In the case of $^=a, the steps are

- split a.  There's whitespace at the start and end, so you get null
  elements corresponding to those.
- add the x's in front
- remove remaining null elements, but there aren't any.

With nested expansion, you get

- split a: same result
- remove null elements (before the x's get added).
- add the x's
- remove null elements for this level (but there aren't any more)

Cf.

$ print -l x${^"${(@)=a}"}
x
x1
x2
x3
x

which has been told explicitly to keep the null elements despite the nesting.
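The two orderings above can be illustrated outside zsh. This is a minimal POSIX sh sketch of the order of operations only, not of zsh's own code; the helpers split_a, prefix_each and drop_nulls are hypothetical, and split_a simply hard-codes the five elements ("", 1, 2, 3, "") that the split of a yields.

```sh
#!/bin/sh
# Hypothetical helpers modelling only the ORDER of the steps described above.

# split_a: the elements the split of a='  1 2   3  ' yields, one per line,
# including the null elements from the leading and trailing whitespace.
split_a() {
  printf '%s\n' '' 1 2 3 ''
}

# prefix_each PREFIX: read one element per line and print PREFIX+element.
prefix_each() {
  while IFS= read -r elem; do
    printf '%s%s\n' "$1" "$elem"
  done
}

# drop_nulls: remove null elements (empty lines) from the stream.
drop_nulls() {
  sed '/^$/d'
}

# x$^=a: split, add the x's, then remove nulls (there are none left).
caret_split() {
  split_a | prefix_each x | drop_nulls
}

# x${^${=a}}: split, remove nulls at the inner level, add the x's,
# then remove nulls at the outer level (there are none left).
nested_split() {
  split_a | drop_nulls | prefix_each x | drop_nulls
}

caret_split    # x, x1, x2, x3, x
nested_split   # x1, x2, x3
```

The only difference between the two functions is whether drop_nulls runs before or after prefix_each, which is the whole point: once an x has been added, an element is no longer null and survives removal.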

-- 
Peter Stephenson <p.stephenson@samsung.com>  Principal Software Engineer
Tel: +44 (0)1223 434724                Samsung Cambridge Solution Centre
St John's House, St John's Innovation Park, Cowley Road,
Cambridge, CB4 0DS, UK


* Re: ${^var} and word splitting
From: Stephane Chazelas @ 2014-11-24 15:26 UTC
  To: Peter Stephenson; +Cc: Zsh hackers list

2014-11-24 11:12:01 +0000, Peter Stephenson:
> On Mon, 24 Nov 2014 09:56:37 +0000
> Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> > $ a='  1 2   3  '
> > $ print -l $=a
> > 1
> > 2
> > 3
> > $ print -l x$^=a
> > x
> > x1
> > x2
> > x3
> > x
> > $ print -l x${^${=a}}
> > x1
> > x2
> > x3
> > 
> > 
> > Why the extra "x" lines with x$^=a ?
> 
> In the case of $^=a, the steps are
> 
> - split a.  There's whitespace start and end so you get null elements
>   corresponding to those.
> - add the x's in front
> - remove remaining null elements, but there aren't any.

OK, thanks. That's a difference from other shells I was not aware
of, and it does seem to be as documented.

The source of my confusion can be simplified to:

~$ a='  1 2   3     '
~$ printf '%s\n' "${=a}"

1
2
3

~$


In other shells, leading and trailing _IFS white space_ characters
are ignored as part of word splitting; in zsh they are not.
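For comparison, a minimal POSIX sh sketch (for a shell such as dash or bash, not zsh in its default mode) of the behaviour in other shells: with IFS set to a blank, leading and trailing blanks produce no fields at all. show_fields is a hypothetical helper; it runs in a subshell so that the IFS change and set -f (which keeps the unquoted expansion from also being globbed) do not leak out.

```sh
#!/bin/sh
# show_fields IFS STRING (hypothetical helper): split STRING on IFS the
# POSIX way and print each resulting field as [field], one per line.
# The ( ) body is a subshell, so IFS and set -f stay local to it.
show_fields() (
  IFS=$1
  set -f                  # disable globbing of the unquoted $2 below
  for field in $2; do     # unquoted: subject to IFS word splitting
    printf '[%s]\n' "$field"
  done
)

# Same string as in the zsh example: no empty fields at either end.
show_fields ' ' '  1 2   3     '
# [1]
# [2]
# [3]
```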

If I understand correctly, in zsh their removal is attributed
to null-argument removal in things like:

$ print -l $=a
1
2
3

But then it's not clear why they are removed there and not in:

a=':a::b:'
IFS=:
print -l $=a

?


-- 
Stephane


* Re: ${^var} and word splitting
From: Peter Stephenson @ 2014-11-24 15:55 UTC
  To: Zsh hackers list

On Mon, 24 Nov 2014 15:26:28 +0000
Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> If I understand correctly, in zsh the removing of those are
> accounted to null-removal in things like:
> 
> $ print -l $=a
> 1
> 2
> 3
> 
> But then it's not clear why they are removed there and not in:
> 
> a=':a::b:'
> IFS=:
> print -l $=a

I looked at the code and you're exactly right: it's not clear.  The
split function, sepsplit(), takes an argument, allownull, which I had
already noted in its comment that I didn't understand.  It determines
whether an argument produced by the split will be left empty or will
be set to something that has no effect except preventing the argument
from being removed later.  In the case in question it is zero.

Consequently it's easy to change the behaviour in the second case...
This doesn't cause any test failures.  Unless anyone has any ideas why
we do this, maybe we should simplify it like this?  If anyone does have
ideas, we should write a test for that case.

There's one other case in parameter substitution to do with assignment
within the substitution that presumably ought to be consistent.

diff --git a/Src/subst.c b/Src/subst.c
index 61aa1c1..17f35be 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -3322,7 +3322,7 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags)
 	    isarr = 0;
 	}
 	if (!ssub && (spbreak || spsep)) {
-	    aval = sepsplit(val, spsep, 0, 1);
+	    aval = sepsplit(val, spsep, 1, 1);
 	    if (!aval || !aval[0])
 		val = dupstring("");
 	    else if (!aval[1])

pws


* Re: ${^var} and word splitting
From: Bart Schaefer @ 2014-11-24 16:55 UTC
  To: Zsh hackers list

On Nov 24,  3:55pm, Peter Stephenson wrote:
} Subject: Re: ${^var} and word splitting
}
} On Mon, 24 Nov 2014 15:26:28 +0000
} Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
} > If I understand correctly, in zsh the removing of those are
} > accounted to null-removal in things like:
} > 
} > $ print -l $=a
} > 1
} > 2
} > 3
} > 
} > But then it's not clear why they are removed there and not in:
} > 
} > a=':a::b:'
} > IFS=:
} > print -l $=a
} 
} I looked at the code and you're exactly right: it's not clear.

Isn't it always the case that *whitespace* in IFS is treated differently
than non-whitespace?  E.g. consecutive whitespace is treated as a single
character, so (IFS=" " "a  b") is two words but (IFS=: "a::b") is three?

I'm not actually able to try examples at the moment so maybe I'm just not
following something.
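The distinction between whitespace and non-whitespace IFS characters can be checked in any POSIX shell such as dash or bash. A minimal sh sketch, using a hypothetical count_fields helper that splits its second argument on the IFS given as its first:

```sh
#!/bin/sh
# count_fields IFS STRING (hypothetical helper): print how many fields
# STRING is split into using IFS. The ( ) body is a subshell, so the
# IFS change and set -f stay local.
count_fields() (
  IFS=$1
  set -f          # no globbing of the unquoted expansion below
  set -- $2       # unquoted expansion: word splitting happens here
  echo "$#"
)

count_fields ' ' 'a  b'   # prints 2: consecutive whitespace counts once
count_fields ':' 'a::b'   # prints 3: each ':' delimits a field
```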


* Re: ${^var} and word splitting
From: Peter Stephenson @ 2014-11-24 17:22 UTC
  To: Zsh hackers list

On Mon, 24 Nov 2014 08:55:08 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Nov 24,  3:55pm, Peter Stephenson wrote:
> } Subject: Re: ${^var} and word splitting
> }
> } On Mon, 24 Nov 2014 15:26:28 +0000
> } Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> } > If I understand correctly, in zsh the removing of those are
> } > accounted to null-removal in things like:
> } > 
> } > $ print -l $=a
> } > 1
> } > 2
> } > 3
> } > 
> } > But then it's not clear why they are removed there and not in:
> } > 
> } > a=':a::b:'
> } > IFS=:
> } > print -l $=a
> } 
> } I looked at the code and you're exactly right: it's not clear.
> 
> Isn't it always the case that *whitespace* in IFS is treated differently
> than non-whitespace?  E.g. consecutive whitespace is treated as a single
> character, so (IFS=" " "a  b") is two words but (IFS=: "a::b") is three?

Yes, that's right, but what zsh is doing is a bit funny: as Stephane
notes, in other shells you don't get the null arguments in the first
place if the special whitespace rule is being followed; it's not a
question of whether they get removed later or not.  At least I think
so --- splitting is implicit in other shells so I may just not
be quite doing the equivalent.  Here's what I did in bash:

$ fn() { local arg; for arg in "$@"; do echo $arg; done; }
$ fn2() { fn $1; }
$ fn2 '    a    b    c   '
a
b
c
$

So that $1 argument to fn2 gets split in the way we're talking about,
while within fn we make sure we pick up every piece that's been
split from it.  This definitely looks different from zsh.

However, I think what you mention is indeed the source of the
difference Stephane noticed, because doubling a whitespace character in
IFS does have the documented effect of making it work the other way
(see the zshparam manual; this is without the patch, which is
obviously not a correct fix):


% a=' a b c '
% print $=a
a b c
% print -l $=a
a
b
c
% IFS='  ' # two spaces
% print -l $=a

a
b
c

%


So that explains the mysterious allownull, whatever the reason for
the implementation.

pws


* Re: ${^var} and word splitting
From: Stephane Chazelas @ 2014-11-24 21:18 UTC
  To: Peter Stephenson; +Cc: Zsh hackers list

2014-11-24 15:55:24 +0000, Peter Stephenson:
[...]
> Consequently it's easy to change the behaviour in the second case...
> This doesn't cause any test failures.  Unless anyone has any ideas why
> we do this, maybe we should simplify it like this?  If anyone does have
> ideas, we should write a test for that case.
[...]

I'd say no. People expect:

IFS=:
PATH=/bin::/usr/bin
setopt shwordsplit
set -- $PATH

to split $PATH into /bin, "" and /usr/bin. That's how all
shells except the Bourne shell behave (and what POSIX requires),
and it is the whole point of having _IFS white space_ in the
first place.

(BTW, POSIX also requires :/bin::/usr/bin: to be split into "",
"/bin", "" and "/usr/bin" (with no trailing "") because IFS
characters are field _delimiters_ there, not _separators_. I tend
to prefer the zsh way (also yash's and older versions of pdksh)
though.)
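Both splits can be reproduced in a POSIX-conforming shell such as dash or bash (not zsh, yash or older pdksh, which add the trailing ""). A minimal sh sketch with a hypothetical show_fields helper that prints each field in brackets, so that empty fields are visible:

```sh
#!/bin/sh
# show_fields IFS STRING (hypothetical helper): print each field of
# STRING, split on IFS, as [field], one per line. Subshell body keeps
# IFS and set -f local.
show_fields() (
  IFS=$1
  set -f                  # disable globbing of the unquoted $2 below
  for field in $2; do     # unquoted: subject to IFS word splitting
    printf '[%s]\n' "$field"
  done
)

show_fields ':' '/bin::/usr/bin'
# [/bin]
# []
# [/usr/bin]

show_fields ':' ':/bin::/usr/bin:'
# []
# [/bin]
# []
# [/usr/bin]
```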

What I don't like much is IFS white space (or x's with (s:x:))
being collapsed *but not removed from head and tail*.

The whole point of having /IFS white spaces/ was to split
strings the /natural/ way (like words in a text, like awk's
fields or like the Bourne shell did for any character of $IFS,
not just the whitespace ones). That means treating a sequence
of blanks as one *and* not letting leading and trailing blanks
create fields. With IFS=' :', a string like " : foo : bar : : baz  "
would be split into "", foo, bar, "" and baz.
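That is what POSIX shells such as dash and bash do when IFS contains both a blank and a non-blank character. A minimal sh sketch, assuming IFS=' :' is what is meant, with the same kind of hypothetical show_fields helper:

```sh
#!/bin/sh
# show_fields IFS STRING (hypothetical helper): print each field of
# STRING, split on IFS, as [field], one per line. Subshell body keeps
# IFS and set -f local.
show_fields() (
  IFS=$1
  set -f                  # disable globbing of the unquoted $2 below
  for field in $2; do     # unquoted: subject to IFS word splitting
    printf '[%s]\n' "$field"
  done
)

# Blanks around a ':' belong to that delimiter; leading/trailing blanks
# create no fields, but ':' at the start and ': :' still delimit "".
show_fields ' :' ' : foo : bar : : baz  '
# []
# [foo]
# [bar]
# []
# [baz]
```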

I don't see the point in doing one and not the other. IOW in:

~$ a=' a  b ' zsh -c 'print -l ${(s, ,)a}'
a
b
~$ a=' a  b ' zsh -c 'print -l "${(s, ,)a}"'

a
b

~$ a=' a  b ' zsh -c 'print -l "${(s, ,@)a}"'

a

b

~$


I'd rather 2 above behave either like 1 or 2. I'm fine with 1
and 3 behaving as they do now.

It may be too late to change the behaviour now, though I'd find
it hard to imagine people relying on "$=var" to make empty
arguments at the beginning and end but not in the middle.

-- 
Stephane


* Re: ${^var} and word splitting
From: Bart Schaefer @ 2014-11-25  7:49 UTC
  To: Zsh hackers list

On Nov 24,  9:18pm, Stephane Chazelas wrote:
}
} I don't see the point in doing one and not the other. IOW in:
} 
} ~$ a=' a  b ' zsh -c 'print -l ${(s, ,)a}'
} a
} b
} ~$ a=' a  b ' zsh -c 'print -l "${(s, ,)a}"'
} 
} a
} b
} 
} ~$ a=' a  b ' zsh -c 'print -l "${(s, ,@)a}"'
} 
} a
} 
} b
} 
} ~$
} 
} I'd rather 2 above behave either like 1 or 2.

(Working with the presumption you mean "like 1 or 3".)

This may go back to a misinterpretation of documentation -- there are a
lot of little things about zsh that got that way because e.g. examples
in the ksh88 documentation were implemented without completely knowing
what was BNF-style markup and what was actual syntax.

Nevertheless I think the intention was that #2 is "collapse consecutive
whitespace to a single space and then act like #3".

In any case it all depends on where you apply the (@):

% a=' a  b ' zsh -c 'print -l "${(@)${(s, ,)a}}"'
a
b
% 


} It may be too late to change the behaviour now, though I'd find
} it hard to imagine people relying on "$=var" to make empty
} arguments at the beginning and end but not in the middle.

I have the nagging suspicion there may be cases in the completion code
that expect exactly that ... or that have been programmed to work around
it and would need to be fixed if it changes.

-- 
Barton E. Schaefer


* Re: ${^var} and word splitting
From: Stephane Chazelas @ 2014-11-25 12:12 UTC
  To: Bart Schaefer; +Cc: Zsh hackers list

2014-11-24 23:49:31 -0800, Bart Schaefer:
[...]
> } ~$ a=' a  b ' zsh -c 'print -l ${(s, ,)a}'
> } a
> } b
> } ~$ a=' a  b ' zsh -c 'print -l "${(s, ,)a}"'
> } 
> } a
> } b
> } 
> } ~$ a=' a  b ' zsh -c 'print -l "${(s, ,@)a}"'
> } 
> } a
> } 
> } b
> } 
> } ~$
> } 
> } I'd rather 2 above behave either like 1 or [3].
[...]
> } It may be too late to change the behaviour now, though I'd find
> } it hard to imagine people relying on "$=var" to make empty
> } arguments at the beginning and end but not in the middle.
> 
> I have the nagging suspicion there may be cases in the completion code
> that expect exactly that ... or that have been programmed to work around
> it and would need to be fixed if it changes.
[...]

I'd be surprised if it were the case.

Anyway, if one wants 1, he can write it as 1, and if one wants
3, he can write it as 3.

So it's no big deal if 2 stays the way it is. It's just that
I don't find it intuitive or /consistent/.

2 is specific to zsh anyway. Other shells don't split inside
double quotes (except for ${array[@]}) and $^a is zsh-specific.

So it's not a question of compatibility with other shells.
zsh -o shwordsplit works like other shells where the behaviour
is defined in other shells.

-- 
Stephane

