Hi,

I was going to report a bug about a discrepancy in the handling of IFS, until I read what the POSIX standard says about it [1].

The example is this:

IFS=,
str='foo,bar,,roo,'
printf '"%s"\n' $str

In bash there's four fields, the last comma is ignored, in zsh there's five fields. In my system dash and ksh also output four fields, like bash.

However, this is what POSIX says:

3.b. Each occurrence in the input of an IFS character that is not
IFS white space, along with any adjacent IFS white space, shall
delimit a field, as described previously.

We ignore all the white space stuff (since we are not using white spaces), and thus:

Each occurrence in the input of an IFS character shall delimit a field.

In zsh each occurrence of a comma does delimit a field (4 commas, 5 fields), which to me is what POSIX says should happen.

So in this particular case it seems zsh is complying with POSIX (even in zsh mode), and all other shells are not.

So there's no bug (at least in zsh), I just wanted to let you know what I found, and see if you agreed with my interpretation.

Cheers.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05

--
Felipe Contreras
> On Mar 30, 2023, at 7:13 AM, Felipe Contreras <felipe.contreras@gmail.com> wrote:
>
> However, this is what POSIX says:
>
> 3.b. Each occurrence in the input of an IFS character that is not
> IFS white space, along with any adjacent IFS white space, shall
> delimit a field, as described previously.
>
> We ignore all the white space stuff (since we are not using white
> spaces), and thus:
>
> Each occurrence in the input of an IFS character shall delimit a field.
>
> In zsh each occurrence of a comma does delimit a field (4 commas, 5
> fields), which to me is what POSIX says should happen.
>
> So in this particular case it seems zsh is complying with POSIX (even
> in zsh mode), and all other shells are not.

Before the excerpt you quoted, XCU 2.6.5 says: “The shell shall treat each character of the IFS as a delimiter and use the delimiters as field terminators to split the results of parameter expansion, command substitution, and arithmetic expansion into fields.”

The bash/dash/ksh behavior is not unreasonable if the phrase “field terminators” is interpreted strictly.

In any case, I believe the standard intends to describe the ksh behavior:

https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu_chap02.html#tag_23_02_06_05

--
vq
On Thu, Mar 30, 2023 at 6:05 AM Lawrence Velázquez <larryv@zsh.org> wrote:
>
> On Mar 30, 2023, at 7:13 AM, Felipe Contreras <felipe.contreras@gmail.com> wrote:
>
> However, this is what POSIX says:
>
> 3.b. Each occurrence in the input of an IFS character that is not
> IFS white space, along with any adjacent IFS white space, shall
> delimit a field, as described previously.
>
> We ignore all the white space stuff (since we are not using white
> spaces), and thus:
>
> Each occurrence in the input of an IFS character shall delimit a field.
>
> In zsh each occurrence of a comma does delimit a field (4 commas, 5
> fields), which to me is what POSIX says should happen.
>
> So in this particular case it seems zsh is complying with POSIX (even
> in zsh mode), and all other shells are not.
>
>
> Before the excerpt you quoted, XCU 2.6.5 says: “The shell shall treat each character of the IFS as a delimiter and use the delimiters as field terminators to split the results of parameter expansion, command substitution, and arithmetic expansion into fields.”
>
> The bash/dash/ksh behavior is not unreasonable if the phrase “field terminators” is interpreted strictly.
>
> In any case, I believe the standard intends to describe the ksh behavior:
Yes, I was about to click send to point that out.
So if IFS contains terminators, and not separators, this should
generate 5 fields:
IFS=';'
str='foo;bar;;roo;;'
printf '"%s"\n' $str
For: 'foo;' 'bar;' ';' 'roo;' ';'
In which case bash is correct; zsh generates 6 fields, so it is not.
Seems weird that a variable called Internal Field Separator is not a
*separator*, but a terminator.
I'm changing the subject to reflect that.
Cheers.
--
Felipe Contreras
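
To make the 5-versus-6 count above concrete, here is a minimal sketch in portable sh that models the two readings by hand instead of relying on the running shell's own IFS splitting. The function names split_separators and split_terminators are hypothetical, the sketch handles a single non-whitespace delimiter only, and it illustrates the two interpretations rather than how any shell actually implements field splitting.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not taken from any shell's source.
# POSIX sh has no local variables, so rest/delim/field are globals here.

# split_separators STRING DELIM
# Separator reading: every delimiter sits between two fields,
# so N delimiters always yield N+1 fields.
split_separators() {
    rest=$1 delim=$2
    while :; do
        case $rest in
            *"$delim"*)
                field=${rest%%"$delim"*}    # text before the first delimiter
                printf '"%s"\n' "$field"
                rest=${rest#*"$delim"}      # drop that text and the delimiter
                ;;
            *)
                printf '"%s"\n' "$rest"     # the tail is always a field
                return 0
                ;;
        esac
    done
}

# split_terminators STRING DELIM
# Terminator reading with an optional final terminator: every delimiter
# ends a field, and a non-empty unterminated tail still counts as a field.
split_terminators() {
    rest=$1 delim=$2
    while :; do
        case $rest in
            *"$delim"*)
                field=${rest%%"$delim"*}
                printf '"%s"\n' "$field"
                rest=${rest#*"$delim"}
                ;;
            *)
                # an empty tail means the last field was already terminated
                [ -n "$rest" ] && printf '"%s"\n' "$rest"
                return 0
                ;;
        esac
    done
}

echo 'separator reading:'
split_separators 'foo;bar;;roo;;' ';'
echo 'terminator reading:'
split_terminators 'foo;bar;;roo;;' ';'
```

Under these assumptions the separator reading prints six quoted fields for 'foo;bar;;roo;;' (the last two empty) and the terminator reading prints five, which matches the counts given above for zsh and bash respectively.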
On 2023-03-30 05:10, Felipe Contreras wrote:
> Seems weird that a variable called Internal Field Separator is not a
> *separator*, but a terminator.
>
> I'm changing the subject to reflect that.
Just some unwanted commentary: Should one need to be a technical lawyer
to decide this? If one pointedly adds another
separator/terminator/delimiter/ender or whatever one might call it, one
has probably done so for a reason and that reason would almost
inevitably be that one intends to add another field even if empty. Thus
any shell that ignores such a character is throwing away syntax space and
acceding to the idea that characters in code can be ignored -- which
might in very limited situations be admissible but not very often. So
if zsh did other than it does and I crashed into that while writing
something, I'd foam at the mouth. So zsh is the good-guy here IMHO.
Practicality should trump legality almost every time.
This has been discussed before, e.g. workers/48498 about 2 years ago. There are even xfail tests in E03posix.ztst making note of it, added in workers/48560.
On Thu, Mar 30, 2023 at 8:49 AM Ray Andrews <rayandrews@eastlink.ca> wrote:
>
> On 2023-03-30 05:10, Felipe Contreras wrote:
> > Seems weird that a variable called Internal Field Separator is not a
> > *separator*, but a terminator.
> >
> > I'm changing the subject to reflect that.
> Just some unwanted commentary: Should one need to be a technical lawyer
> to decide this? If one pointedly adds another
> separator/terminator/delimiter/ender or whatever one might call it, one
> has probably done so for a reason and that reason would almost
> inevitably be that one intends to add another field even if empty. Thus
> any shell that ignores such a character is throwing away syntax space and
> acceding to the idea that characters in code can be ignored -- which
> might in very limited situations be admissible but not very often. So
> if zsh did other than it does and I crashed into that while writing
> something, I'd foam at the mouth. So zsh is the good-guy here IMHO.
> Practicality should trump legality almost every time.

Yeah, I agree zsh's behavior is much more useful, but I'm not talking
about zsh's behavior by default, but in sh mode.

If POSIX seems to specify terminators instead of separators, and
that's what most shells do, shouldn't zsh in sh mode do the same?

--
Felipe Contreras
On 2023-03-30 08:09, Felipe Contreras wrote:
> Yeah, I agree zsh's behavior is much more useful, but I'm not talking
> about zsh's behavior by default, but in sh mode.
>
Ah! Then I should be ranting ;-) It should be that way by default.
If in doubt, do the useful thing.
On Thu, Mar 30, 2023 at 8:58 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> This has been discussed before, e.g. workers/48498 about 2 years ago.
> There are even xfail tests in E03posix.ztst making note of it, added
> in workers/48560.
OK. But I don't see much of a conclusion.
Do you believe POSIX says these should be two fields? IFS=: str="a:b:"
POSIX does say the delimiter shall be considered a field terminator,
but str="a" has no field terminator, does that mean there's no valid
field? I understand from the point of view of processing strings in a
language like C it makes sense to consider an unterminated field
valid, but that's an assumption, POSIX doesn't specify that. It could
be considered that "return 0" is not a valid field (if it doesn't end
in a semicolon).
Or, one could assume POSIX meant in the case of str="a" the end of
string shall be considered an implicit terminator, but in that case
"a:b:" would have three fields, therefore making the terminators
identical to separators. In which case zsh is actually compatible with
POSIX.
So the options are:
1. zsh is compatible with POSIX
2. bash and other shells are compatible with POSIX
3. All are compatible since POSIX isn't clear
If POSIX isn't clear, then there's not much reason to implement
behavior just because other shells do it. But if you believe POSIX is
clear that the behavior of other shells is the correct one, then it
might make sense to implement it in sh mode.
Cheers.
--
Felipe Contreras
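
For anyone who wants to check which of these options their own shell follows, here is a small probe, offered as a sketch. The function name ifs_probe is made up for this example, and it only tests the single case str="a:b:" from the message above. It runs in a subshell so the caller's IFS and positional parameters are left alone.

```sh
#!/bin/sh
# ifs_probe: hypothetical helper that reports how the running shell treats
# a trailing non-whitespace IFS character. The ( ... ) body is a subshell,
# so IFS and the positional parameters are restored afterwards.
ifs_probe() (
    IFS=:
    str='a:b:'
    set -- $str    # unquoted on purpose so the shell performs field splitting
    case $# in
        2) echo "terminator ($# fields)" ;;
        3) echo "separator ($# fields)" ;;
        *) echo "no conclusion ($# fields)" ;;
    esac
)

ifs_probe
```

Going by the reports in this thread, bash, dash and ksh should land in the "terminator" branch. Note that zsh in its native mode does not split unquoted parameter expansions unless SH_WORD_SPLIT is set, so there the probe reaches the "no conclusion" branch; it needs to be run under sh emulation to say anything about the question raised here.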
Hi,

indeed, an original Bourne shell, ksh88, and ksh94 behave exactly like bash.
On the other hand, tcsh and the ultra-modern fish behave like zsh.

Thus, I don't see the need for changing zsh. Leave it as it is. It's in proud companionship.

Cheers
Tom

--- Original Message ---
From: Felipe Contreras <felipe.contreras@gmail.com>
Date: 30.03.2023 13:11:46
To: Zsh Users <zsh-users@zsh.org>
Subject: Discrepancy in IFS handling (zsh is POSIX compliant)

Hi,

I was going to report a bug about a discrepancy in the handling of IFS, until I read what the POSIX standard says about it [1].

The example is this:

IFS=,
str='foo,bar,,roo,'
printf '"%s"\n' $str

In bash there's four fields, the last comma is ignored, in zsh there's five fields. In my system dash and ksh also output four fields, like bash.

However, this is what POSIX says:

3.b. Each occurrence in the input of an IFS character that is not
IFS white space, along with any adjacent IFS white space, shall
delimit a field, as described previously.

We ignore all the white space stuff (since we are not using white spaces), and thus:

Each occurrence in the input of an IFS character shall delimit a field.

In zsh each occurrence of a comma does delimit a field (4 commas, 5 fields), which to me is what POSIX says should happen.

So in this particular case it seems zsh is complying with POSIX (even in zsh mode), and all other shells are not.

So there's no bug (at least in zsh), I just wanted to let you know what I found, and see if you agreed with my interpretation.

Cheers.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05

--
Felipe Contreras
On Thu, Mar 30, 2023 at 6:05 AM Lawrence Velázquez <larryv@zsh.org> wrote:
>
> On Mar 30, 2023, at 7:13 AM, Felipe Contreras <felipe.contreras@gmail.com> wrote:
>
> However, this is what POSIX says:
>
> 3.b. Each occurrence in the input of an IFS character that is not
> IFS white space, along with any adjacent IFS white space, shall
> delimit a field, as described previously.
>
> We ignore all the white space stuff (since we are not using white
> spaces), and thus:
>
> Each occurrence in the input of an IFS character shall delimit a field.
>
> In zsh each occurrence of a comma does delimit a field (4 commas, 5
> fields), which to me is what POSIX says should happen.
>
> So in this particular case it seems zsh is complying with POSIX (even
> in zsh mode), and all other shells are not.
>
>
> Before the excerpt you quoted, XCU 2.6.5 says: “The shell shall treat each character of the IFS as a delimiter and use the delimiters as field terminators to split the results of parameter expansion, command substitution, and arithmetic expansion into fields.”
I was just about to mention that, and I thought I replied to you, but
apparently not.
So if IFS contains terminators, and not separators, this should
generate 5 fields:
IFS=';'
str='foo;bar;;roo;;'
printf '"%s"\n' $str
For: 'foo;' 'bar;' ';' 'roo;' ';'
In which case bash is correct, and zsh is not.
But, 'foo' doesn't contain any terminators, so it does not contain any
field, and should be dropped. Unless 1) you consider the end of the
string as a terminator, or 2) consider the terminator of the last
field as optional.
If you consider the end of the string as a terminator (1), then 'foo;'
contains two fields, not one, in which case zsh is correct. This makes
the terminators behave identically to separators.
If you consider the terminator of the last field as optional, then
bash (and other shells) are correct, but in that case what's the point
of terminators if they aren't actually going to demarcate the
*termination* of fields?
I think everyone can agree POSIX is not clear about this.
--
Felipe Contreras
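
The three readings described above can be put side by side with a small counting sketch. Everything here is an assumption made for illustration: the function name count_fields and the reading labels strict, implicit and optional are invented, only one non-whitespace delimiter is handled, and the empty-string case is ignored. It does not reflect how any shell is implemented.

```sh
#!/bin/sh
# count_fields STRING DELIM READING   (hypothetical helper)
# READING selects how text after the last delimiter is treated:
#   strict   - only delimiter-terminated text counts as a field
#   implicit - end of string acts as one more terminator (same as separators)
#   optional - the last field's terminator may be omitted
count_fields() {
    rest=$1 delim=$2 reading=$3 count=0
    # count one field per delimiter, consuming the string as we go
    while :; do
        case $rest in
            *"$delim"*)
                count=$((count + 1))
                rest=${rest#*"$delim"}
                ;;
            *)
                break
                ;;
        esac
    done
    # decide what to do with whatever follows the last delimiter
    case $reading in
        strict)   ;;
        implicit) count=$((count + 1)) ;;
        optional) [ -n "$rest" ] && count=$((count + 1)) ;;
    esac
    echo "$count"
}

for r in strict implicit optional; do
    printf '%-8s  foo -> %s   foo; -> %s\n' "$r" \
        "$(count_fields 'foo' ';' "$r")" \
        "$(count_fields 'foo;' ';' "$r")"
done
```

With these definitions, 'foo' gives 0, 1 and 1 fields and 'foo;' gives 1, 2 and 1 fields for the strict, implicit and optional readings respectively. The implicit column is the separator behavior attributed to zsh in this thread, and the optional column is the behavior attributed to bash, dash and ksh.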
On Fri, Mar 31, 2023 at 10:38 AM Thomas Paulsen
<thomas.paulsen@firemail.de> wrote:
> indeed an original bourne shell, and ksh88 and ksh94 behave exactly like bash
> On the other hand, tcsh and the ultra modern fish behave like zsh.
>
> Thus, I don't see the need for changing zsh. Leave it as it is. It's in proud companionship.
Nobody is saying we should change zsh.
I'm saying if POSIX says the behavior shall be like the Bourne shell, then
perhaps zsh should do that in sh mode (not in zsh mode).
Cheers.
--
Felipe Contreras
On Fri, Mar 31, 2023, at 4:16 PM, Felipe Contreras wrote:
> I think everyone can agree POSIX is not clear about this.

I do agree. It looks like a situation where everyone involved was already familiar with the intended behavior (as per Chet Ramey [1]) and baked their assumptions into the drafted text, leaving it less explicit than it should have been.

For the curious, a bug report has been opened on the Austin Group defect tracker [2].

[1]: https://lists.gnu.org/archive/html/bug-bash/2023-03/msg00175.html
[2]: https://austingroupbugs.net/view.php?id=1649

--
vq