zsh-workers
 help / color / mirror / code / Atom feed
* shwordsplit: final non-whitespace IFS character problem
@ 2017-08-04  2:03 Martijn Dekker
  2017-08-04 10:56 ` Stephane Chazelas
  2017-08-06 18:08 ` Peter Stephenson
  0 siblings, 2 replies; 7+ messages in thread
From: Martijn Dekker @ 2017-08-04  2:03 UTC (permalink / raw)
  To: Zsh hackers list

In field/word splitting, a final non-whitespace IFS delimiter character
is counted as an empty field. This is unlike every other current shell
(including current versions of pdksh, i.e. mksh and OpenBSD ksh).

Test script:

setopt shwordsplit
IFS=:
x=a:b:
set -- $x
echo $#

Expected output: 2
Actual output: 3

The POSIX standard appears pretty ambiguous on this:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05

https://osdn.jp/ticket/browse.php?group_id=3863&tid=35283#comment:3863:35283:1435293070
[click on "Show all old Histories" for complete discussion]

However, it turns out that the intention of the standard was clarified
back in 1995 (!) in:
http://www.open-std.org/JTC1/SC22/WG15/docs/rr/9945-2/9945-2-98.html

IOW, the intention of the POSIX standard is for IFS characters to be
field terminators rather than separators (in spite of the S in IFS).

Sorry if the timing of this is awkward (shortly before an upcoming release).

Thanks,

- M.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: shwordsplit: final non-whitespace IFS character problem
  2017-08-04  2:03 shwordsplit: final non-whitespace IFS character problem Martijn Dekker
@ 2017-08-04 10:56 ` Stephane Chazelas
  2017-08-04 11:13   ` Peter Stephenson
  2017-08-06 18:08 ` Peter Stephenson
  1 sibling, 1 reply; 7+ messages in thread
From: Stephane Chazelas @ 2017-08-04 10:56 UTC (permalink / raw)
  To: Martijn Dekker; +Cc: Zsh hackers list

2017-08-04 04:03:19 +0200, Martijn Dekker:
> In field/word splitting, a final non-whitespace IFS delimiter character
> is counted as an empty field. This is unlike every other current shell
> (including current versions of pdksh, i.e. mksh and OpenBSD ksh).
> 
> Test script:
> 
> setopt shwordsplit
> IFS=:
> x=a:b:
> set -- $x
> echo $#
[...]

IIRC, it was discussed before, and the concensus at the time was
that it would be silly as the S in IFS stands for *S*eparator, not
terminator nor delimiter. At the time though, a number of shells
(pdksh and ash based ones) still behaved like zsh and not ksh88
and POSIX was even less clear about it.

Note that, it also affects "read", where "IFS=: read var" on an
output that contains "word:" should (as per POSIX) store "word"
into "var" (while it "should" store "word1:word2:" on a line
with "word1:word2:").

I can't imagine changing the behaviour in the sh/ksh emulations
would be a problem (though I still think it's silly).

Note that with that semantic, you can no longer do things like:

IFS=:
set -o noglob
for dir in $PATH; do
  ...
done

You need to use some kludge like the "which" POSIX script does
on Debian to account for PATHs like "/bin:/usr/bin:".

-- 
Stephane


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: shwordsplit: final non-whitespace IFS character problem
  2017-08-04 10:56 ` Stephane Chazelas
@ 2017-08-04 11:13   ` Peter Stephenson
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Stephenson @ 2017-08-04 11:13 UTC (permalink / raw)
  To: Zsh hackers list

On Fri, 4 Aug 2017 11:56:01 +0100
Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> 2017-08-04 04:03:19 +0200, Martijn Dekker:
> > In field/word splitting, a final non-whitespace IFS delimiter character
> > is counted as an empty field. This is unlike every other current shell
> > (including current versions of pdksh, i.e. mksh and OpenBSD ksh).
> > 
> > Test script:
> > 
> > setopt shwordsplit
> > IFS=:
> > x=a:b:
> > set -- $x
> > echo $#
> [...]
> 
> IIRC, it was discussed before, and the concensus at the time was
> that it would be silly as the S in IFS stands for *S*eparator, not
> terminator nor delimiter.

Clearly it's not a good idea to change this in native mode.

> I can't imagine changing the behaviour in the sh/ksh emulations
> would be a problem (though I still think it's silly).

If everyone else does it, we should follow there.  I guess POSIX_STRINGS
is as sensible an option to use as anything (though previous disclaimers
on partial use of POSIX options apply).

Not sure this is going to get done before the release.

pws


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: shwordsplit: final non-whitespace IFS character problem
  2017-08-04  2:03 shwordsplit: final non-whitespace IFS character problem Martijn Dekker
  2017-08-04 10:56 ` Stephane Chazelas
@ 2017-08-06 18:08 ` Peter Stephenson
  2017-08-06 20:01   ` Peter Stephenson
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Stephenson @ 2017-08-06 18:08 UTC (permalink / raw)
  To: Zsh hackers list

On Fri, 4 Aug 2017 04:03:19 +0200
Martijn Dekker <martijn@inlv.org> wrote:
> In field/word splitting, a final non-whitespace IFS delimiter character
> is counted as an empty field.

Hope this is good enough.  I've taken account of the fact that when
splitting "foo:bar::" one empty string is kept as it's not final.

As far as I can see from bash, in the case of white space, terminating
white space is also stripped (all of it, since the main difference here
is it combines to make a single delimiter), but it's quite hard to
generate a case to be sure with other shells making it pretty difficult
to get the effect of splitting without removing empty words.  So now

% foo="one    two   three   "
% print -l "${=foo}"
one
two
three

% (setopt posixstrings; print -l "${=foo}"; )
one
two
three
% 

which seems to agree with the following in bash:

$ var="one   two   three    "
$ fn() { typeset f; for f in "$@"; do echo $f; done; }
$ fn $var
one
two
three
$ 

but it's possible the space is being removed for some completely
different reason.

It would obviously be insane to make this the default behaviour.  I hope
the description of the option is suitably off-putting.

pws

diff --git a/Doc/Zsh/options.yo b/Doc/Zsh/options.yo
index 70092d6..c0f07d7 100644
--- a/Doc/Zsh/options.yo
+++ b/Doc/Zsh/options.yo
@@ -2193,16 +2193,16 @@ cindex(discarding embedded nulls in $'...')
 cindex(embedded nulls, in $'...')
 cindex(nulls, embedded in $'...')
 item(tt(POSIX_STRINGS) <K> <S>)(
-This option affects processing of quoted strings.  Currently it only
-affects the behaviour of null characters, i.e. character 0 in the
-portable character set corresponding to US ASCII.
+This option affects processing of quoted strings, and also
+splitting of strngs.
 
-When this option is not set, null characters embedded within strings
-of the form tt($')var(...)tt(') are treated as ordinary characters. The
-entire string is maintained within the shell and output to files where
-necessary, although owing to restrictions of the library interface
-the string is truncated at the null character in file names, environment
-variables, or in arguments to external programs.
+When this option is not set, null characters (character 0 in the
+portable character set coresponding to US ASCII) that are embedded
+within strings of the form tt($')var(...)tt(') are treated as ordinary
+characters. The entire string is maintained within the shell and output
+to files where necessary, although owing to restrictions of the libplary
+interface the string is truncated at the null character in file name s,
+environment variables, or in arguments to external programs.
 
 When this option is set, the tt($')var(...)tt(') expression is truncated at
 the null character.  Note that remaining parts of the same string
@@ -2211,6 +2211,19 @@ beyond the termination of the quotes are not truncated.
 For example, the command line argument tt(a$'b\0c'd) is treated with
 the option off as the characters tt(a), tt(b), null, tt(c), tt(d),
 and with the option on as the characters tt(a), tt(b), tt(d).
+
+Furthermore, when the option is set, a trailing separator followed by an
+empty strins does not cause extra fields to be output when the string
+is split.  For example,
+
+example(var="foo bar "
+print -l "${=var}")
+
+outputs a blank line at the end if tt(POSIXSTRINGS) is not set, but
+no blank line if the option is set.  Note the quotation marks as
+empty elements would in any case be removed without their presence.
+If the separator is not white space, only the final separator is
+ignored in this fashion.
 )
 pindex(POSIX_TRAPS)
 pindex(NO_POSIX_TRAPS)
diff --git a/Src/utils.c b/Src/utils.c
index 5055d69..5493317 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -3500,12 +3500,12 @@ skipwsep(char **s)
 mod_export char **
 spacesplit(char *s, int allownull, int heap, int quote)
 {
-    char *t, **ret, **ptr;
+    char *t, **ret, **ptr, **eptr;
     int l = sizeof(*ret) * (wordcount(s, NULL, -!allownull) + 1);
     char *(*dup)(const char *) = (heap ? dupstring : ztrdup);
 
     /* ### TODO: s/calloc/alloc/ */
-    ptr = ret = (char **) (heap ? hcalloc(l) : zshcalloc(l));
+    eptr = ptr = ret = (char **) (heap ? hcalloc(l) : zshcalloc(l));
 
     if (quote) {
 	/*
@@ -3537,6 +3537,7 @@ spacesplit(char *s, int allownull, int heap, int quote)
 	if (s > t || allownull) {
 	    *ptr = (char *) (heap ? zhalloc((s - t) + 1) :
 		                     zalloc((s - t) + 1));
+	    eptr = ptr;
 	    ztrncpy(*ptr++, t, s - t);
 	} else
 	    *ptr++ = dup(nulstring);
@@ -3545,6 +3546,20 @@ spacesplit(char *s, int allownull, int heap, int quote)
     }
     if (!allownull && t != s)
 	*ptr++ = dup("");
+    if (isset(POSIXSTRINGS) && ptr != eptr + 1) {
+	/*
+	 * Trailing separators do not generate extra fields in POSIX.
+	 * Note this is only the final separator --- if the
+	 * immediately preceding field was null it is still counted.
+	 * So just back up one.
+	 */
+	--ptr;
+	if (!heap) {
+	    char **ret2 = realloc(ret, sizeof(*ret) * (ptr+1-ret));
+	    ptr -= ret-ret2;
+	    ret = ret2;
+	}
+    }
     *ptr = NULL;
     return ret;
 }
diff --git a/Test/E01options.ztst b/Test/E01options.ztst
index f01d835..b394e7c 100644
--- a/Test/E01options.ztst
+++ b/Test/E01options.ztst
@@ -1339,3 +1339,44 @@
 ?(anon):4: `break' active at end of function scope
 ?(anon):4: `break' active at end of function scope
 ?(anon):4: `break' active at end of function scope
+
+  for opt in POSIX_STRINGS NO_POSIX_STRINGS; do
+    var="foo bar "
+    (setopt $opt; print -l X "${=var}" Y)
+    var="foo2::bar2:"
+    (setopt $opt; IFS=:; print -l X "${=var}" Y)
+    var="foo3:bar3::"
+    (setopt $opt; IFS=:; print -l X "${=var}" Y)
+  done
+0:POSIX_STRINGS effect on final delimiters
+>X
+>foo
+>bar
+>Y
+>X
+>foo2
+>
+>bar2
+>Y
+>X
+>foo3
+>bar3
+>
+>Y
+>X
+>foo
+>bar
+>
+>Y
+>X
+>foo2
+>
+>bar2
+>
+>Y
+>X
+>foo3
+>bar3
+>
+>
+>Y


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: shwordsplit: final non-whitespace IFS character problem
  2017-08-06 18:08 ` Peter Stephenson
@ 2017-08-06 20:01   ` Peter Stephenson
  2017-08-08 10:56     ` Kamil Dudka
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Stephenson @ 2017-08-06 20:01 UTC (permalink / raw)
  To: Zsh hackers list

This uses some rephrasings from Daniel and also fails to leak in the
case of non-heap allocation.

pws

diff --git a/Doc/Zsh/options.yo b/Doc/Zsh/options.yo
index 70092d6..36bd939 100644
--- a/Doc/Zsh/options.yo
+++ b/Doc/Zsh/options.yo
@@ -2193,16 +2193,16 @@ cindex(discarding embedded nulls in $'...')
 cindex(embedded nulls, in $'...')
 cindex(nulls, embedded in $'...')
 item(tt(POSIX_STRINGS) <K> <S>)(
-This option affects processing of quoted strings.  Currently it only
-affects the behaviour of null characters, i.e. character 0 in the
-portable character set corresponding to US ASCII.
+This option affects processing of quoted strings, and also
+splitting of strngs.
 
-When this option is not set, null characters embedded within strings
-of the form tt($')var(...)tt(') are treated as ordinary characters. The
-entire string is maintained within the shell and output to files where
-necessary, although owing to restrictions of the library interface
-the string is truncated at the null character in file names, environment
-variables, or in arguments to external programs.
+When this option is not set, null characters (character 0 in the
+portable character set coresponding to US ASCII) that are embedded
+within strings of the form tt($')var(...)tt(') are treated as ordinary
+characters. The entire string is maintained within the shell and output
+to files where necessary, although owing to restrictions of the library
+interface the string is truncated at the null character in file names,
+environment variables, or in arguments to external programs.
 
 When this option is set, the tt($')var(...)tt(') expression is truncated at
 the null character.  Note that remaining parts of the same string
@@ -2211,6 +2211,18 @@ beyond the termination of the quotes are not truncated.
 For example, the command line argument tt(a$'b\0c'd) is treated with
 the option off as the characters tt(a), tt(b), null, tt(c), tt(d),
 and with the option on as the characters tt(a), tt(b), tt(d).
+
+Furthermore, when the option is set, a trailing separator followed by an
+empty strings does not cause extra fields to be produced when the string
+is split.  For example,
+
+example(var="foo bar "
+print -l "${=var}")
+
+outputs a blank line at the end if tt(POSIXSTRINGS) is not set, but
+no blank line if the option is set.  Note that empty elements would in
+any case be removed if quotation marks were not used.  If the separator
+is not white space, only the final separator is ignored in this fashion.
 )
 pindex(POSIX_TRAPS)
 pindex(NO_POSIX_TRAPS)
diff --git a/Src/utils.c b/Src/utils.c
index 5055d69..7d8e98c 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -3500,12 +3500,12 @@ skipwsep(char **s)
 mod_export char **
 spacesplit(char *s, int allownull, int heap, int quote)
 {
-    char *t, **ret, **ptr;
+    char *t, **ret, **ptr, **eptr;
     int l = sizeof(*ret) * (wordcount(s, NULL, -!allownull) + 1);
     char *(*dup)(const char *) = (heap ? dupstring : ztrdup);
 
     /* ### TODO: s/calloc/alloc/ */
-    ptr = ret = (char **) (heap ? hcalloc(l) : zshcalloc(l));
+    eptr = ptr = ret = (char **) (heap ? hcalloc(l) : zshcalloc(l));
 
     if (quote) {
 	/*
@@ -3537,6 +3537,7 @@ spacesplit(char *s, int allownull, int heap, int quote)
 	if (s > t || allownull) {
 	    *ptr = (char *) (heap ? zhalloc((s - t) + 1) :
 		                     zalloc((s - t) + 1));
+	    eptr = ptr;
 	    ztrncpy(*ptr++, t, s - t);
 	} else
 	    *ptr++ = dup(nulstring);
@@ -3545,6 +3546,21 @@ spacesplit(char *s, int allownull, int heap, int quote)
     }
     if (!allownull && t != s)
 	*ptr++ = dup("");
+    if (isset(POSIXSTRINGS) && ptr != eptr + 1) {
+	/*
+	 * Trailing separators do not generate extra fields in POSIX.
+	 * Note this is only the final separator --- if the
+	 * immediately preceding field was null it is still counted.
+	 * So just back up one.
+	 */
+	--ptr;
+	if (!heap) {
+	    char **ret2 = realloc(ret, sizeof(*ret) * (ptr+1-ret));
+	    ptr -= ret-ret2;
+	    free(ret);
+	    ret = ret2;
+	}
+    }
     *ptr = NULL;
     return ret;
 }
diff --git a/Test/E01options.ztst b/Test/E01options.ztst
index f01d835..b394e7c 100644
--- a/Test/E01options.ztst
+++ b/Test/E01options.ztst
@@ -1339,3 +1339,44 @@
 ?(anon):4: `break' active at end of function scope
 ?(anon):4: `break' active at end of function scope
 ?(anon):4: `break' active at end of function scope
+
+  for opt in POSIX_STRINGS NO_POSIX_STRINGS; do
+    var="foo bar "
+    (setopt $opt; print -l X "${=var}" Y)
+    var="foo2::bar2:"
+    (setopt $opt; IFS=:; print -l X "${=var}" Y)
+    var="foo3:bar3::"
+    (setopt $opt; IFS=:; print -l X "${=var}" Y)
+  done
+0:POSIX_STRINGS effect on final delimiters
+>X
+>foo
+>bar
+>Y
+>X
+>foo2
+>
+>bar2
+>Y
+>X
+>foo3
+>bar3
+>
+>Y
+>X
+>foo
+>bar
+>
+>Y
+>X
+>foo2
+>
+>bar2
+>
+>Y
+>X
+>foo3
+>bar3
+>
+>
+>Y


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: shwordsplit: final non-whitespace IFS character problem
  2017-08-06 20:01   ` Peter Stephenson
@ 2017-08-08 10:56     ` Kamil Dudka
  2017-08-08 12:17       ` Peter Stephenson
  0 siblings, 1 reply; 7+ messages in thread
From: Kamil Dudka @ 2017-08-08 10:56 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

On Sunday, August 06, 2017 21:01:10 Peter Stephenson wrote:
> @@ -3545,6 +3546,21 @@ spacesplit(char *s, int allownull, int heap, int quote) }
>      if (!allownull && t != s)
>  	*ptr++ = dup("");
> +    if (isset(POSIXSTRINGS) && ptr != eptr + 1) {
> +	/*
> +	 * Trailing separators do not generate extra fields in POSIX.
> +	 * Note this is only the final separator --- if the
> +	 * immediately preceding field was null it is still counted.
> +	 * So just back up one.
> +	 */
> +	--ptr;
> +	if (!heap) {
> +	    char **ret2 = realloc(ret, sizeof(*ret) * (ptr+1-ret));
> +	    ptr -= ret-ret2;
> +	    free(ret);
> +	    ret = ret2;
> +	}
> +    }
>      *ptr = NULL;
>      return ret;
>  }

Is it correct when you free() the pointer that you previously gave to
realloc()?  Could not it cause a double free()?

Kamil


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: shwordsplit: final non-whitespace IFS character problem
  2017-08-08 10:56     ` Kamil Dudka
@ 2017-08-08 12:17       ` Peter Stephenson
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Stephenson @ 2017-08-08 12:17 UTC (permalink / raw)
  To: zsh-workers

On Tue, 8 Aug 2017 12:56:40 +0200
Kamil Dudka <kdudka@redhat.com> wrote:
> Is it correct when you free() the pointer that you previously gave to
> realloc()?  Could not it cause a double free()?

You're right --- which will be why I originally didn't have the free
there.  Will need remembering if we resurrect this.

Cheers
pws


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-08-08 12:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-04  2:03 shwordsplit: final non-whitespace IFS character problem Martijn Dekker
2017-08-04 10:56 ` Stephane Chazelas
2017-08-04 11:13   ` Peter Stephenson
2017-08-06 18:08 ` Peter Stephenson
2017-08-06 20:01   ` Peter Stephenson
2017-08-08 10:56     ` Kamil Dudka
2017-08-08 12:17       ` Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).