* shwordsplit: final non-whitespace IFS character problem @ 2017-08-04 2:03 Martijn Dekker 2017-08-04 10:56 ` Stephane Chazelas 2017-08-06 18:08 ` Peter Stephenson 0 siblings, 2 replies; 7+ messages in thread From: Martijn Dekker @ 2017-08-04 2:03 UTC (permalink / raw) To: Zsh hackers list In field/word splitting, a final non-whitespace IFS delimiter character is counted as an empty field. This is unlike every other current shell (including current versions of pdksh, i.e. mksh and OpenBSD ksh). Test script: setopt shwordsplit IFS=: x=a:b: set -- $x echo $# Expected output: 2 Actual output: 3 The POSIX standard appears pretty ambiguous on this: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05 https://osdn.jp/ticket/browse.php?group_id=3863&tid=35283#comment:3863:35283:1435293070 [click on "Show all old Histories" for complete discussion] However, it turns out that the intention of the standard was clarified back in 1995 (!) in: http://www.open-std.org/JTC1/SC22/WG15/docs/rr/9945-2/9945-2-98.html IOW, the intention of the POSIX standard is for IFS characters to be field terminators rather than separators (in spite of the S in IFS). Sorry if the timing of this is awkward (shortly before an upcoming release). Thanks, - M. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: shwordsplit: final non-whitespace IFS character problem 2017-08-04 2:03 shwordsplit: final non-whitespace IFS character problem Martijn Dekker @ 2017-08-04 10:56 ` Stephane Chazelas 2017-08-04 11:13 ` Peter Stephenson 2017-08-06 18:08 ` Peter Stephenson 1 sibling, 1 reply; 7+ messages in thread From: Stephane Chazelas @ 2017-08-04 10:56 UTC (permalink / raw) To: Martijn Dekker; +Cc: Zsh hackers list 2017-08-04 04:03:19 +0200, Martijn Dekker: > In field/word splitting, a final non-whitespace IFS delimiter character > is counted as an empty field. This is unlike every other current shell > (including current versions of pdksh, i.e. mksh and OpenBSD ksh). > > Test script: > > setopt shwordsplit > IFS=: > x=a:b: > set -- $x > echo $# [...] IIRC, it was discussed before, and the concensus at the time was that it would be silly as the S in IFS stands for *S*eparator, not terminator nor delimiter. At the time though, a number of shells (pdksh and ash based ones) still behaved like zsh and not ksh88 and POSIX was even less clear about it. Note that, it also affects "read", where "IFS=: read var" on an output that contains "word:" should (as per POSIX) store "word" into "var" (while it "should" store "word1:word2:" on a line with "word1:word2:"). I can't imagine changing the behaviour in the sh/ksh emulations would be a problem (though I still think it's silly). Note that with that semantic, you can no longer do things like: IFS=: set -o noglob for dir in $PATH; do ... done You need to use some kludge like the "which" POSIX script does on Debian to account for PATHs like "/bin:/usr/bin:". -- Stephane ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: shwordsplit: final non-whitespace IFS character problem 2017-08-04 10:56 ` Stephane Chazelas @ 2017-08-04 11:13 ` Peter Stephenson 0 siblings, 0 replies; 7+ messages in thread From: Peter Stephenson @ 2017-08-04 11:13 UTC (permalink / raw) To: Zsh hackers list On Fri, 4 Aug 2017 11:56:01 +0100 Stephane Chazelas <stephane.chazelas@gmail.com> wrote: > 2017-08-04 04:03:19 +0200, Martijn Dekker: > > In field/word splitting, a final non-whitespace IFS delimiter character > > is counted as an empty field. This is unlike every other current shell > > (including current versions of pdksh, i.e. mksh and OpenBSD ksh). > > > > Test script: > > > > setopt shwordsplit > > IFS=: > > x=a:b: > > set -- $x > > echo $# > [...] > > IIRC, it was discussed before, and the concensus at the time was > that it would be silly as the S in IFS stands for *S*eparator, not > terminator nor delimiter. Clearly it's not a good idea to change this in native mode. > I can't imagine changing the behaviour in the sh/ksh emulations > would be a problem (though I still think it's silly). If everyone else does it, we should follow there. I guess POSIX_STRINGS is as sensible an option to use as anything (though previous disclaimers on partial use of POSIX options apply). Not sure this is going to get done before the release. pws ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: shwordsplit: final non-whitespace IFS character problem 2017-08-04 2:03 shwordsplit: final non-whitespace IFS character problem Martijn Dekker 2017-08-04 10:56 ` Stephane Chazelas @ 2017-08-06 18:08 ` Peter Stephenson 2017-08-06 20:01 ` Peter Stephenson 1 sibling, 1 reply; 7+ messages in thread From: Peter Stephenson @ 2017-08-06 18:08 UTC (permalink / raw) To: Zsh hackers list On Fri, 4 Aug 2017 04:03:19 +0200 Martijn Dekker <martijn@inlv.org> wrote: > In field/word splitting, a final non-whitespace IFS delimiter character > is counted as an empty field. Hope this is good enough. I've taken account of the fact that when splitting "foo:bar::" one empty string is kept as it's not final. As far as I can see from bash, in the case of white space, terminating white space is also stripped (all of it, since the main difference here is it combines to make a single delimiter), but it's quite hard to generate a case to be sure with other shells making it pretty difficult to get the effect of splitting without removing empty words. So now % foo="one two three " % print -l "${=foo}" one two three % (setopt posixstrings; print -l "${=foo}"; ) one two three % which seems to agree with the following in bash: $ var="one two three " $ fn() { typeset f; for f in "$@"; do echo $f; done; } $ fn $var one two three $ but it's possible the space is being removed for some completely different reason. It would obviously be insane to make this the default behaviour. I hope the description of the option is suitably off-putting. pws diff --git a/Doc/Zsh/options.yo b/Doc/Zsh/options.yo index 70092d6..c0f07d7 100644 --- a/Doc/Zsh/options.yo +++ b/Doc/Zsh/options.yo @@ -2193,16 +2193,16 @@ cindex(discarding embedded nulls in $'...') cindex(embedded nulls, in $'...') cindex(nulls, embedded in $'...') item(tt(POSIX_STRINGS) <K> <S>)( -This option affects processing of quoted strings. Currently it only -affects the behaviour of null characters, i.e. character 0 in the -portable character set corresponding to US ASCII. +This option affects processing of quoted strings, and also +splitting of strngs. -When this option is not set, null characters embedded within strings -of the form tt($')var(...)tt(') are treated as ordinary characters. The -entire string is maintained within the shell and output to files where -necessary, although owing to restrictions of the library interface -the string is truncated at the null character in file names, environment -variables, or in arguments to external programs. +When this option is not set, null characters (character 0 in the +portable character set coresponding to US ASCII) that are embedded +within strings of the form tt($')var(...)tt(') are treated as ordinary +characters. The entire string is maintained within the shell and output +to files where necessary, although owing to restrictions of the libplary +interface the string is truncated at the null character in file name s, +environment variables, or in arguments to external programs. When this option is set, the tt($')var(...)tt(') expression is truncated at the null character. Note that remaining parts of the same string @@ -2211,6 +2211,19 @@ beyond the termination of the quotes are not truncated. For example, the command line argument tt(a$'b\0c'd) is treated with the option off as the characters tt(a), tt(b), null, tt(c), tt(d), and with the option on as the characters tt(a), tt(b), tt(d). + +Furthermore, when the option is set, a trailing separator followed by an +empty strins does not cause extra fields to be output when the string +is split. For example, + +example(var="foo bar " +print -l "${=var}") + +outputs a blank line at the end if tt(POSIXSTRINGS) is not set, but +no blank line if the option is set. Note the quotation marks as +empty elements would in any case be removed without their presence. +If the separator is not white space, only the final separator is +ignored in this fashion. ) pindex(POSIX_TRAPS) pindex(NO_POSIX_TRAPS) diff --git a/Src/utils.c b/Src/utils.c index 5055d69..5493317 100644 --- a/Src/utils.c +++ b/Src/utils.c @@ -3500,12 +3500,12 @@ skipwsep(char **s) mod_export char ** spacesplit(char *s, int allownull, int heap, int quote) { - char *t, **ret, **ptr; + char *t, **ret, **ptr, **eptr; int l = sizeof(*ret) * (wordcount(s, NULL, -!allownull) + 1); char *(*dup)(const char *) = (heap ? dupstring : ztrdup); /* ### TODO: s/calloc/alloc/ */ - ptr = ret = (char **) (heap ? hcalloc(l) : zshcalloc(l)); + eptr = ptr = ret = (char **) (heap ? hcalloc(l) : zshcalloc(l)); if (quote) { /* @@ -3537,6 +3537,7 @@ spacesplit(char *s, int allownull, int heap, int quote) if (s > t || allownull) { *ptr = (char *) (heap ? zhalloc((s - t) + 1) : zalloc((s - t) + 1)); + eptr = ptr; ztrncpy(*ptr++, t, s - t); } else *ptr++ = dup(nulstring); @@ -3545,6 +3546,20 @@ spacesplit(char *s, int allownull, int heap, int quote) } if (!allownull && t != s) *ptr++ = dup(""); + if (isset(POSIXSTRINGS) && ptr != eptr + 1) { + /* + * Trailing separators do not generate extra fields in POSIX. + * Note this is only the final separator --- if the + * immediately preceding field was null it is still counted. + * So just back up one. + */ + --ptr; + if (!heap) { + char **ret2 = realloc(ret, sizeof(*ret) * (ptr+1-ret)); + ptr -= ret-ret2; + ret = ret2; + } + } *ptr = NULL; return ret; } diff --git a/Test/E01options.ztst b/Test/E01options.ztst index f01d835..b394e7c 100644 --- a/Test/E01options.ztst +++ b/Test/E01options.ztst @@ -1339,3 +1339,44 @@ ?(anon):4: `break' active at end of function scope ?(anon):4: `break' active at end of function scope ?(anon):4: `break' active at end of function scope + + for opt in POSIX_STRINGS NO_POSIX_STRINGS; do + var="foo bar " + (setopt $opt; print -l X "${=var}" Y) + var="foo2::bar2:" + (setopt $opt; IFS=:; print -l X "${=var}" Y) + var="foo3:bar3::" + (setopt $opt; IFS=:; print -l X "${=var}" Y) + done +0:POSIX_STRINGS effect on final delimiters +>X +>foo +>bar +>Y +>X +>foo2 +> +>bar2 +>Y +>X +>foo3 +>bar3 +> +>Y +>X +>foo +>bar +> +>Y +>X +>foo2 +> +>bar2 +> +>Y +>X +>foo3 +>bar3 +> +> +>Y ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: shwordsplit: final non-whitespace IFS character problem 2017-08-06 18:08 ` Peter Stephenson @ 2017-08-06 20:01 ` Peter Stephenson 2017-08-08 10:56 ` Kamil Dudka 0 siblings, 1 reply; 7+ messages in thread From: Peter Stephenson @ 2017-08-06 20:01 UTC (permalink / raw) To: Zsh hackers list This uses some rephrasings from Daniel and also fails to leak in the case of non-heap allocation. pws diff --git a/Doc/Zsh/options.yo b/Doc/Zsh/options.yo index 70092d6..36bd939 100644 --- a/Doc/Zsh/options.yo +++ b/Doc/Zsh/options.yo @@ -2193,16 +2193,16 @@ cindex(discarding embedded nulls in $'...') cindex(embedded nulls, in $'...') cindex(nulls, embedded in $'...') item(tt(POSIX_STRINGS) <K> <S>)( -This option affects processing of quoted strings. Currently it only -affects the behaviour of null characters, i.e. character 0 in the -portable character set corresponding to US ASCII. +This option affects processing of quoted strings, and also +splitting of strngs. -When this option is not set, null characters embedded within strings -of the form tt($')var(...)tt(') are treated as ordinary characters. The -entire string is maintained within the shell and output to files where -necessary, although owing to restrictions of the library interface -the string is truncated at the null character in file names, environment -variables, or in arguments to external programs. +When this option is not set, null characters (character 0 in the +portable character set coresponding to US ASCII) that are embedded +within strings of the form tt($')var(...)tt(') are treated as ordinary +characters. The entire string is maintained within the shell and output +to files where necessary, although owing to restrictions of the library +interface the string is truncated at the null character in file names, +environment variables, or in arguments to external programs. When this option is set, the tt($')var(...)tt(') expression is truncated at the null character. Note that remaining parts of the same string @@ -2211,6 +2211,18 @@ beyond the termination of the quotes are not truncated. For example, the command line argument tt(a$'b\0c'd) is treated with the option off as the characters tt(a), tt(b), null, tt(c), tt(d), and with the option on as the characters tt(a), tt(b), tt(d). + +Furthermore, when the option is set, a trailing separator followed by an +empty strings does not cause extra fields to be produced when the string +is split. For example, + +example(var="foo bar " +print -l "${=var}") + +outputs a blank line at the end if tt(POSIXSTRINGS) is not set, but +no blank line if the option is set. Note that empty elements would in +any case be removed if quotation marks were not used. If the separator +is not white space, only the final separator is ignored in this fashion. ) pindex(POSIX_TRAPS) pindex(NO_POSIX_TRAPS) diff --git a/Src/utils.c b/Src/utils.c index 5055d69..7d8e98c 100644 --- a/Src/utils.c +++ b/Src/utils.c @@ -3500,12 +3500,12 @@ skipwsep(char **s) mod_export char ** spacesplit(char *s, int allownull, int heap, int quote) { - char *t, **ret, **ptr; + char *t, **ret, **ptr, **eptr; int l = sizeof(*ret) * (wordcount(s, NULL, -!allownull) + 1); char *(*dup)(const char *) = (heap ? dupstring : ztrdup); /* ### TODO: s/calloc/alloc/ */ - ptr = ret = (char **) (heap ? hcalloc(l) : zshcalloc(l)); + eptr = ptr = ret = (char **) (heap ? hcalloc(l) : zshcalloc(l)); if (quote) { /* @@ -3537,6 +3537,7 @@ spacesplit(char *s, int allownull, int heap, int quote) if (s > t || allownull) { *ptr = (char *) (heap ? zhalloc((s - t) + 1) : zalloc((s - t) + 1)); + eptr = ptr; ztrncpy(*ptr++, t, s - t); } else *ptr++ = dup(nulstring); @@ -3545,6 +3546,21 @@ spacesplit(char *s, int allownull, int heap, int quote) } if (!allownull && t != s) *ptr++ = dup(""); + if (isset(POSIXSTRINGS) && ptr != eptr + 1) { + /* + * Trailing separators do not generate extra fields in POSIX. + * Note this is only the final separator --- if the + * immediately preceding field was null it is still counted. + * So just back up one. + */ + --ptr; + if (!heap) { + char **ret2 = realloc(ret, sizeof(*ret) * (ptr+1-ret)); + ptr -= ret-ret2; + free(ret); + ret = ret2; + } + } *ptr = NULL; return ret; } diff --git a/Test/E01options.ztst b/Test/E01options.ztst index f01d835..b394e7c 100644 --- a/Test/E01options.ztst +++ b/Test/E01options.ztst @@ -1339,3 +1339,44 @@ ?(anon):4: `break' active at end of function scope ?(anon):4: `break' active at end of function scope ?(anon):4: `break' active at end of function scope + + for opt in POSIX_STRINGS NO_POSIX_STRINGS; do + var="foo bar " + (setopt $opt; print -l X "${=var}" Y) + var="foo2::bar2:" + (setopt $opt; IFS=:; print -l X "${=var}" Y) + var="foo3:bar3::" + (setopt $opt; IFS=:; print -l X "${=var}" Y) + done +0:POSIX_STRINGS effect on final delimiters +>X +>foo +>bar +>Y +>X +>foo2 +> +>bar2 +>Y +>X +>foo3 +>bar3 +> +>Y +>X +>foo +>bar +> +>Y +>X +>foo2 +> +>bar2 +> +>Y +>X +>foo3 +>bar3 +> +> +>Y ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: shwordsplit: final non-whitespace IFS character problem 2017-08-06 20:01 ` Peter Stephenson @ 2017-08-08 10:56 ` Kamil Dudka 2017-08-08 12:17 ` Peter Stephenson 0 siblings, 1 reply; 7+ messages in thread From: Kamil Dudka @ 2017-08-08 10:56 UTC (permalink / raw) To: Peter Stephenson; +Cc: zsh-workers On Sunday, August 06, 2017 21:01:10 Peter Stephenson wrote: > @@ -3545,6 +3546,21 @@ spacesplit(char *s, int allownull, int heap, int quote) } > if (!allownull && t != s) > *ptr++ = dup(""); > + if (isset(POSIXSTRINGS) && ptr != eptr + 1) { > + /* > + * Trailing separators do not generate extra fields in POSIX. > + * Note this is only the final separator --- if the > + * immediately preceding field was null it is still counted. > + * So just back up one. > + */ > + --ptr; > + if (!heap) { > + char **ret2 = realloc(ret, sizeof(*ret) * (ptr+1-ret)); > + ptr -= ret-ret2; > + free(ret); > + ret = ret2; > + } > + } > *ptr = NULL; > return ret; > } Is it correct when you free() the pointer that you previously gave to realloc()? Could not it cause a double free()? Kamil ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: shwordsplit: final non-whitespace IFS character problem 2017-08-08 10:56 ` Kamil Dudka @ 2017-08-08 12:17 ` Peter Stephenson 0 siblings, 0 replies; 7+ messages in thread From: Peter Stephenson @ 2017-08-08 12:17 UTC (permalink / raw) To: zsh-workers On Tue, 8 Aug 2017 12:56:40 +0200 Kamil Dudka <kdudka@redhat.com> wrote: > Is it correct when you free() the pointer that you previously gave to > realloc()? Could not it cause a double free()? You're right --- which will be why I originally didn't have the free there. Will need remembering if we resurrect this. Cheers pws ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-08-08 12:27 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-08-04 2:03 shwordsplit: final non-whitespace IFS character problem Martijn Dekker 2017-08-04 10:56 ` Stephane Chazelas 2017-08-04 11:13 ` Peter Stephenson 2017-08-06 18:08 ` Peter Stephenson 2017-08-06 20:01 ` Peter Stephenson 2017-08-08 10:56 ` Kamil Dudka 2017-08-08 12:17 ` Peter Stephenson
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).