zsh-workers
* The "set" utility outputs binary data
@ 2015-12-03 14:05 Vincent Lefevre
  2015-12-03 14:25 ` Peter Stephenson
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Vincent Lefevre @ 2015-12-03 14:05 UTC (permalink / raw)
  To: zsh-workers

The "set" utility outputs binary data (probably due to escape
sequences for coloring and so on in some parameters such as
prompts):

zira:~> set | grep ZSH
Binary file (standard input) matches

Though I could use the non-standard -a grep option, this is annoying.
I think that by default, "set" should quote non-printable characters
(including invalid byte sequences, I assume). I don't think that this
is even forbidden by POSIX, which already requires the shell to quote
some characters so that the output is "suitable for reinput to the
shell".

This is also important when the output is on a terminal.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



* Re: The "set" utility outputs binary data
  2015-12-03 14:05 The "set" utility outputs binary data Vincent Lefevre
@ 2015-12-03 14:25 ` Peter Stephenson
  2015-12-04 14:29   ` Peter Stephenson
  2015-12-03 14:46 ` Stephane Chazelas
  2015-12-03 23:43 ` Daniel Shahaf
  2 siblings, 1 reply; 11+ messages in thread
From: Peter Stephenson @ 2015-12-03 14:25 UTC (permalink / raw)
  To: zsh-workers

On Thu, 3 Dec 2015 15:05:58 +0100
Vincent Lefevre <vincent@vinc17.net> wrote:
> I think that by default, "set" should quote non-printable characters
> (including invalid byte sequences, I assume).

It already does some sort of quoting, in fact.  It's probably just a
historical artifact that it doesn't do the latest and greatest sort.  So
I don't see any reason not to do this.  You could even consider this a
"minor feature".

pws



* Re: The "set" utility outputs binary data
  2015-12-03 14:05 The "set" utility outputs binary data Vincent Lefevre
  2015-12-03 14:25 ` Peter Stephenson
@ 2015-12-03 14:46 ` Stephane Chazelas
  2015-12-03 23:43 ` Daniel Shahaf
  2 siblings, 0 replies; 11+ messages in thread
From: Stephane Chazelas @ 2015-12-03 14:46 UTC (permalink / raw)
  To: zsh-workers

2015-12-03 15:05:58 +0100, Vincent Lefevre:
> The "set" utility outputs binary data (probably due to escape
> sequences for coloring and so on in some parameters such as
> prompts):
> 
> zira:~> set | grep ZSH
> Binary file (standard input) matches
[...]

It's the NUL byte in $IFS that's causing that.
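
As a rough illustration of why a single NUL byte matters, here is a minimal C sketch of the kind of check a text tool might apply before deciding its input is binary. It assumes the tool flags any input containing a NUL byte, which is a simplification of what grep really does; `contains_nul` is a hypothetical helper, not part of zsh or grep. The sample value mirrors zsh's default IFS (space, tab, newline, NUL).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the first len bytes of buf
 * include at least one NUL byte, else 0.  A tool using such a
 * heuristic would treat a positive result as "binary input".
 */
int contains_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}
```

Called on the four bytes of the default IFS value the helper reports a NUL, whereas a line such as "ZSH_VERSION=5.2" would pass as text.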

-- 
Stephane



* Re: The "set" utility outputs binary data
  2015-12-03 14:05 The "set" utility outputs binary data Vincent Lefevre
  2015-12-03 14:25 ` Peter Stephenson
  2015-12-03 14:46 ` Stephane Chazelas
@ 2015-12-03 23:43 ` Daniel Shahaf
  2 siblings, 0 replies; 11+ messages in thread
From: Daniel Shahaf @ 2015-12-03 23:43 UTC (permalink / raw)
  To: Vincent Lefevre; +Cc: zsh-workers

Vincent Lefevre wrote on Thu, Dec 03, 2015 at 15:05:58 +0100:
> The "set" utility outputs binary data (probably due to escape
> sequences for coloring and so on in some parameters such as
> prompts):
...
> This is also important when the output is on a terminal.

History expansions have a similar issue: after running
.
    bindkey ^T f
    (where ^T is a control character, inputted as ^V^T)
.
and then issuing a history expansion
    !!
the ^T is rendered, not as the two characters "^" "T" in reverse video,
but as a literal ^T, which my terminal renders as a box with "0 0 1 6"
inside it (the codepoint).

Printing an easier-to-read representation would be nice, although of
course it's a minor issue.



* Re: The "set" utility outputs binary data
  2015-12-03 14:25 ` Peter Stephenson
@ 2015-12-04 14:29   ` Peter Stephenson
  2015-12-04 21:56     ` Peter Stephenson
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Stephenson @ 2015-12-04 14:29 UTC (permalink / raw)
  To: zsh-workers

On Thu, 3 Dec 2015 14:25:33 +0000
Peter Stephenson <p.stephenson@samsung.com> wrote:
> On Thu, 3 Dec 2015 15:05:58 +0100
> Vincent Lefevre <vincent@vinc17.net> wrote:
> > I think that by default, "set" should quote non-printable characters
> > (including invalid byte sequences, I assume).
> 
> It already does some sort of quoting, in fact.  It's probably just a
> historical artifact that it doesn't do the latest and greatest sort.

It looks like the strategy would be to upgrade quotedzputs() to
interact better with nicezputs() and nicechar().  The code that's not
there at the moment is to pick the right sort of quotes, and you only
know that after the event at the moment, so the interface to those two
needs expanding.

I'd propose not bothering to do this in the case where multibyte mode
isn't available (i.e. is not even compiled in).  It's not useful enough
and wouldn't get much testing.
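
The "pick the right sort of quotes" problem can be sketched as a two-pass approach: scan the string first to learn whether any byte needs an escape, then emit it either verbatim or inside $'...'. The following is only a byte-oriented sketch under stated assumptions, not the zsh implementation: `needs_dollar_quotes` and `quote_bytes` are hypothetical names, unprintable bytes are written as three-digit octal escapes rather than zsh's "nice" forms, multibyte characters are ignored, and ordinary shell-special characters (which quotedzputs() handles separately) are not considered.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Pass 1: return 1 if any byte would need an escape sequence. */
int needs_dollar_quotes(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (!isprint((unsigned char)s[i]))
            return 1;
    return 0;
}

/*
 * Pass 2: write s to out, verbatim if every byte is printable,
 * otherwise as $'...' with escapes.  The caller must provide at
 * least 4 * len + 4 bytes in out (worst case: every byte becomes
 * a four-character octal escape, plus "$'", "'" and the NUL).
 */
void quote_bytes(const char *s, size_t len, char *out)
{
    size_t i;

    if (!needs_dollar_quotes(s, len)) {
        memcpy(out, s, len);
        out[len] = '\0';
        return;
    }
    out += sprintf(out, "$'");
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];

        if (c == '\n')
            out += sprintf(out, "\\n");
        else if (c == '\t')
            out += sprintf(out, "\\t");
        else if (c == '\'' || c == '\\')
            out += sprintf(out, "\\%c", c);   /* protect quote and backslash */
        else if (isprint(c))
            *out++ = (char)c;
        else
            out += sprintf(out, "\\%03o", c); /* always three octal digits */
    }
    strcpy(out, "'");
}
```

Scanning before writing avoids having to discover halfway through the output that a different quoting style was needed.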

pws



* Re: The "set" utility outputs binary data
  2015-12-04 14:29   ` Peter Stephenson
@ 2015-12-04 21:56     ` Peter Stephenson
  2015-12-06 23:08       ` Bart Schaefer
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Stephenson @ 2015-12-04 21:56 UTC (permalink / raw)
  To: zsh-workers

On Fri, 04 Dec 2015 14:29:00 +0000
Peter Stephenson <p.stephenson@samsung.com> wrote:
> It looks like the strategy would be to upgrade quotedzputs() to
> interact better with nicezputs() and nicechar().  The code that's not
> there at the moment is to pick the right sort of quotes, and you only
> know that after the event at the moment, so the interface to those two
> needs expanding.
> 
> I'd propose not bothering to do this in the case where multibyte mode
> isn't available (i.e. is not even compiled in).  It's not useful enough
> and wouldn't get much testing.

This seems to be going the right way; let me know of any oddities or
unwanted side effects.  Note a few "nice" representations have changed
to fit $'..' conventions.

pws

diff --git a/Src/utils.c b/Src/utils.c
index ca810de..d131383 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -411,7 +411,7 @@ putshout(int c)
 mod_export char *
 nicechar(int c)
 {
-    static char buf[6];
+    static char buf[10];
     char *s = buf;
     c &= 0xff;
     if (ZISPRINT(c))
@@ -427,7 +427,9 @@ nicechar(int c)
 	    goto done;
     }
     if (c == 0x7f) {
-	*s++ = '^';
+	*s++ = '\\';
+	*s++ = 'C';
+	*s++ = '-';
 	c = '?';
     } else if (c == '\n') {
 	*s++ = '\\';
@@ -436,7 +438,9 @@ nicechar(int c)
 	*s++ = '\\';
 	c = 't';
     } else if (c < 0x20) {
-	*s++ = '^';
+	*s++ = '\\';
+	*s++ = 'C';
+	*s++ = '-';
 	c += 0x40;
     }
     done:
@@ -455,6 +459,22 @@ nicechar(int c)
     return buf;
 }
 
+/*
+ * Return 1 if nicechar() would reformat this character.
+ */
+
+/**/
+mod_export int
+is_nicechar(int c)
+{
+    c &= 0xff;
+    if (ZISPRINT(c))
+	return 0;
+    if (c & 0x80)
+	return !isset(PRINTEIGHTBIT);
+    return (c == 0x7f || c == '\n' || c == '\t' || c < 0x20);
+}
+
 /**/
 #ifdef MULTIBYTE_SUPPORT
 static mbstate_t mb_shiftstate;
@@ -532,7 +552,9 @@ wcs_nicechar(wchar_t c, size_t *widthp, char **swidep)
     s = buf;
     if (!iswprint(c) && (c < 0x80 || !isset(PRINTEIGHTBIT))) {
 	if (c == 0x7f) {
-	    *s++ = '^';
+	    *s++ = '\\';
+	    *s++ = 'C';
+	    *s++ = '-';
 	    c = '?';
 	} else if (c == L'\n') {
 	    *s++ = '\\';
@@ -541,7 +563,9 @@ wcs_nicechar(wchar_t c, size_t *widthp, char **swidep)
 	    *s++ = '\\';
 	    c = 't';
 	} else if (c < 0x20) {
-	    *s++ = '^';
+	    *s++ = '\\';
+	    *s++ = 'C';
+	    *s++ = '-';
 	    c += 0x40;
 	} else if (c >= 0x80) {
 	    ret = -1;
@@ -611,6 +635,23 @@ wcs_nicechar(wchar_t c, size_t *widthp, char **swidep)
     return buf;
 }
 
+/*
+ * Return 1 if wcs_nicechar() would reformat this character for display.
+ */
+
+/**/
+mod_export int is_wcs_nicechar(wchar_t c)
+{
+    if (!iswprint(c) && (c < 0x80 || !isset(PRINTEIGHTBIT))) {
+	if (c == 0x7f || c == L'\n' || c == L'\t' || c < 0x20)
+	    return 1;
+	if (c >= 0x80) {
+	    return (c >= 0x100);
+	}
+    }
+    return 0;
+}
+
 /**/
 mod_export int
 zwcwidth(wint_t wc)
@@ -4834,12 +4875,15 @@ niceztrlen(char const *s)
  * If outstrp is not NULL, set *outstrp to a zalloc'd version of
  * the output (still metafied).
  *
- * If "heap" is non-zero, use the heap for *outstrp, else zalloc.
+ * If flags contains NICEFLAG_HEAP, use the heap for *outstrp, else
+ * zalloc.
+ * If flags contains NICEFLAG_QUOTE, the output is going to be within
+ * $'...', so quote "'" with a backslash.
  */
 
 /**/
 mod_export size_t
-mb_niceformat(const char *s, FILE *stream, char **outstrp, int heap)
+mb_niceformat(const char *s, FILE *stream, char **outstrp, int flags)
 {
     size_t l = 0, newl;
     int umlen, outalloc, outleft, eol = 0;
@@ -4886,7 +4930,10 @@ mb_niceformat(const char *s, FILE *stream, char **outstrp, int heap)
 	    cnt = 1;
 	    /* FALL THROUGH */
 	default:
-	    fmt = wcs_nicechar(c, &newl, NULL);
+	    if (c == L'\'' && (flags & NICEFLAG_QUOTE))
+		fmt = "\\'";
+	    else
+		fmt = wcs_nicechar(c, &newl, NULL);
 	    break;
 	}
 
@@ -4920,13 +4967,71 @@ mb_niceformat(const char *s, FILE *stream, char **outstrp, int heap)
     if (outstrp) {
 	*outptr = '\0';
 	/* Use more efficient storage for returned string */
-	*outstrp = heap ? dupstring(outstr) : ztrdup(outstr);
+	*outstrp = (flags & NICEFLAG_HEAP) ? dupstring(outstr) : ztrdup(outstr);
 	free(outstr);
     }
 
     return l;
 }
 
+/*
+ * Return 1 if mb_niceformat() would reformat this string, else 0.
+ */
+
+/**/
+mod_export int
+is_mb_niceformat(const char *s)
+{
+    int umlen, eol = 0, ret = 0;
+    wchar_t c;
+    char *ums, *ptr;
+    mbstate_t mbs;
+
+    ums = ztrdup(s);
+    untokenize(ums);
+    ptr = unmetafy(ums, &umlen);
+
+    memset(&mbs, 0, sizeof mbs);
+    while (umlen > 0) {
+	size_t cnt = eol ? MB_INVALID : mbrtowc(&c, ptr, umlen, &mbs);
+
+	switch (cnt) {
+	case MB_INCOMPLETE:
+	    eol = 1;
+	    /* FALL THROUGH */
+	case MB_INVALID:
+	    /* The byte didn't convert, so output it as a \M-... sequence. */
+	    if (is_nicechar(*ptr))  {
+		ret = 1;
+		break;
+	    }
+	    cnt = 1;
+	    /* Get mbs out of its undefined state. */
+	    memset(&mbs, 0, sizeof mbs);
+	    break;
+	case 0:
+	    /* Careful:  converting '\0' returns 0, but a '\0' is a
+	     * real character for us, so we should consume 1 byte. */
+	    cnt = 1;
+	    /* FALL THROUGH */
+	default:
+	    if (is_wcs_nicechar(c))
+		ret = 1;
+	    break;
+	}
+
+	if (ret)
+	    break;
+
+	umlen -= cnt;
+	ptr += cnt;
+    }
+
+    free(ums);
+
+    return ret;
+}
+
 /* ztrdup multibyte string with nice formatting */
 
 /**/
@@ -4935,7 +5040,7 @@ nicedup(const char *s, int heap)
 {
     char *retstr;
 
-    (void)mb_niceformat(s, NULL, &retstr, heap);
+    (void)mb_niceformat(s, NULL, &retstr, heap ? NICEFLAG_HEAP : 0);
 
     return retstr;
 }
@@ -5717,22 +5822,35 @@ quotestring(const char *s, char **e, int instring)
 /* Unmetafy and output a string, quoted if it contains special characters. */
 
 /**/
-mod_export int
+mod_export void
 quotedzputs(char const *s, FILE *stream)
 {
     int inquote = 0, c;
 
     /* check for empty string */
-    if(!*s)
-	return fputs("''", stream);
+    if(!*s) {
+	fputs("''", stream);
+	return;
+    }
 
-    if (!hasspecial(s))
-	return zputs(s, stream);
+#ifdef MULTIBYTE_SUPPORT
+    if (is_mb_niceformat(s)) {
+	fputs("$'", stream);
+	mb_niceformat(s, stream, NULL, NICEFLAG_QUOTE);
+	fputc('\'', stream);
+	return;
+    }
+#endif /* MULTIBYTE_SUPPORT */
+
+    if (!hasspecial(s)) {
+	zputs(s, stream);
+	return;
+    }
 
     if (isset(RCQUOTES)) {
 	/* use rc-style quotes-within-quotes for the whole string */
 	if(fputc('\'', stream) < 0)
-	    return EOF;
+	    return;
 	while(*s) {
 	    if (*s == Meta)
 		c = *++s ^ 32;
@@ -5741,16 +5859,16 @@ quotedzputs(char const *s, FILE *stream)
 	    s++;
 	    if (c == '\'') {
 		if(fputc('\'', stream) < 0)
-		    return EOF;
+		    return;
 	    } else if(c == '\n' && isset(CSHJUNKIEQUOTES)) {
 		if(fputc('\\', stream) < 0)
-		    return EOF;
+		    return;
 	    }
 	    if(fputc(c, stream) < 0)
-		return EOF;
+		return;
 	}
 	if(fputc('\'', stream) < 0)
-	    return EOF;
+	    return;
     } else {
 	/* use Bourne-style quoting, avoiding empty quoted strings */
 	while(*s) {
@@ -5762,31 +5880,30 @@ quotedzputs(char const *s, FILE *stream)
 	    if (c == '\'') {
 		if(inquote) {
 		    if(fputc('\'', stream) < 0)
-			return EOF;
+			return;
 		    inquote=0;
 		}
 		if(fputs("\\'", stream) < 0)
-		    return EOF;
+		    return;
 	    } else {
 		if (!inquote) {
 		    if(fputc('\'', stream) < 0)
-			return EOF;
+			return;
 		    inquote=1;
 		}
 		if(c == '\n' && isset(CSHJUNKIEQUOTES)) {
 		    if(fputc('\\', stream) < 0)
-			return EOF;
+			return;
 		}
 		if(fputc(c, stream) < 0)
-		    return EOF;
+		    return;
 	    }
 	}
 	if (inquote) {
 	    if(fputc('\'', stream) < 0)
-		return EOF;
+		return;
 	}
     }
-    return 0;
 }
 
 /* Double-quote a metafied string. */
diff --git a/Src/zsh.h b/Src/zsh.h
index d3bfcef..caf7def 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -3051,6 +3051,12 @@ enum {
 #define AFTERTRAPHOOK  (zshhooks + 2)
 
 #ifdef MULTIBYTE_SUPPORT
+/* Final argument to mb_niceformat() */
+enum {
+    NICEFLAG_HEAP = 1,		/* Heap allocation where needed */
+    NICEFLAG_QUOTE = 2,		/* Result will appear in $'...' */
+};
+
 /* Metafied input */
 #define nicezputs(str, outs)	(void)mb_niceformat((str), (outs), NULL, 0)
 #define MB_METACHARINIT()	mb_charinit()



* Re: The "set" utility outputs binary data
  2015-12-04 21:56     ` Peter Stephenson
@ 2015-12-06 23:08       ` Bart Schaefer
  2015-12-07 10:24         ` Peter Stephenson
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Schaefer @ 2015-12-06 23:08 UTC (permalink / raw)
  To: zsh-workers

On Dec 4,  9:56pm, Peter Stephenson wrote:
}
} This seems to be going the right way; let me know of any oddities or
} unwanted side effects.  Note a few "nice" representations have changed
} to fit $'..' conventions.

I'll withhold judgment on whether I like this, though I must say I do
prefer "^C" to "\C-c".  In any case though, this has borked two of the
test scripts:

./D04parameter.ztst: starting.
Binary files /tmp/zsh.ztst.out.21054 and /tmp/zsh.ztst.tout.21054 differ
Test ./D04parameter.ztst failed: output differs from expected as shown above
for:
  foo=$'\x7f\x00'
  print ${(V)foo}
Was testing: ${(V)...}
./D04parameter.ztst: test failed.

./V09datetime.ztst: starting.
Test case skipped: Japanese UTF-8 locale not supported
Binary files /tmp/zsh.ztst.out.23103 and /tmp/zsh.ztst.tout.23103 differ
Test ./V09datetime.ztst failed: output differs from expected as shown above
for:
  print ${(V)"$(strftime $'%Y\0%m\0%d' 100000000)"}
Was testing: Embedded nulls
./V09datetime.ztst: test failed.



* Re: The "set" utility outputs binary data
  2015-12-06 23:08       ` Bart Schaefer
@ 2015-12-07 10:24         ` Peter Stephenson
  2015-12-07 18:13           ` Bart Schaefer
  2015-12-07 18:29           ` Nikolay Aleksandrovich Pavlov (ZyX)
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Stephenson @ 2015-12-07 10:24 UTC (permalink / raw)
  To: zsh-workers

On Sun, 06 Dec 2015 15:08:44 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Dec 4,  9:56pm, Peter Stephenson wrote:
> }
> } This seems to be going the right way; let me know of any oddities or
> } unwanted side effects.  Note a few "nice" representations have changed
> } to fit $'..' conventions.
> 
> I'll withhold judgment on whether I like this, though I must say I do
> prefer "^C" to "\C-c".

If this is important enough, we can arrange two versions of nicechar /
wcs_nicechar.  The quickest change would be expanding the functions under
different names to take a flag when passed from mb_niceformat(),
together with a shim layer for other cases.
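
The flag-plus-shim idea can be sketched for plain 7-bit characters as follows. This is a simplified illustration under stated assumptions, not the code that went into zsh: `nice_ascii_sel` and `nice_ascii` are hypothetical names, the high bit is simply masked off, and metafication and multibyte handling are left out.

```c
#include <stdio.h>

/*
 * Render one 7-bit character visibly.  If quotable is non-zero,
 * control characters use the \C-X form that can be read back inside
 * $'...'; otherwise they use the shorter ^X form meant for humans.
 * The result lives in a static buffer overwritten by the next call.
 */
const char *nice_ascii_sel(int c, int quotable)
{
    static char buf[8];

    c &= 0x7f;                       /* sketch only: ignore the high bit */
    if (c == '\n') {
        snprintf(buf, sizeof buf, "\\n");
    } else if (c == '\t') {
        snprintf(buf, sizeof buf, "\\t");
    } else if (c == 0x7f || c < 0x20) {
        /* DEL is shown as '?', other controls as '@' through '_' */
        int shown = (c == 0x7f) ? '?' : c + 0x40;

        if (quotable)
            snprintf(buf, sizeof buf, "\\C-%c", shown);
        else
            snprintf(buf, sizeof buf, "^%c", shown);
    } else {
        buf[0] = (char)c;
        buf[1] = '\0';
    }
    return buf;
}

/* Shim: existing callers keep the short form without passing a flag. */
const char *nice_ascii(int c)
{
    return nice_ascii_sel(c, 0);
}
```

With this split, only the quoting path needs to pass the flag; every other caller goes through the shim and sees the old output.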

(You can't seriously be complaining that "typeset -m IFS" now outputs

IFS=$' \t\n\C-@'

instead of raw binary, can you?)

> In any case though, this has borked two of the test scripts:

Not sure what I can have been testing.

pws

diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst
index a3c5d71..2b46e06 100644
--- a/Test/D04parameter.ztst
+++ b/Test/D04parameter.ztst
@@ -396,9 +396,9 @@
 >Instead Here I Am Stuck By The Computer
 
   foo=$'\x7f\x00'
-  print ${(V)foo}
+  print -r -- ${(V)foo}
 0:${(V)...}
->^?^@
+>\C-?\C-@
 
   foo='playing '\''stupid'\'' "games" \w\i\t\h $quoting.'
   print -r ${(q)foo}
diff --git a/Test/V09datetime.ztst b/Test/V09datetime.ztst
index 63ff4ee..831421d 100644
--- a/Test/V09datetime.ztst
+++ b/Test/V09datetime.ztst
@@ -69,6 +69,6 @@
 >090
 >1
 
-  print ${(V)"$(strftime $'%Y\0%m\0%d' 100000000)"}
+  print -r -- ${(V)"$(strftime $'%Y\0%m\0%d' 100000000)"}
 0:Embedded nulls
->1973^@03^@03
+>1973\C-@03\C-@03



* Re: The "set" utility outputs binary data
  2015-12-07 10:24         ` Peter Stephenson
@ 2015-12-07 18:13           ` Bart Schaefer
  2015-12-07 21:39             ` Peter Stephenson
  2015-12-07 18:29           ` Nikolay Aleksandrovich Pavlov (ZyX)
  1 sibling, 1 reply; 11+ messages in thread
From: Bart Schaefer @ 2015-12-07 18:13 UTC (permalink / raw)
  To: zsh-workers

On Dec 7, 10:24am, Peter Stephenson wrote:
}
} (You can't seriously be complaining that "typeset -m IFS" now outputs
} 
} IFS=$' \t\n\C-@'
} 
} instead of raw binary, can you?)

Goodness, no.  Just the ${(V)...} substitution, mostly (the test cases
37335 updates) and anyplace where it's in human-informational output
rather than machine-re-readable output.



* Re: The "set" utility outputs binary data
  2015-12-07 10:24         ` Peter Stephenson
  2015-12-07 18:13           ` Bart Schaefer
@ 2015-12-07 18:29           ` Nikolay Aleksandrovich Pavlov (ZyX)
  1 sibling, 0 replies; 11+ messages in thread
From: Nikolay Aleksandrovich Pavlov (ZyX) @ 2015-12-07 18:29 UTC (permalink / raw)
  To: Peter Stephenson, zsh-workers

07.12.2015, 13:26, "Peter Stephenson" <p.stephenson@samsung.com>:
> On Sun, 06 Dec 2015 15:08:44 -0800
> Bart Schaefer <schaefer@brasslantern.com> wrote:
>>  On Dec 4, 9:56pm, Peter Stephenson wrote:
>>  }
>>  } This seems to be going the right way; let me know of any oddities or
>>  } unwanted side effects. Note a few "nice" representations have changed
>>  } to fit $'..' conventions.
>>
>>  I'll withhold judgment on whether I like this, though I must say I do
>>  prefer "^C" to "\C-c".
>
> If this is important enough, we can arrange two versions of nicechar /
> wcs_nicechar. The quickest change would be expanding the functions under
> different names to take a flag when passed from mb_niceformat(),
> together with a shim layer for other cases.
>
> (You can't seriously be complaining that "typeset -m IFS" now outputs
>
> IFS=$' \t\n\C-@'
>
> instead of raw binary, can you?)

Can’t this code be somehow combined with ${(q)} rather than ${(V)}? I would really prefer if ${(q)"$(echo $'\C-c')"} resulted in $'\C-c' and not in $'\003'.

>
>>  In any case though, this has borked two of the test scripts:
>
> Not sure what I can have been testing.
>
> pws
>
> diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst
> index a3c5d71..2b46e06 100644
> --- a/Test/D04parameter.ztst
> +++ b/Test/D04parameter.ztst
> @@ -396,9 +396,9 @@
>  >Instead Here I Am Stuck By The Computer
>
>    foo=$'\x7f\x00'
> - print ${(V)foo}
> + print -r -- ${(V)foo}
>  0:${(V)...}
> ->^?^@
> +>\C-?\C-@
>
>    foo='playing '\''stupid'\'' "games" \w\i\t\h $quoting.'
>    print -r ${(q)foo}
> diff --git a/Test/V09datetime.ztst b/Test/V09datetime.ztst
> index 63ff4ee..831421d 100644
> --- a/Test/V09datetime.ztst
> +++ b/Test/V09datetime.ztst
> @@ -69,6 +69,6 @@
>  >090
>  >1
>
> - print ${(V)"$(strftime $'%Y\0%m\0%d' 100000000)"}
> + print -r -- ${(V)"$(strftime $'%Y\0%m\0%d' 100000000)"}
>  0:Embedded nulls
> ->1973^@03^@03
> +>1973\C-@03\C-@03



* Re: The "set" utility outputs binary data
  2015-12-07 18:13           ` Bart Schaefer
@ 2015-12-07 21:39             ` Peter Stephenson
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Stephenson @ 2015-12-07 21:39 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

On Mon, 7 Dec 2015 10:13:08 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Dec 7, 10:24am, Peter Stephenson wrote:
> }
> } (You can't seriously be complaining that "typeset -m IFS" now outputs
> } 
> } IFS=$' \t\n\C-@'
> } 
> } instead of raw binary, can you?)
> 
> Goodness, no.  Just the ${(V)...} substitution, mostly (the test cases
> 37335 updates) and anyplace where it's in human-informational output
> rather than machine-re-readable output.

This attempts to restore the short form when not called from
quotedzputs().  As a "free" bonus (that is, it's free to everyone else),
you can use ${(q+)...} to get the same effect as the new quoting within
parameters (so (V) does what it used to but (q+) gives you something
a bit similar but readbackinable).

I'll write some tests one day.

I suppose you'll be wanting it to work, next.

diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 564c70d..c6e7b6f 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -1067,6 +1067,11 @@ If a tt(q-) is given (only a single tt(q) may appear), a minimal
 form of single quoting is used that only quotes the string if needed to
 protect special characters.  Typically this form gives the most readable
 output.
+
+If a tt(q+) is given, an extended form of minimal quoting is used that
+causes unprintable characters to be rendered using tt($')var(...)tt(').
+This quoting is similar to that used by the output of values by the
+tt(typeset) family of commands.
 )
 item(tt(Q))(
 Remove one level of quotes from the resulting words.
diff --git a/Src/subst.c b/Src/subst.c
index d9c9d24..bb1dd89 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -1887,12 +1887,13 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 		    if (quotetype == QT_DOLLARS ||
 			quotetype == QT_BACKSLASH_PATTERN)
 			goto flagerr;
-		    if (s[1] == '-') {
+		    if (s[1] == '-' || s[1] == '+') {
 			if (quotemod)
 			    goto flagerr;
 			s++;
 			quotemod = 1;
-			quotetype = QT_SINGLE_OPTIONAL;
+			quotetype = (*s == '-') ? QT_SINGLE_OPTIONAL :
+			    QT_QUOTEDZPUTS;
 		    } else {
 			if (quotetype == QT_SINGLE_OPTIONAL) {
 			    /* extra q's after '-' not allowed */
@@ -3583,7 +3584,10 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 	    ap = aval;
 
 	    if (quotemod > 0) {
-		if (quotetype > QT_BACKSLASH) {
+		if (quotetype == QT_QUOTEDZPUTS) {
+		    for (; *ap; ap++)
+			*ap = quotedzputs(*ap, NULL);
+		} else if (quotetype > QT_BACKSLASH) {
 		    int sl;
 		    char *tmp;
 
@@ -3626,7 +3630,9 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 	    if (!copied)
 		val = dupstring(val), copied = 1;
 	    if (quotemod > 0) {
-		if (quotetype > QT_BACKSLASH) {
+		if (quotetype == QT_QUOTEDZPUTS) {
+		    val = quotedzputs(val, NULL);
+		} else if (quotetype > QT_BACKSLASH) {
 		    int sl;
 		    char *tmp;
 		    tmp = quotestring(val, NULL, quotetype);
diff --git a/Src/utils.c b/Src/utils.c
index fc2b192..1554fa0 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -387,6 +387,7 @@ putshout(int c)
     return 0;
 }
 
+#ifdef MULTIBYTE_SUPPORT
 /*
  * Turn a character into a visible representation thereof.  The visible
  * string is put together in a static buffer, and this function returns
@@ -409,6 +410,73 @@ putshout(int c)
 
 /**/
 mod_export char *
+nicechar_sel(int c, int quotable)
+{
+    static char buf[10];
+    char *s = buf;
+    c &= 0xff;
+    if (ZISPRINT(c))
+	goto done;
+    if (c & 0x80) {
+	if (isset(PRINTEIGHTBIT))
+	    goto done;
+	*s++ = '\\';
+	*s++ = 'M';
+	*s++ = '-';
+	c &= 0x7f;
+	if(ZISPRINT(c))
+	    goto done;
+    }
+    if (c == 0x7f) {
+	if (quotable) {
+	    *s++ = '\\';
+	    *s++ = 'C';
+	    *s++ = '-';
+	} else
+	    *s++ = '^';
+	c = '?';
+    } else if (c == '\n') {
+	*s++ = '\\';
+	c = 'n';
+    } else if (c == '\t') {
+	*s++ = '\\';
+	c = 't';
+    } else if (c < 0x20) {
+	if (quotable) {
+	    *s++ = '\\';
+	    *s++ = 'C';
+	    *s++ = '-';
+	} else
+	    *s++ = '^';
+	c += 0x40;
+    }
+    done:
+    /*
+     * The resulting string is still metafied, so check if
+     * we are returning a character in the range that needs metafication.
+     * This can't happen if the character is printed "nicely", so
+     * this results in a maximum of two bytes total (plus the null).
+     */
+    if (imeta(c)) {
+	*s++ = Meta;
+	*s++ = c ^ 32;
+    } else
+	*s++ = c;
+    *s = 0;
+    return buf;
+}
+
+/**/
+mod_export char *
+nicechar(int c)
+{
+    return nicechar_sel(c, 0);
+}
+
+#else /* MULTIBYTE_SUPPORT */
+
+/**/
+mod_export char *
 nicechar(int c)
 {
     static char buf[10];
@@ -459,6 +527,8 @@ nicechar(int c)
     return buf;
 }
 
+#endif /* MULTIBYTE_SUPPORT */
+
 /*
  * Return 1 if nicechar() would reformat this character.
  */
@@ -527,7 +597,7 @@ mb_charinit(void)
 
 /**/
 mod_export char *
-wcs_nicechar(wchar_t c, size_t *widthp, char **swidep)
+wcs_nicechar_sel(wchar_t c, size_t *widthp, char **swidep, int quotable)
 {
     static char *buf;
     static int bufalloc = 0, newalloc;
@@ -552,9 +622,12 @@ wcs_nicechar(wchar_t c, size_t *widthp, char **swidep)
     s = buf;
     if (!iswprint(c) && (c < 0x80 || !isset(PRINTEIGHTBIT))) {
 	if (c == 0x7f) {
-	    *s++ = '\\';
-	    *s++ = 'C';
-	    *s++ = '-';
+	    if (quotable) {
+		*s++ = '\\';
+		*s++ = 'C';
+		*s++ = '-';
+	    } else
+		*s++ = '^';
 	    c = '?';
 	} else if (c == L'\n') {
 	    *s++ = '\\';
@@ -563,9 +636,12 @@ wcs_nicechar(wchar_t c, size_t *widthp, char **swidep)
 	    *s++ = '\\';
 	    c = 't';
 	} else if (c < 0x20) {
-	    *s++ = '\\';
-	    *s++ = 'C';
-	    *s++ = '-';
+	    if (quotable) {
+		*s++ = '\\';
+		*s++ = 'C';
+		*s++ = '-';
+	    } else
+		*s++ = '^';
 	    c += 0x40;
 	} else if (c >= 0x80) {
 	    ret = -1;
@@ -635,6 +711,13 @@ wcs_nicechar(wchar_t c, size_t *widthp, char **swidep)
     return buf;
 }
 
+/**/
+mod_export char *
+wcs_nicechar(wchar_t c, size_t *widthp, char **swidep)
+{
+    return wcs_nicechar_sel(c, widthp, swidep, 0);
+}
+
 /*
  * Return 1 if wcs_nicechar() would reformat this character for display.
  */
@@ -4918,7 +5001,7 @@ mb_niceformat(const char *s, FILE *stream, char **outstrp, int flags)
 	    /* FALL THROUGH */
 	case MB_INVALID:
 	    /* The byte didn't convert, so output it as a \M-... sequence. */
-	    fmt = nicechar(*ptr);
+	    fmt = nicechar_sel(*ptr, flags & NICEFLAG_QUOTE);
 	    newl = strlen(fmt);
 	    cnt = 1;
 	    /* Get mbs out of its undefined state. */
@@ -4933,7 +5016,7 @@ mb_niceformat(const char *s, FILE *stream, char **outstrp, int flags)
 	    if (c == L'\'' && (flags & NICEFLAG_QUOTE))
 		fmt = "\\'";
 	    else
-		fmt = wcs_nicechar(c, &newl, NULL);
+		fmt = wcs_nicechar_sel(c, &newl, NULL, flags & NICEFLAG_QUOTE);
 	    break;
 	}
 
@@ -4967,8 +5050,13 @@ mb_niceformat(const char *s, FILE *stream, char **outstrp, int flags)
     if (outstrp) {
 	*outptr = '\0';
 	/* Use more efficient storage for returned string */
-	*outstrp = (flags & NICEFLAG_HEAP) ? dupstring(outstr) : ztrdup(outstr);
-	free(outstr);
+	if (flags & NICEFLAG_NODUP)
+	    *outstrp = outstr;
+	else {
+	    *outstrp = (flags & NICEFLAG_HEAP) ? dupstring(outstr) :
+		ztrdup(outstr);
+	    free(outstr);
+	}
     }
 
     return l;
@@ -5834,38 +5922,76 @@ quotestring(const char *s, char **e, int instring)
     return v;
 }
 
-/* Unmetafy and output a string, quoted if it contains special characters. */
+/*
+ * Unmetafy and output a string, quoted if it contains special
+ * characters.
+ *
+ * If stream is NULL, return the same output with any allocation on the
+ * heap.
+ */
 
 /**/
-mod_export void
+mod_export char *
 quotedzputs(char const *s, FILE *stream)
 {
     int inquote = 0, c;
+    char *outstr, *ptr;
 
     /* check for empty string */
     if(!*s) {
+	if (!stream)
+	    return "''";
 	fputs("''", stream);
-	return;
+	return NULL;
     }
 
 #ifdef MULTIBYTE_SUPPORT
     if (is_mb_niceformat(s)) {
-	fputs("$'", stream);
-	mb_niceformat(s, stream, NULL, NICEFLAG_QUOTE);
-	fputc('\'', stream);
-	return;
+	if (stream) {
+	    fputs("$'", stream);
+	    mb_niceformat(s, stream, NULL, NICEFLAG_QUOTE);
+	    fputc('\'', stream);
+	    return NULL;
+	} else {
+	    char *substr;
+	    mb_niceformat(s, NULL, &substr, NICEFLAG_QUOTE|NICEFLAG_NODUP);
+	    outstr = (char *)zhalloc(4 + strlen(substr));
+	    sprintf(outstr, "$'%s'", substr);
+	    free(substr);
+	    return outstr;
+	}
     }
 #endif /* MULTIBYTE_SUPPORT */
 
     if (!hasspecial(s)) {
-	zputs(s, stream);
-	return;
+	if (stream) {
+	    zputs(s, stream);
+	    return NULL;
+	} else {
+	    return dupstring(s);
+	}
     }
 
+    if (!stream) {
+	const char *cptr;
+	int l = strlen(s) + 2;
+	for (cptr = s; *cptr; cptr++) {
+	    if (*cptr == Meta)
+		cptr++;
+	    else if (*cptr == '\'')
+		l += isset(RCQUOTES) ? 1 : 3;
+	}
+	ptr = outstr = zhalloc(l + 1);
+    } else {
+	ptr = outstr = NULL;
+    }
     if (isset(RCQUOTES)) {
 	/* use rc-style quotes-within-quotes for the whole string */
-	if(fputc('\'', stream) < 0)
-	    return;
+	if (stream) {
+	    if (fputc('\'', stream) < 0)
+		return NULL;
+	} else
+	    *ptr++ = '\'';
 	while(*s) {
 	    if (*s == Meta)
 		c = *++s ^ 32;
@@ -5873,52 +5999,98 @@ quotedzputs(char const *s, FILE *stream)
 		c = *s;
 	    s++;
 	    if (c == '\'') {
-		if(fputc('\'', stream) < 0)
-		    return;
-	    } else if(c == '\n' && isset(CSHJUNKIEQUOTES)) {
-		if(fputc('\\', stream) < 0)
-		    return;
+		if (stream) {
+		    if (fputc('\'', stream) < 0)
+			return NULL;
+		} else
+		    *ptr++ = '\'';
+	    } else if (c == '\n' && isset(CSHJUNKIEQUOTES)) {
+		if (stream) {
+		    if (fputc('\\', stream) < 0)
+			return NULL;
+		} else
+		    *ptr++ = '\\';
+	    }
+	    if (stream) {
+		if (fputc(c, stream) < 0)
+		    return NULL;
+	    } else {
+		if (imeta(c)) {
+		    *ptr++ = Meta;
+		    *ptr++ = c ^ 32;
+		} else
+		    *ptr++ = c;
 	    }
-	    if(fputc(c, stream) < 0)
-		return;
 	}
-	if(fputc('\'', stream) < 0)
-	    return;
+	if (stream) {
+	    if (fputc('\'', stream) < 0)
+		return NULL;
+	} else
+	    *ptr++ = '\'';
     } else {
 	/* use Bourne-style quoting, avoiding empty quoted strings */
-	while(*s) {
+	while (*s) {
 	    if (*s == Meta)
 		c = *++s ^ 32;
 	    else
 		c = *s;
 	    s++;
 	    if (c == '\'') {
-		if(inquote) {
-		    if(fputc('\'', stream) < 0)
-			return;
+		if (inquote) {
+		    if (stream) {
+			if (putc('\'', stream) < 0)
+			    return NULL;
+		    } else
+			*ptr++ = '\'';
 		    inquote=0;
 		}
-		if(fputs("\\'", stream) < 0)
-		    return;
+		if (stream) {
+		    if (fputs("\\'", stream) < 0)
+			return NULL;
+		} else {
+		    *ptr++ = '\\';
+		    *ptr++ = '\'';
+		}
 	    } else {
 		if (!inquote) {
-		    if(fputc('\'', stream) < 0)
-			return;
+		    if (stream) {
+			if (fputc('\'', stream) < 0)
+			    return NULL;
+		    } else
+			*ptr++ = '\'';
 		    inquote=1;
 		}
-		if(c == '\n' && isset(CSHJUNKIEQUOTES)) {
-		    if(fputc('\\', stream) < 0)
-			return;
+		if (c == '\n' && isset(CSHJUNKIEQUOTES)) {
+		    if (stream) {
+			if (fputc('\\', stream) < 0)
+			    return NULL;
+		    } else
+			*ptr++ = '\\';
+		}
+		if (stream) {
+		    if (fputc(c, stream) < 0)
+			return NULL;
+		} else {
+		    if (imeta(c)) {
+			*ptr++ = Meta;
+			*ptr++ = c ^ 32;
+		    } else
+			*ptr++ = c;
 		}
-		if(fputc(c, stream) < 0)
-		    return;
 	    }
 	}
 	if (inquote) {
-	    if(fputc('\'', stream) < 0)
-		return;
+	    if (stream) {
+		if (fputc('\'', stream) < 0)
+		    return NULL;
+	    } else
+		*ptr++ = '\'';
 	}
     }
+    if (!stream)
+	*ptr++ = '\0';
+
+    return outstr;
 }
 
 /* Double-quote a metafied string. */
diff --git a/Src/zsh.h b/Src/zsh.h
index caf7def..0302d68 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -272,7 +272,12 @@ enum {
     /*
      * As QT_BACKSLASH, but a NULL string is shown as ''.
      */
-    QT_BACKSLASH_SHOWNULL
+    QT_BACKSLASH_SHOWNULL,
+    /*
+     * Quoting as produced by quotedzputs(), used for human
+     * readability of parameter values.
+     */
+    QT_QUOTEDZPUTS
 };
 
 #define QT_IS_SINGLE(x)	((x) == QT_SINGLE || (x) == QT_SINGLE_OPTIONAL)
@@ -3055,6 +3060,7 @@ enum {
 enum {
     NICEFLAG_HEAP = 1,		/* Heap allocation where needed */
     NICEFLAG_QUOTE = 2,		/* Result will appear in $'...' */
+    NICEFLAG_NODUP = 4,         /* Leave allocated */
 };
 
 /* Metafied input */
diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst
index 2b46e06..1460ff6 100644
--- a/Test/D04parameter.ztst
+++ b/Test/D04parameter.ztst
@@ -398,7 +398,7 @@
   foo=$'\x7f\x00'
   print -r -- ${(V)foo}
 0:${(V)...}
->\C-?\C-@
+>^?^@
 
   foo='playing '\''stupid'\'' "games" \w\i\t\h $quoting.'
   print -r ${(q)foo}
diff --git a/Test/V09datetime.ztst b/Test/V09datetime.ztst
index 831421d..7905155 100644
--- a/Test/V09datetime.ztst
+++ b/Test/V09datetime.ztst
@@ -71,4 +71,4 @@
 
   print -r -- ${(V)"$(strftime $'%Y\0%m\0%d' 100000000)"}
 0:Embedded nulls
->1973\C-@03\C-@03
+>1973^@03^@03



Thread overview: 11+ messages
2015-12-03 14:05 The "set" utility outputs binary data Vincent Lefevre
2015-12-03 14:25 ` Peter Stephenson
2015-12-04 14:29   ` Peter Stephenson
2015-12-04 21:56     ` Peter Stephenson
2015-12-06 23:08       ` Bart Schaefer
2015-12-07 10:24         ` Peter Stephenson
2015-12-07 18:13           ` Bart Schaefer
2015-12-07 21:39             ` Peter Stephenson
2015-12-07 18:29           ` Nikolay Aleksandrovich Pavlov (ZyX)
2015-12-03 14:46 ` Stephane Chazelas
2015-12-03 23:43 ` Daniel Shahaf