zsh-workers
* printf %q segfault
@ 2016-10-16 14:58 lolilolicon
  2016-10-16 16:03 ` Daniel Shahaf
  0 siblings, 1 reply; 4+ messages in thread
From: lolilolicon @ 2016-10-16 14:58 UTC
  To: zsh-workers

The following produces a segmentation fault:

    printf '%q' 你

Reproduced with zsh 5.2.

Ask if you need any more info.



* Re: printf %q segfault
  2016-10-16 14:58 printf %q segfault lolilolicon
@ 2016-10-16 16:03 ` Daniel Shahaf
  2016-10-18 19:57   ` Peter Stephenson
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Shahaf @ 2016-10-16 16:03 UTC
  To: lolilolicon; +Cc: zsh-workers

lolilolicon wrote on Sun, Oct 16, 2016 at 22:58:14 +0800:
> The following produces segmentation fault:
> 
>     printf '%q' 你
> 
> produced with zsh 5.2.
> 
> Ask if you need any more info.

With latest master it doesn't segfault, but it's not correct, either:

% printf '%q' 你 | xxd
0000000: 2427 5c33 3434 2724 275c 3237 3527 a0    $'\344'$'\275'.

The UTF-8 encoding of your character is E4 BD A0; however, the final byte
(0xA0) is output literally.  Since a lone 0xA0 is not a valid UTF-8
sequence, my terminal renders it [if I remove the |xxd pipe] as a U+FFFD
REPLACEMENT CHARACTER instead.

This also reproduces with «printf '%q\n' $'\U00A0'», which should print
either « » (a non-breaking-space) or «$'\302'$'\240'» (the quotestring()
representation of the UTF-8 encoding of U+00A0; that encoding is C2 A0).
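
To make the byte values above easy to check, here is a minimal sketch of
UTF-8 encoding in C.  The function name utf8_encode is hypothetical and
this is not code from zsh; it only follows the standard UTF-8 bit layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8 (illustrative helper, not zsh code).
 * Writes up to 4 bytes into out and returns the number of bytes written,
 * or 0 if cp is a surrogate or lies above U+10FFFF.
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0x4F60, buf) for 你 fills buf with E4 BD A0, and
utf8_encode(0xA0, buf) gives C2 A0, so both inputs end in the byte 0xA0.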

Bottom line: the byte 0xA0 should not be printed literally but escaped.

The reason 0xA0 is output literally is that the code takes the "if (itok(*u))"
branch in quotestring(); if it didn't take that branch, it'd behave
correctly.
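
For reference, the escaped form seen in the xxd output above is just the
byte written in three-digit octal inside $'...'.  A minimal sketch in C,
assuming a hypothetical helper quote_byte_octal (this is not how
quotestring() itself is written):

```c
#include <stdio.h>

/*
 * Write the $'\ooo' form of a single byte into out (illustrative only).
 * out needs room for at least 8 bytes: 7 characters plus the NUL.
 * Returns the value of snprintf, i.e. 7 when the buffer is large enough.
 */
int quote_byte_octal(unsigned char c, char *out, size_t outsz)
{
    return snprintf(out, outsz, "$'\\%03o'", (unsigned int)c);
}
```

With this helper, 0xE4 becomes $'\344', 0xBD becomes $'\275', and the
problematic 0xA0 would become $'\240' rather than being passed through raw.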

Cheers,

Daniel



* Re: printf %q segfault
  2016-10-16 16:03 ` Daniel Shahaf
@ 2016-10-18 19:57   ` Peter Stephenson
  2016-10-19  8:52     ` Peter Stephenson
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Stephenson @ 2016-10-18 19:57 UTC
  To: zsh-workers

On Sun, 16 Oct 2016 16:03:12 +0000
Daniel Shahaf <d.s@daniel.shahaf.name> wrote:

> lolilolicon wrote on Sun, Oct 16, 2016 at 22:58:14 +0800:
> > The following produces segmentation fault:
> > 
> >     printf '%q' 你
> 
> The reason 0xA0 is output literally is that the code takes the "if (itok(*u))"
> branch in quotestring(); if it didn't take that branch, it'd behave
> correctly.

hmm...

mumble mumble metafy mumble bin_print mumble mumble total madness mumble
metafy shmetafy token shmoken grmph.

pws

diff --git a/Src/builtin.c b/Src/builtin.c
index 8b8b217..2db739f 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -4874,9 +4874,10 @@ bin_print(char *name, char **args, Options ops, int func)
 		break;
 	    case 'q':
 		stringval = curarg ?
-		    quotestring(curarg, QT_BACKSLASH_SHOWNULL) : &nullstr;
+		    quotestring(metafy(curarg, curlen, META_USEHEAP),
+				QT_BACKSLASH_SHOWNULL) : &nullstr;
 		*d = 's';
-		print_val(stringval);
+		print_val(unmetafy(stringval, &curlen));
 		break;
 	    case 'd':
 	    case 'i':
diff --git a/Src/utils.c b/Src/utils.c
index db43529..e2657de 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5916,7 +5916,24 @@ quotestring(const char *s, int instring)
 		}
 	    }
 
-	    if (itok(*u) || instring != QT_BACKSLASH) {
+	    /*
+	     * Now check if the output is unprintable in the
+	     * current character set.
+	     */
+	    uend = u + MB_METACHARLENCONV(u, &cc);
+	    if (
+#ifdef MULTIBYTE_SUPPORT
+		cc != WEOF &&
+#endif
+		WC_ISPRINT(cc)) {
+		if (dobackslash)
+		    *v++ = '\\';
+		while (u < uend) {
+		    if (*u == Meta)
+			*v++ = *u++;
+		    *v++ = *u++;
+		}
+	    } else if (itok(*u) || instring != QT_BACKSLASH) {
 		/* Needs to be passed straight through. */
 		if (dobackslash)
 		    *v++ = '\\';
@@ -5940,25 +5957,6 @@ quotestring(const char *s, int instring)
 		} else
 		    *v++ = *u++;
 		continue;
-	    }
-
-	    /*
-	     * Now check if the output is unprintable in the
-	     * current character set.
-	     */
-	    uend = u + MB_METACHARLENCONV(u, &cc);
-	    if (
-#ifdef MULTIBYTE_SUPPORT
-		cc != WEOF &&
-#endif
-		WC_ISPRINT(cc)) {
-		if (dobackslash)
-		    *v++ = '\\';
-		while (u < uend) {
-		    if (*u == Meta)
-			*v++ = *u++;
-		    *v++ = *u++;
-		}
 	    } else {
 		/* Not printable */
 		*v++ = '$';
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index 1b1d042..3a6e955 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -579,3 +579,7 @@
 0:Sorting of metafied Polish characters
 >a ą b c ć d e ę f
 >a ą b c ć d e ę f
+
+  printf '%q%q\n' 你你
+0:printf %q and quotestring and general metafy / token madness
+>你你



* Re: printf %q segfault
  2016-10-18 19:57   ` Peter Stephenson
@ 2016-10-19  8:52     ` Peter Stephenson
  0 siblings, 0 replies; 4+ messages in thread
From: Peter Stephenson @ 2016-10-19  8:52 UTC
  To: zsh-workers

On Tue, 18 Oct 2016 20:57:15 +0100
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> On Sun, 16 Oct 2016 16:03:12 +0000
> Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> 
> > lolilolicon wrote on Sun, Oct 16, 2016 at 22:58:14 +0800:
> > > The following produces segmentation fault:
> > > 
> > >     printf '%q' 你
> > 
> > The reason 0xA0 is output literally is that the code takes the "if (itok(*u))"
> > branch in quotestring(); if it didn't take that branch, it'd behave
> > correctly.
>
> mumble

I don't think we need the utils.c hunk.  We shouldn't meet an unmetafied
token when the input is handled properly, unless the input actually is
still tokenised.  There's at least one place where this does happen, I
think down in completion.
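
The point about unmetafied input can be illustrated with a rough sketch
of the idea behind metafication.  This is not zsh's metafy()/unmetafy();
the sk_ names are hypothetical, and the constants (marker byte 0x83,
XOR with 32, special range 0x83..0xA2 plus NUL) are assumptions made
for illustration rather than values taken from this thread.

```c
#include <stddef.h>

#define SK_META 0x83    /* assumed marker byte, standing in for Meta */

/* Assumed, simplified test for bytes that collide with internal codes. */
static int sk_needs_meta(unsigned char c)
{
    return c == 0x00 || (c >= 0x83 && c <= 0xA2);
}

/*
 * Copy len bytes from in to out, replacing each special byte with the
 * marker followed by the byte XOR 32.  out must hold 2 * len bytes.
 * Returns the number of bytes written.
 */
size_t sk_metafy(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (sk_needs_meta(in[i])) {
            out[n++] = SK_META;
            out[n++] = (unsigned char)(in[i] ^ 32);
        } else {
            out[n++] = in[i];
        }
    }
    return n;
}

/* Reverse of sk_metafy.  out must hold len bytes.  Returns bytes written. */
size_t sk_unmetafy(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == SK_META && i + 1 < len)
            out[n++] = (unsigned char)(in[++i] ^ 32);
        else
            out[n++] = in[i];
    }
    return n;
}
```

Under these assumptions the bytes E4 BD A0 become E4 BD 83 80, so code
that scans the metafied string never sees a bare 0xA0 that it could
mistake for a token.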

pws

diff --git a/Src/builtin.c b/Src/builtin.c
index 8b8b217..2db739f 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -4874,9 +4874,10 @@ bin_print(char *name, char **args, Options ops, int func)
 		break;
 	    case 'q':
 		stringval = curarg ?
-		    quotestring(curarg, QT_BACKSLASH_SHOWNULL) : &nullstr;
+		    quotestring(metafy(curarg, curlen, META_USEHEAP),
+				QT_BACKSLASH_SHOWNULL) : &nullstr;
 		*d = 's';
-		print_val(stringval);
+		print_val(unmetafy(stringval, &curlen));
 		break;
 	    case 'd':
 	    case 'i':
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index 1b1d042..3a6e955 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -579,3 +579,7 @@
 0:Sorting of metafied Polish characters
 >a ą b c ć d e ę f
 >a ą b c ć d e ę f
+
+  printf '%q%q\n' 你你
+0:printf %q and quotestring and general metafy / token madness
+>你你

