* printf %q segfault
From: lolilolicon @ 2016-10-16 14:58 UTC
To: zsh-workers
The following produces a segmentation fault:
printf '%q' 你
Reproduced with zsh 5.2.
Ask if you need any more info.
* Re: printf %q segfault
From: Daniel Shahaf @ 2016-10-16 16:03 UTC
To: lolilolicon; +Cc: zsh-workers
lolilolicon wrote on Sun, Oct 16, 2016 at 22:58:14 +0800:
> The following produces a segmentation fault:
>
> printf '%q' 你
>
> Reproduced with zsh 5.2.
>
> Ask if you need any more info.
With the latest master it doesn't segfault, but the output isn't correct either:
% printf '%q' 你 | xxd
0000000: 2427 5c33 3434 2724 275c 3237 3527 a0 $'\344'$'\275'.
The UTF-8 encoding of your character is E4 BD A0; however, the last byte
(0xA0) is output literally. Since a lone 0xA0 is not a valid UTF-8
sequence, my terminal renders it [if I remove the |xxd pipe] as a U+FFFD
REPLACEMENT CHARACTER instead.
This also reproduces with «printf '%q\n' $'\U00A0'», which should print
either « » (a non-breaking space) or «$'\302'$'\240'» (the quotestring()
representation of the UTF-8 encoding of U+00A0; that encoding is C2 A0).
Bottom line: the byte 0xA0 should not be printed literally but escaped.
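
The byte values quoted above can be checked with a small standalone
encoder. This is only an illustrative sketch in C; the names utf8_encode()
and utf8_is_continuation() are made up for this note and are not part of
the zsh source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if the value cannot be
 * encoded (a surrogate, or anything above U+10FFFF).
 * Hypothetical helper for illustration only.
 */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;           /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/*
 * A byte of the form 10xxxxxx is a continuation byte. It can only
 * follow a lead byte, so on its own it is not a valid UTF-8 sequence.
 */
int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```

Calling utf8_encode(0x4F60, buf) fills buf with E4 BD A0, and
utf8_encode(0x00A0, buf) gives C2 A0. In both cases the final byte is
0xA0, a continuation byte, which is why it is meaningless when emitted
alone.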
The reason 0xA0 is output literally is that the code takes the "if (itok(*u))"
branch in quotestring(); if it didn't take that branch, it'd behave
correctly.
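
To illustrate why only the last byte is affected, here is a rough sketch
of a token test. It is NOT the real itok() macro: the range below is an
assumption (zsh's internal token bytes are believed to sit roughly in
0x84..0xa2, and are treated here as one contiguous block for simplicity).
All this thread establishes is that 0xA0 satisfies itok().

```c
#include <assert.h>
#include <stddef.h>

/*
 * ASSUMPTION: illustrative bounds for zsh's internal token bytes.
 * The real definitions live in the zsh headers and may differ.
 */
enum { TOY_FIRST_TOKEN = 0x84, TOY_LAST_TOKEN = 0xa2 };

/* Toy stand-in for itok(): does this raw byte look like a token? */
int toy_itok(unsigned char b)
{
    return b >= TOY_FIRST_TOKEN && b <= TOY_LAST_TOKEN;
}

/*
 * Count how many bytes of an unmetafied buffer would be mistaken
 * for tokens by the toy test above.
 */
size_t toy_count_token_lookalikes(const unsigned char *s, size_t len)
{
    size_t i, n = 0;

    assert(s != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (toy_itok(s[i]))
            n++;
    return n;
}
```

Under these assumed bounds, of the three bytes E4 BD A0 only 0xA0 looks
like a token. That matches the xxd output above, where E4 and BD were
escaped and A0 was passed straight through.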
Cheers,
Daniel
* Re: printf %q segfault
From: Peter Stephenson @ 2016-10-18 19:57 UTC
To: zsh-workers
On Sun, 16 Oct 2016 16:03:12 +0000
Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> lolilolicon wrote on Sun, Oct 16, 2016 at 22:58:14 +0800:
> > The following produces a segmentation fault:
> >
> > printf '%q' 你
>
> The reason 0xA0 is output literally is that the code takes the "if (itok(*u))"
> branch in quotestring(); if it didn't take that branch, it'd behave
> correctly.
hmm...
mumble mumble metafy mumble bin_print mumble mumble total madness mumble
metafy shmetafy token shmoken grmph.
pws
diff --git a/Src/builtin.c b/Src/builtin.c
index 8b8b217..2db739f 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -4874,9 +4874,10 @@ bin_print(char *name, char **args, Options ops, int func)
break;
case 'q':
stringval = curarg ?
- quotestring(curarg, QT_BACKSLASH_SHOWNULL) : &nullstr;
+ quotestring(metafy(curarg, curlen, META_USEHEAP),
+ QT_BACKSLASH_SHOWNULL) : &nullstr;
*d = 's';
- print_val(stringval);
+ print_val(unmetafy(stringval, &curlen));
break;
case 'd':
case 'i':
diff --git a/Src/utils.c b/Src/utils.c
index db43529..e2657de 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5916,7 +5916,24 @@ quotestring(const char *s, int instring)
}
}
- if (itok(*u) || instring != QT_BACKSLASH) {
+ /*
+ * Now check if the output is unprintable in the
+ * current character set.
+ */
+ uend = u + MB_METACHARLENCONV(u, &cc);
+ if (
+#ifdef MULTIBYTE_SUPPORT
+ cc != WEOF &&
+#endif
+ WC_ISPRINT(cc)) {
+ if (dobackslash)
+ *v++ = '\\';
+ while (u < uend) {
+ if (*u == Meta)
+ *v++ = *u++;
+ *v++ = *u++;
+ }
+ } else if (itok(*u) || instring != QT_BACKSLASH) {
/* Needs to be passed straight through. */
if (dobackslash)
*v++ = '\\';
@@ -5940,25 +5957,6 @@ quotestring(const char *s, int instring)
} else
*v++ = *u++;
continue;
- }
-
- /*
- * Now check if the output is unprintable in the
- * current character set.
- */
- uend = u + MB_METACHARLENCONV(u, &cc);
- if (
-#ifdef MULTIBYTE_SUPPORT
- cc != WEOF &&
-#endif
- WC_ISPRINT(cc)) {
- if (dobackslash)
- *v++ = '\\';
- while (u < uend) {
- if (*u == Meta)
- *v++ = *u++;
- *v++ = *u++;
- }
} else {
/* Not printable */
*v++ = '$';
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index 1b1d042..3a6e955 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -579,3 +579,7 @@
0:Sorting of metafied Polish characters
>a ą b c ć d e ę f
>a ą b c ć d e ę f
+
+ printf '%q%q\n' 你你
+0:printf %q and quotestring and general metafy / token madness
+>你你
* Re: printf %q segfault
From: Peter Stephenson @ 2016-10-19 8:52 UTC
To: zsh-workers
On Tue, 18 Oct 2016 20:57:15 +0100
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> On Sun, 16 Oct 2016 16:03:12 +0000
> Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
>
> > lolilolicon wrote on Sun, Oct 16, 2016 at 22:58:14 +0800:
> > > The following produces a segmentation fault:
> > >
> > > printf '%q' 你
> >
> > The reason 0xA0 is output literally is that the code takes the "if (itok(*u))"
> > branch in quotestring(); if it didn't take that branch, it'd behave
> > correctly.
>
> mumble
I don't think we need the utils.c hunk. We shouldn't meet an unmetafied
token when the input is handled properly, unless the input actually is
still tokenised. There's at least one place where this does happen, I
think down in completion.
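
For readers unfamiliar with metafication, the round trip that the
builtin.c change relies on can be sketched roughly as follows. This is a
toy model, not the real metafy()/unmetafy(): the Meta byte value (0x83),
the set of bytes needing escape (NUL and 0x83..0xa2) and the "Meta, then
byte XOR 32" encoding are written from memory of the zsh source and
should be treated as assumptions. The toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_META 0x83   /* ASSUMPTION: value of zsh's Meta byte */

/* ASSUMPTION: NUL, Meta and the token range must be escaped. */
static int toy_imeta(unsigned char b)
{
    return b == 0 || (b >= 0x83 && b <= 0xa2);
}

/*
 * Escape special bytes as TOY_META followed by (byte ^ 32).
 * out needs room for 2 * len bytes. Returns the output length.
 */
size_t toy_metafy(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        if (toy_imeta(in[i])) {
            out[n++] = TOY_META;
            out[n++] = (unsigned char)(in[i] ^ 32);
        } else {
            out[n++] = in[i];
        }
    }
    return n;
}

/*
 * Reverse toy_metafy(). out needs room for len bytes.
 * Returns the output length.
 */
size_t toy_unmetafy(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        if (in[i] == TOY_META && i + 1 < len)
            out[n++] = (unsigned char)(in[++i] ^ 32);
        else
            out[n++] = in[i];
    }
    return n;
}
```

With this model, the raw bytes E4 BD A0 become E4 BD 83 80 after
toy_metafy(), so no bare 0xA0 is left for a token test to trip over, and
toy_unmetafy() restores the original three bytes before printing.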
pws
diff --git a/Src/builtin.c b/Src/builtin.c
index 8b8b217..2db739f 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -4874,9 +4874,10 @@ bin_print(char *name, char **args, Options ops, int func)
break;
case 'q':
stringval = curarg ?
- quotestring(curarg, QT_BACKSLASH_SHOWNULL) : &nullstr;
+ quotestring(metafy(curarg, curlen, META_USEHEAP),
+ QT_BACKSLASH_SHOWNULL) : &nullstr;
*d = 's';
- print_val(stringval);
+ print_val(unmetafy(stringval, &curlen));
break;
case 'd':
case 'i':
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index 1b1d042..3a6e955 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -579,3 +579,7 @@
0:Sorting of metafied Polish characters
>a ą b c ć d e ę f
>a ą b c ć d e ę f
+
+ printf '%q%q\n' 你你
+0:printf %q and quotestring and general metafy / token madness
+>你你
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/zsh/