* printf %q segfault
From: lolilolicon @ 2016-10-16 14:58 UTC
To: zsh-workers

The following produces segmentation fault:

    printf '%q' 你

produced with zsh 5.2.

Ask if you need any more info.

* Re: printf %q segfault
From: Daniel Shahaf @ 2016-10-16 16:03 UTC
To: lolilolicon; +Cc: zsh-workers

lolilolicon wrote on Sun, Oct 16, 2016 at 22:58:14 +0800:
> The following produces segmentation fault:
>
>     printf '%q' 你
>
> produced with zsh 5.2.
>
> Ask if you need any more info.

With latest master it doesn't segfault, but it's not correct, either:

    % printf '%q' 你 | xxd
    0000000: 2427 5c33 3434 2724 275c 3237 3527 a0    $'\344'$'\275'.

The UTF-8 encoding of your character is E4 BD A0; however, the last byte
(0xA0) is output literally.  Since a lone 0xA0 is not a valid UTF-8
sequence, my terminal renders it [if I remove the |xxd pipe] as a U+FFFD
REPLACEMENT CHARACTER instead.

This also reproduces with «printf '%q\n' $'\U00A0'», which should print
either « » (a non-breaking-space) or «$'\302'$'\240'» (the quotestring()
representation of the UTF-8 encoding of U+00A0; that encoding is C2 A0).

Bottom line: the byte 0xA0 should not be printed literally but escaped.

The reason 0xA0 is output literally is that the code takes the "if (itok(*u))"
branch in quotestring(); if it didn't take that branch, it'd behave
correctly.

Cheers,

Daniel

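To make the expected output concrete, here is a minimal C sketch (not zsh code) that writes every non-ASCII byte in the $'\ooo' form Daniel cites. The name escape_high_bytes is hypothetical, and the sketch ignores the per-character printability and locale checks that the real quotestring() performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: copy ASCII bytes unchanged
 * and write every byte >= 0x80 as $'\ooo' (three octal digits).
 * Returns the length of the NUL-terminated result, or (size_t)-1 if
 * `out` is too small.
 */
size_t escape_high_bytes(const unsigned char *in, size_t len,
                         char *out, size_t outsize)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outsize == 0)
        return (size_t)-1;

    for (size_t i = 0; i < len; i++) {
        if (in[i] >= 0x80) {
            /* $'\ooo' is 7 characters, plus room for the trailing NUL. */
            if (outsize - used < 8)
                return (size_t)-1;
            used += (size_t)snprintf(out + used, outsize - used,
                                     "$'\\%03o'", (unsigned)in[i]);
        } else {
            /* One character, plus room for the trailing NUL. */
            if (outsize - used < 2)
                return (size_t)-1;
            out[used++] = (char)in[i];
        }
    }
    out[used] = '\0';
    return used;
}
```

For the bytes E4 BD A0 this yields $'\344'$'\275'$'\240', and for C2 A0 it yields $'\302'$'\240', which is the fully escaped form mentioned above.
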
* Re: printf %q segfault
From: Peter Stephenson @ 2016-10-18 19:57 UTC
To: zsh-workers

On Sun, 16 Oct 2016 16:03:12 +0000
Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> lolilolicon wrote on Sun, Oct 16, 2016 at 22:58:14 +0800:
> > The following produces segmentation fault:
> >
> >     printf '%q' 你
>
> The reason 0xA0 is output literally is that the code takes the "if (itok(*u))"
> branch in quotestring(); if it didn't take that branch, it'd behave
> correctly.

hmm... mumble mumble metafy mumble bin_print mumble mumble total madness
mumble metafy shmetafy token shmoken grmph.

pws

diff --git a/Src/builtin.c b/Src/builtin.c
index 8b8b217..2db739f 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -4874,9 +4874,10 @@ bin_print(char *name, char **args, Options ops, int func)
                     break;
                 case 'q':
                     stringval = curarg ?
-                        quotestring(curarg, QT_BACKSLASH_SHOWNULL) : &nullstr;
+                        quotestring(metafy(curarg, curlen, META_USEHEAP),
+                                    QT_BACKSLASH_SHOWNULL) : &nullstr;
                     *d = 's';
-                    print_val(stringval);
+                    print_val(unmetafy(stringval, &curlen));
                     break;
                 case 'd':
                 case 'i':
diff --git a/Src/utils.c b/Src/utils.c
index db43529..e2657de 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5916,7 +5916,24 @@ quotestring(const char *s, int instring)
                     }
                 }
 
-                if (itok(*u) || instring != QT_BACKSLASH) {
+                /*
+                 * Now check if the output is unprintable in the
+                 * current character set.
+                 */
+                uend = u + MB_METACHARLENCONV(u, &cc);
+                if (
+#ifdef MULTIBYTE_SUPPORT
+                    cc != WEOF &&
+#endif
+                    WC_ISPRINT(cc)) {
+                    if (dobackslash)
+                        *v++ = '\\';
+                    while (u < uend) {
+                        if (*u == Meta)
+                            *v++ = *u++;
+                        *v++ = *u++;
+                    }
+                } else if (itok(*u) || instring != QT_BACKSLASH) {
                     /* Needs to be passed straight through. */
                     if (dobackslash)
                         *v++ = '\\';
@@ -5940,25 +5957,6 @@ quotestring(const char *s, int instring)
                     } else
                         *v++ = *u++;
                     continue;
-                }
-
-                /*
-                 * Now check if the output is unprintable in the
-                 * current character set.
-                 */
-                uend = u + MB_METACHARLENCONV(u, &cc);
-                if (
-#ifdef MULTIBYTE_SUPPORT
-                    cc != WEOF &&
-#endif
-                    WC_ISPRINT(cc)) {
-                    if (dobackslash)
-                        *v++ = '\\';
-                    while (u < uend) {
-                        if (*u == Meta)
-                            *v++ = *u++;
-                        *v++ = *u++;
-                    }
                 } else {
                     /* Not printable */
                     *v++ = '$';
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index 1b1d042..3a6e955 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -579,3 +579,7 @@
 0:Sorting of metafied Polish characters
 >a ą b c ć d e ę f
 >a ą b c ć d e ę f
+
+  printf '%q%q\n' 你你
+0:printf %q and quotestring and general metafy / token madness
+>你你

* Re: printf %q segfault
From: Peter Stephenson @ 2016-10-19 8:52 UTC
To: zsh-workers

On Tue, 18 Oct 2016 20:57:15 +0100
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> On Sun, 16 Oct 2016 16:03:12 +0000
> Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> > lolilolicon wrote on Sun, Oct 16, 2016 at 22:58:14 +0800:
> > > The following produces segmentation fault:
> > >
> > >     printf '%q' 你
> >
> > The reason 0xA0 is output literally is that the code takes the "if (itok(*u))"
> > branch in quotestring(); if it didn't take that branch, it'd behave
> > correctly.
>
> mumble

I don't think we need the utils.c hunk.  We shouldn't meet an
unmetafied token when the input is handled properly, unless the input
actually is still tokenised.  There's at least one place where this does
happen, I think down in completion.

pws

diff --git a/Src/builtin.c b/Src/builtin.c
index 8b8b217..2db739f 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -4874,9 +4874,10 @@ bin_print(char *name, char **args, Options ops, int func)
                     break;
                 case 'q':
                     stringval = curarg ?
-                        quotestring(curarg, QT_BACKSLASH_SHOWNULL) : &nullstr;
+                        quotestring(metafy(curarg, curlen, META_USEHEAP),
+                                    QT_BACKSLASH_SHOWNULL) : &nullstr;
                     *d = 's';
-                    print_val(stringval);
+                    print_val(unmetafy(stringval, &curlen));
                     break;
                 case 'd':
                 case 'i':
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index 1b1d042..3a6e955 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -579,3 +579,7 @@
 0:Sorting of metafied Polish characters
 >a ą b c ć d e ę f
 >a ą b c ć d e ę f
+
+  printf '%q%q\n' 你你
+0:printf %q and quotestring and general metafy / token madness
+>你你

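The builtin.c change wraps the argument in metafy() before quotestring() and calls unmetafy() on the result. As a rough illustration of why that matters for the byte 0xA0, here is a simplified C sketch of the encoding. It assumes that Meta is 0x83, that a metafied byte is stored as Meta followed by the byte XOR 0x20, and that the bytes needing this treatment are NUL and 0x83 through 0xA2; these constants are recalled from zsh.h and should be checked against the source. The sketch_* names are hypothetical and do not match the signatures of zsh's own metafy() and unmetafy().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_META      0x83u /* assumed value of zsh's Meta byte */
#define SKETCH_LAST_META 0xA2u /* assumed end of the range that needs metafying */

/* Assumption: NUL and bytes in [SKETCH_META, SKETCH_LAST_META] need escaping. */
int sketch_needs_meta(unsigned char c)
{
    return c == 0 || (c >= SKETCH_META && c <= SKETCH_LAST_META);
}

/*
 * Write the metafied form of `in` to `out`: each special byte becomes
 * SKETCH_META followed by the byte XOR 0x20.  Returns the output length,
 * or (size_t)-1 if `out` is too small.
 */
size_t sketch_metafy(const unsigned char *in, size_t len,
                     unsigned char *out, size_t outsize)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        if (sketch_needs_meta(in[i])) {
            if (outsize - used < 2)
                return (size_t)-1;
            out[used++] = (unsigned char)SKETCH_META;
            out[used++] = (unsigned char)(in[i] ^ 0x20);
        } else {
            if (outsize - used < 1)
                return (size_t)-1;
            out[used++] = in[i];
        }
    }
    return used;
}

/* Undo sketch_metafy() in place and return the new length. */
size_t sketch_unmetafy(unsigned char *buf, size_t len)
{
    size_t r = 0, w = 0;

    assert(buf != NULL);
    while (r < len) {
        if (buf[r] == SKETCH_META && r + 1 < len) {
            buf[w++] = (unsigned char)(buf[r + 1] ^ 0x20);
            r += 2;
        } else {
            buf[w++] = buf[r++];
        }
    }
    return w;
}
```

Under these assumptions, metafying the bytes E4 BD A0 gives E4 BD 83 80, so quotestring() no longer sees a bare 0xA0 that passes itok(); unmetafying the result restores the original three bytes.
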