* [BUG] zformat -f has no multibyte support @ 2019-12-23 22:24 zsugabubus 2019-12-24 20:23 ` Daniel Shahaf 0 siblings, 1 reply; 5+ messages in thread From: zsugabubus @ 2019-12-23 22:24 UTC (permalink / raw) To: zsh-workers Hi, $ setopt multibyte $ zformat -f X '%-3s' 's:ő'; echo $X "ő" $ zformat -f X '%.1s' 's:ő'; echo $X (garbage) $ zformat -f X '%-3s' 's:o'; echo $X " o" -- zsugabubus ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG] zformat -f has no multibyte support 2019-12-23 22:24 [BUG] zformat -f has no multibyte support zsugabubus @ 2019-12-24 20:23 ` Daniel Shahaf 2019-12-26 4:35 ` dana 0 siblings, 1 reply; 5+ messages in thread From: Daniel Shahaf @ 2019-12-24 20:23 UTC (permalink / raw) To: zsh-workers [-- Attachment #1: Type: text/plain, Size: 510 bytes --] zsugabubus wrote on Mon, Dec 23, 2019 at 23:24:36 +0100: > $ setopt multibyte > $ zformat -f X '%-3s' 's:ő'; echo $X > "ő" > $ zformat -f X '%.1s' 's:ő'; echo $X > (garbage) > $ zformat -f X '%-3s' 's:o'; echo $X > " o" The printf builtin handles this correctly, so this should be fairly easy to fix. Actually, I don't suppose we could just call into the printf code directly, can we? It _works_ (see attachment), but it's not elegant. Aside: It is customary to use a valid from address. [-- Attachment #2: zformat-printf-PoC.patch --] [-- Type: text/x-diff, Size: 3519 bytes --] diff --git a/Src/Modules/zutil.c b/Src/Modules/zutil.c index 7d9bf05d6..bb00c8a24 100644 --- a/Src/Modules/zutil.c +++ b/Src/Modules/zutil.c @@ -775,7 +775,7 @@ static char *zformat_substring(char* instr, char **specs, char **outp, for (s = instr; *s && *s != endchar; s++) { if (*s == '%') { - int right, min = -1, max = -1, outl, testit; + int right, min = -1, max = -1, testit; char *spec, *start = s; if ((right = (*++s == '-'))) @@ -835,11 +835,49 @@ static char *zformat_substring(char* instr, char **specs, char **outp, } else if (skip) { continue; } else if ((spec = specs[STOUC(*s)])) { - int len; + int outl; + Param pm; + LinkList args = newlinklist(); + + /* '%', '-', min, ',', max, 's', NUL. */ + char fmt[1 + 1 + DIGBUFSIZE-1 + 1 + DIGBUFSIZE-1 + 1 + 1]; + + /* zformat uses minus to mean "pad on the left". + * printf uses minus to mean "pad on the right". */ + const char *optional_minus = (right ? "" : "-"); + + startparamscope(); + pm = createparam("REPLY", PM_LOCAL|PM_SCALAR); + if (pm) + pm->level = locallevel; /* because createparam() doesn't */ + + addlinknode(args, "printf"); + addlinknode(args, "-v"); + addlinknode(args, "REPLY"); + + if (min >= 0 && max >= 0) { + snprintf(fmt, sizeof(fmt), "%%%s%d.%ds", optional_minus, min, max); + } else if (min >= 0) { + snprintf(fmt, sizeof(fmt), "%%%s%ds", optional_minus, min); + } else if (max >= 0) { + snprintf(fmt, sizeof(fmt), "%%.%ds", max); + } else { + snprintf(fmt, sizeof(fmt), "%%%ss", optional_minus); + } + addlinknode(args, fmt); - if ((len = strlen(spec)) > max && max >= 0) - len = max; - outl = (min >= 0 ? (min > len ? min : len) : len); + addlinknode(args, spec); + + { + Builtin builtin_printf = + (Builtin)builtintab->getnode(builtintab, "printf"); + local_list0(assigns); + + init_list0(assigns); + execbuiltin(args, &assigns, builtin_printf); + } + + outl = strlen(getsparam("REPLY")); if (*ousedp + outl >= *olenp) { int nlen = *olenp + outl + 128; @@ -849,24 +887,11 @@ static char *zformat_substring(char* instr, char **specs, char **outp, *olenp = nlen; *outp = tmp; } - if (len >= outl) { - memcpy(*outp + *ousedp, spec, outl); + { + memcpy(*outp + *ousedp, getsparam("REPLY"), outl); *ousedp += outl; - } else { - int diff = outl - len; - - if (right) { - while (diff--) - (*outp)[(*ousedp)++] = ' '; - memcpy(*outp + *ousedp, spec, len); - *ousedp += len; - } else { - memcpy(*outp + *ousedp, spec, len); - *ousedp += len; - while (diff--) - (*outp)[(*ousedp)++] = ' '; - } } + endparamscope(); } else { int len = s - start + 1; diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst index e20315340..3e7ec061f 100644 --- a/Test/D07multibyte.ztst +++ b/Test/D07multibyte.ztst @@ -585,3 +585,12 @@ >OK F:A failure here may indicate the system regex library does not F:support character sets outside the portable 7-bit range. + + if zmodload zsh/zutil 2>/dev/null; then + zformat -f REPLY '%.3s' 's:ヌxéfoo' + echo $REPLY + else + ZTST_skip="can't load the zsh/zutil module for testing" + fi +0:zformat multibyte test +>ヌxé diff --git a/Test/V13zformat.ztst b/Test/V13zformat.ztst index 982866e13..91901cbf4 100644 --- a/Test/V13zformat.ztst +++ b/Test/V13zformat.ztst @@ -65,3 +65,5 @@ >ipsum.bar >bazbaz >\esc:ape + +# Multibyte tests in D07multibyte.ztst ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG] zformat -f has no multibyte support 2019-12-24 20:23 ` Daniel Shahaf @ 2019-12-26 4:35 ` dana 2019-12-26 5:04 ` Daniel Shahaf 0 siblings, 1 reply; 5+ messages in thread From: dana @ 2019-12-26 4:35 UTC (permalink / raw) To: Daniel Shahaf; +Cc: zsh-workers On 24 Dec 2019, at 14:23, Daniel Shahaf <d.s@daniel.shahaf.name> wrote: > Actually, I don't suppose we could just call into the printf code directly, can > we? It _works_ (see attachment), but it's not elegant. A very quick before/after test shows that it reduces performance quite a bit, especially as the number of format specs increases. Admittedly, it's only a few µs per spec in absolute numbers (at least on my Mac, with the handful of operations i tested), and i don't know how that would compare with a 'lower-level' approach. I assume that multi-byte support will incur *some* penalty no matter how it's done, so maybe it's reasonable, idk. The only other thing i'd worry about is the user having to micro-manage REPLY, but it looks like the way you've done it avoids any issues with that. dana ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG] zformat -f has no multibyte support 2019-12-26 4:35 ` dana @ 2019-12-26 5:04 ` Daniel Shahaf 2019-12-26 11:28 ` Daniel Shahaf 0 siblings, 1 reply; 5+ messages in thread From: Daniel Shahaf @ 2019-12-26 5:04 UTC (permalink / raw) To: zsh-workers dana wrote on Thu, 26 Dec 2019 04:35 +00:00: > On 24 Dec 2019, at 14:23, Daniel Shahaf <d.s@daniel.shahaf.name> wrote: > > Actually, I don't suppose we could just call into the printf code directly, can > > we? It _works_ (see attachment), but it's not elegant. > > A very quick before/after test shows that it reduces performance quite a bit, > especially as the number of format specs increases. Admittedly, it's only a > few µs per spec in absolute numbers (at least on my Mac, with the handful of > operations i tested), What is the _factor_ of slowdown? (Multiplicative, as opposed to additive) I.e., does the patch make things slower by 1%, or 10%, or 100%, or 1000%? > and i don't know how that would compare with a > 'lower-level' approach. I assume that multi-byte support will incur *some* > penalty no matter how it's done, so maybe it's reasonable, idk. I don't have an intuition for what the penalty should be, but to get a sense of the orders of magnitude involved we could the behaviour of the 'printf' builtin when built with and without multibyte support. Thanks, Daniel ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG] zformat -f has no multibyte support 2019-12-26 5:04 ` Daniel Shahaf @ 2019-12-26 11:28 ` Daniel Shahaf 0 siblings, 0 replies; 5+ messages in thread From: Daniel Shahaf @ 2019-12-26 11:28 UTC (permalink / raw) To: zsh-workers Daniel Shahaf wrote on Thu, 26 Dec 2019 05:04 +00:00: > dana wrote on Thu, 26 Dec 2019 04:35 +00:00: > > On 24 Dec 2019, at 14:23, Daniel Shahaf <d.s@daniel.shahaf.name> wrote: > > > Actually, I don't suppose we could just call into the printf code directly, can > > > we? It _works_ (see attachment), but it's not elegant. > > > > A very quick before/after test shows that it reduces performance quite a bit, > > especially as the number of format specs increases. Admittedly, it's only a > > few µs per spec in absolute numbers (at least on my Mac, with the handful of > > operations i tested), > > What is the _factor_ of slowdown? (Multiplicative, as opposed to additive) > I.e., does the patch make things slower by 1%, or 10%, or 100%, or 1000%? According to dana and I's results, the factor is between x4 and x29: [[[ % for 1 in */Src/zsh; do printf "%-16s%s\n" $1:h:h $($1 -fc 'time ( zformat -f x ${(r.1000000..%a.):-} a:o )'); done ( zformat -f x ${(r.1000000..%a.):-} a:o; ) 0.10s user 0.27s system 99% cpu 0.369 total master ( zformat -f x ${(r.1000000..%a.):-} a:o; ) 1.04s user 0.33s system 99% cpu 1.372 total patch ]]] [[[ # Without patch [DEV] zsh % repeat 5 utime zformat -f x ${(r.10000..%a.):-} a:o [429 µs] [507 µs] [499 µs] [468 µs] [609 µs] # With patch [DEV] zsh % repeat 5 utime zformat -f x ${(r.10000..%a.):-} a:o [16577 µs] [14063 µs] [13941 µs] [13972 µs] [13569 µs] ]]] I don't know how to explain the difference (we've both had other things to look into). ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2019-12-26 11:30 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-12-23 22:24 [BUG] zformat -f has no multibyte support zsugabubus 2019-12-24 20:23 ` Daniel Shahaf 2019-12-26 4:35 ` dana 2019-12-26 5:04 ` Daniel Shahaf 2019-12-26 11:28 ` Daniel Shahaf
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).