* [BUG] zformat -f has no multibyte support
@ 2019-12-23 22:24 zsugabubus
2019-12-24 20:23 ` Daniel Shahaf
0 siblings, 1 reply; 5+ messages in thread
From: zsugabubus @ 2019-12-23 22:24 UTC (permalink / raw)
To: zsh-workers
Hi,
$ setopt multibyte
$ zformat -f X '%-3s' 's:ő'; echo $X
"ő"
$ zformat -f X '%.1s' 's:ő'; echo $X
(garbage)
$ zformat -f X '%-3s' 's:o'; echo $X
" o"
--
zsugabubus
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG] zformat -f has no multibyte support
2019-12-23 22:24 [BUG] zformat -f has no multibyte support zsugabubus
@ 2019-12-24 20:23 ` Daniel Shahaf
2019-12-26 4:35 ` dana
0 siblings, 1 reply; 5+ messages in thread
From: Daniel Shahaf @ 2019-12-24 20:23 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 510 bytes --]
zsugabubus wrote on Mon, Dec 23, 2019 at 23:24:36 +0100:
> $ setopt multibyte
> $ zformat -f X '%-3s' 's:ő'; echo $X
> "ő"
> $ zformat -f X '%.1s' 's:ő'; echo $X
> (garbage)
> $ zformat -f X '%-3s' 's:o'; echo $X
> " o"
The printf builtin handles this correctly, so this should be fairly easy to fix.
Actually, I don't suppose we could just call into the printf code directly, can
we? It _works_ (see attachment), but it's not elegant.
Aside: It is customary to use a valid from address.
[-- Attachment #2: zformat-printf-PoC.patch --]
[-- Type: text/x-diff, Size: 3519 bytes --]
diff --git a/Src/Modules/zutil.c b/Src/Modules/zutil.c
index 7d9bf05d6..bb00c8a24 100644
--- a/Src/Modules/zutil.c
+++ b/Src/Modules/zutil.c
@@ -775,7 +775,7 @@ static char *zformat_substring(char* instr, char **specs, char **outp,
for (s = instr; *s && *s != endchar; s++) {
if (*s == '%') {
- int right, min = -1, max = -1, outl, testit;
+ int right, min = -1, max = -1, testit;
char *spec, *start = s;
if ((right = (*++s == '-')))
@@ -835,11 +835,49 @@ static char *zformat_substring(char* instr, char **specs, char **outp,
} else if (skip) {
continue;
} else if ((spec = specs[STOUC(*s)])) {
- int len;
+ int outl;
+ Param pm;
+ LinkList args = newlinklist();
+
+ /* '%', '-', min, ',', max, 's', NUL. */
+ char fmt[1 + 1 + DIGBUFSIZE-1 + 1 + DIGBUFSIZE-1 + 1 + 1];
+
+ /* zformat uses minus to mean "pad on the left".
+ * printf uses minus to mean "pad on the right". */
+ const char *optional_minus = (right ? "" : "-");
+
+ startparamscope();
+ pm = createparam("REPLY", PM_LOCAL|PM_SCALAR);
+ if (pm)
+ pm->level = locallevel; /* because createparam() doesn't */
+
+ addlinknode(args, "printf");
+ addlinknode(args, "-v");
+ addlinknode(args, "REPLY");
+
+ if (min >= 0 && max >= 0) {
+ snprintf(fmt, sizeof(fmt), "%%%s%d.%ds", optional_minus, min, max);
+ } else if (min >= 0) {
+ snprintf(fmt, sizeof(fmt), "%%%s%ds", optional_minus, min);
+ } else if (max >= 0) {
+ snprintf(fmt, sizeof(fmt), "%%.%ds", max);
+ } else {
+ snprintf(fmt, sizeof(fmt), "%%%ss", optional_minus);
+ }
+ addlinknode(args, fmt);
- if ((len = strlen(spec)) > max && max >= 0)
- len = max;
- outl = (min >= 0 ? (min > len ? min : len) : len);
+ addlinknode(args, spec);
+
+ {
+ Builtin builtin_printf =
+ (Builtin)builtintab->getnode(builtintab, "printf");
+ local_list0(assigns);
+
+ init_list0(assigns);
+ execbuiltin(args, &assigns, builtin_printf);
+ }
+
+ outl = strlen(getsparam("REPLY"));
if (*ousedp + outl >= *olenp) {
int nlen = *olenp + outl + 128;
@@ -849,24 +887,11 @@ static char *zformat_substring(char* instr, char **specs, char **outp,
*olenp = nlen;
*outp = tmp;
}
- if (len >= outl) {
- memcpy(*outp + *ousedp, spec, outl);
+ {
+ memcpy(*outp + *ousedp, getsparam("REPLY"), outl);
*ousedp += outl;
- } else {
- int diff = outl - len;
-
- if (right) {
- while (diff--)
- (*outp)[(*ousedp)++] = ' ';
- memcpy(*outp + *ousedp, spec, len);
- *ousedp += len;
- } else {
- memcpy(*outp + *ousedp, spec, len);
- *ousedp += len;
- while (diff--)
- (*outp)[(*ousedp)++] = ' ';
- }
}
+ endparamscope();
} else {
int len = s - start + 1;
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index e20315340..3e7ec061f 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -585,3 +585,12 @@
>OK
F:A failure here may indicate the system regex library does not
F:support character sets outside the portable 7-bit range.
+
+ if zmodload zsh/zutil 2>/dev/null; then
+ zformat -f REPLY '%.3s' 's:ヌxéfoo'
+ echo $REPLY
+ else
+ ZTST_skip="can't load the zsh/zutil module for testing"
+ fi
+0:zformat multibyte test
+>ヌxé
diff --git a/Test/V13zformat.ztst b/Test/V13zformat.ztst
index 982866e13..91901cbf4 100644
--- a/Test/V13zformat.ztst
+++ b/Test/V13zformat.ztst
@@ -65,3 +65,5 @@
>ipsum.bar
>bazbaz
>\esc:ape
+
+# Multibyte tests in D07multibyte.ztst
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG] zformat -f has no multibyte support
2019-12-24 20:23 ` Daniel Shahaf
@ 2019-12-26 4:35 ` dana
2019-12-26 5:04 ` Daniel Shahaf
0 siblings, 1 reply; 5+ messages in thread
From: dana @ 2019-12-26 4:35 UTC (permalink / raw)
To: Daniel Shahaf; +Cc: zsh-workers
On 24 Dec 2019, at 14:23, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> Actually, I don't suppose we could just call into the printf code directly, can
> we? It _works_ (see attachment), but it's not elegant.
A very quick before/after test shows that it reduces performance quite a bit,
especially as the number of format specs increases. Admittedly, it's only a
few µs per spec in absolute numbers (at least on my Mac, with the handful of
operations i tested), and i don't know how that would compare with a
'lower-level' approach. I assume that multi-byte support will incur *some*
penalty no matter how it's done, so maybe it's reasonable, idk.
The only other thing i'd worry about is the user having to micro-manage REPLY,
but it looks like the way you've done it avoids any issues with that.
dana
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG] zformat -f has no multibyte support
2019-12-26 4:35 ` dana
@ 2019-12-26 5:04 ` Daniel Shahaf
2019-12-26 11:28 ` Daniel Shahaf
0 siblings, 1 reply; 5+ messages in thread
From: Daniel Shahaf @ 2019-12-26 5:04 UTC (permalink / raw)
To: zsh-workers
dana wrote on Thu, 26 Dec 2019 04:35 +00:00:
> On 24 Dec 2019, at 14:23, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> > Actually, I don't suppose we could just call into the printf code directly, can
> > we? It _works_ (see attachment), but it's not elegant.
>
> A very quick before/after test shows that it reduces performance quite a bit,
> especially as the number of format specs increases. Admittedly, it's only a
> few µs per spec in absolute numbers (at least on my Mac, with the handful of
> operations i tested),
What is the _factor_ of slowdown? (Multiplicative, as opposed to additive)
I.e., does the patch make things slower by 1%, or 10%, or 100%, or 1000%?
> and i don't know how that would compare with a
> 'lower-level' approach. I assume that multi-byte support will incur *some*
> penalty no matter how it's done, so maybe it's reasonable, idk.
I don't have an intuition for what the penalty should be, but to get a
sense of the orders of magnitude involved we could the behaviour of the
'printf' builtin when built with and without multibyte support.
Thanks,
Daniel
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG] zformat -f has no multibyte support
2019-12-26 5:04 ` Daniel Shahaf
@ 2019-12-26 11:28 ` Daniel Shahaf
0 siblings, 0 replies; 5+ messages in thread
From: Daniel Shahaf @ 2019-12-26 11:28 UTC (permalink / raw)
To: zsh-workers
Daniel Shahaf wrote on Thu, 26 Dec 2019 05:04 +00:00:
> dana wrote on Thu, 26 Dec 2019 04:35 +00:00:
> > On 24 Dec 2019, at 14:23, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> > > Actually, I don't suppose we could just call into the printf code directly, can
> > > we? It _works_ (see attachment), but it's not elegant.
> >
> > A very quick before/after test shows that it reduces performance quite a bit,
> > especially as the number of format specs increases. Admittedly, it's only a
> > few µs per spec in absolute numbers (at least on my Mac, with the handful of
> > operations i tested),
>
> What is the _factor_ of slowdown? (Multiplicative, as opposed to additive)
> I.e., does the patch make things slower by 1%, or 10%, or 100%, or 1000%?
According to dana and I's results, the factor is between x4 and x29:
[[[
% for 1 in */Src/zsh; do printf "%-16s%s\n" $1:h:h $($1 -fc 'time ( zformat -f x ${(r.1000000..%a.):-} a:o )'); done
( zformat -f x ${(r.1000000..%a.):-} a:o; ) 0.10s user 0.27s system 99% cpu 0.369 total
master
( zformat -f x ${(r.1000000..%a.):-} a:o; ) 1.04s user 0.33s system 99% cpu 1.372 total
patch
]]]
[[[
# Without patch
[DEV] zsh % repeat 5 utime zformat -f x ${(r.10000..%a.):-} a:o
[429 µs]
[507 µs]
[499 µs]
[468 µs]
[609 µs]
# With patch
[DEV] zsh % repeat 5 utime zformat -f x ${(r.10000..%a.):-} a:o
[16577 µs]
[14063 µs]
[13941 µs]
[13972 µs]
[13569 µs]
]]]
I don't know how to explain the difference (we've both had other things to look into).
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2019-12-26 11:30 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-23 22:24 [BUG] zformat -f has no multibyte support zsugabubus
2019-12-24 20:23 ` Daniel Shahaf
2019-12-26 4:35 ` dana
2019-12-26 5:04 ` Daniel Shahaf
2019-12-26 11:28 ` Daniel Shahaf
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/zsh/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).