zsh-workers
* Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Stephane Chazelas @ 2024-08-03 14:22 UTC
  To: Zsh hackers list

At the moment:

$ set -o extendedglob
$ a=1-12-123-1234-12345
$ echo ${a//(#m)<->/${(l[3][0])MATCH}}
001-012-123-234-345

Numbers are "l"eft padded to a length of 3 with 0s, but also
truncated to 3 digits when longer. It's often not desired.

Same happens with -3 (though that doesn't seem to be documented):

$ echo ${a//(#m)<->/${(l[-3][0])MATCH}}
001-012-123-234-345

One can avoid the truncation with things like:

$ echo ${a//(#m)<->/${(l[$#MATCH > 3 ? $#MATCH : 3][0])MATCH}}
001-012-123-1234-12345

Or via a math function helper:

$ atleast() (( $#MATCH > $1 ? $#MATCH : $1 )); functions -M atleast
$ echo ${a//(#m)<->/${(l[atleast(3)][0])MATCH}}
001-012-123-1234-12345

But that's a bit cumbersome and needs to be adapted when the "m"
flag is also used.

Would it be possible to have a way to disable the truncating,
maybe via negative numbers where ${(l[3][0])var} would
pad-and-truncate as it currently does, but ${(l[-3][0])var}
would pad only (leave longer strings alone)?

Same for "r"ight-padding.

Or is there already a smarter way to do it than with my ternary
arithmetic expression above?
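
To make the two behaviours concrete, here is a minimal C sketch, assuming ASCII strings and a single fill character. pad_left() and its truncate parameter are hypothetical names for illustration, not zsh's actual dopadding() code: with truncate set it keeps the rightmost characters of a longer string, as ${(l[3][0])MATCH} does today, and with truncate clear it leaves longer strings alone, as proposed for ${(l[-3][0])var}.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not zsh's dopadding()): left-pad `s` to `width`
 * characters with `fill`, writing into `out` (capacity `outsz`).
 *   truncate != 0: a longer string keeps only its rightmost `width`
 *                  characters, as ${(l[3][0])var} does today.
 *   truncate == 0: a longer string is copied unchanged, as proposed
 *                  for ${(l[-3][0])var}.
 * ASCII only; the result is cut short if `out` is too small.
 */
void pad_left(char *out, size_t outsz, const char *s, size_t width,
              char fill, int truncate)
{
    size_t len, i;

    assert(out != NULL && outsz > 0 && s != NULL);
    len = strlen(s);
    if (len >= width) {
        /* No padding needed: drop leading characters or keep them all. */
        snprintf(out, outsz, "%s", truncate ? s + (len - width) : s);
        return;
    }
    /* width - len fill characters, followed by the string itself. */
    for (i = 0; i < width - len && i + 1 < outsz; i++)
        out[i] = fill;
    snprintf(out + i, outsz - i, "%s", s);
}
```

Applied field by field to 1-12-123-1234-12345 with a width of 3, truncate = 1 gives 001-012-123-234-345 and truncate = 0 gives 001-012-123-1234-12345.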

-- 
Stephane



* Re: Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Mikael Magnusson @ 2024-08-03 14:51 UTC
  To: Zsh hackers list

On Sat, Aug 3, 2024 at 4:23 PM Stephane Chazelas <stephane@chazelas.org> wrote:
> [...]
> Or is there already a smarter way to do it than with my ternary
> arithmetic expression above?

(No actual answer to the question follows)

I was trying to be clever, but ran into the same issue here as well:
% typeset -Z3 MATCH
% echo ${a//(#m)<->/$MATCH}
001-012-123-234-345

This kind of works, but gives you an extra - at the end:
% printf '%03i-' ${(s:-:)a}
001-012-123-1234-12345-
% printf -v a '%03i-' ${(s:-:)a}; echo ${a%-}
001-012-123-1234-12345
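
The printf approach works because a printf field width is only a minimum: numeric conversions are never truncated. The same split/format/join can be sketched in C as below, assuming well-formed input (decimal integers that fit in a long, separated by single dashes); pad_fields() is a hypothetical name. Writing the separator before every field except the first avoids the trailing dash that ${a%-} has to strip.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split `in` on '-', format each field with "%0*ld"
 * (zero-filled to at least `width` digits, never truncated) and join the
 * results with '-' into `out` (capacity `outsz`).  Assumes well-formed
 * input: decimal integers that fit in a long, separated by single '-'.
 */
void pad_fields(char *out, size_t outsz, const char *in, int width)
{
    const char *p = in;
    size_t used = 0;

    assert(out != NULL && outsz > 0 && in != NULL);
    out[0] = '\0';
    while (*p != '\0') {
        char *end;
        long n = strtol(p, &end, 10);
        int w;

        if (end == p)                   /* not a number: stop here */
            break;
        /* Separator before every field but the first: no trailing '-'. */
        w = snprintf(out + used, outsz - used, "%s%0*ld",
                     used > 0 ? "-" : "", width, n);
        if (w < 0 || (size_t)w >= outsz - used)
            break;                      /* out of room: result truncated */
        used += (size_t)w;
        /* Skip the '-' so strtol() doesn't read it as a minus sign. */
        p = (*end == '-') ? end + 1 : end;
    }
}
```

With a width of 3, pad_fields() turns 1-12-123-1234-12345 into 001-012-123-1234-12345.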

-- 
Mikael Magnusson



* Re: Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Stephane Chazelas @ 2024-08-03 20:31 UTC
  To: Zsh hackers list

2024-08-03 15:22:41 +0100, Stephane Chazelas:
[...]
> Same happens with -3 (though that doesn't seem to be documented):
> 
> $ echo ${a//(#m)<->/${(l[-3][0])MATCH}}
> 001-012-123-234-345
[...]

Actually, the code has:

                case 'l':
                    tt = 1;
                /* fall through */
                case 'r':
                    s++;
                    /* delimiter position */
                    del0 = s;
                    num = get_intarg(&s, &dellen);
                    if (num < 0)
                        goto flagerr;

So it looks like it was intended at some point for it to be an error for the
number to be negative, but get_intarg has:

    if (ret < 0)
        ret = -ret;

And earlier versions of the "l"/"r" handling code had:

                case 'l':
                    tt = 1;
                /* fall through */
                case 'r':
                    t = get_strarg(++s);
                    if (!*t)
                        goto flagerr;
                    sav = *t;
                    *t = '\0';
                    d = dupstring(s + 1);
                    untokenize(d);
                    if ((num = mathevalarg(d, &d)) < 0)
                        num = -num;

get_intarg() was added in 3.0.1, in 1996.

The l/r flags, like most other flags, were added in 2.5 AFAICT, and
have always accepted a negative padding length (the sign being
ignored), though that was never documented.

-- 
Stephane




* Re: Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Stephane Chazelas @ 2024-08-03 20:42 UTC
  To: Zsh hackers list

2024-08-03 21:31:52 +0100, Stephane Chazelas:
[...]
> but get_intarg has:
> 
>     if (ret < 0)
>         ret = -ret;
[...]

For the record, the other uses of get_intarg() are in ${(I[3])var/x/y},
which replaces the 3rd occurrence, again with an ineffective check:

                case 'I':
                    s++;
                    flnum = get_intarg(&s, &dellen);
                    if (flnum < 0)
                        goto flagerr;
                    s--;
                    break;

Here a negative value could be useful to replace the 3rd last occurrence.

And in $var:F[3]s/X/Y/, which repeats the substitution 3 times. This
time there's no check for negative values at all.
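
As an illustration of the negative-index idea for (I), a C sketch of locating the n-th occurrence, with negative n counting from the end as in $array[-1]. nth_occurrence() is a hypothetical helper; it assumes plain substrings rather than glob patterns and counts non-overlapping matches from the left, which is only one possible definition of "3rd last".

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: offset of the n-th non-overlapping occurrence of
 * `needle` in `hay`, counting matches from the left.  n > 0 counts from
 * the first match (1 = first), n < 0 from the last one (-1 = last), as
 * with $array[-1].  Returns -1 if there is no such match or n == 0.
 */
long nth_occurrence(const char *hay, const char *needle, long n)
{
    size_t nlen;
    long count = 0, target;
    const char *p;

    assert(hay != NULL && needle != NULL);
    nlen = strlen(needle);
    if (n == 0 || nlen == 0)
        return -1;
    /* First pass: count the matches. */
    for (p = strstr(hay, needle); p != NULL; p = strstr(p + nlen, needle))
        count++;
    /* Map a negative index onto the equivalent positive one. */
    target = n > 0 ? n : count + n + 1;
    if (target < 1 || target > count)
        return -1;
    /* Second pass: walk to the target match. */
    p = strstr(hay, needle);
    while (--target > 0)
        p = strstr(p + nlen, needle);
    return (long)(p - hay);
}
```

Under this definition, the 3rd last x in x1x2x3x4 is the one at offset 2.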

-- 
Stephane



* Re: Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Bart Schaefer @ 2024-08-03 20:53 UTC
  To: Zsh hackers list

On Sat, Aug 3, 2024 at 1:31 PM Stephane Chazelas <stephane@chazelas.org> wrote:
>
>                     num = get_intarg(&s, &dellen);
>                     if (num < 0)
>                         goto flagerr;
>
> So it looks like it was intended at some point for it to be an error for the
> number to be negative

A closer examination of get_intarg() indicates that a negative return
means a parse error, not that the argument examined is allowed to be
negative.  E.g.:

    char *t = get_strarg(*s, &arglen);
    ...
    if (!*t)
        return -1;



* Re: Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Bart Schaefer @ 2024-08-03 21:20 UTC
  To: Zsh hackers list

On Sat, Aug 3, 2024 at 7:22 AM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> $ set -o extendedglob
> $ a=1-12-123-1234-12345
> $ echo ${a//(#m)<->/${(l[3][0])MATCH}}
> 001-012-123-234-345
>
> Numbers are "l"eft padded to a length of 3 with 0s, but also
> truncated to 3 digits when longer. It's often not desired.

In the current dev version / forthcoming release (whenever that happens):

echo ${a//(#m)<->/${|printf -v REPLY "%03d" "$MATCH"}}



* Re: Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Bart Schaefer @ 2024-08-03 21:28 UTC
  To: Zsh hackers list

On Sat, Aug 3, 2024 at 1:42 PM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> And in $var:F[3]s/X/Y/ to repeat the substitution 3 times. This time no check
> for negative values.

On Sat, Aug 3, 2024 at 1:53 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> A closer examination of get_intarg() indicates that a negative return
> means a parse error

This may mean that the :F case is hiding a bug.



* Re: Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Stephane Chazelas @ 2024-08-04  6:44 UTC
  To: Bart Schaefer; +Cc: Zsh hackers list

2024-08-03 14:28:03 -0700, Bart Schaefer:
> On Sat, Aug 3, 2024 at 1:42 PM Stephane Chazelas <stephane@chazelas.org> wrote:
> >
> > And in $var:F[3]s/X/Y/ to repeat the substitution 3 times. This time no check
> > for negative values.
> On Sat, Aug 3, 2024 at 1:53 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> >
> > A closer examination of get_intarg() indicates that a negative return
> > means a parse error

Ah yes, sorry, I missed the point about get_intarg returning -1
upon error.

> This may mean that the :F case is hiding a bug.

Yes, negative values are handled like "f" (repeat as long as it
changes something), but get_intarg() only returns negative upon
error.

So, for instance, a=a; echo $a:F[1-]s/a/aa/ prints an error message
but then runs into an infinite loop.
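
A minimal C model of the difference between a counted repeat and "repeat until unchanged", assuming a plain-substring s/from/to/ applied to the first match only. subst_once(), subst_repeat() and the max_iter guard are hypothetical; the guard is needed because s/a/aa/ changes the string on every round and so never reaches a fixed point.

```c
#include <assert.h>
#include <string.h>

/*
 * Replace the first occurrence of `from` in `buf` (capacity `bufsz`) with
 * `to`, in place.  Returns 1 if the string changed, 0 if it did not (no
 * match, or from == to), -1 if the result would not fit.
 */
static int subst_once(char *buf, size_t bufsz, const char *from, const char *to)
{
    size_t len = strlen(buf), flen = strlen(from), tlen = strlen(to);
    char *hit = strstr(buf, from);

    if (hit == NULL || strcmp(from, to) == 0)
        return 0;
    if (len - flen + tlen + 1 > bufsz)
        return -1;
    /* Shift the tail (including the NUL), then copy the replacement in. */
    memmove(hit + tlen, hit + flen, len - (size_t)(hit - buf) - flen + 1);
    memcpy(hit, to, tlen);
    return 1;
}

/*
 * n >= 0: apply the substitution up to n times (like :F[n]; a round that
 *         changes nothing means later rounds would not either).
 * n <  0: repeat until nothing changes (like :f), giving up after
 *         `max_iter` rounds.
 * Returns the number of rounds that changed the string, or -1 if the
 * round limit or the buffer size was hit.
 */
int subst_repeat(char *buf, size_t bufsz, const char *from, const char *to,
                 int n, int max_iter)
{
    int rounds = 0, r;

    assert(buf != NULL && from != NULL && to != NULL && *from != '\0');
    for (;;) {
        if (n >= 0 && rounds == n)
            return rounds;
        if (n < 0 && rounds == max_iter)
            return -1;
        r = subst_once(buf, bufsz, from, to);
        if (r < 0)
            return -1;
        if (r == 0)
            return rounds;      /* fixed point reached */
        rounds++;
    }
}
```

With n = 3, a becomes aaaa; with a negative n the same substitution only stops because of the max_iter guard.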

Maybe best would be to have get_intarg() return true/false and
the value by reference, and handle negative value on a case by
case basis in a more useful way:

${(I[-3])var/x/y} substitute the 3rd last occurrence
${(l[-3][0])var}  pad to length 3 without truncating
${(r[-3][0])var}  pad to length 3 without truncating
$var:F[-3]s/x/y/  error or treat like "f"
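
Returning the status separately removes the clash pointed out above between "-1 means parse error" and a legitimately negative argument. A minimal sketch of that interface in C; parse_int_arg() is a hypothetical stand-in that uses strtol() where zsh evaluates a full arithmetic expression with mathevali().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for get_intarg() with the proposed interface:
 * success/failure is the return value and the number goes through
 * `result`, so a negative argument is no longer mistaken for an error.
 * Only a plain decimal integer is accepted here.
 */
int parse_int_arg(const char *s, long *result)
{
    char *end;
    long v;

    assert(s != NULL && result != NULL);
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return 0;               /* empty, trailing junk or overflow */
    *result = v;
    return 1;
}
```

Each caller can then apply its own policy to a successfully parsed negative value, while malformed input such as 1- is reported as an error rather than turning into a negative count.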

-- 
Stephane



* Re: Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Stephane Chazelas @ 2024-08-04  7:53 UTC
  To: Bart Schaefer, Zsh hackers list

2024-08-04 07:44:26 +0100, Stephane Chazelas:
[...]
> Maybe best would be to have get_intarg() return true/false and
> the value by reference, and handle negative value on a case by
> case basis in a more useful way:
[...]
> ${(l[-3][0])var}  pad to length 3 without truncating
> ${(r[-3][0])var}  pad to length 3 without truncating
[...]

Ah, I forgot those 2 can be combined, so there's the question of
what to do if one is positive and one is negative:

1- either skip truncation altogether if either is negative
2- or truncate only on the side where it's positive. Would that even
  make sense?

Or instead of relying on negative value, use another flag to
skip the truncation?

Here's a patch doing 1 and returning errors for F[-1] I[-1]:

diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 7eade4a11..aa497f415 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -1403,6 +1403,10 @@ Left and right padding may be used together.  In this case the strategy
 is to apply left padding to the first half width of each of the resulting
 words, and right padding to the second half.  If the string to be
 padded has odd width the extra padding is applied on the left.
+
+If the left or right padding length is negative, then the absolute value is
+used and the truncating is disabled (strings longer than the total padding
+length are left alone).
 )
 item(tt(s:)var(string)tt(:))(
 Force field splitting at the
diff --git a/Src/subst.c b/Src/subst.c
index a079672df..79e147963 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -898,7 +898,8 @@ dopadding(char *str, int prenum, int postnum, char *preone, char *postone,
     )
 {
     char *def, *ret, *t, *r;
-    int ls, ls2, lpreone, lpostone, lpremul, lpostmul, lr, f, m, c, cc, cl;
+    int ls, ls2, lpreone, lpostone, lpremul, lpostmul, lr, f, m, c, cc, cl, total;
+    int padonly = 0;
     convchar_t cchar;
 
     MB_METACHARINIT();
@@ -922,7 +923,17 @@ dopadding(char *str, int prenum, int postnum, char *preone, char *postone,
     lpremul = MB_METASTRLEN2(premul, multi_width);
     lpostmul = MB_METASTRLEN2(postmul, multi_width);
 
-    if (prenum + postnum == ls)
+    if (prenum < 0) {
+	padonly = 1;
+	prenum = -prenum;
+    }
+    if (postnum < 0) {
+	padonly = 1;
+	postnum = -postnum;
+    }
+    total = prenum + postnum;
+
+    if (total == ls || (padonly && (total < ls)))
 	return str;
 
     /*
@@ -1425,7 +1436,7 @@ get_strarg(char *s, int *lenp)
 
 /**/
 static int
-get_intarg(char **s, int *delmatchp)
+get_intarg(char **s, int *delmatchp, int *result)
 {
     int arglen;
     char *t = get_strarg(*s, &arglen);
@@ -1434,24 +1445,23 @@ get_intarg(char **s, int *delmatchp)
 
     *delmatchp = 0;
     if (!*t)
-	return -1;
+	return 0;
     sav = *t;
     *t = '\0';
     p = dupstring(*s + arglen);
     *s = t + arglen;
     *t = sav;
     if (parsestr(&p))
-	return -1;
+	return 0;
     singsub(&p);
     if (errflag)
-	return -1;
+	return 0;
     ret = mathevali(p);
     if (errflag)
-	return -1;
-    if (ret < 0)
-	ret = -ret;
+	return 0;
     *delmatchp = arglen;
-    return ret;
+    *result = ret;
+    return 1;
 }
 
 /* Parsing for the (e) flag. */
@@ -1772,7 +1782,7 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
     /* Replacement string for /orig/repl and //orig/repl */
     char *replstr = NULL;
     /* The numbers for (l) and (r) */
-    zlong prenum = 0, postnum = 0;
+    int prenum = 0, postnum = 0;
 #ifdef MULTIBYTE_SUPPORT
     /* The (m) flag: use width of multibyte characters */
     int multi_width = 0;
@@ -2130,7 +2140,6 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 	} else if (c == '(' || c == Inpar) {
 	    char *t, sav;
 	    int tt = 0;
-	    zlong num;
 	    /*
 	     * The (p) flag is only remembered within
 	     * this block.  It says we do print-style handling
@@ -2187,9 +2196,13 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 		    break;
 		case 'I':
 		    s++;
-		    flnum = get_intarg(&s, &dellen);
-		    if (flnum < 0)
+		    if (!get_intarg(&s, &dellen, &flnum))
+			goto flagerr;
+		    if (flnum < 0) {
+			/* TODO: handle -3 as 3rd last */
+			zerr("I flag argument must be positive");
 			goto flagerr;
+		    }
 		    s--;
 		    break;
 
@@ -2322,13 +2335,8 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 		    s++;
 		    /* delimiter position */
 		    del0 = s;
-		    num = get_intarg(&s, &dellen);
-		    if (num < 0)
+		    if (!get_intarg(&s, &dellen, tt ? &prenum : &postnum))
 			goto flagerr;
-		    if (tt)
-			prenum = num;
-		    else
-			postnum = num;
 		    /* must have same delimiter if more arguments */
 		    if (!dellen || memcmp(del0, s, dellen)) {
 			/* decrement since loop will increment */
@@ -4702,7 +4710,11 @@ modify(char **str, char **ptr, int inbrace)
 		break;
 	    case 'F':
 		(*ptr)++;
-		rec = get_intarg(ptr, &dellen);
+		if (!get_intarg(ptr, &dellen, &rec)) return;
+		if (rec < 0) {
+		    zerr("F flag argument must be positive");
+		    return;
+		}
 		break;
 	    default:
 		*ptr = lptr;

-- 
Stephane



* Re: Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Stephane Chazelas @ 2024-08-04 10:56 UTC
  To: Bart Schaefer, Zsh hackers list

2024-08-04 08:53:17 +0100, Stephane Chazelas:
[...]
> Here's a patch doing 1 and returning errors for F[-1] I[-1]:
[...]

Oh, that's wrong when there's a l[-3][0][prefix] or
r[-3][0][suffix]. I suppose when truncation is disabled, prefix
and suffix should be added unconditionally and not truncated.
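
For reference, the zsh manual says that with l:expr::string1::string2:, string2 is inserted once directly to the left of each word, truncated if necessary, before string1 is used to produce any remaining padding. Below is a simplified C model of that rule, next to one possible reading of the pad-only variant in which the prefix and the string are always kept whole. It assumes ASCII, a single fill character and at most 255 characters of prefix plus string; pad_left_prefix() is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified model of ${(l[width][fill][prefix])var}: `prefix` sits once
 * directly left of `s`, and `fill` makes up any remaining width.
 *   truncate != 0: cut to the rightmost `width` characters, so the
 *                  prefix is the first thing lost (documented behaviour).
 *   truncate == 0: prefix and string always kept whole (one reading of
 *                  the pad-only proposal).
 */
void pad_left_prefix(char *out, size_t outsz, const char *s, size_t width,
                     char fill, const char *prefix, int truncate)
{
    char joined[256];
    size_t len, i;

    assert(out != NULL && outsz > 0 && s != NULL && prefix != NULL);
    snprintf(joined, sizeof joined, "%s%s", prefix, s);
    len = strlen(joined);
    if (len >= width) {
        snprintf(out, outsz, "%s", truncate ? joined + (len - width) : joined);
        return;
    }
    /* Fill on the far left, then prefix + string. */
    for (i = 0; i < width - len && i + 1 < outsz; i++)
        out[i] = fill;
    snprintf(out + i, outsz - i, "%s", joined);
}
```

With a width of 5, fill 0 and prefix x, 12 gives 00x12 either way, while 12345 gives 12345 when truncating (the prefix is lost) and x12345 in pad-only mode.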

I'll see if I can come up with a better patch.

-- 
Stephane



* Re: Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Bart Schaefer @ 2024-08-05  2:15 UTC
  To: Bart Schaefer, Zsh hackers list

On Sun, Aug 4, 2024 at 12:53 AM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> -get_intarg(char **s, int *delmatchp)
> +get_intarg(char **s, int *delmatchp, int *result)

I have a slight preference for leaving get_intarg() as it is, and
instead have the caller check for a leading '-' before calling it.



* Re: Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Bart Schaefer @ 2024-08-05  3:56 UTC
  To: Zsh hackers list

On Sun, Aug 4, 2024 at 7:15 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> I have a slight preference for leaving get_intarg() as it is, and
> instead have the caller check for a leading '-' before calling it.

... but that depends on whether "-" is supposed to be a signifier like
in printf, or whether a negative is meant to be computable.  Hm.
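
For comparison, C's printf supports both readings: the - in %-5d is a flag (left-justify), while a width passed through * can be a computed negative number, which the C standard says is taken as a - flag followed by a positive width. Either way the width is a minimum and numeric output is never truncated. A small illustration (format_width() is just a wrapper made up for this example):

```c
#include <assert.h>
#include <stdio.h>

/* Format `value` in a field of `width` characters, bracketed so the
 * padding is visible.  A negative `width` left-justifies instead of
 * right-justifying; a too-narrow width never truncates the number.
 * Returns snprintf()'s result (the length of the full output). */
int format_width(char *out, size_t outsz, int width, int value)
{
    assert(out != NULL && outsz > 0);
    return snprintf(out, outsz, "[%*d]", width, value);
}
```

So a width of 5 gives [   42], -5 gives [42   ], and 2 with 12345 still gives [12345].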



* Re: Feature request: ${(l[-3][0])var} to do left padding *without truncation*
From: Stephane Chazelas @ 2024-08-05  6:10 UTC
  To: Bart Schaefer; +Cc: Zsh hackers list

2024-08-04 20:56:19 -0700, Bart Schaefer:
> On Sun, Aug 4, 2024 at 7:15 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> >
> > I have a slight preference for leaving get_intarg() as it is, and
> > instead have the caller check for a leading '-' before calling it.
> 
> ... but that depends on whether "-" is supposed to be a signifier like
> in printf, or whether a negative is meant to be computable.  Hm.

Since it's l[arithmetic-expression][padding], the latter would
make more sense, I'd think.

As in width=-3 with ${(l[width][pad])array}.

In the case of the "I" flag (also using get_intarg), the
${(I[-1])var/x/y} (not implemented in my patch, just a TODO
there) would be consistent with $array[-1] (I guess a
${(I[2,-3])var/x/y} could also be added and then get_intarg()
could no longer be used).

If requiring a prefix to the number, then maybe using:

${(l[<arith][pad])var}: pad+truncate (default if <, > omitted)
${(l[>arith][pad])var}: pad only

would be better, as a leading </> is otherwise not valid in an
arithmetic expression.
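
A sketch of how such a marker could be parsed, assuming the hypothetical name parse_pad_spec() and using strtol() in place of zsh's arithmetic evaluation. Since neither < nor > can start an arithmetic expression, the marker can simply be stripped before the rest is evaluated.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical parser for the proposed syntax:
 *   "<N" or "N"  -> width N, pad and truncate (current behaviour)
 *   ">N"         -> width N, pad only
 * Returns 1 on success, 0 on a malformed or negative width.
 */
int parse_pad_spec(const char *s, long *width, int *truncate)
{
    char *end;
    long v;

    assert(s != NULL && width != NULL && truncate != NULL);
    *truncate = (*s != '>');
    if (*s == '<' || *s == '>')
        s++;                        /* strip the mode marker */
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 0)
        return 0;
    *width = v;
    return 1;
}
```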

Now, besides the issue with prefix/suffix, my patch was also
wrong when l and r are combined, as the two halves of the string
are then padded+truncated separately. It's not just a matter of
leaving the string (+prefix+suffix) alone if it would fit in the
total padding length, as the halves could still end up being
truncated. Looks like I'll need to give it more thought.
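
A simplified C model of the combined case shows the problem. It follows the documented strategy (the first ls/2 characters go through the left padding, the rest through the right padding) plus the exact-fit shortcut visible in the dopadding() hunk of the patch, and assumes ASCII and a single fill character; pad_both() is a hypothetical name, not zsh code.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified model of ${(l[pre][c]r[post][c])var}: the first half of the
 * string (ls/2 chars) is left-padded or left-truncated to `pre` chars,
 * the second half right-padded or right-truncated to `post` chars.
 * `out` must hold at least max(strlen(s), pre + post) + 1 bytes.
 */
void pad_both(char *out, const char *s, size_t pre, size_t post, char fill)
{
    size_t ls, ls2, len2, i;
    char *r = out;

    assert(out != NULL && s != NULL);
    ls = strlen(s);
    ls2 = ls / 2;
    len2 = ls - ls2;
    if (pre + post == ls) {             /* exact fit: returned unchanged */
        strcpy(out, s);
        return;
    }
    /* Left half s[0, ls2) into a field of `pre` characters. */
    if (pre <= ls2) {
        memcpy(r, s + (ls2 - pre), pre);    /* keep its rightmost chars */
        r += pre;
    } else {
        for (i = 0; i < pre - ls2; i++)
            *r++ = fill;
        memcpy(r, s, ls2);
        r += ls2;
    }
    /* Right half s[ls2, ls) into a field of `post` characters. */
    if (post <= len2) {
        memcpy(r, s + ls2, post);           /* keep its leftmost chars */
        r += post;
    } else {
        memcpy(r, s + ls2, len2);
        r += len2;
        for (i = 0; i < post - len2; i++)
            *r++ = fill;
    }
    *r = '\0';
}
```

In this model, abcd with pre = 1 and post = 5 comes out as "bcd   ": the string would fit in the 6-character total, but its 2-character first half does not fit the 1-character left field, so the a is dropped.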

-- 
Stephane



Thread overview: 13+ messages
2024-08-03 14:22 Feature request: ${(l[-3][0])var} to do left padding *without truncation* Stephane Chazelas
2024-08-03 14:51 ` Mikael Magnusson
2024-08-03 20:31 ` Stephane Chazelas
2024-08-03 20:42   ` Stephane Chazelas
2024-08-03 21:28     ` Bart Schaefer
2024-08-04  6:44       ` Stephane Chazelas
2024-08-04  7:53         ` Stephane Chazelas
2024-08-04 10:56           ` Stephane Chazelas
2024-08-05  2:15           ` Bart Schaefer
2024-08-05  3:56             ` Bart Schaefer
2024-08-05  6:10               ` Stephane Chazelas
2024-08-03 20:53   ` Bart Schaefer
2024-08-03 21:20 ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/
