zsh-workers
 help / color / mirror / code / Atom feed
* $var not expanded in ${x?$var}
@ 2023-01-13  8:02 Stephane Chazelas
  2023-01-16  8:35 ` Daniel Shahaf
  2023-01-16 17:15 ` Peter Stephenson
  0 siblings, 2 replies; 33+ messages in thread
From: Stephane Chazelas @ 2023-01-13  8:02 UTC (permalink / raw)
  To: Zsh hackers list

$ zsh -c 'echo ${1?$USERNAME}'
zsh:1: 1: $USERNAME

No quote removal either:

$ zsh -c 'echo ${1?"x"}'
zsh:1: 1: "x"

Doc says:

> In any of the above expressions that test a variable and substitute an
> alternate WORD, note that you can use standard shell quoting in the WORD
> value to selectively override the splitting done by the SH_WORD_SPLIT
> option and the = flag, but not splitting by the s:STRING: flag.

POSIX:

> ${parameter:?[word]}
> Indicate Error if Null or Unset. If parameter is unset or
> null, the *expansion* of word (or a message indicating it is
> unset if word is omitted) shall be written to standard error
> and the shell exits with a non-zero exit status. Otherwise,
> the value of parameter shall be substituted. An interactive
> shell need not exit.

(also note that line number being printed twice)

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: $var not expanded in ${x?$var}
  2023-01-13  8:02 $var not expanded in ${x?$var} Stephane Chazelas
@ 2023-01-16  8:35 ` Daniel Shahaf
  2023-01-16 17:15 ` Peter Stephenson
  1 sibling, 0 replies; 33+ messages in thread
From: Daniel Shahaf @ 2023-01-16  8:35 UTC (permalink / raw)
  To: Stephane Chazelas, Zsh hackers list

Stephane Chazelas wrote on Fri, 13 Jan 2023 08:02 +00:00:
> $ zsh -c 'echo ${1?$USERNAME}'
> zsh:1: 1: $USERNAME
>
> No quote removal either:
>
> $ zsh -c 'echo ${1?"x"}'
> zsh:1: 1: "x"
>

Thanks for the report.

> (also note that line number being printed twice)

No, it's not.  The second "1" is the parameter name:

$ zsh -fc 'echo ${foo?$USERNAME}' 
zsh:1: foo: $USERNAME

Cheers,

Daniel


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: $var not expanded in ${x?$var}
  2023-01-13  8:02 $var not expanded in ${x?$var} Stephane Chazelas
  2023-01-16  8:35 ` Daniel Shahaf
@ 2023-01-16 17:15 ` Peter Stephenson
  2024-02-20  7:05   ` Stephane Chazelas
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Stephenson @ 2023-01-16 17:15 UTC (permalink / raw)
  To: zsh workers

> On 13/01/2023 08:02 Stephane Chazelas <stephane@chazelas.org> wrote:
> 
>  
> $ zsh -c 'echo ${1?$USERNAME}'
> zsh:1: 1: $USERNAME
> 
> No quote removal either:
> 
> $ zsh -c 'echo ${1?"x"}'
> zsh:1: 1: "x"
> 
> Doc says:
> 
> > In any of the above expressions that test a variable and substitute an
> > alternate WORD, note that you can use standard shell quoting in the WORD
> > value to selectively override the splitting done by the SH_WORD_SPLIT
> > option and the = flag, but not splitting by the s:STRING: flag.

In fact the shell does not "substitute an alternate WORD" here, it
just prints it out, but the difference is easy to miss and expanding it
seems the right thing to do from other points of view, so I've noted it
in the doc.

pws

diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index eb8cdbae5..857715a95 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -665,7 +665,9 @@ item(tt(${)var(name)tt(:?)var(word)tt(}))(
 In the first form, if var(name) is set, or in the second form if var(name)
 is both set and non-null, then substitute its value; otherwise, print
 var(word) and exit from the shell.  Interactive shells instead return to
-the prompt.  If var(word) is omitted, then a standard message is printed.
+the prompt.  If var(word) is omitted, then a standard message is
+printed.  Note that var(word) is expanded even though its value
+is not substituted onto the command line.
 )
 enditem()
 
diff --git a/Src/subst.c b/Src/subst.c
index 897188862..4ad9fee1a 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -3076,7 +3076,11 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 	    if (vunset) {
                 if (isset(EXECOPT)) {
                     *idend = '\0';
-                    zerr("%s: %s", idbeg, *s ? s : "parameter not set");
+		    if (*s){
+			singsub(&s);
+			zerr("%s: %s", idbeg, s);
+		    } else
+			zerr("%s: %s", idbeg, "parameter not set");
                     /*
                      * In interactive shell we need to return to
                      * top-level prompt --- don't clear this error
diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst
index 6bf55b4db..7dd5d82d7 100644
--- a/Test/D04parameter.ztst
+++ b/Test/D04parameter.ztst
@@ -110,6 +110,11 @@
 *>*foo:1: 1: no arguments given
 >reached
 
+  message="expand me and remove quotes"
+  (: ${UNSET_PARAM?$message})
+1:${...?....} performs expansion on the message
+?(eval):2: UNSET_PARAM: expand me and remove quotes
+
   print ${set1:+word1} ${set1+word2} ${null1:+word3} ${null1+word4}
   print ${unset1:+word5} ${unset1+word6}
 0:${...:+...}, ${...+...}


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: $var not expanded in ${x?$var}
  2023-01-16 17:15 ` Peter Stephenson
@ 2024-02-20  7:05   ` Stephane Chazelas
  2024-02-20 17:45     ` Bart Schaefer
  0 siblings, 1 reply; 33+ messages in thread
From: Stephane Chazelas @ 2024-02-20  7:05 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh workers

2023-01-16 17:15:35 +0000, Peter Stephenson:
> > On 13/01/2023 08:02 Stephane Chazelas <stephane@chazelas.org> wrote:
> >  
> > $ zsh -c 'echo ${1?$USERNAME}'
> > zsh:1: 1: $USERNAME
> > 
> > No quote removal either:
> > 
> > $ zsh -c 'echo ${1?"x"}'
> > zsh:1: 1: "x"
> > 
> > Doc says:
> > 
> > > In any of the above expressions that test a variable and substitute an
> > > alternate WORD, note that you can use standard shell quoting in the WORD
> > > value to selectively override the splitting done by the SH_WORD_SPLIT
> > > option and the = flag, but not splitting by the s:STRING: flag.
> 
> In fact the shell does not "substitute an alternate WORD" here, it
> just prints it out, but the difference is easy to miss and expanding it
> seems the right thing to do from other points of view, so I've noted it
> in the doc.
[...]

Thanks.

$ echo ${1?$'a\nb'}
zsh: 1: a\nb
$ print -Prv err '%F{red}BAD%f'
$ a=${1?$err}
zsh: 1: ^[[31mBAD^[[39m

That transformation of newline into \n and other nicezputs'ness
seems undesirable to me though.

It doesn't seem unreasonable for someone to want to provide a
multiline error message or one with terminal escape sequences.

See for instance
https://unix.stackexchange.com/questions/769679/bash-parameter-substitution-with-multi-line-error-message

AFAICT, except for bash doing the IFS-split + join, all other
shells send the expansion verbatim to stderr.

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: $var not expanded in ${x?$var}
  2024-02-20  7:05   ` Stephane Chazelas
@ 2024-02-20 17:45     ` Bart Schaefer
  2024-02-20 19:39       ` Stephane Chazelas
  0 siblings, 1 reply; 33+ messages in thread
From: Bart Schaefer @ 2024-02-20 17:45 UTC (permalink / raw)
  To: zsh workers

On Mon, Feb 19, 2024 at 11:05 PM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> $ echo ${1?$'a\nb'}
> zsh: 1: a\nb
> $ print -Prv err '%F{red}BAD%f'
> $ a=${1?$err}
> zsh: 1: ^[[31mBAD^[[39m
>
> That transformation of newline into \n and other nicezputs'ness
> seems undesirable to me though.

It's probably worthwhile for utils.c:zerrmsg() to have a %-placeholder
that means a literal string.  Not hard to do, just need to choose a
letter.  %S ?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: $var not expanded in ${x?$var}
  2024-02-20 17:45     ` Bart Schaefer
@ 2024-02-20 19:39       ` Stephane Chazelas
  2024-02-21  4:44         ` Bart Schaefer
  0 siblings, 1 reply; 33+ messages in thread
From: Stephane Chazelas @ 2024-02-20 19:39 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh workers

2024-02-20 09:45:48 -0800, Bart Schaefer:
> On Mon, Feb 19, 2024 at 11:05 PM Stephane Chazelas
> <stephane@chazelas.org> wrote:
[...]
> > That transformation of newline into \n and other nicezputs'ness
> > seems undesirable to me though.
> 
> It's probably worthwhile for utils.c:zerrmsg() to have a %-placeholder
> that means a literal string.  Not hard to do, just need to choose a
> letter.  %S ?

That was my thought as well with %S or %#s (alternative form of
%s) or %r (raw). But then I thought maybe there was already
something available to print some raw error.

Looking at https://austingroupbugs.net/view.php?id=1771 where
availability of printf %X directives is discussed, looks like %S
is the least likely to conflict with printf(3) or printf(1) if
that was ever to be a concern.

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: $var not expanded in ${x?$var}
  2024-02-20 19:39       ` Stephane Chazelas
@ 2024-02-21  4:44         ` Bart Schaefer
  2024-02-21 19:45           ` Stephane Chazelas
  0 siblings, 1 reply; 33+ messages in thread
From: Bart Schaefer @ 2024-02-21  4:44 UTC (permalink / raw)
  To: zsh workers

[-- Attachment #1: Type: text/plain, Size: 577 bytes --]

On Tue, Feb 20, 2024 at 11:39 AM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> Looking at https://austingroupbugs.net/view.php?id=1771 where
> availability of printf %X directives is discussed, looks like %S
> is the least likely to conflict with printf(3) or printf(1) if
> that was ever to be a concern.

Zsh contains approximately 500 calls to output error messages, at
least a third of which include strings that would have come from user
input.  I wonder how many of those are eventually going to want %S or
at least some less-than-"nice" processing.

[-- Attachment #2: zerr%S.txt --]
[-- Type: text/plain, Size: 1156 bytes --]

diff --git a/Src/subst.c b/Src/subst.c
index 650c09de2..ddf9f9de9 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -3272,7 +3272,7 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
                     *idend = '\0';
 		    if (*s){
 			singsub(&s);
-			zerr("%s: %s", idbeg, s);
+			zerr("%s: %S", idbeg, s);
 		    } else
 			zerr("%s: %s", idbeg, "parameter not set");
                     /*
diff --git a/Src/utils.c b/Src/utils.c
index 0fda92709..0482f3316 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -123,6 +123,7 @@ set_widearray(char *mb_array, Widechar_array wca)
 
    Code	Argument types		Prints
    %s	const char *		C string (null terminated)
+   %S	const char *		C string (null terminated), output raw
    %l	const char *, int	C string of given length (null not required)
    %L	long			decimal value
    %d	int			decimal value
@@ -309,6 +310,10 @@ zerrmsg(FILE *file, const char *fmt, va_list ap)
 		str = va_arg(ap, const char *);
 		nicezputs(str, file);
 		break;
+	    case 'S':
+		str = va_arg(ap, const char *);
+		fprintf(file, "%s", str);
+		break;
 	    case 'l': {
 		char *s;
 		str = va_arg(ap, const char *);

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: $var not expanded in ${x?$var}
  2024-02-21  4:44         ` Bart Schaefer
@ 2024-02-21 19:45           ` Stephane Chazelas
  2024-02-21 19:52             ` Bart Schaefer
  0 siblings, 1 reply; 33+ messages in thread
From: Stephane Chazelas @ 2024-02-21 19:45 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh workers

2024-02-20 20:44:47 -0800, Bart Schaefer:
[...]
> +	    case 'S':
> +		str = va_arg(ap, const char *);
> +		fprintf(file, "%s", str);
> +		break;
[...]

I suspect there's something wrong with that as it could not
possibly work for ${var?$'\0'}.

(could also be simplified to fputs(str, file) though I suspect
it makes no difference with modern compilers).

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: $var not expanded in ${x?$var}
  2024-02-21 19:45           ` Stephane Chazelas
@ 2024-02-21 19:52             ` Bart Schaefer
  2024-02-21 20:21               ` Stephane Chazelas
  0 siblings, 1 reply; 33+ messages in thread
From: Bart Schaefer @ 2024-02-21 19:52 UTC (permalink / raw)
  To: zsh workers

On Wed, Feb 21, 2024 at 11:45 AM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> 2024-02-20 20:44:47 -0800, Bart Schaefer:
> [...]
> > +         case 'S':
> > +             str = va_arg(ap, const char *);
> > +             fprintf(file, "%s", str);
> > +             break;
> [...]
>
> I suspect there's something wrong with that as it could not
> possibly work for ${var?$'\0'}.

What would you expect it to do?  You can't have it both raw and not
raw.  With the patch applied:

% : ${var?$'\0oopsie'}
zsh: var: ? oopsie

Might try running that through valgrind later, not presently on a
machine that has valgrind.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: $var not expanded in ${x?$var}
  2024-02-21 19:52             ` Bart Schaefer
@ 2024-02-21 20:21               ` Stephane Chazelas
  2024-02-22  0:46                 ` [PATCH] unmetafy " Bart Schaefer
  0 siblings, 1 reply; 33+ messages in thread
From: Stephane Chazelas @ 2024-02-21 20:21 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh workers

2024-02-21 11:52:51 -0800, Bart Schaefer:
> On Wed, Feb 21, 2024 at 11:45 AM Stephane Chazelas
> <stephane@chazelas.org> wrote:
> >
> > 2024-02-20 20:44:47 -0800, Bart Schaefer:
> > [...]
> > > +         case 'S':
> > > +             str = va_arg(ap, const char *);
> > > +             fprintf(file, "%s", str);
> > > +             break;
> > [...]
> >
> > I suspect there's something wrong with that as it could not
> > possibly work for ${var?$'\0'}.
> 
> What would you expect it to do?  You can't have it both raw and not
> raw.

I would expect it to write the NUL character ('\0') to stderr,
which fprintf couldn't do as it takes a NUL-delimited string.

With the patch applied, I get:

$ (: ${1?$'a\0b'}) |& sed -n l
zsh: 1: a\203 b$

Maybe needs some unmetafy + fwrite.

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] unmetafy Re: $var not expanded in ${x?$var}
  2024-02-21 20:21               ` Stephane Chazelas
@ 2024-02-22  0:46                 ` Bart Schaefer
  2024-02-22  7:23                   ` Stephane Chazelas
  0 siblings, 1 reply; 33+ messages in thread
From: Bart Schaefer @ 2024-02-22  0:46 UTC (permalink / raw)
  To: zsh workers

[-- Attachment #1: Type: text/plain, Size: 407 bytes --]

On Wed, Feb 21, 2024 at 12:21 PM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> > What would you expect it to do?  You can't have it both raw and not
> > raw.
>
> I would expect it to write the NUL character ('\0') to stderr
[...]
> Maybe needs some unmetafy + fwrite.

Bleah, the argument to zerrmsg is const char * so can't be passed to
unmetafy without copy.

Is this more like it?

[-- Attachment #2: zerr%S.txt --]
[-- Type: text/plain, Size: 1184 bytes --]

diff --git a/Src/subst.c b/Src/subst.c
index 650c09de2..ddf9f9de9 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -3272,7 +3272,7 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
                     *idend = '\0';
 		    if (*s){
 			singsub(&s);
-			zerr("%s: %s", idbeg, s);
+			zerr("%s: %S", idbeg, s);
 		    } else
 			zerr("%s: %s", idbeg, "parameter not set");
                     /*
diff --git a/Src/utils.c b/Src/utils.c
index 0fda92709..a15284815 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -123,6 +123,7 @@ set_widearray(char *mb_array, Widechar_array wca)
 
    Code	Argument types		Prints
    %s	const char *		C string (null terminated)
+   %S	const char *		C string (null terminated), output raw
    %l	const char *, int	C string of given length (null not required)
    %L	long			decimal value
    %d	int			decimal value
@@ -309,6 +310,10 @@ zerrmsg(FILE *file, const char *fmt, va_list ap)
 		str = va_arg(ap, const char *);
 		nicezputs(str, file);
 		break;
+	    case 'S':
+		str = va_arg(ap, const char *);
+		fwrite(unmetafy(dupstring(str), &num), num, 1, file);
+		break;
 	    case 'l': {
 		char *s;
 		str = va_arg(ap, const char *);

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] unmetafy Re: $var not expanded in ${x?$var}
  2024-02-22  0:46                 ` [PATCH] unmetafy " Bart Schaefer
@ 2024-02-22  7:23                   ` Stephane Chazelas
  2024-02-22  7:55                     ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Stephane Chazelas
  2024-02-22  8:34                     ` [PATCH] unmetafy Re: $var not expanded in ${x?$var} Roman Perepelitsa
  0 siblings, 2 replies; 33+ messages in thread
From: Stephane Chazelas @ 2024-02-22  7:23 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh workers

2024-02-21 16:46:23 -0800, Bart Schaefer:
[...]
>     Code	Argument types		Prints
>     %s	const char *		C string (null terminated)
> +   %S	const char *		C string (null terminated), output raw

May be worth pointing out there that the string is expected to
be metafied (for both %s and %S).

>     %l	const char *, int	C string of given length (null not required)

Would that one be expected to be metafied?

>     %L	long			decimal value
>     %d	int			decimal value
> @@ -309,6 +310,10 @@ zerrmsg(FILE *file, const char *fmt, va_list ap)
>  		str = va_arg(ap, const char *);
>  		nicezputs(str, file);
>  		break;
> +	    case 'S':
> +		str = va_arg(ap, const char *);
> +		fwrite(unmetafy(dupstring(str), &num), num, 1, file);
> +		break;
[...]

Being no C expert, I wonder if it's safe (portable) to set and
use num in the same call like that. Is it guaranteed to be done
in the right order?

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-22  7:23                   ` Stephane Chazelas
@ 2024-02-22  7:55                     ` Stephane Chazelas
  2024-02-22 17:02                       ` Bart Schaefer
                                         ` (2 more replies)
  2024-02-22  8:34                     ` [PATCH] unmetafy Re: $var not expanded in ${x?$var} Roman Perepelitsa
  1 sibling, 3 replies; 33+ messages in thread
From: Stephane Chazelas @ 2024-02-22  7:55 UTC (permalink / raw)
  To: Bart Schaefer, zsh workers

2024-02-22 07:23:13 +0000, Stephane Chazelas:
> 2024-02-21 16:46:23 -0800, Bart Schaefer:
> [...]
> >     Code	Argument types		Prints
> >     %s	const char *		C string (null terminated)
> > +   %S	const char *		C string (null terminated), output raw
> 
> May be worth pointing out there that the string is expected to
> be metafied (for both %s and %S).
> 
> >     %l	const char *, int	C string of given length (null not required)
> 
> Would that one be expected to be metafied?

I see it used in this error message:

$ printf '%d\n' $'1+|a\x83 c'
zsh: bad math expression: operand expected at `|a^@c'
0
$ printf '%d\n' $'1+|a\0b'
zsh: bad math expression: operand expected at `|a'
0
$ printf '%d\n' '1+|ÃÃÃÃÃÃ'
zsh: bad math expression: operand expected at `|\M-C\M-c\M-c\M-c\M-c\M-c\M-^C...'
0
$ ((1+|ÃÃÃÃÃÃ))
zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...'

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] unmetafy Re: $var not expanded in ${x?$var}
  2024-02-22  7:23                   ` Stephane Chazelas
  2024-02-22  7:55                     ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Stephane Chazelas
@ 2024-02-22  8:34                     ` Roman Perepelitsa
  2024-02-22 17:07                       ` Bart Schaefer
  1 sibling, 1 reply; 33+ messages in thread
From: Roman Perepelitsa @ 2024-02-22  8:34 UTC (permalink / raw)
  To: Bart Schaefer, zsh workers

On Thu, Feb 22, 2024 at 8:23 AM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> > +             fwrite(unmetafy(dupstring(str), &num), num, 1, file);
>
> Being no C expert, I wonder if it's safe (portable) to set and
> use num in the same call like that. Is it guaranteed to be done
> in the right order?

In C, function arguments are evaluated in unspecified order. The
behavior of the fwrite() call above depends on the evaluation order of
its arguments, and is therefore unspecified.

Evaluation can even be interleaved. Consider:

    a(b(), c(d()))

Here, it is possible that functions will be invoked in this order: d,
b, c. Basically, function arguments are evaluated before the function
is invoked (obviously), but other than that there are no evaluation
order guarantees. Here's another example of unspecified behavior:

    *f() = g();

Here, f() and g() can be evaluated in any order.

There is a related gotcha where the same object is accessed more than
once between sequence points and at least one of these accesses is a
write. Like this:

    int a = 0;
    int b = ++a + a;

Or like this:

    foo(++a, a);

This is undefined behavior, not just unspecified.

Roman.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-22  7:55                     ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Stephane Chazelas
@ 2024-02-22 17:02                       ` Bart Schaefer
  2024-02-22 22:31                         ` Bart Schaefer
  2024-02-23  0:33                       ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer
  2024-02-25  5:06                       ` [PATCH] count multibyte and metafied characters correctly for math ops errors Bart Schaefer
  2 siblings, 1 reply; 33+ messages in thread
From: Bart Schaefer @ 2024-02-22 17:02 UTC (permalink / raw)
  To: zsh workers

On Wed, Feb 21, 2024 at 11:55 PM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> 2024-02-22 07:23:13 +0000, Stephane Chazelas:
> > 2024-02-21 16:46:23 -0800, Bart Schaefer:
> > [...]
> > >     Code    Argument types          Prints
> > >     %s      const char *            C string (null terminated)
> > > +   %S      const char *            C string (null terminated), output raw
> >
> > May be worth pointing out there that the string is expected to
> > be metafied (for both %s and %S).

Actually the string is NOT really expected to be metafied ... usually
it's a plain string hardwired in the calling code, so the caller
actually should have the responsibility for unmetafy.  In the
particular case of %l --

> > >     %l      const char *, int       C string of given length (null not required)
> >
> > Would that one be expected to be metafied?
>
> I see it used in this error message:
>
> $ printf '%d\n' $'1+|a\x83 c'
> zsh: bad math expression: operand expected at `|a^@c'

-- there's no (or not presently) any version of unmetafy that works
without nul-termination.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] unmetafy Re: $var not expanded in ${x?$var}
  2024-02-22  8:34                     ` [PATCH] unmetafy Re: $var not expanded in ${x?$var} Roman Perepelitsa
@ 2024-02-22 17:07                       ` Bart Schaefer
  0 siblings, 0 replies; 33+ messages in thread
From: Bart Schaefer @ 2024-02-22 17:07 UTC (permalink / raw)
  To: zsh workers

On Thu, Feb 22, 2024 at 12:35 AM Roman Perepelitsa
<roman.perepelitsa@gmail.com> wrote:
>
> On Thu, Feb 22, 2024 at 8:23 AM Stephane Chazelas <stephane@chazelas.org> wrote:
> >
> > > +             fwrite(unmetafy(dupstring(str), &num), num, 1, file);
> >
> > Being no C expert, I wonder if it's safe (portable) to set and
> > use num in the same call like that. Is it guaranteed to be done
> > in the right order?
>
> In C, function arguments are evaluated in unspecified order.

Yeah, been too long since I had to think about sequence points.

However, per commentary in the other thread, I'm going to recommend
against using this patch in the first place.  We need to rethink this
whole approach.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-22 17:02                       ` Bart Schaefer
@ 2024-02-22 22:31                         ` Bart Schaefer
  2024-02-23  0:49                           ` Bart Schaefer
  2024-02-23 19:27                           ` Stephane Chazelas
  0 siblings, 2 replies; 33+ messages in thread
From: Bart Schaefer @ 2024-02-22 22:31 UTC (permalink / raw)
  To: zsh workers

On Thu, Feb 22, 2024 at 9:07 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> However, per commentary in the other thread, I'm going to recommend
> against using this patch in the first place.  We need to rethink this
> whole approach.

On Thu, Feb 22, 2024 at 9:02 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> Actually the string is NOT really expected to be metafied

... in fact, %l allocates space for it to be metafied during output.

> ... usually it's a plain string hardwired in the calling code

Curiously, the only existing uses of %l are for strings that came from
user input (arithmetic or a history reference) so they're almost
certainly already metafied.

So ... one possible approach is to declare that %l is the way to
produce "raw" output, and have it perform the unmetafy() + fwrite().
However, it's not clear to me that's preferred in Stephane's examples
like this one:

> > $ printf '%d\n' $'1+|a\x83 c'
> > zsh: bad math expression: operand expected at `|a^@c'

Do we really want to "see" a raw \x83 there?  No one would
intentionally use a math syntax error e.g. to produce terminal control
sequences.

> the caller actually should have the responsibility for unmetafy.

Another possibility therefore is to to have %l do raw output but NOT
unmetafy(), so the caller would need to either unmetafy() and pass a
length, or nicedup() the string when that was the desired output
(which handles the unmetafy() internally, and nul-terminates).

The final option would be to leave %l alone and add %S with a length
argument ... but I don't see that as much better than the foregoing,
given the very limited use of %l so far.

Opinions?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-22  7:55                     ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Stephane Chazelas
  2024-02-22 17:02                       ` Bart Schaefer
@ 2024-02-23  0:33                       ` Bart Schaefer
  2024-02-25  5:06                       ` [PATCH] count multibyte and metafied characters correctly for math ops errors Bart Schaefer
  2 siblings, 0 replies; 33+ messages in thread
From: Bart Schaefer @ 2024-02-23  0:33 UTC (permalink / raw)
  To: zsh workers

Incidentally ...

On Wed, Feb 21, 2024 at 11:55 PM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> $ printf '%d\n' $'1+|a\0b'
> zsh: bad math expression: operand expected at `|a'

That doesn't have anything to do with error message presentation, the
integer parser itself stops at the nul byte before even checking the
error.  GIGO.

And the differences in these two ...

> $ printf '%d\n' '1+|ÃÃÃÃÃÃ'
> zsh: bad math expression: operand expected at `|\M-C\M-c\M-c\M-c\M-c\M-c\M-^C...'

> $ ((1+|ÃÃÃÃÃÃ))
> zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...'

... are down to the parsing of quoted arguments vs. ((...)), with some
fudge because of the way the use of ellipsis is calculated.  This
would have to be sorted out in bin_print() or something.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-22 22:31                         ` Bart Schaefer
@ 2024-02-23  0:49                           ` Bart Schaefer
  2024-02-23 19:27                           ` Stephane Chazelas
  1 sibling, 0 replies; 33+ messages in thread
From: Bart Schaefer @ 2024-02-23  0:49 UTC (permalink / raw)
  To: zsh workers

On Thu, Feb 22, 2024 at 2:31 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> Another possibility therefore is to to have %l do raw output but NOT
> unmetafy(), so the caller would need to either unmetafy() and pass a
> length, or nicedup() the string when that was the desired output
> (which handles the unmetafy() internally, and nul-terminates).

The changes for that are minimal.  With them, Stephane's math-garbles
handle the ellipsis more cleanly:

% printf '%d\n' '1+|ÃÃÃÃÃÃ'
zsh: bad math expression: operand expected at `|\M-C\M-c\...'
0
% ((1+|ÃÃÃÃÃÃ))
zsh: bad math expression: operand expected at `|Ã?Ã?Ã?...'

And the ${..?..} error messages come out as expected:

% err=$'one\ntwo'
% print ${a?$err}
zsh: a: one
two

I'm not sure what if anything to change about yyerror() in parse.c,
it's already untokenize()ing what it passes and maybe that shouldn't
be "nice" anyway?

No patch until I have more feedback, this time.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-22 22:31                         ` Bart Schaefer
  2024-02-23  0:49                           ` Bart Schaefer
@ 2024-02-23 19:27                           ` Stephane Chazelas
  2024-02-23 22:32                             ` Bart Schaefer
  1 sibling, 1 reply; 33+ messages in thread
From: Stephane Chazelas @ 2024-02-23 19:27 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh workers

2024-02-22 14:31:12 -0800, Bart Schaefer:
[...]
> Opinions?

There are two separate things here:

1. whether user-supplied text in messages should be output raw or
   with the non-printable characters given some common visible
   representation (like \n for newline, ^C for 0x03, \M-^C for
   0x83, ^@ for NUL, ^[ for ESC, \uffff for the encoding of
   U+FFFF...) by going through nicezputs().

2. Whether internally in the code the data should be passed
  "metafied" or not to zerr* functions.

For 1, IMO, when the error message is generated by zsh, it
should go through nicezputs(). zsh should decide of the
formatting, have it pass escape sequences as-is would make it
hard to understand and diagnose the error.

For instance, 

$ printf 'uname\r\n' | zsh
zsh: command not found: unane^M

is more useful than

zsh: command not found: uname

Where CR when sent to a terminal moves the cursor to the left
column so we don't see the problem is caused by that extra bogus
character.
 
The only cases where it should be passed raw is when the error
message is constructed by the user, where the user is expected
to be able to decide the formatting.

Like in:

syserror -p $'\e[1;31mERROR\e[m: '
echo ${1?$'multiline\nerror\nmessage'} ${DISPLAY:?$'\e[1;31mNo graphics'}

(syserror likely doesn't use zerrmsg anyway).

For 2, it looks like zerrmsg expects its input metafied and as
you say, that input in most cases is likely to be metafied
already. Not metafied would mean either we couldn't pass text
containing NUL, or we'd need to pass it as ptr+len.

So what makes most sense to me:

%s remains passed metafied and is output nicezputs'ed
%l same, truncated to the given number of bytes (though
   truncating to a number of characters or at least not cutting
   in the middle of character) would be nicer, but maybe
   overkill.
%S also passed metafied, but no nicezputs.

Now, my previous message was showing there were quite a few
issue with the metafication and possibly with the nicezputs'ing
and/or multibyte handling.

> $ printf '%d\n' $'1+|a\x83 c'
> zsh: bad math expression: operand expected at `|a^@c'

Should have been:

zsh: bad math expression: operand expected at `|aM-^C c'

The text was not passed metafied to zerrmsg with 0x83 0x20 then
incorrectly unmetafied to NUL, then rendered by nicezputs as ^@.

[...]
> $ printf '%d\n' '1+|ÃÃÃÃÃÃ'
> zsh: bad math expression: operand expected at `|\M-C\M-c\M-c\M-c\M-c\M-c\M-^C...'

I picked à because it's a letter from the latin script, so you
can even use it in variable names:

$ zsh -c '(( ÃÃÃÃÃÃ ++ )); typeset -p ÃÃÃÃÃÃ'
typeset -i ÃÃÃÃÃÃ=1

But its UTF-8 encoding happens to contain the 0x83 byte used in
metafication.

$ printf %s à | od -An -vtx1
 c3 83

$ printf %s à | cat -v
M-CM-^C

Again above, we see the effect of a missing metafication.

The error should have been:

zsh: bad math expression: operand expected at `|ÃÃÃÃÃÃ'

Like in:

~$ printf '%d\n' '1+|AAAAAA'
zsh: bad math expression: operand expected at `|AAAAAA'

> 0
> $ ((1+|ÃÃÃÃÃÃ))
> zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...'

In that case, metafication OK, but character cut in the middle.

2024-02-22 16:49:20 -0800, Bart Schaefer:
> The changes for that are minimal.  With them, Stephane's math-garbles
> handle the ellipsis more cleanly:
> 
> % printf '%d\n' '1+|ÃÃÃÃÃÃ'
> zsh: bad math expression: operand expected at `|\M-C\M-c\...'
> 0
> % ((1+|ÃÃÃÃÃÃ))
> zsh: bad math expression: operand expected at `|Ã?Ã?Ã?...'

It seems rather worse to me.

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-23 19:27                           ` Stephane Chazelas
@ 2024-02-23 22:32                             ` Bart Schaefer
  2024-02-23 23:38                               ` Bart Schaefer
  2024-02-24  9:47                               ` Stephane Chazelas
  0 siblings, 2 replies; 33+ messages in thread
From: Bart Schaefer @ 2024-02-23 22:32 UTC (permalink / raw)
  To: zsh workers

On Fri, Feb 23, 2024 at 11:27 AM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> 1. whether user-supplied text in messages should be output raw

The question here is whether error message is about MISTAKENLY
user-supplied text, vs. INTENTIONALLY supplied.  My assertion is that
it's only the caller of zerr() and related functions that can
interpret whether the text is intentional or not.

I would argue that your math examples are mistakes, at least as far as
the parser could possibly "know".

> 2. Whether internally in the code the data should be passed
>   "metafied" or not to zerr* functions.

That still only means we need one way to instruct the function to skip
"nice"-ing and one way to say not to.

Skipping ahead a bit, we'll come around again later ...

> > $ printf '%d\n' $'1+|a\x83 c'
> > zsh: bad math expression: operand expected at `|a^@c'
>
> Should have been:
>
> zsh: bad math expression: operand expected at `|aM-^C c'

You're missing part of my point here.

% printf '%d\n' $(( 1+|a\x83 c ))
zsh: bad math expression: operand expected at `|a\x83 c '

That is IMO more useful than either of "^@" or "M-^C " and is down to
the difference between using printf on a $'...' string (which is
interpreted before printf even gets its mitts on it, and two layers
before the math parser does) vs. using the actual math parser
directly.  This has nothing to do with how the string is passed to
zerr() and everything to do with how printf and parsing interpret the
input -- by the time the math parser actually calls zerr() it can't
know how to unwind that, and the internals of zerr() are even further
removed.

I would therefore argue that these examples are out of scope for this
discussion -- these examples are not about how zerr() et al. should
receive strings, they're about how the math parser should receive
them, and needs to be fixed upstream e.g. in bin_print().

More relevant to this discussion is that math errors are one of the
two existing callers using the %l format, so any attempt to improve
this is going to require changing those calls anyway.

> For 1, IMO, when the error message is generated by zsh, it
> should go through nicezputs(). zsh should decide of the
> formatting, have it pass escape sequences as-is would make it
> hard to understand and diagnose the error.

Agreed in concept, but there's a difference between errors actually
generated BY zsh, and errors with user input that zsh is reporting.
For example, the same literal string might be a file name generated by
globbing, or it might be something the user typed out in a
syntactically invalid command.  There's no way to put intelligence
about how to format those into the guts of zerr().

The question of whether to go through all 500 calls to zerr(),
zwarn(), etc., and "fix" them is something entirely else.  Presently
we have just the one known case of wanting something different for
${var?msg}.

> For 2, it looks like zerrmsg expects its input metafied and as
> you say, that input in most cases is likely to be metafied
> already. Not metafied would mean either we couldn't pass text
> containing NUL, or we'd need to pass it as ptr+len.

There's already a way to pass text not containing NUL (%s) and a way
to pass text as ptr+len (%l).  There are a vanishingly small number of
uses of the latter (2 callers out of the ~500 total call examples).
There's exactly one case so far of wanting output to contain NUL, and
per the "only caller can interpret" assertion, it seems worthwhile to
use %l for the NUL case and let the other 3 callers decide to "nice"
the strings they pass (or not).

This not only skips extra metafication needed to use the proposed %S,
but also simplifies the implementation of %l, and requires the
addition of only 1 or 2 lines of code to each of the two existing
callers using %l (maybe zero lines of code in the case of yyerror()).

> %S also passed metafied, but no nicezputs.

That requires metafy in the caller followed by unmetafy in zerr().
Much easier to remove code from %l than to add it to a new %S,
especially given that we're editing the solitary caller where %S would
be used.

> Now, my previous message was showing there were quite a few
> issue with the metafication and possibly with the nicezputs'ing
> and/or multibyte handling.

Fine, but not fixable in zerr() and friends.

> [...]
> > $ printf '%d\n' '1+|ÃÃÃÃÃÃ'
> > zsh: bad math expression: operand expected at `|\M-C\M-c\M-c\M-c\M-c\M-c\M-^C...'
>
> I picked à because it's a letter from the latin script, so you
> can even use it in variable names

Sure, but it's not being interpreted as any part of a variable name
here, because you put it in quotes and then passed it to printf.
Remove the "|" and you get something worse:

% printf '%d\n' '1+ÃÃÃÃÃÃ'
BUG: unexpected end of string in ztrlen()
zsh: bad math expression: operand expected at `\M-C\M-c\M-c\M-c\M-c\M-c'
0
% printf '%d\n' $((1+ÃÃÃÃÃÃ))
1

(Also a bit weird that the first \M-C is capitalized and the rest are
not?)  Still not a problem to be resolved in zerr().

> > $ ((1+|ÃÃÃÃÃÃ))
> > zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...'
>
> In that case, metafication OK, but character cut in the middle.

Still not zerr()'s fault and needs to be addressed where the number of
bytes for %l is being calculated in checkunary().

> > % ((1+|ÃÃÃÃÃÃ))
> > zsh: bad math expression: operand expected at `|Ã?Ã?Ã?...'
>
> It seems rather worse to me.

That's because of the way I chose to lift nice-ifying up into
checkunary() for testing the approach.  It's hard to be consistent
there, given the foregoing business about different formats being sent
down from printf vs. $((...)), and it's also why I said "no patch
without feedback".


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-23 22:32                             ` Bart Schaefer
@ 2024-02-23 23:38                               ` Bart Schaefer
  2024-02-24  9:47                               ` Stephane Chazelas
  1 sibling, 0 replies; 33+ messages in thread
From: Bart Schaefer @ 2024-02-23 23:38 UTC (permalink / raw)
  To: zsh workers

On Fri, Feb 23, 2024 at 2:32 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> More relevant to this discussion is that math errors are one of the
> two existing callers using the %l format, so any attempt to improve
> this is going to require changing those calls anyway.
>
> we have just the one known case of wanting something different for
> ${var?msg}.
>
> There's exactly one case so far of wanting output to contain NUL, and
> per the "only caller can interpret" assertion, it seems worthwhile to
> use %l for the NUL case and let the other 3 callers decide to "nice"
> the strings they pass (or not).

That last sentence should be "other 3 calls" not callers (there are only 2).

Having worked out that issue with bin_print() [other thread], it
appears to me that NONE of the existing calls using %l want their
output nice'd (for example, the only way to get consistent readable
errors from math garble is to output the strings raw, or at least
something less than nice'd and yet to be determined).

This suggests strongly to me that changing %l and scrapping the %S
idea is the way to go.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-23 22:32                             ` Bart Schaefer
  2024-02-23 23:38                               ` Bart Schaefer
@ 2024-02-24  9:47                               ` Stephane Chazelas
  2024-02-24 10:36                                 ` Stephane Chazelas
  2024-02-25  2:17                                 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer
  1 sibling, 2 replies; 33+ messages in thread
From: Stephane Chazelas @ 2024-02-24  9:47 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh workers

2024-02-23 14:32:49 -0800, Bart Schaefer:
[...]
> > zsh: bad math expression: operand expected at `|aM-^C c'
> 
> You're missing part of my point here.
> 
> % printf '%d\n' $(( 1+|a\x83 c ))
> zsh: bad math expression: operand expected at `|a\x83 c '
> 
> That is IMO more useful than either of "^@" or "M-^C " and is down to
> the difference between using printf on a $'...' string (which is
> interpreted before printf even gets its mitts on it, and two layers
> before the math parser does) vs. using the actual math parser
> directly.  This has nothing to do with how the string is passed to
> zerr() and everything to do with how printf and parsing interpret the
> input -- by the time the math parser actually calls zerr() it can't
> know how to unwind that, and the internals of zerr() are even further
> removed.
> 
> I would therefore argue that these examples are out of scope for this
> discussion -- these examples are not about how zerr() et al. should
> receive strings, they're about how the math parser should receive
> them, and needs to be fixed upstream e.g. in bin_print().
[...]

I agree the bug is in printf which forgets to metafy the input
before passing to the math parse. Which can be seen with:

$ typeset -A a
$ printf '%d\n' 'a[ÃÃÃÃÃÃ]=1'
1
$ (( a[ÃÃÃÃÃÃ] = 2 ))
typeset -A a=( [ÃÃÃÃÃÃ]=2 [$'\M-C\M-c\M-c\M-c\M-c\M-c\M-\C-C']=1 )

> More relevant to this discussion is that math errors are one of the
> two existing callers using the %l format, so any attempt to improve
> this is going to require changing those calls anyway.

I don't see why we'd need to change the call to zerr in those
cases. Just fix printf.

$ a=$'\x83 foobarbaz' b='\x83 foobarbaz'
~$ (( 1+|$b ))
zsh: bad math expression: operand expected at `|\x83 foob...'
$ (( 1+|$a ))
zsh: bad math expression: operand expected at `|\M-^C foobarb...'

Are correct, we do want the 0x83 byte which is not printable to
be rendered as \M-^C.

> 
> > For 1, IMO, when the error message is generated by zsh, it
> > should go through nicezputs(). zsh should decide of the
> > formatting, have it pass escape sequences as-is would make it
> > hard to understand and diagnose the error.
> 
> Agreed in concept, but there's a difference between errors actually
> generated BY zsh, and errors with user input that zsh is reporting.
> For example, the same literal string might be a file name generated by
> globbing, or it might be something the user typed out in a
> syntactically invalid command.  There's no way to put intelligence
> about how to format those into the guts of zerr().

I don't think that's a contention point. All those cases are
cases where we need to make the non-printable characters in the
user data visible with nicezputs.

The question is not about user input vs no user input in the
displayed error, but only for those where there's user input,
whether that user input is mean to be an error message formatted
by the user or not. And I can only think of
${var[:]?user-supplied-error}, and imagine that at least 99% of
the 499 other cases are not about printing a user-supplied error
message.

> There's already a way to pass text not containing NUL (%s) and a way
> to pass text as ptr+len (%l).  There are a vanishingly small number of
> uses of the latter (2 callers out of the ~500 total call examples).
> There's exactly one case so far of wanting output to contain NUL, and
> per the "only caller can interpret" assertion, it seems worthwhile to
> use %l for the NUL case and let the other 3 callers decide to "nice"
> the strings they pass (or not).
> 
> This not only skips extra metafication needed to use the proposed %S,
> but also simplifies the implementation of %l, and requires the
> addition of only 1 or 2 lines of code to each of the two existing
> callers using %l (maybe zero lines of code in the case of yyerror()).
> 
> > %S also passed metafied, but no nicezputs.
> 
> That requires metafy in the caller followed by unmetafy in zerr().
> Much easier to remove code from %l than to add it to a new %S,
> especially given that we're editing the solitary caller where %S would
> be used.

But in the case of ${var?err}, the err is already metafied, so
if you make %l take unmetafied input, you're just moving the
unmetafication to the caller which is counterproductive as it
makes it break on NULs.

Also %l is intended (at least in the one case I saw it used) to
truncase user input, so it should be nicezputs'ed.

> 
> > Now, my previous message was showing there were quite a few
> > issue with the metafication and possibly with the nicezputs'ing
> > and/or multibyte handling.
> 
> Fine, but not fixable in zerr() and friends.

Sorry for the confusion, I didn't mean to say that's where it
was to be fixed. I agree it's all cases where it's the caller
failing to do the metafication (in the case of printf, the
metafication was missing from much earlier).

[...]
> % printf '%d\n' '1+ÃÃÃÃÃÃ'
> BUG: unexpected end of string in ztrlen()
> zsh: bad math expression: operand expected at `\M-C\M-c\M-c\M-c\M-c\M-c'
> 0
> % printf '%d\n' $((1+ÃÃÃÃÃÃ))
> 1
> 
> (Also a bit weird that the first \M-C is capitalized and the rest are
> not?)  Still not a problem to be resolved in zerr().

\M-C is the visual representaion of 0xc3, \M-c of 0xe3, ÃÃ is
c3 83 c3 83. It's just that unmetafy turned 83 c3 into e3.

> > > $ ((1+|ÃÃÃÃÃÃ))
> > > zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...'
> >
> > In that case, metafication OK, but character cut in the middle.
> 
> Still not zerr()'s fault and needs to be addressed where the number of
> bytes for %l is being calculated in checkunary().

zerr could try and decode the string as text and truncate the
specified number of *characters* instead of bytes, but like I
said, that may be overkill as we can live with the odd character
cut in the middle.

> > > % ((1+|ÃÃÃÃÃÃ))
> > > zsh: bad math expression: operand expected at `|Ã?Ã?Ã?...'
> >
> > It seems rather worse to me.
> 
> That's because of the way I chose to lift nice-ifying up into
> checkunary() for testing the approach.  It's hard to be consistent
> there, given the foregoing business about different formats being sent
> down from printf vs. $((...)), and it's also why I said "no patch
> without feedback".

I guess those ? are some 0x83 bytes added by metafication, and
we're missing the corresponding unmetafy.

To me, the only things to do are:

1. add a %S for raw output (expects metafied input like
   everything else) to tbe used by ${var[:]?error} and likely only
   those.

2. Add missing metafy in bin_print (and possibly elsewhere)
   before calling the math parser

3. Fix those cases where zerrmsg is called with %s/%l/%S
   arguments non-metafied like in that "bad interpreter" case
   above.

4. (optional): Improve %l usages to truncate based on number of
   characters rather than bytes or at least avoid cutting
   characters in the middle.

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-24  9:47                               ` Stephane Chazelas
@ 2024-02-24 10:36                                 ` Stephane Chazelas
  2024-02-25  4:35                                   ` [PATCH] 'bad interpreter' error garbled Bart Schaefer
  2024-02-25  2:17                                 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer
  1 sibling, 1 reply; 33+ messages in thread
From: Stephane Chazelas @ 2024-02-24 10:36 UTC (permalink / raw)
  To: Bart Schaefer, zsh workers

2024-02-24 09:47:22 +0000, Stephane Chazelas:
[...]
> 3. Fix those cases where zerrmsg is called with %s/%l/%S
>    arguments non-metafied like in that "bad interpreter" case
>    above.
[...]

Sorry, I just realised I didn't mention that "case above":

$ echo '#! ÃÃÃÃ' > a
$ chmod +x a
$ ./a
zsh: ./a: bad interpreter: \M-C\M-c\M-c\M-c\M-^C: no such file or directory

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-24  9:47                               ` Stephane Chazelas
  2024-02-24 10:36                                 ` Stephane Chazelas
@ 2024-02-25  2:17                                 ` Bart Schaefer
  2024-02-25  6:05                                   ` Bart Schaefer
  1 sibling, 1 reply; 33+ messages in thread
From: Bart Schaefer @ 2024-02-25  2:17 UTC (permalink / raw)
  To: zsh workers

On Sat, Feb 24, 2024 at 1:47 AM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> 2024-02-23 14:32:49 -0800, Bart Schaefer:
> [...]
> > More relevant to this discussion is that math errors are one of the
> > two existing callers using the %l format, so any attempt to improve
> > this is going to require changing those calls anyway.
>
> I don't see why we'd need to change the call to zerr in those
> cases. Just fix printf.

That doesn't resolve the issue that math.c:checkunary() is
miscalculating the truncation width even before calling zerr().

Having looked more closely now, if I fix that calculation, then
checkunary() can do the truncation itself and doesn't need to use %l
any longer.

> But in the case of ${var?err}, the err is already metafied, so
> if you make %l take unmetafied input, you're just moving the
> unmetafication to the caller which is counterproductive as it
> makes it break on NULs.

No, it doesn't make it break on NULs, because unmetafy() returns the
length of the resulting string including the NULs, which we can then
pass down to zerr().

The actual tradeoff to make %l do the raw output is:
+ fix the width calculation in math.c, then use %s *
+ add one call to metafy() in parse.c, to use %s instead of %l **
+ add one call to unmetafy() in subst.c, to use %l instead of %s
- remove metalen(), zhalloc(), memcpy(), and nicezputs() from utils.c
+ add one call to fwrite() in place of the above

* Which we should do anyway, so not really a tradeoff
** It's not even certain that's required, given that we're outputting
a string from the lexer

> Also %l is intended (at least in the one case I saw it used) to
> truncase user input, so it should be nicezputs'ed.

Unfortunately it's used *incorrectly* to truncate user input at that one place.

> To me, the only things to do are:
>
> 1. add a %S for raw output (expects metafied input like
>    everything else) to tbe used by ${var[:]?error} and likely only
>    those.

That adds another branch in utils.c with at least the same unmetafy()
+ fwrite(), instead of removing code that the callers no longer need.

> 2. Add missing metafy in bin_print (and possibly elsewhere)
>    before calling the math parser

Needed either way.

> 3. Fix those cases where zerrmsg is called with %s/%l/%S
>    arguments non-metafied like in that "bad interpreter" case
>    above.

That's a whack-a-mole hunt and also orthogonal to whether or not to add %S.

> 4. (optional): Improve %l usages to truncate based on number of
>    characters rather than bytes or at least avoid cutting
>    characters in the middle.

Not needed.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] 'bad interpreter' error garbled
  2024-02-24 10:36                                 ` Stephane Chazelas
@ 2024-02-25  4:35                                   ` Bart Schaefer
  2024-02-25  5:26                                     ` Bart Schaefer
  0 siblings, 1 reply; 33+ messages in thread
From: Bart Schaefer @ 2024-02-25  4:35 UTC (permalink / raw)
  To: zsh workers

[-- Attachment #1: Type: text/plain, Size: 219 bytes --]

On Sat, Feb 24, 2024 at 2:36 AM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> $ echo '#! ÃÃÃÃ' > a
> $ chmod +x a
> $ ./a
> zsh: ./a: bad interpreter: \M-C\M-c\M-c\M-c\M-^C: no such file or directory

[-- Attachment #2: bad-bad-interp.txt --]
[-- Type: text/plain, Size: 778 bytes --]

diff --git a/Src/exec.c b/Src/exec.c
index 1565cb13e..c75aa78d6 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -551,7 +551,7 @@ zexecve(char *pth, char **argv, char **newenvp)
 			    break;
 		    if (t0 == ct)
 			zerr("%s: bad interpreter: %s: %e", pth,
-			     execvebuf + 2, eno);
+			     metafy(execvebuf + 2, -1, META_NOALLOC), eno);
 		    else {
 			while (inblank(execvebuf[t0]))
 			    execvebuf[t0--] = '\0';
@@ -574,8 +574,8 @@ zexecve(char *pth, char **argv, char **newenvp)
 				    execve(pprog, argv - 2, newenvp);
 				}
 			    }
-			    zerr("%s: bad interpreter: %s: %e", pth, ptr2,
-				 eno);
+			    zerr("%s: bad interpreter: %s: %e", pth,
+				 metafy(ptr2, -1, META_NOALLOC), eno);
 			} else if (*ptr) {
 			    *ptr = '\0';
 			    argv[-2] = ptr2;

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] count multibyte and metafied characters correctly for math ops errors
  2024-02-22  7:55                     ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Stephane Chazelas
  2024-02-22 17:02                       ` Bart Schaefer
  2024-02-23  0:33                       ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer
@ 2024-02-25  5:06                       ` Bart Schaefer
  2 siblings, 0 replies; 33+ messages in thread
From: Bart Schaefer @ 2024-02-25  5:06 UTC (permalink / raw)
  To: zsh workers

[-- Attachment #1: Type: text/plain, Size: 624 bytes --]

On Wed, Feb 21, 2024 at 11:55 PM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> $ ((1+|ÃÃÃÃÃÃ))
> zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...'

As Stephane also pointed out, ÃÃÃÃÃÃ is a valid identifier, so it's
not farfetched that someone could make a typo using one in a math
expression.

The code added here could be made into a utility [metalen() is almost
there] but as I don't know of anywhere else this is needed, I just
inlined it.  I looked through the other mb_* routines in utils.c but
they all appeared to do what would be extraneous work for this
purpose.

[-- Attachment #2: mathop-errmsgs.txt --]
[-- Type: text/plain, Size: 1106 bytes --]

diff --git a/Src/math.c b/Src/math.c
index 50b69d6a1..d97dae238 100644
--- a/Src/math.c
+++ b/Src/math.c
@@ -1557,21 +1557,32 @@ checkunary(int mtokc, char *mptr)
 	    errmsg = 2;
     }
     if (errmsg) {
-	int len, over = 0;
+	int len = 0, over = 0;
 	char *errtype = errmsg == 2 ? "operator" : "operand";
 	while (inblank(*mptr))
 	    mptr++;
-	len = ztrlen(mptr);
-	if (len > 10) {
-	    len = 10;
-	    over = 1;
+	if (isset(MULTIBYTE))
+	    MB_CHARINIT();
+	while (over < 10 && mptr[len]) {
+	    if (isset(MULTIBYTE))
+		len += MB_METACHARLEN(mptr+len);
+	    else
+		len += (mptr[len] == Meta ? 2 : 1);
+	    ++over;
+	}
+	if ((over = mptr[len])) {
+	    mptr = dupstring(mptr);
+	    if (mptr[len] == Meta)
+		mptr[len+1] = 0;
+	    else
+		mptr[len] = 0;
 	}
 	if (!*mptr)
 	    zerr("bad math expression: %s expected at end of string",
 		errtype);
 	else
-	    zerr("bad math expression: %s expected at `%l%s'",
-		 errtype, mptr, len, over ? "..." : "");
+	    zerr("bad math expression: %s expected at `%s%s'",
+		 errtype, mptr, over ? "..." : "");
     }
     unary = !(tp & OP_OPF);
 }

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] 'bad interpreter' error garbled
  2024-02-25  4:35                                   ` [PATCH] 'bad interpreter' error garbled Bart Schaefer
@ 2024-02-25  5:26                                     ` Bart Schaefer
  0 siblings, 0 replies; 33+ messages in thread
From: Bart Schaefer @ 2024-02-25  5:26 UTC (permalink / raw)
  To: zsh workers

Oops, just noticed a pair of typos in the patch -- META_NOALLOC was
meant to be META_STATIC.  I'll push a fix.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-25  2:17                                 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer
@ 2024-02-25  6:05                                   ` Bart Schaefer
  2024-02-25  7:29                                     ` Stephane Chazelas
  0 siblings, 1 reply; 33+ messages in thread
From: Bart Schaefer @ 2024-02-25  6:05 UTC (permalink / raw)
  To: zsh workers

On Sat, Feb 24, 2024 at 6:17 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> The actual tradeoff to make %l do the raw output is:
> + fix the width calculation in math.c, then use %s *
> + add one call to metafy() in parse.c, to use %s instead of %l **
> + add one call to unmetafy() in subst.c, to use %l instead of %s
> - remove metalen(), zhalloc(), memcpy(), and nicezputs() from utils.c
> + add one call to fwrite() in place of the above

Stephane, if you really still think it preferable to add %S (thus
moving the unmetafy() from subst.c to utils.c) then I think we should
just do away with %l entirely.  Either way we're ending up with a
branch in zerrmsg() that's used for exactly one special case.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
  2024-02-25  6:05                                   ` Bart Schaefer
@ 2024-02-25  7:29                                     ` Stephane Chazelas
  2024-02-25  8:28                                       ` typeset -<non-ASCII> (Was: metafication in error messages) Stephane Chazelas
  0 siblings, 1 reply; 33+ messages in thread
From: Stephane Chazelas @ 2024-02-25  7:29 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh workers

2024-02-24 22:05:26 -0800, Bart Schaefer:
> On Sat, Feb 24, 2024 at 6:17 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> >
> > The actual tradeoff to make %l do the raw output is:
> > + fix the width calculation in math.c, then use %s *
> > + add one call to metafy() in parse.c, to use %s instead of %l **
> > + add one call to unmetafy() in subst.c, to use %l instead of %s
> > - remove metalen(), zhalloc(), memcpy(), and nicezputs() from utils.c
> > + add one call to fwrite() in place of the above
> 
> Stephane, if you really still think it preferable to add %S (thus
> moving the unmetafy() from subst.c to utils.c) then I think we should
> just do away with %l entirely.  Either way we're ending up with a
> branch in zerrmsg() that's used for exactly one special case.

I don't have a strong opinion on it. Either way, the internal
documentation should make it clear the metafy and nicezputs
expectations.

Like:

%s expects metafied string, prints it nicezputs'ed

%l expects non-metafied string and byte length, prints it raw.
   it's the caller's responsibility to make sure the string is
   not truncated in the middle of a character.

Note that there's truncating in between  the bytes of a
character and there's truncating between the characters of a
grapheme cluster like decomposed Hangul syllables (or
$'Ste\u301phane'). I don't think we'd want to try and address
that.

Other option could be to have %s, %S and %*s %*S and let
nicezputs do the truncation and ellipsis in %*s case which might
be a cleaner interface if (likely) more cumbersome to implement.

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* typeset -<non-ASCII> (Was: metafication in error messages)
  2024-02-25  7:29                                     ` Stephane Chazelas
@ 2024-02-25  8:28                                       ` Stephane Chazelas
  2024-02-25  8:35                                         ` Stephane Chazelas
  2024-02-25 21:02                                         ` Bart Schaefer
  0 siblings, 2 replies; 33+ messages in thread
From: Stephane Chazelas @ 2024-02-25  8:28 UTC (permalink / raw)
  To: Bart Schaefer, zsh workers

By the way, while looking at other culprits for missing
metafication in calls to zerr*, I found (in UTF-8 locale):

$ typeset -é
typeset: bad option: -^
~$ zsh -c 'typeset -é' |& sed -n l
zsh:typeset:1: bad option: -^\003$

(gdb)
305         while (*fmt)
(gdb)
306             if (*fmt == '%') {
(gdb)
307                 fmt++;
(gdb)
308                 switch (*fmt++) {
(gdb)
348                     num = va_arg(ap, int);
(gdb)
350                     mb_charinit();
(gdb) p num
$2 = -61
(gdb) p (unsigned char) num
$3 = 195 '\303'

(that's 0xc3, the first byte of é, normally rendered as \M-C).

We end up with a ^ because in wcs_nicechar_sel():

wcs_nicechar_sel (c=-61 L'\xffffffc3', widthp=0x0, swidep=0x0, quotable=0) at utils.c:602
602         int ret = 0;
(gdb) n
603         VARARR(char, mbstr, MB_CUR_MAX);
(gdb)
612         newalloc = NICECHAR_MAX;
(gdb)
613         if (bufalloc != newalloc)
(gdb)
619         s = buf;
(gdb)
620         if (!WC_ISPRINT(c) && (c < 0x80 || !isset(PRINTEIGHTBIT))) {
(gdb)
621             if (c == 0x7f) {
(gdb)
629             } else if (c == L'\n') {
(gdb)
632             } else if (c == L'\t') {
(gdb)
635             } else if (c < 0x20) {
(gdb)
636                 if (quotable) {
(gdb)
641                     *s++ = '^';

It's < 0x20 because negative and we end up with ^<0x3> because
that's -61 + 64.

zerrmsg's %c expects a wchar_t when MULTIBYTE_SUPPORT is
enabled, but here we're passing it the first byte (signed) of
the encoding of the wide character.

Calling it with (unsigned char)*arg is not much better as you
end up with: zsh:typeset:1: bad option: -Ã

As U+00C3 (Ã) happens to be printable. If it weren't -\M-C would
not be much better anyway. And you also get zsh:typeset:1: bad
option: -Ã for typeset -$'\xc3'

Some options could be:
- return a "no non-ASCII/multibyte option supported" error
  message when *arg >= 0x80
- extract and decode the full multibyte character before passing
  to zerrmsg. which leaves the problem of how to render byte
  sequences that can't be decoded.

Maybe there are already functions in the code that do that?

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: typeset -<non-ASCII> (Was: metafication in error messages)
  2024-02-25  8:28                                       ` typeset -<non-ASCII> (Was: metafication in error messages) Stephane Chazelas
@ 2024-02-25  8:35                                         ` Stephane Chazelas
  2024-02-25 21:02                                         ` Bart Schaefer
  1 sibling, 0 replies; 33+ messages in thread
From: Stephane Chazelas @ 2024-02-25  8:35 UTC (permalink / raw)
  To: Bart Schaefer, zsh workers

$ zsh -c $'typeset -\x83'
zsh:typeset:1: bad option: -\M-C

is also incorrect, \M-C is 0xc3. 0x83 is \M-^C.

-- 
Stephane


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: typeset -<non-ASCII> (Was: metafication in error messages)
  2024-02-25  8:28                                       ` typeset -<non-ASCII> (Was: metafication in error messages) Stephane Chazelas
  2024-02-25  8:35                                         ` Stephane Chazelas
@ 2024-02-25 21:02                                         ` Bart Schaefer
  1 sibling, 0 replies; 33+ messages in thread
From: Bart Schaefer @ 2024-02-25 21:02 UTC (permalink / raw)
  To: zsh workers

On Sun, Feb 25, 2024 at 12:29 AM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> By the way, while looking at other culprits for missing
> metafication in calls to zerr*, I found (in UTF-8 locale):
>
> $ typeset -é
> typeset: bad option: -^

I did warn that any such effort was going to be whack-a-mole.

On Sun, Feb 25, 2024 at 12:35 AM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> $ zsh -c $'typeset -\x83'
> zsh:typeset:1: bad option: -\M-C
>
> is also incorrect, \M-C is 0xc3. 0x83 is \M-^C.

IMO this is GIGO -- we have not and likely never will have non-ascii
option letters, so detecting multibyte characters and pretty-printing
them on error is a waste of effort.


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2024-02-25 21:02 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-13  8:02 $var not expanded in ${x?$var} Stephane Chazelas
2023-01-16  8:35 ` Daniel Shahaf
2023-01-16 17:15 ` Peter Stephenson
2024-02-20  7:05   ` Stephane Chazelas
2024-02-20 17:45     ` Bart Schaefer
2024-02-20 19:39       ` Stephane Chazelas
2024-02-21  4:44         ` Bart Schaefer
2024-02-21 19:45           ` Stephane Chazelas
2024-02-21 19:52             ` Bart Schaefer
2024-02-21 20:21               ` Stephane Chazelas
2024-02-22  0:46                 ` [PATCH] unmetafy " Bart Schaefer
2024-02-22  7:23                   ` Stephane Chazelas
2024-02-22  7:55                     ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Stephane Chazelas
2024-02-22 17:02                       ` Bart Schaefer
2024-02-22 22:31                         ` Bart Schaefer
2024-02-23  0:49                           ` Bart Schaefer
2024-02-23 19:27                           ` Stephane Chazelas
2024-02-23 22:32                             ` Bart Schaefer
2024-02-23 23:38                               ` Bart Schaefer
2024-02-24  9:47                               ` Stephane Chazelas
2024-02-24 10:36                                 ` Stephane Chazelas
2024-02-25  4:35                                   ` [PATCH] 'bad interpreter' error garbled Bart Schaefer
2024-02-25  5:26                                     ` Bart Schaefer
2024-02-25  2:17                                 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer
2024-02-25  6:05                                   ` Bart Schaefer
2024-02-25  7:29                                     ` Stephane Chazelas
2024-02-25  8:28                                       ` typeset -<non-ASCII> (Was: metafication in error messages) Stephane Chazelas
2024-02-25  8:35                                         ` Stephane Chazelas
2024-02-25 21:02                                         ` Bart Schaefer
2024-02-23  0:33                       ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer
2024-02-25  5:06                       ` [PATCH] count multibyte and metafied characters correctly for math ops errors Bart Schaefer
2024-02-22  8:34                     ` [PATCH] unmetafy Re: $var not expanded in ${x?$var} Roman Perepelitsa
2024-02-22 17:07                       ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).