* $var not expanded in ${x?$var} @ 2023-01-13 8:02 Stephane Chazelas 2023-01-16 8:35 ` Daniel Shahaf 2023-01-16 17:15 ` Peter Stephenson 0 siblings, 2 replies; 33+ messages in thread From: Stephane Chazelas @ 2023-01-13 8:02 UTC (permalink / raw) To: Zsh hackers list $ zsh -c 'echo ${1?$USERNAME}' zsh:1: 1: $USERNAME No quote removal either: $ zsh -c 'echo ${1?"x"}' zsh:1: 1: "x" Doc says: > In any of the above expressions that test a variable and substitute an > alternate WORD, note that you can use standard shell quoting in the WORD > value to selectively override the splitting done by the SH_WORD_SPLIT > option and the = flag, but not splitting by the s:STRING: flag. POSIX: > ${parameter:?[word]} > Indicate Error if Null or Unset. If parameter is unset or > null, the *expansion* of word (or a message indicating it is > unset if word is omitted) shall be written to standard error > and the shell exits with a non-zero exit status. Otherwise, > the value of parameter shall be substituted. An interactive > shell need not exit. (also note that line number being printed twice) -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: $var not expanded in ${x?$var} 2023-01-13 8:02 $var not expanded in ${x?$var} Stephane Chazelas @ 2023-01-16 8:35 ` Daniel Shahaf 2023-01-16 17:15 ` Peter Stephenson 1 sibling, 0 replies; 33+ messages in thread From: Daniel Shahaf @ 2023-01-16 8:35 UTC (permalink / raw) To: Stephane Chazelas, Zsh hackers list Stephane Chazelas wrote on Fri, 13 Jan 2023 08:02 +00:00: > $ zsh -c 'echo ${1?$USERNAME}' > zsh:1: 1: $USERNAME > > No quote removal either: > > $ zsh -c 'echo ${1?"x"}' > zsh:1: 1: "x" > Thanks for the report. > (also note that line number being printed twice) No, it's not. The second "1" is the parameter name: $ zsh -fc 'echo ${foo?$USERNAME}' zsh:1: foo: $USERNAME Cheers, Daniel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: $var not expanded in ${x?$var} 2023-01-13 8:02 $var not expanded in ${x?$var} Stephane Chazelas 2023-01-16 8:35 ` Daniel Shahaf @ 2023-01-16 17:15 ` Peter Stephenson 2024-02-20 7:05 ` Stephane Chazelas 1 sibling, 1 reply; 33+ messages in thread From: Peter Stephenson @ 2023-01-16 17:15 UTC (permalink / raw) To: zsh workers > On 13/01/2023 08:02 Stephane Chazelas <stephane@chazelas.org> wrote: > > > $ zsh -c 'echo ${1?$USERNAME}' > zsh:1: 1: $USERNAME > > No quote removal either: > > $ zsh -c 'echo ${1?"x"}' > zsh:1: 1: "x" > > Doc says: > > > In any of the above expressions that test a variable and substitute an > > alternate WORD, note that you can use standard shell quoting in the WORD > > value to selectively override the splitting done by the SH_WORD_SPLIT > > option and the = flag, but not splitting by the s:STRING: flag. In fact the shell does not "substitute an alternate WORD" here, it just prints it out, but the difference is easy to miss and expanding it seems the right thing to do from other points of view, so I've noted it in the doc. pws diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo index eb8cdbae5..857715a95 100644 --- a/Doc/Zsh/expn.yo +++ b/Doc/Zsh/expn.yo @@ -665,7 +665,9 @@ item(tt(${)var(name)tt(:?)var(word)tt(}))( In the first form, if var(name) is set, or in the second form if var(name) is both set and non-null, then substitute its value; otherwise, print var(word) and exit from the shell. Interactive shells instead return to -the prompt. If var(word) is omitted, then a standard message is printed. +the prompt. If var(word) is omitted, then a standard message is +printed. Note that var(word) is expanded even though its value +is not substituted onto the command line. ) enditem() diff --git a/Src/subst.c b/Src/subst.c index 897188862..4ad9fee1a 100644 --- a/Src/subst.c +++ b/Src/subst.c @@ -3076,7 +3076,11 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags, if (vunset) { if (isset(EXECOPT)) { *idend = '\0'; - zerr("%s: %s", idbeg, *s ? s : "parameter not set"); + if (*s){ + singsub(&s); + zerr("%s: %s", idbeg, s); + } else + zerr("%s: %s", idbeg, "parameter not set"); /* * In interactive shell we need to return to * top-level prompt --- don't clear this error diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst index 6bf55b4db..7dd5d82d7 100644 --- a/Test/D04parameter.ztst +++ b/Test/D04parameter.ztst @@ -110,6 +110,11 @@ *>*foo:1: 1: no arguments given >reached + message="expand me and remove quotes" + (: ${UNSET_PARAM?$message}) +1:${...?....} performs expansion on the message +?(eval):2: UNSET_PARAM: expand me and remove quotes + print ${set1:+word1} ${set1+word2} ${null1:+word3} ${null1+word4} print ${unset1:+word5} ${unset1+word6} 0:${...:+...}, ${...+...} ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: $var not expanded in ${x?$var} 2023-01-16 17:15 ` Peter Stephenson @ 2024-02-20 7:05 ` Stephane Chazelas 2024-02-20 17:45 ` Bart Schaefer 0 siblings, 1 reply; 33+ messages in thread From: Stephane Chazelas @ 2024-02-20 7:05 UTC (permalink / raw) To: Peter Stephenson; +Cc: zsh workers 2023-01-16 17:15:35 +0000, Peter Stephenson: > > On 13/01/2023 08:02 Stephane Chazelas <stephane@chazelas.org> wrote: > > > > $ zsh -c 'echo ${1?$USERNAME}' > > zsh:1: 1: $USERNAME > > > > No quote removal either: > > > > $ zsh -c 'echo ${1?"x"}' > > zsh:1: 1: "x" > > > > Doc says: > > > > > In any of the above expressions that test a variable and substitute an > > > alternate WORD, note that you can use standard shell quoting in the WORD > > > value to selectively override the splitting done by the SH_WORD_SPLIT > > > option and the = flag, but not splitting by the s:STRING: flag. > > In fact the shell does not "substitute an alternate WORD" here, it > just prints it out, but the difference is easy to miss and expanding it > seems the right thing to do from other points of view, so I've noted it > in the doc. [...] Thanks. $ echo ${1?$'a\nb'} zsh: 1: a\nb $ print -Prv err '%F{red}BAD%f' $ a=${1?$err} zsh: 1: ^[[31mBAD^[[39m That transformation of newline into \n and other nicezputs'ness seems undesirable to me though. It doesn't seem unreasonable for someone to want to provide a multiline error message or one with terminal escape sequences. See for instance https://unix.stackexchange.com/questions/769679/bash-parameter-substitution-with-multi-line-error-message AFAICT, except for bash doing the IFS-split + join, all other shells send the expansion verbatim to stderr. -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: $var not expanded in ${x?$var} 2024-02-20 7:05 ` Stephane Chazelas @ 2024-02-20 17:45 ` Bart Schaefer 2024-02-20 19:39 ` Stephane Chazelas 0 siblings, 1 reply; 33+ messages in thread From: Bart Schaefer @ 2024-02-20 17:45 UTC (permalink / raw) To: zsh workers On Mon, Feb 19, 2024 at 11:05 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > $ echo ${1?$'a\nb'} > zsh: 1: a\nb > $ print -Prv err '%F{red}BAD%f' > $ a=${1?$err} > zsh: 1: ^[[31mBAD^[[39m > > That transformation of newline into \n and other nicezputs'ness > seems undesirable to me though. It's probably worthwhile for utils.c:zerrmsg() to have a %-placeholder that means a literal string. Not hard to do, just need to choose a letter. %S ? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: $var not expanded in ${x?$var} 2024-02-20 17:45 ` Bart Schaefer @ 2024-02-20 19:39 ` Stephane Chazelas 2024-02-21 4:44 ` Bart Schaefer 0 siblings, 1 reply; 33+ messages in thread From: Stephane Chazelas @ 2024-02-20 19:39 UTC (permalink / raw) To: Bart Schaefer; +Cc: zsh workers 2024-02-20 09:45:48 -0800, Bart Schaefer: > On Mon, Feb 19, 2024 at 11:05 PM Stephane Chazelas > <stephane@chazelas.org> wrote: [...] > > That transformation of newline into \n and other nicezputs'ness > > seems undesirable to me though. > > It's probably worthwhile for utils.c:zerrmsg() to have a %-placeholder > that means a literal string. Not hard to do, just need to choose a > letter. %S ? That was my thought as well with %S or %#s (alternative form of %s) or %r (raw). But then I thought maybe there was already something available to print some raw error. Looking at https://austingroupbugs.net/view.php?id=1771 where availability of printf %X directives is discussed, looks like %S is the least likely to conflict with printf(3) or printf(1) if that was ever to be a concern. -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: $var not expanded in ${x?$var} 2024-02-20 19:39 ` Stephane Chazelas @ 2024-02-21 4:44 ` Bart Schaefer 2024-02-21 19:45 ` Stephane Chazelas 0 siblings, 1 reply; 33+ messages in thread From: Bart Schaefer @ 2024-02-21 4:44 UTC (permalink / raw) To: zsh workers [-- Attachment #1: Type: text/plain, Size: 577 bytes --] On Tue, Feb 20, 2024 at 11:39 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > Looking at https://austingroupbugs.net/view.php?id=1771 where > availability of printf %X directives is discussed, looks like %S > is the least likely to conflict with printf(3) or printf(1) if > that was ever to be a concern. Zsh contains approximately 500 calls to output error messages, at least a third of which include strings that would have come from user input. I wonder how many of those are eventually going to want %S or at least some less-than-"nice" processing. [-- Attachment #2: zerr%S.txt --] [-- Type: text/plain, Size: 1156 bytes --] diff --git a/Src/subst.c b/Src/subst.c index 650c09de2..ddf9f9de9 100644 --- a/Src/subst.c +++ b/Src/subst.c @@ -3272,7 +3272,7 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags, *idend = '\0'; if (*s){ singsub(&s); - zerr("%s: %s", idbeg, s); + zerr("%s: %S", idbeg, s); } else zerr("%s: %s", idbeg, "parameter not set"); /* diff --git a/Src/utils.c b/Src/utils.c index 0fda92709..0482f3316 100644 --- a/Src/utils.c +++ b/Src/utils.c @@ -123,6 +123,7 @@ set_widearray(char *mb_array, Widechar_array wca) Code Argument types Prints %s const char * C string (null terminated) + %S const char * C string (null terminated), output raw %l const char *, int C string of given length (null not required) %L long decimal value %d int decimal value @@ -309,6 +310,10 @@ zerrmsg(FILE *file, const char *fmt, va_list ap) str = va_arg(ap, const char *); nicezputs(str, file); break; + case 'S': + str = va_arg(ap, const char *); + fprintf(file, "%s", str); + break; case 'l': { char *s; str = va_arg(ap, const char *); ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: $var not expanded in ${x?$var} 2024-02-21 4:44 ` Bart Schaefer @ 2024-02-21 19:45 ` Stephane Chazelas 2024-02-21 19:52 ` Bart Schaefer 0 siblings, 1 reply; 33+ messages in thread From: Stephane Chazelas @ 2024-02-21 19:45 UTC (permalink / raw) To: Bart Schaefer; +Cc: zsh workers 2024-02-20 20:44:47 -0800, Bart Schaefer: [...] > + case 'S': > + str = va_arg(ap, const char *); > + fprintf(file, "%s", str); > + break; [...] I suspect there's something wrong with that as it could not possibly work for ${var?$'\0'}. (could also be simplified to fputs(str, file) though I suspect it makes no difference with modern compilers). -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: $var not expanded in ${x?$var} 2024-02-21 19:45 ` Stephane Chazelas @ 2024-02-21 19:52 ` Bart Schaefer 2024-02-21 20:21 ` Stephane Chazelas 0 siblings, 1 reply; 33+ messages in thread From: Bart Schaefer @ 2024-02-21 19:52 UTC (permalink / raw) To: zsh workers On Wed, Feb 21, 2024 at 11:45 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > 2024-02-20 20:44:47 -0800, Bart Schaefer: > [...] > > + case 'S': > > + str = va_arg(ap, const char *); > > + fprintf(file, "%s", str); > > + break; > [...] > > I suspect there's something wrong with that as it could not > possibly work for ${var?$'\0'}. What would you expect it to do? You can't have it both raw and not raw. With the patch applied: % : ${var?$'\0oopsie'} zsh: var: ? oopsie Might try running that through valgrind later, not presently on a machine that has valgrind. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: $var not expanded in ${x?$var} 2024-02-21 19:52 ` Bart Schaefer @ 2024-02-21 20:21 ` Stephane Chazelas 2024-02-22 0:46 ` [PATCH] unmetafy " Bart Schaefer 0 siblings, 1 reply; 33+ messages in thread From: Stephane Chazelas @ 2024-02-21 20:21 UTC (permalink / raw) To: Bart Schaefer; +Cc: zsh workers 2024-02-21 11:52:51 -0800, Bart Schaefer: > On Wed, Feb 21, 2024 at 11:45 AM Stephane Chazelas > <stephane@chazelas.org> wrote: > > > > 2024-02-20 20:44:47 -0800, Bart Schaefer: > > [...] > > > + case 'S': > > > + str = va_arg(ap, const char *); > > > + fprintf(file, "%s", str); > > > + break; > > [...] > > > > I suspect there's something wrong with that as it could not > > possibly work for ${var?$'\0'}. > > What would you expect it to do? You can't have it both raw and not > raw. I would expect it to write the NUL character ('\0') to stderr, which fprintf couldn't do as it takes a NUL-delimited string. With the patch applied, I get: $ (: ${1?$'a\0b'}) |& sed -n l zsh: 1: a\203 b$ Maybe needs some unmetafy + fwrite. -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH] unmetafy Re: $var not expanded in ${x?$var} 2024-02-21 20:21 ` Stephane Chazelas @ 2024-02-22 0:46 ` Bart Schaefer 2024-02-22 7:23 ` Stephane Chazelas 0 siblings, 1 reply; 33+ messages in thread From: Bart Schaefer @ 2024-02-22 0:46 UTC (permalink / raw) To: zsh workers [-- Attachment #1: Type: text/plain, Size: 407 bytes --] On Wed, Feb 21, 2024 at 12:21 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > > What would you expect it to do? You can't have it both raw and not > > raw. > > I would expect it to write the NUL character ('\0') to stderr [...] > Maybe needs some unmetafy + fwrite. Bleah, the argument to zerrmsg is const char * so can't be passed to unmetafy without copy. Is this more like it? [-- Attachment #2: zerr%S.txt --] [-- Type: text/plain, Size: 1184 bytes --] diff --git a/Src/subst.c b/Src/subst.c index 650c09de2..ddf9f9de9 100644 --- a/Src/subst.c +++ b/Src/subst.c @@ -3272,7 +3272,7 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags, *idend = '\0'; if (*s){ singsub(&s); - zerr("%s: %s", idbeg, s); + zerr("%s: %S", idbeg, s); } else zerr("%s: %s", idbeg, "parameter not set"); /* diff --git a/Src/utils.c b/Src/utils.c index 0fda92709..a15284815 100644 --- a/Src/utils.c +++ b/Src/utils.c @@ -123,6 +123,7 @@ set_widearray(char *mb_array, Widechar_array wca) Code Argument types Prints %s const char * C string (null terminated) + %S const char * C string (null terminated), output raw %l const char *, int C string of given length (null not required) %L long decimal value %d int decimal value @@ -309,6 +310,10 @@ zerrmsg(FILE *file, const char *fmt, va_list ap) str = va_arg(ap, const char *); nicezputs(str, file); break; + case 'S': + str = va_arg(ap, const char *); + fwrite(unmetafy(dupstring(str), &num), num, 1, file); + break; case 'l': { char *s; str = va_arg(ap, const char *); ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] unmetafy Re: $var not expanded in ${x?$var} 2024-02-22 0:46 ` [PATCH] unmetafy " Bart Schaefer @ 2024-02-22 7:23 ` Stephane Chazelas 2024-02-22 7:55 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Stephane Chazelas 2024-02-22 8:34 ` [PATCH] unmetafy Re: $var not expanded in ${x?$var} Roman Perepelitsa 0 siblings, 2 replies; 33+ messages in thread From: Stephane Chazelas @ 2024-02-22 7:23 UTC (permalink / raw) To: Bart Schaefer; +Cc: zsh workers 2024-02-21 16:46:23 -0800, Bart Schaefer: [...] > Code Argument types Prints > %s const char * C string (null terminated) > + %S const char * C string (null terminated), output raw May be worth pointing out there that the string is expected to be metafied (for both %s and %S). > %l const char *, int C string of given length (null not required) Would that one be expected to be metafied? > %L long decimal value > %d int decimal value > @@ -309,6 +310,10 @@ zerrmsg(FILE *file, const char *fmt, va_list ap) > str = va_arg(ap, const char *); > nicezputs(str, file); > break; > + case 'S': > + str = va_arg(ap, const char *); > + fwrite(unmetafy(dupstring(str), &num), num, 1, file); > + break; [...] Being no C expert, I wonder if it's safe (portable) to set and use num in the same call like that. Is it guaranteed to be done in the right order? -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-22 7:23 ` Stephane Chazelas @ 2024-02-22 7:55 ` Stephane Chazelas 2024-02-22 17:02 ` Bart Schaefer ` (2 more replies) 2024-02-22 8:34 ` [PATCH] unmetafy Re: $var not expanded in ${x?$var} Roman Perepelitsa 1 sibling, 3 replies; 33+ messages in thread From: Stephane Chazelas @ 2024-02-22 7:55 UTC (permalink / raw) To: Bart Schaefer, zsh workers 2024-02-22 07:23:13 +0000, Stephane Chazelas: > 2024-02-21 16:46:23 -0800, Bart Schaefer: > [...] > > Code Argument types Prints > > %s const char * C string (null terminated) > > + %S const char * C string (null terminated), output raw > > May be worth pointing out there that the string is expected to > be metafied (for both %s and %S). > > > %l const char *, int C string of given length (null not required) > > Would that one be expected to be metafied? I see it used in this error message: $ printf '%d\n' $'1+|a\x83 c' zsh: bad math expression: operand expected at `|a^@c' 0 $ printf '%d\n' $'1+|a\0b' zsh: bad math expression: operand expected at `|a' 0 $ printf '%d\n' '1+|ÃÃÃÃÃÃ' zsh: bad math expression: operand expected at `|\M-C\M-c\M-c\M-c\M-c\M-c\M-^C...' 0 $ ((1+|ÃÃÃÃÃÃ)) zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...' -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-22 7:55 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Stephane Chazelas @ 2024-02-22 17:02 ` Bart Schaefer 2024-02-22 22:31 ` Bart Schaefer 2024-02-23 0:33 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer 2024-02-25 5:06 ` [PATCH] count multibyte and metafied characters correctly for math ops errors Bart Schaefer 2 siblings, 1 reply; 33+ messages in thread From: Bart Schaefer @ 2024-02-22 17:02 UTC (permalink / raw) To: zsh workers On Wed, Feb 21, 2024 at 11:55 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > 2024-02-22 07:23:13 +0000, Stephane Chazelas: > > 2024-02-21 16:46:23 -0800, Bart Schaefer: > > [...] > > > Code Argument types Prints > > > %s const char * C string (null terminated) > > > + %S const char * C string (null terminated), output raw > > > > May be worth pointing out there that the string is expected to > > be metafied (for both %s and %S). Actually the string is NOT really expected to be metafied ... usually it's a plain string hardwired in the calling code, so the caller actually should have the responsibility for unmetafy. In the particular case of %l -- > > > %l const char *, int C string of given length (null not required) > > > > Would that one be expected to be metafied? > > I see it used in this error message: > > $ printf '%d\n' $'1+|a\x83 c' > zsh: bad math expression: operand expected at `|a^@c' -- there's no (or not presently) any version of unmetafy that works without nul-termination. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-22 17:02 ` Bart Schaefer @ 2024-02-22 22:31 ` Bart Schaefer 2024-02-23 0:49 ` Bart Schaefer 2024-02-23 19:27 ` Stephane Chazelas 0 siblings, 2 replies; 33+ messages in thread From: Bart Schaefer @ 2024-02-22 22:31 UTC (permalink / raw) To: zsh workers On Thu, Feb 22, 2024 at 9:07 AM Bart Schaefer <schaefer@brasslantern.com> wrote: > > However, per commentary in the other thread, I'm going to recommend > against using this patch in the first place. We need to rethink this > whole approach. On Thu, Feb 22, 2024 at 9:02 AM Bart Schaefer <schaefer@brasslantern.com> wrote: > > Actually the string is NOT really expected to be metafied ... in fact, %l allocates space for it to be metafied during output. > ... usually it's a plain string hardwired in the calling code Curiously, the only existing uses of %l are for strings that came from user input (arithmetic or a history reference) so they're almost certainly already metafied. So ... one possible approach is to declare that %l is the way to produce "raw" output, and have it perform the unmetafy() + fwrite(). However, it's not clear to me that's preferred in Stephane's examples like this one: > > $ printf '%d\n' $'1+|a\x83 c' > > zsh: bad math expression: operand expected at `|a^@c' Do we really want to "see" a raw \x83 there? No one would intentionally use a math syntax error e.g. to produce terminal control sequences. > the caller actually should have the responsibility for unmetafy. Another possibility therefore is to to have %l do raw output but NOT unmetafy(), so the caller would need to either unmetafy() and pass a length, or nicedup() the string when that was the desired output (which handles the unmetafy() internally, and nul-terminates). The final option would be to leave %l alone and add %S with a length argument ... but I don't see that as much better than the foregoing, given the very limited use of %l so far. Opinions? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-22 22:31 ` Bart Schaefer @ 2024-02-23 0:49 ` Bart Schaefer 2024-02-23 19:27 ` Stephane Chazelas 1 sibling, 0 replies; 33+ messages in thread From: Bart Schaefer @ 2024-02-23 0:49 UTC (permalink / raw) To: zsh workers On Thu, Feb 22, 2024 at 2:31 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > Another possibility therefore is to to have %l do raw output but NOT > unmetafy(), so the caller would need to either unmetafy() and pass a > length, or nicedup() the string when that was the desired output > (which handles the unmetafy() internally, and nul-terminates). The changes for that are minimal. With them, Stephane's math-garbles handle the ellipsis more cleanly: % printf '%d\n' '1+|ÃÃÃÃÃÃ' zsh: bad math expression: operand expected at `|\M-C\M-c\...' 0 % ((1+|ÃÃÃÃÃÃ)) zsh: bad math expression: operand expected at `|Ã?Ã?Ã?...' And the ${..?..} error messages come out as expected: % err=$'one\ntwo' % print ${a?$err} zsh: a: one two I'm not sure what if anything to change about yyerror() in parse.c, it's already untokenize()ing what it passes and maybe that shouldn't be "nice" anyway? No patch until I have more feedback, this time. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-22 22:31 ` Bart Schaefer 2024-02-23 0:49 ` Bart Schaefer @ 2024-02-23 19:27 ` Stephane Chazelas 2024-02-23 22:32 ` Bart Schaefer 1 sibling, 1 reply; 33+ messages in thread From: Stephane Chazelas @ 2024-02-23 19:27 UTC (permalink / raw) To: Bart Schaefer; +Cc: zsh workers 2024-02-22 14:31:12 -0800, Bart Schaefer: [...] > Opinions? There are two separate things here: 1. whether user-supplied text in messages should be output raw or with the non-printable characters given some common visible representation (like \n for newline, ^C for 0x03, \M-^C for 0x83, ^@ for NUL, ^[ for ESC, \uffff for the encoding of U+FFFF...) by going through nicezputs(). 2. Whether internally in the code the data should be passed "metafied" or not to zerr* functions. For 1, IMO, when the error message is generated by zsh, it should go through nicezputs(). zsh should decide of the formatting, have it pass escape sequences as-is would make it hard to understand and diagnose the error. For instance, $ printf 'uname\r\n' | zsh zsh: command not found: unane^M is more useful than zsh: command not found: uname Where CR when sent to a terminal moves the cursor to the left column so we don't see the problem is caused by that extra bogus character. The only cases where it should be passed raw is when the error message is constructed by the user, where the user is expected to be able to decide the formatting. Like in: syserror -p $'\e[1;31mERROR\e[m: ' echo ${1?$'multiline\nerror\nmessage'} ${DISPLAY:?$'\e[1;31mNo graphics'} (syserror likely doesn't use zerrmsg anyway). For 2, it looks like zerrmsg expects its input metafied and as you say, that input in most cases is likely to be metafied already. Not metafied would mean either we couldn't pass text containing NUL, or we'd need to pass it as ptr+len. So what makes most sense to me: %s remains passed metafied and is output nicezputs'ed %l same, truncated to the given number of bytes (though truncating to a number of characters or at least not cutting in the middle of character) would be nicer, but maybe overkill. %S also passed metafied, but no nicezputs. Now, my previous message was showing there were quite a few issue with the metafication and possibly with the nicezputs'ing and/or multibyte handling. > $ printf '%d\n' $'1+|a\x83 c' > zsh: bad math expression: operand expected at `|a^@c' Should have been: zsh: bad math expression: operand expected at `|aM-^C c' The text was not passed metafied to zerrmsg with 0x83 0x20 then incorrectly unmetafied to NUL, then rendered by nicezputs as ^@. [...] > $ printf '%d\n' '1+|ÃÃÃÃÃÃ' > zsh: bad math expression: operand expected at `|\M-C\M-c\M-c\M-c\M-c\M-c\M-^C...' I picked à because it's a letter from the latin script, so you can even use it in variable names: $ zsh -c '(( ÃÃÃÃÃà ++ )); typeset -p ÃÃÃÃÃÃ' typeset -i ÃÃÃÃÃÃ=1 But its UTF-8 encoding happens to contain the 0x83 byte used in metafication. $ printf %s à | od -An -vtx1 c3 83 $ printf %s à | cat -v M-CM-^C Again above, we see the effect of a missing metafication. The error should have been: zsh: bad math expression: operand expected at `|ÃÃÃÃÃÃ' Like in: ~$ printf '%d\n' '1+|AAAAAA' zsh: bad math expression: operand expected at `|AAAAAA' > 0 > $ ((1+|ÃÃÃÃÃÃ)) > zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...' In that case, metafication OK, but character cut in the middle. 2024-02-22 16:49:20 -0800, Bart Schaefer: > The changes for that are minimal. With them, Stephane's math-garbles > handle the ellipsis more cleanly: > > % printf '%d\n' '1+|ÃÃÃÃÃÃ' > zsh: bad math expression: operand expected at `|\M-C\M-c\...' > 0 > % ((1+|ÃÃÃÃÃÃ)) > zsh: bad math expression: operand expected at `|Ã?Ã?Ã?...' It seems rather worse to me. -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-23 19:27 ` Stephane Chazelas @ 2024-02-23 22:32 ` Bart Schaefer 2024-02-23 23:38 ` Bart Schaefer 2024-02-24 9:47 ` Stephane Chazelas 0 siblings, 2 replies; 33+ messages in thread From: Bart Schaefer @ 2024-02-23 22:32 UTC (permalink / raw) To: zsh workers On Fri, Feb 23, 2024 at 11:27 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > 1. whether user-supplied text in messages should be output raw The question here is whether error message is about MISTAKENLY user-supplied text, vs. INTENTIONALLY supplied. My assertion is that it's only the caller of zerr() and related functions that can interpret whether the text is intentional or not. I would argue that your math examples are mistakes, at least as far as the parser could possibly "know". > 2. Whether internally in the code the data should be passed > "metafied" or not to zerr* functions. That still only means we need one way to instruct the function to skip "nice"-ing and one way to say not to. Skipping ahead a bit, we'll come around again later ... > > $ printf '%d\n' $'1+|a\x83 c' > > zsh: bad math expression: operand expected at `|a^@c' > > Should have been: > > zsh: bad math expression: operand expected at `|aM-^C c' You're missing part of my point here. % printf '%d\n' $(( 1+|a\x83 c )) zsh: bad math expression: operand expected at `|a\x83 c ' That is IMO more useful than either of "^@" or "M-^C " and is down to the difference between using printf on a $'...' string (which is interpreted before printf even gets its mitts on it, and two layers before the math parser does) vs. using the actual math parser directly. This has nothing to do with how the string is passed to zerr() and everything to do with how printf and parsing interpret the input -- by the time the math parser actually calls zerr() it can't know how to unwind that, and the internals of zerr() are even further removed. I would therefore argue that these examples are out of scope for this discussion -- these examples are not about how zerr() et al. should receive strings, they're about how the math parser should receive them, and needs to be fixed upstream e.g. in bin_print(). More relevant to this discussion is that math errors are one of the two existing callers using the %l format, so any attempt to improve this is going to require changing those calls anyway. > For 1, IMO, when the error message is generated by zsh, it > should go through nicezputs(). zsh should decide of the > formatting, have it pass escape sequences as-is would make it > hard to understand and diagnose the error. Agreed in concept, but there's a difference between errors actually generated BY zsh, and errors with user input that zsh is reporting. For example, the same literal string might be a file name generated by globbing, or it might be something the user typed out in a syntactically invalid command. There's no way to put intelligence about how to format those into the guts of zerr(). The question of whether to go through all 500 calls to zerr(), zwarn(), etc., and "fix" them is something entirely else. Presently we have just the one known case of wanting something different for ${var?msg}. > For 2, it looks like zerrmsg expects its input metafied and as > you say, that input in most cases is likely to be metafied > already. Not metafied would mean either we couldn't pass text > containing NUL, or we'd need to pass it as ptr+len. There's already a way to pass text not containing NUL (%s) and a way to pass text as ptr+len (%l). There are a vanishingly small number of uses of the latter (2 callers out of the ~500 total call examples). There's exactly one case so far of wanting output to contain NUL, and per the "only caller can interpret" assertion, it seems worthwhile to use %l for the NUL case and let the other 3 callers decide to "nice" the strings they pass (or not). This not only skips extra metafication needed to use the proposed %S, but also simplifies the implementation of %l, and requires the addition of only 1 or 2 lines of code to each of the two existing callers using %l (maybe zero lines of code in the case of yyerror()). > %S also passed metafied, but no nicezputs. That requires metafy in the caller followed by unmetafy in zerr(). Much easier to remove code from %l than to add it to a new %S, especially given that we're editing the solitary caller where %S would be used. > Now, my previous message was showing there were quite a few > issue with the metafication and possibly with the nicezputs'ing > and/or multibyte handling. Fine, but not fixable in zerr() and friends. > [...] > > $ printf '%d\n' '1+|ÃÃÃÃÃÃ' > > zsh: bad math expression: operand expected at `|\M-C\M-c\M-c\M-c\M-c\M-c\M-^C...' > > I picked à because it's a letter from the latin script, so you > can even use it in variable names Sure, but it's not being interpreted as any part of a variable name here, because you put it in quotes and then passed it to printf. Remove the "|" and you get something worse: % printf '%d\n' '1+ÃÃÃÃÃÃ' BUG: unexpected end of string in ztrlen() zsh: bad math expression: operand expected at `\M-C\M-c\M-c\M-c\M-c\M-c' 0 % printf '%d\n' $((1+ÃÃÃÃÃÃ)) 1 (Also a bit weird that the first \M-C is capitalized and the rest are not?) Still not a problem to be resolved in zerr(). > > $ ((1+|ÃÃÃÃÃÃ)) > > zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...' > > In that case, metafication OK, but character cut in the middle. Still not zerr()'s fault and needs to be addressed where the number of bytes for %l is being calculated in checkunary(). > > % ((1+|ÃÃÃÃÃÃ)) > > zsh: bad math expression: operand expected at `|Ã?Ã?Ã?...' > > It seems rather worse to me. That's because of the way I chose to lift nice-ifying up into checkunary() for testing the approach. It's hard to be consistent there, given the foregoing business about different formats being sent down from printf vs. $((...)), and it's also why I said "no patch without feedback". ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-23 22:32 ` Bart Schaefer @ 2024-02-23 23:38 ` Bart Schaefer 2024-02-24 9:47 ` Stephane Chazelas 1 sibling, 0 replies; 33+ messages in thread From: Bart Schaefer @ 2024-02-23 23:38 UTC (permalink / raw) To: zsh workers On Fri, Feb 23, 2024 at 2:32 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > More relevant to this discussion is that math errors are one of the > two existing callers using the %l format, so any attempt to improve > this is going to require changing those calls anyway. > > we have just the one known case of wanting something different for > ${var?msg}. > > There's exactly one case so far of wanting output to contain NUL, and > per the "only caller can interpret" assertion, it seems worthwhile to > use %l for the NUL case and let the other 3 callers decide to "nice" > the strings they pass (or not). That last sentence should be "other 3 calls" not callers (there are only 2). Having worked out that issue with bin_print() [other thread], it appears to me that NONE of the existing calls using %l want their output nice'd (for example, the only way to get consistent readable errors from math garble is to output the strings raw, or at least something less than nice'd and yet to be determined). This suggests strongly to me that changing %l and scrapping the %S idea is the way to go. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-23 22:32 ` Bart Schaefer 2024-02-23 23:38 ` Bart Schaefer @ 2024-02-24 9:47 ` Stephane Chazelas 2024-02-24 10:36 ` Stephane Chazelas 2024-02-25 2:17 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer 1 sibling, 2 replies; 33+ messages in thread From: Stephane Chazelas @ 2024-02-24 9:47 UTC (permalink / raw) To: Bart Schaefer; +Cc: zsh workers 2024-02-23 14:32:49 -0800, Bart Schaefer: [...] > > zsh: bad math expression: operand expected at `|aM-^C c' > > You're missing part of my point here. > > % printf '%d\n' $(( 1+|a\x83 c )) > zsh: bad math expression: operand expected at `|a\x83 c ' > > That is IMO more useful than either of "^@" or "M-^C " and is down to > the difference between using printf on a $'...' string (which is > interpreted before printf even gets its mitts on it, and two layers > before the math parser does) vs. using the actual math parser > directly. This has nothing to do with how the string is passed to > zerr() and everything to do with how printf and parsing interpret the > input -- by the time the math parser actually calls zerr() it can't > know how to unwind that, and the internals of zerr() are even further > removed. > > I would therefore argue that these examples are out of scope for this > discussion -- these examples are not about how zerr() et al. should > receive strings, they're about how the math parser should receive > them, and needs to be fixed upstream e.g. in bin_print(). [...] I agree the bug is in printf which forgets to metafy the input before passing to the math parse. Which can be seen with: $ typeset -A a $ printf '%d\n' 'a[ÃÃÃÃÃÃ]=1' 1 $ (( a[ÃÃÃÃÃÃ] = 2 )) typeset -A a=( [ÃÃÃÃÃÃ]=2 [$'\M-C\M-c\M-c\M-c\M-c\M-c\M-\C-C']=1 ) > More relevant to this discussion is that math errors are one of the > two existing callers using the %l format, so any attempt to improve > this is going to require changing those calls anyway. I don't see why we'd need to change the call to zerr in those cases. Just fix printf. $ a=$'\x83 foobarbaz' b='\x83 foobarbaz' ~$ (( 1+|$b )) zsh: bad math expression: operand expected at `|\x83 foob...' $ (( 1+|$a )) zsh: bad math expression: operand expected at `|\M-^C foobarb...' Are correct, we do want the 0x83 byte which is not printable to be rendered as \M-^C. > > > For 1, IMO, when the error message is generated by zsh, it > > should go through nicezputs(). zsh should decide of the > > formatting, have it pass escape sequences as-is would make it > > hard to understand and diagnose the error. > > Agreed in concept, but there's a difference between errors actually > generated BY zsh, and errors with user input that zsh is reporting. > For example, the same literal string might be a file name generated by > globbing, or it might be something the user typed out in a > syntactically invalid command. There's no way to put intelligence > about how to format those into the guts of zerr(). I don't think that's a contention point. All those cases are cases where we need to make the non-printable characters in the user data visible with nicezputs. The question is not about user input vs no user input in the displayed error, but only for those where there's user input, whether that user input is mean to be an error message formatted by the user or not. And I can only think of ${var[:]?user-supplied-error}, and imagine that at least 99% of the 499 other cases are not about printing a user-supplied error message. > There's already a way to pass text not containing NUL (%s) and a way > to pass text as ptr+len (%l). There are a vanishingly small number of > uses of the latter (2 callers out of the ~500 total call examples). > There's exactly one case so far of wanting output to contain NUL, and > per the "only caller can interpret" assertion, it seems worthwhile to > use %l for the NUL case and let the other 3 callers decide to "nice" > the strings they pass (or not). > > This not only skips extra metafication needed to use the proposed %S, > but also simplifies the implementation of %l, and requires the > addition of only 1 or 2 lines of code to each of the two existing > callers using %l (maybe zero lines of code in the case of yyerror()). > > > %S also passed metafied, but no nicezputs. > > That requires metafy in the caller followed by unmetafy in zerr(). > Much easier to remove code from %l than to add it to a new %S, > especially given that we're editing the solitary caller where %S would > be used. But in the case of ${var?err}, the err is already metafied, so if you make %l take unmetafied input, you're just moving the unmetafication to the caller which is counterproductive as it makes it break on NULs. Also %l is intended (at least in the one case I saw it used) to truncase user input, so it should be nicezputs'ed. > > > Now, my previous message was showing there were quite a few > > issue with the metafication and possibly with the nicezputs'ing > > and/or multibyte handling. > > Fine, but not fixable in zerr() and friends. Sorry for the confusion, I didn't mean to say that's where it was to be fixed. I agree it's all cases where it's the caller failing to do the metafication (in the case of printf, the metafication was missing from much earlier). [...] > % printf '%d\n' '1+ÃÃÃÃÃÃ' > BUG: unexpected end of string in ztrlen() > zsh: bad math expression: operand expected at `\M-C\M-c\M-c\M-c\M-c\M-c' > 0 > % printf '%d\n' $((1+ÃÃÃÃÃÃ)) > 1 > > (Also a bit weird that the first \M-C is capitalized and the rest are > not?) Still not a problem to be resolved in zerr(). \M-C is the visual representaion of 0xc3, \M-c of 0xe3, ÃÃ is c3 83 c3 83. It's just that unmetafy turned 83 c3 into e3. > > > $ ((1+|ÃÃÃÃÃÃ)) > > > zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...' > > > > In that case, metafication OK, but character cut in the middle. > > Still not zerr()'s fault and needs to be addressed where the number of > bytes for %l is being calculated in checkunary(). zerr could try and decode the string as text and truncate the specified number of *characters* instead of bytes, but like I said, that may be overkill as we can live with the odd character cut in the middle. > > > % ((1+|ÃÃÃÃÃÃ)) > > > zsh: bad math expression: operand expected at `|Ã?Ã?Ã?...' > > > > It seems rather worse to me. > > That's because of the way I chose to lift nice-ifying up into > checkunary() for testing the approach. It's hard to be consistent > there, given the foregoing business about different formats being sent > down from printf vs. $((...)), and it's also why I said "no patch > without feedback". I guess those ? are some 0x83 bytes added by metafication, and we're missing the corresponding unmetafy. To me, the only things to do are: 1. add a %S for raw output (expects metafied input like everything else) to tbe used by ${var[:]?error} and likely only those. 2. Add missing metafy in bin_print (and possibly elsewhere) before calling the math parser 3. Fix those cases where zerrmsg is called with %s/%l/%S arguments non-metafied like in that "bad interpreter" case above. 4. (optional): Improve %l usages to truncate based on number of characters rather than bytes or at least avoid cutting characters in the middle. -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-24 9:47 ` Stephane Chazelas @ 2024-02-24 10:36 ` Stephane Chazelas 2024-02-25 4:35 ` [PATCH] 'bad interpreter' error garbled Bart Schaefer 2024-02-25 2:17 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer 1 sibling, 1 reply; 33+ messages in thread From: Stephane Chazelas @ 2024-02-24 10:36 UTC (permalink / raw) To: Bart Schaefer, zsh workers 2024-02-24 09:47:22 +0000, Stephane Chazelas: [...] > 3. Fix those cases where zerrmsg is called with %s/%l/%S > arguments non-metafied like in that "bad interpreter" case > above. [...] Sorry, I just realised I didn't mention that "case above": $ echo '#! ÃÃÃÃ' > a $ chmod +x a $ ./a zsh: ./a: bad interpreter: \M-C\M-c\M-c\M-c\M-^C: no such file or directory -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH] 'bad interpreter' error garbled 2024-02-24 10:36 ` Stephane Chazelas @ 2024-02-25 4:35 ` Bart Schaefer 2024-02-25 5:26 ` Bart Schaefer 0 siblings, 1 reply; 33+ messages in thread From: Bart Schaefer @ 2024-02-25 4:35 UTC (permalink / raw) To: zsh workers [-- Attachment #1: Type: text/plain, Size: 219 bytes --] On Sat, Feb 24, 2024 at 2:36 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > $ echo '#! ÃÃÃÃ' > a > $ chmod +x a > $ ./a > zsh: ./a: bad interpreter: \M-C\M-c\M-c\M-c\M-^C: no such file or directory [-- Attachment #2: bad-bad-interp.txt --] [-- Type: text/plain, Size: 778 bytes --] diff --git a/Src/exec.c b/Src/exec.c index 1565cb13e..c75aa78d6 100644 --- a/Src/exec.c +++ b/Src/exec.c @@ -551,7 +551,7 @@ zexecve(char *pth, char **argv, char **newenvp) break; if (t0 == ct) zerr("%s: bad interpreter: %s: %e", pth, - execvebuf + 2, eno); + metafy(execvebuf + 2, -1, META_NOALLOC), eno); else { while (inblank(execvebuf[t0])) execvebuf[t0--] = '\0'; @@ -574,8 +574,8 @@ zexecve(char *pth, char **argv, char **newenvp) execve(pprog, argv - 2, newenvp); } } - zerr("%s: bad interpreter: %s: %e", pth, ptr2, - eno); + zerr("%s: bad interpreter: %s: %e", pth, + metafy(ptr2, -1, META_NOALLOC), eno); } else if (*ptr) { *ptr = '\0'; argv[-2] = ptr2; ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] 'bad interpreter' error garbled 2024-02-25 4:35 ` [PATCH] 'bad interpreter' error garbled Bart Schaefer @ 2024-02-25 5:26 ` Bart Schaefer 0 siblings, 0 replies; 33+ messages in thread From: Bart Schaefer @ 2024-02-25 5:26 UTC (permalink / raw) To: zsh workers Oops, just noticed a pair of typos in the patch -- META_NOALLOC was meant to be META_STATIC. I'll push a fix. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-24 9:47 ` Stephane Chazelas 2024-02-24 10:36 ` Stephane Chazelas @ 2024-02-25 2:17 ` Bart Schaefer 2024-02-25 6:05 ` Bart Schaefer 1 sibling, 1 reply; 33+ messages in thread From: Bart Schaefer @ 2024-02-25 2:17 UTC (permalink / raw) To: zsh workers On Sat, Feb 24, 2024 at 1:47 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > 2024-02-23 14:32:49 -0800, Bart Schaefer: > [...] > > More relevant to this discussion is that math errors are one of the > > two existing callers using the %l format, so any attempt to improve > > this is going to require changing those calls anyway. > > I don't see why we'd need to change the call to zerr in those > cases. Just fix printf. That doesn't resolve the issue that math.c:checkunary() is miscalculating the truncation width even before calling zerr(). Having looked more closely now, if I fix that calculation, then checkunary() can do the truncation itself and doesn't need to use %l any longer. > But in the case of ${var?err}, the err is already metafied, so > if you make %l take unmetafied input, you're just moving the > unmetafication to the caller which is counterproductive as it > makes it break on NULs. No, it doesn't make it break on NULs, because unmetafy() returns the length of the resulting string including the NULs, which we can then pass down to zerr(). The actual tradeoff to make %l do the raw output is: + fix the width calculation in math.c, then use %s * + add one call to metafy() in parse.c, to use %s instead of %l ** + add one call to unmetafy() in subst.c, to use %l instead of %s - remove metalen(), zhalloc(), memcpy(), and nicezputs() from utils.c + add one call to fwrite() in place of the above * Which we should do anyway, so not really a tradeoff ** It's not even certain that's required, given that we're outputting a string from the lexer > Also %l is intended (at least in the one case I saw it used) to > truncase user input, so it should be nicezputs'ed. Unfortunately it's used *incorrectly* to truncate user input at that one place. > To me, the only things to do are: > > 1. add a %S for raw output (expects metafied input like > everything else) to tbe used by ${var[:]?error} and likely only > those. That adds another branch in utils.c with at least the same unmetafy() + fwrite(), instead of removing code that the callers no longer need. > 2. Add missing metafy in bin_print (and possibly elsewhere) > before calling the math parser Needed either way. > 3. Fix those cases where zerrmsg is called with %s/%l/%S > arguments non-metafied like in that "bad interpreter" case > above. That's a whack-a-mole hunt and also orthogonal to whether or not to add %S. > 4. (optional): Improve %l usages to truncate based on number of > characters rather than bytes or at least avoid cutting > characters in the middle. Not needed. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-25 2:17 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer @ 2024-02-25 6:05 ` Bart Schaefer 2024-02-25 7:29 ` Stephane Chazelas 0 siblings, 1 reply; 33+ messages in thread From: Bart Schaefer @ 2024-02-25 6:05 UTC (permalink / raw) To: zsh workers On Sat, Feb 24, 2024 at 6:17 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > The actual tradeoff to make %l do the raw output is: > + fix the width calculation in math.c, then use %s * > + add one call to metafy() in parse.c, to use %s instead of %l ** > + add one call to unmetafy() in subst.c, to use %l instead of %s > - remove metalen(), zhalloc(), memcpy(), and nicezputs() from utils.c > + add one call to fwrite() in place of the above Stephane, if you really still think it preferable to add %S (thus moving the unmetafy() from subst.c to utils.c) then I think we should just do away with %l entirely. Either way we're ending up with a branch in zerrmsg() that's used for exactly one special case. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-25 6:05 ` Bart Schaefer @ 2024-02-25 7:29 ` Stephane Chazelas 2024-02-25 8:28 ` typeset -<non-ASCII> (Was: metafication in error messages) Stephane Chazelas 0 siblings, 1 reply; 33+ messages in thread From: Stephane Chazelas @ 2024-02-25 7:29 UTC (permalink / raw) To: Bart Schaefer; +Cc: zsh workers 2024-02-24 22:05:26 -0800, Bart Schaefer: > On Sat, Feb 24, 2024 at 6:17 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > > > The actual tradeoff to make %l do the raw output is: > > + fix the width calculation in math.c, then use %s * > > + add one call to metafy() in parse.c, to use %s instead of %l ** > > + add one call to unmetafy() in subst.c, to use %l instead of %s > > - remove metalen(), zhalloc(), memcpy(), and nicezputs() from utils.c > > + add one call to fwrite() in place of the above > > Stephane, if you really still think it preferable to add %S (thus > moving the unmetafy() from subst.c to utils.c) then I think we should > just do away with %l entirely. Either way we're ending up with a > branch in zerrmsg() that's used for exactly one special case. I don't have a strong opinion on it. Either way, the internal documentation should make it clear the metafy and nicezputs expectations. Like: %s expects metafied string, prints it nicezputs'ed %l expects non-metafied string and byte length, prints it raw. it's the caller's responsibility to make sure the string is not truncated in the middle of a character. Note that there's truncating in between the bytes of a character and there's truncating between the characters of a grapheme cluster like decomposed Hangul syllables (or $'Ste\u301phane'). I don't think we'd want to try and address that. Other option could be to have %s, %S and %*s %*S and let nicezputs do the truncation and ellipsis in %*s case which might be a cleaner interface if (likely) more cumbersome to implement. -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* typeset -<non-ASCII> (Was: metafication in error messages) 2024-02-25 7:29 ` Stephane Chazelas @ 2024-02-25 8:28 ` Stephane Chazelas 2024-02-25 8:35 ` Stephane Chazelas 2024-02-25 21:02 ` Bart Schaefer 0 siblings, 2 replies; 33+ messages in thread From: Stephane Chazelas @ 2024-02-25 8:28 UTC (permalink / raw) To: Bart Schaefer, zsh workers By the way, while looking at other culprits for missing metafication in calls to zerr*, I found (in UTF-8 locale): $ typeset -é typeset: bad option: -^ ~$ zsh -c 'typeset -é' |& sed -n l zsh:typeset:1: bad option: -^\003$ (gdb) 305 while (*fmt) (gdb) 306 if (*fmt == '%') { (gdb) 307 fmt++; (gdb) 308 switch (*fmt++) { (gdb) 348 num = va_arg(ap, int); (gdb) 350 mb_charinit(); (gdb) p num $2 = -61 (gdb) p (unsigned char) num $3 = 195 '\303' (that's 0xc3, the first byte of é, normally rendered as \M-C). We end up with a ^ because in wcs_nicechar_sel(): wcs_nicechar_sel (c=-61 L'\xffffffc3', widthp=0x0, swidep=0x0, quotable=0) at utils.c:602 602 int ret = 0; (gdb) n 603 VARARR(char, mbstr, MB_CUR_MAX); (gdb) 612 newalloc = NICECHAR_MAX; (gdb) 613 if (bufalloc != newalloc) (gdb) 619 s = buf; (gdb) 620 if (!WC_ISPRINT(c) && (c < 0x80 || !isset(PRINTEIGHTBIT))) { (gdb) 621 if (c == 0x7f) { (gdb) 629 } else if (c == L'\n') { (gdb) 632 } else if (c == L'\t') { (gdb) 635 } else if (c < 0x20) { (gdb) 636 if (quotable) { (gdb) 641 *s++ = '^'; It's < 0x20 because negative and we end up with ^<0x3> because that's -61 + 64. zerrmsg's %c expects a wchar_t when MULTIBYTE_SUPPORT is enabled, but here we're passing it the first byte (signed) of the encoding of the wide character. Calling it with (unsigned char)*arg is not much better as you end up with: zsh:typeset:1: bad option: -Ã As U+00C3 (Ã) happens to be printable. If it weren't -\M-C would not be much better anyway. And you also get zsh:typeset:1: bad option: -Ã for typeset -$'\xc3' Some options could be: - return a "no non-ASCII/multibyte option supported" error message when *arg >= 0x80 - extract and decode the full multibyte character before passing to zerrmsg. which leaves the problem of how to render byte sequences that can't be decoded. Maybe there are already functions in the code that do that? -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: typeset -<non-ASCII> (Was: metafication in error messages) 2024-02-25 8:28 ` typeset -<non-ASCII> (Was: metafication in error messages) Stephane Chazelas @ 2024-02-25 8:35 ` Stephane Chazelas 2024-02-25 21:02 ` Bart Schaefer 1 sibling, 0 replies; 33+ messages in thread From: Stephane Chazelas @ 2024-02-25 8:35 UTC (permalink / raw) To: Bart Schaefer, zsh workers $ zsh -c $'typeset -\x83' zsh:typeset:1: bad option: -\M-C is also incorrect, \M-C is 0xc3. 0x83 is \M-^C. -- Stephane ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: typeset -<non-ASCII> (Was: metafication in error messages) 2024-02-25 8:28 ` typeset -<non-ASCII> (Was: metafication in error messages) Stephane Chazelas 2024-02-25 8:35 ` Stephane Chazelas @ 2024-02-25 21:02 ` Bart Schaefer 1 sibling, 0 replies; 33+ messages in thread From: Bart Schaefer @ 2024-02-25 21:02 UTC (permalink / raw) To: zsh workers On Sun, Feb 25, 2024 at 12:29 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > By the way, while looking at other culprits for missing > metafication in calls to zerr*, I found (in UTF-8 locale): > > $ typeset -é > typeset: bad option: -^ I did warn that any such effort was going to be whack-a-mole. On Sun, Feb 25, 2024 at 12:35 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > $ zsh -c $'typeset -\x83' > zsh:typeset:1: bad option: -\M-C > > is also incorrect, \M-C is 0xc3. 0x83 is \M-^C. IMO this is GIGO -- we have not and likely never will have non-ascii option letters, so detecting multibyte characters and pretty-printing them on error is a waste of effort. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) 2024-02-22 7:55 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Stephane Chazelas 2024-02-22 17:02 ` Bart Schaefer @ 2024-02-23 0:33 ` Bart Schaefer 2024-02-25 5:06 ` [PATCH] count multibyte and metafied characters correctly for math ops errors Bart Schaefer 2 siblings, 0 replies; 33+ messages in thread From: Bart Schaefer @ 2024-02-23 0:33 UTC (permalink / raw) To: zsh workers Incidentally ... On Wed, Feb 21, 2024 at 11:55 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > $ printf '%d\n' $'1+|a\0b' > zsh: bad math expression: operand expected at `|a' That doesn't have anything to do with error message presentation, the integer parser itself stops at the nul byte before even checking the error. GIGO. And the differences in these two ... > $ printf '%d\n' '1+|ÃÃÃÃÃÃ' > zsh: bad math expression: operand expected at `|\M-C\M-c\M-c\M-c\M-c\M-c\M-^C...' > $ ((1+|ÃÃÃÃÃÃ)) > zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...' ... are down to the parsing of quoted arguments vs. ((...)), with some fudge because of the way the use of ellipsis is calculated. This would have to be sorted out in bin_print() or something. ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH] count multibyte and metafied characters correctly for math ops errors 2024-02-22 7:55 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Stephane Chazelas 2024-02-22 17:02 ` Bart Schaefer 2024-02-23 0:33 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer @ 2024-02-25 5:06 ` Bart Schaefer 2 siblings, 0 replies; 33+ messages in thread From: Bart Schaefer @ 2024-02-25 5:06 UTC (permalink / raw) To: zsh workers [-- Attachment #1: Type: text/plain, Size: 624 bytes --] On Wed, Feb 21, 2024 at 11:55 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > $ ((1+|ÃÃÃÃÃÃ)) > zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...' As Stephane also pointed out, ÃÃÃÃÃÃ is a valid identifier, so it's not farfetched that someone could make a typo using one in a math expression. The code added here could be made into a utility [metalen() is almost there] but as I don't know of anywhere else this is needed, I just inlined it. I looked through the other mb_* routines in utils.c but they all appeared to do what would be extraneous work for this purpose. [-- Attachment #2: mathop-errmsgs.txt --] [-- Type: text/plain, Size: 1106 bytes --] diff --git a/Src/math.c b/Src/math.c index 50b69d6a1..d97dae238 100644 --- a/Src/math.c +++ b/Src/math.c @@ -1557,21 +1557,32 @@ checkunary(int mtokc, char *mptr) errmsg = 2; } if (errmsg) { - int len, over = 0; + int len = 0, over = 0; char *errtype = errmsg == 2 ? "operator" : "operand"; while (inblank(*mptr)) mptr++; - len = ztrlen(mptr); - if (len > 10) { - len = 10; - over = 1; + if (isset(MULTIBYTE)) + MB_CHARINIT(); + while (over < 10 && mptr[len]) { + if (isset(MULTIBYTE)) + len += MB_METACHARLEN(mptr+len); + else + len += (mptr[len] == Meta ? 2 : 1); + ++over; + } + if ((over = mptr[len])) { + mptr = dupstring(mptr); + if (mptr[len] == Meta) + mptr[len+1] = 0; + else + mptr[len] = 0; } if (!*mptr) zerr("bad math expression: %s expected at end of string", errtype); else - zerr("bad math expression: %s expected at `%l%s'", - errtype, mptr, len, over ? "..." : ""); + zerr("bad math expression: %s expected at `%s%s'", + errtype, mptr, over ? "..." : ""); } unary = !(tp & OP_OPF); } ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] unmetafy Re: $var not expanded in ${x?$var} 2024-02-22 7:23 ` Stephane Chazelas 2024-02-22 7:55 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Stephane Chazelas @ 2024-02-22 8:34 ` Roman Perepelitsa 2024-02-22 17:07 ` Bart Schaefer 1 sibling, 1 reply; 33+ messages in thread From: Roman Perepelitsa @ 2024-02-22 8:34 UTC (permalink / raw) To: Bart Schaefer, zsh workers On Thu, Feb 22, 2024 at 8:23 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > > + fwrite(unmetafy(dupstring(str), &num), num, 1, file); > > Being no C expert, I wonder if it's safe (portable) to set and > use num in the same call like that. Is it guaranteed to be done > in the right order? In C, function arguments are evaluated in unspecified order. The behavior of the fwrite() call above depends on the evaluation order of its arguments, and is therefore unspecified. Evaluation can even be interleaved. Consider: a(b(), c(d())) Here, it is possible that functions will be invoked in this order: d, b, c. Basically, function arguments are evaluated before the function is invoked (obviously), but other than that there are no evaluation order guarantees. Here's another example of unspecified behavior: *f() = g(); Here, f() and g() can be evaluated in any order. There is a related gotcha where the same object is accessed more than once between sequence points and at least one of these accesses is a write. Like this: int a = 0; int b = ++a + a; Or like this: foo(++a, a); This is undefined behavior, not just unspecified. Roman. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] unmetafy Re: $var not expanded in ${x?$var} 2024-02-22 8:34 ` [PATCH] unmetafy Re: $var not expanded in ${x?$var} Roman Perepelitsa @ 2024-02-22 17:07 ` Bart Schaefer 0 siblings, 0 replies; 33+ messages in thread From: Bart Schaefer @ 2024-02-22 17:07 UTC (permalink / raw) To: zsh workers On Thu, Feb 22, 2024 at 12:35 AM Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote: > > On Thu, Feb 22, 2024 at 8:23 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > > > > + fwrite(unmetafy(dupstring(str), &num), num, 1, file); > > > > Being no C expert, I wonder if it's safe (portable) to set and > > use num in the same call like that. Is it guaranteed to be done > > in the right order? > > In C, function arguments are evaluated in unspecified order. Yeah, been too long since I had to think about sequence points. However, per commentary in the other thread, I'm going to recommend against using this patch in the first place. We need to rethink this whole approach. ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2024-02-25 21:02 UTC | newest] Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-01-13 8:02 $var not expanded in ${x?$var} Stephane Chazelas 2023-01-16 8:35 ` Daniel Shahaf 2023-01-16 17:15 ` Peter Stephenson 2024-02-20 7:05 ` Stephane Chazelas 2024-02-20 17:45 ` Bart Schaefer 2024-02-20 19:39 ` Stephane Chazelas 2024-02-21 4:44 ` Bart Schaefer 2024-02-21 19:45 ` Stephane Chazelas 2024-02-21 19:52 ` Bart Schaefer 2024-02-21 20:21 ` Stephane Chazelas 2024-02-22 0:46 ` [PATCH] unmetafy " Bart Schaefer 2024-02-22 7:23 ` Stephane Chazelas 2024-02-22 7:55 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Stephane Chazelas 2024-02-22 17:02 ` Bart Schaefer 2024-02-22 22:31 ` Bart Schaefer 2024-02-23 0:49 ` Bart Schaefer 2024-02-23 19:27 ` Stephane Chazelas 2024-02-23 22:32 ` Bart Schaefer 2024-02-23 23:38 ` Bart Schaefer 2024-02-24 9:47 ` Stephane Chazelas 2024-02-24 10:36 ` Stephane Chazelas 2024-02-25 4:35 ` [PATCH] 'bad interpreter' error garbled Bart Schaefer 2024-02-25 5:26 ` Bart Schaefer 2024-02-25 2:17 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer 2024-02-25 6:05 ` Bart Schaefer 2024-02-25 7:29 ` Stephane Chazelas 2024-02-25 8:28 ` typeset -<non-ASCII> (Was: metafication in error messages) Stephane Chazelas 2024-02-25 8:35 ` Stephane Chazelas 2024-02-25 21:02 ` Bart Schaefer 2024-02-23 0:33 ` Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var}) Bart Schaefer 2024-02-25 5:06 ` [PATCH] count multibyte and metafied characters correctly for math ops errors Bart Schaefer 2024-02-22 8:34 ` [PATCH] unmetafy Re: $var not expanded in ${x?$var} Roman Perepelitsa 2024-02-22 17:07 ` Bart Schaefer
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).