zsh-workers
* \M-^C vs \203 vs \x83 as visual representations of bytes
@ 2024-02-25  9:07 Stephane Chazelas
  2024-02-25 17:26 ` Mark J. Reed
  2024-02-25 18:25 ` Stephane Chazelas
  0 siblings, 2 replies; 5+ messages in thread
From: Stephane Chazelas @ 2024-02-25  9:07 UTC
  To: Zsh hackers list

I guess \M-^C (or \M-\C-C) as the representation of 0x83 made
sense to people in the 80s/90s when they could actually type
Meta-Ctrl-C on their keyboard to input it.

Nowadays, you can still enter ^C with Ctrl+C, but bytes >= 0x80
are used for non-ASCII characters, Alt-C usually sends ^[c
(0x1b 0x63), and Alt-Ctrl-C sends ^[^C (0x1b 0x3).
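
To make the mapping explicit, here is a minimal sketch in POSIX sh of
how the \M-^C notation corresponds to a byte value: Ctrl flips bit 0x40
of the letter and Meta sets bit 0x80. The helper name meta_ctrl_byte is
made up for illustration and is not part of zsh.

```shell
# Hypothetical helper: print the byte denoted by \M-^X for an
# uppercase ASCII letter X, in both hex and octal notation.
meta_ctrl_byte() {
    # A leading quote makes printf yield the character code: C -> 67 (0x43)
    code=$(printf '%d' "'$1")
    # Ctrl flips bit 0x40 (0x43 -> 0x03); Meta sets bit 0x80 (0x03 -> 0x83)
    byte=$(( (code ^ 64) | 128 ))
    printf '\\x%02x \\%03o\n' "$byte" "$byte"
}

meta_ctrl_byte C    # prints: \x83 \203
```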

I find the \x83 representation more useful when giving visual
representations of bytes that do not form part of a printable
character (and \uffff / \U0010ffff for valid but non-printable
multi-byte characters). Octal used to be more popular than hex,
but I think nowadays it's the reverse, though I still find \203
more useful than \M-^C if not \x83.

What do people think?

Would it make sense to change some of the output intended for
user consumption such as:

$ a=$'\x83'
$ typeset a
a=$'\M-\C-C'
$ print -r ${(q+)a}
$'\M-\C-C'
$ (set -x; : $a)
+/bin/zsh:29> : $'\M-\C-C'


$ let $a
zsh: bad math expression: illegal character: \M-C
$ let 1+$a
zsh: bad math expression: operand expected at `\M-^C'

(Note that the first message prints \M-C where the second prints
\M-^C: another bug, by the way.)

Comparison with some other tools/shells:

$ echo $a | cat -v
M-^C
$ echo $a | sed -n l
\203$
$ bash -xc 'printf "%q\n" "$a"'
+ printf '%q\n' $'\203'
$'\203'
$ ksh -xc 'printf "%q\n" "$a"'
+ printf '%q\n' $'\x83'
$'\x83'

-- 
Stephane


* Re: \M-^C vs \203 vs \x83 as visual representations of bytes
From: Mark J. Reed @ 2024-02-25 17:26 UTC
  To: Zsh hackers list

Wow; I had no idea Zsh printed out nonprintable characters that way. I
concur that it would make sense to change, modulo backward compatibility
considerations. Maybe a settable option?

The M- "meta" syntax is something I associate with Emacs key bindings, and
have rarely seen in other contexts.

-- 
Mark J. Reed <markjreed@gmail.com>

* Re: \M-^C vs \203 vs \x83 as visual representations of bytes
From: Stephane Chazelas @ 2024-02-25 18:25 UTC
  To: Zsh hackers list

2024-02-25 09:07:51 +0000, Stephane Chazelas:
[...]
> $ ksh -xc 'printf "%q\n" "$a"'
> + printf '%q\n' $'\x83'
> $'\x83'
[...]

For the record, ksh93 switched from using \203 to using \x83 in
ksh93u+ in 2012.
https://github.com/ksh93/ksh93-history/commit/1753aa035#diff-c04f1e85360bcf953a69830ce48b4c81a0cdea3851729a0090527e82f21e803bR436

-- 
Stephane


* Re: \M-^C vs \203 vs \x83 as visual representations of bytes
From: Stephane Chazelas @ 2024-02-25 18:50 UTC
  To: Mark J. Reed; +Cc: Zsh hackers list

2024-02-25 12:26:35 -0500, Mark J. Reed:
> Wow; I had no idea Zsh printed out nonprintable characters that way. I
> concur that it would make sense to change, modulo backward compatibility
> considerations. Maybe a settable option?
[...]

Note that I'm not suggesting that zsh stop accepting it on input,
only that it change the output format.

On input, zsh supports \203 and \x83 more widely than \M-\C-C.

All three are supported by print and $'...':

$ print '\x83\203\M-\C-C' | sed -n l
\203\203\203$
$ print -r $'\x83\203\M-\C-C' | sed -n l
\203\203\203$

But \M-\C-C is not supported by echo or printf:

$ echo '\x83\203\M-\C-C' | sed -n l
\203\\203\\M-\\C-C$
$ echo '\x83\0203\M-\C-C' | sed -n l
\203\203\\M-\\C-C$
$ printf '\x83\0203\M-\C-C\n' | sed -n l
\203\0203\\M-\\C-C$

So I wouldn't think the switch would break backward
compatibility. Switching to \203 or \x83 would actually improve
compatibility with other shells.
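
The \0203 in the printf transcript above is consistent with how POSIX
specifies printf: in the format string an octal escape is \ddd (at
most three digits), so \0203 is read as \020 followed by a literal 3,
while the \0ddd form belongs to %b arguments. A small sketch, assuming
od and tr are available; bytes_hex is a hypothetical helper name:

```shell
# Hypothetical helper: show stdin as a run of hex digits, one pair per byte.
bytes_hex() { od -An -tx1 | tr -d ' \n'; echo; }

printf '\203'  | bytes_hex        # prints: 83
printf '\0203' | bytes_hex        # prints: 1033  (byte 0x10, then "3")
printf '%b' '\0203' | bytes_hex   # prints: 83
```

So \203 is the spelling to use in a printf format string, and \0203
the one to use in echo or %b arguments.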

Ksh has \CC instead of \C-C (inside $'...') and its \M-C seems to expand to
^[C (and \M-c doesn't seem to be recognised).

In ksh however, \xfff is the same as \ufff (while \xff is not
the same as \uff) and you need \x[ff]f or \x{ff}f to have a 0xff
byte followed by f. So the output would not be compatible with
ksh if switching to \xHH.

$ ksh -c 'printf "%q\n" "$@"' ksh $'\xff' $'\xfff'
$'\xff'
$'\x[ff]f'
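
For comparison, fixed-width three-digit octal does not have this
ambiguity, since an octal escape takes at most three digits and
whatever follows is literal. Below is a minimal sketch of a quoting
helper along those lines. It is an illustration only, not what zsh,
bash or ksh do internally; quote_octal is a made-up name, and for
simplicity it escapes every byte, printable or not.

```shell
# Hypothetical helper: render a string as $'...' with every byte
# written as a three-digit octal escape.
quote_octal() {
    printf "\$'"
    # od -to1 prints each byte as three octal digits; -v disables
    # the "*" shorthand for repeated lines.
    printf '%s' "$1" | od -An -v -to1 | tr -s ' ' '\n' |
    while read -r oct; do
        # Skip the empty lines left over from od's padding.
        if [ -n "$oct" ]; then printf '\\%s' "$oct"; fi
    done
    printf "'\n"
}

quote_octal "$(printf '\203f')"   # prints: $'\203\146'
```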

See also
https://github.com/ksh93/ksh/commit/ac8991e5257978a6359c001b7fa227c334fd9e18

-- 
Stephane


* Re: \M-^C vs \203 vs \x83 as visual representations of bytes
From: Bart Schaefer @ 2024-02-25 20:54 UTC
  To: Mark J. Reed; +Cc: Zsh hackers list

On Sun, Feb 25, 2024 at 9:26 AM Mark J. Reed <markjreed@gmail.com> wrote:
>
>  The M- "meta" syntax is something I associate with Emacs key-binding

Which is why zsh prints it that way, so that error output is typically
in the same format as e.g. bindkey definitions.

I have no particular opinion about this but will not myself be
implementing any changes in this regard.

