zsh-workers
* UNICODE Private Use Area characters in BUFFER
@ 2022-10-23 10:12 Roman Perepelitsa
  2022-10-23 16:29 ` Mikael Magnusson
  0 siblings, 1 reply; 23+ messages in thread
From: Roman Perepelitsa @ 2022-10-23 10:12 UTC (permalink / raw)
  To: Zsh hackers list

Zle cannot display UNICODE Private Use Area characters in BUFFER.

    % f() BUFFER=$'\uE0B0'
    % zle -N f
    % bindkey '^T' f
    % <Ctrl-T>

Expected: the last line shows the glyph for U+E0B0 (whichever way the
terminal chooses to render it).

Actual: the last line shows <b0> in reverse video.

I haven't looked at the code but my guess is that zle assumes that
characters from Private Use Area never have a native reasonable
representation, so it attempts to show codepoints. This assumption is
incorrect. I think Private Use Area characters shouldn't be handled
specially. If there is no glyph in the terminal's font for a character, let
the terminal decide how to present that.

Note: Private Use Area characters work fine everywhere else. For example,
in PS1.

Roman.

* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-23 10:12 UNICODE Private Use Area characters in BUFFER Roman Perepelitsa
@ 2022-10-23 16:29 ` Mikael Magnusson
  2022-10-23 16:43   ` Roman Perepelitsa
  0 siblings, 1 reply; 23+ messages in thread
From: Mikael Magnusson @ 2022-10-23 16:29 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Zsh hackers list

On 10/23/22, Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:
> Zle cannot display UNICODE Private Use Area characters in BUFFER.
>
>     % f() BUFFER=$'\uE0B0'
>     % zle -N f
>     % bindkey '^T' f
>     % <Ctrl-T>
>
> Expected: the last line shows the glyph for U+E0B0 (whichever way the
> terminal chooses to render it).
>
> Actual: the last line shows <b0> in reverse video.
>
> I haven't looked at the code but my guess is that zle assumes that
> characters from Private Use Area never have a native reasonable
> representation, so it attempts to show codepoints. This assumption is
> incorrect. I think Private Use Area characters shouldn't be handled
> specially. If there is no glyph in the terminal's font for a character, let
> the terminal decide how to present that.
>
> Note: Private Use Area characters work fine everywhere else. For example,
> in PS1.

I'm not sure we have any choice, we have to know how wide every
character we print is, and presumably there is no defined width for
them as the characters themselves are not defined.

-- 
Mikael Magnusson


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-23 16:29 ` Mikael Magnusson
@ 2022-10-23 16:43   ` Roman Perepelitsa
  2022-10-23 17:02     ` Bart Schaefer
  2022-10-23 22:42     ` Mikael Magnusson
  0 siblings, 2 replies; 23+ messages in thread
From: Roman Perepelitsa @ 2022-10-23 16:43 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: Zsh hackers list

On Sun, Oct 23, 2022 at 6:29 PM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> > Note: Private Use Area characters work fine everywhere else. For example,
> > in PS1.
>
> I'm not sure we have any choice, we have to know how wide every
> character we print is, and presumably there is no defined width for
> them as the characters themselves are not defined.

All terminals by default display characters from Private Use Area as
narrow. Zsh also (correctly) treats them as narrow. For example, you
can do this:

    PS1=$'\uE0B0 '

Whether your terminal can render this glyph or not, everything will
work fine. The character will take one column and zsh will know that.

A few more tests to show that Private Use Area characters work fine in
zsh, with the exception that you cannot put them in BUFFER:

    % x=$'\uE0B0'

    % print -r -- ${(m)#x}
    1

    % print -r -- ${${(%):-$x%1(l.at least 1 column.)}[2,-1]}
    at least 1 column

    % print -r --  ${${(%):-$x%2(l..less than 2 columns)}[2,-1]}
    less than 2 columns

Roman.


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-23 16:43   ` Roman Perepelitsa
@ 2022-10-23 17:02     ` Bart Schaefer
  2022-10-23 17:29       ` Roman Perepelitsa
  2022-10-23 22:42     ` Mikael Magnusson
  1 sibling, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2022-10-23 17:02 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Mikael Magnusson, Zsh hackers list

On Sun, Oct 23, 2022, 9:45 AM Roman Perepelitsa <roman.perepelitsa@gmail.com>
wrote:

> A few more tests to show that Private Use Area characters work fine in
> zsh, with the exception that you cannot put them in BUFFER:
>

I don't have the code handy, but I suspect this is due to the
implementation of wisprnt() [sic] returning false for those characters.

* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-23 17:02     ` Bart Schaefer
@ 2022-10-23 17:29       ` Roman Perepelitsa
  2022-10-23 18:30         ` Unicode9 (was Re: UNICODE Private Use Area characters in BUFFER) Bart Schaefer
                           ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Roman Perepelitsa @ 2022-10-23 17:29 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Mikael Magnusson, Zsh hackers list

On Sun, Oct 23, 2022 at 7:02 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Sun, Oct 23, 2022, 9:45 AM Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:
>>
>> A few more tests to show that Private Use Area characters work fine in
>> zsh, with the exception that you cannot put them in BUFFER:
>
>
> I don't have the code handy, but I suspect this is due to the implementation of wisprnt() [sic] returning false for those characters.

You are right, iswprint(0xE0B0) returns 0.

I'm compiling zsh with --enable-unicode9, so instead of iswprint() it
goes into u9_iswprint(). This function explicitly handles this case
and returns 0, just like iswprint(). So we get this:

    WCWIDTH(0xE0B0) => 1
    WC_ISPRINT(0xE0B0) => 0

I think u9_iswprint() should return 1 for Private Use Area characters.

Roman.

P.S.

Why isn't --enable-unicode9 a default?


* Unicode9 (was Re: UNICODE Private Use Area characters in BUFFER)
  2022-10-23 17:29       ` Roman Perepelitsa
@ 2022-10-23 18:30         ` Bart Schaefer
  2022-10-23 19:30           ` Roman Perepelitsa
  2022-10-23 21:57           ` Mikael Magnusson
  2022-10-23 18:54         ` UNICODE Private Use Area characters in BUFFER Bart Schaefer
  2022-11-04  9:55         ` Jun T
  2 siblings, 2 replies; 23+ messages in thread
From: Bart Schaefer @ 2022-10-23 18:30 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Mikael Magnusson, Zsh hackers list

On Sun, Oct 23, 2022 at 10:29 AM Roman Perepelitsa
<roman.perepelitsa@gmail.com> wrote:
>
> Why isn't --enable-unicode9 a default?

We don't have a real configure.ac test for it, just a block to add it
to config.h if explicitly enabled.


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-23 17:29       ` Roman Perepelitsa
  2022-10-23 18:30         ` Unicode9 (was Re: UNICODE Private Use Area characters in BUFFER) Bart Schaefer
@ 2022-10-23 18:54         ` Bart Schaefer
  2022-10-23 19:26           ` Roman Perepelitsa
  2022-11-04  9:55         ` Jun T
  2 siblings, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2022-10-23 18:54 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Mikael Magnusson, Zsh hackers list

On Sun, Oct 23, 2022 at 10:29 AM Roman Perepelitsa
<roman.perepelitsa@gmail.com> wrote:
>
> You are right, iswprint(0xE0B0) returns 0.

Interestingly, ${(V)...} et al. don't consult iswprint(), they just
call wcs_nicechar_sel() which only consults WCWIDTH().

Some characters like $'\u21A9' report and occupy a width of 1, but
part of the character overlaps the cell to the right when displayed.

> I think u9_iswprint() should return 1 for Private Use Area characters.

Suggest a patch to Src/wcwidth9.h ?  I'm not sure whether to update
the static wcwidth9() function, or the entries in the lookup tables.


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-23 18:54         ` UNICODE Private Use Area characters in BUFFER Bart Schaefer
@ 2022-10-23 19:26           ` Roman Perepelitsa
  0 siblings, 0 replies; 23+ messages in thread
From: Roman Perepelitsa @ 2022-10-23 19:26 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Mikael Magnusson, Zsh hackers list

On Sun, Oct 23, 2022 at 8:54 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Sun, Oct 23, 2022 at 10:29 AM Roman Perepelitsa
> <roman.perepelitsa@gmail.com> wrote:
> >
> > You are right, iswprint(0xE0B0) returns 0.
>
> Interestingly, ${(V)...} et al. don't consult iswprint(), they just
> call wcs_nicechar_sel() which only consults WCWIDTH().

As far as I can tell, WCWIDTH() always gives the right answer when
building with --enable-unicode9. It's only WC_ISPRINT() that sometimes
gives an incorrect (in my opinion) answer.

> Some characters like $'\u21A9' report and occupy a width of 1, but
> part of the character overlaps the cell to the right when displayed.

This is a property of your font and not of the character. Glyphs don't
have to be confined within their bearings. In monospace fonts they
usually are but even this rule is often broken. The font I'm using
(and the one I recommend to powerlevel10k users) has glyphs that
overlap left, right, top and bottom neighbours.

Roman.


* Re: Unicode9 (was Re: UNICODE Private Use Area characters in BUFFER)
  2022-10-23 18:30         ` Unicode9 (was Re: UNICODE Private Use Area characters in BUFFER) Bart Schaefer
@ 2022-10-23 19:30           ` Roman Perepelitsa
  2022-10-23 21:57           ` Mikael Magnusson
  1 sibling, 0 replies; 23+ messages in thread
From: Roman Perepelitsa @ 2022-10-23 19:30 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Mikael Magnusson, Zsh hackers list

On Sun, Oct 23, 2022 at 8:30 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Sun, Oct 23, 2022 at 10:29 AM Roman Perepelitsa
> <roman.perepelitsa@gmail.com> wrote:
> >
> > Why isn't --enable-unicode9 a default?
>
> We don't have a real configure.ac test for it, just a block to add it
> to config.h if explicitly enabled.

Would it be better to flip the default? Are there downsides to
enabling this option?

Roman.


* Re: Unicode9 (was Re: UNICODE Private Use Area characters in BUFFER)
  2022-10-23 18:30         ` Unicode9 (was Re: UNICODE Private Use Area characters in BUFFER) Bart Schaefer
  2022-10-23 19:30           ` Roman Perepelitsa
@ 2022-10-23 21:57           ` Mikael Magnusson
  1 sibling, 0 replies; 23+ messages in thread
From: Mikael Magnusson @ 2022-10-23 21:57 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Roman Perepelitsa, Zsh hackers list

On 10/23/22, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sun, Oct 23, 2022 at 10:29 AM Roman Perepelitsa
> <roman.perepelitsa@gmail.com> wrote:
>>
>> Why isn't --enable-unicode9 a default?
>
> We don't have a real configure.ac test for it, just a block to add it
> to config.h if explicitly enabled.

It looks like the test for broken wcwidth can enable unicode9 as well.

-- 
Mikael Magnusson


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-23 16:43   ` Roman Perepelitsa
  2022-10-23 17:02     ` Bart Schaefer
@ 2022-10-23 22:42     ` Mikael Magnusson
  2022-10-23 23:16       ` Roman Perepelitsa
  1 sibling, 1 reply; 23+ messages in thread
From: Mikael Magnusson @ 2022-10-23 22:42 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Zsh hackers list

On 10/23/22, Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:
> On Sun, Oct 23, 2022 at 6:29 PM Mikael Magnusson <mikachu@gmail.com> wrote:
>>
>> > Note: Private Use Area characters work fine everywhere else. For
>> > example,
>> > in PS1.
>>
>> I'm not sure we have any choice, we have to know how wide every
>> character we print is, and presumably there is no defined width for
>> them as the characters themselves are not defined.
>
> All terminals by default display characters from Private Use Area as
> narrow.

There is no reason to assume this to be the case though, since it is
explicitly unstandardized.

> Zsh also (correctly) treats them as narrow. For example, you
> can do this:
>
>     PS1=$'\uE0B0 '
>
> Whether your terminal can render this glyph or not, everything will
> work fine. The character will take one column and zsh will know that.

Whether the terminal uses 0, 1 or 2 spaces for the printed
character, it is okay that we print them assuming it uses 1 space,
since the user has access to the %{...%} and %G mechanisms to adjust
for it in prompts. This is obviously not possible in the interactive
buffer.

> A few more tests to show that Private Use Area characters work fine in
> zsh, with the exception that you cannot put them in BUFFER:
>
>     % x=$'\uE0B0'
>
>     % print -r -- ${(m)#x}
>     1
>
>     % print -r -- ${${(%):-$x%1(l.at least 1 column.)}[2,-1]}
>     at least 1 column
>
>     % print -r --  ${${(%):-$x%2(l..less than 2 columns)}[2,-1]}
>     less than 2 columns

I or anyone else can make a terminal that does something else with
these codepoints. (I'm just pointing this out).

-- 
Mikael Magnusson


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-23 22:42     ` Mikael Magnusson
@ 2022-10-23 23:16       ` Roman Perepelitsa
  2022-10-23 23:35         ` Bart Schaefer
  0 siblings, 1 reply; 23+ messages in thread
From: Roman Perepelitsa @ 2022-10-23 23:16 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: Zsh hackers list

On Mon, Oct 24, 2022 at 12:42 AM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> On 10/23/22, Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:
> >
> > All terminals by default display characters from Private Use Area as
> > narrow.
>
> There is no reason to assume this to be the case though, since it is
> explicitly unstandardized.

Sorry, I should've clarified that I'm not making an assumption. I'm
talking about the actual terminals that exist. The fact that all of
them render Private Use Area characters as narrow means that zsh is
doing the right thing by treating them as narrow.

> > A few more tests to show that Private Use Area characters work fine in
> > zsh, with the exception that you cannot put them in BUFFER:
> >
> >     % x=$'\uE0B0'
> >
> >     % print -r -- ${(m)#x}
> >     1
> >
> >     % print -r -- ${${(%):-$x%1(l.at least 1 column.)}[2,-1]}
> >     at least 1 column
> >
> >     % print -r --  ${${(%):-$x%2(l..less than 2 columns)}[2,-1]}
> >     less than 2 columns
>
> I or anyone else can make a terminal that does something else with
> these codepoints. (I'm just pointing this out).

This code shows how zsh treats characters from Private Use Area. The
output doesn't depend on or require a terminal. I wanted to show that
zsh handles characters from Private Use Area just fine. The only place
I know of where zsh cannot handle them is BUFFER.

Roman.


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-23 23:16       ` Roman Perepelitsa
@ 2022-10-23 23:35         ` Bart Schaefer
  2022-10-23 23:46           ` Bart Schaefer
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2022-10-23 23:35 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, Oct 23, 2022 at 4:22 PM Roman Perepelitsa
<roman.perepelitsa@gmail.com> wrote:
>
> This code shows how zsh treats characters from Private Use Area. The
> output doesn't depend on or require a terminal. I wanted to show that
> zsh handles characters from Private Use Area just fine. The only place
> I know of where zsh cannot handle them is BUFFER.

That's sort of the point?  BUFFER does depend on and require a
terminal.  Asserting that zsh "handles" those characters in other
contexts isn't indicative of anything beyond demonstrating that
terminal "handling" is a special case.


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-23 23:35         ` Bart Schaefer
@ 2022-10-23 23:46           ` Bart Schaefer
  2022-10-24  1:27             ` Mikael Magnusson
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2022-10-23 23:46 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, Oct 23, 2022 at 4:35 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> Asserting that zsh "handles" those characters in other
> contexts isn't indicative of anything beyond demonstrating that
> terminal "handling" is a special case.

Seems to me we've got the following options:

1.  Do nothing.
2.  Presume Roman is correct that these characters can always be
treated as printable and narrow.  (Still no answer as to how best to
change this?)
3.  Add an option UNICODE_PRINTABLE_NARROW that when set, asserts all
these characters to be printable and narrow.  Default ... on?
4.  Add special variable(s) (perhaps via module?) to allow remapping
the wcwidth9.h lookup tables to make individual characters printable
and set their width.


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-23 23:46           ` Bart Schaefer
@ 2022-10-24  1:27             ` Mikael Magnusson
  2022-10-24  1:43               ` Bart Schaefer
  0 siblings, 1 reply; 23+ messages in thread
From: Mikael Magnusson @ 2022-10-24  1:27 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

On 10/24/22, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sun, Oct 23, 2022 at 4:35 PM Bart Schaefer <schaefer@brasslantern.com>
> wrote:
>>
>> Asserting that zsh "handles" those characters in other
>> contexts isn't indicative of anything beyond demonstrating that
>> terminal "handling" is a special case.
>
> Seems to me we've got the following options:
>
> 1.  Do nothing.
> 2.  Presume Roman is correct that these characters can always be
> treated as printable and narrow.  (Still no answer as to how best to
> change this?)
> 3.  Add an option UNICODE_PRINTABLE_NARROW that when set, asserts all
> these characters to be printable and narrow.  Default ... on?
> 4.  Add special variable(s) (perhaps via module?) to allow remapping
> the wcwidth9.h lookup tables to make individual characters printable
> and set their width.

I think that if we do anything with wcwidth9.h, it should be to remove
it. Since it was added there have been 6 subsequent Unicode standards,
the latest one adding over 4000 ideographs alone[1] (I don't know what
width the version 9 wcwidth gives for this range). It is probably
returning wrong values for many thousands more characters on
systems where the libc has newer tables than Unicode 9. I suppose it
could be useful to enable when remoting into old systems from a modern
one.

We should probably at least mark it as deprecated. glibc 2.26 added
support for Unicode 9 and was released in August 2017, and the Unicode
9 wcwidth9.h was added to zsh in November 2016, so there was only a
rather small window where it mattered. What happened in Unicode 9 was
that the presentation width for all emoji was changed to 2[2]. I'm not
sure how this motivated people to add custom tables to every program
they used instead of simply updating glibc and having every program be
correct at once...

[1] https://home.unicode.org/announcing-the-unicode-standard-version-15-0/
[2] I couldn't find a more official reference than this atm,
https://github.com/irssi/irssi/issues/720

-- 
Mikael Magnusson


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-24  1:27             ` Mikael Magnusson
@ 2022-10-24  1:43               ` Bart Schaefer
  2022-10-24 10:50                 ` Roman Perepelitsa
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Schaefer @ 2022-10-24  1:43 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: Zsh hackers list

On Sun, Oct 23, 2022 at 6:27 PM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> I think if we should do anything with wcwidth9.h, it's remove it.

I must be missing something.  If glibc already supports unicode9, then
isn't the right thing just for Roman to NOT --enable-unicode9?  Or is
there something else about #define ENABLE_UNICODE9 that's magic?

If glibc supports unicode9 but iswprint() still returns false for the
Private Use Area, don't we still need to do something (or decide to do
nothing) to address Roman's issue?


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-24  1:43               ` Bart Schaefer
@ 2022-10-24 10:50                 ` Roman Perepelitsa
  2022-11-04 10:31                   ` Jun T
  0 siblings, 1 reply; 23+ messages in thread
From: Roman Perepelitsa @ 2022-10-24 10:50 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Mikael Magnusson, Zsh hackers list

On Mon, Oct 24, 2022 at 3:44 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Sun, Oct 23, 2022 at 6:27 PM Mikael Magnusson <mikachu@gmail.com> wrote:
> >
> > I think if we should do anything with wcwidth9.h, it's remove it.
>
> I must be missing something.  If glibc already supports unicode9, then
> isn't the right thing just for Roman to NOT --enable-unicode9?

The behavior w.r.t. Private Use Area characters in BUFFER is the same
whether or not --enable-unicode9 is used. In other words, iswprint()
returns false.

On Mon, Oct 24, 2022 at 1:36 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Sun, Oct 23, 2022 at 4:22 PM Roman Perepelitsa
> <roman.perepelitsa@gmail.com> wrote:
> >
> > This code shows how zsh treats characters from Private Use Area. The
> > output doesn't depend on or require a terminal. I wanted to show that
> > zsh handles characters from Private Use Area just fine. The only place
> > I know of where zsh cannot handle them is BUFFER.
>
> That's sort of the point?  BUFFER does depend on and require a
> terminal.  Asserting that zsh "handles" those characters in other
> contexts isn't indicative of anything beyond demonstrating that
> terminal "handling" is a special case.

The fact that Private Use Area characters work in PS1 is informative
here, as is the expansion of ${(m)#foo}. In both contexts zsh assumes
that Private Use Area characters occupy one column.

I went ahead and checked how other shells behave if you paste U+E0B0
into the command line. They all output the character as-is. Shells
that need to keep track of the cursor position all assume that U+E0B0
is narrow. The shells I tried are: bash, csh, mksh, ash and dash.

There is one thing I didn't mention that is relevant. There is at
least one terminal (iTerm2) that has an option to treat Private Use
Area characters as wide. This option is off by default.

Roman.


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-23 17:29       ` Roman Perepelitsa
  2022-10-23 18:30         ` Unicode9 (was Re: UNICODE Private Use Area characters in BUFFER) Bart Schaefer
  2022-10-23 18:54         ` UNICODE Private Use Area characters in BUFFER Bart Schaefer
@ 2022-11-04  9:55         ` Jun T
  2 siblings, 0 replies; 23+ messages in thread
From: Jun T @ 2022-11-04  9:55 UTC (permalink / raw)
  To: zsh-workers


> 2022/10/24 2:29, Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:
> 
> You are right, iswprint(0xE0B0) returns 0.
> 
> I'm compiling zsh with --enable-unicode9, so instead of iswprint() it
> goes into u9_iswprint(). This function explicitly handles this case
> and returns 0, just like iswprint(). So we get this:
> 
>    WCWIDTH(0xE0B0) => 1
>    WC_ISPRINT(0xE0B0) => 0

I think iswprint(0xe0b0) (or WC_ISPRINT()) returns 1 (in a UTF-8 locale).
The reason that it doesn't work in Zle seems to be in Zle/zle_refresh.c:

    #ifdef MULTIBYTE_SUPPORT
            else if (
    #ifdef __STDC_ISO_10646__
                     !ZSH_INVALID_WCHAR_TEST(*t) &&
    #endif
                     WC_ISPRINT(*t) && (width = WCWIDTH(*t)) > 0) {

__STDC_ISO_10646__ is defined in (probably all) Linux (but not in macOS),
and ZSH_INVALID_WCHAR_TEST() is defined in Zle/zle.h:

    /* The start of the private range we use, for 256 characters */
    #define ZSH_INVALID_WCHAR_BASE  (0xe000U)
    /* Detect a wide character within our range */
    #define ZSH_INVALID_WCHAR_TEST(x)                       \
        ((unsigned)(x) >= ZSH_INVALID_WCHAR_BASE &&         \
         (unsigned)(x) <= (ZSH_INVALID_WCHAR_BASE + 255u))

ZSH_INVALID_WCHAR_TEST() returns true for a wide character wc in the
range 0xe000 <= wc <= 0xe0ff. It seems zsh assumes that this range
is not used by users and uses it for representing "invalid" (or incomplete)
characters (see line 452 in Zle/zle_utils.c).

If characters in this range need to be output as is, then we need an
option or something similar to disable this feature.

On macOS __STDC_ISO_10646__ is not defined (I think this is a bug in
macOS), and the character U+e0b0 is output as is. But on standard
macOS there is no font that has a glyph for this character, and
it is rendered as "a square with ? inside" (double width).
If you install a font that has a glyph for this character, and if the
glyph is single width, then I guess it will work OK in Zle.


* Re: UNICODE Private Use Area characters in BUFFER
  2022-10-24 10:50                 ` Roman Perepelitsa
@ 2022-11-04 10:31                   ` Jun T
  2022-11-04 10:33                     ` Roman Perepelitsa
  0 siblings, 1 reply; 23+ messages in thread
From: Jun T @ 2022-11-04 10:31 UTC (permalink / raw)
  To: zsh-workers


> 2022/10/24 19:50, Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:
> 
> There is one thing I didn't mention that is relevant. There is at
> least one terminal (iTerm2) that has an option to treat Private Use
> Area characters as wide. This option is off by default.

I can't find this option. Do you mean the option "Ambiguous characters
are double-width"? If so, many other terminals (gnome-terminal, Apple's
Terminal.app, etc.) have similar options.


* Re: UNICODE Private Use Area characters in BUFFER
  2022-11-04 10:31                   ` Jun T
@ 2022-11-04 10:33                     ` Roman Perepelitsa
  2022-11-04 11:06                       ` Jun T
  0 siblings, 1 reply; 23+ messages in thread
From: Roman Perepelitsa @ 2022-11-04 10:33 UTC (permalink / raw)
  To: Jun T; +Cc: zsh-workers

On Fri, Nov 4, 2022 at 11:32 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
>
>
> > 2022/10/24 19:50, Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:
> >
> > There is one thing I didn't mention that is relevant. There is at
> > least one terminal (iTerm2) that has an option to treat Private Use
> > Area characters as wide. This option is off by default.
>
> I can't find this option. Do you mean the option "Ambiguous characters
> are double-width"?

That's the one.

Roman.


* Re: UNICODE Private Use Area characters in BUFFER
  2022-11-04 10:33                     ` Roman Perepelitsa
@ 2022-11-04 11:06                       ` Jun T
  2022-11-04 11:09                         ` Roman Perepelitsa
  0 siblings, 1 reply; 23+ messages in thread
From: Jun T @ 2022-11-04 11:06 UTC (permalink / raw)
  To: zsh-workers


> 2022/11/04 19:33, Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:
> 
> On Fri, Nov 4, 2022 at 11:32 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
>> 
>> 
>> I can't find this option. Do you mean the option "Ambiguous characters
>> are double-width"?
> 
> That's the one.

Then many other terminals (gnome-terminal, Apple's Terminal.app, etc.)
have similar options. The option is mainly intended to control the width
of "East Asian Ambiguous Width" characters, such as $'\u25ef' (there are
many of them).



* Re: UNICODE Private Use Area characters in BUFFER
  2022-11-04 11:06                       ` Jun T
@ 2022-11-04 11:09                         ` Roman Perepelitsa
  2022-11-04 15:32                           ` Jun T
  0 siblings, 1 reply; 23+ messages in thread
From: Roman Perepelitsa @ 2022-11-04 11:09 UTC (permalink / raw)
  To: Jun T; +Cc: zsh-workers

On Fri, Nov 4, 2022 at 12:07 PM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
>
>
> > 2022/11/04 19:33, Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:
> >
> > On Fri, Nov 4, 2022 at 11:32 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
> >>
> >>
> >> I can't find this option. Do you mean the option "Ambiguous characters
> >> are double-width"?
> >
> > That's the one.
>
> Then many other terminals (gnome-terminal, Apple's Terminal.app, etc.)
> have similar options. The option is mainly intended to control the width
> of "East Asian Ambiguous Width" characters, such as $'\u25ef' (there are
> many of them).

Perhaps zsh should have a similar option? Currently, if the terminal
option is switched on, prompt will break if it includes private use
area chars.

Roman.


* Re: UNICODE Private Use Area characters in BUFFER
  2022-11-04 11:09                         ` Roman Perepelitsa
@ 2022-11-04 15:32                           ` Jun T
  0 siblings, 0 replies; 23+ messages in thread
From: Jun T @ 2022-11-04 15:32 UTC (permalink / raw)
  To: zsh-workers

Sorry, this is not related to Roman's original problem
(so it can be ignored if you do not use East Asian Ambiguous Width chars).

> 2022/11/04 20:09, Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:
> 
> Perhaps zsh should have a similar option? Currently, if the terminal
> option is switched on, prompt will break if it includes private use
> area chars.

It is not a problem in zsh. iTerm2 (and other terminals) use the option
to control the width of both "East Asian Ambiguous Width" and "Private
Use Area" characters (and maybe other characters with ambiguous width).
If your Private Use Area characters are all single width, and if you do
not use East Asian ambiguous characters, then you are fine just by
unsetting the option (aside from the problem that Zle can't use them).

If you want to use the East Asian ambiguous characters (with CJK fonts)
then you need to turn the option on to display them correctly (double
width), but zsh (and probably bash/readline) assumes them to be single
width (since wcwidth() returns 1) and can't edit them correctly.

This is a well known problem among Japanese (and Chinese/Korean?)
users. But the problem is in wcwidth (or locale definition), and
I think we need not fix it in zsh.

