zsh-workers
From: Advait Maybhate <advait@warp.dev>
To: Mikael Magnusson <mikachu@gmail.com>
Cc: zsh-workers@zsh.org, Aloke Desai <aloke@warp.dev>,
	Zach Bai <zachbai@warp.dev>
Subject: Re: [BUG] ZLE character width with emoji presentation variation selectors in Unicode
Date: Fri, 10 May 2024 13:11:56 -0400
Message-ID: <CAN+tYMcLu5kCRM1cN1tV4O65ALBkFQd44WtrQH7Ly6QsKgTxAg@mail.gmail.com>
In-Reply-To: <CAHYJk3S0kbgWiTv4sX5nJEdBWGSp6U=MyoknDqCH9X4GxgT5LQ@mail.gmail.com>

Gotcha, thanks for the context! Combining emojis are weird :)

Hmm, agreed that it won't be possible to use the same standard across all
terminals - hence, I was thinking terminfo would allow the terminal to
indicate whether it supports these variation selectors with wide characters?

Yep, I was referencing TR51 from Unicode as well (emoji presentation
selectors:
<https://www.unicode.org/reports/tr51/#def_emoji_presentation_selector>).

For examples of display errors/differences between terminals
for \0x2601\0xFE0F (images hosted on GitHub to avoid embeds here):
- Kitty
  <https://github.com/warpdotdev/Warp/assets/12927474/b8ae2aae-7be4-4a9b-a471-423d098b5c8a>:
  the prior example with bracketed paste. Kitty renders this as 2 cells
  wide and computes its width as 2 cells.
- Default Mac terminal
  <https://github.com/warpdotdev/Warp/assets/12927474/a4af9db8-7741-4607-aab4-7dc170e9baa2>:
  rendered as 2 cells wide, but the width is computed as 1 cell. As a
  result, the next character overlaps the emoji.
- iTerm 2
  <https://github.com/warpdotdev/Warp/assets/12927474/ea082464-0856-4e46-89aa-59d4014949f2>:
  same as the default Mac terminal (the next character overlaps).
- Alacritty
  <https://github.com/warpdotdev/Warp/assets/12927474/e134c39a-99d6-4f6b-84c5-86c8ae1edf51>:
  rendered as 1 cell wide, and the width is also computed as 1 cell.
  Essentially ignores the emoji variation selector.
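
One way to observe the "computed" width a terminal uses, independent of any
screenshot, is to print the sequence and ask the terminal where the cursor
ended up via the standard cursor position report (ESC [ 6 n). The following
is only a rough sketch under a few assumptions: a Unix system with
/dev/tty available, a terminal that answers the query, and helper names
(parse_cursor_report, rendered_width, probe) that are made up for this
illustration. It is not how any of the terminals or shells above measure
width internally.

```python
import os
import re

# A cursor position report looks like ESC [ <row> ; <col> R
CPR = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cursor_report(reply: bytes) -> tuple[int, int]:
    """Extract (row, col) from a cursor position report."""
    match = CPR.search(reply)
    if match is None:
        raise ValueError("no cursor position report in reply")
    return int(match.group(1)), int(match.group(2))


def rendered_width(before: bytes, after: bytes) -> int:
    """Columns the cursor advanced between two reports on the same row."""
    return parse_cursor_report(after)[1] - parse_cursor_report(before)[1]


def query_cursor(fd: int) -> bytes:
    """Send the position query and read the reply up to the final 'R'."""
    os.write(fd, b"\x1b[6n")
    reply = b""
    while not reply.endswith(b"R"):
        chunk = os.read(fd, 1)
        if not chunk:
            raise EOFError("terminal closed before replying")
        reply += chunk
    return reply


def probe(text: str) -> int:
    """Print text at column 1 and return how far the cursor moved.

    Assumes a Unix terminal reachable through /dev/tty.
    """
    import termios  # Unix-only, imported here so the parsers work anywhere
    import tty

    fd = os.open("/dev/tty", os.O_RDWR)
    saved = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)                 # raw mode so the reply is not echoed
        os.write(fd, b"\r")            # start from the first column
        before = query_cursor(fd)
        os.write(fd, text.encode("utf-8"))
        after = query_cursor(fd)
        os.write(fd, b"\r\x1b[K")      # erase the probe text again
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
        os.close(fd)
    return rendered_width(before, after)
```

Calling probe("\u2601\ufe0f") in an interactive session should then return
whatever cell count that terminal advances the cursor by for the sequence,
which is the number a line editor has to agree with.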

In fish's case, I believe they use ridiculousfish/widecharwidth
<https://github.com/ridiculousfish/widecharwidth>, which does seem to handle
emoji presentation selectors. unicode-width, a Rust crate maintained by
unicode-rs (it is not part of the Rust standard library), recently added
support for correctly reporting the width of these sequences as well:
unicode-width/pull/41
<https://github.com/unicode-rs/unicode-width/pull/41>. I believe the
computed width for something like \0x2601\0xFE0F should be 2 (assuming the
terminal supports it)?

From looking a bit into wcwidth, it takes a single wide character, so it
doesn't inherently support computing the width of a sequence of code
points. I just tried this out in C++ with ICU (International Components for
Unicode library) and grapheme clusters to demonstrate the width calculation
as 2 with this sequence:
gist.github.com/Advait-M/a326cd2e474b9520dc893765ec4cb2c4.
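
To make the difference concrete without ICU, here is a small Python sketch
(not the code from the gist) that contrasts summing per-code-point widths
with a sequence-aware rule. The function names are made up for this
example, and the rule is a deliberate simplification: it widens any narrow
base character that is followed by U+FE0F, whereas a real implementation
would only do so for bases listed in Unicode's emoji variation sequence
data. Control characters are also not treated specially here.

```python
import unicodedata

VS16 = "\uFE0F"  # emoji presentation selector (TR51)


def codepoint_width(ch: str) -> int:
    """Rough per-code-point width, similar in spirit to wcwidth()."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks, variation selectors, format characters
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters
    return 1


def naive_width(text: str) -> int:
    """Sum of independent per-code-point widths."""
    return sum(codepoint_width(ch) for ch in text)


def sequence_width(text: str) -> int:
    """Like naive_width, but U+FE0F promotes a narrow base to 2 cells.

    Simplifying assumption: any 1-cell base followed by U+FE0F is widened.
    """
    total = 0
    base = 0  # width of the most recent non-zero-width character
    for ch in text:
        if ch == VS16 and base == 1:
            total += 1  # the base now occupies 2 cells instead of 1
            base = 2
            continue
        width = codepoint_width(ch)
        total += width
        if width:
            base = width
    return total
```

With this sketch, naive_width("\u2601\ufe0f") gives 1 (base width 1 plus a
zero-width selector), while sequence_width("\u2601\ufe0f") gives 2, which
matches what Kitty renders.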

Best,
Advait

On Fri, May 10, 2024 at 5:54 AM Mikael Magnusson <mikachu@gmail.com> wrote:

> On Fri, May 10, 2024 at 11:37 AM Mikael Magnusson <mikachu@gmail.com>
> wrote:
> >
> > On Thu, May 9, 2024 at 4:46 PM Advait Maybhate <advait@warp.dev> wrote:
> > >
> > > Hey folks!
> > >
> > > Wanted to file a bug report/get a discussion going on the best way
> > > to handle emoji variation selectors with Unicode characters.
> > >
> > > Metadata:
> > >
> > > Zsh version: zsh 5.9 (x86_64-apple-darwin23.0), OS version: macOS
> > > Sonoma 14.3.1
> > >
> > > Terminal: tested across Warp, Kitty, default Mac terminal,
> > > Alacritty, iTerm 2
> > >
> > > ZLE incorrectly treats characters with the emoji variation selector
> > > as 1 character instead of 2 characters, causing off-by-one cursor
> > > movement issues in terminals that (correctly) treat it as 2
> > > characters.
> > >
> > > This is most easily reproduced in Kitty (v0.34), which renders and
> > > calculates these emojis as 2 cells (most terminal emulators seem to
> > > incorrectly handle this case of Unicode).
> > >
> > > To repro:
> > >
> > > Paste in the command “echo ☁️” into Kitty (the last character is
> > > \0x2601 followed by \0xFE0F). Note that this results in bracketed
> > > paste mode in Zsh.
> > >
> > > Expected behavior:
> > >
> > > ZLE contains “echo ☁️”.
> > >
> > > Actual behavior:
> > >
> > > ZLE contains “eecho ☁️” (note the additional “e” at the beginning
> > > here - inverted colors from the bracketed paste). Confirmed that
> > > this is due to an off-by-one on the cursor instruction, from the
> > > PTY recording.
> > >
> > > Screenshot: link
> > >
> > > I’d love to discuss how to fix this for terminals that do respect
> > > variation selectors. One way to do this could be via a new
> > > `terminfo` entry, but I’d love to know what ZSH devs think! I’m an
> > > engineer building the Warp terminal, so I’d be happy to work on any
> > > terminal-side changes of this with `terminfo` (we actually use
> > > bracketed paste mode for all commands, to best support multiline
> > > commands with Warp's input editor)!
> > >
> > > Notably, Fish 3.6 seems to calculate the width correctly as 2 cells
> > > (this is what originally prompted my investigation, due to the
> > > Starship prompt - see fish-shell/issues/10461), along with Bash
> > > (using bracketed paste with Bash 5.2).
> > >
> > > I’ve seen 2017/msg00432 which is related to this, but deals with
> > > 0xFE0E not 0xFE0F.
> >
> > Generally speaking it is impossible to handle combining emoji, since
> > the specification allows the rendering to either combine or not
> > combine the glyphs, it is not possible for zsh to know how much space
> > they will take up. Of course, your problem isn't even about combining
> > emoji, but as far as I can see the same conceptual problem applies
> > here; there is no way for zsh to know what "render as an image"
> > implies for glyph width, all we can do is call wcwidth.
>
> I also meant to say, if wcwidth for the base glyph is 1, then adding a
> composing character after with a width of 0, it will not magically
> change the width of the base glyph and cannot do so.
> https://www.unicode.org/reports/tr51/ does mention that "Current
> practice is for emoji to have a square aspect ratio, deriving from
> their origin in Japanese. For interoperability, it is recommended that
> this practice be continued with current and future emoji. They will
> typically have about the same vertical placement and advance width as
> CJK ideographs." but zsh cannot have some custom tables of emoji
> widths, either wcwidth works correctly or it doesn't.
>
> --
> Mikael Magnusson
>


