Gotcha, thanks for the context! Combining emojis are weird :)

Hmm, agreed that it won't be possible to use the same standard across all terminals - hence, I was thinking a terminfo capability could let the terminal indicate whether it treats characters followed by these variation selectors as wide?

Yep, I was referencing TR51 from Unicode as well (emoji presentation selectors).

Here are examples of display errors/differences across terminals for U+2601 U+FE0F (images hosted on GitHub to avoid embeds here):
- Kitty - the prior example with bracketed paste. Kitty renders this as 2 cells wide and width is computed as 2 cells wide.
- Default Mac terminal - rendered as 2 cells wide, but width is computed as 1 cell wide. Results in the next character overlapping the emoji.
- iTerm 2 - same as default Mac terminal (next char overlaps).
- Alacritty - renders as 1 cell wide and width is also computed as 1 cell wide. Essentially ignores the emoji variation selector.
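
To summarise the list above in one place, here is a rough Python sketch. The numbers are just the observations from the list, not new measurements. The helper name `diagnose` and the shell-side width of 1 (what you get from wcwidth of the base character plus 0 for the selector) are illustrative assumptions on my part.

```python
# (rendered_cells, cursor_advance_cells) for U+2601 U+FE0F, as described in
# the list above. These are observations from this thread, not measurements
# taken by this script.
OBSERVED = {
    "Kitty": (2, 2),
    "Default Mac terminal": (2, 1),
    "iTerm 2": (2, 1),
    "Alacritty": (1, 1),
}

# Assumption: the line editor counts the sequence as 1 cell
# (width 1 for the base character, 0 for the variation selector).
SHELL_WIDTH = 1


def diagnose(rendered, advance, shell_width=SHELL_WIDTH):
    """Hypothetical helper: classify what goes wrong for one terminal."""
    return {
        # Non-zero means the shell and terminal disagree on the cursor column.
        "cursor_drift": advance - shell_width,
        # True means the glyph is drawn wider than the cursor advances,
        # so the next character is painted over it.
        "glyph_overlap": rendered > advance,
    }


for name, (rendered, advance) in OBSERVED.items():
    print(name, diagnose(rendered, advance))
```

Under these assumptions only Kitty shows a cursor drift against the shell, which lines up with the bracketed paste repro, while the two Mac terminals show the overlap instead.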

In fish's case, I believe they use ridiculousfish/widecharwidth, which does seem to handle emoji presentation selectors. The unicode-width crate for Rust recently added support for correctly reporting the width of these sequences as well: unicode-width/pull/41. I believe the width of a sequence like U+2601 U+FE0F should be 2 (assuming the terminal supports it)?

From looking a bit into wcwidth, it takes a single wide character, so it doesn't inherently support computing the width of a sequence of code points. I just tried this out in C++ with ICU (International Components for Unicode library) and grapheme clusters to demonstrate the width calculation as 2 with this sequence: gist.github.com/Advait-M/a326cd2e474b9520dc893765ec4cb2c4.
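
For anyone who wants to see the difference without building against ICU, here is a minimal Python sketch of the same idea. It is not the code from the gist and not what zsh, fish or unicode-width actually do. The function names are made up for illustration, the per-code-point width is a rough stand-in for wcwidth, and it assumes any narrow base followed by U+FE0F becomes 2 cells wide (a real implementation would check the base against Unicode's list of valid emoji variation sequences).

```python
import unicodedata

VS16 = "\ufe0f"  # emoji presentation selector


def codepoint_width(ch):
    """Rough stand-in for wcwidth(): width of ONE code point.

    Simplification: control characters are not special-cased here.
    """
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks, variation selectors, format characters
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide / fullwidth
    return 1


def naive_width(text):
    """Sum of per-code-point widths, with no knowledge of sequences."""
    return sum(codepoint_width(ch) for ch in text)


def sequence_width(text):
    """Like naive_width, but U+FE0F widens a preceding narrow base to 2.

    Assumption: every narrow base + U+FE0F is treated as an emoji sequence.
    """
    total = 0
    prev_base = 0  # width of the most recent non-zero-width character
    for ch in text:
        if ch == VS16 and prev_base == 1:
            total += 1      # promote the previous base from 1 cell to 2
            prev_base = 2
            continue
        width = codepoint_width(ch)
        total += width
        if width > 0:
            prev_base = width
    return total


cloud = "\u2601\ufe0f"
print(naive_width(cloud), sequence_width(cloud))
```

The per-code-point sum cannot see that the selector changes how the base is presented, which is the gap between the two results.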

Best,
Advait

On Fri, May 10, 2024 at 5:54 AM Mikael Magnusson <mikachu@gmail.com> wrote:
On Fri, May 10, 2024 at 11:37 AM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> On Thu, May 9, 2024 at 4:46 PM Advait Maybhate <advait@warp.dev> wrote:
> >
> > Hey folks!
> >
> >
> > Wanted to file a bug report/get a discussion going on the best way to handle emoji variation selectors with Unicode characters.
> >
> >
> > Metadata:
> >
> > Zsh version: zsh 5.9 (x86_64-apple-darwin23.0), OS version: macOS Sonoma 14.3.1
> >
> > Terminal: tested across Warp, Kitty, default Mac terminal, Alacritty, iTerm 2
> >
> >
> > ZLE incorrectly treats characters with the emoji variation selector as 1 cell wide instead of 2 cells wide, causing off-by-one cursor movement issues in terminals that (correctly) treat them as 2 cells wide.
> >
> >
> > This is most easily reproduced in Kitty (v0.34), which renders and calculates these emojis as 2 cells (most terminal emulators seem to incorrectly handle this case of Unicode).
> >
> >
> > To repro:
> >
> > Paste in the command “echo ☁️” into Kitty (the last character is U+2601 followed by U+FE0F). Note that this triggers bracketed paste mode in Zsh.
> >
> >
> > Expected behavior:
> >
> > ZLE contains “echo ☁️”.
> >
> >
> > Actual behavior:
> >
> > ZLE contains “eecho ☁️” (note the additional “e” at the beginning here - inverted colors from the bracketed paste). Confirmed that this is due to an off-by-one on the cursor instruction, from the PTY recording.
> >
> >
> > Screenshot: link
> >
> >
> > I’d love to discuss how to fix this for terminals that do respect variation selectors. One way to do this could be via a new `terminfo` entry, but I’d love to know what ZSH devs think! I’m an engineer building the Warp terminal, so I’d be happy to work on any terminal-side changes of this with `terminfo` (we actually use bracketed paste mode for all commands, to best support multiline commands with Warp's input editor)!
> >
> >
> > Notably, Fish 3.6 seems to calculate the width correctly as 2 cells (this is what originally prompted my investigation, due to the Starship prompt - see fish-shell/issues/10461), along with Bash (using bracketed paste with Bash 5.2).
> >
> >
> > I’ve seen 2017/msg00432 which is related to this, but deals with 0xFE0E not 0xFE0F.
>
> Generally speaking it is impossible to handle combining emoji, since
> the specification allows the rendering to either combine or not
> combine the glyphs, it is not possible for zsh to know how much space
> they will take up. Of course, your problem isn't even about combining
> emoji, but as far as I can see the same conceptual problem applies
> here; there is no way for zsh to know what "render as an image"
> implies for glyph width, all we can do is call wcwidth.

I also meant to say, if wcwidth for the base glyph is 1, then adding a
combining character with a width of 0 after it will not magically
change the width of the base glyph, and cannot do so.
https://www.unicode.org/reports/tr51/ does mention that "Current
practice is for emoji to have a square aspect ratio, deriving from
their origin in Japanese. For interoperability, it is recommended that
this practice be continued with current and future emoji. They will
typically have about the same vertical placement and advance width as
CJK ideographs." However, zsh cannot keep its own custom tables of
emoji widths; either wcwidth works correctly or it doesn't.

--
Mikael Magnusson