Gotcha, thanks for the context! Combining emojis are weird :)

Hmm, agreed that it won't be possible to use the same standard across all terminals. Hence, I was thinking terminfo could let the terminal indicate whether it supports these variation selectors with wide characters?

Yep, I was referencing TR51 from Unicode as well (emoji presentation selectors).

For examples of display errors/differences across terminals for \0x2601\0xFE0F (images hosted on GitHub to avoid embeds here):

- Kitty - the prior example with bracketed paste. Kitty renders this as 2 cells wide, and the width is computed as 2 cells wide.
- Default Mac terminal - rendered as 2 cells wide, but the width is computed as 1 cell wide. This results in the next character overlapping the emoji.
- iTerm 2 - same as the default Mac terminal (next char overlaps).
- Alacritty - renders as 1 cell wide, and the width is also computed as 1 cell wide. Essentially ignores the emoji variation selector.

In fish's case, I believe they use ridiculousfish/widecharwidth, which does seem to handle emoji presentation selectors. unicode-width, a widely used Rust crate, recently added support for correctly reporting the width of these sequences as well: unicode-width/pull/41.

I believe the width for something like \0x2601\0xFE0F should be 2 (assuming the terminal supports it)? From looking a bit into wcwidth, it seems like it doesn't inherently support width for a sequence of code points, since it takes a single wide character. I just tried this out in C++ with ICU (International Components for Unicode library) and grapheme clusters to demonstrate the width calculation as 2 with this sequence: gist.github.com/Advait-M/a326cd2e474b9520dc893765ec4cb2c4.

Best,
Advait

On Fri, May 10, 2024 at 5:54 AM Mikael Magnusson wrote:
> On Fri, May 10, 2024 at 11:37 AM Mikael Magnusson wrote:
> >
> > On Thu, May 9, 2024 at 4:46 PM Advait Maybhate wrote:
> > >
> > > Hey folks!
> > >
> > > Wanted to file a bug report/get a discussion going on the best way to handle emoji variation selectors with Unicode characters.
> > >
> > > Metadata:
> > >
> > > Zsh version: zsh 5.9 (x86_64-apple-darwin23.0), OS version: macOS Sonoma 14.3.1
> > >
> > > Terminal: tested across Warp, Kitty, default Mac terminal, Alacritty, iTerm 2
> > >
> > > ZLE incorrectly treats characters with the emoji variation selector as 1 character instead of 2 characters, causing off-by-one cursor movement issues in terminals that (correctly) treat it as 2 characters.
> > >
> > > This is most easily reproduced in Kitty (v0.34), which renders and calculates these emojis as 2 cells (most terminal emulators seem to incorrectly handle this case of Unicode).
> > >
> > > To repro:
> > >
> > > Paste in the command “echo ☁️” into Kitty (the last character is \0x2601 followed by \0xFE0F). Note that this results in bracketed paste mode in Zsh.
> > >
> > > Expected behavior:
> > >
> > > ZLE contains “echo ☁️”.
> > >
> > > Actual behavior:
> > >
> > > ZLE contains “eecho ☁️” (note the additional “e” at the beginning here - inverted colors from the bracketed paste). Confirmed that this is due to an off-by-one on the cursor instruction, from the PTY recording.
> > >
> > > Screenshot: link
> > >
> > > I’d love to discuss how to fix this for terminals that do respect variation selectors. One way to do this could be via a new `terminfo` entry, but I’d love to know what ZSH devs think! I’m an engineer building the Warp terminal, so I’d be happy to work on any terminal-side changes of this with `terminfo` (we actually use bracketed paste mode for all commands, to best support multiline commands with Warp's input editor)!
> > >
> > > Notably, Fish 3.6 seems to calculate the width correctly as 2 cells (this is what originally prompted my investigation, due to the Starship prompt - see fish-shell/issues/10461), along with Bash (using bracketed paste with Bash 5.2).
> > >
> > > I’ve seen 2017/msg00432 which is related to this, but deals with 0xFE0E not 0xFE0F.
> >
> > Generally speaking it is impossible to handle combining emoji, since
> > the specification allows the rendering to either combine or not
> > combine the glyphs, it is not possible for zsh to know how much space
> > they will take up. Of course, your problem isn't even about combining
> > emoji, but as far as I can see the same conceptual problem applies
> > here; there is no way for zsh to know what "render as an image"
> > implies for glyph width, all we can do is call wcwidth.
>
> I also meant to say, if wcwidth for the base glyph is 1, then adding a
> composing character after with a width of 0, it will not magically
> change the width of the base glyph and cannot do so.
> https://www.unicode.org/reports/tr51/ does mention that "Current
> practice is for emoji to have a square aspect ratio, deriving from
> their origin in Japanese. For interoperability, it is recommended that
> this practice be continued with current and future emoji. They will
> typically have about the same vertical placement and advance width as
> CJK ideographs." but zsh cannot have some custom tables of emoji
> widths, either wcwidth works correctly or it doesn't.
>
> --
> Mikael Magnusson
>
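
To make the two width computations discussed in this thread concrete, here is a minimal Python sketch. It is not zsh's, fish's, or any terminal's actual code. The helper names (codepoint_width, naive_width, sequence_width) are hypothetical, and codepoint_width only approximates wcwidth() using the stdlib unicodedata tables. It also assumes that any 1-cell base followed by \0xFE0F should be widened to 2 cells; a real implementation would presumably restrict this to the bases listed in Unicode's emoji-variation-sequences.txt.

```python
import unicodedata

VS16 = "\uFE0F"  # emoji presentation selector (VARIATION SELECTOR-16)


def codepoint_width(ch: str) -> int:
    """Rough stand-in for wcwidth() on one code point (stdlib only)."""
    # Nonspacing marks, enclosing marks and format characters take no cell.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def naive_width(text: str) -> int:
    """Sum per-code-point widths, like calling wcwidth() on each character."""
    return sum(codepoint_width(ch) for ch in text)


def sequence_width(text: str) -> int:
    """Same sum, but a 1-cell base followed by VS16 counts as 2 cells."""
    total = 0
    prev = 0  # width of the most recent non-zero-width character
    for ch in text:
        if ch == VS16 and prev == 1:
            # Assumption: promote the preceding base to emoji (wide) width.
            total += 1
            prev = 2
            continue
        w = codepoint_width(ch)
        total += w
        if w > 0:
            prev = w
    return total


cloud = "\u2601\uFE0F"
print(naive_width(cloud), sequence_width(cloud))
```

For the cloud sequence, the per-code-point sum comes out as 1 (base 1 plus selector 0), which matches the point that a zero-width follower cannot change the base width under wcwidth, while the sequence-aware version reports 2, matching what Kitty renders.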