From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
From: Advait Maybhate
Date: Mon, 13 May 2024 20:08:33 -0400
Subject: Re: [BUG] ZLE character width with emoji presentation variation selectors in Unicode
To: Mikael Magnusson
Cc: zsh-workers@zsh.org
Content-Type: multipart/alternative; boundary="0000000000001fe4da06185eceba"
X-Seq: 52934

--0000000000001fe4da06185eceba
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Agreed that there's no particular phrasing for this in the Unicode spec wrt
the exact width differences. I believe that'll largely be left up to the
renderer (in web, mobile, desktop, etc. contexts).

Given that, the best path forward might be to ask the terminal emulator for
this information, so that the shell and the terminal agree on character
widths?

Gotcha re composing characters - that makes sense, thanks for explaining!

But yep, I've got a fallback mechanism in mind for Zsh (render as 2 cells
wide but only reserve 1 cell, to match the shell, similar to iTerm). My goal
with opening this issue was to kick off a discussion on the "correct" way to
approach this in Zsh and how best to support it going forward, since the
experience I currently have in mind is suboptimal for Zsh (compared to
Bash/Fish) within Warp, for example, due to these limitations.

Best,
Advait

On Fri, May 10, 2024 at 2:57 PM Mikael Magnusson wrote:

> On Fri, May 10, 2024 at 7:12 PM Advait Maybhate wrote:
> >
> > Gotcha, thanks for the context! Combining emojis are weird :)
> >
> > Hmm, agreed that it won't be possible to use the same standard across
> > all terminals - hence, I was thinking terminfo would allow the terminal
> > to indicate whether it supports these variation selectors with wide
> > characters?
> >
> > Yep, I was referencing TR51 from Unicode as well (emoji presentation
> > selectors).
>
> From what I could tell (I'm not an expert), there is no phrasing that
> implies the width should be different for the emoji presentation form
> and the text presentation form.
>
> > From looking a bit into wcwidth, it seems like it doesn't inherently
> > support width for a sequence of code points. I just tried this out in
> > C++ with ICU (International Components for Unicode library) and grapheme
> > clusters to demonstrate the width calculation as 2 with this sequence:
> > gist.github.com/Advait-M/a326cd2e474b9520dc893765ec4cb2c4.
>
> Yes, normal compose sequences are a base character with a width, and
> composing characters with 0 width (but effectively rendering to the
> left of the insertion point, on top of the base character.)
>
> --
> Mikael Magnusson
>

--0000000000001fe4da06185eceba
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000001fe4da06185eceba--
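
The width mismatch discussed in this thread can be sketched roughly as
follows. This is only an illustration in Python using the standard
unicodedata module, not the C++/ICU code from the gist and not what Zsh or
any terminal actually does. The function names (codepoint_width,
naive_width, vs16_aware_width) are hypothetical, the per-code-point rule is
a simplified stand-in for a wcwidth-style table, and the "base plus VS16
occupies two cells" policy is an assumption about how some terminals choose
to render emoji presentation sequences.

```python
import unicodedata

VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation


def codepoint_width(ch: str) -> int:
    """Approximate width of one code point, in the spirit of wcwidth."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters take no cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two cells
    return 1


def naive_width(text: str) -> int:
    """Sum per-code-point widths, with no knowledge of sequences."""
    return sum(codepoint_width(ch) for ch in text)


def vs16_aware_width(text: str) -> int:
    """Assumed policy: a narrow base directly followed by VS16 takes 2 cells."""
    total = 0
    prev = 0  # width assigned to the previous code point
    for ch in text:
        if ch == VS16 and prev == 1:
            total += 1  # widen the preceding narrow base to two cells
            prev = 2
            continue
        prev = codepoint_width(ch)
        total += prev
    return total
```

For the sequence U+2764 U+FE0F (heavy black heart plus VS16), naive_width
yields 1 while vs16_aware_width yields 2. That one-cell gap is the kind of
disagreement between the shell's bookkeeping and the terminal's rendering
that the thread is about. Which of the two answers is "right" depends on the
terminal.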