From: Jacob Moody
Date: Mon, 17 Oct 2022 14:28:42 -0600
To: 9front@9front.org
Subject: Re: [9front] [PATCH] introduce code points above BMP to /lib/unicode

One additional question I'd like input on is whether we should just include the entirety of UnicodeData.txt. There are some fields in there, notably the decomposition mappings, that would be quite useful. It would also be nice to generate the ranges used in things like runetype(2) from the upstream documents, so that we can more easily keep them up to date.

On this topic, I have been considering what should be done about combining runes in general, as we currently do nothing with them. For some quick background: these are runes in Unicode, typically (but not exclusively) used for diacritic or tonal marks, that are meant to be combined with a base rune. For various reasons, many combinations also map to a specific precomposed rune. Currently our fonts support only these precomposed variants.

One way we could improve on this is to add Unicode normalization (specifically, I am looking at NFC) somewhere like libdraw. Checking whether a string is already normalized is cheap, and fixing up strings under the hood would be an easy way to make (better) use of the bitmaps already in our fonts. NFC canonically decomposes and then recomposes the runes, so that the string is consistently precomposed before it is handed off to the fonts.

It is also worth pointing out that we can't precompose everything; there are ranges in Unicode where you have no option but to implement shaping yourself. This won't address those, and it would be nice for it not to get in the way of that down the road.

Realistically, this would allow us to support the large majority of decomposed Latin, decomposed Korean, and some other decomposed edge cases for which precomposed variants do exist. This matters if keyboard maps produce these combining runes, which, as I understand it, is not uncommon. With this change, the combining runes would essentially become zero-width code points from the perspective of libdraw users.
This means that backspacing (without any further changes) would take two (or more) presses to fully strike out a rune, progressively unwinding the modifications. This makes sense to me, but I can't make assumptions about how others use these runes.

A bit of a ramble, but I wanted to write out what I've been thinking so that someone else can pick it apart if they'd like.

Thanks,
moody
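On the UnicodeData.txt question: each line of that file has 15 semicolon-separated fields, with field 2 holding the general category, field 3 the canonical combining class and field 5 the decomposition mapping (compatibility mappings carry a `<tag>` prefix such as `<noBreak>`). The following is a rough sketch of pulling those fields out, written in portable C with `uint32_t` standing in for Plan 9's `Rune`; the `UData` struct and the `udfield`/`udparse` names are hypothetical, and a real generator would read the file line by line.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for the UnicodeData.txt fields discussed above. */
typedef struct {
	uint32_t cp;         /* field 0: code point */
	char gc[3];          /* field 2: general category, e.g. "Ll", "Mn" */
	int ccc;             /* field 3: canonical combining class */
	int ndecomp;         /* canonical decomposition length, 0 if none */
	uint32_t decomp[4];  /* field 5: canonical decomposition */
} UData;

/* Copy ';'-separated field n (0-based) of line into buf; empty fields stay empty. */
static int
udfield(const char *line, int n, char *buf, size_t len)
{
	const char *p = line;
	size_t k;

	while (n-- > 0) {
		if ((p = strchr(p, ';')) == NULL)
			return 0;    /* line has too few fields */
		p++;
	}
	k = strcspn(p, ";\r\n");
	if (k >= len)
		k = len - 1;
	memcpy(buf, p, k);
	buf[k] = '\0';
	return 1;
}

/* Parse one UnicodeData.txt line; returns 0 on success, -1 on a malformed line. */
int
udparse(const char *line, UData *u)
{
	char buf[128], *s, *end;
	unsigned long v;

	if (!udfield(line, 0, buf, sizeof buf))
		return -1;
	u->cp = (uint32_t)strtoul(buf, NULL, 16);
	if (!udfield(line, 2, u->gc, sizeof u->gc) || !udfield(line, 3, buf, sizeof buf))
		return -1;
	u->ccc = atoi(buf);
	if (!udfield(line, 5, buf, sizeof buf))
		return -1;
	u->ndecomp = 0;
	if (buf[0] == '<')   /* e.g. "<noBreak> 0020": compatibility, not canonical */
		return 0;
	for (s = buf; u->ndecomp < 4; s = end) {
		v = strtoul(s, &end, 16);   /* skips the space between code points */
		if (end == s)
			break;                  /* no more hex numbers */
		u->decomp[u->ndecomp++] = (uint32_t)v;
	}
	return 0;
}
```

Fed the `00E9` (é) line, this yields category `Ll`, combining class 0 and the canonical decomposition `0065 0301`.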
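For generating runetype(2)-style range tables from the upstream documents, the core step is collapsing a sorted list of code points that share a property into inclusive `[lo, hi]` ranges. A minimal sketch, assuming the code points have already been extracted from UnicodeData.txt and sorted (note that the file lists large blocks such as CJK ideographs only as `First>`/`Last>` pairs, which the extractor has to expand or treat as range endpoints); `Range` and `mkranges` are hypothetical names, and the exact table layout runetype uses is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint32_t lo, hi;   /* inclusive bounds */
} Range;

/*
 * Collapse a sorted, duplicate-free list of code points into ranges of
 * consecutive values. Writes at most n ranges to out and returns how
 * many were written.
 */
size_t
mkranges(const uint32_t *cp, size_t n, Range *out)
{
	size_t i, nr = 0;

	for (i = 0; i < n; i++) {
		if (nr > 0 && cp[i] == out[nr-1].hi + 1) {
			out[nr-1].hi = cp[i];              /* extends the current range */
		} else {
			out[nr].lo = out[nr].hi = cp[i];   /* starts a new range */
			nr++;
		}
	}
	return nr;
}
```

The output array needs room for `n` ranges in the worst case, where no two code points are adjacent.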
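On why checking for normalization is cheap: UAX #15 defines a quick-check algorithm that makes a single pass over the string using each rune's canonical combining class and its NFC_Quick_Check property (Yes, No or Maybe), falling back to full normalization only on Maybe. The sketch below follows that algorithm, but `lookup` consults a hypothetical toy table of four runes; a real table would be generated from the Unicode data files (NFC_QC is published in DerivedNormalizationProps.txt). The fast path relies on every rune below U+0300 having combining class 0 and NFC_QC=Yes.

```c
#include <stddef.h>
#include <stdint.h>

enum { QC_YES, QC_NO, QC_MAYBE };

/* Hypothetical toy property table; a real one would be generated from the UCD. */
static const struct {
	uint32_t r;
	int ccc;
	int qc;
} props[] = {
	{ 0x0300, 230, QC_MAYBE },   /* COMBINING GRAVE ACCENT */
	{ 0x0301, 230, QC_MAYBE },   /* COMBINING ACUTE ACCENT */
	{ 0x0323, 220, QC_MAYBE },   /* COMBINING DOT BELOW */
	{ 0x0340, 230, QC_NO },      /* COMBINING GRAVE TONE MARK, never appears in NFC */
};

static void
lookup(uint32_t r, int *ccc, int *qc)
{
	size_t i;

	*ccc = 0;
	*qc = QC_YES;   /* default for runes the toy table does not list */
	for (i = 0; i < sizeof props / sizeof props[0]; i++)
		if (props[i].r == r) {
			*ccc = props[i].ccc;
			*qc = props[i].qc;
			return;
		}
}

/* UAX #15 style quick check: returns QC_YES, QC_NO or QC_MAYBE. */
int
nfcquick(const uint32_t *s, size_t n)
{
	size_t i;
	int last = 0, ccc, qc, result = QC_YES;

	for (i = 0; i < n; i++) {
		if (s[i] < 0x300) {   /* fast path: class 0 and NFC_QC=Yes */
			last = 0;
			continue;
		}
		lookup(s[i], &ccc, &qc);
		if (ccc != 0 && last > ccc)
			return QC_NO;     /* marks not in canonical order */
		if (qc == QC_NO)
			return QC_NO;
		if (qc == QC_MAYBE)
			result = QC_MAYBE;
		last = ccc;
	}
	return result;
}
```

A Maybe result (for example `e` followed by U+0301) is where something like libdraw would run the full decompose/recompose pass; plain ASCII and precomposed Latin-1 text never leave the fast path.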
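The recomposition half of NFC can be sketched as below, assuming the input has already been canonically decomposed and reordered (the NFD step). It follows the canonical composition algorithm from UAX #15: a rune combines with the last starter unless some rune between them has combining class 0 or a class greater than or equal to its own. The `ccc` and `pair` functions are hypothetical toy excerpts; real tables would come from the canonical decompositions in UnicodeData.txt minus the composition exclusions, and Hangul is usually handled algorithmically instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy excerpt of canonical combining classes. */
static int
ccc(uint32_t r)
{
	switch (r) {
	case 0x0301: case 0x0302: case 0x0308:
		return 230;   /* acute, circumflex, diaeresis: above */
	case 0x0323:
		return 220;   /* dot below */
	default:
		return 0;
	}
}

/* Hypothetical toy excerpt of primary composites; returns 0 if none. */
static uint32_t
pair(uint32_t a, uint32_t b)
{
	static const uint32_t tab[][3] = {
		{ 0x0065, 0x0301, 0x00E9 },   /* e + acute -> é */
		{ 0x0061, 0x0302, 0x00E2 },   /* a + circumflex -> â */
		{ 0x0061, 0x0323, 0x1EA1 },   /* a + dot below -> ạ */
		{ 0x1EA1, 0x0302, 0x1EAD },   /* ạ + circumflex -> ậ */
	};
	size_t i;

	for (i = 0; i < sizeof tab / sizeof tab[0]; i++)
		if (tab[i][0] == a && tab[i][1] == b)
			return tab[i][2];
	return 0;
}

/* Compose an NFD string in place; returns the new length. */
size_t
compose(uint32_t *s, size_t n)
{
	size_t i, starter = 0, out = 1;
	int last, c;
	uint32_t p;

	if (n == 0)
		return 0;
	last = ccc(s[0]) == 0 ? 0 : 256;   /* a leading mark blocks until the next starter */
	for (i = 1; i < n; i++) {
		c = ccc(s[i]);
		p = pair(s[starter], s[i]);
		if (p != 0 && (last == 0 || last < c)) {
			s[starter] = p;   /* not blocked: fold into the starter */
			continue;
		}
		if (c == 0)
			starter = out;    /* this rune becomes the new starter */
		last = c;
		s[out++] = s[i];
	}
	return out;
}
```

With these tables, the NFD sequence `a` U+0323 U+0302 collapses step by step to the single rune U+1EAD (ậ), which is exactly the kind of precomposed rune our fonts already have bitmaps for.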
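Decomposed Korean is the easy case, because Hangul syllables compose arithmetically rather than through a table: a leading consonant (L) plus a vowel (V) gives an LV syllable, and an LV syllable plus a trailing consonant (T) gives an LVT syllable. A sketch of that arithmetic, using the constants from the conjoining jamo section (3.12) of the Unicode Standard; `hangulcompose` is a hypothetical name, and a real implementation would likely fold this into the general composition loop.

```c
#include <stddef.h>
#include <stdint.h>

enum {
	SBASE = 0xAC00, LBASE = 0x1100, VBASE = 0x1161, TBASE = 0x11A7,
	LCOUNT = 19, VCOUNT = 21, TCOUNT = 28,
	SCOUNT = LCOUNT * VCOUNT * TCOUNT,   /* 11172 precomposed syllables */
};

/* Compose conjoining jamo in place; returns the new length. */
size_t
hangulcompose(uint32_t *s, size_t n)
{
	size_t i, out = 0;
	uint32_t c, last;

	for (i = 0; i < n; i++) {
		c = s[i];
		if (out > 0) {
			last = s[out-1];
			/* L + V -> LV */
			if (last >= LBASE && last < LBASE + LCOUNT &&
			    c >= VBASE && c < VBASE + VCOUNT) {
				s[out-1] = SBASE + ((last - LBASE) * VCOUNT + (c - VBASE)) * TCOUNT;
				continue;
			}
			/* LV + T -> LVT; TBASE itself is not a valid trailing jamo */
			if (last >= SBASE && last < SBASE + SCOUNT &&
			    (last - SBASE) % TCOUNT == 0 &&
			    c > TBASE && c < TBASE + TCOUNT) {
				s[out-1] = last + (c - TBASE);
				continue;
			}
		}
		s[out++] = c;
	}
	return out;
}
```

For example, U+1112 U+1161 U+11AB collapses to U+D55C (한). Run in reverse, the same arithmetic is what makes the progressive backspacing described above possible: dropping the trailing jamo turns 한 back into 하.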