9front - general discussion about 9front
From: Jacob Moody <moody@mail.posixcafe.org>
To: 9front@9front.org
Subject: Re: [9front] [PATCH] introduce code points above BMP to /lib/unicode
Date: Mon, 17 Oct 2022 14:28:42 -0600
Message-ID: <ac9cdeb2-8b06-76ba-5f32-e65ed70bb32b@posixcafe.org>
In-Reply-To: <5B9E186DD7F2C4E2910241F9FDEB0C09@wopr.sciops.net>

I'd also like some input on whether we should just include the
entirety of UnicodeData.txt.  Some of its fields, notably the
decomposition mappings, would be quite useful.  It would also be nice
to generate the ranges used in things like runetype(2) from the
upstream documents so that we can more easily keep up to date.
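For reference, UnicodeData.txt has one code point per line and fifteen
';'-separated fields.  Field 3 is the canonical combining class and
field 5 is the decomposition mapping, where a leading <tag> marks a
compatibility (not canonical) mapping.  A minimal sketch of a line
parser, in portable C rather than Plan 9 C; UData and parseudata are
hypothetical names, not existing 9front code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXDECOMP 18	/* longest mapping in UnicodeData.txt (U+FDFA) */

typedef struct {
	unsigned long cp;	/* field 0: code point */
	char gc[3];	/* field 2: general category, e.g. "Mn" */
	int ccc;	/* field 3: canonical combining class */
	int compat;	/* field 5 starts with <tag>: compatibility mapping */
	int ndecomp;	/* number of runes in the field 5 mapping */
	unsigned long decomp[MAXDECOMP];
} UData;

/* Parse one line of UnicodeData.txt.  Returns 0 on success, -1 on error. */
int
parseudata(const char *line, UData *u)
{
	const char *f[15], *p;
	char *end;
	int i;

	assert(line != NULL && u != NULL);
	/* split on ';' by hand: empty fields matter, so no strtok */
	p = line;
	for(i = 0; i < 15; i++){
		f[i] = p;
		p = strchr(p, ';');
		if(p == NULL)
			break;
		p++;
	}
	if(i < 14)
		return -1;

	memset(u, 0, sizeof *u);
	u->cp = strtoul(f[0], &end, 16);
	if(end == f[0])
		return -1;
	u->gc[0] = f[2][0];
	u->gc[1] = f[2][1];
	u->ccc = atoi(f[3]);

	p = f[5];
	if(*p == '<'){	/* e.g. "<compat> 0020 0308" */
		u->compat = 1;
		p = strchr(p, '>');
		if(p == NULL)
			return -1;
		p++;
	}
	while(*p == ' ')
		p++;
	while(*p != ';' && u->ndecomp < MAXDECOMP){
		u->decomp[u->ndecomp++] = strtoul(p, &end, 16);
		if(end == p)
			return -1;
		p = end;
		while(*p == ' ')
			p++;
	}
	return 0;
}
```

A generator for runetype(2) tables or a composition table would loop
this over the file and emit C arrays from the fields it needs.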

On this topic, I have been considering what should be done about
combining runes in general, as we currently do nothing with them.
For some quick background: these are Unicode runes, typically (but
not exclusively) diacritic or tone marks, that are meant to be
combined with a preceding base rune.  For various reasons many of
these combinations also map to a specific precomposed rune.
Currently our fonts support only these precomposed variants.

One way we could do better is to add some Unicode normalization,
specifically NFC, somewhere like libdraw.  Checking whether a string
is already normalized is cheap, and fixing up strings under the hood
would be an easy way to make (better) use of the bitmaps already in
our fonts.  NFC canonically decomposes and then recomposes the runes,
so the string is consistently precomposed wherever Unicode allows
before it is handed off to the fonts.
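Roughly, the cheap check plus the recompose step could look like this
sketch (portable C; Rune32, nfcinert, toycompose and the table are all
hypothetical).  The fast path relies on no code point below U+0300
decomposing, having a nonzero combining class, or composing with the
rune before it, so such text can skip normalization entirely.
toycompose is deliberately incomplete: it uses a hand-picked table and
skips canonical decomposition, canonical reordering and the blocking
rules; a real table would come from UnicodeData.txt field 5 minus
CompositionExclusions.txt.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long Rune32;	/* hypothetical stand-in for Rune, wide enough for non-BMP */

/* Hand-picked toy subset of the canonical composition pairs. */
static const struct {
	Rune32 base, mark, composed;
} comptab[] = {
	{ 'A', 0x0301, 0x00C1 },	/* A + combining acute */
	{ 'a', 0x0308, 0x00E4 },	/* a + combining diaeresis */
	{ 'e', 0x0301, 0x00E9 },	/* e + combining acute */
	{ 'o', 0x0302, 0x00F4 },	/* o + combining circumflex */
};

/* Quick check: returns 1 if s is certainly NFC, 0 for "maybe not". */
int
nfcinert(const Rune32 *s, size_t n)
{
	size_t i;

	for(i = 0; i < n; i++)
		if(s[i] >= 0x0300)
			return 0;
	return 1;
}

static Rune32
composepair(Rune32 base, Rune32 mark)
{
	size_t i;

	for(i = 0; i < sizeof comptab / sizeof comptab[0]; i++)
		if(comptab[i].base == base && comptab[i].mark == mark)
			return comptab[i].composed;
	return 0;
}

/*
 * Recompose step only: fold each rune into the one before it when the
 * table has a pair, compacting s in place.  Returns the new length.
 */
size_t
toycompose(Rune32 *s, size_t n)
{
	size_t r, w;
	Rune32 c;

	assert(s != NULL || n == 0);
	if(n == 0)
		return 0;
	w = 0;
	for(r = 1; r < n; r++){
		c = composepair(s[w], s[r]);
		if(c != 0)
			s[w] = c;
		else
			s[++w] = s[r];
	}
	return w + 1;
}
```

Callers would only pay for toycompose when nfcinert returns 0, which
for most existing text should be rare.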

It is also worth pointing out that we can't precompose everything:
there are ranges in Unicode where you have no option but to implement
shaping yourself.  This won't address those, and it would be nice for
it not to get in the way of that work down the road.  Realistically
this would let us support the large majority of decomposed Latin,
decomposed Korean, and some other decomposed edge cases that do have
precomposed variants.
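Decomposed Korean is the easy case, since Hangul syllables compose
arithmetically using the Unicode Standard's conjoining jamo algorithm
(chapter 3), with no table needed.  A sketch in portable C;
hangulpair and hangulcompose are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long Rune32;	/* hypothetical stand-in for Rune */

enum {
	SBase = 0xAC00, LBase = 0x1100, VBase = 0x1161, TBase = 0x11A7,
	LCount = 19, VCount = 21, TCount = 28,
	NCount = VCount * TCount,	/* 588 */
	SCount = LCount * NCount,	/* 11172 */
};

/* Compose one pair into a Hangul syllable; 0 if they don't compose. */
static Rune32
hangulpair(Rune32 a, Rune32 b)
{
	/* leading consonant + vowel -> LV syllable */
	if(a >= LBase && a < LBase + LCount && b >= VBase && b < VBase + VCount)
		return SBase + ((a - LBase) * VCount + (b - VBase)) * TCount;
	/* LV syllable (no trailing consonant yet) + trailing consonant -> LVT */
	if(a >= SBase && a < SBase + SCount && (a - SBase) % TCount == 0
	&& b > TBase && b < TBase + TCount)
		return a + (b - TBase);
	return 0;
}

/* Compose decomposed jamo in place, compacting s.  Returns the new length. */
size_t
hangulcompose(Rune32 *s, size_t n)
{
	size_t r, w;
	Rune32 c;

	assert(s != NULL || n == 0);
	if(n == 0)
		return 0;
	w = 0;
	for(r = 1; r < n; r++){
		c = hangulpair(s[w], s[r]);
		if(c != 0)
			s[w] = c;
		else
			s[++w] = s[r];
	}
	return w + 1;
}
```

For example, U+1112 U+1161 U+11AB first becomes U+D558 and then
U+D55C, the precomposed syllable our fonts would already have.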

This matters if keyboard maps produce these combining runes, which as
I understand it is not uncommon.  With this change, combining runes
would essentially become zero-width code points from the perspective
of libdraw users.  This means that backspacing (without any other
changes) would take two (or more) presses to fully strike out a rune,
progressively unwinding the modifications.  This makes sense to me,
but I can't make assumptions about how others use these runes.
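To make the backspace trade-off concrete, a sketch (portable C,
hypothetical names) contrasting the unchanged one-rune-per-press
behaviour with a cluster-aware alternative.  iscombining only
recognises the Combining Diacritical Marks block (U+0300 through
U+036F), standing in for a proper general category (Mn/Me) lookup
generated from UnicodeData.txt:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long Rune32;	/* hypothetical stand-in for Rune */

/* Hypothetical, partial: only U+0300..U+036F count as zero width. */
static int
iscombining(Rune32 r)
{
	return r >= 0x0300 && r <= 0x036F;
}

/* Visible cells in s[0..n): every non-combining rune takes one. */
size_t
cells(const Rune32 *s, size_t n)
{
	size_t i, c;

	c = 0;
	for(i = 0; i < n; i++)
		if(!iscombining(s[i]))
			c++;
	return c;
}

/* Unchanged behaviour: one press drops exactly one rune. */
size_t
backspace(size_t n)
{
	return n > 0 ? n - 1 : 0;
}

/* Alternative: one press drops trailing marks and their base rune. */
size_t
backcluster(const Rune32 *s, size_t n)
{
	assert(s != NULL || n == 0);
	while(n > 0 && iscombining(s[n-1]))
		n--;
	return n > 0 ? n - 1 : 0;
}
```

With "a" followed by e + U+0323 + U+0302, backspace needs three
presses to remove the accented e (first the circumflex, then the dot
below, then the e), while backcluster removes it in one.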

A bit of a ramble, but I wanted to write out what I've been thinking
so someone else can pick it apart if they'd like.

Thanks,
moody

Thread overview: 6+ messages
2022-10-17  5:24 Jacob Moody
2022-10-17  6:34 ` qwx
2022-10-17 20:28   ` Jacob Moody [this message]
2022-10-23  5:07     ` qwx
2022-10-23 14:28       ` ori
2022-10-23 16:58         ` qwx