9fans - fans of the OS Plan 9 from Bell Labs
From: andrey mirtchovski <mirtchovski@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] Woes of New Language Support
Date: Sun, 26 Jul 2009 01:41:16 -0600
Message-ID: <14ec7b180907260041h18f63c64x871a7059cc9244bb@mail.gmail.com>
In-Reply-To: <8318421630e9613cfbdf14c1eae5f080@quanstro.net>

diacritics (combining characters) are a real mess in Unicode. with so
much space in the format why did they have to go this route, i wonder?

erik mentioned cyrillic. i did have an old church slavonic bible text
i was attempting to display correctly on Plan 9 sometime in 2003-4.
top is x11 with correctly (i presume) combined characters, below is
the Plan 9 rendering:
http://mirtchovski.com/screenshots/x-p9-diacritics.jpg

there's a pattern there, as you can see: the combining char always
follows the char it's combined with, so you can try simply not
advancing forward as a first draft of implementing char combinations
in Plan 9. there doesn't seem to be a default list of "combining"
characters in Unicode so you'll have to pick out all characters described
as "combining" and check for them on input. fun and slow :)

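a minimal sketch of the "don't advance" idea, in python rather than
Plan 9 C. it assumes the general categories Mn and Me from python's
unicodedata module are a good enough stand-in for the list of
"combining" characters; the function name cell_positions is made up
for illustration and is not anything from Plan 9's draw code.

```python
import unicodedata

def cell_positions(text):
    """Return (char, column) pairs for a naive fixed-width layout.

    Assumption: a character whose general category is Mn (nonspacing
    mark) or Me (enclosing mark) is drawn over the previous character,
    so it reuses that character's column and does not advance.
    """
    out = []
    col = 0
    for ch in text:
        if out and unicodedata.category(ch) in ("Mn", "Me"):
            # combining mark: same column as the character before it
            out.append((ch, out[-1][1]))
        else:
            # base character (or a stray leading mark): take a new column
            out.append((ch, col))
            col += 1
    return out

# e followed by U+0308 COMBINING DIAERESIS, then x
positions = cell_positions("e\u0308x")
```

here the diaeresis lands in the same column as the e, and x takes the
next one. the category lookup per character is the "slow" part; a real
implementation would probably want a precomputed table.
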
the real problem isn't in viewing them, however, but comes when you
start searching for them: it's easy to search for ë (e-umlaut), for
example, but what if it's encoded as e+"U+0308 COMBINING DIAERESIS"?
the answer is the UTS#18 Regular Expressions technical standard which
probably contributes at least half of the slowness of gnu grep
discussed in another thread. http://www.unicode.org/reports/tr18/
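
to make the search problem concrete, here is a small python sketch. it
is not an implementation of UTS#18; it only assumes that normalizing
both strings to NFC before comparing is enough for this one case. the
helper name contains is hypothetical.

```python
import unicodedata

def contains(haystack, needle):
    """Substring test that treats canonically equivalent forms as equal.

    Assumption: normalizing both sides to NFC is sufficient here; full
    UTS#18 matching covers much more than this.
    """
    def nfc(s):
        return unicodedata.normalize("NFC", s)
    return nfc(needle) in nfc(haystack)

precomposed = "\u00eb"      # ë as a single code point
decomposed = "noe\u0308l"   # e followed by U+0308 COMBINING DIAERESIS

naive = precomposed in decomposed             # plain code point comparison
normalized = contains(decomposed, precomposed)
```

the plain comparison misses the match because the code point sequences
differ, while the normalized one finds it. the cost is an extra pass
over every string searched.
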
