Re: Unicode, Korean, normalization form, Mac OS X and tab completion

From: Peter Stephenson <p.w.stephenson@ntlworld.com>
To: "Zsh List Hackers'" <zsh-workers@zsh.org>
Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab completion
Date: Sat, 31 May 2014 20:16:17 +0100
Message-ID: <20140531201617.4ca60ab8@pws-pc.ntlworld.com>
In-Reply-To: <AB81F9FB-8D84-4656-9EFE-F2F98B196861@me.com>

On Sat, 31 May 2014 12:56:06 +0900
Kwon Yeolhyun <yeolhyunkwon@me.com> wrote:
> 4) Mac OS X uses normalized string as filename. Assuming there’s a
> file with the name of 가나다, it has the name of
> ㄱㅏㄴㅏㄷㅏ(decomposed into hangul jamos) internally. (Link to hangul
> jamos:
> http://www.utf8-chartable.de/unicode-utf8-table.pl?start=4352&number=1024)
> 5) I guess the reason why the tab completion has failed is that zsh
> compares the user input, 가나다, with the filename, ㄱㅏㄴㅏㄷㅏ.
> 가나다 and ㄱㅏㄴㅏㄷㅏ are canonically equivalent but have different
> binary representations.

You're right, this is a real problem that could do with solving.

The actual conversion between the two is easy enough, though most of
us here don't use Macs or character sets that show up the problem, so
we'd need a volunteer to help with this (relatively) easy bit.
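
To make the mismatch concrete, here is a minimal sketch of the
conversion. Python is used purely for illustration (zsh itself is
written in C, and this is not zsh code), and it assumes the form stored
by the filesystem is close enough to Unicode NFD for the standard
normalization routines to apply.

```python
import unicodedata

# Precomposed syllables, as an input method would typically deliver them.
composed = "가나다"

# Decomposed conjoining jamo, assumed here to resemble the on-disk name.
decomposed = unicodedata.normalize("NFD", composed)

# The two strings are canonically equivalent but differ as octet strings.
print(len(composed), len(decomposed))
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))
print(composed == decomposed)

# Converting either side to a common form makes the comparison succeed.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

A byte-wise comparison of the two names fails, which matches the
behaviour reported for tab completion.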

The difficult bit, about which I suspect only Bart and I are likely to
have detailed opinions, is where to do the conversion.

Doing it at the point where data is read from the keyboard is
problematic, since what we put back onto the command line is quite
intricately tied to what we read from it in the first place, and
arbitrary transformations at this point make it hard to know what to put
back after the completion.

Doing it right down in the guts is even harder --- there are some
incredibly complicated things going on to support features like partial
word completion that currently treat data simply as octet strings, and
upgrading this is a huge job.

So if we can guarantee the keyboard input is in one form (and I'm not
sure we necessarily can) it might be easier to convert file names into
that format.  The trouble here is that to be consistent we need to
convert all data passed into the completion system, e.g. from file
contents passed as strings via functions.  (In principle it's
more correct to normalise all input anyway.)
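
The idea of converting candidates into the keyboard's form could be
sketched roughly as follows. This is only an illustration in Python,
not how the completion system works: the function name complete and
the choice of NFC as the "keyboard form" are assumptions made for the
example.

```python
import unicodedata

def complete(prefix, candidates, form="NFC"):
    """Return the candidates that match prefix once both are normalized.

    The candidates are returned unmodified, so the caller still has to
    decide which form to put back on the command line.
    """
    key = unicodedata.normalize(form, prefix)
    return [
        name
        for name in candidates
        if unicodedata.normalize(form, name).startswith(key)
    ]

# One name in decomposed form (as assumed for OS X), one plain ASCII name.
names = [unicodedata.normalize("NFD", "가나다.txt"), "notes.txt"]
matches = complete("가나", names)
print(len(matches))
```

The same normalization would have to be applied to every other source
of candidates (file contents, strings passed in by functions) for the
matching to be consistent.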

I'm currently wondering if there is scope for normalising keyboard input
really early --- before we feed it back to the shell --- and turning it
back into the usual keyboard form right at the end, perhaps not worrying
too much if the original input was in a different form as long as
they're equivalent.  But I suspect it's not that easy.
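
A rough sketch of that round trip, again in Python and only as an
illustration: to_internal and to_keyboard are hypothetical helpers, NFD
as the internal form and NFC as the keyboard form are both assumptions,
and nothing here reflects actual zsh code.

```python
import unicodedata

def to_internal(text):
    # Hypothetical internal form; NFD is chosen only for the example.
    return unicodedata.normalize("NFD", text)

def to_keyboard(text):
    # Assumes the keyboard delivers NFC, which may not be guaranteed.
    return unicodedata.normalize("NFC", text)

# Mixed input: one precomposed syllable followed by a decomposed one.
typed = "가" + unicodedata.normalize("NFD", "나")
round_tripped = to_keyboard(to_internal(typed))

# The original code points are not restored exactly...
print(round_tripped == typed)

# ...but the result is canonically equivalent to what was typed.
print(unicodedata.normalize("NFC", typed) == round_tripped)
```

Input that was already in the assumed keyboard form survives the round
trip unchanged; anything else comes back equivalent but not identical,
which is the case the paragraph above proposes not to worry about.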

So this will take a certain amount of thought.

pws
