zsh-workers
From: Kwon Yeolhyun <yeolhyunkwon@me.com>
To: Zsh List Hackers <zsh-workers@zsh.org>
Subject: Unicode, Korean, normalization form, Mac OS X and tab completion
Date: Sat, 31 May 2014 12:56:06 +0900
Message-ID: <AB81F9FB-8D84-4656-9EFE-F2F98B196861@me.com>


I have to work with lots of files with Korean names.
The problem is that zsh fails at tab completion for these files.
I did some research to figure out what’s going on and found keywords such as Unicode, normalization form, Mac OS X, and decomposition.
I also searched the mailing list and read some threads related to Unicode or multibyte support,
but I couldn’t find a solution.

I’m not an expert on Unicode, zsh, or Mac OS X, so I’m asking for your help.

Here’s my description of the issue.

1) The Unicode spec defines normalization forms, which are related to canonical equivalence, that is, to deciding whether two Unicode strings represent the same text.
2) The decomposed normalization forms (NFD, NFKD) break a character into its components.
    For example, Å (letter A with ring above) -> A (letter A) + ˚ (ring above), or 가 (hangul syllable ga) -> ㄱ (hangul choseong giyeok) + ㅏ (hangul jungseong a)
3) A Korean letter block, a.k.a. a hangul syllable, has up to three parts: choseong, jungseong, and jongseong. For example, 가 is decomposed into the choseong ㄱ and the jungseong ㅏ,
    and 각 breaks down into ㄱ, ㅏ, ㄱ (the jongseong).
4) Mac OS X stores filenames in a normalized (decomposed) form. If there’s a file named 가나다, its name is stored internally as ㄱㅏㄴㅏㄷㅏ (decomposed into hangul jamo). (Link to hangul jamo: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=4352&number=1024 )
5) I guess tab completion fails because zsh compares the user input, 가나다, with the stored filename, ㄱㅏㄴㅏㄷㅏ.
    가나다 and ㄱㅏㄴㅏㄷㅏ are canonically equivalent but have different binary representations.
6) I would argue that comparing two Unicode strings must be done with respect to canonical equivalence.
7) The Unicode spec has a dedicated section on hangul syllables. Fortunately, hangul can be decomposed and composed algorithmically.
( Please refer to Unicode spec section 3.12, under “Parsing”: http://www.unicode.org/faq/specifications.html )
8) On Ubuntu, tab completion works perfectly. So far, this issue seems to be restricted to Mac OS X. (I haven’t tested any other platforms.)
9) I think this is related to the COMBINING_CHARS option, but that option does not seem to cover hangul.
10) Currently, the latest version of bash is the only shell I know of with working tab completion for these names on Mac OS X.
11) ‘Hangul’ is the name of the Korean alphabet. If you are interested in it, please refer to http://en.wikipedia.org/wiki/Hangul
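
To illustrate points 4) to 6), here is a minimal sketch in Python (not zsh code). It uses the standard unicodedata module. The helper names same_bytes, canonically_equal and is_prefix are hypothetical ones I made up for this mail, and using plain NFD to stand in for what Mac OS X stores is an assumption (the filesystem may use its own variant of decomposition).

```python
import unicodedata

# The name as a user would type it: three precomposed syllables.
typed = "\uac00\ub098\ub2e4"  # 가나다

# Assumption: approximate the on-disk name with standard NFD.
stored = unicodedata.normalize("NFD", typed)


def same_bytes(a: str, b: str) -> bool:
    """Naive comparison of the UTF-8 byte sequences."""
    return a.encode("utf-8") == b.encode("utf-8")


def canonically_equal(a: str, b: str) -> bool:
    """Compare after bringing both strings to the same normalization form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def is_prefix(typed_prefix: str, filename: str) -> bool:
    """Hypothetical completion check: prefix match on the NFD forms.

    Caveat: on NFD forms, a syllable without a final consonant (가)
    is also a prefix of the same syllable with one (각).
    """
    return unicodedata.normalize("NFD", filename).startswith(
        unicodedata.normalize("NFD", typed_prefix)
    )


print(len(typed), len(stored))           # number of code points in each form
print(same_bytes(typed, stored))         # byte-wise comparison
print(canonically_equal(typed, stored))  # comparison after normalization
print(is_prefix("\uac00\ub098", stored)) # does 가나 match the stored name?
```

The byte-wise comparison is what I suspect happens during completion; normalizing both sides first is the behaviour I am asking about.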

Thanks for reading.
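
P.S. Regarding the algorithmic decomposition of hangul mentioned in point 7), below is a rough Python sketch of the arithmetic as I understand it from the Unicode spec. The constants are the ones given in the spec's hangul syllable algorithm; the function names decompose_syllable and compose_syllable are my own, and compose_syllable assumes it is given a well-formed jamo sequence. The loop checks the result against Python's own unicodedata.normalize.

```python
import unicodedata

# Constants from the Unicode hangul syllable algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables in total


def decompose_syllable(ch: str) -> str:
    """Split one precomposed syllable into choseong, jungseong, jongseong."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed hangul syllable, leave it alone
    parts = [
        chr(L_BASE + s_index // N_COUNT),               # choseong
        chr(V_BASE + (s_index % N_COUNT) // T_COUNT),   # jungseong
    ]
    t_index = s_index % T_COUNT
    if t_index:                                         # jongseong is optional
        parts.append(chr(T_BASE + t_index))
    return "".join(parts)


def compose_syllable(jamo: str) -> str:
    """Inverse operation for one well-formed L+V or L+V+T sequence."""
    l_index = ord(jamo[0]) - L_BASE
    v_index = ord(jamo[1]) - V_BASE
    t_index = ord(jamo[2]) - T_BASE if len(jamo) == 3 else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


for ch in "\uac00\uac01":  # 가 and 각
    jamo = decompose_syllable(ch)
    print(ch, [f"U+{ord(c):04X}" for c in jamo])
    assert jamo == unicodedata.normalize("NFD", ch)
    assert compose_syllable(jamo) == ch
```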



