zsh-workers
* Unicode, Korean, normalization form, Mac OS X and tab completion
@ 2014-05-31  3:56 Kwon Yeolhyun
  2014-05-31 15:21 ` Chet Ramey
  2014-05-31 19:16 ` Peter Stephenson
  0 siblings, 2 replies; 31+ messages in thread
From: Kwon Yeolhyun @ 2014-05-31  3:56 UTC (permalink / raw)
  To: Zsh List Hackers'


I have to work with lots of files with Korean names.
The problem is that zsh fails at tab completion for files with Korean names.
So I did some research to figure out what’s going on and found some keywords such as Unicode, normalization form, Mac OS X, and decomposition.
I also searched the mailing list and read some threads related to Unicode or multibyte support.
But I couldn’t find any solution.

I’m not an expert on Unicode, zsh, or Mac OS X, so I’m asking for your help.

Here’s my description of the issue.

1) The Unicode spec defines normalization forms, which are related to canonical equivalence, i.e. to comparing two Unicode strings.
2) The decomposed normalization forms (NFD, NFKD) break a character into its components.
    For example, Å (letter A with a ring above) -> A (letter A) + ˚ (ring above), or 가 (hangul syllable ga) -> ㄱ (hangul choseong gieuk) + ㅏ (hangul jungseong ah).
    (The components are shown here as standalone glyphs for readability; the actual code points are the combining ring U+030A and the conjoining jamos U+1100 and U+1161.)
3) A Korean letter, i.e. a hangul syllable, has up to three parts: choseong, jungseong, and an optional jongseong. For example, 가 is decomposed into the choseong, ㄱ, and the jungseong, ㅏ.
    And 각 breaks down into ㄱ, ㅏ, ㄱ (the jongseong).
4) Mac OS X stores filenames in a decomposed normalized form. Assuming there’s a file named 가나다, it is stored internally as ㄱㅏㄴㅏㄷㅏ (decomposed into hangul jamos). (Link to hangul jamos: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=4352&number=1024 )
5) I guess the reason tab completion fails is that zsh compares the user input, 가나다, directly with the filename, ㄱㅏㄴㅏㄷㅏ.
    가나다 and ㄱㅏㄴㅏㄷㅏ are canonically equivalent but have different binary representations.
6) I would argue that comparing two Unicode strings must respect canonical equivalence.
7) The Unicode spec has a dedicated section on hangul syllables. Fortunately, hangul can be decomposed and composed algorithmically.
( Please refer to Unicode spec section 3.12, listed under “Parsing” at http://www.unicode.org/faq/specifications.html )
8) On Ubuntu, tab completion works perfectly. So far, this issue appears to be restricted to Mac OS X. (I haven’t tested on other platforms.)
9) I think this is related to the COMBINING_CHARS option, but that option does not cover hangul.
10) Right now, the latest version of bash is the only shell whose tab completion works for these names on Mac OS X.
11) ‘Hangul’ is the name of the Korean alphabet. If you are interested in it, please refer to http://en.wikipedia.org/wiki/Hangul
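
Points 3 to 7 above can be illustrated with a small Python sketch. It is not zsh code and does not reflect how zsh or bash implement completion; it only shows the arithmetic hangul decomposition described in section 3.12 of the Unicode spec and a prefix comparison done after normalizing both sides to NFD. The helper names decompose_hangul and matches_prefix, and the ".txt" filename, are made up for this illustration.

```python
import unicodedata

# Constants of the hangul syllable decomposition algorithm (Unicode spec, section 3.12).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # syllables per leading consonant (588)
S_COUNT = 19 * N_COUNT        # total precomposed syllables (11172)


def decompose_hangul(text: str) -> str:
    """Hypothetical helper: split precomposed syllables into conjoining jamos."""
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            out.append(chr(L_BASE + s_index // N_COUNT))               # choseong
            out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))   # jungseong
            t_index = s_index % T_COUNT
            if t_index:                                                # jongseong, if any
                out.append(chr(T_BASE + t_index))
        else:
            out.append(ch)  # non-hangul characters pass through unchanged
    return "".join(out)


def matches_prefix(typed: str, filename: str) -> bool:
    """Hypothetical helper: prefix match that respects canonical equivalence."""
    nfd_typed = unicodedata.normalize("NFD", typed)
    nfd_name = unicodedata.normalize("NFD", filename)
    return nfd_name.startswith(nfd_typed)


typed = "가나다"                                         # composed, as typed by the user
on_disk = unicodedata.normalize("NFD", typed) + ".txt"   # decomposed, as in point 4

print(on_disk.startswith(typed))       # False: code point sequences differ
print(matches_prefix(typed, on_disk))  # True: equal after normalization
print(decompose_hangul("각") == unicodedata.normalize("NFD", "각"))  # True
```

One design consequence worth noting: after NFD normalization, a typed 가 is also a prefix of 각, because the jongseong is just one more trailing code point. Whether that is desirable for completion is a separate question.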

Thanks for reading.



Thread overview: 31+ messages
2014-05-31  3:56 Unicode, Korean, normalization form, Mac OS X and tab completion Kwon Yeolhyun
2014-05-31 15:21 ` Chet Ramey
2014-05-31 18:47   ` Bart Schaefer
2014-05-31 19:16 ` Peter Stephenson
2014-05-31 21:29   ` Bart Schaefer
2014-06-01  2:25     ` Daniel Shahaf
2014-06-01  5:30       ` Kwon Yeolhyun
2014-06-01 16:53         ` Daniel Shahaf
2014-06-01  7:56       ` Bart Schaefer
2014-06-01 16:46         ` Daniel Shahaf
2014-06-01 17:00         ` Jun T.
2014-06-01 19:13           ` Bart Schaefer
2014-06-02 17:01             ` Jun T.
2014-06-02 17:14               ` Bart Schaefer
2014-06-01 19:53           ` Bart Schaefer
2014-06-02 11:58             ` Kwon Yeolhyun
2014-06-02 14:23               ` Kwon Yeolhyun
2014-06-02 15:14                 ` Bart Schaefer
2014-06-02 15:27                   ` Peter Stephenson
2014-06-02 15:48                     ` Kwon Yeolhyun
2014-06-02 15:27                   ` Kwon Yeolhyun
2014-06-02 15:49                     ` Bart Schaefer
2014-06-02 15:58                       ` Kwon Yeolhyun
2014-06-02 14:31               ` Bart Schaefer
2014-06-02 17:15             ` Jun T.
2014-06-02 17:27               ` Bart Schaefer
2014-06-05 14:34                 ` Jun T.
2014-06-05 15:00                   ` Bart Schaefer
2014-06-02  5:17           ` Kwon Yeolhyun
2014-06-02  7:39             ` Jun T.
2014-06-02  8:42               ` Kwon Yeolhyun
