From: Kwon Yeolhyun
To: Zsh List Hackers'
Subject: Unicode, Korean, normalization form, Mac OS X and tab completion
Date: Sat, 31 May 2014 12:56:06 +0900
List-Id: Zsh Workers List
X-Seq: 32633

I have to work with a lot of files that have Korean names, but zsh fails to tab-complete them.

I did some research to figure out what is going on and found keywords such as Unicode, normalization form, Mac OS X, and decomposition. I also searched the mailing list and read some threads related to Unicode and multibyte support, but I couldn't find a solution. I'm not an expert in Unicode, zsh, or Mac OS X, so I'm asking for your help.

Here is my description of the issue:

1) The Unicode specification defines normalization forms, which are related to canonical equivalence, that is, to comparing two Unicode strings.

2) The decomposed normalization forms break a character into its components. For example, Å (letter A with ring above) -> A (letter A) + ˚ (ring above), or 가 (hangul syllable ga) -> ㄱ (hangul choseong giyeok) + ㅏ (hangul jungseong a).

3) A Korean letter, a.k.a. hangul, has up to three parts: choseong, jungseong, and jongseong. For example, 가 decomposes into the choseong ㄱ and the jungseong ㅏ, and 각 breaks down into ㄱ, ㅏ, and ㄱ (the jongseong).

4) Mac OS X stores filenames in normalized (decomposed) form. Assuming there is a file named 가나다, its name is stored internally as ㄱㅏㄴㅏㄷㅏ (decomposed into hangul jamos). (Link to hangul jamos: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=4352&number=1024 )

5) I guess tab completion fails because zsh compares the user input, 가나다, directly with the filename, ㄱㅏㄴㅏㄷㅏ. 가나다 and ㄱㅏㄴㅏㄷㅏ are canonically equivalent but have different binary representations.

6) I believe that two Unicode strings should be compared with respect to canonical equivalence.

7) The Unicode specification has a dedicated section on handling hangul syllables. Fortunately, hangul can be decomposed and composed algorithmically. (Please refer to section 3.12 of the Unicode specification, under “Parsing” at http://www.unicode.org/faq/specifications.html )

8) On Ubuntu, tab completion works perfectly. As far as I can tell, the issue is limited to Mac OS X. (I have not tested any other platform.)

9) I think this is related to the COMBINING_CHARS option, but that option does not cover hangul.

10) At the moment, the latest version of bash is the only shell with working tab completion for these names on Mac OS X.

11) ‘Hangul’ is the name of the Korean letters. If you are interested in it, please refer to http://en.wikipedia.org/wiki/Hangul

Thanks for reading.
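
The mismatch described in points 5 to 7 can be reproduced outside the shell. The following is only a minimal sketch in Python, not zsh's completion code: the function names `decompose_hangul` and `canonically_equal` are made up for illustration, and it assumes the file system hands back names that match NFD for hangul. The constants are those of the arithmetic hangul decomposition in the Unicode Standard. Note that the decomposition produces conjoining jamo from the U+1100 block (the block linked in point 4), not the standalone compatibility jamo that are used above for readability.

```python
import unicodedata

# Constants of the arithmetic Hangul syllable decomposition (Unicode Standard).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables in total


def decompose_hangul(text):
    """Split precomposed Hangul syllables into conjoining jamo (hypothetical helper)."""
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            out.append(chr(L_BASE + s_index // N_COUNT))                # choseong
            out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))    # jungseong
            t_index = s_index % T_COUNT
            if t_index:                                                 # jongseong, if any
                out.append(chr(T_BASE + t_index))
        else:
            out.append(ch)  # non-Hangul characters are left untouched in this sketch
    return "".join(out)


def canonically_equal(a, b):
    """Compare two strings after normalizing both to the same form (NFD here)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


typed = "\uAC00\uB098\uB2E4"      # what the user types: precomposed syllables
stored = decompose_hangul(typed)  # stand-in for the decomposed name on disk

print([f"U+{ord(c):04X}" for c in stored])
print(typed == stored)                   # False: different code point sequences
print(stored.startswith(typed[:1]))      # False: a raw prefix match fails too
print(canonically_equal(typed, stored))  # True: equal once both are normalized
```

A completion matcher that normalized both the typed prefix and the candidate filenames to one form before comparing would see these two strings as the same name; whether NFC or NFD is the better common form for that purpose is left open here.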