List-Id: Zsh Workers List
X-Seq: 32673
Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab completion
From: "Jun T."
Date: Tue, 3 Jun 2014 02:01:19 +0900
Message-Id: <59416708-1209-4DFC-B4D0-2D7A5878A662@kba.biglobe.ne.jp>
References: <20140531201617.4ca60ab8@pws-pc.ntlworld.com>
 <140531142926.ZM556@torch.brasslantern.com>
 <20140601022527.GD1820@tarsus.local2>
 <140601005624.ZM3283@torch.brasslantern.com>
To: zsh-workers@zsh.org

2014/06/02 04:13, Bart Schaefer wrote:

>> $ ls u # completes to über (useful for some user??)
>
> The current behavior here is pretty much by accident,

I had been thinking the NFD/NFC problem was not so serious, because I can
use u instead of ü (u is easier to type than ü on my keyboard), and I
simply guessed that Western non-English-speaking people (German, French,
Spanish, etc.) were using something like u. But maybe I was wrong.

In Japanese, some Hiragana/Katakana can have a kind of accent, e.g.,
か + accent = が. It's OK for me to type か instead of が, but many
Japanese Mac/zsh users were frustrated with the problem, and one of those
users came up with the patch I mentioned in the previous post.

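To make the "u instead of ü" and "か instead of が" cases concrete, here is a minimal Python sketch. It assumes the standard Unicode NFD form from `unicodedata` (Apple's filesystem decomposition is a variant of it), and the helper name `typed_prefix_matches` is hypothetical. It only illustrates why the base letter happens to match a decomposed filename; it is not how zsh completion is implemented.

```python
import unicodedata

def typed_prefix_matches(typed, filename):
    """Plain code-point prefix test, with no normalization applied."""
    return filename.startswith(typed)

# Composed (NFC) names, written with escapes so the source encoding is unambiguous.
uber_nfc = "\u00fcber"   # ü (U+00FC) + "ber"
ga_nfc = "\u304c"        # が (U+304C)

# Decomposed (NFD) forms: base character followed by a combining mark.
uber_nfd = unicodedata.normalize("NFD", uber_nfc)   # u + U+0308 + "ber"
ga_nfd = unicodedata.normalize("NFD", ga_nfc)       # か (U+304B) + U+3099

# The base letter is a prefix of the decomposed name only.
print(typed_prefix_matches("u", uber_nfd))        # True
print(typed_prefix_matches("u", uber_nfc))        # False
print(typed_prefix_matches("\u304b", ga_nfd))     # True
print(typed_prefix_matches("\u304b", ga_nfc))     # False

# Typing the composed character does not match the decomposed name.
print(typed_prefix_matches("\u00fc", uber_nfd))   # False
```

The last line is the frustrating case: a user who types ü or が directly gets no match against an NFD filename unless one side is normalized first.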
I was thinking Korean (and Chinese) were free from the NFC/NFD problem, but
now I know I was wrong. I didn't know that Korean filenames are completely
decomposed down to each consonant/vowel. It was a surprise to learn that

    $ echo '\u1100 \u1161'
    ᄀ ᅡ
    $ echo '\u1100\u1161'
    가

Anyway, I did the following quick tests concerning file sharing between
Mac and Win/Linux. The tests are incomplete, and I did them in a hurry,
so there may be mistakes:

(1) File sharing between Mac and Windows (samba):
It seems the samba server/client on Mac does automatic conversion between
NFD and NFC. A Mac volume mounted on Win behaves as if it is an NFC volume,
and a Win volume mounted on Mac behaves as if it is an NFD volume.
This means composing readdir() output on Windows is not necessary even
if the volume is physically an NFD volume, while it must be converted
to NFC on Mac even if the volume is physically an NFC volume.

(2) A USB flash drive (FAT format):
If mounted on a Windows box it is an NFC volume, of course, and if mounted
on Mac it behaves as if it is an NFD volume (decomposed by a driver on Mac).
So the situation is the same as (1). I believe Linux behaves similarly
to Windows.

(3) File sharing between Mac and Linux (NFS):
If a Mac volume is mounted on Linux, then no NFC/NFD conversion takes
place; it seems readdir() on Linux returns NFD filenames for the volume.
(I enabled nfsd on my Mac with the default settings. I looked into the
nfsd(8) and exports(5) man pages, but they don't mention anything about
NFC/NFD.)
This means that zsh on Linux can't complete decomposed filenames correctly.
But it seems iconv(3) on Linux doesn't support UTF-8-MAC, and I can't think
of any solution here.
I had no time to test mounting a Linux volume on Mac, but the mount_nfs(8)
man page on Mac says it has an option to convert NFD filenames on Mac
to NFC filenames on the Linux server.
I also couldn't test mounting a Mac volume on Linux via samba, but I guess
it behaves as if it is an NFC volume on Linux.

The results so far suggest that readdir() output must always be converted
to NFC on Mac.
On Linux (and maybe on Windows) no conversion is possible because iconv()
doesn't support UTF-8-MAC, but conversion is not necessary except when
mounting a Mac volume via NFS.
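
As a rough illustration of "convert readdir() output to NFC before matching", here is a small Python sketch. It is only a model of the idea, not the zsh patch: it uses standard NFC from `unicodedata` rather than an iconv UTF-8-MAC conversion, the function names `to_nfc` and `complete` are hypothetical, and the list `raw` stands in for names as an NFD volume might return them.

```python
import unicodedata

def to_nfc(names):
    """Compose every name, as one might do to readdir() output on Mac."""
    return [unicodedata.normalize("NFC", name) for name in names]

def complete(prefix, names):
    """Return the NFC-normalized names that start with the NFC-normalized prefix."""
    wanted = unicodedata.normalize("NFC", prefix)
    return [name for name in to_nfc(names) if name.startswith(wanted)]

# Hypothetical directory listing in decomposed form:
# Hangul jamo U+1100 U+1161, u + combining diaeresis, か + combining voiced mark.
raw = ["\u1100\u1161", "u\u0308ber", "\u304b\u3099"]

print(complete("\uac00", raw))   # composed 가 now matches the jamo sequence
print(complete("\u00fc", raw))   # composed ü now matches
print(complete("\u304c", raw))   # composed が now matches
print(complete("u", raw))        # plain u no longer matches
```

One consequence visible in the last line: once names are composed, typing a plain u no longer completes to über, so the accidental behavior discussed at the top of this message would go away unless completion also handles that case separately.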