From: Chet Ramey
Reply-To: chet.ramey@case.edu
Organization: ITS, Case Western Reserve University
Date: Sat, 31 May 2014 11:21:56 -0400
To: Kwon Yeolhyun
CC: "Zsh List Hackers'", chet.ramey@case.edu
Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab completion
Message-ID: <5389F394.60205@case.edu>
List-Id: Zsh Workers List
X-Seq: 32636

On 5/30/14, 11:56 PM, Kwon Yeolhyun wrote:
> I have to work with lots of files with Korean names.
> The problem is that zsh fails at tab completion for these files.
> So I’ve done some research to figure out what’s going on, and I found keywords such as Unicode, normalization form, Mac OS X, and decomposition.
> I also searched the mailing list and read some threads related to Unicode or multibyte support.
> But I can’t find any solution.
>
> I’m not an expert on Unicode, zsh, or Mac OS X, so I’m asking for your help.

Your description and solution are right on the mark. Mac OS X stores and returns filenames in decomposed Unicode (NFD), while Mac keyboards return characters in precomposed Unicode (NFC).
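To make the mismatch concrete, here is a small sketch in Python (a language chosen for illustration only; nothing in this thread uses it). It assumes that Python's standard `unicodedata` NFD output is a fair stand-in for what the Mac OS X file system returns for Hangul syllables; Apple's decomposition variant differs from standard NFD in some details. The same two-syllable name is one string in NFC and a different, longer string in NFD, so a plain comparison fails until both sides are in the same form.

```python
import unicodedata

# "한글" as typed on a Mac keyboard: two precomposed (NFC) syllables.
typed = unicodedata.normalize("NFC", "한글")
# The same name as the file system is assumed to return it: each syllable
# decomposed (NFD) into conjoining jamo, three codepoints per syllable here.
stored = unicodedata.normalize("NFD", typed)

print(" ".join(f"U+{ord(c):04X}" for c in typed))    # U+D55C U+AE00
print(" ".join(f"U+{ord(c):04X}" for c in stored))   # U+1112 U+1161 U+11AB U+1100 U+1173 U+11AF
print(typed == stored)                               # False: a literal match fails
print(typed == unicodedata.normalize("NFC", stored)) # True once both sides are NFC
```

This is exactly why completion finds nothing: the typed prefix and the directory entry are different codepoint sequences even though they render identically.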
Decomposed Unicode is as you describe: certain characters are `decomposed' into multiple codepoints. (My use of NFD and NFC is not exact, but it's useful shorthand.)

What I did in bash was to convert between the keyboard and file system representations when performing filename comparisons for filename completion. Zsh can do the same using iconv, which (on Mac OS X) provides the UTF-8-MAC encoding to do the conversion.

One possible strategy is to convert each filename to NFC for comparison, something like the following:

1. Keyboard input stays in NFC and is converted (dequoted, for example) to a `raw' form for comparison.

2. Read the directory, assume each name will be returned in NFD, and convert each name to NFC.

3. Perform the comparison using whatever strategy you'd like (e.g., taking case into account, mapping equivalent characters, whatever).

4. If the comparison succeeds, add the matching filename (NFC) to the list of completions.

5. If you have to add the filename to the command line (e.g., there is a single match), you have already converted it to NFC and can insert it directly.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/
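The five-step strategy above can be sketched in Python as follows. This is an illustration of the order of operations only, not bash's or zsh's code: `complete_filename` and `to_nfc` are hypothetical helpers, `unicodedata.normalize` stands in for the iconv UTF-8-MAC conversion a shell would actually use, and the directory listing is simulated (assumed to arrive in NFD) so the example runs anywhere.

```python
import unicodedata
from typing import Iterable, List


def to_nfc(name: str) -> str:
    """Return `name` in precomposed (NFC) form."""
    return unicodedata.normalize("NFC", name)


def complete_filename(word: str, entries: Iterable[str]) -> List[str]:
    """Hypothetical completion helper following the five steps above.

    `word` is the already-dequoted keyboard input (step 1); `entries` stands in
    for the names read from a directory, assumed to arrive in NFD (step 2).
    """
    key = to_nfc(word)                 # step 1: input is NFC already; this is a safeguard
    matches = []
    for entry in entries:
        candidate = to_nfc(entry)      # step 2: convert each directory entry to NFC
        if candidate.startswith(key):  # step 3: plain, case-sensitive prefix match
            matches.append(candidate)  # step 4: record the NFC spelling
    return sorted(matches)


if __name__ == "__main__":
    # Simulated directory listing, decomposed the way Mac OS X is assumed to return it.
    listing = [unicodedata.normalize("NFD", n) for n in ("한글.txt", "한국.txt", "notes.txt")]
    typed = to_nfc("한")  # what the keyboard delivers (NFC)

    # A naive comparison against the raw NFD names finds nothing.
    print([n for n in listing if n.startswith(typed)])
    # Comparing in NFC finds both Korean files.
    print(complete_filename(typed, listing))
    # Step 5: a single NFC match could be inserted on the command line as-is.
```

Step 3 is deliberately the simplest possible comparison; case folding or mapping of equivalent characters would replace the `startswith` test. Because the returned names are already NFC, nothing further is needed before inserting a unique match.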