From: Bart Schaefer
Date: Sat, 31 May 2014 11:47:19 -0700
To: "Zsh List Hackers'"
Cc: chet.ramey@case.edu, Kwon Yeolhyun
Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab completion
Message-id: <140531114719.ZM457@torch.brasslantern.com>
In-reply-to: <5389F394.60205@case.edu>
References: <5389F394.60205@case.edu>
List-Id: Zsh Workers List
X-Seq: 32637

Thanks for the reply, Chet.

On May 31, 11:21am, Chet Ramey wrote:
} Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab complet
}
} Your description and solution are right on the mark. Mac OS X stores and
} returns filenames in decomposed Unicode (NFD), while Mac keyboards return
} characters in precomposed Unicode (NFC).

Hrm. I'm rather surprised this hasn't broken something *else*, because
zsh is freely mixing keyboard and filesystem representations all over
the place. E.g., does globbing also fail, in at least some cases?

} What I did in bash was to convert between keyboard and file system
} representations when performing filename comparisons for filename
} completion. Zsh can do the same using iconv, which provides (on Mac
} OS X) the UTF-8-MAC encoding to do the conversion.

Unfortunately it's not isolated there.
Except for the (old, deprecated) compctl completions, zsh does all the
interesting work in shell functions with strings that may come from
glob patterns or array variables or any number of other places. Only
sometimes are those strings passed through the helper builtin that
interprets them as file names, and even then it can't possibly know
whether they originated from readdir().

Fortunately, I think it *would* be OK to use the zreaddir() wrapper to
convert everything from NFD to NFC. zreaddir() already applies zsh's
metafy() operation to all the file names, so as long as the OS properly
converts back to NFD (which it must, or we'd already be in deep doody
from throwing keyboard input at it) it should be safe to also iconv()
at this point. This should cover globbing as well as completion.

What are the configure / compile-time / run-time tests needed to detect
this situation? Are we going to run into problems with e.g. NFS or
Samba filesystems that are NOT in NFD representation? Do we need to
handle this as a general case where we should always be testing in some
way for wonky filesystems in order to normalize (e.g., a Mac FS mounted
on Linux)?
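
For illustration only, here is a minimal Python sketch of the mismatch
described above and of the normalize-on-read idea. It is not zsh code.
The names to_nfc(), listdir_nfc() and complete_prefix() are hypothetical,
and Python's standard unicodedata NFC conversion stands in for iconv's
UTF-8-MAC conversion, which may not agree with it in every corner case.

```python
import os
import unicodedata


def to_nfc(name):
    """Return name in precomposed form (NFC), the form keyboards send."""
    return unicodedata.normalize("NFC", name)


def listdir_nfc(path):
    """Hypothetical analogue of a normalizing zreaddir(): list a
    directory and convert each entry to NFC before callers see it."""
    return [to_nfc(entry) for entry in os.listdir(path)]


def complete_prefix(prefix, names):
    """Return the names (converted to NFC) that start with prefix.
    Both sides are normalized first, so the comparison does not
    depend on where either string came from."""
    wanted = to_nfc(prefix)
    return [n for n in map(to_nfc, names) if n.startswith(wanted)]


if __name__ == "__main__":
    typed = "\ud55c\uae00"  # two precomposed Hangul syllables (NFC)
    # Simulate a name as an NFD-storing filesystem would return it.
    stored = unicodedata.normalize("NFD", typed) + ".txt"

    # A raw comparison fails, which is the completion bug in question.
    print(stored.startswith(typed))
    # After normalizing, the typed prefix matches the stored name.
    print(complete_prefix(typed, [stored, "other.txt"]) == [typed + ".txt"])
```

Normalizing once where names are read, rather than at each comparison,
is what would let globbing and completion share the fix. The sketch says
nothing about how to detect which filesystems need it.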