From: Bart Schaefer
Message-id: <140601005624.ZM3283@torch.brasslantern.com>
Date: Sun, 01 Jun 2014 00:56:24 -0700
In-reply-to: <20140601022527.GD1820@tarsus.local2>
References: <20140531201617.4ca60ab8@pws-pc.ntlworld.com> <140531142926.ZM556@torch.brasslantern.com> <20140601022527.GD1820@tarsus.local2>
List-Id: Zsh Workers List
To: "Zsh List Hackers'"
Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab completion

On Jun 1, 2:25am, Daniel Shahaf wrote:
}
} What about, say, people doing 'ls' and copy-pasting a filename from the
} output into a command line?  Wouldn't that result in NFD keyboard
} input?

Yes, but there's only so far that it makes sense to go with this.  For
example, [[ fooá = fooá ]] arguably should not normalize, and script
file contents should not be normalized, etc.  I think messing with the
command input stream will create more problems than it solves.

What we *might* need is for patcompile() also to normalize (though that
potentially violates what I just said about [[ ... ]], depending on
which encoding is the pattern and which is the string to be matched).
Maybe this needs to be part of the (#u) qualifier handling, or a related
new qualifier.  (Note there's little to no existing support for wide
characters in e.g. matcher-list range specifications, so no point in
going there yet.)

} FWIW, while OS X always returns NFD filenames, one could also imagine an
} OS that is normalization-aware (forbids creating a file if its
} normalized name is the same as the normalized name of an existing file)
} but octet-sequence-preserving, and on such an OS both the readdir()
} output and the user input would need to be normalized.

This case is ultimately the same as your first example.  Either the two
forms of name should be treated the same, in which case normalizing the
results of readdir() is enough, or they should be treated as different
even though you aren't allowed to create both of them, in which case
they should not be normalized at all (and then there better be some way
outside the shell, e.g., at the TTY driver layer, to choose the input
encoding).

Maybe the completion system should use (#u) more often, or maybe there
needs to be a setopt to cause all patterns to act as if (#u) ...

If there's a tricky bit, it's knowing which encoding is the default for
input so you can normalize to that one.

} Also, other unixes allow you to have both the NFC-form and NFD-form in
} the same directory, e.g., 'touch fooá fooá' works just fine on linux
} ext4 (the first filename is composed, the second decomposed); in such
} cases normalization magic should not be done.

Hence my question about what compile-time tests we need for this, and
what if anything to do about Mac filesystems mounted on Linux.
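
To make the two cases concrete, here is a rough sketch of the comparison
being discussed.  It is Python rather than anything in the zsh source,
purely because Python's standard library exposes normalization directly.
The function names (matches, has_collisions) and the choice of NFC as the
target form are assumptions for illustration only; nothing here reflects
how patcompile() or the completion code actually works.

```python
import unicodedata

def matches(typed, names, form="NFC"):
    """Return the raw directory entries whose normalized form starts with
    the normalized input.  The raw names are returned untouched, so the
    octets handed back to the filesystem are exactly what it reported."""
    prefix = unicodedata.normalize(form, typed)
    return [raw for raw in names
            if unicodedata.normalize(form, raw).startswith(prefix)]

def has_collisions(names, form="NFC"):
    """True if two distinct raw names collapse to the same normalized
    name, i.e. the directory holds both forms and normalizing would
    make them indistinguishable."""
    seen = {}
    for raw in names:
        key = unicodedata.normalize(form, raw)
        if key in seen and seen[key] != raw:
            return True
        seen[key] = raw
    return False

composed = "foo\u00e1"      # NFC: precomposed a-acute
decomposed = "fooa\u0301"   # NFD: 'a' followed by a combining acute accent

# OS X-like case: the listing is NFD, the typed input is NFC.
print(matches(composed, [decomposed]) == [decomposed])

# ext4-like case: both forms exist side by side in one directory.
print(has_collisions([composed, decomposed]))
```

The first check corresponds to "normalizing the results of readdir() is
enough"; the second is one possible run-time signal that the filesystem
treats the two forms as different names, in which case normalizing would
hide one of them.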