Date: Sun, 1 Jun 2014 16:46:34 +0000
From: Daniel Shahaf
To: Zsh List Hackers'
Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab completion
Message-ID: <20140601164634.GA1965@tarsus.local2>
In-Reply-To: <140601005624.ZM3283@torch.brasslantern.com>

Bart Schaefer wrote on Sun, Jun 01, 2014 at 00:56:24 -0700:
> On Jun 1, 2:25am, Daniel Shahaf wrote:
> } FWIW, while OS X always returns NFD filenames, one could also imagine an
> } OS that is normalization-aware (forbids creating a file if its
> } normalized name is the same as the normalized name of an existing file)
> } but octet-sequence-preserving, and on such an OS both the readdir()
> } output and the user input would need to be normalized.
>
> This case is ultimately the same as your first example. Either the two
> forms of name should be treated the same, in which case normalizing the
> results of readdir() is enough, or they should be treated as different
> even though you aren't allowed to create both of them, in which case
> they should not be normalized at all (and then there better be some way
> outside the shell, e.g., at the TTY driver layer, to choose the input
> encoding).
>
> Maybe the completion system should use (#u) more often, or maybe there
> needs to be a setopt to cause all patterns to act as if (#u) ...
>
> If there's a tricky bit, it's knowing which encoding is the default for
> input so you can normalize to that one.

Well, sure, if the user input is normalized to NFC before it hits zsh,
then the problem is simpler: either convert the input NFC->NFD or
convert the readdir() output NFD->NFC. I was trying to solve the more
general problem of matching non-normalized readdir() output to
non-normalized user input; perhaps that would be overkill.

Daniel
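
To make the general case concrete (non-normalized readdir() output
matched against non-normalized user input), here is a minimal Python
sketch. It is not zsh code and not a proposed patch. The function name
complete(), the sample file names, and the choice of NFC as the default
comparison form are all assumptions made for illustration. Both sides
are normalized only for the comparison, so the names come back
octet-for-octet as the directory listing supplied them.

```python
import unicodedata


def complete(prefix, names, form="NFC"):
    """Return the names whose normalized form starts with the normalized prefix.

    Both sides are normalized only for the comparison; the names are
    returned exactly as they were passed in.
    """
    want = unicodedata.normalize(form, prefix)
    return [n for n in names if unicodedata.normalize(form, n).startswith(want)]


# Hypothetical directory listing: one name stored decomposed (NFD), as OS X
# would return it, and one plain ASCII name.
nfc_name = "\ud55c\uae00.txt"                      # U+D55C U+AE00, precomposed
nfd_name = unicodedata.normalize("NFD", nfc_name)  # same text as conjoining jamo
listing = [nfd_name, "notes.txt"]

# Hypothetical user input: the first syllable, typed in precomposed form.
matches = complete("\ud55c", listing)
```

The comparison form is not a neutral choice for prefix matching. With
Hangul, the syllable U+D558 is a prefix of U+D55C after NFD (the latter
only adds a trailing consonant jamo) but not after NFC, so
complete("\ud558", listing, form="NFD") finds the file while the NFC
variant does not.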