List-Id: Zsh Workers List
X-Seq: 32651
Date: Sun, 1 Jun 2014 16:53:47 +0000
From: Daniel Shahaf
To: Kwon Yeolhyun
Cc: Zsh List Hackers'
Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab completion
Message-ID: <20140601165347.GB1965@tarsus.local2>
References: <20140531201617.4ca60ab8@pws-pc.ntlworld.com>
 <140531142926.ZM556@torch.brasslantern.com>
 <20140601022527.GD1820@tarsus.local2>
User-Agent: Mutt/1.5.21 (2010-09-15)

Kwon Yeolhyun wrote on Sun, Jun 01, 2014 at 14:30:03 +0900:
> 
> On Jun 1, 2014, at 11:25 AM, Daniel Shahaf wrote:
> 
> > Bart Schaefer wrote on Sat, May 31, 2014 at 14:29:26 -0700:
> >> On May 31, 8:16pm, Peter Stephenson wrote:
> >> }
> >> } I'm currently wondering if there is scope for normalising keyboard input
> >> } really early --- before we feed it back to the shell --- and turning it
> >> } back into the usual keyboard form right at the end
> >>
> >> Per thread with Chet, I think normalizing the filesystem is the easier
> >> way to go. Keyboard input is already as close to normalized as it needs
> >> to be, I think, and with only a couple of exceptions all the names we
> >> get from the filesystem come through zreaddir().
> > 
> > What about, say, people doing 'ls' and copy-pasting a filename from the
> > output into a command line? Wouldn't that result in NFD keyboard
> > input?
> > 
> > FWIW, while OS X always returns NFD filenames, one could also imagine an
> > OS that is normalization-aware (forbids creating a file if its
> > normalized name is the same as the normalized name of an existing file)
> > but octet-sequence-preserving, and on such an OS both the readdir()
> > output and the user input would need to be normalized.
> > 
> > Also, other unixes allow you to have both the NFC-form and NFD-form in
> > the same directory, e.g., 'touch fooá fooá' works just fine on linux
> > ext4 (the first filename is composed, the second decomposed); in such
> > cases normalization magic should not be done.
> > 
> > Fun! :-)
> > 
> > Daniel
> 
> Fortunately, I think Mac OS X can handle input in decomposed or composed form.

Yes, AFAIK, OS X accepts input in any normalization and returns
NFD-normalized filenames.

> So I think we can convert decomposed filenames into composed after readdir. It will work at least for Korean.

That would work if the input is in NFC.

> Detecting, composing, and decomposing hangul can be done easily.
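Hangul is indeed the easy case: the Unicode Standard defines composition and decomposition of precomposed Hangul syllables arithmetically (Conjoining Jamo Behavior), so no lookup tables are needed. What follows is a minimal Python sketch of that arithmetic, not zsh code; compose_hangul() and decompose_hangul() are hypothetical names. It deliberately handles only Hangul syllables and conjoining jamo and passes every other code point through unchanged, so 'a' followed by U+0301 stays decomposed.

```python
# Constants from the Unicode Standard's Hangul syllable arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_hangul(text: str) -> str:
    """Hypothetical helper: split precomposed Hangul syllables into
    conjoining jamo (L V or L V T); other code points pass through."""
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if not 0 <= s_index < S_COUNT:
            out.append(ch)
            continue
        out.append(chr(L_BASE + s_index // N_COUNT))
        out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))
        t_index = s_index % T_COUNT
        if t_index:  # 0 means "no trailing consonant"
            out.append(chr(T_BASE + t_index))
    return "".join(out)


def compose_hangul(text: str) -> str:
    """Hypothetical helper: merge L+V and LV+T jamo pairs back into
    precomposed syllables; other code points pass through."""
    out = []
    for ch in text:
        if out:
            last = ord(out[-1])
            l_index = last - L_BASE
            v_index = ord(ch) - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                # leading consonant + vowel -> LV syllable
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            s_index = last - S_BASE
            t_index = ord(ch) - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                # LV syllable + trailing consonant -> LVT syllable
                out[-1] = chr(last + t_index)
                continue
        out.append(ch)
    return "".join(out)


word = "\ud55c\uad6d\uc5b4"  # 한국어
jamo = decompose_hangul(word)
print(len(word), len(jamo))          # 3 8
print(compose_hangul(jamo) == word)  # True
```

The three syllables of 한국어 decompose into eight jamo and compose back to the original string; other scripts would need the canonical composition data instead of this arithmetic.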

It is easy to convert any Unicode string to NFC or to NFD, not just
strings consisting of Hangul codepoints.

Cheers,

Daniel
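The general case needs the full Unicode composition data, but that ships with most platforms; Python's standard unicodedata.normalize(), for instance, implements NFC and NFD for every code point. The sketch below assumes a hypothetical compose_listing() helper that NFC-composes names the way one might after readdir(), and backs off when composing would merge two distinct entries, as in the ext4 'touch fooá fooá' case quoted above. names_match(), also hypothetical, normalizes both sides, which is what copy-pasted NFD input or a normalization-aware but octet-preserving filesystem would call for.

```python
import unicodedata


def compose_listing(names):
    """Hypothetical helper: NFC-compose directory entries as one might after
    readdir(), unless that would make two distinct entries identical
    (e.g. ext4 holding both the NFC and the NFD spelling of one name)."""
    composed = [unicodedata.normalize("NFC", name) for name in names]
    if len(set(composed)) < len(set(names)):
        # Normalization would merge distinct files: leave the names untouched.
        return list(names)
    return composed


def names_match(typed, on_disk):
    """Hypothetical helper: compare a typed (or pasted) name with a
    directory entry after bringing both to NFC."""
    return unicodedata.normalize("NFC", typed) == unicodedata.normalize("NFC", on_disk)


nfc = "foo\u00e1"        # 'a' with acute as one code point, U+00E1
nfd = "foo\u0061\u0301"  # 'a' followed by U+0301 COMBINING ACUTE ACCENT

print(nfc == nfd)                                 # False: different code points
print(names_match(nfd, nfc))                      # True
print(compose_listing([nfd, "bar"]) == [nfc, "bar"])  # True: NFD name composed
print(compose_listing([nfc, nfd]) == [nfc, nfd])      # True: collision, left alone
```

The back-off in compose_listing() is one possible policy for the "normalization magic should not be done" case; a real implementation could equally choose to compose per-entry and warn.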