zsh-workers
From: Daniel Shahaf <d.s@daniel.shahaf.name>
To: Kwon Yeolhyun <yeolhyunkwon@me.com>
Cc: Zsh List Hackers' <zsh-workers@zsh.org>
Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab completion
Date: Sun, 1 Jun 2014 16:53:47 +0000
Message-ID: <20140601165347.GB1965@tarsus.local2>
In-Reply-To: <E7EE4668-E047-46AE-A50A-A03F66ACE295@me.com>

Kwon Yeolhyun wrote on Sun, Jun 01, 2014 at 14:30:03 +0900:
> 
> On Jun 1, 2014, at 11:25 AM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> 
> > Bart Schaefer wrote on Sat, May 31, 2014 at 14:29:26 -0700:
> >> On May 31,  8:16pm, Peter Stephenson wrote:
> >> }
> >> } I'm currently wondering if there is scope for normalising keyboard input
> >> } really early --- before we feed it back to the shell --- and turning it
> >> } back into the usual keyboard form right at the end
> >> 
> >> Per thread with Chet, I think normalizing the filesystem is the easier
> >> way to go.  Keyboard input is already as close to normalized as it needs
> >> to be, I think, and with only a couple of exceptions all the names we
> >> get from the filesystem come through zreaddir().
> > 
> > What about, say, people doing 'ls' and copy-pasting a filename from the
> > output into a command line?  Wouldn't that result in NFD keyboard
> > input?
> > 
> > FWIW, while OS X always returns NFD filenames, one could also imagine an
> > OS that is normalization-aware (forbids creating a file if its
> > normalized name is the same as the normalized name of an existing file)
> > but octet-sequence-preserving, and on such an OS both the readdir()
> > output and the user input would need to be normalized.
> > 
> > Also, other unixes allow you to have both the NFC-form and NFD-form in
> > the same directory, e.g., 'touch fooá fooá' works just fine on linux
> > ext4 (the first filename is composed, the second decomposed); in such
> > cases normalization magic should not be done.
> > 
> > Fun! :-)
> > 
> > Daniel
> 
> Fortunately, I think Mac OS X can handle input in decomposed or composed form.

Yes, AFAIK, OS X accepts input in any normalization and returns
NFD-normalized filenames.
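
To make the distinction concrete: the composed and decomposed spellings of a
name are different code point (and octet) sequences, but they normalize to the
same string. A minimal sketch using Python's standard unicodedata module,
purely for illustration (zsh itself would do this in C):

```python
import unicodedata

# Two spellings of "fooá", as in the ext4 example above:
composed = "foo\u00e1"      # U+00E1 LATIN SMALL LETTER A WITH ACUTE
decomposed = "fooa\u0301"   # 'a' followed by U+0301 COMBINING ACUTE ACCENT

# Different code point and UTF-8 octet sequences, so a byte-wise
# comparison treats them as two distinct names...
print(composed == decomposed)                             # False
print(len(composed.encode()), len(decomposed.encode()))   # 5 6

# ...but they become identical once both are in the same normal form.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

# A precomposed Hangul syllable (U+D55C) decomposes into three conjoining
# jamo, the kind of form OS X hands back from readdir().
print([hex(ord(c)) for c in unicodedata.normalize("NFD", "\ud55c")])
# ['0x1112', '0x1161', '0x11ab']
```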

> So I think we can convert decomposed filenames into composed form after readdir(). That will work at least for Korean.

That would work if the input is in NFC.
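
A minimal sketch of that approach, assuming Python's unicodedata as a
stand-in for whatever zsh would use internally: compose each name returned
by readdir() to NFC, and normalize the typed prefix as well so that NFD
input (e.g. pasted from 'ls' output) still matches. matching_names() is a
hypothetical helper for illustration, not an existing zsh function.

```python
import unicodedata

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

def matching_names(names, typed_prefix):
    """Hypothetical completion helper: compose each readdir() name to NFC
    and keep those starting with the (also NFC-normalized) typed prefix."""
    prefix = nfc(typed_prefix)
    return [nfc(name) for name in names if nfc(name).startswith(prefix)]

# Directory entries as OS X would return them: "한글.txt" in NFD.
on_disk = [unicodedata.normalize("NFD", "\ud55c\uae00.txt"), "readme.txt"]

print(matching_names(on_disk, "\ud55c"))              # NFC keyboard input
print(matching_names(on_disk, "\u1112\u1161\u11ab"))  # NFD input, e.g. pasted
```

Both calls return the composed name. Note that on a filesystem that keeps
the NFC and NFD spellings as two distinct entries (the ext4 case above), this
would collapse them into one candidate, so it only makes sense where the
filesystem itself treats the two forms as the same name.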

> Detecting, composing, and decomposing hangul can be done easily.

It is easy to convert any Unicode string to NFC or to NFD, not just
strings consisting of Hangul codepoints.
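
For Hangul specifically, composition and decomposition are purely arithmetic
(the algorithm described in the Unicode Standard, section 3.12, "Conjoining
Jamo Behavior"). A sketch in Python, checked against unicodedata; for other
scripts one would use a full normalizer such as unicodedata.normalize() or
ICU rather than hand-rolled code.

```python
import unicodedata

# Constants from the Unicode Hangul syllable algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588
S_COUNT = L_COUNT * N_COUNT   # 11172

def decompose_hangul(ch: str) -> str:
    """Split one precomposed syllable into leading/vowel/(trailing) jamo."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed Hangul syllable; leave it alone
    out = chr(L_BASE + s_index // N_COUNT)
    out += chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    if t_index:
        out += chr(T_BASE + t_index)
    return out

def compose_hangul(s: str) -> str:
    """Recombine L+V jamo into LV syllables and LV+T into LVT syllables."""
    out = []
    for ch in s:
        cp = ord(ch)
        if out:
            last = ord(out[-1])
            l_index, v_index = last - L_BASE, cp - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            s_index, t_index = last - S_BASE, cp - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                out[-1] = chr(last + t_index)
                continue
        out.append(ch)
    return "".join(out)

word = "\ud55c\uae00"  # "한글"
decomposed = "".join(decompose_hangul(c) for c in word)
print(decomposed == unicodedata.normalize("NFD", word))  # True
print(compose_hangul(decomposed) == word)                 # True
```

compose_hangul() only touches conjoining jamo; other combining sequences,
such as 'a' + U+0301, pass through unchanged, which is why a general
normalizer is still needed for non-Korean filenames.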

Cheers,

Daniel

