From: Chet Ramey <chet.ramey@case.edu>
To: Kwon Yeolhyun <yeolhyunkwon@me.com>
Cc: "Zsh List Hackers'" <zsh-workers@zsh.org>, chet.ramey@case.edu
Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab completion
Date: Sat, 31 May 2014 11:21:56 -0400
Message-ID: <5389F394.60205@case.edu>
In-Reply-To: <AB81F9FB-8D84-4656-9EFE-F2F98B196861@me.com>

On 5/30/14, 11:56 PM, Kwon Yeolhyun wrote:
> I have to work with lots of files with Korean names.
> But the problem is that zsh fails at tab completion with Korean filenames.
> So I’ve done some research to figure out what’s going on, and I found some keywords such as Unicode, normalization form, Mac OS X, and decomposition.
> I also searched the mailing list and read some threads related to Unicode or multibyte support.
> But I couldn’t find any solution.
> 
> I’m not an expert on Unicode, zsh, or Mac OS X, so I’m asking for your help.

Your description and solution are right on the mark.  Mac OS X stores and
returns filenames in decomposed Unicode (NFD), while Mac keyboards return
characters in precomposed Unicode (NFC).  Decomposed Unicode is as you
describe: certain characters are `decomposed' into multiple codepoints.
(My use of NFD and NFC is not exact, but it's useful shorthand.)
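
As a minimal illustration of the difference, here is a sketch in Python.
It assumes the standard NFC and NFD forms from Python's unicodedata module
as a stand-in for what the keyboard and the file system produce (the file
system's form is only approximately NFD), and the Hangul syllable is just
an example.

```python
import unicodedata

# One precomposed Hangul syllable (U+D55C), as a keyboard would deliver it.
composed = "\ud55c"

# The decomposed form splits the syllable into its conjoining jamo.
decomposed = unicodedata.normalize("NFD", composed)

# The two strings display identically but do not compare equal.
print(len(composed), len(decomposed))
print(composed == decomposed)
print([hex(ord(ch)) for ch in decomposed])

# Converting back to the precomposed form restores the original string.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)
```

A byte-for-byte or codepoint-for-codepoint comparison between what was typed
and what the directory returns therefore fails unless one side is converted.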

What I did in bash was to convert between keyboard and file system
representations when performing filename comparisons for filename
completion.  Zsh can do the same using iconv, which provides (on Mac
OS X) the UTF-8-MAC encoding to do the conversion.

One possible strategy is to convert each filename to NFC for comparison,
along the following lines.

1.  Keyboard input stays in NFC and is converted (dequoted, for example)
    to a `raw' form for comparison.

2.  Read the directory, assume each name is returned in NFD, and convert
    each name to NFC.

3.  Perform the comparison using whatever strategy you'd like (e.g.,
    taking case into account or mapping equivalent characters).

4.  If the comparison succeeds, add the matching filename (NFC) to the
    list of completions.

5.  If you have to add the filename to the command line (e.g., there is a
    single match), you have already converted it to NFC and can insert it
    directly.
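
The steps above can be sketched roughly as follows. This is only an
illustration in Python, not the bash or zsh code: complete_filename is a
hypothetical helper, the directory listing is simulated with a list of
NFD strings, dequoting of the typed word is omitted, and a plain prefix
test stands in for the comparison in step 3.

```python
import unicodedata


def complete_filename(typed_prefix, directory_entries):
    """Return the NFC forms of the entries that start with typed_prefix.

    typed_prefix is assumed to be in NFC already (step 1); the entries
    are assumed to arrive in NFD, as a directory read would return them.
    """
    matches = []
    for entry in directory_entries:
        # Step 2: convert the name from the file system form to NFC.
        candidate = unicodedata.normalize("NFC", entry)
        # Step 3: compare; a simple prefix test is used in this sketch.
        if candidate.startswith(typed_prefix):
            # Step 4: keep the NFC form, ready for insertion (step 5).
            matches.append(candidate)
    return matches


# Simulated directory listing: names stored in decomposed form.
names = ["\ud55c\uae00.txt", "\ud55c\uad6d.txt", "notes.txt"]
entries = [unicodedata.normalize("NFD", name) for name in names]

typed = "\ud55c"  # precomposed syllable, as typed on the keyboard

# Without conversion, nothing matches the typed prefix.
raw_matches = [entry for entry in entries if entry.startswith(typed)]

# With conversion, both Korean names match and come back in NFC.
matches = complete_filename(typed, entries)
print(len(raw_matches), len(matches))
```

Because the matches are kept in NFC, a single match can be inserted on the
command line without a further conversion.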

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/

