zsh-workers
* Unicode, Korean, normalization form, Mac OS X and tab completion
@ 2014-05-31  3:56 Kwon Yeolhyun
  2014-05-31 15:21 ` Chet Ramey
  2014-05-31 19:16 ` Peter Stephenson
  0 siblings, 2 replies; 31+ messages in thread
From: Kwon Yeolhyun @ 2014-05-31  3:56 UTC (permalink / raw)
  To: Zsh List Hackers'

[-- Attachment #1: Type: text/plain, Size: 2522 bytes --]

I have to work with lots of files with Korean names.
The problem is that zsh fails at tab completion for files with Korean names.
So I did some research to figure out what’s going on and found some keywords such as Unicode, normalization form, Mac OS X, and decomposition.
I also searched the mailing list and read some threads related to Unicode or multibyte support.
But I couldn’t find any solution.

I’m not an expert on Unicode, zsh, or Mac OS X, so I’m asking for your help.

Here’s my description of the issue.

1) The Unicode spec defines normalization forms, which are related to canonical equivalence when comparing two Unicode strings.
2) Some normalization forms decompose a character into its components.
    For example, Å(alphabet A with a ring above) -> A(alphabet A) + ˚(ring above) or 가(hangul syllable ga) -> ㄱ(hangul choseong gieuk) + ㅏ(hangul jungseong ah)
3) A Korean letter, a.k.a. hangul, has three parts: choseong, jungseong, jongseong. For example, 가 is decomposed into the choseong, ㄱ, and the jungseong, ㅏ.
    And 각 can be broken down into ㄱ,ㅏ,ㄱ(the jongseong).
4) Mac OS X uses normalized strings as filenames. Assuming there’s a file with the name of 가나다, it has the name of ㄱㅏㄴㅏㄷㅏ(decomposed into hangul jamos) internally. (Link to hangul jamos: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=4352&number=1024 )
5) I guess the reason tab completion fails is that zsh compares the user input, 가나다, with the filename, ㄱㅏㄴㅏㄷㅏ.
    가나다 and ㄱㅏㄴㅏㄷㅏ are canonically equivalent but have different binary representations.
6) I would argue that comparing two Unicode strings must be done with respect to canonical equivalence.
7) The Unicode spec has a dedicated section on hangul syllables. Fortunately, hangul can be decomposed and composed algorithmically.
( Please refer to the Unicode spec section 3.12 under "Parsing" http://www.unicode.org/faq/specifications.html )
8) On Ubuntu, tab completion works perfectly. Currently, this issue is restricted to Mac OS X. (I’ve never tested on other platforms.)
9) I think this is related to the COMBINING_CHARS option, but that option does not handle hangul.
10) At the moment, the latest version of bash is the only shell with a working tab completion feature on Mac OS X.
11) ‘Hangul’ is the name of the Korean alphabet. If you are interested in it, please refer to http://en.wikipedia.org/wiki/Hangul
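
As an illustration of item 7, here is a minimal sketch in C of the arithmetic decomposition and composition of hangul syllables, using the constants from the Unicode Standard's hangul syllable algorithm. The function names hangul_decompose and hangul_compose are made up for this example and are not part of zsh or any library. Canonical decomposition yields conjoining jamo (U+1100 and up, the block linked in item 4).

```c
#include <stdint.h>

/* Constants from the Unicode Standard's hangul syllable algorithm. */
enum {
    S_BASE = 0xAC00, L_BASE = 0x1100, V_BASE = 0x1161, T_BASE = 0x11A7,
    L_COUNT = 19, V_COUNT = 21, T_COUNT = 28,
    N_COUNT = V_COUNT * T_COUNT,   /* 588 */
    S_COUNT = L_COUNT * N_COUNT    /* 11172 */
};

/* Decompose one precomposed syllable into 2 or 3 conjoining jamo.
 * Returns the number of code points written to out[], or 0 if s is
 * not a precomposed hangul syllable. */
int hangul_decompose(uint32_t s, uint32_t out[3])
{
    if (s < S_BASE || s >= S_BASE + S_COUNT)
        return 0;
    uint32_t index = s - S_BASE;
    uint32_t t = index % T_COUNT;
    out[0] = L_BASE + index / N_COUNT;              /* choseong  */
    out[1] = V_BASE + (index % N_COUNT) / T_COUNT;  /* jungseong */
    if (t == 0)
        return 2;
    out[2] = T_BASE + t;                            /* jongseong */
    return 3;
}

/* Compose jamo back into a syllable; pass t == 0 when there is no
 * jongseong. Returns 0 if the inputs are not composable jamo. */
uint32_t hangul_compose(uint32_t l, uint32_t v, uint32_t t)
{
    if (l < L_BASE || l >= L_BASE + L_COUNT)
        return 0;
    if (v < V_BASE || v >= V_BASE + V_COUNT)
        return 0;
    if (t != 0 && (t <= T_BASE || t >= T_BASE + T_COUNT))
        return 0;
    uint32_t t_index = t ? t - T_BASE : 0;
    return S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT + t_index;
}
```

With these definitions, 가 (U+AC00) decomposes to U+1100 U+1161, and 한 (U+D55C) to U+1112 U+1161 U+11AB; composing those jamo gives the original syllables back.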

Thanks for reading.

[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]


* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-05-31  3:56 Unicode, Korean, normalization form, Mac OS X and tab completion Kwon Yeolhyun
@ 2014-05-31 15:21 ` Chet Ramey
  2014-05-31 18:47   ` Bart Schaefer
  2014-05-31 19:16 ` Peter Stephenson
  1 sibling, 1 reply; 31+ messages in thread
From: Chet Ramey @ 2014-05-31 15:21 UTC (permalink / raw)
  To: Kwon Yeolhyun; +Cc: Zsh List Hackers', chet.ramey

On 5/30/14, 11:56 PM, Kwon Yeolhyun wrote:
> I have to work with lots of files with Korean names.
> The problem is that zsh fails at tab completion for files with Korean names.
> So I did some research to figure out what’s going on and found some keywords such as Unicode, normalization form, Mac OS X, and decomposition.
> I also searched the mailing list and read some threads related to Unicode or multibyte support.
> But I couldn’t find any solution.
> 
> I’m not an expert on Unicode, zsh, or Mac OS X, so I’m asking for your help.

Your description and solution are right on the mark.  Mac OS X stores and
returns filenames in decomposed Unicode (NFD), while Mac keyboards return
characters in precomposed Unicode (NFC).  Decomposed Unicode is as you
describe: certain characters are `decomposed' into multiple codepoints.
(My use of NFD and NFC is not exact, but it's useful shorthand.)

What I did in bash was to convert between keyboard and file system
representations when performing filename comparisons for filename
completion.  Zsh can do the same using iconv, which provides (on Mac
OS X) the UTF-8-MAC encoding to do the conversion.

One possible strategy is to convert each filename to NFC for comparison,
something like the following.

1.  Keyboard input stays in NFC and is converted (dequoted, for example)
    to a `raw' form for comparison.

2.  Read directory, assume each name will be returned in NFD, convert
    name to NFC.

3.  Perform comparison using whatever strategy you'd like (e.g., taking
    case into account, mapping equivalent characters, whatever)

4.  If the comparison succeeds, add the matching filename (NFC) to the
    list of completions.

5.  If you have to add the filename to the command line (e.g., there is a
    single match), you have already converted it to NFC and can insert it
    directly.
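
The five steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not bash's or zsh's actual code: complete_names, to_nfc_fn, and toy_to_nfc are hypothetical names, and the toy normalizer composes only "u" + U+0308, where a real shell would call a full NFD-to-NFC conversion.

```c
#include <stddef.h>
#include <string.h>

#define NAME_MAX_BYTES 1024

/* Hypothetical normalizer signature: write the NFC form of `in` to `out`
 * (capacity `outsz`, including the NUL); return 0 on success. */
typedef int (*to_nfc_fn)(const char *in, char *out, size_t outsz);

/* Toy stand-in: only composes "u" + U+0308 (bytes 75 CC 88) into
 * U+00FC (bytes C3 BC). Everything else is copied unchanged. */
int toy_to_nfc(const char *in, char *out, size_t outsz)
{
    size_t o = 0;
    if (outsz == 0)
        return -1;
    while (*in) {
        if (in[0] == 'u' && (unsigned char)in[1] == 0xCC &&
            (unsigned char)in[2] == 0x88) {
            if (o + 2 >= outsz)
                return -1;
            out[o++] = (char)0xC3;
            out[o++] = (char)0xBC;
            in += 3;
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return 0;
}

/* Steps 2-4: normalize each directory entry to NFC, compare it with the
 * typed prefix (assumed to be NFC already), and collect the NFC names.
 * Returns the number of matches stored in matches[]. */
size_t complete_names(const char *typed_nfc,
                      const char *const *dir_names, size_t n_names,
                      to_nfc_fn to_nfc,
                      char matches[][NAME_MAX_BYTES], size_t max_matches)
{
    size_t found = 0, plen = strlen(typed_nfc);
    char nfc[NAME_MAX_BYTES];

    for (size_t i = 0; i < n_names && found < max_matches; i++) {
        if (to_nfc(dir_names[i], nfc, sizeof nfc) != 0)
            continue;   /* conversion failed: this sketch skips the entry */
        if (strncmp(nfc, typed_nfc, plen) == 0)
            strcpy(matches[found++], nfc);
    }
    return found;
}
```

Because the collected names are already in NFC, step 5 needs no further conversion before inserting a match on the command line.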

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/



* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-05-31 15:21 ` Chet Ramey
@ 2014-05-31 18:47   ` Bart Schaefer
  0 siblings, 0 replies; 31+ messages in thread
From: Bart Schaefer @ 2014-05-31 18:47 UTC (permalink / raw)
  To: Zsh List Hackers'; +Cc: chet.ramey, Kwon Yeolhyun

Thanks for the reply, Chet.

On May 31, 11:21am, Chet Ramey wrote:
} Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab complet
}
} Your description and solution are right on the mark.  Mac OS X stores and
} returns filenames in decomposed Unicode (NFD), while Mac keyboards return
} characters in precomposed Unicode (NFC).

Hrm.  I'm rather surprised this hasn't broken something *else*, because
zsh is freely mixing keyboard and filesystem representations all over the
place.  E.g., does globbing also fail, in at least some cases?

} What I did in bash was to convert between keyboard and file system
} representations when performing filename comparisons for filename
} completion.  Zsh can do the same using iconv, which provides (on Mac
} OS X) the UTF-8-MAC encoding to do the conversion.

Unfortunately it's not isolated there.  Except for the (old, deprecated)
compctl completions, zsh does all the interesting work in shell functions
with strings that may come from glob patterns or array variables or any
number of other places.  Only sometimes are those strings passed through
the helper builtin that interprets them as file names, and even then it
can't possibly know whether they originated from readdir().

Fortunately, I think it *would* be OK to use the zreaddir() wrapper to
convert everything from NFD to NFC.  zreaddir() already applies zsh's
metafy() operation to all the file names, so as long as the OS properly
converts back to NFD (which it must, or we'd already be in deep doody
from throwing keyboard input at it) it should be safe to also iconv()
at this point.  This should cover globbing as well as completion.

What are the configure / compile-time / run-time tests needed to detect
this situation?  Are we going to run into problems with e.g. NFS or Samba
filesystems that are NOT in NFD representation?  Do we need to handle this
as a general case where we should always be testing in some way for wonky
filesystems in order to normalize (e.g., a Mac FS mounted on Linux)?



* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-05-31  3:56 Unicode, Korean, normalization form, Mac OS X and tab completion Kwon Yeolhyun
  2014-05-31 15:21 ` Chet Ramey
@ 2014-05-31 19:16 ` Peter Stephenson
  2014-05-31 21:29   ` Bart Schaefer
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Stephenson @ 2014-05-31 19:16 UTC (permalink / raw)
  To: Zsh List Hackers'

On Sat, 31 May 2014 12:56:06 +0900
Kwon Yeolhyun <yeolhyunkwon@me.com> wrote:
> 4) Mac OS X uses normalized strings as filenames. Assuming there’s a
> file with the name of 가나다, it has the name of
> ㄱㅏㄴㅏㄷㅏ(decomposed into hangul jamos) internally. (Link to hangul
> jamos:
> http://www.utf8-chartable.de/unicode-utf8-table.pl?start=4352&number=1024)
> 5) I guess the reason tab completion fails is that zsh
> compares the user input, 가나다, with the filename, ㄱㅏㄴㅏㄷㅏ.
> 가나다 and ㄱㅏㄴㅏㄷㅏ are canonically equivalent but have different
> binary representations.

You're right, this is a real problem that could do with solving.

The actual conversion between the two is easy enough --- though most of
us here don't use Macs or character sets that show up the problem, so
we'd need a volunteer to help with this (relatively) easy bit.

The difficult bit, about which I suspect only Bart and I are likely to
have detailed opinions, is where to do the conversion.

Doing it at the point where data is read from the keyboard is
problematic, since what we put back onto the command line is quite
intricately tied to what we read from it in the first place, and
arbitrary transformations at this point make it hard to know what to put
back after the completion.

Doing it right down in the guts is even harder --- there are some
incredibly complicated things going on to support features like partial
word completion that currently treat data simply as octet strings, and
upgrading this is a huge job.

So if we can guarantee the keyboard input is in one form (and I'm not
sure we necessarily can) it might be easier to convert file names into
that format.  The trouble here is that to be consistent we need to
convert all data passed into the completion system, e.g. from file
contents passed as strings via functions.  (In principle it's
more correct to normalise all input anyway.)

I'm currently wondering if there is scope for normalising keyboard input
really early --- before we feed it back to the shell --- and turning it
back into the usual keyboard form right at the end, perhaps not worrying
too much if the original input was in a different form as long as
they're equivalent.  But I suspect it's not that easy.

So this will take a certain amount of thought.

pws



* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-05-31 19:16 ` Peter Stephenson
@ 2014-05-31 21:29   ` Bart Schaefer
  2014-06-01  2:25     ` Daniel Shahaf
  0 siblings, 1 reply; 31+ messages in thread
From: Bart Schaefer @ 2014-05-31 21:29 UTC (permalink / raw)
  To: Zsh List Hackers'

On May 31,  8:16pm, Peter Stephenson wrote:
}
} I'm currently wondering if there is scope for normalising keyboard input
} really early --- before we feed it back to the shell --- and turning it
} back into the usual keyboard form right at the end

Per thread with Chet, I think normalizing the filesystem is the easier
way to go.  Keyboard input is already as close to normalized as it needs
to be, I think, and with only a couple of exceptions all the names we
get from the filesystem come through zreaddir().

-- 
Barton E. Schaefer



* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-05-31 21:29   ` Bart Schaefer
@ 2014-06-01  2:25     ` Daniel Shahaf
  2014-06-01  5:30       ` Kwon Yeolhyun
  2014-06-01  7:56       ` Bart Schaefer
  0 siblings, 2 replies; 31+ messages in thread
From: Daniel Shahaf @ 2014-06-01  2:25 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh List Hackers'

Bart Schaefer wrote on Sat, May 31, 2014 at 14:29:26 -0700:
> On May 31,  8:16pm, Peter Stephenson wrote:
> }
> } I'm currently wondering if there is scope for normalising keyboard input
> } really early --- before we feed it back to the shell --- and turning it
> } back into the usual keyboard form right at the end
> 
> Per thread with Chet, I think normalizing the filesystem is the easier
> way to go.  Keyboard input is already as close to normalized as it needs
> to be, I think, and with only a couple of exceptions all the names we
> get from the filesystem come through zreaddir().

What about, say, people doing 'ls' and copy-pasting a filename from the
output into a command line?  Wouldn't that result in NFD keyboard
input?

FWIW, while OS X always returns NFD filenames, one could also imagine an
OS that is normalization-aware (forbids creating a file if its
normalized name is the same as the normalized name of an existing file)
but octet-sequence-preserving, and on such an OS both the readdir()
output and the user input would need to be normalized.

Also, other unixes allow you to have both the NFC-form and NFD-form in
the same directory, e.g., 'touch fooá fooá' works just fine on linux
ext4 (the first filename is composed, the second decomposed); in such
cases normalization magic should not be done.

Fun! :-)

Daniel



* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-01  2:25     ` Daniel Shahaf
@ 2014-06-01  5:30       ` Kwon Yeolhyun
  2014-06-01 16:53         ` Daniel Shahaf
  2014-06-01  7:56       ` Bart Schaefer
  1 sibling, 1 reply; 31+ messages in thread
From: Kwon Yeolhyun @ 2014-06-01  5:30 UTC (permalink / raw)
  To: Daniel Shahaf; +Cc: Zsh List Hackers'

[-- Attachment #1: Type: text/plain, Size: 3190 bytes --]


On Jun 1, 2014, at 11:25 AM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:

> Bart Schaefer wrote on Sat, May 31, 2014 at 14:29:26 -0700:
>> On May 31,  8:16pm, Peter Stephenson wrote:
>> }
>> } I'm currently wondering if there is scope for normalising keyboard input
>> } really early --- before we feed it back to the shell --- and turning it
>> } back into the usual keyboard form right at the end
>> 
>> Per thread with Chet, I think normalizing the filesystem is the easier
>> way to go.  Keyboard input is already as close to normalized as it needs
>> to be, I think, and with only a couple of exceptions all the names we
>> get from the filesystem come through zreaddir().
> 
> What about, say, people doing 'ls' and copy-pasting a filename from the
> output into a command line?  Wouldn't that result in NFD keyboard
> input?
> 
> FWIW, while OS X always returns NFD filenames, one could also imagine an
> OS that is normalization-aware (forbids creating a file if its
> normalized name is the same as the normalized name of an existing file)
> but octet-sequence-preserving, and on such an OS both the readdir()
> output and the user input would need to be normalized.
> 
> Also, other unixes allow you to have both the NFC-form and NFD-form in
> the same directory, e.g., 'touch fooá fooá' works just fine on linux
> ext4 (the first filename is composed, the second decomposed); in such
> cases normalization magic should not be done.
> 
> Fun! :-)
> 
> Daniel

Fortunately, I think Mac OS X can handle input in decomposed or composed form.
Here’s some code I tested:

================ hangul.c =========================
#include <stdio.h>
#include <dirent.h>

int main(void) {

    const char *fname = "한글/가나다";
    const char *dirname = "한글";
    DIR *dirp = opendir(dirname);
    struct dirent *direntry = NULL;
    FILE *fp = fopen(fname, "r");
    char buf[512];
    size_t nread;

    if (dirp == NULL) {
        printf("Failed to read the directory: %s\n", dirname);
        if (fp != NULL)
            fclose(fp);
        return -1;
    }

    /* Print every entry, including "." and "..", as returned by readdir(). */
    while ((direntry = readdir(dirp)) != NULL) {
        printf("file name: %s\n", direntry->d_name);
    }
    closedir(dirp);

    if (fp == NULL) {
        printf("Failed to read %s\n", fname);
        return -1;
    }

    /* fread() does not NUL-terminate, so leave room and terminate by hand. */
    nread = fread(buf, 1, sizeof(buf) - 1, fp);
    buf[nread] = '\0';
    printf("%s\n", buf);
    fclose(fp);

    return 0;
}
======= END ========
And the output is 

> mkdir 한글
> touch 한글/가나다
> echo 'test success!' > 한글/가나다
> clang -g hangul.c
> ./a.out
file name: .
file name: ..
file name: 가나다
test success!

I checked the contents of memory using lldb and confirmed that fname holds composed UTF-8 characters while the filename returned from readdir() holds decomposed UTF-8 characters.
Nevertheless, file operations work perfectly (reading, as in the code above, and writing as well).
So I think we can convert decomposed filenames into composed after readdir. It will work at least for Korean.
Detecting, composing, and decomposing hangul can be done easily.


[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]


* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-01  2:25     ` Daniel Shahaf
  2014-06-01  5:30       ` Kwon Yeolhyun
@ 2014-06-01  7:56       ` Bart Schaefer
  2014-06-01 16:46         ` Daniel Shahaf
  2014-06-01 17:00         ` Jun T.
  1 sibling, 2 replies; 31+ messages in thread
From: Bart Schaefer @ 2014-06-01  7:56 UTC (permalink / raw)
  To: Zsh List Hackers'

On Jun 1,  2:25am, Daniel Shahaf wrote:
}
} What about, say, people doing 'ls' and copy-pasting a filename from the
} output into a command line?  Wouldn't that result in NFD keyboard
} input?

Yes, but there's only so far that it makes sense to go with this.  For
example, [[ fooá = fooá ]] arguably should not normalize, and script
file contents should not be normalized, etc.  I think messing with the
command input stream will create more problems than it solves.

What we *might* need is for patcompile() also to normalize (though that
potentially violates what I just said about [[ ... ]], depending on which
encoding is the pattern and which is the string to be matched).  Maybe
this needs to be part of the (#u) qualifier handling, or a related new
qualifier.

(Note there's little to no existing support for wide characters in e.g.
matcher-list range specifications, so no point in going there yet.)

} FWIW, while OS X always returns NFD filenames, one could also imagine an
} OS that is normalization-aware (forbids creating a file if its
} normalized name is the same as the normalized name of an existing file)
} but octet-sequence-preserving, and on such an OS both the readdir()
} output and the user input would need to be normalized.

This case is ultimately the same as your first example.  Either the two
forms of name should be treated the same, in which case normalizing the
results of readdir() is enough, or they should be treated as different
even though you aren't allowed to create both of them, in which case
they should not be normalized at all (and then there better be some way
outside the shell, e.g., at the TTY driver layer, to choose the input
encoding).

Maybe the completion system should use (#u) more often, or maybe there
needs to be a setopt to cause all patterns to act as if (#u) ...

If there's a tricky bit, it's knowing which encoding is the default for
input so you can normalize to that one.

} Also, other unixes allow you to have both the NFC-form and NFD-form in
} the same directory, e.g., 'touch fooá fooá' works just fine on linux
} ext4 (the first filename is composed, the second decomposed); in such
} cases normalization magic should not be done.

Hence my question about what compile-time tests we need for this, and
what if anything to do about Mac filesystems mounted on Linux.



* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-01  7:56       ` Bart Schaefer
@ 2014-06-01 16:46         ` Daniel Shahaf
  2014-06-01 17:00         ` Jun T.
  1 sibling, 0 replies; 31+ messages in thread
From: Daniel Shahaf @ 2014-06-01 16:46 UTC (permalink / raw)
  To: Zsh List Hackers'

Bart Schaefer wrote on Sun, Jun 01, 2014 at 00:56:24 -0700:
> On Jun 1,  2:25am, Daniel Shahaf wrote:
> } FWIW, while OS X always returns NFD filenames, one could also imagine an
> } OS that is normalization-aware (forbids creating a file if its
> } normalized name is the same as the normalized name of an existing file)
> } but octet-sequence-preserving, and on such an OS both the readdir()
> } output and the user input would need to be normalized.
> 
> This case is ultimately the same as your first example.  Either the two
> forms of name should be treated the same, in which case normalizing the
> results of readdir() is enough, or they should be treated as different
> even though you aren't allowed to create both of them, in which case
> they should not be normalized at all (and then there better be some way
> outside the shell, e.g., at the TTY driver layer, to choose the input
> encoding).
> 
> Maybe the completion system should use (#u) more often, or maybe there
> needs to be a setopt to cause all patterns to act as if (#u) ...
> 
> If there's a tricky bit, it's knowing which encoding is the default for
> input so you can normalize to that one.

Well, sure, if the user input is normalized to NFC before it hits zsh,
then the problem is simpler (either NFC->NFD the input or NFD->NFC
readdir).  I was trying to solve the more general problem of matching
non-normalized readdir output to non-normalized user input; perhaps
that would be an overkill.

Daniel



* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-01  5:30       ` Kwon Yeolhyun
@ 2014-06-01 16:53         ` Daniel Shahaf
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Shahaf @ 2014-06-01 16:53 UTC (permalink / raw)
  To: Kwon Yeolhyun; +Cc: Zsh List Hackers'

Kwon Yeolhyun wrote on Sun, Jun 01, 2014 at 14:30:03 +0900:
> 
> On Jun 1, 2014, at 11:25 AM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> 
> > Bart Schaefer wrote on Sat, May 31, 2014 at 14:29:26 -0700:
> >> On May 31,  8:16pm, Peter Stephenson wrote:
> >> }
> >> } I'm currently wondering if there is scope for normalising keyboard input
> >> } really early --- before we feed it back to the shell --- and turning it
> >> } back into the usual keyboard form right at the end
> >> 
> >> Per thread with Chet, I think normalizing the filesystem is the easier
> >> way to go.  Keyboard input is already as close to normalized as it needs
> >> to be, I think, and with only a couple of exceptions all the names we
> >> get from the filesystem come through zreaddir().
> > 
> > What about, say, people doing 'ls' and copy-pasting a filename from the
> > output into a command line?  Wouldn't that result in NFD keyboard
> > input?
> > 
> > FWIW, while OS X always returns NFD filenames, one could also imagine an
> > OS that is normalization-aware (forbids creating a file if its
> > normalized name is the same as the normalized name of an existing file)
> > but octet-sequence-preserving, and on such an OS both the readdir()
> > output and the user input would need to be normalized.
> > 
> > Also, other unixes allow you to have both the NFC-form and NFD-form in
> > the same directory, e.g., 'touch fooá fooá' works just fine on linux
> > ext4 (the first filename is composed, the second decomposed); in such
> > cases normalization magic should not be done.
> > 
> > Fun! :-)
> > 
> > Daniel
> 
> Fortunately, I think Mac OS X can handle input in decomposed or composed form.

Yes, AFAIK, OS X accepts input in any normalization and returns
NFD-normalized filenames.

> So I think we can convert decomposed filenames into composed after readdir. It will work at least for Korean.

That would work if the input is in NFC.

> Detecting, composing, and decomposing hangul can be done easily.

It is easy to convert any Unicode string to NFC or to NFD, not just
strings consisting of Hangul codepoints.

Cheers,

Daniel



* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-01  7:56       ` Bart Schaefer
  2014-06-01 16:46         ` Daniel Shahaf
@ 2014-06-01 17:00         ` Jun T.
  2014-06-01 19:13           ` Bart Schaefer
                             ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: Jun T. @ 2014-06-01 17:00 UTC (permalink / raw)
  To: zsh-workers

There is a patch by a Japanese user which simply converts
file names obtained by readdir() into the composed form ("NFC"):
	https://gist.github.com/waltarix/1403346
The patch in this gist is against zsh-5.0.0 (I guess).
I attached the same patch against the current git master below
(I added defined(__APPLE__) to the #if condition).
We may use this to see what kind of problem may appear by this simple
approach.

Kwon Yeolhyun, can you test this patch?

In the current zsh (without this patch), 
$ ls 가<TAB>
doesn't work if 가 is input from keyboard (NFC), but works if it is
pasted from the ls output (NFD). With the patch, the opposite happens.
 
Of course this patch affects not only Korean but any language which
has decomposable characters. For example, if you have a file named über
in the current directory, with the current zsh (without the patch):

$ ls u<TAB>	# completes to über (useful for some user??)
$ ls ü<TAB>	# fails to complete

and u* matches with über while ü* doesn't.
With the patch, we get the opposite behavior.

Jun



diff --git a/Src/utils.c b/Src/utils.c
index 9439227..86b61f1 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -4270,6 +4270,13 @@ mod_export char *
 zreaddir(DIR *dir, int ignoredots)
 {
     struct dirent *de;
+#if defined(HAVE_ICONV) && defined(__APPLE__)
+    static iconv_t conv_ds = (iconv_t)NULL;
+    static char *conv_name = (char *)NULL;
+    char *temp_name;
+    char *temp_name_ptr, *orig_name_ptr;
+    size_t temp_name_len, orig_name_len;
+#endif
 
     do {
 	de = readdir(dir);
@@ -4278,6 +4285,23 @@ zreaddir(DIR *dir, int ignoredots)
     } while(ignoredots && de->d_name[0] == '.' &&
 	(!de->d_name[1] || (de->d_name[1] == '.' && !de->d_name[2])));
 
+#if defined(HAVE_ICONV) && defined(__APPLE__)
+    if (!conv_ds)
+	conv_ds = iconv_open("UTF-8", "UTF-8-MAC");
+    if (conv_ds) {
+	orig_name_ptr = de->d_name;
+	orig_name_len = strlen(de->d_name);
+	conv_name = zrealloc(conv_name, orig_name_len+1);
+	temp_name_ptr = conv_name;
+	temp_name_len = orig_name_len;
+	if (iconv(conv_ds,&orig_name_ptr,&orig_name_len,&temp_name_ptr,&temp_name_len) >= 0) {
+	    *temp_name_ptr = '\0';
+	    temp_name = conv_name;
+	    return metafy(temp_name, -1, META_STATIC);
+	}
+    }
+#endif
+
     return metafy(de->d_name, -1, META_STATIC);
 }
 




* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-01 17:00         ` Jun T.
@ 2014-06-01 19:13           ` Bart Schaefer
  2014-06-02 17:01             ` Jun T.
  2014-06-01 19:53           ` Bart Schaefer
  2014-06-02  5:17           ` Kwon Yeolhyun
  2 siblings, 1 reply; 31+ messages in thread
From: Bart Schaefer @ 2014-06-01 19:13 UTC (permalink / raw)
  To: Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 1499 bytes --]

On Sun, Jun 1, 2014 at 10:00 AM, Jun T. <takimoto-j@kba.biglobe.ne.jp>
wrote:

> There is a patch by a Japanese user which simply converts
> file names obtained by readdir() into the composed form ("NFC"):
>         https://gist.github.com/waltarix/1403346
> The patch in this gist is against zsh-5.0.0 (I guess).
> I attached the same patch against the current git master below
> (I added defined(__APPLE__) to the #if condition).
>

Arigatoo gozaimasu!  (Watch me practice my limited and rusty Nihongo.)


> In the current zsh (without this patch),
> $ ls 가<TAB>
> doesn't work if 가 is input from keyboard (NFC), but works if it is
> pasted from the ls output (NFD). With the patch, the opposite happens.
>

This is as expected; both might work if patcompile() were also smart about
it.

> For example, if you have a file named über
> in the current directory, with the current zsh (without the patch):
>
> $ ls u<TAB>     # completes to über (useful for some user??)
> $ ls ü<TAB>     # fails to complete
>
> and u* matches with über while ü* doesn't.
> With the patch, we get the opposite behavior.
>

The current behavior here is pretty much by accident, because the
decomposed character for "ü" happens to be "u+umlaut" and (if I'm reading
this correctly) at the lowest level the pattern match is applied octet-wise
rather than character-wise, so "*" matches the umlaut and "u" is considered
a prefix.  Arguably the current behavior is wrong.
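
A small sketch of why an octet-wise match behaves this way: the decomposed spelling of "über" starts with the single byte for "u", while the composed spelling starts with the two-byte sequence for U+00FC. The helper is_byte_prefix is a hypothetical name used only for this illustration; it is not a zsh function.

```c
#include <string.h>

/* Byte-wise prefix test: treats both strings as plain octet sequences,
 * with no knowledge of characters or normalization. */
int is_byte_prefix(const char *prefix, const char *name)
{
    return strncmp(prefix, name, strlen(prefix)) == 0;
}

/* "über" in its two canonically equivalent UTF-8 spellings. */
static const char uber_nfd[] = "u\xcc\x88" "ber";  /* u + U+0308 (decomposed) */
static const char uber_nfc[] = "\xc3\xbc" "ber";   /* U+00FC (precomposed)    */
```

Here is_byte_prefix("u", uber_nfd) is true and is_byte_prefix("u", uber_nfc) is false, which mirrors the u<TAB> and ü<TAB> results reported for the unpatched and patched shells.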


* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-01 17:00         ` Jun T.
  2014-06-01 19:13           ` Bart Schaefer
@ 2014-06-01 19:53           ` Bart Schaefer
  2014-06-02 11:58             ` Kwon Yeolhyun
  2014-06-02 17:15             ` Jun T.
  2014-06-02  5:17           ` Kwon Yeolhyun
  2 siblings, 2 replies; 31+ messages in thread
From: Bart Schaefer @ 2014-06-01 19:53 UTC (permalink / raw)
  To: zsh-workers

On Jun 2,  2:00am, Jun T. wrote:
} Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab complet
}
} There is a patch by a Japanese user which simply converts
} file names obtained by readdir() into the composed form ("NFC")

I tried this and found a couple of problems, mainly that iconv() can
return >= 0 but do nothing if the input has no convertible characters.
Here is my suggested correction, also eliminates unnecessary temp_name:

diff --git a/Src/utils.c b/Src/utils.c
index 9439227..8b512bb 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -4270,6 +4270,12 @@ mod_export char *
 zreaddir(DIR *dir, int ignoredots)
 {
     struct dirent *de;
+#if defined(HAVE_ICONV) && defined(__APPLE__)
+    static iconv_t conv_ds = (iconv_t)0;
+    static char *conv_name = 0;
+    char *conv_name_ptr, *orig_name_ptr;
+    size_t conv_name_len, orig_name_len;
+#endif
 
     do {
 	de = readdir(dir);
@@ -4278,6 +4284,31 @@ zreaddir(DIR *dir, int ignoredots)
     } while(ignoredots && de->d_name[0] == '.' &&
 	(!de->d_name[1] || (de->d_name[1] == '.' && !de->d_name[2])));
 
+#if defined(HAVE_ICONV) && defined(__APPLE__)
+    if (!conv_ds)
+	conv_ds = iconv_open("UTF-8", "UTF-8-MAC");
+    if (conv_ds) {
+	/* Force initial state in case re-using conv_ds */
+	(void) iconv(conv_ds, 0, &orig_name_len, 0, &conv_name_len);
+
+	orig_name_ptr = de->d_name;
+	orig_name_len = strlen(de->d_name);
+	conv_name = zrealloc(conv_name, orig_name_len+1);
+	conv_name_ptr = conv_name;
+	conv_name_len = orig_name_len;
+	if (iconv(conv_ds,
+		  &orig_name_ptr, &orig_name_len,
+		  &conv_name_ptr, &conv_name_len) >= 0) {
+	  if (orig_name_len == 0) {
+	    /* Completely converted, metafy and return */
+	    *conv_name_ptr = '\0';
+	    return metafy(conv_name, -1, META_STATIC);
+	  }
+	}
+	/* Error, or conversion incomplete, keep the original name */
+    }
+#endif
+
     return metafy(de->d_name, -1, META_STATIC);
 }
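
A note on the return-value checks, with a minimal sketch. POSIX declares iconv() as returning size_t and reports failure as (size_t)-1, so a ">= 0" test is always true; iconv_open() likewise reports failure as (iconv_t)-1 rather than a null descriptor. The helper below, convert_name, is a hypothetical name for illustration and is not part of the patch above; it assumes the caller has opened the descriptor (for example with the "UTF-8" and "UTF-8-MAC" names used in this thread) and supplies the output buffer.

```c
#include <iconv.h>
#include <string.h>

/* Convert `in` using an already-open descriptor into out[outsz].
 * Returns 0 only if the whole input was converted, -1 otherwise. */
int convert_name(iconv_t cd, const char *in, char *out, size_t outsz)
{
    char *inp = (char *)in;     /* iconv() takes char ** even for input */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft;

    if (cd == (iconv_t)-1 || outsz == 0)
        return -1;
    outleft = outsz - 1;        /* reserve one byte for the NUL */

    /* Reset to the initial conversion state before reusing cd. */
    (void)iconv(cd, NULL, NULL, NULL, NULL);

    /* Failure is signalled by (size_t)-1, never by a negative value. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        return -1;
    if (inleft != 0)
        return -1;              /* input not fully consumed */

    *outp = '\0';
    return 0;
}
```

A caller would fall back to the original name whenever convert_name returns -1, which matches the "keep the original name" path in the patch.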
 



* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-01 17:00         ` Jun T.
  2014-06-01 19:13           ` Bart Schaefer
  2014-06-01 19:53           ` Bart Schaefer
@ 2014-06-02  5:17           ` Kwon Yeolhyun
  2014-06-02  7:39             ` Jun T.
  2 siblings, 1 reply; 31+ messages in thread
From: Kwon Yeolhyun @ 2014-06-02  5:17 UTC (permalink / raw)
  To: Jun T.; +Cc: Zsh List Hackers'

[-- Attachment #1: Type: text/plain, Size: 718 bytes --]


On Jun 2, 2014, at 2:00 AM, Jun T. <takimoto-j@kba.biglobe.ne.jp> wrote:

> There is a patch by a Japanese user which simply converts
> file names obtained by readdir() into the composed form ("NFC"):
> 	https://gist.github.com/waltarix/1403346
> The patch in this gist is against zsh-5.0.0 (I guess).
> I attached the same patch against the current git master below
> (I added defined(__APPLE__) to the #if condition).
> We may use this to see what kind of problem may appear by this simple
> approach.
> 
> Kwon Yeolhyun, can you test this patch ? 

Hm… On my system (Mac OS X 10.9.3, Xcode 5.1.1), it seems that iconv doesn’t support 'UTF-8-MAC'.
I’m going to try the ICU library.
Would that be OK for zsh?


[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02  5:17           ` Kwon Yeolhyun
@ 2014-06-02  7:39             ` Jun T.
  2014-06-02  8:42               ` Kwon Yeolhyun
  0 siblings, 1 reply; 31+ messages in thread
From: Jun T. @ 2014-06-02  7:39 UTC (permalink / raw)
  To: zsh-workers; +Cc: Kwon Yeolhyun


On 2014/06/02, at 14:17, Kwon Yeolhyun <yeolhyunkwon@me.com> wrote:
> Hm… On my system(Mac OS X 10.9.3, Xcode 5.1.1), it seems that iconv doesn’t support 'UTF-8-MAC.’

That's strange.
How did you check whether your iconv supports UTF-8-MAC or not?
On the command line, what is the output of the following command?

$ iconv -l






^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02  7:39             ` Jun T.
@ 2014-06-02  8:42               ` Kwon Yeolhyun
  0 siblings, 0 replies; 31+ messages in thread
From: Kwon Yeolhyun @ 2014-06-02  8:42 UTC (permalink / raw)
  To: Jun T.; +Cc: Zsh List Hackers'

[-- Attachment #1: Type: text/plain, Size: 548 bytes --]

On Jun 2, 2014, at 4:39 PM, Jun T. <takimoto-j@kba.biglobe.ne.jp> wrote:

> 
> On 2014/06/02, at 14:17, Kwon Yeolhyun <yeolhyunkwon@me.com> wrote:
>> Hm… On my system(Mac OS X 10.9.3, Xcode 5.1.1), it seems that iconv doesn’t support 'UTF-8-MAC.’
> 
> That's strange.
> How did you check whether your iconv supports UTF-8-MAC or not?
> On the command line, what is the output of the following command?
> 
> $ iconv -l
> 

Oops. My test code was wrong. I’ll test with the code suggested by Bart Schaefer.
I expect it will work.


[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-01 19:53           ` Bart Schaefer
@ 2014-06-02 11:58             ` Kwon Yeolhyun
  2014-06-02 14:23               ` Kwon Yeolhyun
  2014-06-02 14:31               ` Bart Schaefer
  2014-06-02 17:15             ` Jun T.
  1 sibling, 2 replies; 31+ messages in thread
From: Kwon Yeolhyun @ 2014-06-02 11:58 UTC (permalink / raw)
  To: Bart Schaefer, Jun T.; +Cc: Zsh List Hackers'

[-- Attachment #1: Type: text/plain, Size: 2211 bytes --]

On Jun 2, 2014, at 4:53 AM, Bart Schaefer <schaefer@brasslantern.com> wrote:

> On Jun 2,  2:00am, Jun T. wrote:
> } Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab complet
> }
> } There is a patch by a Japanese user which simply converts
> } file names obtained by readdir() into the composed form ("NFC")
> 
> I tried this and found a couple of problems, mainly that iconv() can
> return >= 0 but do nothing if the input has no convertible characters.
> Here is my suggested correction, also eliminates unnecessary temp_name:
> 
> diff --git a/Src/utils.c b/Src/utils.c
> index 9439227..8b512bb 100644
> --- a/Src/utils.c
> +++ b/Src/utils.c
> @@ -4270,6 +4270,12 @@ mod_export char *
> zreaddir(DIR *dir, int ignoredots)
> {
>     struct dirent *de;
> +#if defined(HAVE_ICONV) && defined(__APPLE__)
> +    static iconv_t conv_ds = (iconv_t)0;
> +    static char *conv_name = 0;
> +    char *conv_name_ptr, *orig_name_ptr;
> +    size_t conv_name_len, orig_name_len;
> +#endif
> 
>     do {
> 	de = readdir(dir);
> @@ -4278,6 +4284,31 @@ zreaddir(DIR *dir, int ignoredots)
>     } while(ignoredots && de->d_name[0] == '.' &&
> 	(!de->d_name[1] || (de->d_name[1] == '.' && !de->d_name[2])));
> 
> +#if defined(HAVE_ICONV) && defined(__APPLE__)
> +    if (!conv_ds)
> +	conv_ds = iconv_open("UTF-8", "UTF-8-MAC");
> +    if (conv_ds) {
> +	/* Force initial state in case re-using conv_ds */
> +	(void) iconv(conv_ds, 0, &orig_name_len, 0, &conv_name_len);
> +
> +	orig_name_ptr = de->d_name;
> +	orig_name_len = strlen(de->d_name);
> +	conv_name = zrealloc(conv_name, orig_name_len+1);
> +	conv_name_ptr = conv_name;
> +	conv_name_len = orig_name_len;
> +	if (iconv(conv_ds,
> +		  &orig_name_ptr, &orig_name_len,
> +		  &conv_name_ptr, &conv_name_len) >= 0) {
> +	  if (orig_name_len == 0) {
> +	    /* Completely converted, metafy and return */
> +	    *conv_name_ptr = '\0';
> +	    return metafy(conv_name, -1, META_STATIC);
> +	  }
> +	}
> +	/* Error, or conversion incomplete, keep the original name */
> +    }
> +#endif
> +
>     return metafy(de->d_name, -1, META_STATIC);
> }
> 

The patch is working properly so far. 

[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02 11:58             ` Kwon Yeolhyun
@ 2014-06-02 14:23               ` Kwon Yeolhyun
  2014-06-02 15:14                 ` Bart Schaefer
  2014-06-02 14:31               ` Bart Schaefer
  1 sibling, 1 reply; 31+ messages in thread
From: Kwon Yeolhyun @ 2014-06-02 14:23 UTC (permalink / raw)
  To: Zsh List Hackers'

[-- Attachment #1: Type: text/plain, Size: 265 bytes --]

After applying the patch, I changed my login shell to zsh.

$ sudo chsh -s /usr/local/bin/zsh yeolhyunkwon

After that, the terminal window closes immediately after it opens.

However, if I run /usr/local/bin/zsh from an already open shell (bash), it works perfectly.



[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02 11:58             ` Kwon Yeolhyun
  2014-06-02 14:23               ` Kwon Yeolhyun
@ 2014-06-02 14:31               ` Bart Schaefer
  1 sibling, 0 replies; 31+ messages in thread
From: Bart Schaefer @ 2014-06-02 14:31 UTC (permalink / raw)
  To: Zsh List Hackers', Kwon Yeolhyun

On Jun 2,  8:58pm, Kwon Yeolhyun wrote:
}
} The patch is working properly so far. 

Thanks, I've pushed it to the central git.

Anyone care to jump in on patcompile() and/or the (#u) qualifier?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02 14:23               ` Kwon Yeolhyun
@ 2014-06-02 15:14                 ` Bart Schaefer
  2014-06-02 15:27                   ` Peter Stephenson
  2014-06-02 15:27                   ` Kwon Yeolhyun
  0 siblings, 2 replies; 31+ messages in thread
From: Bart Schaefer @ 2014-06-02 15:14 UTC (permalink / raw)
  To: Kwon Yeolhyun, Zsh List Hackers'

On Jun 2, 11:23pm, Kwon Yeolhyun wrote:
} 
} After the patch, I changed my login shell to the zsh.
} 
} $ sudo chsh -s /usr/local/bin/zsh yeolhyunkwon
} 
} Then, the terminal is closed immediately after open.

Hm.  Can you try putting

exec 2> $HOME/zsh-errors
set -x

at the top of your ~/.zshenv file and see whether anything is written
to the zsh-errors file?  If so, what is at the tail of it?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02 15:14                 ` Bart Schaefer
@ 2014-06-02 15:27                   ` Peter Stephenson
  2014-06-02 15:48                     ` Kwon Yeolhyun
  2014-06-02 15:27                   ` Kwon Yeolhyun
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Stephenson @ 2014-06-02 15:27 UTC (permalink / raw)
  To: Zsh List Hackers'

On Mon, 02 Jun 2014 08:14:26 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Jun 2, 11:23pm, Kwon Yeolhyun wrote:
> } 
> } After the patch, I changed my login shell to the zsh.
> } 
> } $ sudo chsh -s /usr/local/bin/zsh yeolhyunkwon
> } 
> } Then, the terminal is closed immediately after open.
> 
> Hm.  Can you try putting
> 
> exec 2> $HOME/zsh-errors
> set -x
> 
> at the top of your ~/.zshenv file and see whether anything is written
> to the zsh-errors file?  If so, what is at the tail of it?

I wonder if it's a missing library.  In which case any report might go
somewhere else, e.g. the equivalent of the X windows session log.

pws


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02 15:14                 ` Bart Schaefer
  2014-06-02 15:27                   ` Peter Stephenson
@ 2014-06-02 15:27                   ` Kwon Yeolhyun
  2014-06-02 15:49                     ` Bart Schaefer
  1 sibling, 1 reply; 31+ messages in thread
From: Kwon Yeolhyun @ 2014-06-02 15:27 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh List Hackers'


[-- Attachment #1.1: Type: text/plain, Size: 1733 bytes --]


On Jun 3, 2014, at 12:14 AM, Bart Schaefer <schaefer@brasslantern.com> wrote:

> On Jun 2, 11:23pm, Kwon Yeolhyun wrote:
> } 
> } After the patch, I changed my login shell to the zsh.
> } 
> } $ sudo chsh -s /usr/local/bin/zsh yeolhyunkwon
> } 
> } Then, the terminal is closed immediately after open.
> 
> Hm.  Can you try putting
> 
> exec 2> $HOME/zsh-errors
> set -x
> 
> at the top of your ~/.zshenv file and see whether anything is written
> to the zsh-errors file?  If so, what is at the tail of it?

Oh, I misunderstood your request.

I tried it again.

$ tail zsh-errors
+/Users/yeolhyunkwon/.oh-my-zsh/oh-my-zsh.sh:60> source /Users/yeolhyunkwon/.oh-my-zsh/plugins/virtualenvwrapper/virtualenvwrapper.plugin.zsh
+/Users/yeolhyunkwon/.oh-my-zsh/plugins/virtualenvwrapper/virtualenvwrapper.plugin.zsh:1> virtualenvwrapper=virtualenvwrapper.sh
+/Users/yeolhyunkwon/.oh-my-zsh/plugins/virtualenvwrapper/virtualenvwrapper.plugin.zsh:2> ((  1  ))
+/Users/yeolhyunkwon/.oh-my-zsh/plugins/virtualenvwrapper/virtualenvwrapper.plugin.zsh:3> source /Users/yeolhyunkwon/.pyenv/shims/virtualenvwrapper.sh
+/Users/yeolhyunkwon/.pyenv/shims/virtualenvwrapper.sh:2> set -e
+/Users/yeolhyunkwon/.pyenv/shims/virtualenvwrapper.sh:3> [ -n '' ']'
+/Users/yeolhyunkwon/.pyenv/shims/virtualenvwrapper.sh:5> program=virtualenvwrapper.sh
+/Users/yeolhyunkwon/.pyenv/shims/virtualenvwrapper.sh:6> [ virtualenvwrapper.sh '=' python ']'
+/Users/yeolhyunkwon/.pyenv/shims/virtualenvwrapper.sh:20> export 'PYENV_ROOT=/Users/yeolhyunkwon/.pyenv'
+/Users/yeolhyunkwon/.pyenv/shims/virtualenvwrapper.sh:21> /usr/local/Cellar/pyenv/20140520/libexec/pyenv exec virtualenvwrapper.sh

hm…does pyenv seem to be suspicious?


[-- Attachment #1.2: zsh-errors --]
[-- Type: application/octet-stream, Size: 233834 bytes --]

[-- Attachment #1.3: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02 15:27                   ` Peter Stephenson
@ 2014-06-02 15:48                     ` Kwon Yeolhyun
  0 siblings, 0 replies; 31+ messages in thread
From: Kwon Yeolhyun @ 2014-06-02 15:48 UTC (permalink / raw)
  To: Peter Stephenson, Bart Schaefer; +Cc: Zsh List Hackers'

[-- Attachment #1: Type: text/plain, Size: 890 bytes --]


On Jun 3, 2014, at 12:27 AM, Peter Stephenson <p.stephenson@samsung.com> wrote:

> On Mon, 02 Jun 2014 08:14:26 -0700
> Bart Schaefer <schaefer@brasslantern.com> wrote:
>> On Jun 2, 11:23pm, Kwon Yeolhyun wrote:
>> } 
>> } After the patch, I changed my login shell to the zsh.
>> } 
>> } $ sudo chsh -s /usr/local/bin/zsh yeolhyunkwon
>> } 
>> } Then, the terminal is closed immediately after open.
>> 
>> Hm.  Can you try putting
>> 
>> exec 2> $HOME/zsh-errors
>> set -x
>> 
>> at the top of your ~/.zshenv file and see whether anything is written
>> to the zsh-errors file?  If so, what is at the tail of it?
> 
> I wonder if it's a missing library.  In which case any report might go
> somewhere else, e.g. the equivalent of the X windows session log.
> 
> pws

I couldn’t find the cause of the problem, but after reinstalling oh-my-zsh it works properly.

[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02 15:27                   ` Kwon Yeolhyun
@ 2014-06-02 15:49                     ` Bart Schaefer
  2014-06-02 15:58                       ` Kwon Yeolhyun
  0 siblings, 1 reply; 31+ messages in thread
From: Bart Schaefer @ 2014-06-02 15:49 UTC (permalink / raw)
  To: Kwon Yeolhyun; +Cc: Zsh List Hackers'

On Jun 3, 12:27am, Kwon Yeolhyun wrote:
} Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab complet
}
} +/Users/yeolhyunkwon/.pyenv/shims/virtualenvwrapper.sh:20> export 'PYENV_ROOT=/Users/yeolhyunkwon/.pyenv'
} +/Users/yeolhyunkwon/.pyenv/shims/virtualenvwrapper.sh:21> /usr/local/Cellar/pyenv/20140520/libexec/pyenv exec virtualenvwrapper.sh
} 
} hm...does pyenv seem to be suspicious?

The "exec" in there is a little suspicious.  I don't use virtualenvwrapper
myself but the docs for it say to read it with "source".

Just saw your mail about reinstalling oh-my-zsh, so I'm going to assume
that's cleared up.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02 15:49                     ` Bart Schaefer
@ 2014-06-02 15:58                       ` Kwon Yeolhyun
  0 siblings, 0 replies; 31+ messages in thread
From: Kwon Yeolhyun @ 2014-06-02 15:58 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh List Hackers'

[-- Attachment #1: Type: text/plain, Size: 868 bytes --]


On Jun 3, 2014, at 12:49 AM, Bart Schaefer <schaefer@brasslantern.com> wrote:

> On Jun 3, 12:27am, Kwon Yeolhyun wrote:
> } Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab complet
> }
> } +/Users/yeolhyunkwon/.pyenv/shims/virtualenvwrapper.sh:20> export 'PYENV_ROOT=/Users/yeolhyunkwon/.pyenv'
> } +/Users/yeolhyunkwon/.pyenv/shims/virtualenvwrapper.sh:21> /usr/local/Cellar/pyenv/20140520/libexec/pyenv exec virtualenvwrapper.sh
> } 
> } hm...does pyenv seem to be suspicious?
> 
> The "exec" in there is a little suspicious.  I don't use virtualenvwrapper
> myself but the docs for it say to read it with "source".
> 
> Just saw your mail about reinstalling oh-my-zsh, so I'm going to assume
> that's cleared up.

OK, I’ll look into the pyenv issue.
Anyway, I’m very happy to have zsh with working tab completion!

Thanks, everyone!

[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-01 19:13           ` Bart Schaefer
@ 2014-06-02 17:01             ` Jun T.
  2014-06-02 17:14               ` Bart Schaefer
  0 siblings, 1 reply; 31+ messages in thread
From: Jun T. @ 2014-06-02 17:01 UTC (permalink / raw)
  To: zsh-workers


2014/06/02 04:13, Bart Schaefer <schaefer@brasslantern.com> wrote:
>> $ ls u<TAB>     # completes to über (useful for some user??)
> 
> The current behavior here is pretty much by accident,

I had been thinking the NFD/NFC problem was not so serious because I can
use u<TAB> instead of ü<TAB> (u is easier to type than ü on my keyboard),
and simply guessed that Western non-English-speaking people (German, French,
Spanish, etc.) were using something like u<TAB>. But maybe I was wrong.

In Japanese, some Hiragana/Katakana can have a kind of accent, e.g.,
か + accent = が. It's OK for me to type か<TAB> instead of が<TAB>,
but many Japanese Mac/zsh users were frustrated with the problem and
one of those users came up with the patch I mentioned in the previous post.

I had thought Korean (and Chinese) were free from the NFC/NFD problem, but
now I know I was wrong. I didn't know that Korean filenames are completely
decomposed down to each consonant/vowel. It was a surprise to learn that

$ echo '\u1100 \u1161'
ᄀ ᅡ
$ echo '\u1100\u1161'
가
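
The composition shown above follows the arithmetic that the Unicode
standard defines for Hangul syllables. A minimal C sketch of that
arithmetic, for illustration only: the name compose_hangul is made up
here and is not part of zsh or iconv.

```c
#include <stdint.h>

/* Constants from the Unicode Hangul syllable composition algorithm */
#define S_BASE  0xAC00u  /* first precomposed syllable */
#define L_BASE  0x1100u  /* first leading consonant (choseong) */
#define V_BASE  0x1161u  /* first vowel (jungseong) */
#define T_BASE  0x11A7u  /* one below the first trailing consonant (jongseong) */
#define L_COUNT 19u
#define V_COUNT 21u
#define T_COUNT 28u

/* Hypothetical helper: returns the precomposed syllable for leading
 * consonant l, vowel v and optional trailing consonant t (pass 0 for
 * none). Returns 0 if any input is outside the conjoining jamo ranges. */
static uint32_t compose_hangul(uint32_t l, uint32_t v, uint32_t t)
{
    uint32_t t_index = 0;

    if (l < L_BASE || l >= L_BASE + L_COUNT)
        return 0;
    if (v < V_BASE || v >= V_BASE + V_COUNT)
        return 0;
    if (t != 0) {
        if (t <= T_BASE || t >= T_BASE + T_COUNT)
            return 0;
        t_index = t - T_BASE;
    }
    return S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT + t_index;
}
```

With these definitions compose_hangul(0x1100, 0x1161, 0) gives 0xAC00 (가),
and adding the trailing consonant U+11A8 gives 0xAC01 (각).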

Anyway, I did the following quick tests concerning file sharing
between Mac and Win/Linux. But the tests are incomplete, and I did
them in a hurry, so there may be mistakes:

(1) File sharing between Mac and Windows (samba):
It seems the samba server/client on Mac converts automatically between
NFD and NFC. A Mac volume mounted on Win behaves as if it is an NFC volume,
and a Win volume mounted on Mac behaves as if it is an NFD volume.
This means composing readdir() output on Windows is not necessary
even if the volume is physically an NFD volume, while it must be
converted to NFC on Mac even if the volume is physically an NFC volume.

(2) A USB flash drive (FAT format):
If mounted on a Windows box it is an NFC volume, of course, and if mounted
on Mac it behaves as if it is an NFD volume (decomposed by a driver on Mac).
So the situation is the same as (1). I believe Linux behaves similarly
to Windows.

(3) File sharing between Mac and Linux (NFS):
If a Mac volume is mounted on Linux, then no NFC/NFD conversion takes
place; it seems readdir() on Linux returns NFD filenames for the volume.
(I enabled nfsd on my Mac with the default setting. I looked into nfsd(8)
or exports(5) man pages but they don't mention anything about NFC/NFD).
This means that zsh on Linux can't complete decomposed filenames correctly.
But it seems iconv(3) on Linux doesn't support UTF-8-MAC and I can't think
of any solution here.

I had no time to test mounting a Linux volume on Mac, but the mount_nfs(8)
man page on Mac says it has an option to convert NFD filenames on Mac
to NFC filenames on the Linux server.

I also couldn't test mounting a Mac volume on Linux via samba, but I guess
it behaves as if it is an NFC volume on Linux.

The results so far suggest that readdir() output must always be converted
to NFC on Mac.
On Linux (and maybe on Windows) no conversion is possible because iconv()
doesn't support UTF-8-MAC, but conversion is not necessary except when
mounting a Mac volume via NFS.
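
To check which form readdir() actually returns on a given mount, one
option is to look for conjoining jamo (U+1100..U+11FF) in the raw bytes
of d_name. A rough sketch that covers Hangul only; the helper name
has_conjoining_jamo is hypothetical and nothing like it exists in zsh.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the UTF-8 string contains a code
 * point in U+1100..U+11FF (Hangul conjoining jamo), which encode as the
 * bytes E1 84 80 .. E1 87 BF. Precomposed syllables (U+AC00..U+D7A3)
 * start with EA..ED and are not matched. */
static int has_conjoining_jamo(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    for (; *p; p++) {
        /* p[1] is at worst the terminating NUL; p[2] is only read when
         * p[1] is a non-NUL byte, so no read goes past the terminator */
        if (p[0] == 0xE1 && p[1] >= 0x84 && p[1] <= 0x87 &&
            (p[2] & 0xC0) == 0x80)
            return 1;
    }
    return 0;
}
```

Calling this on each de->d_name inside a readdir() loop would show
whether a volume hands back decomposed Hangul names. Other scripts
(for example Latin letters with combining marks) would need their own
ranges.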


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02 17:01             ` Jun T.
@ 2014-06-02 17:14               ` Bart Schaefer
  0 siblings, 0 replies; 31+ messages in thread
From: Bart Schaefer @ 2014-06-02 17:14 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 846 bytes --]

Thanks for digging into this.

On Jun 2, 2014 10:01 AM, "Jun T." <takimoto-j@kba.biglobe.ne.jp> wrote:
>
> The results so far suggest that readdir() output must be always converted
> to NFC on Mac.
> On Linux (and maybe on Windows) no conversion is possible because iconv()
> doesn't support UTF-8-MAC, but conversion is not necessary except for when
> mounting Mac volume via NFS.

I did some quick research into UTF-8-MAC and it appears that it's the same
as "standard" NFD except that it does NOT decompose a few specific
character ranges.  That may mean that it is possible to convert it by
passing a different source character set name to iconv, but I didn't get as
far as finding out which one to try.

However, applying iconv to every filename read from any source just because
it might be a Mac NFS mount seems like a waste of effort ...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-01 19:53           ` Bart Schaefer
  2014-06-02 11:58             ` Kwon Yeolhyun
@ 2014-06-02 17:15             ` Jun T.
  2014-06-02 17:27               ` Bart Schaefer
  1 sibling, 1 reply; 31+ messages in thread
From: Jun T. @ 2014-06-02 17:15 UTC (permalink / raw)
  To: zsh-workers


2014/06/02 04:53, Bart Schaefer <schaefer@brasslantern.com> wrote:
> +	if (iconv(conv_ds,
> +		  &orig_name_ptr, &orig_name_len,
> +		  &conv_name_ptr, &conv_name_len) >= 0) {
> +	  if (orig_name_len == 0) {
> +	    /* Completely converted, metafy and return */
> +	    *conv_name_ptr = '\0';
> +	    return metafy(conv_name, -1, META_STATIC);
> +	  }
> +	}

Is it possible for iconv() to return >=0 while orig_name_len != 0 ?
The iconv(3) man page mentions four possible cases:
three of them are errors (return value is -1),
and the remaining one is (return value >= 0 and orig_name_len == 0).


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02 17:15             ` Jun T.
@ 2014-06-02 17:27               ` Bart Schaefer
  2014-06-05 14:34                 ` Jun T.
  0 siblings, 1 reply; 31+ messages in thread
From: Bart Schaefer @ 2014-06-02 17:27 UTC (permalink / raw)
  To: Jun T.; +Cc: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 577 bytes --]

On Jun 2, 2014 10:16 AM, "Jun T." <takimoto-j@kba.biglobe.ne.jp> wrote:
>
> Is it possible for iconv() to return >=0 while orig_name_len != 0 ?
> man iconv(3) mentions four possible cases,
> three of them are error (return value is -1),
> and the remaining one is (returnvalue>=0 and orig_name_len==0).

I'm not sure how it maps onto the described return cases, but I encountered
a case in testing where iconv returned zero and orig_name_len was
unchanged, i.e., nothing was copied to conv_name_ptr.  Without the extra
test this results in an empty string as the output filename.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-02 17:27               ` Bart Schaefer
@ 2014-06-05 14:34                 ` Jun T.
  2014-06-05 15:00                   ` Bart Schaefer
  0 siblings, 1 reply; 31+ messages in thread
From: Jun T. @ 2014-06-05 14:34 UTC (permalink / raw)
  To: zsh-workers

It seems we need to cast the return value of iconv() to a "signed" integer
to detect the error correctly. iconv() returns size_t, which is unsigned,
so the existing ">= 0" test is always true.
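
As a small illustration of the pitfall: iconv(3) reports failure as
(size_t)-1, which is the largest value size_t can hold, so comparing the
result against zero cannot detect it. The helper name below is made up
for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if rc is the error value of iconv(3).
 * Since size_t is unsigned, (size_t)-1 equals SIZE_MAX, and a test such
 * as `rc >= 0` holds for every possible rc, including the error value. */
static int iconv_rc_is_error(size_t rc)
{
    return rc == (size_t)-1;
}
```

Comparing against (size_t)-1 directly avoids relying on how a cast to a
signed type behaves for out-of-range values.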


diff --git a/Src/utils.c b/Src/utils.c
index 8b512bb..2693ecd 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -4296,14 +4296,13 @@ zreaddir(DIR *dir, int ignoredots)
 	conv_name = zrealloc(conv_name, orig_name_len+1);
 	conv_name_ptr = conv_name;
 	conv_name_len = orig_name_len;
-	if (iconv(conv_ds,
+	if ((long)iconv(conv_ds,
 		  &orig_name_ptr, &orig_name_len,
-		  &conv_name_ptr, &conv_name_len) >= 0) {
-	  if (orig_name_len == 0) {
+		  &conv_name_ptr, &conv_name_len) >= 0 &&
+	    orig_name_len == 0) {
 	    /* Completely converted, metafy and return */
 	    *conv_name_ptr = '\0';
 	    return metafy(conv_name, -1, META_STATIC);
-	  }
 	}
 	/* Error, or conversion incomplete, keep the original name */
     }




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Unicode, Korean, normalization form, Mac OS X and tab completion
  2014-06-05 14:34                 ` Jun T.
@ 2014-06-05 15:00                   ` Bart Schaefer
  0 siblings, 0 replies; 31+ messages in thread
From: Bart Schaefer @ 2014-06-05 15:00 UTC (permalink / raw)
  To: zsh-workers

On Jun 5, 11:34pm, Jun T. wrote:
}
} It seems we need to cast the return value of iconv() to a "signed" integer
} for correctly detecting the error.

Hmm.  That may explain something else.  I suggest this instead:


diff --git a/Src/utils.c b/Src/utils.c
index 8b512bb..59b9435 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -4287,7 +4287,7 @@ zreaddir(DIR *dir, int ignoredots)
 #if defined(HAVE_ICONV) && defined(__APPLE__)
     if (!conv_ds)
 	conv_ds = iconv_open("UTF-8", "UTF-8-MAC");
-    if (conv_ds) {
+    if (conv_ds != (iconv_t)(-1)) {
 	/* Force initial state in case re-using conv_ds */
 	(void) iconv(conv_ds, 0, &orig_name_len, 0, &conv_name_len);
 
@@ -4298,12 +4298,11 @@ zreaddir(DIR *dir, int ignoredots)
 	conv_name_len = orig_name_len;
 	if (iconv(conv_ds,
 		  &orig_name_ptr, &orig_name_len,
-		  &conv_name_ptr, &conv_name_len) >= 0) {
-	  if (orig_name_len == 0) {
+		  &conv_name_ptr, &conv_name_len) != (size_t)(-1) &&
+	    orig_name_len == 0) {
 	    /* Completely converted, metafy and return */
 	    *conv_name_ptr = '\0';
 	    return metafy(conv_name, -1, META_STATIC);
-	  }
 	}
 	/* Error, or conversion incomplete, keep the original name */
     }
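
For reference, here is a standalone sketch of the same logic with both
checks applied, outside of zsh. The function name compose_filename, its
return codes and the caller-supplied buffer are assumptions made for
this example; zsh itself uses zrealloc() and metafy() as in the diff
above. Where iconv does not know "UTF-8-MAC", iconv_open() fails and the
name is copied unchanged.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of zsh: writes the composed form of
 * `name` into `out` when the conversion fully succeeds, otherwise copies
 * `name` unchanged. Returns 1 if converted, 0 if the original was kept,
 * and -1 if `out` cannot hold even the original name. */
static int compose_filename(const char *name, char *out, size_t outsz)
{
    static int tried_open = 0;
    static iconv_t cd;
    size_t len = strlen(name);

    if (outsz < len + 1)
        return -1;

    if (!tried_open) {
        /* iconv_open() returns (iconv_t)-1 on failure, not 0 */
        cd = iconv_open("UTF-8", "UTF-8-MAC");
        tried_open = 1;
    }
    if (cd != (iconv_t)-1) {
        char *in = (char *)name;  /* iconv() does not modify the input bytes */
        char *dst = out;
        size_t inleft = len, outleft = outsz - 1;

        /* Reset to the initial state before reusing the descriptor */
        (void)iconv(cd, NULL, NULL, NULL, NULL);

        /* iconv() returns size_t, so compare against (size_t)-1 */
        if (iconv(cd, &in, &inleft, &dst, &outleft) != (size_t)-1 &&
            inleft == 0) {
            *dst = '\0';
            return 1;
        }
    }
    /* Open failed, conversion failed, or input left over: keep the name */
    memcpy(out, name, len + 1);
    return 0;
}
```

On Mac OS X this probably needs to be linked with -liconv. A plain ASCII
name comes back unchanged whether or not the conversion is available.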


^ permalink raw reply	[flat|nested] 31+ messages in thread

Thread overview: 31+ messages
2014-05-31  3:56 Unicode, Korean, normalization form, Mac OS X and tab completion Kwon Yeolhyun
2014-05-31 15:21 ` Chet Ramey
2014-05-31 18:47   ` Bart Schaefer
2014-05-31 19:16 ` Peter Stephenson
2014-05-31 21:29   ` Bart Schaefer
2014-06-01  2:25     ` Daniel Shahaf
2014-06-01  5:30       ` Kwon Yeolhyun
2014-06-01 16:53         ` Daniel Shahaf
2014-06-01  7:56       ` Bart Schaefer
2014-06-01 16:46         ` Daniel Shahaf
2014-06-01 17:00         ` Jun T.
2014-06-01 19:13           ` Bart Schaefer
2014-06-02 17:01             ` Jun T.
2014-06-02 17:14               ` Bart Schaefer
2014-06-01 19:53           ` Bart Schaefer
2014-06-02 11:58             ` Kwon Yeolhyun
2014-06-02 14:23               ` Kwon Yeolhyun
2014-06-02 15:14                 ` Bart Schaefer
2014-06-02 15:27                   ` Peter Stephenson
2014-06-02 15:48                     ` Kwon Yeolhyun
2014-06-02 15:27                   ` Kwon Yeolhyun
2014-06-02 15:49                     ` Bart Schaefer
2014-06-02 15:58                       ` Kwon Yeolhyun
2014-06-02 14:31               ` Bart Schaefer
2014-06-02 17:15             ` Jun T.
2014-06-02 17:27               ` Bart Schaefer
2014-06-05 14:34                 ` Jun T.
2014-06-05 15:00                   ` Bart Schaefer
2014-06-02  5:17           ` Kwon Yeolhyun
2014-06-02  7:39             ` Jun T.
2014-06-02  8:42               ` Kwon Yeolhyun
