From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-workers-return-32647-mason-zsh=primenet.com.au@zsh.org>
Received: (qmail 18672 invoked by alias); 1 Jun 2014 07:56:43 -0000
Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List <zsh-workers.zsh.org>
List-Post: <mailto:zsh-workers@zsh.org>
List-Help: <mailto:zsh-workers-help@zsh.org>
X-Seq: 32647
Received: (qmail 18343 invoked from network); 1 Jun 2014 07:56:30 -0000
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on f.primenet.com.au
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_NONE
	autolearn=ham version=3.3.2
From: Bart Schaefer <schaefer@brasslantern.com>
Message-id: <140601005624.ZM3283@torch.brasslantern.com>
Date: Sun, 01 Jun 2014 00:56:24 -0700
In-reply-to: <20140601022527.GD1820@tarsus.local2>
Comments: In reply to Daniel Shahaf <d.s@daniel.shahaf.name>
 "Re: Unicode, Korean, normalization form, Mac OS X and tab completion" (Jun
 1,  2:25am)
References: <AB81F9FB-8D84-4656-9EFE-F2F98B196861@me.com>
	<20140531201617.4ca60ab8@pws-pc.ntlworld.com>
	<140531142926.ZM556@torch.brasslantern.com>
	<20140601022527.GD1820@tarsus.local2>
X-Mailer: OpenZMail Classic (0.9.2 24April2005)
To: "Zsh List Hackers'" <zsh-workers@zsh.org>
Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab completion
MIME-version: 1.0
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: quoted-printable

On Jun 1,  2:25am, Daniel Shahaf wrote:
}
} What about, say, people doing 'ls' and copy-pasting a filename from the
} output into a command line?  Wouldn't that result in NFD keyboard
} input?

Yes, but there's only so far that it makes sense to go with this.  For
example, [[ fooá = fooá ]] arguably should not normalize, and script
file contents should not be normalized, etc.  I think messing with the
command input stream will create more problems than it solves.
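
To make the two spellings concrete, here is a minimal sketch in Python
(not zsh code; the variable names are only illustrative).  It assumes
the left-hand name is the composed form and the right-hand one the
decomposed form.  The two differ as code point sequences but compare
equal once both are put into the same normalization form:

```python
import unicodedata

composed = "foo\u00e1"      # NFC: U+00E1 LATIN SMALL LETTER A WITH ACUTE
decomposed = "fooa\u0301"   # NFD: 'a' followed by U+0301 COMBINING ACUTE ACCENT

# A plain comparison sees two different code point sequences.
raw_equal = composed == decomposed

# Normalizing both sides to one form makes them compare equal.
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

print(raw_equal, nfc_equal)
```

Whether a shell test like the one above *should* behave like the second
comparison is exactly the open question; the sketch only shows what the
difference is.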

What we *might* need is for patcompile() also to normalize (though that
potentially violates what I just said about [[ ... ]], depending on which
encoding is the pattern and which is the string to be matched).  Maybe
this needs to be part of the (#u) qualifier handling, or a related new
qualifier.

(Note there's little to no existing support for wide characters in e.g.
matcher-list range specifications, so no point in going there yet.)

} FWIW, while OS X always returns NFD filenames, one could also imagine an
} OS that is normalization-aware (forbids creating a file if its
} normalized name is the same as the normalized name of an existing file)
} but octet-sequence-preserving, and on such an OS both the readdir()
} output and the user input would need to be normalized.

This case is ultimately the same as your first example.  Either the two
forms of name should be treated the same, in which case normalizing the
results of readdir() is enough, or they should be treated as different
even though you aren't allowed to create both of them, in which case
they should not be normalized at all (and then there had better be some way
outside the shell, e.g., at the TTY driver layer, to choose the input
encoding).
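
As a rough illustration of the first branch, here is a sketch in Python
of normalizing directory entries before matching.  The function names
complete() and list_matches() and the default of NFC are assumptions for
the example, not anything zsh does today; the input is assumed to arrive
already in the chosen form:

```python
import os
import unicodedata

def complete(prefix, names, form="NFC"):
    """Normalize each directory entry to `form`, then prefix-match.

    `prefix` is assumed to already be in `form` (the default input
    encoding); only the readdir()-style results are normalized.
    """
    normalized = (unicodedata.normalize(form, n) for n in names)
    return [n for n in normalized if n.startswith(prefix)]

def list_matches(prefix, directory=".", form="NFC"):
    # os.listdir() stands in for readdir() here.
    return complete(prefix, os.listdir(directory), form)
```

With this, a composed prefix typed at the keyboard matches a name the
filesystem returned in decomposed form.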

Maybe the completion system should use (#u) more often, or maybe there
needs to be a setopt to cause all patterns to act as if (#u) ...

If there's a tricky bit, it's knowing which encoding is the default for
input so you can normalize to that one.

} Also, other unixes allow you to have both the NFC-form and NFD-form in
} the same directory, e.g., 'touch fooá fooá' works just fine on linux
} ext4 (the first filename is composed, the second decomposed); in such
} cases normalization magic should not be done.

Hence my question about what compile-time tests we need for this, and
what if anything to do about Mac filesystems mounted on Linux.
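
For what it's worth, the behavior can also be probed at run time rather
than at compile time.  The following Python sketch is only that, a
sketch: probe_normalization() and the keys it returns are made up for
the example.  It creates a file under a composed name in a scratch
directory and checks whether the decomposed name refers to the same
file and which form the directory listing reports:

```python
import os
import tempfile
import unicodedata

def probe_normalization(directory=None):
    """Probe how the filesystem holding `directory` treats NFC vs NFD names."""
    nfc = "probe_\u00e1"
    nfd = unicodedata.normalize("NFD", nfc)
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        # Create the file under its composed name only.
        with open(os.path.join(tmp, nfc), "w"):
            pass
        listed = os.listdir(tmp)
        if nfd in listed:
            form = "NFD"
        elif nfc in listed:
            form = "NFC"
        else:
            form = "other"
        return {
            # True if the decomposed name reaches the same file.
            "aliases_forms": os.path.exists(os.path.join(tmp, nfd)),
            # Which form the directory listing hands back.
            "readdir_form": form,
        }
```

The result is per-filesystem, not per-OS, which is the point for the
mounted-Mac-volume case: the answer can differ between directories on
the same machine.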