From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-workers-return-28602-mason-zsh=primenet.com.au@zsh.org>
Received: (qmail 15261 invoked by alias); 8 Jan 2011 23:21:55 -0000
Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List <zsh-workers.zsh.org>
List-Post: <mailto:zsh-workers@zsh.org>
List-Help: <mailto:zsh-workers-help@zsh.org>
X-Seq: 28602
Received: (qmail 29885 invoked from network); 8 Jan 2011 23:21:53 -0000
Received-SPF: pass (ns1.primenet.com.au: SPF record at ntlworld.com designates 81.103.221.49 as permitted sender)
Date: Sat, 8 Jan 2011 23:21:40 +0000
From: Peter Stephenson <p.w.stephenson@ntlworld.com>
To: zsh-workers@zsh.org
Subject: Re: filename completion with umlauts (again)
Message-ID: <20110108232140.2661ba00@pws-pc.ntlworld.com>
In-Reply-To: <110108142301.ZM2102@torch.brasslantern.com>
References: <20110106232712.GA11387@spiegl.de>
	<AANLkTik9unZtuPR-4CM2oKLRT9Soct-XFWmiEajQzbK9@mail.gmail.com>
	<20110107094419.141d8d67@pwslap01u.europe.root.pri>
	<20110107233459.GA29168@spiegl.de>
	<110107231048.ZM919@torch.brasslantern.com>
	<20110108202122.5decaa0b@pws-pc.ntlworld.com>
	<110108142301.ZM2102@torch.brasslantern.com>
X-Mailer: Claws Mail 3.7.8 (GTK+ 2.22.0; x86_64-redhat-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Sat, 08 Jan 2011 14:22:59 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> } The collating order might be potentially a problem if you use literal
> } characters, but that's already fixed in a general way by allowing the
> } syntax:
> } 
> }   m:{[:upper:][:lower:]}={[:lower:][:upper:]}
> 
> The syntax is supported but the handling doesn't appear to be special-
> cased; mb_patmatchindex() does not differ from patmatchindex() in its
> handling of PP_UPPER or PP_LOWER and assumes ranges are numerically
> contiguous.

The relevant code is in Src/Zle/compmatch.c.  (There are some references
to matchers in other parts of the completion code, and there's a little
bit of extra help from the regular expression code but that's fairly
trivial.)  Equivalence classes are handled by
pattern_match_equivalence().  In every other place equivalence classes
are treated identically to normal character classes.
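
To make the distinction concrete, here is a minimal sketch of the two
ways such a correspondence can be computed.  It is not the code in
pattern_match_equivalence(); range_equiv() and class_equiv() are
hypothetical helpers invented for illustration.  The first maps by
position within a literal range such as {a-z}={A-Z} and so depends on
the ranges being numerically contiguous; the second asks the locale,
via towupper() and towlower(), and makes no such assumption.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: map c through literal ranges {lo1-hi1}={lo2-hi2}
 * by position.  Returns WEOF if c is outside the first range or the
 * second range is too short.  Only meaningful if both ranges are
 * numerically contiguous.
 */
static wint_t range_equiv(wint_t c, wint_t lo1, wint_t hi1,
                          wint_t lo2, wint_t hi2)
{
    wint_t mapped;

    assert(lo1 <= hi1 && lo2 <= hi2);
    if (c < lo1 || c > hi1)
        return WEOF;
    mapped = lo2 + (c - lo1);   /* same offset into the second range */
    return mapped <= hi2 ? mapped : WEOF;
}

/*
 * Hypothetical helper: the counterpart of c under
 * {[:upper:][:lower:]}={[:lower:][:upper:]}, as decided by the current
 * locale.  Returns WEOF if c is neither upper nor lower case.
 */
static wint_t class_equiv(wint_t c)
{
    if (iswlower(c))
        return towupper(c);
    if (iswupper(c))
        return towlower(c);
    return WEOF;
}
```

With these, range_equiv(L'c', L'a', L'z', L'A', L'Z') gives L'C', but a
character such as U+00E4 falls outside a-z and gets no counterpart,
whereas class_equiv() can map it if the locale classifies it as lower
case.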

> What is it that I continue to fail to see?

See any number of while loops over character arrays in compmatch.c; as
one example, the loop at line 529 in match_str().  The various arrays
are simply char *'s and they're not even metafied (if I remember right;
that's how we support 8-bit single-byte encodings, by direct
comparison).  The place is full of expressions like "w + aoff - aol" and
"l[-(llen + zoff)]".  All these arrays need either to be walked as
multibyte characters, with appropriate arithmetic using mbsrtowcs() and
friends, or to be converted to wide characters and back at appropriate
points.  In the latter case we need to convert everything relevant, and
in some cases that potentially loses information, since not everything
on the command line is guaranteed to be a multibyte string corresponding
to valid characters in the current locale.  (For example, you can
complete a file name containing ISO-8859-1 characters even when the
locale is UTF-8; this should work even though the characters don't show
up properly.)
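
As a rough illustration of the first option, the sketch below steps
through a byte array one multibyte character at a time using the
standard mbrtowc(), falling back to a single raw byte whenever the
sequence is invalid or truncated in the current locale, so that nothing
on the line is thrown away.  unit_len() and count_units() are
hypothetical names, not functions from compmatch.c, and the sketch
ignores metafication entirely.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: byte length of the unit starting at s, with n
 * bytes available.  A valid multibyte character is one unit; an invalid
 * or incomplete sequence is consumed one raw byte at a time.
 */
static size_t unit_len(const char *s, size_t n)
{
    mbstate_t st;
    wchar_t wc;
    size_t r;

    assert(n > 0);
    memset(&st, 0, sizeof st);      /* start from the initial shift state */
    r = mbrtowc(&wc, s, n, &st);
    if (r == (size_t)-1 || r == (size_t)-2)
        return 1;                   /* invalid or truncated: keep the raw byte */
    if (r == 0)
        return 1;                   /* a NUL byte occupies one byte */
    return r;
}

/* Hypothetical helper: number of units in the first n bytes of s. */
static size_t count_units(const char *s, size_t n)
{
    size_t count = 0;

    while (n > 0) {
        size_t len = unit_len(s, n);
        s += len;
        n -= len;
        count++;
    }
    return count;
}
```

Any offset arithmetic of the "w + aoff - aol" kind would then have to be
expressed in terms of unit_len() steps rather than plain byte counts,
which is where the real work lies.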

If you *can* prove it's trivial, of course...

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/