From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-workers-return-28600-mason-zsh=primenet.com.au@zsh.org>
Received: (qmail 23816 invoked by alias); 8 Jan 2011 22:23:22 -0000
Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List <zsh-workers.zsh.org>
List-Post: <mailto:zsh-workers@zsh.org>
List-Help: <mailto:zsh-workers-help@zsh.org>
X-Seq: 28600
Received: (qmail 27025 invoked from network); 8 Jan 2011 22:23:21 -0000
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on f.primenet.com.au
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_NONE
	autolearn=ham version=3.3.1
Received-SPF: none (ns1.primenet.com.au: domain at closedmail.com does not designate permitted sender hosts)
From: Bart Schaefer <schaefer@brasslantern.com>
Message-id: <110108142301.ZM2102@torch.brasslantern.com>
Date: Sat, 08 Jan 2011 14:22:59 -0800
In-reply-to: <20110108202122.5decaa0b@pws-pc.ntlworld.com>
Comments: In reply to Peter Stephenson <p.w.stephenson@ntlworld.com>
 "Re: filename completion with umlauts (again)" (Jan  8,  8:21pm)
References: <20110106232712.GA11387@spiegl.de>
	<AANLkTik9unZtuPR-4CM2oKLRT9Soct-XFWmiEajQzbK9@mail.gmail.com>
	<20110107094419.141d8d67@pwslap01u.europe.root.pri>
	<20110107233459.GA29168@spiegl.de>	<110107231048.ZM919@torch.brasslantern.com>
	<20110108202122.5decaa0b@pws-pc.ntlworld.com>
X-Mailer: OpenZMail Classic (0.9.2 24April2005)
To: zsh-workers@zsh.org
Subject: Re: filename completion with umlauts (again)
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii

[>workers]

On Jan 8,  8:21pm, Peter Stephenson wrote:
}
} The remaining problem is the multibyte one; the matcher code is heavily
} tied to one character per array position in a way that doesn't make it
} easy to turn multibyte into wide characters and back (and that doesn't
} always make it obvious what the @*!@! it's actually doing with the
} array).

"The array" ...

Digging through the list archives I find a reference to "the characters
stored in the matcher are not handled as multibyte" but parse_pattern() 
seems to be converting multibyte input to convchar_t so that's not it
any longer.  (Is it?)

Hence it must be genpatarr in bld_line(), and the problem is that even
though we can determine correctly that the left side of the equivalence
class matches the original character on the line, we can't select the
corresponding character from the right side of the class?

Which implies that the root of the problem is mb_patmatchindex() in
Src/pattern.c, and what I said before really is true:  It's not simple
to expand an "a-z" style representation into an enumeration of all the
characters within the range, figure out that the matched character is
at the Nth position in the expansion, and then find the corresponding
Nth position in another range, when either or both ranges might be
multibyte.  Even if it were possible to select the correct position in
both ranges, it's unclear when to convert the result back to multibyte.
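
To make the "Nth position" step concrete, here is a minimal sketch in C
that works purely on wide characters.  It is not the zsh code: the
helper names range_index(), range_pick() and map_between_ranges() are
hypothetical, and it assumes both ranges are numerically contiguous in
wchar_t, which is exactly the assumption in question.  It deliberately
leaves out the conversion from and back to multibyte.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper, not zsh code: 0-based position of c within the
 * inclusive range [lo, hi], or -1 if c lies outside the range. */
long range_index(wchar_t c, wchar_t lo, wchar_t hi)
{
    assert(lo <= hi);           /* caller must pass an ordered range */
    if (c < lo || c > hi)
        return -1;
    return (long)(c - lo);
}

/* Hypothetical helper: the character at position idx of the inclusive
 * range [lo, hi], or WEOF if idx does not fall inside the range. */
wint_t range_pick(long idx, wchar_t lo, wchar_t hi)
{
    assert(lo <= hi);
    if (idx < 0 || idx > (long)(hi - lo))
        return WEOF;
    return (wint_t)(lo + idx);
}

/* Map c from [lo1, hi1] to the character at the same position in
 * [lo2, hi2].  Returns WEOF if c is not in the first range or the
 * second range is too short to have a corresponding position. */
wint_t map_between_ranges(wchar_t c, wchar_t lo1, wchar_t hi1,
                          wchar_t lo2, wchar_t hi2)
{
    return range_pick(range_index(c, lo1, hi1), lo2, hi2);
}
```

With numeric values 0xE0-0xFE for the first range and 0xC0-0xDE for the
second, 0xE4 sits at index 4 and maps to 0xC4.  Whether those numbers
correspond to the characters the user typed depends on the wchar_t
encoding, so this only illustrates the index arithmetic, not a fix.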

} The collating order might be potentially a problem if you use literal
} characters, but that's already fixed in a general way by allowing the
} syntax:
} 
}   m:{[:upper:][:lower:]}={[:lower:][:upper:]}

The syntax is supported but the handling doesn't appear to be
special-cased; mb_patmatchindex() does not differ from patmatchindex()
in its handling of PP_UPPER or PP_LOWER and assumes ranges are
numerically contiguous.
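
For comparison, a special-cased handling of an upper/lower class pair
could skip index arithmetic and ask the C library for the counterpart
character.  The following is only a sketch of that idea under stated
assumptions: enum char_class and map_between_classes() are hypothetical
names, not zsh's PP_UPPER / PP_LOWER machinery, and the result for
non-ASCII characters depends on the LC_CTYPE locale in effect.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical tags standing in for the upper/lower class markers;
 * these are not zsh's PP_UPPER / PP_LOWER definitions. */
enum char_class { CLASS_UPPER, CLASS_LOWER };

/* Map c from class `from` to its counterpart in class `to` without
 * assuming either class is a numerically contiguous range.
 * Returns WEOF when c is not a member of `from`. */
wint_t map_between_classes(wint_t c, enum char_class from,
                           enum char_class to)
{
    int member = (from == CLASS_UPPER) ? iswupper(c) : iswlower(c);

    if (!member)
        return WEOF;
    if (from == to)
        return c;               /* same class on both sides: identity */
    return (to == CLASS_LOWER) ? towlower(c) : towupper(c);
}
```

This sidesteps the "Nth position" question for these two classes only;
it says nothing about arbitrary pairs of ranges or other classes.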

What is it that I continue to fail to see?

BTW in the comments before compmatch.c:pattern_match_restrict() there's a
reference to "s will be NULL" but there is no variable or argument "s".
I suspect it must mean "wsc".

--