From: Bart Schaefer
Message-id: <110108142301.ZM2102@torch.brasslantern.com>
Date: Sat, 08 Jan 2011 14:22:59 -0800
In-reply-to: <20110108202122.5decaa0b@pws-pc.ntlworld.com>
Comments: In reply to Peter Stephenson "Re: filename completion with umlauts (again)" (Jan 8, 8:21pm)
References: <20110106232712.GA11387@spiegl.de> <20110107094419.141d8d67@pwslap01u.europe.root.pri> <20110107233459.GA29168@spiegl.de> <110107231048.ZM919@torch.brasslantern.com> <20110108202122.5decaa0b@pws-pc.ntlworld.com>
To: zsh-workers@zsh.org
Subject: Re: filename completion with umlauts (again)
List-Id: Zsh Workers List
X-Seq: 28600

On Jan 8, 8:21pm, Peter Stephenson wrote:
}
} The remaining problem is the multibyte one; the matcher code is heavily
} tied to one character per array position in a way that doesn't make it
} easy to turn multibyte into wide characters and back (and that doesn't
} always make it obvious what the @*!@! it's actually doing with the
} array).

"The array" ...

Digging through the list archives I find a reference to "the characters
stored in the matcher are not handled as multibyte" but parse_pattern()
seems to be converting multibyte input to convchar_t so that's not it
any longer.  (Is it?)

Hence it must be genpatarr in bld_line(), and the problem is that even
though we can determine correctly that the left side of the equivalence
class matches the original character on the line, we can't select the
appropriate corresponding character from the right side of the class?

Which implies that the root of the problem is mb_patmatchindex() in
Src/pattern.c, and what I said before really is true:  It's not simple
to expand an "a-z" style representation into an enumeration of all the
characters within the range, figure out that the character is at the
Nth position in the expansion, and then find the corresponding Nth
position in another range, when either or both ranges might be
multibyte; and even if it were possible to select the correct position
in both ranges, it's unclear when to convert the result back to
multibyte.

} The collating order might be potentially a problem if you use literal
} characters, but that's already fixed in a general way by allowing the
} syntax:
}
} m:{[:upper:][:lower:]}={[:lower:][:upper:]}

The syntax is supported but the handling doesn't appear to be special-
cased; mb_patmatchindex() does not differ from patmatchindex() in its
handling of PP_UPPER or PP_LOWER and assumes ranges are numerically
contiguous.

What is it that I continue to fail to see?

BTW in the comments before compmatch.c:pattern_match_restrict() there's
a reference to "s will be NULL" but there is no variable or argument
"s".  I suspect it must mean "wsc".
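
To make the Nth-position mapping described above concrete, here is a
minimal sketch in C.  It is NOT the code in Src/pattern.c: the struct
and function names (wrange, class_index_of, class_char_at, map_equiv)
are hypothetical, invented for illustration.  It assumes the input has
already been converted to wide characters and that every range is
numerically contiguous, which is the very assumption questioned above
for [:upper:] and [:lower:].

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical type: one inclusive range of wide characters, e.g. a-z. */
struct wrange {
    wchar_t lo;
    wchar_t hi;
};

/*
 * Return the 0-based position of c within the concatenation of the
 * ranges in cls, or -1 if c falls in none of them.  Each range is
 * assumed to be numerically contiguous.
 */
static long class_index_of(const struct wrange *cls, size_t n, wchar_t c)
{
    long base = 0;

    assert(n == 0 || cls != NULL);
    for (size_t i = 0; i < n; i++) {
        if (c >= cls[i].lo && c <= cls[i].hi)
            return base + (long)(c - cls[i].lo);
        base += (long)(cls[i].hi - cls[i].lo) + 1;
    }
    return -1;
}

/* Return the character at position idx in cls, or L'\0' if out of range. */
static wchar_t class_char_at(const struct wrange *cls, size_t n, long idx)
{
    assert(n == 0 || cls != NULL);
    if (idx < 0)
        return L'\0';
    for (size_t i = 0; i < n; i++) {
        long len = (long)(cls[i].hi - cls[i].lo) + 1;

        if (idx < len)
            return (wchar_t)(cls[i].lo + idx);
        idx -= len;
    }
    return L'\0';
}

/*
 * Map c from its position in `from` to the same position in `to`.
 * Returns c unchanged when it is not in `from` or when `to` has no
 * character at that position.
 */
static wchar_t map_equiv(const struct wrange *from, size_t nfrom,
                         const struct wrange *to, size_t nto, wchar_t c)
{
    long idx = class_index_of(from, nfrom, c);
    wchar_t out;

    if (idx < 0)
        return c;
    out = class_char_at(to, nto, idx);
    return out != L'\0' ? out : c;
}
```

With the left side given as the ranges a-z and U+00E0-U+00FE and the
right side as A-Z and U+00C0-U+00DE, map_equiv() would take U+00E4 to
U+00C4 because both sit at the same offset in their class.  The sketch
leaves open the question raised above of when the result should be
converted back to multibyte, and it breaks down as soon as the two
sides are not enumerated in the same order.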