Date: Sat, 8 Jan 2011 23:21:40 +0000
From: Peter Stephenson
To: zsh-workers@zsh.org
Subject: Re: filename completion with umlauts (again)
Message-ID: <20110108232140.2661ba00@pws-pc.ntlworld.com>
In-Reply-To: <110108142301.ZM2102@torch.brasslantern.com>

On Sat, 08 Jan 2011 14:22:59 -0800
Bart Schaefer wrote:
> } The collating order might be potentially a problem if you use literal
> } characters, but that's already fixed in a general way by allowing the
> } syntax:
> }
> }   m:{[:upper:][:lower:]}={[:lower:][:upper:]}
>
> The syntax is supported but the handling doesn't appear to be
> special-cased; mb_patmatchindex() does not differ from patmatchindex()
> in its handling of PP_UPPER or PP_LOWER and assumes ranges are
> numerically contiguous.

The relevant code is in Src/Zle/compmatch.c.  (There are some
references to matchers in other parts of the completion code, and
there's a little bit of extra help from the regular expression code,
but that's fairly trivial.)  Equivalence classes are handled by
pattern_match_equivalence().  In every other place equivalence classes
are treated identically to normal character classes.

> What is it that I continue to fail to see?

See any number of while loops over character arrays in compmatch.c; as
one example, the loop at line 529 in match_str().  The various arrays
are simply char *'s and they're not even metafied (if I remember right;
that's how we support 8-bit single-byte encodings, by direct
comparison).  The place is full of expressions like "w + aoff - aol"
and "l[-(llen + zoff)]".

All these arrays need either to refer to multibyte characters, with
appropriate arithmetic using mbsrtowcs() and friends, or to be
converted to wide characters and back at appropriate points.  In the
latter case we need to convert everything relevant into wide characters
and back again, in some cases potentially losing information, since not
everything on the command line is guaranteed to be a multibyte string
corresponding to valid characters in the current locale.  (For example,
you can complete a file name containing ISO-8859-1 characters even when
the locale is UTF-8; this should work even though the characters don't
show up properly.)

If you *can* prove it's trivial, of course...

-- 
Peter Stephenson
Web page now at http://homepage.ntlworld.com/p.w.stephenson/
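
To make the arithmetic problem above concrete, here is a minimal sketch of stepping through a string by character rather than by byte using mbrtowc(). It is not code from zsh. The names char_len(), count_chars() and char_offset() are invented for this illustration, and treating any byte that does not start a valid character in the current locale as a character of its own is an assumption about how the ISO-8859-1-under-UTF-8 case might be kept working.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: length in bytes of the character starting at s,
 * where n > 0 bytes remain.  An invalid or incomplete sequence (or a
 * null byte) is treated as one raw byte, so a string that is not valid
 * in the current locale can still be walked without losing anything.
 */
static size_t char_len(const char *s, size_t n)
{
    mbstate_t st;
    size_t r;

    assert(n > 0);
    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    r = mbrtowc(NULL, s, n, &st);
    if (r == (size_t)-1 || r == (size_t)-2 || r == 0)
        return 1;                       /* fall back to a single byte */
    return r;
}

/* Hypothetical helper: number of characters in the first n bytes of s. */
size_t count_chars(const char *s, size_t n)
{
    size_t count = 0;

    while (n > 0) {
        size_t len = char_len(s, n);
        s += len;
        n -= len;
        count++;
    }
    return count;
}

/*
 * Hypothetical helper: byte offset of character number idx (counting
 * from 0) in the first n bytes of s, or n if there are fewer characters.
 */
size_t char_offset(const char *s, size_t n, size_t idx)
{
    size_t off = 0;

    while (off < n && idx > 0) {
        off += char_len(s + off, n - off);
        idx--;
    }
    return off;
}
```

With helpers of this kind, an expression such as "w + aoff - aol" could no longer be plain pointer arithmetic if the offsets are meant to count characters: each offset would have to be translated through something like char_offset(), and stepping backwards through a string, as "l[-(llen + zoff)]" does, is harder still because mbrtowc() only scans forwards.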