Date: Fri, 29 Sep 2006 18:08:43 +0100
From: Peter Stephenson
To: zsh-workers@sunsite.dk
Subject: Re: Stuff to do
Message-Id: <20060929180843.3293cffe.pws@csr.com>
In-Reply-To: <200609292037.17847.arvidjaar@newmail.ru>
References: <200609271211.k8RCBW5N023914@news01.csr.com>
 <200609292037.17847.arvidjaar@newmail.ru>
Organization: Cambridge Silicon Radio
X-Seq: 22788

Andrey Borzenkov wrote:
> 1. matcher code assumes character == byte and is using 256 bytes array to
> build character equivalence classes. What is worse, it is passing this array
> around between different functions to supply results of previous matching. I
> have here patch (attached) that eliminates external dependency on this array
> so matcher internals can be more easily changed. This seems to make code a
> bit more understandable irrespectively :) OK to commit?

Yes, the more the calling conventions are sanitized like this the better I
like it.  The references to external data are one of my worst nightmares.

> 2. Usage of magic array for character classes ([abcd]) can be naturally
> superseded by using either generic pattern matching or direct comparison.
> Pattern matching provides for using something like [[:lower:]] and possibly
> using matchers etc but potential side effects of extended globbing need
> review. I do not know what is faster. Is it OK?

I'd be quite keen on being able to do this by using globbing.  I think
the current uses of matcher specifications are limited enough (sometimes
by necessity, as we're seeing) that an extension wouldn't be a problem
for compatibility; however, I don't know how to mix this with the
equivalence class stuff.  It would be quite nice to keep it in one place
in pattern.c, but I doubt if that's going to work with all the additions
we need.

> 3. Equivalence classes ({abcd}={xyzw}) do not scale beyond single byte
> characters. But if we check usage I believe, it has never been used for
> anything beyond case-insensitive matching.
> For this particular usage I suggest using new matcher type:
>
> m:LPAT>upper
> m:LPAT>lower
>
> with obvious semantic - character from line is converted to lower or upper and
> compared with character from potential match. So m:{a-z}={A-Z} becomes
> m:?>upper etc.
>
> We still can implement {...} for character _set_ but not for character range.
> So far I do not consider it major problem.

I think we'll need to keep it working for ASCII for compatibility, but
not extending it to other characters is, as you say, not a big problem.
However, maybe it's not a problem at all; see below.

> 4. The hardest part. Right anchor. For this matcher must match _backward_. I
> am not aware of any way to walk backward as long as we assume arbitrary
> encoding. Options apparently are
>...
> b) convert this code to use wide characters. Not sure if this is a viable
> option.

This is the option I was thinking about, and it removes the range
problem since it extends the ASCII logic in a natural way (it may be
system dependent, but that's the absolute least of our worries).

I don't think it's a problem using wide characters locally for the
comparisons.  Indeed, the pattern match code does all its character
class stuff with wide characters (or kludged wide characters which are
just the unsigned char values if a multibyte sequence doesn't convert).
It doesn't really make sense to allow for unconvertible characters in
matcher comparisons: it's great to be able to insert them on the command
line in some fashion, but the matcher specs only make sense for
characters that are convertible.

The worst problem is that we lose the ability to do matching control
where (say) much of the string is ASCII, and our match rules only use
ASCII, but there are also characters that don't work in the current
locale.  I don't think this is a big issue and there are possible ways
round:

- partial conversion
- convert them at this stage to $'\...' sequences instead of later
- use marked wide characters where we record a byte that can't be
  converted

Any of these could be bolted on later, so I don't think that's a
showstopper.

I was wondering how much of the code we needed to convert to use wide
characters, and vaguely came to the conclusion that the only reasonably
sane way was to do it fairly locally within the comparison function(s),
since otherwise the interface to the rest of the completion system gets
very hairy.  That said, I haven't actually looked at the code again
since coming to that conclusion.

If there's an easy way of doing it by another method, fine.  I suspect
there isn't.

-- 
Peter Stephenson                 Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK           Tel: +44 (0)1223 692070
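
To make the "convert locally, then compare" idea a bit more concrete,
here is a rough sketch in Python rather than in the C of the matcher
itself.  It is only an illustration under stated assumptions: the
function names (to_wide, is_marked, chars_match, common_suffix_len) are
invented for this example and do not exist in zsh, and Python's
"surrogateescape" error handler stands in for the "marked wide
characters" option, since it maps each byte that fails to convert to a
reserved code point in the range U+DC80..U+DCFF.  Once both strings are
sequences of whole characters, walking backward from the right anchor
is just indexing from the end.

```python
def to_wide(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes locally for comparison.

    Each byte that cannot be converted is kept as a marker code point
    (U+DC00 + byte value) instead of raising an error.
    """
    return raw.decode(encoding, errors="surrogateescape")


def is_marked(ch: str) -> bool:
    """True if ch records a raw byte that failed to convert."""
    return 0xDC80 <= ord(ch) <= 0xDCFF


def chars_match(line_ch: str, match_ch: str) -> bool:
    """Compare one character from the line with one from the candidate.

    Convertible characters are compared case-insensitively (the usual
    m:{a-z}={A-Z} use); a marked byte only ever matches itself.
    """
    if is_marked(line_ch) or is_marked(match_ch):
        return line_ch == match_ch
    return line_ch.lower() == match_ch.lower()


def common_suffix_len(line: bytes, candidate: bytes) -> int:
    """Count matching characters walking backward from the right end."""
    wide_line, wide_cand = to_wide(line), to_wide(candidate)
    n = 0
    while (n < len(wide_line) and n < len(wide_cand)
           and chars_match(wide_line[-1 - n], wide_cand[-1 - n])):
        n += 1
    return n


if __name__ == "__main__":
    print(common_suffix_len(b"FOO.TXT", b"bar.txt"))
    print(common_suffix_len(b"a\xff.c", b"A\xff.C"))
```

The conversion happens entirely inside common_suffix_len, so callers
still pass and receive plain byte strings; that is the "fairly local"
arrangement suggested above, not a claim about how the real code would
have to be structured.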