From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 28628 invoked from network); 18 May 2008 23:58:25 -0000 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on f.primenet.com.au X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.2.4 Received: from news.dotsrc.org (HELO a.mx.sunsite.dk) (130.225.247.88) by ns1.primenet.com.au with SMTP; 18 May 2008 23:58:25 -0000 Received-SPF: none (ns1.primenet.com.au: domain at sunsite.dk does not designate permitted sender hosts) Received: (qmail 19141 invoked from network); 18 May 2008 23:58:17 -0000 Received: from sunsite.dk (130.225.247.90) by a.mx.sunsite.dk with SMTP; 18 May 2008 23:58:16 -0000 Received: (qmail 12078 invoked by alias); 18 May 2008 23:58:12 -0000 Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm Precedence: bulk X-No-Archive: yes X-Seq: 25069 Received: (qmail 12063 invoked from network); 18 May 2008 23:58:11 -0000 Received: from bifrost.dotsrc.org (130.225.254.106) by sunsite.dk with SMTP; 18 May 2008 23:58:11 -0000 Received: from vms046pub.verizon.net (vms046pub.verizon.net [206.46.252.46]) by bifrost.dotsrc.org (Postfix) with ESMTP id 8527C8059114 for ; Mon, 19 May 2008 01:58:07 +0200 (CEST) Received: from torch.brasslantern.com ([71.116.113.54]) by vms046.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id <0K13001QH98JUOE1@vms046.mailsrvcs.net> for zsh-workers@sunsite.dk; Sun, 18 May 2008 18:57:56 -0500 (CDT) Received: from torch.brasslantern.com (localhost.localdomain [127.0.0.1]) by torch.brasslantern.com (8.13.1/8.13.1) with ESMTP id m4INvsQ2004387 for ; Sun, 18 May 2008 16:57:54 -0700 Received: (from schaefer@localhost) by torch.brasslantern.com (8.13.1/8.13.1/Submit) id m4INvriD004386 for zsh-workers@sunsite.dk; Sun, 18 May 2008 16:57:53 -0700 Date: Sun, 18 May 2008 16:57:53 -0700 From: Bart Schaefer Subject: Re: compmatch behaviour In-reply-to: <10710.1211137299@pws-pc> To: zsh-workers@sunsite.dk (Zsh hackers list) Message-id: <080518165753.ZM4385@torch.brasslantern.com> MIME-version: 1.0 X-Mailer: OpenZMail Classic (0.9.2 24April2005) Content-type: text/plain; charset=us-ascii References: <10710.1211137299@pws-pc> Comments: In reply to Peter Stephenson "compmatch behaviour" (May 18, 8:01pm) X-Virus-Scanned: ClamAV 0.91.2/7152/Mon May 19 00:50:55 2008 on bifrost X-Virus-Status: Clean On May 18, 8:01pm, Peter Stephenson wrote: } } However, there's one bit that's got me stumped, and unfortunately it's } the core of the whole business. bld_line() in Src/Zle/compmatch.c works } as follows: So if I comprehend your question, it's not that you need help figuring out what the code is doing. You want help figuring out what it should do instead, because the space of all possible wide characters is much to big to brute-force it the way Sven originally did. Right? } ... because we can have patterns associated with both the trial } string and the word on the command line, we have got ourselves into } a position where the logic is naturally qudratic: both sides can in } principle change and consequently we need to change one side to see if } it can match the other. I'm not sure this is quite right (so maybe it's just your coherency or my comprehension that's off). There are two situations being handled simultaneously here, and maybe the first thing to do is to separate them. The first situation is where wpat is a correspondence class and we need to select the corresponding position out of lpat. The second case is where lpat is an equivalence class and we need to try every possible character in the class at line position *lp. The two cases don't actually overlap as far as I can tell -- Sven has branched the same loop that searches for c in lpat->tab in the first case, to do double duty as the loop that acts on every character in lpat->tab in the second case. Either (but not both) of the two situations could occur at every value of lp, which is what the recursion is covering. This is limited by the length of the line --- there might be an optimization opportunity in testing sooner whether *(lp + 1) will be the end of line, but the depth of the recursion is not related to the size of the character class. I think the first think that's needed is to change the Cpattern struct from a dense array indexed by the ascii value of a character, into ... well, something else, so that it's not necessary to iterate over it in the first case, and so the iteration is more sparse in the second case. Of course, that may make pattern_match() more complicated .... I'm not sure there's any way to avoid iterating in the second case. Besides handling two possible ways of interpreting the character class, I think (without tracing very far up the call chain) that this is also constructing the prefix string that's going to be shown to the user in the event of a common prefix shared by ambiguous matches. It's not enough to check whether the test character is in the class, you might have to know *which* character it is in the class, so to speak.