X-Seq: 20500
Message-Id: <200410181147.i9IBl2kN008102@news01.csr.com>
To: zsh-workers@sunsite.dk (Zsh hackers list)
Subject: Pattern changes, part 2
Date: Mon, 18 Oct 2004 12:47:02 +0100
From: Peter Stephenson

I did some more work on pattern matching over the weekend. The main idea is to make it easier to handle multibyte characters by using the normal string representation whenever convenient. All tests still pass.

- The test string is now unmetafied for comparing against the pattern. Literal strings in the pattern are also unmetafied. I've turned the METAINCs in the pattern matcher into CHARINCs where appropriate; CHARINC is currently a trivial increment, but it is a placeholder meaning "go to the next character". (Nothing has changed in places where the string remains metafied; those will still need more thought.) The new code should be significantly more efficient during pattern matching, since it no longer has to test for Meta characters in many places, although I haven't benchmarked it.

- Character sets [...] are still metafied; we need the special characters to indicate ranges and POSIX ctype names.

- Pure strings are still metafied. (These are signalled by a special flag indicating that the value stored is a string rather than the normal pattern programme.) It became clear that changing this would be inefficient, particularly in globbing, where we use the result of the pattern matcher to add to the (metafied) path buffer. There are actually two cases:

  o We can spot immediately that the string doesn't have special characters. This is the normal case and is handled fairly efficiently.

  o There are special characters around, but the string is nonetheless a pure string. There is one case where we need to handle this properly, which is when the string in question is ".." or ".", since those are never matched by globbing.
    An example where this could occur would be a path segment (#i).. with extended globbing. Here, we only find out that we have a pure string after unmetafying into the pattern programme, so we need to metafy again. This isn't so hot, but it's a rare corner case.

- The interface used by parameter substitutions has been tidied up.

  o The call patmatchlen() gets the length of the match, so that nothing outside pattern.c needs pointers into the test string. This was necessary since the strings may now be reallocated, but it is neater anyway. (This is the metafied length, which is what the parameter code needs; that will probably stay the same, as I don't think there's a case for unmetafying there. There is some minor inefficiency in counting metafiable characters in the matched part of the trial string.)

  o The horrible global patoffset has disappeared. The offset to be added to indices into parameters is now passed as an argument. I should have done it this way all along.

- Minor fix for numeric ranges: they will now match any integer that is too large to represent in the internal integer type. This has worked for <-> for some time, but it wasn't special-cased when there was a lower bound.

I will commit this directly (with a ChangeLog entry, this time).

By the way, we really need a lot more tests that require the use of the Meta character, and not just for pattern matching. Adding them while the character representation is in flux is probably not particularly useful, however.

-- 
Peter Stephenson, Software Engineer
CSR Ltd., Science Park, Milton Road, Cambridge, CB4 0WH, UK
Tel: +44 (0)1223 692070
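The METAINC-to-CHARINC change is easiest to see side by side. The macros below are illustrative definitions written for this sketch, not zsh's own (hence the _SKETCH suffix), and the Meta value 0x83 is an assumption. Stepping through a metafied string has to inspect every byte for Meta, while stepping through an unmetafied one is a plain increment that a multibyte-aware version could later widen to "advance by one whole character".

```c
#include <stddef.h>

/* Assumed value of zsh's Meta byte; illustrative only. */
#define META_BYTE ((char)0x83)

/* Illustrative only: step over one character of a metafied string.
 * A Meta byte is followed by one escaped byte, so skip both. */
#define METAINC_SKETCH(p) ((p) += (*(p) == META_BYTE) ? 2 : 1)

/* Illustrative only: in an unmetafied string the next character is
 * simply the next byte (until multibyte support widens this). */
#define CHARINC_SKETCH(p) ((p)++)

/* Count characters in a well-formed, NUL-terminated metafied string.
 * Every step must test the current byte for Meta. */
static size_t count_metafied(const char *p)
{
    size_t n = 0;
    while (*p) {
        METAINC_SKETCH(p);
        n++;
    }
    return n;
}

/* Count characters in an unmetafied buffer of known length.  No Meta
 * test is needed, but the length must be explicit because an
 * unmetafied string may contain NUL bytes. */
static size_t count_unmetafied(const char *p, size_t len)
{
    const char *end = p + len;
    size_t n = 0;
    while (p < end) {
        CHARINC_SKETCH(p);
        n++;
    }
    return n;
}
```

The explicit length in the unmetafied case is the price of dropping metafication: once NUL is no longer escaped, it is an ordinary character rather than a terminator.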
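patmatchlen() returns the metafied length of the match even though matching now runs on the unmetafied string, which is where the extra counting of metafiable characters comes from. The sketch below shows that counting alongside a minimal metafy/unmetafy pair. It follows zsh's convention as we understand it: an awkward byte is stored as Meta followed by the byte XORed with 32. The value 0x83 for Meta and the set of bytes treated as needing escape (NUL plus 0x83 to 0x9f) are assumptions standing in for zsh's imeta() table, and all function names here are hypothetical.

```c
#include <stddef.h>

#define META ((unsigned char)0x83)   /* assumed value of zsh's Meta byte */

/* Hypothetical stand-in for zsh's imeta(): must byte c be escaped in
 * metafied form?  The exact range is an assumption for this sketch. */
static int needs_meta(unsigned char c)
{
    return c == 0 || (c >= META && c <= 0x9f);
}

/* Metafy len bytes of src into dst (room for 2 * len bytes required).
 * Returns the number of bytes written. */
static size_t metafy_buf(const unsigned char *src, size_t len,
                         unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (needs_meta(src[i])) {
            dst[out++] = META;
            dst[out++] = src[i] ^ 32;
        } else {
            dst[out++] = src[i];
        }
    }
    return out;
}

/* Unmetafy s in place; returns the new (unmetafied) length. */
static size_t unmetafy_buf(unsigned char *s, size_t len)
{
    size_t in = 0, out = 0;
    while (in < len) {
        if (s[in] == META && in + 1 < len) {
            s[out++] = s[in + 1] ^ 32;
            in += 2;
        } else {
            s[out++] = s[in++];
        }
    }
    return out;
}

/* Metafied length of the first mlen bytes of an unmetafied string:
 * one extra byte for every character that would be escaped. */
static size_t metafied_len(const unsigned char *s, size_t mlen)
{
    size_t len = mlen;
    for (size_t i = 0; i < mlen; i++)
        if (needs_meta(s[i]))
            len++;
    return len;
}
```

zsh has its own metafy() and unmetafy() routines, whose interfaces differ from these; the point here is only that the metafied length equals the unmetafied length plus one for each escaped byte, which is why the count has to walk the matched part of the string.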
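The numeric-range fix can be illustrated with a small standalone check. This is a sketch of the behaviour described in the message, not zsh's implementation: the function name and parameters are hypothetical, long long stands in for zsh's internal integer type, and we assume that a number too large to represent should match only when the range has no upper bound (as for <-> and <5->).

```c
#include <limits.h>
#include <stdbool.h>

/* Sketch: does the decimal digit string s fall in the range <lo-hi>?
 * has_lo/has_hi say whether each bound was given, so <-> is
 * (false, 0, false, 0) and <5-> is (true, 5, false, 0).
 * The signature is hypothetical, not zsh's. */
static bool range_match(const char *s, bool has_lo, long long lo,
                        bool has_hi, long long hi)
{
    long long val = 0;
    bool overflow = false;

    if (*s == '\0')
        return false;               /* need at least one digit */
    for (; *s; s++) {
        if (*s < '0' || *s > '9')
            return false;           /* not a pure digit string */
        int d = *s - '0';
        if (!overflow) {
            /* val * 10 + d would exceed LLONG_MAX */
            if (val > (LLONG_MAX - d) / 10)
                overflow = true;    /* too large for the internal type */
            else
                val = val * 10 + d;
        }
    }
    if (overflow)
        /* Bigger than any representable bound: it satisfies any lower
         * bound but fails any upper bound. */
        return !has_hi;
    return (!has_lo || val >= lo) && (!has_hi || val <= hi);
}
```

Special-casing overflow before comparing against the bounds is what lets <5-> behave like <-> for huge numbers: the lower bound no longer gets compared with a wrapped or clamped value.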