Date: Fri, 27 Feb 2009 00:33:40 -0800
From: Phil Pennock
To: Jon Strait
Cc: zsh workers
Subject: Re: PATCH: New options for the PCRE module
Message-ID: <20090227083340.GA44689@redoubt.spodhuis.org>
Mail-Followup-To: Jon Strait, zsh workers
References: <49A79326.5070703@moonloop.net>
In-Reply-To: <49A79326.5070703@moonloop.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Seq: 26621

Chiming in because I'm the most recent person to mess with regex
functionality (I think).

On 2009-02-26 at 23:15 -0800, Jon Strait wrote:
> 1. A new '-s' option to pcre_compile. This is the frequently set
> PCRE_DOTALL option, allowing the dot character to match a newline as well.

This makes sense, since we already have the other common options as
flags.

In the meantime, you know about the internal option setting feature of
PCRE syntax, right? Putting (?s) at the start of the pattern is
equivalent.

> On the safe side, regarding the possibility of multi-byte characters,
> I'm assuming that the returned offset positions are only for sending
> back to pcre_match and not for indexing on a match string, because the
> offsets are in byte count, not character count.

This is dubious. I can see someone quite reasonably using
$var[start,end] for substring extraction; the shell should be internally
consistent.

In a worst-case scenario, there could be another option to select which
offset semantics shall be used. Peter's work on UTF-8 support has so
far managed to keep the user from ever knowing or caring about this.

Or return four numbers instead of two, so that anyone using the
interface has to be aware of the difference and can think about it.

I'm not coming up with a more elegant solution.

> 3. A needed correction: all of the module's external variables are now
> unset on each match attempt, so that a failed match will be obvious.

Well, the exit status is set already. And since the last shell release,
we've documented explicitly that nothing is altered:

2009-01-15:
    * 26312: Phil Pennock: Doc/Zsh/cond.yo, Doc/Zsh/mod_pcre.yo,
    Doc/Zsh/mod_regex.yo: Document no variables altered on failed
    match.

On the other hand, there's value to a reset too. I don't have a strong
preference either way, but now is the time to fix it, before there's
been a release which documents the behaviour. :)

Part of the problem is that pcre_regex has been in Zsh for many years
and we tend to be cautious when changing behaviour. I doubt that anyone
is relying on the value being unchanged after a match attempt. Thus the
lack of a strong preference. Anyone else?

While you're at it, there's also the zsh/regex module which uses the
system's normal extended regex libraries and if you're changing the
semantics of one, both should change.

> Could someone please point me to the doc files that would need updating
> (for the zshmodule man page), or if someone here has that part
> automated, I can send them whatever targeted write-up they want.

Doc/Zsh/mod_pcre.yo (and mod_regex.yo), which are in YODL format.

> -    ret = pcre_exec(pcre_pattern, pcre_hints, *args, strlen(*args), 0, 0, ovec, ovecsize);
> +    ret = pcre_exec(pcre_pattern, pcre_hints, *args, strlen(*args), offset_start, 0, ovec, ovecsize);

How gracefully does pcre_exec() fail when offset_start is set to a value
larger than the length of the string? To maxint-smallnumber?

-Phil
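
For illustration only, here is a minimal C sketch of two points raised
in this message: checking offset_start in the caller before it reaches
pcre_exec(), and turning a byte offset into a character offset so that
both kinds of number could be returned. Both helpers, start_offset_ok()
and utf8_char_offset(), are hypothetical names and are not part of zsh
or PCRE. Whether pcre_exec() validates the offset itself is not settled
in this thread, so the sketch assumes it does not. The second helper
also assumes the subject is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not zsh or PCRE API): returns 1 when `offset`
 * is a usable start offset for `subject`, that is
 * 0 <= offset <= strlen(subject), and 0 otherwise. */
static int start_offset_ok(const char *subject, long offset)
{
    if (offset < 0)
        return 0;
    return (size_t)offset <= strlen(subject);
}

/* Hypothetical helper: converts a byte offset into a character offset
 * by counting every byte that is not a UTF-8 continuation byte
 * (bit pattern 10xxxxxx). Assumes `s` is valid UTF-8 and that
 * byte_off <= strlen(s). */
static size_t utf8_char_offset(const char *s, size_t byte_off)
{
    size_t i, chars = 0;

    for (i = 0; i < byte_off; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

A caller could reject the match attempt up front when
start_offset_ok() returns 0, instead of passing an out-of-range value
down to the library, and could report utf8_char_offset() of each byte
offset alongside the byte offset itself.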