From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
X-Seq: 23331
Date: Thu, 26 Apr 2007 13:19:28 -0700
From: Phil Pennock
To: zsh-workers@sunsite.dk
Subject: Re: PATCH: =~ regex match
Message-ID: <20070426201928.GA52120@redoubt.spodhuis.org>
Mail-Followup-To:
zsh-workers@sunsite.dk
References: <20070426041938.GA44533@redoubt.spodhuis.org> <200704260931.l3Q9V2Ak014589@news01.csr.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200704260931.l3Q9V2Ak014589@news01.csr.com>

On 2007-04-26 at 10:31 +0100, Peter Stephenson wrote:
> I thought about =~ and it seemed to me that since it would be largely
> there for bash compatibility it would be better to do it with the system
> regexp library, which would be more compatible and wouldn't depend on
> optional packages. It shouldn't be too hard to do. We probably
> wouldn't support BASH_REMATCH, however.

Bash uses extended regexps, so PCRE should be roughly a superset,
although it might be worth adding a different match operator,
overloading with condid, to pass some PCRE options to pcre_compile. I
was thinking of sticking another case before CPCRE_PLAIN in
cond_pcre_match which sets pcre_opts and then falls through to the plain
case. That would mostly affect newline handling.

I was also thinking about how to deal with UTF-8, which is another
potential advantage of sticking with PCRE. Zsh isn't specifically UTF-8
when in widechar, is it? Is the "right" way something like (untested):

#if defined(MULTIBYTE_SUPPORT) && defined(HAVE_NL_LANGINFO) && defined(CODESET)
    {
        /* -1: not yet probed; -2: probe failed; otherwise libpcre's answer */
        static int have_utf8_pcre = -1;

        if (!strcmp(nl_langinfo(CODESET), "UTF-8")) {
            if (have_utf8_pcre == -1) {
                /* pcre_config() returns non-zero on error */
                if (pcre_config(PCRE_CONFIG_UTF8, &have_utf8_pcre)) {
                    have_utf8_pcre = -2; /* erk, failed to ask */
                }
            }
            if (have_utf8_pcre > 0) {
                pcre_opts |= PCRE_UTF8;
            }
        }
    }
#endif

Which means that in non-UTF-8 multibyte locales, you'll get per-octet
regexps, but in UTF-8 locales, a multibyte zsh with a libpcre also built
with UTF-8 support will let you get "proper" matching.

I'm envious of the =~ operator but that doesn't mean that I want to lose
the funky stuff of PCRE when I use it -- I like negative lookahead
assertions, freak that I am.

As to BASH_REMATCH ...
how frowned upon are new zsh options which auto-set for compatibility?
It wouldn't be hard, since the infrastructure's all already in place.
Call the zsh option BASH_REMATCH to set the BASH_REMATCH variable. :^)

% [[ alphabetical =~ ^a([^a]+)a([^a]+)a ]] && print -l $match
lph
betic

Change the last parameter in cond_pcre_match()'s call to
zpcre_get_substrings() to be non-NULL if the zsh option is set so that a
different receptacle to "match" is set.

If I code this up, is it likely to make it in? If not, I won't bother
as full bash compatibility isn't so important to me, only having =~.
It's not like POSIX is involved here ...

Thanks,
-Phil
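
For comparison, here is a minimal sketch of the system-regexp route suggested in the quoted text, run on the same "alphabetical" example. It is not zsh source: ere_match_groups() and GROUP_SIZE are hypothetical names made up for illustration, and the only library calls used are the POSIX regcomp(), regexec() and regfree(). POSIX extended regexps have no lookahead assertions, so a pattern relying on (?!...) would not carry over to this route.

```c
#include <regex.h>
#include <string.h>

#define GROUP_SIZE 64   /* hypothetical fixed buffer size per captured group */

/*
 * Hypothetical helper, not part of zsh: match `text` against the POSIX
 * extended regexp `pattern` and copy up to `max_groups` (at most 9)
 * parenthesised substrings into `groups`.
 * Returns the number of groups stored, or -1 on a bad pattern or no match.
 */
int ere_match_groups(const char *pattern, const char *text,
                     char groups[][GROUP_SIZE], int max_groups)
{
    regex_t re;
    regmatch_t pm[10];          /* pm[0] is the whole match */
    int n;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 10, pm, 0) != 0) {
        regfree(&re);
        return -1;
    }

    n = (int)re.re_nsub;        /* number of parenthesised subexpressions */
    if (n > max_groups)
        n = max_groups;
    if (n > 9)
        n = 9;

    for (int i = 1; i <= n; i++) {
        size_t len = 0;
        if (pm[i].rm_so >= 0) { /* -1 means the group did not participate */
            len = (size_t)(pm[i].rm_eo - pm[i].rm_so);
            if (len >= GROUP_SIZE)
                len = GROUP_SIZE - 1;   /* truncate rather than overflow */
            memcpy(groups[i - 1], text + pm[i].rm_so, len);
        }
        groups[i - 1][len] = '\0';
    }

    regfree(&re);
    return n;
}
```

Called with the pattern ^a([^a]+)a([^a]+)a and the text "alphabetical", this returns 2 and fills the two buffers with "lph" and "betic", the same substrings that $match holds in the PCRE-based example above.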