From: Peter Stephenson <pws@csr.com>
To: zsh-workers@sunsite.dk
Subject: Re: PATCH: =~ regex match
Date: Fri, 27 Apr 2007 10:33:35 +0100
Message-ID: <20070427103335.55f8d171.pws@csr.com>
In-Reply-To: <20070426201928.GA52120@redoubt.spodhuis.org>

Phil Pennock <zsh-workers+phil.pennock@spodhuis.org> wrote:
> I was also thinking about how to deal with UTF8, which is another
> potential advantage to sticking with PCRE.  Zsh isn't specifically
> UTF-8 when in widechar, is it?

That's correct, but that's actually an advantage of the system regular
expression libraries, which will use the locale in the same way as the
rest of the system to handle multibyte strings.

> #if defined(MULTIBYTE_SUPPORT) && defined(HAVE_NL_LANGINFO) && defined(CODESET)
>   {
>     static int have_utf8_pcre = -1;
> 
>     if (!strcmp(nl_langinfo(CODESET), "UTF-8")) {
>       if (have_utf8_pcre == -1) {
>         if (pcre_config(PCRE_CONFIG_UTF8, &have_utf8_pcre)) {
>           have_utf8_pcre = -2; /* erk, failed to ask */
>         }
>       }
> 
>       if (have_utf8_pcre > 0) {
>         pcre_opts |= PCRE_UTF8;
>       }
>     }
>   }
> #endif
> 
> Which means that in non-UTF-8 multibyte locales, you'll get per-octet
> regexps, but in UTF-8 locales, a multibyte zsh with a libpcre also
> built with UTF-8 support will let you get "proper" matching.

You might want to add that to the pcre library, if appropriate; you
probably also need to test for isset(MULTIBYTE) since unsetting the
multibyte option is supposed to force all strings to be single bytes.
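
Roughly, the decision could be factored as in the sketch below.  This is
only an illustration: want_pcre_utf8 is a made-up name, multibyte_on
stands in for isset(MULTIBYTE), and pcre_has_utf8 stands in for the value
obtained from pcre_config(PCRE_CONFIG_UTF8, ...).

```c
#include <langinfo.h>
#include <string.h>

/*
 * Hypothetical helper, not zsh source: decide whether to request UTF-8
 * matching from PCRE.
 *   multibyte_on  - stand-in for zsh's isset(MULTIBYTE)
 *   pcre_has_utf8 - stand-in for the result of
 *                   pcre_config(PCRE_CONFIG_UTF8, ...)
 */
int
want_pcre_utf8(int multibyte_on, int pcre_has_utf8)
{
    /* With the option unset, all strings are treated as single bytes. */
    if (!multibyte_on)
        return 0;
    /* The library itself must have been built with UTF-8 support. */
    if (pcre_has_utf8 <= 0)
        return 0;
    /* Only a UTF-8 locale makes the UTF-8 flag meaningful. */
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}
```

The caller would then set PCRE_UTF8 in its options only when this returns
non-zero.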

> I'm envious of the =~ operator but that doesn't mean that I want to
> lose the funky stuff of PCRE when I use it -- I like negative
> lookahead assertions, freak that I am.

I don't think there's any question of removing -pcre-match.

> As to BASH_REMATCH ... how frowned upon are new zsh options which
> auto-set for compatibility?  It wouldn't be hard, since the
> infrastructure's all already in place.  Call the zsh option
> BASH_REMATCH to set the BASH_REMATCH variable.  :^)

That would be perfectly sensible.

> If I code this up, is it likely to make it in?  If not, I won't bother
> as full bash compatibility isn't so important to me, only having =~.
> It's not like POSIX is involved here ...

Well, actually it is, since basic shell features should use basic system
features wherever possible rather than requiring optional libraries.  If
we're going to add =~ because it's in bash I don't see any real point
in duplicating -pcre-match to do it, and the POSIX
regcomp/regexec/regerror/regfree should be available just about
everywhere.
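
For reference, a minimal sketch of a match using only those four POSIX
calls might look like this.  The function name ere_match and its return
convention are made up for illustration and are not part of zsh.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper, not zsh source: return 1 if text matches the
 * POSIX extended regular expression in pattern, 0 if it does not, and
 * -1 if the pattern fails to compile (a message goes to stderr).
 */
int
ere_match(const char *text, const char *pattern)
{
    regex_t re;
    char errbuf[256];
    int rc;

    assert(text != NULL && pattern != NULL);

    /* REG_NOSUB: we only want yes/no here, not the match offsets. */
    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        regerror(rc, &re, errbuf, sizeof(errbuf));
        fprintf(stderr, "bad regex: %s\n", errbuf);
        return -1;
    }

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Because the library follows the current locale, multibyte handling comes
from the system rather than from a flag the shell has to set itself.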

When that happens...

> I just double-checked something in passing and discovered that Bash
> uses the equivalent of KSH_ARRAYS, so the variable would need to be
> marked similarly to that and provided with the entire matched portion
> of the string in index 0.

We'll do it the usual way and respect the setting of KSH_ARRAYS.  This
is on in bash compatibility mode.  If that's not set, but BASH_REMATCH
is, we'll put the first match in $BASH_REMATCH[1].
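
The index rule can be sketched as follows, assuming "the first match"
means the entire matched portion.  The name rematch_subscript is invented
for illustration, and ksh_arrays stands in for the state of the
KSH_ARRAYS option.

```c
#include <assert.h>

/*
 * Hypothetical helper, not zsh source: the subscript a user would write
 * to reach capture group `group` in BASH_REMATCH, where group 0 is the
 * entire matched portion.  ksh_arrays stands in for the KSH_ARRAYS option.
 */
int
rematch_subscript(int group, int ksh_arrays)
{
    assert(group >= 0);
    /* KSH_ARRAYS makes arrays zero-based; otherwise they start at 1. */
    return ksh_arrays ? group : group + 1;
}
```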

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


