From: Phil Pennock <zsh-workers+phil.pennock@spodhuis.org>
To: zsh-workers@zsh.org
Subject: Re: Extending regexes
Date: Wed, 6 Jul 2022 19:07:40 -0400
Message-ID: <YsYVvNOKTWeQU4k4@fullerene.field.pennock-tech.net>
In-Reply-To: <CAKc7PVAa718cFk2n1W=oxgxxRh-DEs8Cjvc5sKpqe8C3D+M-ig@mail.gmail.com>

On 2022-07-04 at 14:03 +0200, Sebastian Gniazdowski wrote:
> Zsh has extensions to regular regexes - the ~ and ^ negations. They, as it
> can be expected from negations that are required by Turing universal
> machines, introduce a whole new universe of computations over standard
> regular expressions. For example matching in an AND fashion:

For clarity: zsh has long had the module zsh/pcre, providing
-pcre-match; when the =~ regexp matching operator was added, we
deliberately chose to add a module zsh/regex to use the system ERE
libraries with -regex-match and made that the default implementation
behind the =~ operator.

If you're getting PCRE semantics, then probably somewhere in your
startup files you have something like `setopt re_match_pcre`.

A while back I wrote some bindings for using the RE2 library, which
provides the same efficient regexps found in Go and which is licensed
such that more vendors might enable it by default with zsh.  I stopped
while trying to puzzle through how to dig myself out of my own hole:
I made `RE_MATCH_PCRE` a simple boolean, so it cannot select a third
engine.

My _tentative_ thinking, which I'd appreciate feedback on, is to
introduce a new special parameter, `ZSH_EQTILDE_ENGINE` or somesuch;
have assignment to it succeed only when the value is one we can parse,
and make changes to the RE_MATCH_PCRE option act as implicit
assignments of `regex` or `pcre` to that parameter.

Is this sane?  Are we happy introducing new special parameters, as long
as the name starts with `ZSH_`?  Should the accepted values be "name of
a module" or a static list?  If "name of a module" then that would let
people do more than just use our engines (at their own risk), but should
we then update the .mdd files or the exported tables with some new
identifier to mark "use this function to back =~ when the engine points
here"?

I would quite like to move towards being able to expect "better, but
sane" REs to be available, even with commercial OS vendor builds of zsh.
I think RE2 is probably the best way forward, but ... I should probably
have asked long ago for advice on the design decisions which need to be
made.

-Phil

