zsh-workers
From: Stephane Chazelas <stephane.chazelas@gmail.com>
To: Phil Pennock <zsh-workers+phil.pennock@spodhuis.org>
Cc: Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: =~ doesn't work with NUL characters
Date: Thu, 15 Jun 2017 10:50:34 +0100
Message-ID: <20170615095033.GC2416@chaz.gmail.com>
In-Reply-To: <20170614204938.GA76510@tower.spodhuis.org>

2017-06-14 16:49:38 -0400, Phil Pennock:
[...]
> Without rematchpcre, this is ERE per POSIX APIs, which don't portably
> support size-supplied strings, relying instead upon C-string
> null-termination.
> 
> Current macOS has regnexec() but this is not in the system regexp
> library I see on Ubuntu Trusty or FreeBSD 10.3.  It appears to be an
> extension from when they switched to the TRE implementation in macOS
> 10.8.  <https://laurikari.net/tre/>
> 
> Trying to support this would result in variations in behaviour across
> systems in a way which I think might be undesirable.  The whole point of
> adding the non-PCRE implementation was to match Bash behaviour by
> default, and Bash does the same thing.
[...]

A dirty trick in UTF-8 locales (the norm these days) may be to
encode NUL as U+7FFFFF00, and the bytes 0x80 to 0xFF that don't
form part of valid characters as U+7FFFFF{80..FF}, in both the
string and the regexp.
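
To make the encoding concrete, here is a minimal Python sketch of the
idea, not anything zsh does today. It assumes the original UTF-8 scheme
with sequences of up to six bytes (code points up to 0x7FFFFFFF); the
names ESCAPE_BASE, encode_extended_utf8 and escape_for_regex are made
up for illustration.

```python
ESCAPE_BASE = 0x7FFFFF00  # hypothetical: byte b is escaped as U+7FFFFF00 + b


def encode_extended_utf8(cp: int) -> bytes:
    """Encode cp with the original 1-to-6-byte UTF-8 scheme (up to 0x7FFFFFFF)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point out of range")
    if cp < 0x80:
        return bytes([cp])
    # (exclusive upper limit, lead-byte marker, number of continuation bytes)
    layouts = (
        (0x800, 0xC0, 1),
        (0x10000, 0xE0, 2),
        (0x200000, 0xF0, 3),
        (0x4000000, 0xF8, 4),
        (0x80000000, 0xFC, 5),
    )
    for limit, lead, ncont in layouts:
        if cp < limit:
            out = [lead | (cp >> (6 * ncont))]
            for i in range(ncont - 1, -1, -1):
                out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
            return bytes(out)


def escape_for_regex(data: bytes) -> bytes:
    """Replace NUL and bytes that are not part of valid UTF-8 with escapes."""
    out = bytearray()
    # surrogateescape maps each undecodable byte 0xNN to U+DCNN
    for ch in data.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if cp == 0:
            out += encode_extended_utf8(ESCAPE_BASE)
        elif 0xDC80 <= cp <= 0xDCFF:
            out += encode_extended_utf8(ESCAPE_BASE + (cp - 0xDC00))
        else:
            out += ch.encode("utf-8")
    return bytes(out)


print(escape_for_regex(b"a\x00b").hex(" "))
```

With this, NUL becomes the six bytes FD BF BF BF BC 80, so the escaped
string no longer contains a 0 byte and could in principle be handed to
an API that expects a NUL-terminated C string. The same function would
have to be applied to both the subject string and the pattern.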

That wouldn't work with every regexp implementation though, as
some would treat those as invalid characters if they go by
the newer definition where the only valid characters are
U+0000 to U+D7FF and U+E000 to U+10FFFF.
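
As an illustration of that newer definition, here is a small Python
sketch. The helper name is_unicode_scalar_value is made up for this
example; Python's own strict UTF-8 decoder is used only as one instance
of an implementation that goes by the newer rules.

```python
def is_unicode_scalar_value(cp: int) -> bool:
    """True if cp is in U+0000..U+D7FF or U+E000..U+10FFFF."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def strict_decoder_accepts(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+7FFFFF00 is outside the newer range ...
print(is_unicode_scalar_value(0x7FFFFF00))
# ... and its six-byte encoding is rejected by a strict decoder.
print(strict_decoder_accepts(b"\xfd\xbf\xbf\xbf\xbc\x80"))
```

An implementation that validates its input this way would see the
escape sequences as invalid bytes rather than as single characters.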

But with those that do accept them, that would also make the
behaviour more consistent in cases like:

[[ $'\x80' = ? ]] vs [[ $'\x80' =~ '^.$' ]]

That wouldn't help in things like [[ x =~ $'[\0-\177]' ]] (which
anyway doesn't make sense in locales other than C/POSIX) though.

-- 
Stephane

