zsh-users
* regex matching regression in 5.0.0 vs. 4.3.17
From: Moritz Bunkus @ 2012-09-15 15:37 UTC
  To: zsh-users

Hey,

I've been using the following two lines inside a function whose output
is used in PS1. Basically: PS1="...$(parse_git_branch)..."

And the lines I'm talking about:

function parse_git_branch {
  git_dir="$(git rev-parse --git-dir 2> /dev/null)"
  branch_pattern="\*\s+([^${IFS}]+)"
  no_branch_pattern="\*\s+\(no branch\)"

  if [[ ${git_status} =~ ${no_branch_pattern} ]]; then
    branch="detached"
  elif [[ ${git_status} =~ ${branch_pattern} ]]; then
    branch="${match[1]}"
  elif [[ -f ${git_dir}/BISECT_LOG ]]; then
    branch="${CLR_LIGHT_CYAN}bisecting"
  fi

  if [[ ! -z ${branch} ]]; then
    echo " (${branch})"
  fi
}

The idea is to get the current branch name if I'm inside a git repository.

This worked nicely with zsh 4.3.17 on my Ubuntu.

Now I wanted to use that same function on Arch Linux which is already
at zsh 5.0.0 and got the following error:

  parse_git_branch:8: failed to compile regex: Unmatched [ or [^

Turned out it didn't like the ${IFS} inside the pattern (never mind
the line number discrepancy, I've cut out some other stuff from the
function).

I also made sure that this isn't Arch Linux specific by compiling zsh
5.0.0 from source on my Ubuntu machine (the one which usually runs
4.3.17). Same issue.

Replacing ${IFS} with \n gets rid of the error, but the match then
behaves differently from before if more than one local branch exists in
a git repository. Instead of just the current line, ${match[1]} contains
everything starting from the current line until the end. Probably
because I don't escape that newline character properly.
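
A small sketch of why \n may behave this way, assuming the pattern is
built in double quotes as in the function above: there "\n" stays a
backslash followed by n, and POSIX bracket expressions treat a backslash
literally, so [^\n] excludes those two characters and still matches a
newline. The sample listing and the variable names are made up for
illustration, and the BASH_REMATCH fallback is only there so the snippet
also runs under bash.

```shell
# Hypothetical stand-in for `git branch` output with two other local branches.
listing=$'  master\n* topic\n  zzz'

# In double quotes \n stays backslash + n; inside [...] both are literal.
literal_pattern="\*[[:space:]]+([^\n]+)"

# $'\n' is a real newline, so this class stops at the end of the line.
nl=$'\n'
newline_pattern="\*[[:space:]]+([^${nl}]+)"

[[ ${listing} =~ ${literal_pattern} ]]
with_literal="${match[1]:-${BASH_REMATCH[1]}}"

[[ ${listing} =~ ${newline_pattern} ]]
with_newline="${match[1]:-${BASH_REMATCH[1]}}"

printf 'literal \\n:  [%s]\n' "${with_literal}"
printf 'real newline: [%s]\n' "${with_newline}"
```

With the real newline in the class, the capture should stop at the end
of the line that carries the asterisk.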

So... any pointers? I'll try to bisect the actual change that caused
this, might take some time though.

Kind regards,
mosu



* Re: regex matching regression in 5.0.0 vs. 4.3.17
From: Peter Stephenson @ 2012-09-15 19:42 UTC
  To: zsh-users

On Sat, 15 Sep 2012 17:37:47 +0200
Moritz Bunkus <moritz@bunkus.org> wrote:
>   parse_git_branch:8: failed to compile regex: Unmatched [ or [^
> 
> Turned out it didn't like the ${IFS} inside the pattern (never mind
> the line number discrepancy, I've cut out some other stuff from the
> function).

This simplifies to

[[ x =~ [^${IFS}] ]]

The problem is that IFS in zsh contains an ASCII null.  As the regular
expression is a null-terminated string, it ends at that point, with the
error noted.  The reason this has changed is that before the patch you
noted the null was left encoded as a space with an 8th-bit-set marker
before it; it didn't do the right thing, but as it was in a character
group you got away with it.  So it's actually not a new breakage, just a
different one.
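
One way to see the null for yourself is to dump the bytes of IFS. This
is only a minimal sketch, assuming od is available; the variable name
ifs_bytes is made up here. The zsh manual documents the default IFS as
space, tab, newline and NUL, so under zsh the dump would be expected to
end in \0.

```shell
# Dump IFS byte by byte; od -c renders a NUL as \0.
ifs_bytes="$(printf '%s' "${IFS}" | od -An -c)"
printf '%s\n' "${ifs_bytes}"
```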

I'm not aware of any standard way of getting a null character into a
regular expression; they don't understand \0 or anything similar, and
presumably in any case regexec() hiccups in exactly the same way as
regcomp().  Even with pcre

[[ $'\x00' =~ '\x00' ]]

doesn't work (cf. $'\x41' and '\x41', which does).

So unless anyone can think of a smart solution, I think the only answer
is to remove NUL characters from the body of the regular expression and
document that this happens.

You can do ${IFS//$'\0'}, but it's not clear to me you should have to;
that use of IFS seems like it ought to work.
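
Applied to the pattern from the original function, that workaround
might look like the sketch below. The function name branch_from_listing
and the sample input are made up for illustration, [[:space:]] stands in
for \s, and the BASH_REMATCH fallback is only there so the snippet also
runs under bash.

```shell
# Build the character class from IFS with any NUL stripped out.
clean_ifs="${IFS//$'\0'}"
branch_pattern="\*[[:space:]]+([^${clean_ifs}]+)"

# Hypothetical helper: print the branch marked with '*' in a
# `git branch`-style listing passed as the first argument.
branch_from_listing() {
  if [[ $1 =~ ${branch_pattern} ]]; then
    printf '%s\n' "${match[1]:-${BASH_REMATCH[1]}}"
  fi
}

branch_from_listing $'  master\n* topic\n  zzz'
```

Stripping the NUL once and reusing clean_ifs keeps the pattern itself
unchanged apart from the character class.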

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/



* Re: regex matching regression in 5.0.0 vs. 4.3.17
From: Moritz Bunkus @ 2012-09-15 21:12 UTC
  To: Peter Stephenson; +Cc: zsh-users

Hey,

thanks for the thorough explanation. I've switched to PCRE RE matching
which makes all that $IFS matching as easy as:

pcre_compile "^\* +(.*?)$"
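
For context, a rough sketch of how that pattern could be used, assuming
zsh with the zsh/pcre module available. The function name
current_branch_pcre and the captures array are made up for
illustration, and the -m (multiline) flag is an assumption added here so
that ^ and $ anchor at each line of the listing.

```shell
# zsh only: relies on the zsh/pcre builtins pcre_compile and pcre_match.
current_branch_pcre() {
  zmodload zsh/pcre || return 1
  local -a captures
  # -m: multiline, so ^ and $ match at the start and end of every line.
  pcre_compile -m '^\* +(.*?)$' || return 1
  # -a captures: store the parenthesised substrings in that array.
  pcre_match -a captures "$1" || return 1
  printf '%s\n' "${captures[1]}"
}
```

It would be called with the branch listing as its only argument, for
example current_branch_pcre "$(git branch 2> /dev/null)".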

Kind regards,
mosu



* Re: regex matching regression in 5.0.0 vs. 4.3.17
From: Phil Pennock @ 2012-09-15 23:52 UTC
  To: Peter Stephenson; +Cc: zsh-users

On 2012-09-15 at 20:42 +0100, Peter Stephenson wrote:
> So unless anyone can think of a smart solution, I think the only answer
> is to remove NUL characters from the body of the regular expression and
> document that this happens.

The situation sucks, clearly.

So: is it better to change the NUL to something else, to strip it out
(shortening the pattern) or to just document that NULs are bad?

For the POSIX system library regex module, a NUL will always be bad.

For PCRE, pcre_exec() takes a length parameter for the haystack string,
so one option might be to change the NUL in the _pattern_ to be \x00
instead?

It seems that for PCRE, supplying a length-receiving parameter to
unmetafy() and comparing that length to strlen() should be right: if the
two differ, the string contains a NUL, and the pattern can then be
switched to the \x00 form.

If I do this, then zsh/pcre should be able to handle NULs fine in both
needle/pattern and haystack.

For regex .. generally, I'm not in favour of hidden mutations of strings
which might change whether they match or not.  I can just document it as
a limitation of non-PCRE?

-Phil


