zsh-workers
* Issue with ${var#(*_)(#cN,M)}
From: Stephane Chazelas @ 2015-10-19  9:33 UTC (permalink / raw)
  To: Zsh hackers list

Unless I'm missing something, this looks like a bug:

~$ a='1_2_3_4_5_6'
~$ echo ${a#(*_)(#c1)}
2_3_4_5_6 #OK
~$ echo ${a#(*_)(#c2)}
2_3_4_5_6
~$ echo ${a#(*_)(#c3)}
3_4_5_6
~$ echo ${a#(*_)(#c4)}
4_5_6
~$ echo ${a#(*_)(#c5)}
4_5_6
~$ echo ${a#(*_)(#c6)}
3_4_5_6
~$ echo ${a#(*_)(#c7)}
4_5_6

~$ echo ${a%(_*)(#c1)}
1_2_3_4_5
~$ echo ${a%(_*)(#c2)}
1_2_3_4_5_6
~$ echo ${a%(_*)(#c3)}
1_2_3_4_5_6
~$ echo ${a%(_*)(#c4)}
1_2_3_4

~$ echo ${(S)a/(*_)(#c1)/+}
+2_3_4_5_6
~$ echo ${(S)a/(*_)(#c2)/+}
+2_3_4_5_6
~$ echo ${(S)a/(*_)(#c3)/+}
+3_4_5_6

These are OK:

~$ echo ${a#(?_)(#c1)}
2_3_4_5_6
~$ echo ${a#(?_)(#c2)}
3_4_5_6
~$ echo ${a#(?_)(#c3)}
4_5_6

~$ echo ${a#([^_]#_)(#c1)}
2_3_4_5_6
~$ echo ${a#([^_]#_)(#c2)}
3_4_5_6
~$ echo ${a#([^_]#_)(#c3)}
4_5_6


zsh 5.0.2 (x86_64-pc-linux-gnu) and zsh 5.1.1 (x86_64-debian-linux-gnu)

-- 
Stephane



* Re: Issue with ${var#(*_)(#cN,M)}
From: Bart Schaefer @ 2015-10-19 19:17 UTC (permalink / raw)
  To: Zsh hackers list

On Oct 19, 10:33am, Stephane Chazelas wrote:
} Subject: Issue with ${var#(*_)(#cN,M)}
}
} Unless I'm missing something, this looks like a bug:

Hm.  I think it's counting the number of times it backtracked.  E.g.

} ~$ a='1_2_3_4_5_6'
} ~$ echo ${a#(*_)(#c2)}
} 2_3_4_5_6

Here, it first matched "1_2_3_4_5_" but then couldn't match a second
time, so it backtracked, matched "1_", and stopped counting.

However, there's an interaction with ${a#...} here -- because you've
asked for the shortest match, glob.c:igetmatch() first tries for the
longest match and then "brute-force" (see comment in glob.c) looks
for a shorter one.  So the pattern code gets invoked multiple times.
To make (*_)(#c2) work as you'd expect (each (*_) uses the shortest
match and then is tried again on the remainder) I think we would have
to teach the pattern code itself about shortest/longest match.
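A minimal sketch of the brute-force idea described above, written in C with POSIX fnmatch() standing in for zsh's pattern code. This is an assumption for illustration: fnmatch() has no (#c) operator, and this version scans prefixes from shortest to longest rather than in the order igetmatch() uses. What it shows is that a single ${a#pat} expansion can call the whole-string matcher once per candidate prefix. The name strip_shortest_prefix is hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mimicking ${str#pat}: return a pointer just past
   the shortest prefix of str that matches the glob pat as a whole, or
   str itself if no prefix matches. */
const char *strip_shortest_prefix(const char *str, const char *pat)
{
    size_t len = strlen(str);
    char *buf = malloc(len + 1);      /* scratch copy of each candidate prefix */
    assert(buf != NULL);

    for (size_t plen = 0; plen <= len; plen++) {
        memcpy(buf, str, plen);
        buf[plen] = '\0';
        /* Each candidate is a fresh, independent call to the matcher. */
        if (fnmatch(pat, buf, 0) == 0) {
            free(buf);
            return str + plen;
        }
    }
    free(buf);
    return str;                       /* no match: value is left unchanged */
}
```

With the patterns Stephane uses later in the thread, strip_shortest_prefix("1_2_3_4_5_6", "*_*_") should return a pointer to "3_4_5_6", the same result as ${a#*_*_}.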

There's a further issue that backreferences don't seem to be well-
defined when a parenthesized subpattern is required to repeat.



* Re: Issue with ${var#(*_)(#cN,M)}
From: Stephane Chazelas @ 2015-10-20 19:09 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

2015-10-19 12:17:28 -0700, Bart Schaefer:
> On Oct 19, 10:33am, Stephane Chazelas wrote:
> } Subject: Issue with ${var#(*_)(#cN,M)}
> }
> } Unless I'm missing something, this looks like a bug:
> 
> Hm.  I think it's counting the number of times it backtracked.  E.g.
> 
> } ~$ a='1_2_3_4_5_6'
> } ~$ echo ${a#(*_)(#c2)}
> } 2_3_4_5_6
> 
> Here, it first matched "1_2_3_4_5_" but then couldn't match a second
> time, so it backtracked, matched "1_", and stopped counting.
[...]

Note that the following

~$ echo ${a#*_*_}
3_4_5_6
~$ echo ${a#*_*_*_}
4_5_6

work OK.

And also:

~$ echo ${a#(*_)(*_)(*_|)}
3_4_5_6
~$ echo ${a#(*_)(*_)(*_|)4}
_5_6

(equivalent of (#c2,3))

-- 
Stephane



* Re: Issue with ${var#(*_)(#cN,M)}
From: Bart Schaefer @ 2015-10-20 23:04 UTC (permalink / raw)
  To: Zsh hackers list

On Oct 20,  8:09pm, Stephane Chazelas wrote:
} Subject: Re: Issue with ${var#(*_)(#cN,M)}
}
} 2015-10-19 12:17:28 -0700, Bart Schaefer:
} > 
} > } ~$ a='1_2_3_4_5_6'
} > } ~$ echo ${a#(*_)(#c2)}
} > } 2_3_4_5_6
} > 
} > Here, it first matched "1_2_3_4_5_" but then couldn't match a second
} > time, so it backtracked, matched "1_", and stopped counting.
} 
} Note that the following
} 
} ~$ echo ${a#*_*_}
} 3_4_5_6
} ~$ echo ${a#*_*_*_}
} 4_5_6
} 
} work OK.

Well, yes, but not really relevant.

} And also:
} 
} ~$ echo ${a#(*_)(*_)(*_|)}
} 3_4_5_6

The (#c) modifier is not implemented by replicating the pattern, it's
implemented by counting the number of successful trials that can be
made using the single pattern.  So it really makes no difference to
the bug that manually repeating the pattern does the right thing.

What's messing it up is the "*" operator and the backtracking that is
implied because * can match anything.
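To make "counting the number of successful trials" concrete, here is a hypothetical model in C (not zsh's code) that matches a whole string against (*_)(#cN) only. The number of repetitions still required is passed by value, so each recursive trial gets its own copy and backtracking over the choices for * cannot disturb it. The names match_count and strip_count are illustrative, and the exponential search is only reasonable for short strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Does s[0..len) match (*_)(#cN) exactly?  n is the number of
   repetitions still required; it is passed by value, so every
   backtracking step automatically sees the right count. */
static bool match_count(const char *s, size_t len, int n)
{
    assert(n >= 0);
    if (n == 0)
        return len == 0;            /* all repetitions used: must be at the end */
    /* One trial of the operand "*_": * absorbs s[0..k) and s[k] must be '_'.
       Try the longest choice for * first, then back off. */
    for (size_t k = len; k-- > 0; )
        if (s[k] == '_' && match_count(s + k + 1, len - k - 1, n - 1))
            return true;
    return false;                   /* no choice for * lets the rest match */
}

/* Model of ${s#(*_)(#cN)}: drop the shortest matching prefix, if any. */
const char *strip_count(const char *s, int n)
{
    size_t len = strlen(s);
    for (size_t plen = 0; plen <= len; plen++)
        if (match_count(s, plen, n))
            return s + plen;
    return s;
}
```

Under this model, strip_count("1_2_3_4_5_6", 2) returns "3_4_5_6", a count of 5 returns "6" and a count of 7 returns the string unchanged, which are the results the regression test added at the end of this thread expects for ${a#(*_)(#cN)}.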



* Re: Issue with ${var#(*_)(#cN,M)}
From: Peter Stephenson @ 2015-10-27 10:00 UTC (permalink / raw)
  To: Zsh hackers list

Original problem
> } ~$ a='1_2_3_4_5_6'
> } ~$ echo ${a#(*_)(#c2)}
> } 2_3_4_5_6

On Tue, 20 Oct 2015 16:04:22 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> What's messing it up is the "*" operator and the backtracking that is
> implied because * can match anything.

Exactly.  What's backtracking over what in what order here is a bit of a
nightmare, and I'm not sure I'm likely to get my mind round it.

Unless someone does, you'll be better off sticking to

% a='1_2_3_4_5_6'
% echo ${a#([^_]#_)(#c2)}
3_4_5_6

and then we don't have the "*" within the group to worry about.

pws



* Re: Issue with ${var#(*_)(#cN,M)}
From: Peter Stephenson @ 2015-10-27 10:46 UTC (permalink / raw)
  To: Zsh hackers list

On Tue, 27 Oct 2015 10:00:34 +0000
Peter Stephenson <p.stephenson@samsung.com> wrote:
> Original problem
> > } ~$ a='1_2_3_4_5_6'
> > } ~$ echo ${a#(*_)(#c2)}
> > } 2_3_4_5_6
> 
> On Tue, 20 Oct 2015 16:04:22 -0700
> Bart Schaefer <schaefer@brasslantern.com> wrote:
> > What's messing it up is the "*" operator and the backtracking that is
> > implied because * can match anything.
> 
> Exactly.  What's backtracking over what in what order here is a bit of a
> nightmare, and I'm not sure I'm likely to get my mind round it.
> 
> Unless someone does, you'll be better off sticking to
> 
> % a='1_2_3_4_5_6'
> % echo ${a#([^_]#_)(#c2)}
> 3_4_5_6
> 
> and then we don't have the "*" within the group to worry about.

Indeed, I've just noticed that with
% egrep --version
egrep (GNU grep) 2.8

the following:

% egrep '^(*_){2}$' <<<'1_2_'

fails to match completely, i.e. the backtracking is too complicated
to handle, whereas

% egrep '^([^_]+_){2}$' <<<'1_2_'

succeeds.  At this point, I'm going to document the difficulty and
slowly retreat backwards from the dark corner.

pws

diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 5ea8610..49a0f0d 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -2192,6 +2192,16 @@ inclusive.  The form tt(LPAR()#c)var(N)tt(RPAR()) requires exactly tt(N)
 matches; tt(LPAR()#c,)var(M)tt(RPAR()) is equivalent to specifying var(N)
 as 0; tt(LPAR()#c)var(N)tt(,RPAR()) specifies that there is no maximum
 limit on the number of matches.
+
+Note that if the previous group of characters contains wildcards,
+results can be unpredictable to the point of being logically incorrect.
+It is recommended that the pattern be trimmed to match the minimum
+possible.  For example, to match a string of the form `tt(1_2_3_)', use
+a pattern of the form `tt(LPAR()[[:digit:]]##_+RPAR()LPAR()#c3+RPAR())', not
+`tt(LPAR()*_+RPAR()LPAR()#c3+RPAR())'.  This arises from the
+complicated interaction between attempts to match a number of
+repetitions of the whole pattern and attempts to match the wildcard
+`tt(*)'.
 )
 vindex(MATCH)
 vindex(MBEGIN)



* Re: Issue with ${var#(*_)(#cN,M)}
From: Stephane Chazelas @ 2015-10-27 11:03 UTC (permalink / raw)
  To: zsh-workers

2015-10-27 10:46:33 +0000, Peter Stephenson:
[...]
> % egrep '^(*_){2}$' <<<'1_2_'
> 
> fails to match completely, i.e. the backtracking is too complicated
> to handle, whereas
[...]

Except that it should be .* in REs and that REs are greedy.

$ egrep '^(.*_){2}$' <<<'1_2_'
1_2_


$ grep -Eo '^(.*_){2}' <<<'1_2_3_4_5'
1_2_3_4_
$ grep -Po '^(.*?_){2}' <<<'1_2_3_4_5'
1_2_

ksh93 is also fine with it:

$ a='1_2_3_4_5'  ksh -c 'echo "${a#{2}(*_)}"'
3_4_5
$ a='1_2_3_4_5'  ksh -c 'echo "${a##{2}(*_)}"'
5

The zsh limitation should probably be documented if not fixed.

-- 
Stephane



* Re: Issue with ${var#(*_)(#cN,M)}
From: Peter Stephenson @ 2015-10-27 11:11 UTC (permalink / raw)
  To: zsh-workers

On Tue, 27 Oct 2015 11:03:53 +0000
Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
> 2015-10-27 10:46:33 +0000, Peter Stephenson:
> [...]
> > % egrep '^(*_){2}$' <<<'1_2_'
> > 
> > fails to match completely, i.e. the backtracking is too complicated
> > to handle, whereas
> [...]
> 
> Except that it should be .* in REs and that REs are greedy.

You're right, I got the pattern wrong, and it does work.

pws



* Re: Issue with ${var#(*_)(#cN,M)}
From: Stephane Chazelas @ 2015-10-27 11:11 UTC (permalink / raw)
  To: zsh-workers

2015-10-27 11:03:53 +0000, Stephane Chazelas:
[...]
> ksh93 is also fine with it:
> 
> $ a='1_2_3_4_5'  ksh -c 'echo "${a#{2}(*_)}"'
> 3_4_5
> $ a='1_2_3_4_5'  ksh -c 'echo "${a##{2}(*_)}"'
> 5
> 
> The zsh limitation should probably be documented if not fixed.
[...]

Another work around is to use zsh's PCREs:

$ a='1_2_3_4_5'  zsh -o rematchpcre -c '[[ $a =~ "(?s)^(.*?_){2}" ]] &&echo $MATCH'
1_2_

$ a=$'1_2_3_4_5\nqweq'  zsh -o rematchpcre -c '[[ $a =~ "(?s)^(?:.*?_){2}(.*)" ]]; echo $match'
3_4_5
qweq

-- 
Stephane



* Re: Issue with ${var#(*_)(#cN,M)}
From: Peter Stephenson @ 2015-10-27 11:37 UTC (permalink / raw)
  To: zsh-workers

Sigh.  Can't see the wood for the trees.  It is a backtracking problem,
but it's a simple bug with restoring the state when backtracking, not a
logical error in the matching machine.

I'll take out the weasel words again, shall I?

pws

diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 49a0f0d..5ea8610 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -2192,16 +2192,6 @@ inclusive.  The form tt(LPAR()#c)var(N)tt(RPAR()) requires exactly tt(N)
 matches; tt(LPAR()#c,)var(M)tt(RPAR()) is equivalent to specifying var(N)
 as 0; tt(LPAR()#c)var(N)tt(,RPAR()) specifies that there is no maximum
 limit on the number of matches.
-
-Note that if the previous group of characters contains wildcards,
-results can be unpredictable to the point of being logically incorrect.
-It is recommended that the pattern be trimmed to match the minimum
-possible.  For example, to match a string of the form `tt(1_2_3_)', use
-a pattern of the form `tt(LPAR()[[:digit:]]##_+RPAR()LPAR()#c3+RPAR())', not
-`tt(LPAR()*_+RPAR()LPAR()#c3+RPAR())'.  This arises from the
-complicated interaction between attempts to match a number of
-repetitions of the whole pattern and attempts to match the wildcard
-`tt(*)'.
 )
 vindex(MATCH)
 vindex(MBEGIN)
diff --git a/Src/pattern.c b/Src/pattern.c
index 8b07cca..9e8a80a 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -3376,6 +3376,7 @@ patmatch(Upat prog)
 		    scan[P_CT_CURRENT].l = cur + 1;
 		    if (patmatch(scan + P_CT_OPERAND))
 			return 1;
+		    scan[P_CT_CURRENT].l = cur;
 		    patinput = patinput_thistime;
 		}
 		if (cur < min)
diff --git a/Test/D02glob.ztst b/Test/D02glob.ztst
index 3e2095a..f944a4f 100644
--- a/Test/D02glob.ztst
+++ b/Test/D02glob.ztst
@@ -574,3 +574,11 @@
 0:Optimisation to squeeze multiple *'s used as ordinary glob wildcards.
 >glob.tmp/ra=1.0_et=3.5
 >glob.tmp/ra=1.0_et=3.5
+
+  [[ 1_2_ = (*_)(#c1) ]] && print 1 OK  # because * matches 1_2
+  [[ 1_2_ = (*_)(#c2) ]] && print 2 OK
+  [[ 1_2_ = (*_)(#c3) ]] || print 3 OK
+0:Some more complicated backtracking with match counts.
+>1 OK
+>2 OK
+>3 OK
diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst
index f1cc23e..cb7079c 100644
--- a/Test/D04parameter.ztst
+++ b/Test/D04parameter.ztst
@@ -1735,3 +1735,12 @@
 0:History modifier works the same for scalar and array substitution
 >ddd bdb cdc
 >ddd bdb cdc
+
+ a=1_2_3_4_5_6
+ print ${a#(*_)(#c2)}
+ print ${a#(*_)(#c5)}
+ print ${a#(*_)(#c7)}
+0:Complicated backtracking with match counts
+>3_4_5_6
+>6
+>1_2_3_4_5_6
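The fix above is a single line: after a failed trial, the count stored in the compiled pattern is put back to the value it had before that trial. The C sketch below is a deliberately simplified, hypothetical model of that situation (the struct and function names are invented; only the idea of a shared counter comes from the patch). The repetition count lives in one shared slot, loosely analogous to scan[P_CT_CURRENT].l, and a flag switches the restoring assignment on or off.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a compiled (*_)(#cN) node: the repetition
   count is kept in shared state that every recursive trial reads and
   writes, loosely like scan[P_CT_CURRENT].l in Src/pattern.c. */
struct count_node {
    long n;        /* required repetitions, as in (#cN) */
    long current;  /* repetitions started so far */
    bool restore;  /* apply the one-line fix from the patch? */
};

static bool try_count(struct count_node *ct, const char *s);

/* Operand "*_": * absorbs s[0..k), s[k] must be '_', then control
   returns to the count node for the next repetition (longest first). */
static bool try_operand(struct count_node *ct, const char *s)
{
    for (size_t k = strlen(s); k-- > 0; )
        if (s[k] == '_' && try_count(ct, s + k + 1))
            return true;
    return false;
}

static bool try_count(struct count_node *ct, const char *s)
{
    long cur = ct->current;

    if (cur >= ct->n)
        return *s == '\0';      /* enough repetitions: must be at the end */
    ct->current = cur + 1;      /* cf. scan[P_CT_CURRENT].l = cur + 1 */
    if (try_operand(ct, s))
        return true;
    if (ct->restore)
        ct->current = cur;      /* cf. the added scan[P_CT_CURRENT].l = cur */
    return false;
}

/* Whole-string match of s against (*_)(#cN) in this toy model. */
bool match_counted(const char *s, long n, bool restore)
{
    struct count_node ct = { n, 0, restore };

    assert(n >= 0);
    return try_count(&ct, s);
}
```

In this model, match_counted("1_2_", 2, true) succeeds while match_counted("1_2_", 2, false) fails: the abandoned attempt in which * swallowed "1_2" leaves the counter at 2, so the later, correct split into "1_" and "2_" is judged to have used up its repetitions already. This mirrors the second case of the new D02glob.ztst test, although the model makes no attempt to reproduce zsh's exact pre-fix output.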




Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/
