From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 5167 invoked by alias); 27 Oct 2015 10:46:44 -0000
Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List
List-Post:
List-Help:
X-Seq: 36977
Received: (qmail 12763 invoked from network); 27 Oct 2015 10:46:43 -0000
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on f.primenet.com.au
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
 autolearn_force=no version=3.4.0
X-AuditID: cbfec7f5-f794b6d000001495-f0-562f560fc48a
Date: Tue, 27 Oct 2015 10:46:33 +0000
From: Peter Stephenson
To: Zsh hackers list
Subject: Re: Issue with ${var#(*_)(#cN,M)}
Message-id: <20151027104633.2479414f@pwslap01u.europe.root.pri>
In-reply-to: <20151027100034.45f487f0@pwslap01u.europe.root.pri>
References: <20151019093316.GA6957@chaz.gmail.com>
 <151019121728.ZM324@torch.brasslantern.com>
 <20151020190946.GA6560@chaz.gmail.com>
 <151020160422.ZM1778@torch.brasslantern.com>
 <20151027100034.45f487f0@pwslap01u.europe.root.pri>
Organization: Samsung Cambridge Solution Centre
X-Mailer: Claws Mail 3.7.9 (GTK+ 2.22.0; i386-redhat-linux-gnu)
MIME-version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7bit

On Tue, 27 Oct 2015 10:00:34 +0000
Peter Stephenson wrote:
> Original problem
> > } ~$ a='1_2_3_4_5_6'
> > } ~$ echo ${a#(*_)(#c2)}
> > } 2_3_4_5_6
>
> On Tue, 20 Oct 2015 16:04:22 -0700
> Bart Schaefer wrote:
> > What's messing it up is the "*" operator and the backtracking that is
> > implied because * can match anything.
>
> Exactly. What's backtracking over what in what order here is a bit of a
> nightmare, and I'm not sure I'm likely to get my mind round it.
>
> Unless someone does, you'll be better off sticking to
>
> % a='1_2_3_4_5_6'
> % echo ${a#([^_]#_)(#c2)}
> 3_4_5_6
>
> and then we don't have the "*" within the group to worry about.

Indeed, I've just noticed that with

% egrep --version
egrep (GNU grep) 2.8

the following:

% egrep '^(*_){2}$' <<<'1_2_'

fails to match completely, i.e. the backtracking is too complicated to
handle, whereas

% egrep '^([^_]+_){2}$' <<<'1_2_'

succeeds.

At this point, I'm going to document the difficulty and slowly retreat
backwards from the dark corner.

pws

diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 5ea8610..49a0f0d 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -2192,6 +2192,16 @@ inclusive. The form tt(LPAR()#c)var(N)tt(RPAR()) requires exactly tt(N)
 matches; tt(LPAR()#c,)var(M)tt(RPAR()) is equivalent to specifying var(N)
 as 0; tt(LPAR()#c)var(N)tt(,RPAR()) specifies that there is no maximum
 limit on the number of matches.
+
+Note that if the previous group of characters contains wildcards,
+results can be unpredictable to the point of being logically incorrect.
+It is recommended that the pattern be trimmed to match the minimum
+possible. For example, to match a string of the form `tt(1_2_3_)', use
+a pattern of the form `tt(LPAR()[[:digit:]]##_+RPAR()LPAR()#c3+RPAR())', not
+`tt(LPAR()*_+RPAR()LPAR()#c3+RPAR())'. This arises from the
+complicated interaction between attempts to match a number of
+repetitions of the whole pattern and attempts to match the wildcard
+`tt(*)'.
 )
 vindex(MATCH)
 vindex(MBEGIN)
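
For anyone who wants to check the restricted form outside zsh, here is a minimal sketch using plain sh with grep -E and sed -E. It is not part of the patch; the variable names (a, matched, rest) are only illustrative, and the sed substitution is an assumed regex counterpart of the ${a#([^_]#_)(#c2)} expansion quoted above, not the zsh matcher itself.

```shell
# Illustrative check only (assumption: POSIX sh, grep -E and sed -E available).
a='1_2_3_4_5_6'

# Whole-string match with the restricted group: [^_]+ cannot run past an
# underscore, so repeating the group twice leaves nothing to backtrack over.
if printf '%s\n' '1_2_' | grep -E -q '^([^_]+_){2}$'; then
    matched=yes
else
    matched=no
fi

# Strip a leading run of exactly two "field_" groups, roughly what the
# ${a#([^_]#_)(#c2)} expansion does in zsh.
rest=$(printf '%s\n' "$a" | sed -E 's/^([^_]+_){2}//')

echo "matched=$matched rest=$rest"
```

The point of the sketch is only that a group which excludes the separator gives each repetition a single possible extent, which is the situation the documentation note recommends.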