Date: Mon, 08 Sep 2014 15:01:35 +0100
From: Peter Stephenson
To: Paulo César Pereira de Andrade, zsh-users@zsh.org
Subject: Re: There is a serious inefficiency in the way zsh handles wildcards
Message-id: <20140908150135.6bbf5356@pwslap01u.europe.root.pri>
Organization: Samsung Cambridge Solution Centre
List-Id: Zsh Users List
X-Seq: 19059

On Mon, 08 Sep 2014 10:27:48 -0300
Paulo César Pereira de Andrade wrote:
> ---%<---
> diff -up zsh-5.0.2/Src/pattern.c.orig zsh-5.0.2/Src/pattern.c
> --- zsh-5.0.2/Src/pattern.c.orig 2014-09-03 12:21:44.673792750 -0300
> +++ zsh-5.0.2/Src/pattern.c 2014-09-03 12:22:28.069303587 -0300
> @@ -2911,6 +2911,10 @@ patmatch(Upat prog)
>              break;
>          case P_STAR:
>              /* Handle specially for speed, although really P_ONEHASH+P_ANY */
> +            while (P_OP(next) == P_STAR) {
> +                scan = next;
> +                next = PATNEXT(scan);
> +            }
>          case P_ONEHASH:
>          case P_TWOHASH:
>              /*
> ---%<---
>
> Do you believe this patch is OK?

In other words, if we're handling a "*" down in the pattern code --- as
you say, we've already decided higher up if it's the special ** or ***
for directories so there's no problem with those --- we can skip any
immediately following *s because they don't add anything but will
provoke horrifically inefficient recursion.  (That's because when
backtracking we keep trying each separate * from each position --- the
number of possibilities is humongous.)

Yes, that sounds entirely reasonable.

It doesn't patch cleanly any more; I think the following works and I've
added a new test (all tests pass).

diff --git a/Src/pattern.c b/Src/pattern.c
index 94a299e..adc73c1 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -3012,6 +3012,16 @@ patmatch(Upat prog)
             break;
         case P_STAR:
             /* Handle specially for speed, although really P_ONEHASH+P_ANY */
+            while (P_OP(next) == P_STAR) {
+                /*
+                 * If there's another * following we can optimise it
+                 * out.  Chains of *'s can give pathologically bad
+                 * performance.
+                 */
+                scan = next;
+                next = PATNEXT(scan);
+            }
+            /*FALLTHROUGH*/
         case P_ONEHASH:
         case P_TWOHASH:
             /*
diff --git a/Test/D02glob.ztst b/Test/D02glob.ztst
index 4697ca4..217ce7c 100644
--- a/Test/D02glob.ztst
+++ b/Test/D02glob.ztst
@@ -565,3 +565,10 @@
   print $match[1]
 0:(#q) is ignored completely in conditional pattern matching
 >fichier
+
+# The following should not cause excessive slowdown.
+  print glob.tmp/*.*
+  print glob.tmp/**************************.*************************
+0:Optimisation to squeeze multiple *'s used as ordinary glob wildcards.
+>glob.tmp/ra=1.0_et=3.5
+>glob.tmp/ra=1.0_et=3.5

Thanks.
pws
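
To illustrate why a chain of *'s backtracks so badly, here is a minimal standalone sketch. It is not the zsh code: match(), count_calls() and the calls counter are hypothetical names invented for this illustration, and the matcher only understands literal characters and '*'. It assumes a plain recursive matcher in which each '*' tries every possible length in turn, and it counts how many times the matcher is entered with and without squeezing consecutive *'s.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter: number of match() calls since the last reset. */
static unsigned long calls;

/*
 * Toy backtracking matcher for patterns made of literal characters
 * and '*' only.  If squeeze is non-zero, a run of consecutive '*' is
 * treated as a single '*', which is the idea behind the patch above.
 */
int match(const char *pat, const char *str, int squeeze)
{
    calls++;
    if (*pat == '\0')
        return *str == '\0';
    if (*pat == '*') {
        if (squeeze) {
            /* Skip ahead to the last '*' of the run. */
            while (pat[1] == '*')
                pat++;
        }
        /* Try every possible length for this '*', shortest first. */
        for (;; str++) {
            if (match(pat + 1, str, squeeze))
                return 1;
            if (*str == '\0')
                return 0;
        }
    }
    /* Literal character: must match exactly, then continue. */
    return *pat == *str && match(pat + 1, str + 1, squeeze);
}

/* Run one match from scratch and return the number of calls it took. */
unsigned long count_calls(const char *pat, const char *str,
                          int squeeze, int *matched)
{
    int result;

    assert(pat != NULL && str != NULL);
    calls = 0;
    result = match(pat, str, squeeze);
    if (matched != NULL)
        *matched = result;
    return calls;
}
```

Calling count_calls("******x", "aaaaaaaaaaaaaaa", 0, NULL) and then again with squeeze set to 1 compares the two on a match that has to fail: without squeezing, every * retries every remaining position for every choice made by the *'s before it, while with squeezing the work is linear in the length of the string. The result of the match is the same either way.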