From: Paulo César Pereira de Andrade
To: Peter Stephenson
Cc: zsh-users@zsh.org
Date: Thu, 11 Sep 2014 10:54:23 -0300
Subject: Re: There is a serious inefficiency in the way zsh handles wildcards

2014-09-08 11:01 GMT-03:00 Peter Stephenson :
> On Mon, 08 Sep 2014 10:27:48 -0300
> Paulo César Pereira de Andrade
> wrote:
>> ---%<---
>> diff -up zsh-5.0.2/Src/pattern.c.orig zsh-5.0.2/Src/pattern.c
>> --- zsh-5.0.2/Src/pattern.c.orig 2014-09-03 12:21:44.673792750 -0300
>> +++ zsh-5.0.2/Src/pattern.c 2014-09-03 12:22:28.069303587 -0300
>> @@ -2911,6 +2911,10 @@ patmatch(Upat prog)
>>          break;
>>      case P_STAR:
>>          /* Handle specially for speed, although really P_ONEHASH+P_ANY */
>> +        while (P_OP(next) == P_STAR) {
>> +            scan = next;
>> +            next = PATNEXT(scan);
>> +        }
>>      case P_ONEHASH:
>>      case P_TWOHASH:
>>          /*
>> ---%<---
>>
>> Do you believe this patch is OK?
>
> In other words, if we're handling a "*" down in the pattern code --- as
> you say, we've already decided higher up if it's the special ** or ***
> for directories so there's no problem with those --- we can skip any
> immediately following *s because they don't add anything but will provoke
> horrifically inefficient recursion. (That's because when backtracking
> we keep trying each separate * from each position --- the number of
> possibilities is humongous.)
>
> Yes, that sounds entirely reasonable. It doesn't patch cleanly any
> more; I think the following works and I've added a new test (all tests
> pass).

Just as a note, I noticed that the huge slowdown usually happened only
when there was a match; if nothing matched /tmp/*.* (usually a directory
or a symlink to one), handling the pattern caused no noticeable delay.

> diff --git a/Src/pattern.c b/Src/pattern.c
> index 94a299e..adc73c1 100644
> --- a/Src/pattern.c
> +++ b/Src/pattern.c
> @@ -3012,6 +3012,16 @@ patmatch(Upat prog)
>          break;
>      case P_STAR:
>          /* Handle specially for speed, although really P_ONEHASH+P_ANY */
> +        while (P_OP(next) == P_STAR) {
> +            /*
> +             * If there's another * following we can optimise it
> +             * out. Chains of *'s can give pathologically bad
> +             * performance.
> +             */
> +            scan = next;
> +            next = PATNEXT(scan);
> +        }
> +        /*FALLTHROUGH*/
>      case P_ONEHASH:
>      case P_TWOHASH:
>          /*
> diff --git a/Test/D02glob.ztst b/Test/D02glob.ztst
> index 4697ca4..217ce7c 100644
> --- a/Test/D02glob.ztst
> +++ b/Test/D02glob.ztst
> @@ -565,3 +565,10 @@
>    print $match[1]
>  0:(#q) is ignored completely in conditional pattern matching
>  >fichier
> +
> +# The following should not cause excessive slowdown.
> +  print glob.tmp/*.*
> +  print glob.tmp/**************************.*************************
> +0:Optimisation to squeeze multiple *'s used as ordinary glob wildcards.
> +>glob.tmp/ra=1.0_et=3.5
> +>glob.tmp/ra=1.0_et=3.5
>
>
> Thanks.
> pws

Thanks for applying the patch!

Paulo