Date: Fri, 04 Mar 2016 14:03:42 +0000
From: Peter Stephenson
To: Zsh Users
Subject: Re: Extended globbing seems to have become much slower in recent versions of Zsh
Message-id: <20160304140342.4477e2c1@pwslap01u.europe.root.pri>
References: <160229111212.ZM4272@torch.brasslantern.com> <160301102807.ZM8036@torch.brasslantern.com> <160301160356.ZM10258@torch.brasslantern.com>
Organization: Samsung Cambridge Solution Centre
List-Id: Zsh Users List

On Fri, 04 Mar 2016
14:22:23 +0100 Jesper Nygårds wrote:
> I don't know if this is relevant, but I have some more findings. I wanted
> to know which sub directory was contributing the most to the amount of time
> taken to process the root directory. I then realized that the sum of the
> time it took to process each sub directory separately was much lower than
> processing the whole root directory at once. In the run below, you can see
> that whilst it takes about 36 seconds to process the root directory, it
> only takes about 13 seconds to process all directories one at a time.
> Furthermore, when I descend into the sub directory taking the longest time
> in the first run, and run all its sub directories in sequence, the sum of
> these sub-sub directories is significantly lower than for the whole sub
> directory. So obviously the processing time is not linear with the number
> of files.

You'd think that would point towards something higher level than the
pattern matching itself, anyway.

Brainstorming [this is a euphemism for I haven't the first clue what I'm
talking about]...

- Memory management associated with multiple directories; could be
  allocation or heap management.

- Poor structuring of globbing code meaning it's repeating operations,
  i.e. it's not walking trees in a sensible fashion.

- Some auxiliary operation associated with the file tree that's giving a
  similar effect to the foregoing (i.e. we walk the tree itself OK but
  then repeat some associated operation that should be cached).

- Some form of context switching problem [now it's really getting silly]

Most of these guesses ought to be amenable to tracing through some sort
of profiling suite.  Even some fprintfs to check what directories we've
handled at what point might help.

pws
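
A minimal sketch of the fprintf-style tracing suggested above, in case it helps anyone try it. Everything here is an assumption for illustration: the names `dir_trace`, `trace_init`, `trace_visit` and `trace_repeats` are hypothetical and do not come from the zsh source, and where such a call would go in the globbing code is left open. The idea is only to log each directory as it is handled, with the CPU time used so far, and to count how often the same directory turns up, so that repeated work becomes visible.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical debugging helpers; not part of zsh. */
#define TRACE_MAX_DIRS 256   /* fixed-size table, enough for a quick experiment */
#define TRACE_MAX_PATH 256   /* longer paths are truncated for bookkeeping */

struct dir_trace {
    char path[TRACE_MAX_DIRS][TRACE_MAX_PATH];
    unsigned visits[TRACE_MAX_DIRS];
    size_t ndirs;
    clock_t start;           /* CPU time when tracing began */
};

/* Reset the table and remember the starting CPU time. */
void trace_init(struct dir_trace *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof *t);
    t->start = clock();
}

/*
 * Record one visit to `path` and log it to `out`.
 * Returns the number of times this path has been seen so far,
 * or 0 if the table is full and the path could not be recorded.
 */
unsigned trace_visit(struct dir_trace *t, FILE *out, const char *path)
{
    assert(t != NULL && out != NULL && path != NULL);
    double elapsed = (double)(clock() - t->start) / CLOCKS_PER_SEC;
    size_t i;

    /* Linear search is fine for a throwaway diagnostic. */
    for (i = 0; i < t->ndirs; i++) {
        if (strncmp(t->path[i], path, TRACE_MAX_PATH - 1) == 0)
            break;
    }
    if (i == t->ndirs) {
        if (t->ndirs == TRACE_MAX_DIRS) {
            fprintf(out, "[%.3fs] visit (untracked, table full) %s\n",
                    elapsed, path);
            return 0;
        }
        strncpy(t->path[i], path, TRACE_MAX_PATH - 1);
        t->path[i][TRACE_MAX_PATH - 1] = '\0';
        t->ndirs++;
    }
    t->visits[i]++;
    fprintf(out, "[%.3fs] visit #%u %s\n", elapsed, t->visits[i], path);
    return t->visits[i];
}

/* Number of distinct directories that were visited more than once. */
size_t trace_repeats(const struct dir_trace *t)
{
    size_t i, n = 0;

    assert(t != NULL);
    for (i = 0; i < t->ndirs; i++) {
        if (t->visits[i] > 1)
            n++;
    }
    return n;
}
```

Calling `trace_visit(&t, stderr, dirname)` wherever a directory is opened, and checking `trace_repeats(&t)` at the end, would show whether the same directories are being handled more than once. Note that `clock()` reports CPU time rather than wall-clock time, so time spent waiting on the disk would not appear in the log.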