Extended globbing seems to have become much slower in recent versions of Zsh

zsh-users
 help / color / mirror / code / Atom feed

* Extended globbing seems to have become much slower in recent versions of Zsh
@ 2016-02-28 20:18 Jesper Nygårds
  2016-02-29 19:12 ` Bart Schaefer
  0 siblings, 1 reply; 22+ messages in thread
From: Jesper Nygårds @ 2016-02-28 20:18 UTC (permalink / raw)
  To: Zsh Users

[-- Attachment #1: Type: text/plain, Size: 1879 bytes --]

I had a feeling that some of my scripts dealing with a large number of
files had become significantly slower lately, but I haven't been sure.
Today I took some time to find out whether or not I was only imagining
things.

The results of my experiments seem to indicate that something has changed
relatively recently with regards to extended globbing. I am using a Mac for
the moment, and I have currently have two versions of Zsh installed: 5.0.5
and 5.2. See below for my test run. As you can see, I do some globbing in a
fairly large directory (2.4 Gb and 231748 files). Notice that with a
complicated glob pattern, assigning matching files to an array takes about
33 seconds under 5.2, whereas the same assignment takes about 6 seconds
under 5.0.5. The speed difference with a less complicated glob (**/*) is
negligable.

Is there some change that could explain this difference, and is there
something to do about it? The difference in speed is so large that it
affects scripts that I run regularly quite a lot.

The test run, I hope it's self explanatory. All commands are run from the
same directory.

% print -l **/* | wc -l
231748

% du -sh .
2.4G .

% /usr/local/bin/zsh --version
zsh 5.2 (x86_64-apple-darwin14.5.0)

% /usr/local/bin/zsh -f
% setopt EXTENDED_GLOB

% date; myfiles=(**/*); date
Sun Feb 28 20:58:12 CET 2016
Sun Feb 28 20:58:27 CET 2016

% date;
myfiles=((#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D));
date
Sun Feb 28 20:58:39 CET 2016
Sun Feb 28 20:59:12 CET 2016

% /bin/zsh --version
zsh 5.0.5 (x86_64-apple-darwin14.0)

% /bin/zsh -f
% setopt EXTENDED_GLOB

% date; myfiles=(**/*); date
Sun Feb 28 21:00:43 CET 2016
Sun Feb 28 21:00:56 CET 2016

% date;
myfiles=((#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D));
date
Sun Feb 28 21:01:04 CET 2016
Sun Feb 28 21:01:10 CET 2016

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-02-28 20:18 Extended globbing seems to have become much slower in recent versions of Zsh Jesper Nygårds
@ 2016-02-29 19:12 ` Bart Schaefer
  2016-03-01 11:39   ` Jesper Nygårds
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-02-29 19:12 UTC (permalink / raw)
  To: Zsh Users

On Feb 28,  9:18pm, Jesper Nygards wrote:
}
} The results of my experiments seem to indicate that something has changed
} relatively recently with regards to extended globbing.

There were a number of changes mostly related to correct handling of
multibyte characters and single bytes in the range 128-255 appearing
in file and directory names.  This means some strings are passing
through the "unmetafication" process more often than they were, but
I can't say for sure this is the source of any slowdown.

It'd be helpful if you re-ran your tests with more variations of the
pattern:

**/*
**/*(-.D)
**/*~(your long list of exclusions)
(#i)**/*~(your long list of exclusions)
**/*~(your long list of exclusions)(-.D)

Also, first declare "float SECONDS" and then use "print $SECONDS" instead
of "date", so we get a more precise measure of where the time is spent.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-02-29 19:12 ` Bart Schaefer
@ 2016-03-01 11:39   ` Jesper Nygårds
  2016-03-01 18:28     ` Bart Schaefer
  0 siblings, 1 reply; 22+ messages in thread
From: Jesper Nygårds @ 2016-03-01 11:39 UTC (permalink / raw)
  To: Zsh Users

[-- Attachment #1: Type: text/plain, Size: 3404 bytes --]

Here's another run with your supplied patterns. I hope I didn't
misunderstand how you wanted me to use the SECONDS variable.

In any case, as you can see below, the globbing more or less takes the same
time for the simple patterns, but as soon as I introduce the exclusions,
the 5.2 version is around six times slower than the 5.0.5 version.



% du -sh .
2.4G .

% print -l **/*(D) | wc -l
272681

------------------------------------------------------

% /usr/local/bin/zsh --version
zsh 5.2 (x86_64-apple-darwin14.5.0)

% /usr/local/bin/zsh -f

% pwd
/opt/intyg

% setopt EXTENDED_GLOB
% float SECONDS MYNOW

% MYNOW=$SECONDS; myfiles=(**/*); print $(( $SECONDS - $MYNOW ))
20.477513999999996

% MYNOW=$SECONDS; myfiles=(**/*(-.D)); print $(( $SECONDS - $MYNOW ))
20.415430000000001

% MYNOW=$SECONDS;
myfiles=(**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D));
print $(( $SECONDS - $MYNOW ))
36.53519

% MYNOW=$SECONDS;
myfiles=((#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D));
print $(( $SECONDS - $MYNOW ))
38.467647999999997

% MYNOW=$SECONDS;
myfiles=(**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*));
print $(( $SECONDS - $MYNOW ))
23.871832999999981

MYNOW=$SECONDS;
myfiles=((#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*));
print $(( $SECONDS - $MYNOW ))
23.234887999999955

-------------------------------------------------------

% /bin/zsh --version
zsh 5.0.5 (x86_64-apple-darwin14.0)

% /bin/zsh -f

% pwd
/opt/intyg

% setopt EXTENDED_GLOB
% float SECONDS MYNOW

% MYNOW=$SECONDS; myfiles=(**/*); print $(( $SECONDS - $MYNOW ))
15.603122999999997

% MYNOW=$SECONDS; myfiles=(**/*(-.D)); print $(( $SECONDS - $MYNOW ))
17.826644999999999

% MYNOW=$SECONDS;
myfiles=(**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D));
print $(( $SECONDS - $MYNOW ))
6.5746579999999994

% MYNOW=$SECONDS;
myfiles=((#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D));
print $(( $SECONDS - $MYNOW ))
6.1519139999999766

% MYNOW=$SECONDS;
myfiles=(**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*));
print $(( $SECONDS - $MYNOW ))
3.6434740000000261

% MYNOW=$SECONDS;
myfiles=((#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*));
print $(( $SECONDS - $MYNOW ))
4.0673059999999737


On Mon, Feb 29, 2016 at 8:12 PM, Bart Schaefer <schaefer@brasslantern.com>
wrote:

> On Feb 28,  9:18pm, Jesper Nygards wrote:
> }
> } The results of my experiments seem to indicate that something has changed
> } relatively recently with regards to extended globbing.
>
> There were a number of changes mostly related to correct handling of
> multibyte characters and single bytes in the range 128-255 appearing
> in file and directory names.  This means some strings are passing
> through the "unmetafication" process more often than they were, but
> I can't say for sure this is the source of any slowdown.
>
> It'd be helpful if you re-ran your tests with more variations of the
> pattern:
>
> **/*
> **/*(-.D)
> **/*~(your long list of exclusions)
> (#i)**/*~(your long list of exclusions)
> **/*~(your long list of exclusions)(-.D)
>
> Also, first declare "float SECONDS" and then use "print $SECONDS" instead
> of "date", so we get a more precise measure of where the time is spent.
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-01 11:39   ` Jesper Nygårds
@ 2016-03-01 18:28     ` Bart Schaefer
  2016-03-01 19:11       ` Jesper Nygårds
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-03-01 18:28 UTC (permalink / raw)
  To: Zsh Users

On Mar 1, 12:39pm, Jesper Nygards wrote:
} 
} Here's another run with your supplied patterns. I hope I didn't
} misunderstand how you wanted me to use the SECONDS variable.

Close enough.

} In any case, as you can see below, the globbing more or less takes the same
} time for the simple patterns, but as soon as I introduce the exclusions,
} the 5.2 version is around six times slower than the 5.0.5 version.

That tends to point to changes in pattern matching rather than globbing
(related but distinct operations), but can you also try it with

    : pattern-goes-here

instead of

    myfiles=( pattern-goes-here )

??  I'd like to factor out the contributions of memory management for the
array assignment.  I ran a couple of quick local tests and with assignment
there are about 50% more calls to malloc in the current git head than in
5.0.5, but without assignment there are slightly fewer in current.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-01 18:28     ` Bart Schaefer
@ 2016-03-01 19:11       ` Jesper Nygårds
  2016-03-02  0:03         ` Bart Schaefer
  0 siblings, 1 reply; 22+ messages in thread
From: Jesper Nygårds @ 2016-03-01 19:11 UTC (permalink / raw)
  To: Zsh Users

[-- Attachment #1: Type: text/plain, Size: 2585 bytes --]

New runs with your suggestion:


% /usr/local/bin/zsh --version
zsh 5.2 (x86_64-apple-darwin14.5.0)

% /usr/local/bin/zsh -f
% setopt EXTENDED_GLOB; float SECONDS MYNOW

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
36.463269999999994

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
45.178413000000006

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
26.012812999999994

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
23.646717999999964

-------------------------------------------------------

% /bin/zsh --version
zsh 5.0.5 (x86_64-apple-darwin14.0)

% /bin/zsh -f
% setopt EXTENDED_GLOB; float SECONDS MYNOW

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
13.899865999999989

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
10.239834999999999

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
5.4469640000000084

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
6.2737860000000296


On Tue, Mar 1, 2016 at 7:28 PM, Bart Schaefer <schaefer@brasslantern.com>
wrote:

> On Mar 1, 12:39pm, Jesper Nygards wrote:
> }
> } Here's another run with your supplied patterns. I hope I didn't
> } misunderstand how you wanted me to use the SECONDS variable.
>
> Close enough.
>
> } In any case, as you can see below, the globbing more or less takes the
> same
> } time for the simple patterns, but as soon as I introduce the exclusions,
> } the 5.2 version is around six times slower than the 5.0.5 version.
>
> That tends to point to changes in pattern matching rather than globbing
> (related but distinct operations), but can you also try it with
>
>     : pattern-goes-here
>
> instead of
>
>     myfiles=( pattern-goes-here )
>
> ??  I'd like to factor out the contributions of memory management for the
> array assignment.  I ran a couple of quick local tests and with assignment
> there are about 50% more calls to malloc in the current git head than in
> 5.0.5, but without assignment there are slightly fewer in current.
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-01 19:11       ` Jesper Nygårds
@ 2016-03-02  0:03         ` Bart Schaefer
  2016-03-02  8:39           ` Jesper Nygårds
  2016-03-02  9:32           ` Peter Stephenson
  0 siblings, 2 replies; 22+ messages in thread
From: Bart Schaefer @ 2016-03-02  0:03 UTC (permalink / raw)
  To: Zsh Users

On Mar 1,  8:11pm, Jesper Nygards wrote:
}
} New runs with your suggestion:

Well, that's interesting.  Seems to indicate that building argv to pass
to builtins is even more expensive than an array assignment.

Still, I can't find anything on my system that causes as dramatic a
difference as yours from 5.0.5 -> 5.2+.  The (-.D) qualifiers seem to
have the largest effect for me, for you its the exclusion pattern.

For the sake of covering all bases, are you sure the same compiler
optimization flags were applied to the 5.0.5 and 5.2 builds you are
testing?  With your exclusion pattern and about 13,000 files in **/*,
there are about 100,000 calls to strlen() in either version, so any
optimization of that might make a large difference.

I did find a small but observable speedup when compiling without
MULTIBYTE_SUPPORT, too; there were a lot of multibyte repairs to
pattern.c after 5.0.5.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-02  0:03         ` Bart Schaefer
@ 2016-03-02  8:39           ` Jesper Nygårds
  2016-03-03  0:06             ` Bart Schaefer
  2016-03-02  9:32           ` Peter Stephenson
  1 sibling, 1 reply; 22+ messages in thread
From: Jesper Nygårds @ 2016-03-02  8:39 UTC (permalink / raw)
  To: Zsh Users

[-- Attachment #1: Type: text/plain, Size: 4465 bytes --]

Previously I was using the Apple-provided version of 5.0.5, and the
Homebrew version of 5.2. Now I have instead used different versions of
Homebrew zsh, and I have made sure that the compiler flags were the same
through all versions. I have then run the same tests again, with versions
5.0.5, 5.0.6, 5.1 and 5.2.


% /usr/local/bin/zsh --version
zsh 5.0.5 (x86_64-apple-darwin14.5.0)

% /usr/local/bin/zsh -f
% setopt EXTENDED_GLOB; float SECONDS MYNOW

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
21.925625999999998

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
15.809167000000002

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
9.8283599999999893

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
4.0556160000000006

-------------------------------------------------------

% /usr/local/bin/zsh --version
zsh 5.0.6 (x86_64-apple-darwin14.5.0)

% /usr/local/bin/zsh -f
% setopt EXTENDED_GLOB; float SECONDS MYNOW

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
51.819927999999997

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
54.338758999999996

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
35.885299000000003

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
29.573196999999993

-------------------------------------------------------

% /usr/local/bin/zsh --version
zsh 5.1 (x86_64-apple-darwin14.5.0)

% /usr/local/bin/zsh -f
% setopt EXTENDED_GLOB; float SECONDS MYNOW

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
55.068858999999989

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
55.551114999999982

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
31.761401000000035

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
33.984271000000035

-------------------------------------------------------

% /usr/local/bin/zsh --version
zsh 5.2 (x86_64-apple-darwin14.5.0)

% /usr/local/bin/zsh -f
% setopt EXTENDED_GLOB; float SECONDS MYNOW

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
46.693610000000007

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
48.288181999999992

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
31.169990000000013

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
25.161075000000011




On Wed, Mar 2, 2016 at 1:03 AM, Bart Schaefer <schaefer@brasslantern.com>
wrote:

> On Mar 1,  8:11pm, Jesper Nygards wrote:
> }
> } New runs with your suggestion:
>
> Well, that's interesting.  Seems to indicate that building argv to pass
> to builtins is even more expensive than an array assignment.
>
> Still, I can't find anything on my system that causes as dramatic a
> difference as yours from 5.0.5 -> 5.2+.  The (-.D) qualifiers seem to
> have the largest effect for me, for you its the exclusion pattern.
>
> For the sake of covering all bases, are you sure the same compiler
> optimization flags were applied to the 5.0.5 and 5.2 builds you are
> testing?  With your exclusion pattern and about 13,000 files in **/*,
> there are about 100,000 calls to strlen() in either version, so any
> optimization of that might make a large difference.
>
> I did find a small but observable speedup when compiling without
> MULTIBYTE_SUPPORT, too; there were a lot of multibyte repairs to
> pattern.c after 5.0.5.
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-02  8:39           ` Jesper Nygårds
@ 2016-03-03  0:06             ` Bart Schaefer
  2016-03-04  8:17               ` Jesper Nygårds
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-03-03  0:06 UTC (permalink / raw)
  To: Zsh Users

On Wed, Mar 2, 2016 at 12:39 AM, Jesper Nygårds
<jesper.nygards@gmail.com> wrote:
> Previously I was using the Apple-provided version of 5.0.5, and the
> Homebrew version of 5.2. Now I have instead used different versions of
> Homebrew zsh, and I have made sure that the compiler flags were the same
> through all versions. I have then run the same tests again, with versions
> 5.0.5, 5.0.6, 5.1 and 5.2.

OK, thanks; next thing to try:  configure --with-stack-allocation

I've scanned through diffs and I just don't see any structural changes
that would account for that slowdown; the above is a bit of a
longshot.

Incidentally (for -workers) there's a typo in ChangeLog -- 32932:
Src/glob.c -- should say Src/subst.c.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-03  0:06             ` Bart Schaefer
@ 2016-03-04  8:17               ` Jesper Nygårds
  2016-03-04 13:22                 ` Jesper Nygårds
  0 siblings, 1 reply; 22+ messages in thread
From: Jesper Nygårds @ 2016-03-04  8:17 UTC (permalink / raw)
  To: Zsh Users

[-- Attachment #1: Type: text/plain, Size: 2679 bytes --]

I have run 5.2, compiled with and without the --with-stack-allocation flag.
It made no difference however. You can see that the runs below are very
similar in the measured times.

With "--with-stack-allocation":

% zsh --version
zsh 5.2 (x86_64-apple-darwin14.5.0)

% /usr/local/bin/zsh -f
% setopt EXTENDED_GLOB; float SECONDS MYNOW

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
54.868589

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
47.001608000000019

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
38.805462000000006

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
32.296016000000009

-------------------------------------------------------

Without "--with-stack-allocation":

% /usr/local/bin/zsh --version
zsh 5.2 (x86_64-apple-darwin14.5.0)

% /usr/local/bin/zsh -f
% setopt EXTENDED_GLOB; float SECONDS MYNOW

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
53.115647000000003

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
print $(( $SECONDS - $MYNOW ))
42.988532000000006

% MYNOW=$SECONDS; :
**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
36.619570999999993

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
30.786290000000008


On Thu, Mar 3, 2016 at 1:06 AM, Bart Schaefer <schaefer@brasslantern.com>
wrote:

> On Wed, Mar 2, 2016 at 12:39 AM, Jesper Nygårds
> <jesper.nygards@gmail.com> wrote:
> > Previously I was using the Apple-provided version of 5.0.5, and the
> > Homebrew version of 5.2. Now I have instead used different versions of
> > Homebrew zsh, and I have made sure that the compiler flags were the same
> > through all versions. I have then run the same tests again, with versions
> > 5.0.5, 5.0.6, 5.1 and 5.2.
>
> OK, thanks; next thing to try:  configure --with-stack-allocation
>
> I've scanned through diffs and I just don't see any structural changes
> that would account for that slowdown; the above is a bit of a
> longshot.
>
> Incidentally (for -workers) there's a typo in ChangeLog -- 32932:
> Src/glob.c -- should say Src/subst.c.
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-04  8:17               ` Jesper Nygårds
@ 2016-03-04 13:22                 ` Jesper Nygårds
  2016-03-04 14:03                   ` Peter Stephenson
  2016-03-05 17:37                   ` Bart Schaefer
  0 siblings, 2 replies; 22+ messages in thread
From: Jesper Nygårds @ 2016-03-04 13:22 UTC (permalink / raw)
  To: Zsh Users

[-- Attachment #1: Type: text/plain, Size: 5107 bytes --]

I don't know if this is relevant, but I have some more findings. I wanted
to know which sub directory was contributing the most to the amount of time
taken to process the root directory. I then realized that the sum of the
time it took to process each sub directory separately was much lower than
processing the whole root directory at once. In the run below, you can see
that whilst it takes about 36 seconds to process the root directory, it
only takes about 13 seconds to process all directories one at a time.
Furthermore, when I descend into the sub directory taking the longest time
in the first run, and run all its sub directories in sequence, the sum of
these sub-sub directories is significantly lower than for the whole sub
directory. So obviously the processing time is not linear with the number
of files.

% float SECONDS MYNOW;

% MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW ))
35.810789000000113

% for dir in */; do print $dir; cd $dir; MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW )); cd -q - >|/dev/null; done
common-pom/
0.00011700000004566391
common.wiki/
0.00033800000028350041
common/
1.0273979999997209
intygstjanst-konfiguration/
0.00071800000023358734
intygstjanst/
0.08040000000028158
intygstyper/
9.1739539999998669
minaintyg-konfiguration/
0.00063700000009703217
minaintyg/
0.67290499999990061
privatlakarportal-konfiguration/
0.0015870000001996232
privatlakarportal/
0.83380199999965043
schemas/
0.11772999999993772
tools/
0.0098799999996117549
webcert-konfiguration/
0.0018289999998160056
webcert/
1.4324129999999968

% cd intygstyper

% for dir in */; do print $dir; cd $dir; MYNOW=$SECONDS; :
(#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
print $(( $SECONDS - $MYNOW )); cd -q - >|/dev/null; done
fk/
1.9322819999997591
fk7263/
0.37481900000011592
target/
0.17797399999972185
tools/
0.0080309999998462445
ts-bas/
0.31286899999986417
ts-diabetes/
0.22946499999989101
ts-parent/
0.0033199999998032581


On Fri, Mar 4, 2016 at 9:17 AM, Jesper Nygårds <jesper.nygards@gmail.com>
wrote:

> I have run 5.2, compiled with and without the --with-stack-allocation
> flag. It made no difference however. You can see that the runs below are
> very similar in the measured times.
>
> With "--with-stack-allocation":
>
> % zsh --version
> zsh 5.2 (x86_64-apple-darwin14.5.0)
>
> % /usr/local/bin/zsh -f
> % setopt EXTENDED_GLOB; float SECONDS MYNOW
>
> % MYNOW=$SECONDS; :
> **/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
> print $(( $SECONDS - $MYNOW ))
> 54.868589
>
> % MYNOW=$SECONDS; :
> (#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
> print $(( $SECONDS - $MYNOW ))
> 47.001608000000019
>
> % MYNOW=$SECONDS; :
> **/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
> print $(( $SECONDS - $MYNOW ))
> 38.805462000000006
>
> % MYNOW=$SECONDS; :
> (#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
> print $(( $SECONDS - $MYNOW ))
> 32.296016000000009
>
> -------------------------------------------------------
>
> Without "--with-stack-allocation":
>
> % /usr/local/bin/zsh --version
> zsh 5.2 (x86_64-apple-darwin14.5.0)
>
> % /usr/local/bin/zsh -f
> % setopt EXTENDED_GLOB; float SECONDS MYNOW
>
> % MYNOW=$SECONDS; :
> **/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
> print $(( $SECONDS - $MYNOW ))
> 53.115647000000003
>
> % MYNOW=$SECONDS; :
> (#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*)(-.D);
> print $(( $SECONDS - $MYNOW ))
> 42.988532000000006
>
> % MYNOW=$SECONDS; :
> **/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
> print $(( $SECONDS - $MYNOW ))
> 36.619570999999993
>
> % MYNOW=$SECONDS; :
> (#i)**/*~(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*);
> print $(( $SECONDS - $MYNOW ))
> 30.786290000000008
>
>
> On Thu, Mar 3, 2016 at 1:06 AM, Bart Schaefer <schaefer@brasslantern.com>
> wrote:
>
>> On Wed, Mar 2, 2016 at 12:39 AM, Jesper Nygårds
>> <jesper.nygards@gmail.com> wrote:
>> > Previously I was using the Apple-provided version of 5.0.5, and the
>> > Homebrew version of 5.2. Now I have instead used different versions of
>> > Homebrew zsh, and I have made sure that the compiler flags were the same
>> > through all versions. I have then run the same tests again, with
>> versions
>> > 5.0.5, 5.0.6, 5.1 and 5.2.
>>
>> OK, thanks; next thing to try:  configure --with-stack-allocation
>>
>> I've scanned through diffs and I just don't see any structural changes
>> that would account for that slowdown; the above is a bit of a
>> longshot.
>>
>> Incidentally (for -workers) there's a typo in ChangeLog -- 32932:
>> Src/glob.c -- should say Src/subst.c.
>>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-04 13:22                 ` Jesper Nygårds
@ 2016-03-04 14:03                   ` Peter Stephenson
  2016-03-04 14:20                     ` Peter Stephenson
  2016-03-05 17:37                   ` Bart Schaefer
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Stephenson @ 2016-03-04 14:03 UTC (permalink / raw)
  To: Zsh Users

On Fri, 04 Mar 2016 14:22:23 +0100
Jesper Nygårds <jesper.nygards@gmail.com> wrote:
> I don't know if this is relevant, but I have some more findings. I wanted
> to know which sub directory was contributing the most to the amount of time
> taken to process the root directory. I then realized that the sum of the
> time it took to process each sub directory separately was much lower than
> processing the whole root directory at once. In the run below, you can see
> that whilst it takes about 36 seconds to process the root directory, it
> only takes about 13 seconds to process all directories one at a time.
> Furthermore, when I descend into the sub directory taking the longest time
> in the first run, and run all its sub directories in sequence, the sum of
> these sub-sub directories is significantly lower than for the whole sub
> directory. So obviously the processing time is not linear with the number
> of files.

You'd think that would point towards something higher level than the
pattern matching itself, anyway.  Brainstorming [this is a euphemism for
I haven't the first clue what I'm talking about]...

- Memory management associated with multiple directories; could be
allocation or heap management.

- Poor structuring of globbing code meaning it's repeating operations,
i.e. it's not walking trees in a sensible fashion.

- Some auxiliary operation associated with the file tree that's
giving a similar effect to the foregoing (i.e. we walk the tree itself OK
but then repeat some associated operation that should be cached).

- Some form of context switching problem [now it's really getting silly]

Most of these guesses ought to be amenable to tracing through some sort
of profiling suite.  Even some fprintfs to check what directories we've
handled at what point might help.

pws


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-04 14:03                   ` Peter Stephenson
@ 2016-03-04 14:20                     ` Peter Stephenson
  2016-03-04 21:49                       ` Peter Stephenson
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Stephenson @ 2016-03-04 14:20 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh Users

On Fri, 4 Mar 2016 14:03:42 +0000
Peter Stephenson <p.stephenson@samsung.com> wrote:
> - Memory management associated with multiple directories; could be
> allocation or heap management.

I'm forgetting it's not simply speed of glob/pattern code, it's speed of
certain forms of glob/pattern code.  So tree structure problems don't
really fit.

Another possible clue here: the exclusion modifiers require additional
allocations when running the platform --- see line 3037 of pattern.c
(there's another one later).  This is currently permanently rather than
heap allocated.  It's easy to change to heap, but I wouldn't suggest
that --- with any kind of complicated backtracking this would eat memory
like there's no tomorrow.

It would be possible to use local pools of memory for this, however, so
it didn't get allocated and freed every single time it was needed.  I
think we only need one allocation for each occurrence of an exclusion,
and usually there are very few in a pattern, so this isn't hard to
manage. This might be an optimisation worth having anyway.

Although this hasn't changed significantly since I put the pattern code
in roughly its current form *many* years ago, 1999 apparently, it's
hairy enough that it's quite possible it could trip over pathologies in
interacting with the system and libraries.

(So brainstorming wasn't quite so futile after all...)

pws

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-04 14:20                     ` Peter Stephenson
@ 2016-03-04 21:49                       ` Peter Stephenson
  2016-03-05 17:47                         ` Bart Schaefer
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Stephenson @ 2016-03-04 21:49 UTC (permalink / raw)
  To: Zsh Users

On Fri, 04 Mar 2016 14:20:46 +0000
Peter Stephenson <p.stephenson@samsung.com> wrote:
> Another possible clue here: the exclusion modifiers require additional
> allocations when running the platform --- see line 3037 of pattern.c
> (there's another one later).  This is currently permanently rather than
> heap allocated.  It's easy to change to heap, but I wouldn't suggest
> that --- with any kind of complicated backtracking this would eat memory
> like there's no tomorrow.
> 
> It would be possible to use local pools of memory for this, however, so
> it didn't get allocated and freed every single time it was needed.  I
> think we only need one allocation for each occurrence of an exclusion,
> and usually there are very few in a pattern, so this isn't hard to
> manage. This might be an optimisation worth having anyway.

This is what I meant.  It may not be related to the slow down in
question, but it should speed up repeated use of exclusion patterns in
any case.

If you have DEBUG turned on you get an output for the maximum number of
pool entries that were in use at one time in the globbing tests, which
is currently 4.  That's a measure of how complcated repeated patterns
with exclusion that required backtracking became.  4 suggests the pool
method is eminently practical.

I ran the globtests under valgrind and it reported no errors.

Follow-ups on the details of this patch should go to zsh-workers.

diff --git a/Src/glob.c b/Src/glob.c
index 2051016..3aa5a3c 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -1827,11 +1827,14 @@ zglob(LinkList list, LinkNode np, int nountok)
 					 sizeof(struct gmatch));
     matchct = 0;
     pattrystart();
+    patopt_glob_use();
 
     /* The actual processing takes place here: matches go into  *
      * matchbuf.  This is the only top-level call to scanner(). */
     scanner(q, shortcircuit);
 
+    patopt_glob_release();
+
     /* Deal with failures to match depending on options */
     if (matchct)
 	badcshglob |= 2;	/* at least one cmd. line expansion O.K. */
diff --git a/Src/pattern.c b/Src/pattern.c
index 72c7d97..348779c 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -73,7 +73,7 @@
 
 /*
  * The following union is used mostly for alignment purposes.
- * Normal nodes are longs, while certain nodes take a char * as an argument;
+ * Normal nodes are longs, while certain nodes take a pointer as an argument;
  * here we make sure that they both work out to the same length.
  * The compiled regexp we construct consists of upats stuck together;
  * anything else to be added (strings, numbers) is stuck after and
@@ -84,6 +84,7 @@
 union upat {
     long l;
     unsigned char *p;
+    Patopt_ref ref;
 };
 
 typedef union upat *Upat;
@@ -1852,10 +1853,129 @@ static void patoptail(long p, long val)
 
 
 /*
+ * Optimisation of memory allocation for pattern exclusion.
+ * This is to reduce the amount of allocation we do when
+ * running a platform.  We'll keep a pool available for
+ * this purpose.  We also optimise so that we never free
+ * or reduce string sizes across globbing for a particular
+ * pattern as this will have similar uses multiple times.
+ */
+
+/* The size of the pool */
+#define PATOPT_MEM_POOL_SIZE	(16)
+/* The maximum length of entries in the pool we'll retain normally */
+#define PATOPT_MEM_MAX_LEN      (128)
+
+/* Flag that we are globbing and should keep memory */
+static int patopt_glob;
+/* The pool */
+struct patopt_mem patopt_pool[PATOPT_MEM_POOL_SIZE];
+#ifdef DEBUG
+/* The most pool entries we've usd at once */
+static int patopt_pool_max_used;
+#endif
+
+static void
+patopt_free_excess(void)
+{
+    int iref;
+    Patopt_mem entry;
+    for (iref = 0, entry = patopt_pool;
+	 iref < PATOPT_MEM_POOL_SIZE;
+	 iref++, entry++) {
+	if (!entry->in_use 	/* wow, that's paranoid */
+	    && entry->len > PATOPT_MEM_MAX_LEN)
+	{
+	    free(entry->mem.ptr);
+	    entry->mem.ptr = NULL;
+	    entry->len = 0;
+	}
+    }
+}
+
+/**/
+mod_export void
+patopt_glob_use(void)
+{
+    patopt_glob++;
+}
+
+/**/
+mod_export void
+patopt_glob_release(void)
+{
+    if (!--patopt_glob)
+	patopt_free_excess();
+    DPUTS(patopt_glob < 0, "unmatched patopt_glob_release call");
+}
+
+/**/
+static Patopt_ref
+patopt_mem_alloc(size_t allocsize)
+{
+    /*
+     * The strategy here is we find the first unused entry and if
+     * it's too small we resize it immediately, on the basis that
+     * there will be many calls requiring the same (maximum) allocation
+     * size.
+     */
+    int iref;
+    Patopt_mem entry;
+    Patopt_ref ref;
+    for (iref = 0, entry = patopt_pool;
+	 iref < PATOPT_MEM_POOL_SIZE;
+	 iref++, entry++) {
+	if (!entry->in_use) {
+	    /* Grab  this one */
+	    entry->in_use = 1;
+	    entry->mem.ref = iref;
+	    if (entry->len < allocsize) {
+		/* Handles null */
+		entry->mem.ptr = (unsigned char *)
+		    zrealloc(entry->mem.ptr, allocsize);
+		entry->len = allocsize;
+	    }
+#ifdef DEBUG
+	    if (iref+1 > patopt_pool_max_used) {
+		patopt_pool_max_used = iref + 1;
+		setiparam("ZSH_PATOPT_POOL_MAX_USED", patopt_pool_max_used);
+	    }
+#endif
+	    return &entry->mem;
+	}
+    }
+    /*
+     * Overflowed pool, just allocate normally, except we need
+     * the reference structure as well which we'll tack on the
+     * front.
+     */
+    ref = (Patopt_ref)zalloc(sizeof(struct patopt_ref) + allocsize);
+    ref->ptr = (unsigned char *)ref + sizeof(struct patopt_ref);
+    ref->ref = -1;
+    return ref;
+}
+
+
+static void
+patopt_mem_free(Patopt_ref ref)
+{
+    Patopt_mem entry;
+    if (ref->ref < 0) {
+	free((void *)ref);
+	return;
+    }
+    entry = patopt_pool + ref->ref;
+    entry->in_use = 0;
+}
+
+/*
  * Run a pattern.
+ *
+ * P.S. this code is not re-entrant...
  */
 static char *patinstart;	/* Start of input string */
 static char *patinend;		/* End of input string */
+static int   patintotlen;       /* Full length of input string */
 static char *patinput;		/* String input pointer */
 static char *patinpath;		/* Full path for use with ~ exclusions */
 static int   patinlen;		/* Length of last successful match.
@@ -2291,6 +2411,7 @@ pattryrefs(Patprog prog, char *string, int stringlen, int unmetalenin,
 
     patflags = prog->flags;
     patinend = patinstart + stringlen;
+    patintotlen = stringlen;
     /*
      * From now on we do not require NULL termination of
      * the test string.  There should also be no more references
@@ -2302,7 +2423,7 @@ pattryrefs(Patprog prog, char *string, int stringlen, int unmetalenin,
 	 * Either we are testing against a pure string,
 	 * or we can match anything at all.
 	 */
-	int ret, pstrlen;
+	int pstrlen;
 	char *pstr;
 	if (patstralloc->alloced)
 	{
@@ -2404,8 +2525,6 @@ pattryrefs(Patprog prog, char *string, int stringlen, int unmetalenin,
 		}
 	    }
 	}
-
-	return ret;
     } else {
 	int q = queue_signal_level();
 
@@ -2456,6 +2575,7 @@ pattryrefs(Patprog prog, char *string, int stringlen, int unmetalenin,
 
 	dont_queue_signals();
 
+	patopt_glob_use();
 	if (patmatch((Upat)progstr)) {
 	    /*
 	     * we were lazy and didn't save the globflags if an exclusion
@@ -2594,11 +2714,11 @@ pattryrefs(Patprog prog, char *string, int stringlen, int unmetalenin,
 	    ret = 1;
 	} else
 	    ret = 0;
+	patopt_glob_release();
 
 	restore_queue_signals(q);
-
-	return ret;
     }
+    return ret;
 }
 
 /*
@@ -2947,9 +3067,9 @@ patmatch(Upat prog)
 		after = P_OPERAND(scan);
 		DPUTS(!P_ISEXCLUDE(after),
 		      "BUG: EXCSYNC not followed by EXCLUDE.");
-		DPUTS(!P_OPERAND(after)->p,
+		DPUTS(!P_OPERAND(after)->ref,
 		      "BUG: EXCSYNC not handled by EXCLUDE");
-		syncptr = P_OPERAND(after)->p + (patinput - patinstart);
+		syncptr = P_OPERAND(after)->ref->ptr + (patinput - patinstart);
 		/*
 		 * If we already matched from here, this time we fail.
 		 * See WBRANCH code for story about error count.
@@ -3018,7 +3138,7 @@ patmatch(Upat prog)
 			 */
 			Upat syncstrp;
 			char *origpatinend;
-			unsigned char *oldsyncstr;
+			Patopt_ref oldsyncstr;
 			char *matchpt = NULL;
 			int ret, savglobdots, matchederrs = 0;
 			int savparsfound = parsfound;
@@ -3032,9 +3152,15 @@ patmatch(Upat prog)
 			 * (foo~bar)(foo~bar)... from the exclusion point
 			 * of view, so we use a different sync string.
 			 */
-			oldsyncstr = syncstrp->p;
-			syncstrp->p = (unsigned char *)
-			    zshcalloc((patinend - patinstart) + 1);
+			oldsyncstr = syncstrp->ref;
+			/*
+			 * Allocate maximum length we could need, in
+			 * case we do this repeatedly, so we don't
+			 * to reallocate.
+			 */
+			syncstrp->ref = patopt_mem_alloc(patintotlen+1);
+			memset(syncstrp->ref->ptr, 0,
+			       (patinend - patinstart) + 1);
 			origpatinend = patinend;
 			while ((ret = patmatch(P_OPERAND(scan)))) {
 			    unsigned char *syncpt;
@@ -3054,9 +3180,11 @@ patmatch(Upat prog)
 			     * possibilities for approximation have been
 			     * checked.)
 			     */
-			    for (syncpt = syncstrp->p; !*syncpt; syncpt++)
+			    for (syncpt = syncstrp->ref->ptr;
+				 !*syncpt;
+				 syncpt++)
 				;
-			    synclen = syncpt - syncstrp->p;
+			    synclen = syncpt - syncstrp->ref->ptr;
 			    if (patinstart + synclen != patinend) {
 				/*
 				 * Temporarily mark the string as
@@ -3129,9 +3257,8 @@ patmatch(Upat prog)
 			    patglobflags = savglobflags;
 			    errsfound = saverrsfound;
 			}
-			zfree((char *)syncstrp->p,
-			      (patinend - patinstart) + 1);
-			syncstrp->p = oldsyncstr;
+			patopt_mem_free(syncstrp->ref);
+			syncstrp->ref = oldsyncstr;
 			if (ret) {
 			    patinput = matchpt;
 			    errsfound = matchederrs;
@@ -3141,7 +3268,8 @@ patmatch(Upat prog)
 			       P_ISEXCLUDE(scan))
 			    ;
 		    } else {
-			int ret = 1, pfree = 0;
+			int ret = 1;
+			Patopt_ref pfree = NULL;
 			Upat ptrp = NULL;
 			unsigned char *ptr;
 			if (P_OP(scan) == P_WBRANCH) {
@@ -3162,12 +3290,13 @@ patmatch(Upat prog)
 			     */
 			    opnd = P_OPERAND(scan);
 			    ptrp = opnd++;
-			    if (!ptrp->p) {
-				ptrp->p = (unsigned char *)
-				    zshcalloc((patinend - patinstart) + 1);
-				pfree = 1;
+			    if (!ptrp->ref) {
+				pfree = ptrp->ref =
+				    patopt_mem_alloc(patintotlen+1);
+				memset(pfree->ptr, 0,
+				       (patinend - patinstart) + 1);
 			    }
-			    ptr = ptrp->p + (patinput - patinstart);
+			    ptr = ptrp->ref->ptr + (patinput - patinstart);
 
 			    /*
 			     * Without approximation, this is just a
@@ -3188,9 +3317,8 @@ patmatch(Upat prog)
 			if (ret)
 			    ret = patmatch(opnd);
 			if (pfree) {
-			    zfree((char *)ptrp->p,
-				  (patinend - patinstart) + 1);
-			    ptrp->p = NULL;
+			    patopt_mem_free(pfree);
+			    ptrp->ref = NULL;
 			}
 			if (ret)
 			    return 1;
diff --git a/Src/zsh.h b/Src/zsh.h
index eee31da..34bdc50 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -1613,6 +1613,38 @@ typedef struct zpc_disables_save *Zpc_disables_save;
 #define PP_UNKWN  20
 /* Range: token followed by the (possibly multibyte) start and end */
 #define PP_RANGE  21
+/*
+ * A reference to memory that may be from the pool or not.
+ * For convenience in use it contains the reference to say where it
+ * has come from as well as the memory.
+ *
+ * If this was from the pool, ptr is allocated from zalloc().
+ *
+ * If this was not from the pool, ref is -1, the structure is
+ * allocated form zalloc() with ptr pointing sizeof(the structure)
+ * bytes into that chunk.
+ */
+
+/**/
+struct patopt_ref
+{
+    /* Reference to a pool entry, or -1 if from zalloc() */
+    int ref;
+    /*
+     * Pointer to the memory.
+     */
+    unsigned char *ptr;
+};
+typedef struct patopt_ref *Patopt_ref;
+
+/* A chunk of emory from the pool */
+struct patopt_mem
+{
+    size_t len;
+    struct patopt_ref mem;
+    int in_use;
+};
+typedef struct patopt_mem *Patopt_mem;
 
 /*
  * Argument to get_match_ret() in glob.c
diff --git a/Test/D02glob.ztst b/Test/D02glob.ztst
index a6b704a..76209a0 100644
--- a/Test/D02glob.ztst
+++ b/Test/D02glob.ztst
@@ -7,7 +7,11 @@
   : >glob.tmp/{,{dir1,dir2}/}{a,b,c}
 
   globtest () {
-    $ZTST_testdir/../Src/zsh -f $ZTST_srcdir/../Misc/$1
+    $ZTST_testdir/../Src/zsh -fc "
+      . $ZTST_srcdir/../Misc/$1
+      if [[ -n \$ZSH_PATOPT_POOL_MAX_USED ]]; then
+        print -u $ZTST_fd \"Max patopt_pool use: \$ZSH_PATOPT_POOL_MAX_USED\"
+      fi"
   }
 
   regress_absolute_path_and_core_dump() {


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-04 21:49                       ` Peter Stephenson
@ 2016-03-05 17:47                         ` Bart Schaefer
  2016-03-06 18:10                           ` Peter Stephenson
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-03-05 17:47 UTC (permalink / raw)
  To: Zsh Users

On Mar 4,  9:49pm, Peter Stephenson wrote:
} Subject: Re: Extended globbing seems to have become much slower in recent 
}
} This is what I meant.  It may not be related to the slow down in
} question, but it should speed up repeated use of exclusion patterns in
} any case.

Sadly, on my desktop this actually SLOWS DOWN the process even further.
The below is for a glob of 13,000 files (largest/deepest tree I have
handy) with "repeat 3" and Jesper's pattern:

zsh-5.0.5: 1.35s user 0.38s system 99% cpu 1.738 total
zsh-5.0.5: 1.17s user 0.21s system 98% cpu 1.394 total
zsh-5.0.5: 1.29s user 0.19s system 99% cpu 1.481 total

Src/zsh: 1.44s user 0.21s system 99% cpu 1.657 total
Src/zsh: 1.58s user 0.25s system 99% cpu 1.847 total
Src/zsh: 1.45s user 0.23s system 99% cpu 1.689 total

Src/zsh+u21352: 1.63s user 0.26s system 99% cpu 1.896 total
Src/zsh+u21352: 1.43s user 0.25s system 99% cpu 1.692 total
Src/zsh+u21352: 1.68s user 0.31s system 99% cpu 1.990 total

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-05 17:47                         ` Bart Schaefer
@ 2016-03-06 18:10                           ` Peter Stephenson
  2016-03-07  9:59                             ` Jesper Nygårds
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Stephenson @ 2016-03-06 18:10 UTC (permalink / raw)
  To: Zsh Users

On Sat, 5 Mar 2016 09:47:23 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Mar 4,  9:49pm, Peter Stephenson wrote:
> } Subject: Re: Extended globbing seems to have become much slower in recent 
> }
> } This is what I meant.  It may not be related to the slow down in
> } question, but it should speed up repeated use of exclusion patterns in
> } any case.
> 
> Sadly, on my desktop this actually SLOWS DOWN the process even further.
> The below is for a glob of 13,000 files (largest/deepest tree I have
> handy) with "repeat 3" and Jesper's pattern:

That suggests either I've made a mistake in the optimisation portion
(which I didn't explicit test, only that it gave the correct behaviour),
or the repeated allocation ad nfreeing of similar length strings is
close to optimal in allocation already... or a combination.

pws


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-06 18:10                           ` Peter Stephenson
@ 2016-03-07  9:59                             ` Jesper Nygårds
  2016-03-07 10:15                               ` Peter Stephenson
  0 siblings, 1 reply; 22+ messages in thread
From: Jesper Nygårds @ 2016-03-07  9:59 UTC (permalink / raw)
  To: Zsh Users

[-- Attachment #1: Type: text/plain, Size: 2173 bytes --]

I see no real difference with or without Peter's patch, I'm afraid.

I will incorporate the suggestions from Bart in my solution. But I feel I
haven't explained very well (or at all) what I'm trying to achieve. I use
this kind of globbing in a sort of "recursive grep" which I use
interactively to grep in my project's codebase. The inputs to the script
include both patterns that I search for and patterns that I want to
exclude. Up until now, I have composed the list of exclusions from the
input without separating ordinary files from directories, but Bart's tip
made me aware that such a separation would be worth doing to make it faster.

As a side note, since I have implemented this in zsh, I followed what I
thought was "zsh best practice", and composed this rather complicated file
list with a zsh glob expression, instead of relying on an external command
such as gnu "find". But after I discovered this bottle neck in my scripts,
I have experimented with instead letting "find" find the files, and indeed
that seems a whole lot faster. I mention that just to be sure that I am not
causing you to waste your time if my use case is so extreme that it's not
worth bothering about its performance.

On Sun, Mar 6, 2016 at 7:10 PM, Peter Stephenson <
p.w.stephenson@ntlworld.com> wrote:

> On Sat, 5 Mar 2016 09:47:23 -0800
> Bart Schaefer <schaefer@brasslantern.com> wrote:
> > On Mar 4,  9:49pm, Peter Stephenson wrote:
> > } Subject: Re: Extended globbing seems to have become much slower in
> recent
> > }
> > } This is what I meant.  It may not be related to the slow down in
> > } question, but it should speed up repeated use of exclusion patterns in
> > } any case.
> >
> > Sadly, on my desktop this actually SLOWS DOWN the process even further.
> > The below is for a glob of 13,000 files (largest/deepest tree I have
> > handy) with "repeat 3" and Jesper's pattern:
>
> That suggests either I've made a mistake in the optimisation portion
> (which I didn't explicit test, only that it gave the correct behaviour),
> or the repeated allocation ad nfreeing of similar length strings is
> close to optimal in allocation already... or a combination.
>
> pws
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-07  9:59                             ` Jesper Nygårds
@ 2016-03-07 10:15                               ` Peter Stephenson
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Stephenson @ 2016-03-07 10:15 UTC (permalink / raw)
  To: Zsh Users

On Mon, 07 Mar 2016 10:59:31 +0100
Jesper Nygårds <jesper.nygards@gmail.com> wrote:
> As a side note, since I have implemented this in zsh, I followed what I
> thought was "zsh best practice", and composed this rather complicated file
> list with a zsh glob expression, instead of relying on an external command
> such as gnu "find". But after I discovered this bottle neck in my scripts,
> I have experimented with instead letting "find" find the files, and indeed
> that seems a whole lot faster. I mention that just to be sure that I am not
> causing you to waste your time if my use case is so extreme that it's not
> worth bothering about its performance.

"find" is much more optimised for, well, finding files, rather than
doing general globbing / pattern matching tasks, so I think that makes
perfect sense if that's what your basic job is.  I often do that myself
for a very deep tree.

However, that doesn't mean everything on the zsh side is as good as it
can be.

I won't commit my patch unless I can find some way it's clearly not
doing what I intended that I can fix --- but it's useful to know the way
it's done at the moment isn't obviously causing problems as I've always
wondered.

pws

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-04 13:22                 ` Jesper Nygårds
  2016-03-04 14:03                   ` Peter Stephenson
@ 2016-03-05 17:37                   ` Bart Schaefer
  2016-03-06 17:31                     ` Bart Schaefer
  1 sibling, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-03-05 17:37 UTC (permalink / raw)
  To: Zsh Users

On Mar 4,  2:22pm, Jesper Nygards wrote:
}
} I don't know if this is relevant, but I have some more findings. I wanted
} to know which sub directory was contributing the most to the amount of time
} taken to process the root directory. I then realized that the sum of the
} time it took to process each sub directory separately was much lower than
} processing the whole root directory at once.  [...]
} So obviously the processing time is not linear with the number
} of files.

Since exclusions are applied after globbing is completed, the processing
time for exclusions is proportional to the sum of the lengths of all the
file paths found by the recursive glob.  If you glob individual sub-
directories, you're not excluding based on the names of the container
directories.

E.g. you should find that

x=(**/*)
y=(${x:#(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*))

takes about the same amount of time as your **/*~pat formulation, and

x=(**/*(:t))
y=(${x:#(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*))

takes about the same amount of time as your loop over subdirs.

I still don't see a reason why pattern matching would have slowed down.

On the other hand, your pattern could be a lot more precise.  You have it
starting with (|*/) and ending with (|/*) which means that you do want to
exclude some directories, but obviously not all of the strings in the
middle are directory names.  If I were to guess that the directory names
are build, target, node, and node_modules, then things might go a lot
faster if you used

**/^(.idea|*.min.js|*.iml|TAGS)~(|*/)(build|target|node|node_modules)(|/*)

and even faster with

skipdirs() { reply=( ${REPLY%((build|target|node|node_modules)(|/*)} ) }

**/^(.idea|*.min.js|*.iml|TAGS)(+skipdirs)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-05 17:37                   ` Bart Schaefer
@ 2016-03-06 17:31                     ` Bart Schaefer
  2016-03-21 23:05                       ` Bart Schaefer
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-03-06 17:31 UTC (permalink / raw)
  To: Zsh Users

On Mar 5,  9:37am, Bart Schaefer wrote:
}
} E.g. you should find that
} 
} x=(**/*)
} y=(${x:#(|*/)(build|target|node|node_modules|.idea|*.min.js|*.iml|TAGS)(|/*))
} 
} takes about the same amount of time as your **/*~pat formulation

Curiously, the ${x:#...} construct is considerably faster than **/*~(...)
for me, even on 5.0.5.

I suspected briefly that signal handling changes (so that long-runing
recursive globs are interruptible) might be having an effect, but the
same code path through the signal queuing should be taken with both the
glob exclusion and the parameter substitution.

As I've mentioned, on my Linux desktop the majority of time is spent
in resolving the (-.D) qualifier, but that's partly because the (-)
is pulling in a new set of subdirectories so more files are found.

I'm beginning to think we need some kind of C-code-level profiling tool
like gprof to tell us where the time is being spent.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-06 17:31                     ` Bart Schaefer
@ 2016-03-21 23:05                       ` Bart Schaefer
  0 siblings, 0 replies; 22+ messages in thread
From: Bart Schaefer @ 2016-03-21 23:05 UTC (permalink / raw)
  To: Zsh Users

On Mar 6,  9:31am, Bart Schaefer wrote:
}
} I suspected briefly that signal handling changes (so that long-runing
} recursive globs are interruptible) might be having an effect, but the
} same code path through the signal queuing should be taken with both the
} glob exclusion and the parameter substitution.

Just in case ... if workers/38188 has an effect on this (in either
direction) it might be useful to know that.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-02  0:03         ` Bart Schaefer
  2016-03-02  8:39           ` Jesper Nygårds
@ 2016-03-02  9:32           ` Peter Stephenson
  2016-03-02 23:46             ` Bart Schaefer
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Stephenson @ 2016-03-02  9:32 UTC (permalink / raw)
  To: Zsh Users

On Tue, 01 Mar 2016 16:03:56 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Mar 1,  8:11pm, Jesper Nygards wrote:
> }
> } New runs with your suggestion:
> 
> Well, that's interesting.  Seems to indicate that building argv to pass
> to builtins is even more expensive than an array assignment.
> 
> Still, I can't find anything on my system that causes as dramatic a
> difference as yours from 5.0.5 -> 5.2+.  The (-.D) qualifiers seem to
> have the largest effect for me, for you its the exclusion pattern.
>...
> I did find a small but observable speedup when compiling without
> MULTIBYTE_SUPPORT, too; there were a lot of multibyte repairs to
> pattern.c after 5.0.5.

There was a fairly recent optimsiation to handle patterns by metafying
earlier in the case of parameter substitutions.  Globbing shouldn't have
been substantially affected although the order of operations is a bit
different.  I don't see how this could be relevant, though.

pws


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Extended globbing seems to have become much slower in recent versions of Zsh
  2016-03-02  9:32           ` Peter Stephenson
@ 2016-03-02 23:46             ` Bart Schaefer
  0 siblings, 0 replies; 22+ messages in thread
From: Bart Schaefer @ 2016-03-02 23:46 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh Users

On Wed, Mar 2, 2016 at 1:32 AM, Peter Stephenson
<p.stephenson@samsung.com> wrote:
>
> There was a fairly recent optimsiation to handle patterns by metafying
> earlier in the case of parameter substitutions.  Globbing shouldn't have
> been substantially affected although the order of operations is a bit
> different.  I don't see how this could be relevant, though.

That change also pre-dates the 5.2 release, so may explain the
improvement from 5.1 to 5.2 but not the slowdown from 5.0.5 to 5.0.6.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-03-21 23:05 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-28 20:18 Extended globbing seems to have become much slower in recent versions of Zsh Jesper Nygårds
2016-02-29 19:12 ` Bart Schaefer
2016-03-01 11:39   ` Jesper Nygårds
2016-03-01 18:28     ` Bart Schaefer
2016-03-01 19:11       ` Jesper Nygårds
2016-03-02  0:03         ` Bart Schaefer
2016-03-02  8:39           ` Jesper Nygårds
2016-03-03  0:06             ` Bart Schaefer
2016-03-04  8:17               ` Jesper Nygårds
2016-03-04 13:22                 ` Jesper Nygårds
2016-03-04 14:03                   ` Peter Stephenson
2016-03-04 14:20                     ` Peter Stephenson
2016-03-04 21:49                       ` Peter Stephenson
2016-03-05 17:47                         ` Bart Schaefer
2016-03-06 18:10                           ` Peter Stephenson
2016-03-07  9:59                             ` Jesper Nygårds
2016-03-07 10:15                               ` Peter Stephenson
2016-03-05 17:37                   ` Bart Schaefer
2016-03-06 17:31                     ` Bart Schaefer
2016-03-21 23:05                       ` Bart Schaefer
2016-03-02  9:32           ` Peter Stephenson
2016-03-02 23:46             ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).