* Slowdown around 5.0.5-dev-0
@ 2015-10-10 10:54 Sebastian Gniazdowski
2015-10-10 17:58 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-10 10:54 UTC (permalink / raw)
To: zsh-workers
Hello,
I was testing my scripts and saw that on 5.0.2 having 89k-elements
long is much faster than on 5.1.1-dev-0. I ran git bisect and found
the commit:
2014-01-18 23f98c3 32285: restart the fheap search in freeheap if the
current fheap arena is about to be discarded; fixes crash
I documented the behavior on video:
https://youtu.be/Crfoh05eWKQ
On good commit, the list works fast, with full possible speed, while
on bad commit moving one row up or down takes 1 second or more.
--- more details ---
The good commit can also slow down but always digs itself out of the
"crisis". The bad commit is always slow, and never digs out.
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-10 10:54 Slowdown around 5.0.5-dev-0 Sebastian Gniazdowski
@ 2015-10-10 17:58 ` Bart Schaefer
2015-10-10 18:11 ` Sebastian Gniazdowski
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-10 17:58 UTC (permalink / raw)
To: zsh-workers
On Oct 10, 12:54pm, Sebastian Gniazdowski wrote:
} Subject: Slowdown around 5.0.5-dev-0
}
} I was testing my scripts and saw that on 5.0.2 having 89k-elements
} long is much faster than on 5.1.1-dev-0. I ran git bisect and found
} the commit:
}
} 2014-01-18 23f98c3 32285: restart the fheap search in freeheap if the
} current fheap arena is about to be discarded; fixes crash
If you look back through the zsh-workers articles, you'll find that
the performance was even worse prior to workers/29175; there was a
huge gain in performance from that patch, but it introduced a crash
bug, so we had to give back some of the performance in workers/32285
to prevent the crash from happening.
Read the large comment in freeheap() for a description of what is
going on there, note particularly the paragraph ending "performance
goes to hell."
I fear a signficant rewrite of the heap-allocation code would be needed
to both avoid the crash and improve the performance. The current code
was designed with the expectation that zhalloc() would be a much more
common operation than freeheap(), but there are circumstances in which
the latter happens a lot.
It's possible that we could figure out a way to skip the scan that
updates all the h->used values in freeheap(), and pay the price on the
next zhalloc() instead.
I think (but am not 100% sure) that the pathological case happens when
the heap is already maximally filled, so a new arena is always added
to the end of the chain and then dropped again on each alloc/free.
On the other hand the shell really isn't intended to be manipulating
arrays of containing tens of thousands of elements.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-10 17:58 ` Bart Schaefer
@ 2015-10-10 18:11 ` Sebastian Gniazdowski
2015-10-10 18:32 ` Sebastian Gniazdowski
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-10 18:11 UTC (permalink / raw)
To: Bart Schaefer; +Cc: zsh-workers
On 10 October 2015 at 19:58, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On the other hand the shell really isn't intended to be manipulating
> arrays of containing tens of thousands of elements.
I was afraid that this will be stated, by you as the author of the
patch. It's just such a great thing that patterns are so fast, faster
than regular expressions (from zsh/regex). And that they are so
advanced. I can do fancy things like filtering for multiple words with
*foo*~^*bar*, colorizing output by using (#m) which is faster than
(#b) and provided for someone aware to utilize that gain and
functionality. What I mean is that we shouldn't now slip into "shell
isn't for that, anyway" after providing so much.
PS. Not everything is so great, I was successfully optimizing my code
by changing if[] into [] && || in crucial places. Removing variables
as even a single assignment seems to cost much. But overall there's
plenty in speed and functionality.
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-10 18:11 ` Sebastian Gniazdowski
@ 2015-10-10 18:32 ` Sebastian Gniazdowski
2015-10-11 0:06 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-10 18:32 UTC (permalink / raw)
To: Bart Schaefer; +Cc: zsh-workers
As for the code, it's interesting that it is just existence of large
array that slows down things. Just before spotting the >=5.0.5
slowdown I wrote an optimization:
if [ "$prev_start_idx" -ne "$NLIST_FROM_WHAT_IDX_LIST_IS_SHOWN" ]; then
prev_start_idx="$NLIST_FROM_WHAT_IDX_LIST_IS_SHOWN"
disp_list=(
"${(@)col_list[NLIST_FROM_WHAT_IDX_LIST_IS_SHOWN, end_idx]}" )
fi
It creates a small array disp_list that is used for display. 23
elements in my case. The large 89k array "sleeps" waiting for change
of page or some other trigger (details not important). The patch
helped for 5.0.2 (except the possible temporal "dig" into slowness)
but for >= 5.0.5 it helps just a little, while it should help
completely, as everything is around 23 elements long array.
Best regards,
Sebastian Gniazdowski
On 10 October 2015 at 20:11, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> On 10 October 2015 at 19:58, Bart Schaefer <schaefer@brasslantern.com> wrote:
>> On the other hand the shell really isn't intended to be manipulating
>> arrays of containing tens of thousands of elements.
>
> I was afraid that this will be stated, by you as the author of the
> patch. It's just such a great thing that patterns are so fast, faster
> than regular expressions (from zsh/regex). And that they are so
> advanced. I can do fancy things like filtering for multiple words with
> *foo*~^*bar*, colorizing output by using (#m) which is faster than
> (#b) and provided for someone aware to utilize that gain and
> functionality. What I mean is that we shouldn't now slip into "shell
> isn't for that, anyway" after providing so much.
>
> PS. Not everything is so great, I was successfully optimizing my code
> by changing if[] into [] && || in crucial places. Removing variables
> as even a single assignment seems to cost much. But overall there's
> plenty in speed and functionality.
>
> Best regards,
> Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-10 18:32 ` Sebastian Gniazdowski
@ 2015-10-11 0:06 ` Bart Schaefer
2015-10-11 6:20 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-11 0:06 UTC (permalink / raw)
To: zsh-workers
On Oct 10, 8:32pm, Sebastian Gniazdowski wrote:
}
} As for the code, it's interesting that it is just existence of large
} array that slows down things.
To the best of my understanding (not having written the original code),
the zsh heap works like this:
The "heaps" and "fheap" pointers start out NULL.
The first zhalloc() creates a block of memory either of the "arena size"
or the size requested, whichever is larger. This block has a header
that contains a pointer to another block (initially NULL), and a counter
of the amount of allocated space in the current block.
On subsequent zhalloc(), either a pointer to the end of allocated
space in the existing block is returned (if sufficient space), or a
new block is allocated and linked through the pointer in in the header
of the existing block. Thus a linked list of partially-filled blocks
may be built up.
Every zhalloc() therefore looks for space in the first block it can
find that has enough. As an optimization, the "fheap" pointer is left
pointed at the earliest block that has any free space, so full blocks
can be skipped by the search at allocation time. However, if the only
block with available space is the last (most recently allocated) one,
fheap becomes invalid when that block is freed.
On pushheap(), a linked list of free space positions (struct heapstack)
is created for each existing block in the heap. The effect is that
every block is its own stack of heaps; there isn't a single global
stack of heaps. On popheap() or freeheap(), the list of free space
positions is traversed and the free pointer in each heap block is
reset to its position at the time it was pushed onto the stack.
It is this reset traversal that causes the slowdown. If all the existing
heap blocks are completely filled before the pushheap() occurs, fheap is
invalidated on every free, and then we spend a lot of time uselessly
resetting the free space pointers in those existing blocks.
(We also invalidate fheap any time we temporarily switch to a new heap,
which currently happens in completion, so completion operations will
invoke additional scans for arenas with free space.)
} The large 89k array "sleeps" waiting for change
Yes, so it fills the top of the heap and occupies it forever, thereby
causing the heap blocks it occupies to be uselessly traversed seeking
available space every time the smaller array is manipulated.
This suggests a couple of possible fixes, but I've run out of time to
experiment right now.
--
Barton E. Schaefer
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-11 0:06 ` Bart Schaefer
@ 2015-10-11 6:20 ` Bart Schaefer
2015-10-11 8:39 ` Sebastian Gniazdowski
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-11 6:20 UTC (permalink / raw)
To: zsh-workers
On Oct 10, 5:06pm, Bart Schaefer wrote:
}
} This suggests a couple of possible fixes, but I've run out of time to
} experiment right now.
OK, this is the least intrusive change I can think of that might make
a difference. What this attempts to do is maintain a second pointer
into the list of heaps that tells freeheap() where to start working.
It's set on pushheap() and reset on popheap() so it freeheap() should
avoid searching for free space in some arenas that were completely
filled before pushheap() was called. However, it won't improve the
performance if there are a whole lot of heap arenas each with a small
amount of space available at the time of pushheap().
To improve THAT situation, I think we'd have to be willing to "leak"
those small pieces of heap until a popheap() happens, and instead
use at least one new arena following most calls to pushheap(). This
might mean that zsh's memory footprint grows, but on modern hardware
that may not be an issue.
All tests pass with this change in place, let's see what it does with
Sebastian's 89k array elements.
diff --git a/Src/mem.c b/Src/mem.c
index b9569ea..01072a3 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -127,6 +127,12 @@ static Heap heaps;
static Heap fheap;
+/* same as fheap, except in the preceding stack context (it will be the
+ * first heap from which space in the current context may be recovered
+ * when heap space is freed) */
+
+static Heap fpop;
+
/**/
#ifdef ZSH_HEAP_DEBUG
/*
@@ -293,8 +299,11 @@ pushheap(void)
h_push++;
#endif
+ fpop = NULL;
for (h = heaps; h; h = h->next) {
DPUTS(!h->used, "BUG: empty heap");
+ if (!fpop && h->used < ARENA_SIZEOF(h))
+ fpop = h;
hs = (Heapstack) zalloc(sizeof(*hs));
hs->next = h->sp;
h->sp = hs;
@@ -341,9 +350,10 @@ freeheap(void)
*
* However, if the arena to which fheap points is unused, we want to
* free it, so we have no choice but to do the sweep for a new fheap.
+ * fpop is the first heap with free space following pushheap().
*/
if (fheap && !fheap->sp)
- fheap = NULL; /* We used to do this unconditionally */
+ fheap = fpop;
/*
* In other cases, either fheap is already correct, or it has never
* been set and this loop will do it, or it'll be reset from scratch
@@ -417,7 +427,7 @@ popheap(void)
h_pop++;
#endif
- fheap = NULL;
+ fheap = fpop = NULL;
for (h = heaps; h; h = hn) {
hn = h->next;
if ((hs = h->sp)) {
@@ -443,6 +453,8 @@ popheap(void)
#endif
if (!fheap && h->used < ARENA_SIZEOF(h))
fheap = h;
+ if (!fpop && h->sp && h->sp->used < ARENA_SIZEOF(h))
+ fpop = h;
zfree(hs, sizeof(*hs));
hl = h;
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-11 6:20 ` Bart Schaefer
@ 2015-10-11 8:39 ` Sebastian Gniazdowski
2015-10-11 16:17 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-11 8:39 UTC (permalink / raw)
To: Bart Schaefer; +Cc: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 3248 bytes --]
Zprof says there is no change:
https://youtu.be/6A-AKesvZKM
I also attach zprof result in file. BTW, the patch didn't apply to
head or to 23f98c3, I had to make changes manually.
Best regards,
Sebastian Gniazdowski
On 11 October 2015 at 08:20, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 10, 5:06pm, Bart Schaefer wrote:
> }
> } This suggests a couple of possible fixes, but I've run out of time to
> } experiment right now.
>
> OK, this is the least intrusive change I can think of that might make
> a difference. What this attempts to do is maintain a second pointer
> into the list of heaps that tells freeheap() where to start working.
> It's set on pushheap() and reset on popheap() so it freeheap() should
> avoid searching for free space in some arenas that were completely
> filled before pushheap() was called. However, it won't improve the
> performance if there are a whole lot of heap arenas each with a small
> amount of space available at the time of pushheap().
>
> To improve THAT situation, I think we'd have to be willing to "leak"
> those small pieces of heap until a popheap() happens, and instead
> use at least one new arena following most calls to pushheap(). This
> might mean that zsh's memory footprint grows, but on modern hardware
> that may not be an issue.
>
> All tests pass with this change in place, let's see what it does with
> Sebastian's 89k array elements.
>
>
> diff --git a/Src/mem.c b/Src/mem.c
> index b9569ea..01072a3 100644
> --- a/Src/mem.c
> +++ b/Src/mem.c
> @@ -127,6 +127,12 @@ static Heap heaps;
>
> static Heap fheap;
>
> +/* same as fheap, except in the preceding stack context (it will be the
> + * first heap from which space in the current context may be recovered
> + * when heap space is freed) */
> +
> +static Heap fpop;
> +
> /**/
> #ifdef ZSH_HEAP_DEBUG
> /*
> @@ -293,8 +299,11 @@ pushheap(void)
> h_push++;
> #endif
>
> + fpop = NULL;
> for (h = heaps; h; h = h->next) {
> DPUTS(!h->used, "BUG: empty heap");
> + if (!fpop && h->used < ARENA_SIZEOF(h))
> + fpop = h;
> hs = (Heapstack) zalloc(sizeof(*hs));
> hs->next = h->sp;
> h->sp = hs;
> @@ -341,9 +350,10 @@ freeheap(void)
> *
> * However, if the arena to which fheap points is unused, we want to
> * free it, so we have no choice but to do the sweep for a new fheap.
> + * fpop is the first heap with free space following pushheap().
> */
> if (fheap && !fheap->sp)
> - fheap = NULL; /* We used to do this unconditionally */
> + fheap = fpop;
> /*
> * In other cases, either fheap is already correct, or it has never
> * been set and this loop will do it, or it'll be reset from scratch
> @@ -417,7 +427,7 @@ popheap(void)
> h_pop++;
> #endif
>
> - fheap = NULL;
> + fheap = fpop = NULL;
> for (h = heaps; h; h = hn) {
> hn = h->next;
> if ((hs = h->sp)) {
> @@ -443,6 +453,8 @@ popheap(void)
> #endif
> if (!fheap && h->used < ARENA_SIZEOF(h))
> fheap = h;
> + if (!fpop && h->sp && h->sp->used < ARENA_SIZEOF(h))
> + fpop = h;
> zfree(hs, sizeof(*hs));
>
> hl = h;
[-- Attachment #2: result.txt --]
[-- Type: text/plain, Size: 1956 bytes --]
patched:
# time ( npanelize < ~/lsoflsof )
num calls time self name
-----------------------------------------------------------------------------------
1) 161 5453,04 33,87 43,53% 5453,04 33,87 43,53% _nlist_print_with_ansi
2) 1 12500,28 12500,28 99,79% 4052,65 4052,65 32,35% n-list
3) 7 8041,91 1148,84 64,20% 2588,87 369,84 20,67% n-list-draw
4) 7 327,91 46,84 2,62% 322,30 46,04 2,57% n-list-input
5) 7 40,77 5,82 0,33% 31,97 4,57 0,26% _nlist_setup_user_vars
6) 7 28,91 4,13 0,23% 28,91 4,13 0,23% _nlist_status_msg
7) 2 24,94 12,47 0,20% 24,94 12,47 0,20% _nlist_cursor_visibility
8) 1 26,79 26,79 0,21% 9,97 9,97 0,08% _nlist_exit
( n-panelize < ~/lsoflsof; ) 14,33s user 0,27s system 94% cpu 15,488 total
clean:
# time ( npanelize < ~/lsoflsof )
num calls time self name
-----------------------------------------------------------------------------------
1) 161 5410,49 33,61 43,57% 5410,49 33,61 43,57% _nlist_print_with_ansi
2) 1 12391,49 12391,49 99,78% 4030,62 4030,62 32,46% n-list
3) 7 7961,76 1137,39 64,11% 2551,27 364,47 20,54% n-list-draw
4) 7 324,36 46,34 2,61% 318,36 45,48 2,56% n-list-input
5) 7 39,22 5,60 0,32% 31,44 4,49 0,25% _nlist_setup_user_vars
6) 7 27,86 3,98 0,22% 27,86 3,98 0,22% _nlist_status_msg
7) 2 24,35 12,18 0,20% 24,35 12,18 0,20% _nlist_cursor_visibility
8) 1 27,05 27,05 0,22% 10,37 10,37 0,08% _nlist_exit
( n-panelize < ~/lsoflsof; ) 14,19s user 0,29s system 94% cpu 15,319 total
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-11 8:39 ` Sebastian Gniazdowski
@ 2015-10-11 16:17 ` Bart Schaefer
2015-10-11 16:48 ` Sebastian Gniazdowski
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-11 16:17 UTC (permalink / raw)
To: zsh-workers
On Oct 11, 10:39am, Sebastian Gniazdowski wrote:
}
} Zprof says there is no change:
Well, that's not entirely surprising. It probably just means that there
is some space in a lot of arenas, so fpop doesn't move far from the head
of the list. Or it could mean that the real problem is popheap() rather
than freeheap().
} BTW, the patch didn't apply to head or to 23f98c3,
Curious. "git diff master origin/master" shows no diffs for my local
repository, and the patch was just a straight "git diff".
There were additional changes committed after 23f98c3 so there's no
reason to have expected it would apply to that.
In any case here's another stab at it. Should apply to 83a1757. This
should help if there's a lot of freeheap() happening, but not so much
if there's a lot of push/pop (in fact it may be worse in that case).
If it's push/pop that's the issue, the next step would be to try to
identify the crucial pushheap() and replace it with NEWHEAPS() if the
scope permits.
diff --git a/Src/mem.c b/Src/mem.c
index b9569ea..d49f685 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -294,7 +294,7 @@ pushheap(void)
#endif
for (h = heaps; h; h = h->next) {
- DPUTS(!h->used, "BUG: empty heap");
+ DPUTS(!h->used && h->next, "BUG: empty heap");
hs = (Heapstack) zalloc(sizeof(*hs));
hs->next = h->sp;
h->sp = hs;
@@ -334,17 +334,15 @@ freeheap(void)
*
* Whenever fheap is NULL here, the loop below sweeps back over the
* entire heap list again, resetting the free space in every arena to
- * the amount stashed by pushheap() and finding the first arena with
+ * the amount stashed by pushheap() and finding the arena with the most
* free space to optimize zhalloc()'s next search. When there's a lot
* of stuff already on the heap, this is an enormous amount of work,
* and performance goes to hell.
*
- * However, if the arena to which fheap points is unused, we want to
- * free it, so we have no choice but to do the sweep for a new fheap.
- */
- if (fheap && !fheap->sp)
- fheap = NULL; /* We used to do this unconditionally */
- /*
+ * Therefore, we defer freeing the most recently allocated arena until
+ * we reach popheap(). This may fail to reclaim some space in earlier
+ * arenas.
+ *
* In other cases, either fheap is already correct, or it has never
* been set and this loop will do it, or it'll be reset from scratch
* on the next popheap(). So all that's needed here is to pick up
@@ -361,7 +359,11 @@ freeheap(void)
memset(arena(h) + h->sp->used, 0xff, h->used - h->sp->used);
#endif
h->used = h->sp->used;
- if (!fheap && h->used < ARENA_SIZEOF(h))
+ if (!fheap) {
+ if (h->used < ARENA_SIZEOF(h))
+ fheap = h;
+ } else if (ARENA_SIZEOF(h) - h->used >
+ ARENA_SIZEOF(fheap) - fheap->used)
fheap = h;
hl = h;
#ifdef ZSH_HEAP_DEBUG
@@ -384,6 +386,26 @@ freeheap(void)
VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
#endif
} else {
+ if (h->next) {
+ /* We want to cut this out of the arena list if we can */
+ if (h == heaps)
+ hl = heaps = h->next;
+ else if (hl && hl->next == h)
+ hl->next = h->next;
+ else {
+ DPUTS(hl, "hl->next != h when freeing");
+ hl = h;
+ continue;
+ }
+ h->next = NULL;
+ } else {
+ /* Leave an empty arena at the end until popped */
+ h->used = 0;
+ fheap = hl = h;
+ break;
+ }
+ if (fheap == h)
+ fheap = NULL;
#ifdef USE_MMAP
munmap((void *) h, h->size);
#else
@@ -441,12 +463,29 @@ popheap(void)
#ifdef ZSH_VALGRIND
VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
#endif
- if (!fheap && h->used < ARENA_SIZEOF(h))
+ if (!fheap) {
+ if (h->used < ARENA_SIZEOF(h))
+ fheap = h;
+ } else if (ARENA_SIZEOF(h) - h->used >
+ ARENA_SIZEOF(fheap) - fheap->used)
fheap = h;
zfree(hs, sizeof(*hs));
hl = h;
} else {
+ if (h->next) {
+ /* We want to cut this out of the arena list if we can */
+ if (h == heaps)
+ hl = heaps = h->next;
+ else if (hl && hl->next == h)
+ hl->next = h->next;
+ else {
+ DPUTS(hl, "hl->next != h when popping");
+ hl = h;
+ continue;
+ }
+ h->next = NULL;
+ }
#ifdef USE_MMAP
munmap((void *) h, h->size);
#else
@@ -524,7 +563,7 @@ zheapptr(void *p)
mod_export void *
zhalloc(size_t size)
{
- Heap h;
+ Heap h, hp = NULL;
size_t n;
#ifdef ZSH_VALGRIND
size_t req_size = size;
@@ -546,6 +585,7 @@ zhalloc(size_t size)
for (h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
? fheap : heaps);
h; h = h->next) {
+ hp = h;
if (ARENA_SIZEOF(h) >= (n = size + h->used)) {
void *ret;
@@ -566,7 +606,6 @@ zhalloc(size_t size)
}
}
{
- Heap hp;
/* not found, allocate new heap */
#if defined(ZSH_MEM) && !defined(USE_MMAP)
static int called = 0;
@@ -575,7 +614,6 @@ zhalloc(size_t size)
#endif
n = HEAP_ARENA_SIZE > size ? HEAPSIZE : size + sizeof(*h);
- for (hp = NULL, h = heaps; h; hp = h, h = h->next);
#ifdef USE_MMAP
h = mmap_heap_alloc(&n);
@@ -607,6 +645,7 @@ zhalloc(size_t size)
VALGRIND_MEMPOOL_ALLOC((char *)h, (char *)arena(h), req_size);
#endif
+ DPUTS(hp && hp->next, "failed to find end of chain in zhalloc");
if (hp)
hp->next = h;
else
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-11 16:17 ` Bart Schaefer
@ 2015-10-11 16:48 ` Sebastian Gniazdowski
2015-10-11 17:31 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-11 16:48 UTC (permalink / raw)
To: Bart Schaefer; +Cc: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 6695 bytes --]
It's now much faster, zprof time 11851 vs 3433. That's still strongly
lagging behavior. One other trait is that it slows down after a while.
Best regards,
Sebastian Gniazdowski
On 11 October 2015 at 18:17, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 11, 10:39am, Sebastian Gniazdowski wrote:
> }
> } Zprof says there is no change:
>
> Well, that's not entirely surprising. It probably just means that there
> is some space in a lot of arenas, so fpop doesn't move far from the head
> of the list. Or it could mean that the real problem is popheap() rather
> than freeheap().
>
> } BTW, the patch didn't apply to head or to 23f98c3,
>
> Curious. "git diff master origin/master" shows no diffs for my local
> repository, and the patch was just a straight "git diff".
>
> There were additional changes committed after 23f98c3 so there's no
> reason to have expected it would apply to that.
>
> In any case here's another stab at it. Should apply to 83a1757. This
> should help if there's a lot of freeheap() happening, but not so much
> if there's a lot of push/pop (in fact it may be worse in that case).
>
> If it's push/pop that's the issue, the next step would be to try to
> identify the crucial pushheap() and replace it with NEWHEAPS() if the
> scope permits.
>
>
> diff --git a/Src/mem.c b/Src/mem.c
> index b9569ea..d49f685 100644
> --- a/Src/mem.c
> +++ b/Src/mem.c
> @@ -294,7 +294,7 @@ pushheap(void)
> #endif
>
> for (h = heaps; h; h = h->next) {
> - DPUTS(!h->used, "BUG: empty heap");
> + DPUTS(!h->used && h->next, "BUG: empty heap");
> hs = (Heapstack) zalloc(sizeof(*hs));
> hs->next = h->sp;
> h->sp = hs;
> @@ -334,17 +334,15 @@ freeheap(void)
> *
> * Whenever fheap is NULL here, the loop below sweeps back over the
> * entire heap list again, resetting the free space in every arena to
> - * the amount stashed by pushheap() and finding the first arena with
> + * the amount stashed by pushheap() and finding the arena with the most
> * free space to optimize zhalloc()'s next search. When there's a lot
> * of stuff already on the heap, this is an enormous amount of work,
> * and performance goes to hell.
> *
> - * However, if the arena to which fheap points is unused, we want to
> - * free it, so we have no choice but to do the sweep for a new fheap.
> - */
> - if (fheap && !fheap->sp)
> - fheap = NULL; /* We used to do this unconditionally */
> - /*
> + * Therefore, we defer freeing the most recently allocated arena until
> + * we reach popheap(). This may fail to reclaim some space in earlier
> + * arenas.
> + *
> * In other cases, either fheap is already correct, or it has never
> * been set and this loop will do it, or it'll be reset from scratch
> * on the next popheap(). So all that's needed here is to pick up
> @@ -361,7 +359,11 @@ freeheap(void)
> memset(arena(h) + h->sp->used, 0xff, h->used - h->sp->used);
> #endif
> h->used = h->sp->used;
> - if (!fheap && h->used < ARENA_SIZEOF(h))
> + if (!fheap) {
> + if (h->used < ARENA_SIZEOF(h))
> + fheap = h;
> + } else if (ARENA_SIZEOF(h) - h->used >
> + ARENA_SIZEOF(fheap) - fheap->used)
> fheap = h;
> hl = h;
> #ifdef ZSH_HEAP_DEBUG
> @@ -384,6 +386,26 @@ freeheap(void)
> VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
> #endif
> } else {
> + if (h->next) {
> + /* We want to cut this out of the arena list if we can */
> + if (h == heaps)
> + hl = heaps = h->next;
> + else if (hl && hl->next == h)
> + hl->next = h->next;
> + else {
> + DPUTS(hl, "hl->next != h when freeing");
> + hl = h;
> + continue;
> + }
> + h->next = NULL;
> + } else {
> + /* Leave an empty arena at the end until popped */
> + h->used = 0;
> + fheap = hl = h;
> + break;
> + }
> + if (fheap == h)
> + fheap = NULL;
> #ifdef USE_MMAP
> munmap((void *) h, h->size);
> #else
> @@ -441,12 +463,29 @@ popheap(void)
> #ifdef ZSH_VALGRIND
> VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
> #endif
> - if (!fheap && h->used < ARENA_SIZEOF(h))
> + if (!fheap) {
> + if (h->used < ARENA_SIZEOF(h))
> + fheap = h;
> + } else if (ARENA_SIZEOF(h) - h->used >
> + ARENA_SIZEOF(fheap) - fheap->used)
> fheap = h;
> zfree(hs, sizeof(*hs));
>
> hl = h;
> } else {
> + if (h->next) {
> + /* We want to cut this out of the arena list if we can */
> + if (h == heaps)
> + hl = heaps = h->next;
> + else if (hl && hl->next == h)
> + hl->next = h->next;
> + else {
> + DPUTS(hl, "hl->next != h when popping");
> + hl = h;
> + continue;
> + }
> + h->next = NULL;
> + }
> #ifdef USE_MMAP
> munmap((void *) h, h->size);
> #else
> @@ -524,7 +563,7 @@ zheapptr(void *p)
> mod_export void *
> zhalloc(size_t size)
> {
> - Heap h;
> + Heap h, hp = NULL;
> size_t n;
> #ifdef ZSH_VALGRIND
> size_t req_size = size;
> @@ -546,6 +585,7 @@ zhalloc(size_t size)
> for (h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
> ? fheap : heaps);
> h; h = h->next) {
> + hp = h;
> if (ARENA_SIZEOF(h) >= (n = size + h->used)) {
> void *ret;
>
> @@ -566,7 +606,6 @@ zhalloc(size_t size)
> }
> }
> {
> - Heap hp;
> /* not found, allocate new heap */
> #if defined(ZSH_MEM) && !defined(USE_MMAP)
> static int called = 0;
> @@ -575,7 +614,6 @@ zhalloc(size_t size)
> #endif
>
> n = HEAP_ARENA_SIZE > size ? HEAPSIZE : size + sizeof(*h);
> - for (hp = NULL, h = heaps; h; hp = h, h = h->next);
>
> #ifdef USE_MMAP
> h = mmap_heap_alloc(&n);
> @@ -607,6 +645,7 @@ zhalloc(size_t size)
> VALGRIND_MEMPOOL_ALLOC((char *)h, (char *)arena(h), req_size);
> #endif
>
> + DPUTS(hp && hp->next, "failed to find end of chain in zhalloc");
> if (hp)
> hp->next = h;
> else
[-- Attachment #2: result2.txt --]
[-- Type: text/plain, Size: 1886 bytes --]
patched:
num calls time self name
-----------------------------------------------------------------------------------
1) 1 3433,98 3433,98 99,62% 2439,38 2439,38 70,77% n-list
2) 161 638,21 3,96 18,52% 638,21 3,96 18,52% _nlist_print_with_ansi
3) 7 928,72 132,67 26,94% 290,51 41,50 8,43% n-list-draw
4) 7 50,29 7,18 1,46% 50,20 7,17 1,46% n-list-input
5) 2 11,39 5,69 0,33% 11,39 5,69 0,33% _nlist_cursor_visibility
6) 7 10,79 1,54 0,31% 10,62 1,52 0,31% _nlist_setup_user_vars
7) 1 12,96 12,96 0,38% 6,09 6,09 0,18% _nlist_exit
8) 7 0,29 0,04 0,01% 0,29 0,04 0,01% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 5,18s user 0,18s system 89% cpu 5,959 total
clean:
num calls time self name
-----------------------------------------------------------------------------------
1) 161 5266,46 32,71 44,35% 5266,46 32,71 44,35% _nlist_print_with_ansi
2) 1 11851,19 11851,19 99,79% 3706,12 3706,12 31,21% n-list
3) 7 7754,84 1107,83 65,30% 2488,38 355,48 20,95% n-list-draw
4) 7 316,40 45,20 2,66% 310,96 44,42 2,62% n-list-input
5) 7 38,35 5,48 0,32% 30,59 4,37 0,26% _nlist_setup_user_vars
6) 7 26,60 3,80 0,22% 26,60 3,80 0,22% _nlist_status_msg
7) 2 23,89 11,95 0,20% 23,89 11,95 0,20% _nlist_cursor_visibility
8) 1 24,35 24,35 0,21% 9,34 9,34 0,08% _nlist_exit
( n-panelize < ~/lsoflsof; ) 13,77s user 0,21s system 95% cpu 14,660 total
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-11 16:48 ` Sebastian Gniazdowski
@ 2015-10-11 17:31 ` Bart Schaefer
2015-10-11 18:05 ` Sebastian Gniazdowski
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-11 17:31 UTC (permalink / raw)
To: zsh-workers
On Oct 11, 6:48pm, Sebastian Gniazdowski wrote:
}
} It's now much faster, zprof time 11851 vs 3433. That's still strongly
} lagging behavior. One other trait is that it slows down after a while.
The whole algorithm is designed to maximally fill the allocated space
and thereby minimize the memory footprint, not to maximize speed. So
the more often you grow/free the heap, the more time it spends looking
for unused space in the existing arenas.
Besides zprof, it would be interesting to see the maximum memory usage
of the patched and unpatched shells.
Does this change in zhalloc() make any difference? This goes farther
in the direction of abandoning small bits of earlier arenas until the
popheap() comes around.
diff --git a/Src/mem.c b/Src/mem.c
index b9569ea..63a5afe 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -543,9 +543,14 @@ zhalloc(size_t size)
/* find a heap with enough free space */
- for (h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
- ? fheap : heaps);
- h; h = h->next) {
+ /*
+ * This previously assigned:
+ * h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
+ * ? fheap : heaps);
+ * but we think that nothing upstream of fheap has more free space,
+ * so why start over at heaps just because fheap has too little?
+ */
+ for (h = (fheap ? fheap : heaps); h; h = h->next) {
if (ARENA_SIZEOF(h) >= (n = size + h->used)) {
void *ret;
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-11 17:31 ` Bart Schaefer
@ 2015-10-11 18:05 ` Sebastian Gniazdowski
2015-10-11 21:22 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-11 18:05 UTC (permalink / raw)
To: Bart Schaefer; +Cc: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 1940 bytes --]
The patch (alone) much helped, zprof time is now 3027 and there is no
after-while slow down. Htop shows RES of 152 MB. For the first patch
RES is 166 MB. For the clean version it's 152 MB. I didn't combine the
patches (BTW. the problems with applying patches were caused by expand
tabs that occured on copy-pasting).
On 11 October 2015 at 19:31, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 11, 6:48pm, Sebastian Gniazdowski wrote:
> }
> } It's now much faster, zprof time 11851 vs 3433. That's still strongly
> } lagging behavior. One other trait is that it slows down after a while.
>
> The whole algorithm is designed to maximally fill the allocated space
> and thereby minimize the memory footprint, not to maximize speed. So
> the more often you grow/free the heap, the more time it spends looking
> for unused space in the existing arenas.
>
> Besides zprof, it would be interesting to see the maximum memory usage
> of the patched and unpatched shells.
>
> Does this change in zhalloc() make any difference? This goes farther
> in the direction of abandoning small bits of earlier arenas until the
> popheap() comes around.
>
> diff --git a/Src/mem.c b/Src/mem.c
> index b9569ea..63a5afe 100644
> --- a/Src/mem.c
> +++ b/Src/mem.c
> @@ -543,9 +543,14 @@ zhalloc(size_t size)
>
> /* find a heap with enough free space */
>
> - for (h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
> - ? fheap : heaps);
> - h; h = h->next) {
> + /*
> + * This previously assigned:
> + * h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
> + * ? fheap : heaps);
> + * but we think that nothing upstream of fheap has more free space,
> + * so why start over at heaps just because fheap has too little?
> + */
> + for (h = (fheap ? fheap : heaps); h; h = h->next) {
> if (ARENA_SIZEOF(h) >= (n = size + h->used)) {
> void *ret;
>
[-- Attachment #2: result3.txt --]
[-- Type: text/plain, Size: 2856 bytes --]
patched (zhalloc):
num calls time self name
-----------------------------------------------------------------------------------
1) 1 3027,41 3027,41 99,53% 2025,50 2025,50 66,59% n-list
2) 161 674,02 4,19 22,16% 674,02 4,19 22,16% _nlist_print_with_ansi
3) 7 962,52 137,50 31,64% 288,50 41,21 9,48% n-list-draw
4) 7 24,50 3,50 0,81% 24,31 3,47 0,80% n-list-input
5) 2 13,09 6,54 0,43% 13,09 6,54 0,43% _nlist_cursor_visibility
6) 7 9,18 1,31 0,30% 8,99 1,28 0,30% _nlist_setup_user_vars
7) 1 14,33 14,33 0,47% 6,23 6,23 0,20% _nlist_exit
8) 7 0,71 0,10 0,02% 0,71 0,10 0,02% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 4,62s user 0,18s system 88% cpu 5,432 total
patched (the first patch):
num calls time self name
-----------------------------------------------------------------------------------
1) 1 3547,43 3547,43 99,63% 2560,58 2560,58 71,92% n-list
2) 161 636,56 3,95 17,88% 636,56 3,95 17,88% _nlist_print_with_ansi
3) 7 921,28 131,61 25,88% 284,71 40,67 8,00% n-list-draw
4) 7 49,42 7,06 1,39% 49,33 7,05 1,39% n-list-input
5) 2 11,96 5,98 0,34% 11,96 5,98 0,34% _nlist_cursor_visibility
6) 7 10,80 1,54 0,30% 10,64 1,52 0,30% _nlist_setup_user_vars
7) 1 13,05 13,05 0,37% 6,17 6,17 0,17% _nlist_exit
8) 7 0,28 0,04 0,01% 0,28 0,04 0,01% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 5,17s user 0,18s system 88% cpu 6,032 total
clean:
num calls time self name
-----------------------------------------------------------------------------------
1) 161 5176,77 32,15 43,84% 5176,77 32,15 43,84% _nlist_print_with_ansi
2) 1 11785,16 11785,16 99,79% 3755,23 3755,23 31,80% n-list
3) 7 7640,70 1091,53 64,70% 2463,93 351,99 20,86% n-list-draw
4) 7 316,45 45,21 2,68% 310,97 44,42 2,63% n-list-input
5) 7 37,72 5,39 0,32% 30,00 4,29 0,25% _nlist_setup_user_vars
6) 7 26,52 3,79 0,22% 26,52 3,79 0,22% _nlist_status_msg
7) 2 23,68 11,84 0,20% 23,68 11,84 0,20% _nlist_cursor_visibility
8) 1 24,45 24,45 0,21% 9,32 9,32 0,08% _nlist_exit
( n-panelize < ~/lsoflsof; ) 13,64s user 0,20s system 95% cpu 14,548 total
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-11 18:05 ` Sebastian Gniazdowski
@ 2015-10-11 21:22 ` Bart Schaefer
2015-10-12 8:21 ` Sebastian Gniazdowski
2015-10-12 12:05 ` Sebastian Gniazdowski
0 siblings, 2 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-11 21:22 UTC (permalink / raw)
To: zsh-workers
Please don't top-post, or if you do, don't include the entire message to
which you are replying as an excerpt. If using gmail, you often can't
tell this is happening; click on the little [...] box at the bottom and
selectively delete, or cursor-key down so the [...] is highlighted and
then press delete to kill the entire excerpt. Thanks.
On Oct 11, 8:05pm, Sebastian Gniazdowski wrote:
}
} The patch (alone) much helped, zprof time is now 3027
Wow, I wasn't expecting that much difference.
} Htop shows RES of 152 MB. For the first patch
} RES is 166 MB. For the clean version it's 152 MB.
OK, this makes sense because the patch from workers/36834 is leaving some
heaps around longer whereas the patch from workers/36836 is just avoiding
a re-scan of the list of heaps at allocation time.
} I didn't combine the patches
It may be interesting to do so, they shouldn't conflict. From what you
said before --
} > } It's now much faster, zprof time 11851 vs 3433.
-- each of them cut the time by about 2/3, so together they may get it
below 3000.
--
Barton E. Schaefer
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-11 21:22 ` Bart Schaefer
@ 2015-10-12 8:21 ` Sebastian Gniazdowski
2015-10-12 14:01 ` Bart Schaefer
2015-10-13 13:07 ` Sebastian Gniazdowski
2015-10-12 12:05 ` Sebastian Gniazdowski
1 sibling, 2 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-12 8:21 UTC (permalink / raw)
To: Bart Schaefer; +Cc: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 977 bytes --]
On 11 October 2015 at 23:22, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 11, 8:05pm, Sebastian Gniazdowski wrote:
> } I didn't combine the patches
>
> It may be interesting to do so, they shouldn't conflict. From what you
> said before --
>
> } > } It's now much faster, zprof time 11851 vs 3433.
>
> -- each of them cut the time by about 2/3, so together they may get it
> below 3000.
With both patches zprof running time is around 1806. RES is 152M and
goes down nicely to 37564 when I quit the script. Didn't observe
after-while slow down. The tool is almost fully usable for 89k
elements (there's still a bit lag), however searching is slow, I
attached zprof results for searching. It is done via two following
invocations:
list=( "${(@M)list:#(#i)*$~search_pattern*}" )
col_list=(
"${(@)list//(#mi)($~colsearch_pattern)/$red${MATCH}$reset}" )
Maybe they trigger some specific behavior on heap?
> --
> Barton E. Schaefer
[-- Attachment #2: result4.txt --]
[-- Type: text/plain, Size: 1886 bytes --]
patched:
num calls time self name
-----------------------------------------------------------------------------------
1) 1 1806,60 1806,60 99,17% 1072,67 1072,67 58,88% n-list
2) 161 458,49 2,85 25,17% 458,49 2,85 25,17% _nlist_print_with_ansi
3) 7 696,02 99,43 38,21% 237,53 33,93 13,04% n-list-draw
4) 7 23,89 3,41 1,31% 23,77 3,40 1,31% n-list-input
5) 2 13,03 6,51 0,72% 13,03 6,51 0,72% _nlist_cursor_visibility
6) 7 9,15 1,31 0,50% 8,96 1,28 0,49% _nlist_setup_user_vars
7) 1 15,03 15,03 0,83% 6,61 6,61 0,36% _nlist_exit
8) 7 0,28 0,04 0,02% 0,28 0,04 0,02% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 3,16s user 0,16s system 85% cpu 3,884 total
clean:
num calls time self name
-----------------------------------------------------------------------------------
1) 161 5131,50 31,87 43,12% 5131,50 31,87 43,12% _nlist_print_with_ansi
2) 1 11873,99 11873,99 99,78% 3933,58 3933,58 33,06% n-list
3) 7 7560,21 1080,03 63,53% 2428,71 346,96 20,41% n-list-draw
4) 7 309,00 44,14 2,60% 303,75 43,39 2,55% n-list-input
5) 7 37,24 5,32 0,31% 29,64 4,23 0,25% _nlist_setup_user_vars
6) 7 26,21 3,74 0,22% 26,21 3,74 0,22% _nlist_status_msg
7) 2 23,46 11,73 0,20% 23,46 11,73 0,20% _nlist_cursor_visibility
8) 1 26,01 26,01 0,22% 10,30 10,30 0,09% _nlist_exit
( n-panelize < ~/lsoflsof; ) 13,52s user 0,19s system 93% cpu 14,614 total
[-- Attachment #3: result_search.txt --]
[-- Type: text/plain, Size: 3812 bytes --]
the long patch:
num calls time self name
-----------------------------------------------------------------------------------
1) 1 227359,12 227359,12 99,99% 205528,35 205528,35 90,39% n-list
2) 161 19586,67 121,66 8,61% 19586,67 121,66 8,61% _nlist_print_with_ansi
3) 7 21154,74 3022,11 9,30% 1568,07 224,01 0,69% n-list-draw
4) 7 589,99 84,28 0,26% 565,13 80,73 0,25% n-list-input
5) 4 54,45 13,61 0,02% 54,45 13,61 0,02% _nlist_verify_vars
6) 4 35,64 8,91 0,02% 35,64 8,91 0,02% _nlist_cursor_visibility
7) 7 25,32 3,62 0,01% 25,25 3,61 0,01% _nlist_setup_user_vars
8) 1 13,08 13,08 0,01% 6,60 6,60 0,00% _nlist_exit
( n-panelize < ~/lsoflsof; ) 227,77s user 0,92s system 99% cpu 3:49,78 total
zhalloc patch:
num calls time self name
-----------------------------------------------------------------------------------
1) 1 49381,60 49381,60 99,97% 39748,80 39748,80 80,47% n-list
2) 161 8858,04 55,02 17,93% 8858,04 55,02 17,93% _nlist_print_with_ansi
3) 7 9560,66 1365,81 19,36% 702,62 100,37 1,42% n-list-draw
4) 7 63,54 9,08 0,13% 46,58 6,65 0,09% n-list-input
5) 4 28,48 7,12 0,06% 28,48 7,12 0,06% _nlist_cursor_visibility
6) 1 14,35 14,35 0,03% 6,35 6,35 0,01% _nlist_exit
7) 7 3,38 0,48 0,01% 3,31 0,47 0,01% _nlist_setup_user_vars
8) 7 1,49 0,21 0,00% 1,49 0,21 0,00% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 50,29s user 0,72s system 98% cpu 51,757 total
both patches:
num calls time self name
-----------------------------------------------------------------------------------
1) 161 11162,41 69,33 50,81% 11162,41 69,33 50,81% _nlist_print_with_ansi
2) 1 21956,87 21956,87 99,94% 9857,85 9857,85 44,87% n-list
3) 7 12012,06 1716,01 54,67% 849,65 121,38 3,87% n-list-draw
4) 7 79,19 11,31 0,36% 60,36 8,62 0,27% n-list-input
5) 4 29,77 7,44 0,14% 29,77 7,44 0,14% _nlist_cursor_visibility
6) 1 13,19 13,19 0,06% 6,59 6,59 0,03% _nlist_exit
7) 7 2,83 0,40 0,01% 2,76 0,39 0,01% _nlist_setup_user_vars
8) 7 0,36 0,05 0,00% 0,36 0,05 0,00% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 22,20s user 1,15s system 97% cpu 24,021 total
clean:
num calls time self name
-----------------------------------------------------------------------------------
1) 161 139846,51 868,61 55,42% 139846,51 868,61 55,42% _nlist_print_with_ansi
2) 1 252311,10 252311,10 99,99% 107251,66 107251,66 42,50% n-list
3) 7 144256,54 20608,08 57,17% 4410,03 630,00 1,75% n-list-draw
4) 7 698,59 99,80 0,28% 626,00 89,43 0,25% n-list-input
5) 4 95,33 23,83 0,04% 95,33 23,83 0,04% _nlist_cursor_visibility
6) 7 59,63 8,52 0,02% 59,63 8,52 0,02% _nlist_status_msg
7) 7 36,60 5,23 0,01% 35,84 5,12 0,01% _nlist_setup_user_vars
8) 1 24,74 24,74 0,01% 9,86 9,86 0,00% _nlist_exit
( n-panelize < ~/lsoflsof; ) 253,40s user 0,88s system 99% cpu 4:15,08 total
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-11 21:22 ` Bart Schaefer
2015-10-12 8:21 ` Sebastian Gniazdowski
@ 2015-10-12 12:05 ` Sebastian Gniazdowski
2015-10-12 15:13 ` Bart Schaefer
1 sibling, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-12 12:05 UTC (permalink / raw)
To: Bart Schaefer; +Cc: zsh-workers
On 11 October 2015 at 23:22, Bart Schaefer <schaefer@brasslantern.com> wrote:
> } Htop shows RES of 152 MB. For the first patch
> } RES is 166 MB. For the clean version it's 152 MB.
>
> OK, this makes sense because the patch from workers/36834 is leaving some
> heaps around longer whereas the patch from workers/36836 is just avoiding
> a re-scan of the list of heaps at allocation time.
Can something be done inside the script to make memory more prepared
for heavy computation? I tried putting:
# Allocate a large buffer just to free it
a=""
a="${(r:2000000:: :)a}"
a=""
before:
list=( "${(@M)list:#(#i)*$~search_pattern*}" )
But it doesn't make difference
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-12 8:21 ` Sebastian Gniazdowski
@ 2015-10-12 14:01 ` Bart Schaefer
2015-10-12 16:50 ` Sebastian Gniazdowski
2015-10-13 13:07 ` Sebastian Gniazdowski
1 sibling, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-12 14:01 UTC (permalink / raw)
To: zsh-workers
On Oct 12, 10:21am, Sebastian Gniazdowski wrote:
}
} With both patches zprof running time is around 1806. RES is 152M and
} goes down nicely to 37564 when I quit the script. Didn't observe
} after-while slow down. The tool is almost fully usable for 89k
} elements (there's still a bit lag), however searching is slow,
If you configure --enable-zsh-heap-debug you can edit mem.c to set a
value for the heap_debug_verbosity variable which will cause a LOT of
debugging messages to be printed to stderr. This may tell you where
it's spending time.
Shell variables are always passed around internally by value rather
than by reference, so operations on a big list will copy it a lot.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-12 12:05 ` Sebastian Gniazdowski
@ 2015-10-12 15:13 ` Bart Schaefer
0 siblings, 0 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-12 15:13 UTC (permalink / raw)
To: zsh-workers
On Oct 12, 2:05pm, Sebastian Gniazdowski wrote:
}
} Can something be done inside the script to make memory more prepared
} for heavy computation?
Doubtful. The heap is a layer on top of malloc, so although it might
be possible to get system memory from malloc into the heap list, that
won't remove any of the heap implementation overhead, nor will it
keep zsh from copying the data around internally every time you use
${list} in the script later.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-12 14:01 ` Bart Schaefer
@ 2015-10-12 16:50 ` Sebastian Gniazdowski
2015-10-13 0:33 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-12 16:50 UTC (permalink / raw)
To: Bart Schaefer; +Cc: zsh-workers
On 12 October 2015 at 16:01, Bart Schaefer <schaefer@brasslantern.com> wrote:
> If you configure --enable-zsh-heap-debug you can edit mem.c to set a
> value for the heap_debug_verbosity variable which will cause a LOT of
> debugging messages to be printed to stderr. This may tell you where
> it's spending time.
Did that, set heap_debug_verbosity to 0x7f. When I issue a search,
screen gets constantly outputted with messages about heap pop. One
series started with:
HEAP DEBUG: heap ff604f popped, old heap was ff51af.
HEAP DEBUG: heap ff6050 popped, old heap was ff51b0.
HEAP DEBUG: heap ff6051 popped, old heap was ff51b1.
To then went through:
HEAP DEBUG: heap ff5592 popped, old heap was 52dde.
HEAP DEBUG: heap ff5593 popped, old heap was 52ddf.
HEAP DEBUG: heap 50984 popped, old heap was 4fe96.
HEAP DEBUG: heap 50985 popped, old heap was 4fea7.
And ended at:
HEAP DEBUG: heap 5145f popped, old heap was 50981.
HEAP DEBUG: heap 51460 popped, old heap was 50982.
HEAP DEBUG: heap 51461 popped, old heap was 50983.
Set my terminal scroll back to "unlimited", repeated that. However
scroll back buffer got cut off at some point. At top there was:
HEAP DEBUG: heap fb5f19 popped, old heap was fb5079.
HEAP DEBUG: heap fb5f1a popped, old heap was fb507a.
...
Then:
HEAP DEBUG: heap fb5466 popped, old heap was d35b.
HEAP DEBUG: heap fb5467 popped, old heap was d35c.
HEAP DEBUG: heap a41f popped, old heap was 9931.
HEAP DEBUG: heap a420 popped, old heap was 9942.
And at the end:
HEAP DEBUG: heap aefb popped, old heap was a41d.
HEAP DEBUG: heap aefc popped, old heap was a41e.
HEAP DEBUG: heap 9931 popped, old heap was 98f8.
HEAP DEBUG: heap 98f8 freed, new id is fb71a8.
HEAP DEBUG: heap fb71a8 pushed, new id is fb71a9.
HEAP DEBUG: heap fb71a9 popped, old heap was fb71a8.
Maybe the numbers mean something?
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-12 16:50 ` Sebastian Gniazdowski
@ 2015-10-13 0:33 ` Bart Schaefer
2015-10-13 8:21 ` Sebastian Gniazdowski
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-13 0:33 UTC (permalink / raw)
To: zsh-workers
On Oct 12, 6:50pm, Sebastian Gniazdowski wrote:
} Subject: Re: Slowdown around 5.0.5-dev-0
}
} On 12 October 2015 at 16:01, Bart Schaefer <schaefer@brasslantern.com> wrote:
} > If you configure --enable-zsh-heap-debug you can edit mem.c to set a
} > value for the heap_debug_verbosity variable
}
} Did that, set heap_debug_verbosity to 0x7f. When I issue a search,
} screen gets constantly outputted with messages about heap pop.
Yes, you probably want to prefix your test with
{
exec {fd}>&2 2>&debug_trace_file.txt
# your test here
} always {
exec 2>&$fd
}
(choose your own file name). I mean to mention that but forgot.
If you capture the output in a file (it may be a BIG file) you can do
things like "fgrep -c allocated" to count the number of times each heap
event happens.
} Maybe the numbers mean something?
Yes, they're an ID assigned every time a heap is either pushed or freed;
e.g.:
} HEAP DEBUG: heap 98f8 freed, new id is fb71a8.
The heap ID is incremented once at each push or free, so there have been
a minimum of 16439472 heap operations (fb71a8 - 98f8) between the
creation of that heap and the eventual free. (Note the number of heap
operations is the number of push+free multiplied by the average number
of heap arenas, because a new ID is assigned to each arena every time.)
There should be a message about "pushed" corresponding to each "popped",
like this pair:
} HEAP DEBUG: heap fb71a8 pushed, new id is fb71a9.
} HEAP DEBUG: heap fb71a9 popped, old heap was fb71a8.
You should also have sometimes seen "allocated memory from ..." or
"create new heap ..." messages. The fact that your scrollback is
almost entirely "popped" messages is an indication that you're really
deep in a recursion at the time your script exits.
--
Barton E. Schaefer
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-13 0:33 ` Bart Schaefer
@ 2015-10-13 8:21 ` Sebastian Gniazdowski
2015-10-13 15:52 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-13 8:21 UTC (permalink / raw)
To: zsh-workers
On 13 October 2015 at 02:33, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 12, 6:50pm, Sebastian Gniazdowski wrote:
> } Maybe the numbers mean something?
>
> Yes, they're an ID assigned every time a heap is either pushed or freed;
> e.g.:
I meant that maybe the numbers yield some optimization strategy. I
think we did a great job speeding up memory-heavy Zsh 6.5 (typical
execution) or 11.5 times (use of patterns on arrays). Thank you for
your effort (my tools greatly benefit), credits are mostly to you. I
would try to implement something more. Have you any more ideas for
optimization strategies? Basing on your description of heap mechanism
I could try to implement it. But maybe there is nothing left to be
done?
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-12 8:21 ` Sebastian Gniazdowski
2015-10-12 14:01 ` Bart Schaefer
@ 2015-10-13 13:07 ` Sebastian Gniazdowski
2015-10-13 13:31 ` Sebastian Gniazdowski
1 sibling, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-13 13:07 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 1158 bytes --]
On 12 October 2015 at 10:21, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> With both patches zprof running time is around 1806. RES is 152M and
> goes down nicely to 37564 when I quit the script. Didn't observe
> after-while slow down. The tool is almost fully usable for 89k
> elements (there's still a bit lag), however searching is slow, I
> attached zprof results for searching. It is done via two following
> invocations:
>
> list=( "${(@M)list:#(#i)*$~search_pattern*}" )
> col_list=(
> "${(@)list//(#mi)($~colsearch_pattern)/$red${MATCH}$reset}" )
>
> Maybe they trigger some specific behavior on heap?
Did one more memory and performance test for version with both patches
and without them. Earlier I just started the script and moved 9 lines,
then checked memory. Now I ran search with $search_pattern and
$colsearch_pattern set to "s", single leter, then checked memory. RES
is 433M and 432M, so it's fine. The patched (HEAD) version is 14 times
faster. I also checked 5.0.2 – RES is 400M, and it's ~8 times slower
than the patched version.
Best regards,
Sebastian Gniazdowski
[-- Attachment #2: result_search2.txt --]
[-- Type: text/plain, Size: 3078 bytes --]
both patches (HEAD):
num calls time self name
-----------------------------------------------------------------------------------
1) 1 10112,65 10112,65 99,85% 5124,47 5124,47 50,60% n-list
2) 92 4597,96 49,98 45,40% 4597,96 49,98 45,40% _nlist_print_with_ansi
3) 4 4910,97 1227,74 48,49% 313,01 78,25 3,09% n-list-draw
4) 4 69,24 17,31 0,68% 53,26 13,31 0,53% n-list-input
5) 4 28,51 7,13 0,28% 28,51 7,13 0,28% _nlist_cursor_visibility
6) 1 14,79 14,79 0,15% 7,11 7,11 0,07% _nlist_exit
7) 4 2,78 0,70 0,03% 2,72 0,68 0,03% _nlist_setup_user_vars
8) 4 0,18 0,05 0,00% 0,18 0,05 0,00% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 10,88s user 0,52s system 93% cpu 12,153 total
RES: 433M
no patches:
num calls time self name
-----------------------------------------------------------------------------------
1) 92 77542,89 842,86 54,42% 77542,89 842,86 54,42% _nlist_print_with_ansi
2) 1 142460,54 142460,54 99,98% 61816,42 61816,42 43,38% n-list
3) 4 80130,34 20032,59 56,24% 2587,46 646,86 1,82% n-list-draw
4) 4 456,02 114,01 0,32% 362,14 90,54 0,25% n-list-input
5) 4 117,37 29,34 0,08% 117,37 29,34 0,08% _nlist_cursor_visibility
6) 4 28,08 7,02 0,02% 28,08 7,02 0,02% _nlist_status_msg
7) 4 19,50 4,88 0,01% 18,71 4,68 0,01% _nlist_setup_user_vars
8) 1 26,80 26,80 0,02% 11,03 11,03 0,01% _nlist_exit
( n-panelize < ~/lsoflsof; ) 143,73s user 0,54s system 99% cpu 2:25,15 total
RES: 432M
5.0.2 - the test is biased because I browsed list for ~10 seconds before the
script catched "second breath" and started to work fast. But the user time here
is quite accurate.
num calls time self name
-----------------------------------------------------------------------------------
1) 828 43962,09 53,09 48,09% 43962,09 53,09 48,09% _nlist_print_with_ansi
2) 1 91406,33 91406,33 99,99% 40327,39 40327,39 44,12% n-list
3) 36 50295,77 1397,10 55,02% 6333,68 175,94 6,93% n-list-draw
4) 36 602,80 16,74 0,66% 538,63 14,96 0,59% n-list-input
5) 36 108,53 3,01 0,12% 84,31 2,34 0,09% _nlist_setup_user_vars
6) 36 63,53 1,76 0,07% 63,53 1,76 0,07% _nlist_status_msg
7) 4 62,24 15,56 0,07% 62,24 15,56 0,07% _nlist_cursor_visibility
8) 34 24,22 0,71 0,03% 24,22 0,71 0,03% _nlist_compute_user_vars_difference
( n-panelize < ~/lsoflsof; ) 84,60s user 0,42s system 90% cpu 1:33,78 total
RES: 400M
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-13 13:07 ` Sebastian Gniazdowski
@ 2015-10-13 13:31 ` Sebastian Gniazdowski
0 siblings, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-13 13:31 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 1391 bytes --]
PS. test where I don't wait for 5.0.2 to become fast. RES is 392M.
Best regards,
Sebastian Gniazdowski
On 13 October 2015 at 15:07, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> On 12 October 2015 at 10:21, Sebastian Gniazdowski
> <sgniazdowski@gmail.com> wrote:
>> With both patches zprof running time is around 1806. RES is 152M and
>> goes down nicely to 37564 when I quit the script. Didn't observe
>> after-while slow down. The tool is almost fully usable for 89k
>> elements (there's still a bit lag), however searching is slow, I
>> attached zprof results for searching. It is done via two following
>> invocations:
>>
>> list=( "${(@M)list:#(#i)*$~search_pattern*}" )
>> col_list=(
>> "${(@)list//(#mi)($~colsearch_pattern)/$red${MATCH}$reset}" )
>>
>> Maybe they trigger some specific behavior on heap?
>
> Did one more memory and performance test for version with both patches
> and without them. Earlier I just started the script and moved 9 lines,
> then checked memory. Now I ran search with $search_pattern and
> $colsearch_pattern set to "s", single leter, then checked memory. RES
> is 433M and 432M, so it's fine. The patched (HEAD) version is 14 times
> faster. I also checked 5.0.2 – RES is 400M, and it's ~8 times slower
> than the patched version.
>
> Best regards,
> Sebastian Gniazdowski
[-- Attachment #2: result_search2a.txt --]
[-- Type: text/plain, Size: 1100 bytes --]
5.0.2 - without waiting for "second breath", just ~1 sec to check if it's slow
(i.e. at the first breath). Searching for "s" (lists of 89k elements)
num calls time self name
-----------------------------------------------------------------------------------
1) 115 58373,85 507,60 54,48% 58373,85 507,60 54,48% _nlist_print_with_ansi
2) 1 107124,50 107124,50 99,98% 46496,94 46496,94 43,40% n-list
3) 5 60289,00 12057,80 56,27% 1915,15 383,03 1,79% n-list-draw
4) 5 298,42 59,68 0,28% 230,25 46,05 0,21% n-list-input
5) 4 88,38 22,09 0,08% 88,38 22,09 0,08% _nlist_cursor_visibility
6) 5 17,66 3,53 0,02% 17,66 3,53 0,02% _nlist_status_msg
7) 5 14,79 2,96 0,01% 14,21 2,84 0,01% _nlist_setup_user_vars
8) 1 23,31 23,31 0,02% 9,11 9,11 0,01% _nlist_exit
( n-panelize < ~/lsoflsof; ) 108,20s user 0,44s system 99% cpu 1:49,53 total
RES: 392M
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-13 8:21 ` Sebastian Gniazdowski
@ 2015-10-13 15:52 ` Bart Schaefer
2015-10-14 6:50 ` Sebastian Gniazdowski
2015-10-14 13:27 ` Peter Stephenson
0 siblings, 2 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-13 15:52 UTC (permalink / raw)
To: zsh-workers
On Oct 13, 10:21am, Sebastian Gniazdowski wrote:
}
} I meant that maybe the numbers yield some optimization strategy.
} I could try to implement it. But maybe there is nothing left to be
} done?
I think we're all out of low-level strategies without completely redoing
the algorithm. As I noted, it's optimized mostly for space rather than
speed. Based on your later two emails, we've increased memory footprint
by about 25% in order to get a significant increase in speed, so that is
probably an OK trade-off on most modern hardware.
At this point I think any gains would be found higher up -- calls to
push/pop the heap that aren't necessary, or places [similar to the use
in completion code] where we could replace push/pop with a NEWHEAPS()
block to avoid scanning existing heaps for free space.
For example, the following shows what I mean. This may be a poor
choice if most functions are small, so I won't suggest committing, but
this is the kind of well-isolated push/pop that might benefit.
diff --git a/Src/exec.c b/Src/exec.c
index bcc8065..b9c40df 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -5067,10 +5067,11 @@ doshfunc(Shfunc shfunc, LinkList doshargs, int noreturnval)
#ifdef MAX_FUNCTION_DEPTH
static int funcdepth;
#endif
+ Heap funcheap;
queue_signals(); /* Lots of memory and global state changes coming */
- pushheap();
+ NEWHEAPS(funcheap) {
oargv0 = NULL;
obreaks = breaks;
@@ -5290,7 +5291,8 @@ doshfunc(Shfunc shfunc, LinkList doshargs, int noreturnval)
numpipestats = oldnumpipestats;
memcpy(pipestats, oldpipestats, sizeof(int)*numpipestats);
}
- popheap();
+
+ } OLDHEAPS;
unqueue_signals();
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-13 15:52 ` Bart Schaefer
@ 2015-10-14 6:50 ` Sebastian Gniazdowski
2015-10-14 13:27 ` Peter Stephenson
1 sibling, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-14 6:50 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 808 bytes --]
With this patch the list is instant fast. It can be seen here:
https://youtu.be/jwBZY6fmQHs
(the zprof result – 1.9x speed up – doesn't fully express how fast is the list)
I attach results for browsing the list and searching through it.
Searching is faster 2.15 times and 1.88 times.
What's interesting is lower RES for the new patch. It's like with the
new patch RES can easier go down after peak. I once saw 179M after
peak of ~400M. The two-patches version can also go down, but maybe not
that easy. So this is quite confusing. I think a more synthetic
testing is needed to recognize how the memory behaves. Now I manually
open a script, give it a work, run htop.
Maybe the patch isn't that bad? Instant fast list is a great thing for me.
Best regards,
Sebastian Gniazdowski
[-- Attachment #2: result_search3.txt --]
[-- Type: text/plain, Size: 3970 bytes --]
=== searching for "sy s" (translated to *sy*~^*s*) ===
new patch:
num calls time self name
-----------------------------------------------------------------------------------
1) 1 10195,21 10195,21 99,94% 9943,58 9943,58 97,47% n-list
2) 161 212,69 1,32 2,08% 212,69 1,32 2,08% _nlist_print_with_ansi
3) 4 24,20 6,05 0,24% 24,20 6,05 0,24% _nlist_cursor_visibility
4) 7 228,66 32,67 2,24% 15,97 2,28 0,16% n-list-draw
5) 7 15,94 2,28 0,16% 3,12 0,45 0,03% n-list-input
6) 1 6,22 6,22 0,06% 0,73 0,73 0,01% _nlist_exit
7) 7 0,49 0,07 0,00% 0,43 0,06 0,00% _nlist_setup_user_vars
8) 7 0,31 0,04 0,00% 0,31 0,04 0,00% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 10,59s user 1,08s system 95% cpu 12,228 total
RES: 318M
two previous patches (from repository):
num calls time self name
-----------------------------------------------------------------------------------
1) 161 11294,81 70,15 51,38% 11294,81 70,15 51,38% _nlist_print_with_ansi
2) 1 21965,97 21965,97 99,93% 9726,11 9726,11 44,25% n-list
3) 7 12150,96 1735,85 55,28% 856,14 122,31 3,89% n-list-draw
4) 7 79,61 11,37 0,36% 61,09 8,73 0,28% n-list-input
5) 4 32,33 8,08 0,15% 32,33 8,08 0,15% _nlist_cursor_visibility
6) 1 15,82 15,82 0,07% 7,81 7,81 0,04% _nlist_exit
7) 7 2,85 0,41 0,01% 2,78 0,40 0,01% _nlist_setup_user_vars
8) 7 0,32 0,05 0,00% 0,32 0,05 0,00% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 22,37s user 0,93s system 97% cpu 23,972 total
RES: 492M
=== searching for "s" ===
new patch:
num calls time self name
-----------------------------------------------------------------------------------
1) 1 5199,43 5199,43 99,88% 5048,37 5048,37 96,98% n-list
2) 92 117,53 1,28 2,26% 117,53 1,28 2,26% _nlist_print_with_ansi
3) 4 27,57 6,89 0,53% 27,57 6,89 0,53% _nlist_cursor_visibility
4) 4 126,26 31,57 2,43% 8,74 2,18 0,17% n-list-draw
5) 4 18,59 4,65 0,36% 2,20 0,55 0,04% n-list-input
6) 1 6,29 6,29 0,12% 0,73 0,73 0,01% _nlist_exit
7) 4 0,27 0,07 0,01% 0,21 0,05 0,00% _nlist_setup_user_vars
8) 4 0,14 0,04 0,00% 0,14 0,04 0,00% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 6,21s user 0,63s system 94% cpu 7,237 total
RES: 446M
two previous patches (from repository):
num calls time self name
-----------------------------------------------------------------------------------
1) 1 9781,26 9781,26 99,86% 4805,60 4805,60 49,06% n-list
2) 92 4583,54 49,82 46,80% 4583,54 49,82 46,80% _nlist_print_with_ansi
3) 4 4897,09 1224,27 50,00% 313,55 78,39 3,20% n-list-draw
4) 4 69,30 17,33 0,71% 52,21 13,05 0,53% n-list-input
5) 4 29,65 7,41 0,30% 29,65 7,41 0,30% _nlist_cursor_visibility
6) 1 13,58 13,58 0,14% 7,17 7,17 0,07% _nlist_exit
7) 4 2,79 0,70 0,03% 2,71 0,68 0,03% _nlist_setup_user_vars
8) 4 0,18 0,05 0,00% 0,18 0,05 0,00% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 10,88s user 0,52s system 96% cpu 11,779 total
RES: 439M
[-- Attachment #3: result5.txt --]
[-- Type: text/plain, Size: 1913 bytes --]
new patch:
num calls time self name
-----------------------------------------------------------------------------------
1) 1 965,93 965,93 99,29% 908,88 908,88 93,42% n-list
2) 161 34,56 0,21 3,55% 34,56 0,21 3,55% _nlist_print_with_ansi
3) 7 49,01 7,00 5,04% 14,44 2,06 1,48% n-list-draw
4) 2 11,36 5,68 1,17% 11,36 5,68 1,17% _nlist_cursor_visibility
5) 7 2,18 0,31 0,22% 2,08 0,30 0,21% n-list-input
6) 1 6,94 6,94 0,71% 0,53 0,53 0,05% _nlist_exit
7) 7 0,55 0,08 0,06% 0,38 0,05 0,04% _nlist_setup_user_vars
8) 7 0,25 0,04 0,03% 0,25 0,04 0,03% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 2,50s user 0,17s system 87% cpu 3,055 total
two previous patches (from HEAD):
num calls time self name
-----------------------------------------------------------------------------------
1) 1 1861,59 1861,59 99,22% 1082,47 1082,47 57,70% n-list
2) 161 486,65 3,02 25,94% 486,65 3,02 25,94% _nlist_print_with_ansi
3) 7 738,66 105,52 39,37% 252,01 36,00 13,43% n-list-draw
4) 7 25,10 3,59 1,34% 24,97 3,57 1,33% n-list-input
5) 2 12,54 6,27 0,67% 12,54 6,27 0,67% _nlist_cursor_visibility
6) 7 9,61 1,37 0,51% 9,37 1,34 0,50% _nlist_setup_user_vars
7) 1 14,61 14,61 0,78% 7,32 7,32 0,39% _nlist_exit
8) 7 0,39 0,06 0,02% 0,39 0,06 0,02% _nlist_status_msg
( n-panelize < ~/lsoflsof; ) 3,20s user 0,17s system 85% cpu 3,928 total
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-13 15:52 ` Bart Schaefer
2015-10-14 6:50 ` Sebastian Gniazdowski
@ 2015-10-14 13:27 ` Peter Stephenson
2015-10-14 16:25 ` Bart Schaefer
1 sibling, 1 reply; 61+ messages in thread
From: Peter Stephenson @ 2015-10-14 13:27 UTC (permalink / raw)
To: zsh-workers
On Tue, 13 Oct 2015 08:52:46 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> For example, the following shows what I mean. This may be a poor
> choice if most functions are small, so I won't suggest committing, but
> this is the kind of well-isolated push/pop that might benefit.
I'd suggest a quick finger test with the completion system to see if
anything is obviously worse --- if anything is going to exercise large
function stacks with small-to-medium size functions, it's that.
pws
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-14 13:27 ` Peter Stephenson
@ 2015-10-14 16:25 ` Bart Schaefer
2015-10-14 16:50 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-14 16:25 UTC (permalink / raw)
To: zsh-workers
On Oct 14, 8:50am, Sebastian Gniazdowski wrote:
}
} With this patch the list is instant fast.
NEWHEAPS() is O(1) where pushheap() is O(N) where N is the number of
arenas already in the heap. So it's not suprising that it's faster
in the specific case you're trying (lots of data in the heap for a
relatively long time) -- the question is would it slow down in the
more normal use case of a mostly-empty heap and a lot of function
calls, because the one operation done by NEWHEAPS() is heavier than
any one of the N operations in pushheap().
On the other hand for interactive use the difference in the small-memory
case is probably never going to be noticed.
} What's interesting is lower RES for the new patch. It's like with the
} new patch RES can easier go down after peak.
With pushheap() the code tries to maximally use available space in the
already allocated heap arenas; with NEWHEAPS() that available space is
ignored and we create all new arenas. It's possible that allows (or
unintentionally forces) the system malloc() to clump new allocations
together better so that they can be released back to the system sooner.
A helpful side-effect if so, but I wouldn't count on it being true for
all malloc libraries.
On Oct 14, 2:27pm, Peter Stephenson wrote:
}
} I'd suggest a quick finger test with the completion system to see if
} anything is obviously worse --- if anything is going to exercise large
} function stacks with small-to-medium size functions, it's that.
Actually the completion system is the place it's most likely to entirely
break down -- completion already uses NEWHEAPS() to get its own heap, and
then frequently employs SWITCHHEAPS() to be able to pass values back on
the regular heap. If we're also swapping the regular heap around with
NEWHEAPS() it's possible things might get very confused, or that an entire
list of heaps might be leaked.
However, I did just run the Test/Y* tests with valgrind and they all passed
and reported no leaks. (I did get:
==32081== Warning: invalid file descriptor -1 in syscall close()
==32081== Warning: invalid file descriptor -1 in syscall close()
from both Y01 and Y02 but curiously not Y03.)
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-14 16:25 ` Bart Schaefer
@ 2015-10-14 16:50 ` Bart Schaefer
2015-10-15 4:32 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-14 16:50 UTC (permalink / raw)
To: zsh-workers
On Oct 14, 9:25am, Bart Schaefer wrote:
}
} NEWHEAPS() is O(1) where pushheap() is O(N) where N is the number of
} arenas [...] the question is would it slow down in the more normal use
} case of a mostly-empty heap and a lot of function calls
Just as a data point, I tried swapping out the push/pop pair in the
source() function with NEWHEAPS/OLDHEAPS and "time"-ing a full run of
"make check" each way. NEWHEAPS was slightly slower and had higher
CPU use -- but the test scripts aren't a typical shell use pattern.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-14 16:50 ` Bart Schaefer
@ 2015-10-15 4:32 ` Bart Schaefer
2015-10-15 13:03 ` Sebastian Gniazdowski
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-15 4:32 UTC (permalink / raw)
To: zsh-workers
On Oct 14, 9:50am, Bart Schaefer wrote:
} Subject: Re: Slowdown around 5.0.5-dev-0
}
} On Oct 14, 9:25am, Bart Schaefer wrote:
} }
} } NEWHEAPS() is O(1) where pushheap() is O(N) where N is the number of
} } arenas [...] the question is would it slow down in the more normal use
} } case of a mostly-empty heap and a lot of function calls
}
} Just as a data point, I tried swapping out the push/pop pair in the
} source() function with NEWHEAPS/OLDHEAPS [...]
} NEWHEAPS was slightly slower
I've now gone through all the pushheap() in the code ... after ruling out
the cases where the code between push and pop couldn't be neatly wrapped
as a { block }, the only other pair it might make sense to replace is in
sort.c:strmetasort() and even that is questionable.
Given that we have direct evidence that it's beneficial to use NEWHEAPS
in doshfunc() in at least one case, I'll go ahead and commit 36853 and
we can see how it goes.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-15 4:32 ` Bart Schaefer
@ 2015-10-15 13:03 ` Sebastian Gniazdowski
2015-10-16 0:35 ` Bart Schaefer
2015-10-16 0:37 ` Bart Schaefer
0 siblings, 2 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-15 13:03 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 1123 bytes --]
On 15 October 2015 at 06:32, Bart Schaefer <schaefer@brasslantern.com> wrote:
> Given that we have direct evidence that it's beneficial to use NEWHEAPS
> in doshfunc() in at least one case, I'll go ahead and commit 36853 and
> we can see how it goes.
I've written a script that makes some memory testing easy:
https://github.com/psprint/zsh-tools/blob/master/mem-test.zsh
It runs copies of given zshells, which are directed to call specified
test function from the script. The test function then waits for the
main zsh to query memory size (via top). Example execution:
# Tests for zsh
string_test 480
array_test 3.21875
# Tests for zsh-head-both
string_test 5.1953125
array_test 3.05859375
# Tests for zsh-5.1.1-dev-0-clean
string_test 5.76171875
array_test 2.703125
Results are in megabytes of RES memory. The first set is for 5.0.2. It
seems there is something causing high memory usage, which isn't
activated for the array test, only for string_test.
I will write more more stressful functions to grasp any differences.
What test function could stress the recent changes?
Best regards,
Sebastian Gniazdowski
[-- Attachment #2: mem-test.zsh --]
[-- Type: application/octet-stream, Size: 2751 bytes --]
#!/bin/zsh
emulate -L zsh
setopt extendedglob
zshs=( /bin/zsh zsh-head-both zsh-5.1.1-dev-0-clean )
# Convert sizes to number of megabytes
to_mbytes() {
local size="$1"
#echo "Converting $1"
if [[ "$size" = [0-9]#[Mm]* ]]; then
size="${size%[Mm]*}"
elif [[ "$size" = [0-9]#[Kk]* ]]; then
size="${size%[Kk]*}"
(( size = size / 1024.0 ))
elif [[ "$size" = [0-9]# ]]; then
case $( uname ) in
*Linux*)
(( size = size / 1024.0 ))
;;
*)
(( size = size / (1024.0 * 1024.0) ))
;;
esac
else
echo "Bad size occured: $size"
fi
REPLY="$size"
}
#
# Children administration
#
trap "finished" SIGUSR1
finished() {
FINISHED=1
}
# Waits for signal from child process
# Gets its memory size
wait_get_mem() {
while [ "$FINISHED" -eq 0 ]; do
LANG=C sleep 0.5
done
case $( uname ) in
*Darwin*)
output=( "${(@f)"$( top -pid "$SUB_PID" -stats mem -l 1 )"}" )
to_mbytes "$output[-1]"
;;
*Linux*)
output=( "${(@f)"$( top -p "$SUB_PID" -bn 1 )"}" )
output=$output[-1]
output=( $=output )
to_mbytes "$output[6]"
;;
esac
kill -15 "$SUB_PID"
# Suitable for gnuplot - X Y
echo "$TEST $REPLY"
}
_finished_signal_wait() {
kill -SIGUSR1 "$MAIN_PID"
sleep 60
}
#
# Tests
#
tests=( string_test array_test )
string_test() {
local a=""
integer i=50000
while (( i -- )); do a+="$i"; done
_finished_signal_wait
}
array_test() {
typeset -a a
integer i=10000
while (( i -- )); do a+=( $i ); done
_finished_signal_wait
}
#
# Main code
#
# Detect main vs. for-test invocation
if [ -z "$1" ]; then
for current_zsh in "$zshs[@]"; do
type "$current_zsh" 2>/dev/null 1>&2 || { echo >&2 "Skipping non-accessible $current_zsh"; continue }
zsh_binary="${current_zsh##*/}"
echo "# Tests for $zsh_binary"
for test in "$tests[@]"; do
FINISHED=0
TEST="$test"
"$current_zsh" -c "source ./$0 $$ \"$current_zsh\" $test" &
SUB_PID=$!
wait_get_mem
done
echo
echo
done
# Example gnuplot invocation:
#set style data histogram
#set style fill solid border rgb "black"
#plot "result" index 0 using 2: xtic(1), "result" index 1 using 2: xtic(1), "result" index 2 using 2: xtic(1)
else
MAIN_PID="$1"
zsh_binary="${2##*/}"
shift
shift
# Echo status only when output is not to terminal
[ ! -t 1 ] && echo >&2 "Running [$zsh_binary]: $@"
# Run the test
"$@"
fi
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-15 13:03 ` Sebastian Gniazdowski
@ 2015-10-16 0:35 ` Bart Schaefer
2015-10-17 9:12 ` Sebastian Gniazdowski
2015-10-16 0:37 ` Bart Schaefer
1 sibling, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-16 0:35 UTC (permalink / raw)
To: zsh-workers
On Oct 15, 3:03pm, Sebastian Gniazdowski wrote:
}
} https://github.com/psprint/zsh-tools/blob/master/mem-test.zsh
}
} # Tests for zsh-5.1.1-dev-0-clean
} string_test 5.76171875
} array_test 2.703125
}
} Results are in megabytes of RES memory. The first set is for 5.0.2. It
} seems there is something causing high memory usage, which isn't
} activated for the array test, only for string_test.
The array test only iterates 10000 times, the string test 50000 ? Does
that not account for the difference?
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-15 13:03 ` Sebastian Gniazdowski
2015-10-16 0:35 ` Bart Schaefer
@ 2015-10-16 0:37 ` Bart Schaefer
1 sibling, 0 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-16 0:37 UTC (permalink / raw)
To: zsh-workers
On Oct 15, 3:03pm, Sebastian Gniazdowski wrote:
}
} What test function could stress the recent changes?
Something that makes a lot of function calls, some with and some
without declaring local variables, and to different depths of recursion.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-16 0:35 ` Bart Schaefer
@ 2015-10-17 9:12 ` Sebastian Gniazdowski
2015-10-17 9:24 ` Sebastian Gniazdowski
2015-10-18 16:19 ` Bart Schaefer
0 siblings, 2 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-17 9:12 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 659 bytes --]
On 16 October 2015 at 02:35, Bart Schaefer <schaefer@brasslantern.com> wrote:
> The array test only iterates 10000 times, the string test 50000 ? Does
> that not account for the difference?
Apparently not. I've modified the mem-test script so that it queries
usage of memory also during the tests. It can be seen that for strings
(and zsh 5.0.2) the memory is high from the beginning.
It can be also seen that the recent three patches increase memory
usage. A clean, free from the patches version of zsh
(5.1.1-dev-0-clean) keeps the memory always low. What's good news is
that the memory usage is lower than in 5.0.2.
Best regards,
Sebastian Gniazdowski
[-- Attachment #2: mem-test.txt --]
[-- Type: text/plain, Size: 866 bytes --]
# Tests for zsh
# string_test 453, 571, 650, 711, 758, 800, 835,
string_test 835
# array_test 2.2, 2.9, 3.5, 3.9, 4.7, 5.4, 6.4, 7.5, 7.9, 8.1, 8.6, 8.8, 8.9, 9.2, 9.4, 9.7, 10, 10, 10,
array_test 10
# Tests for zsh-head-both
# string_test 109, 238, 367, 497, 625, 8.9,
string_test 8.9
# array_test 2.1, 3, 3.5, 4, 4.7, 5.8, 6.7, 7.8, 8.2, 8.5, 8.7, 9, 9.2, 9.5, 9.7, 10, 10, 10, 10,
array_test 10
# Tests for zsh-newheaps-three-patches
# string_test 111, 245, 376, 509, 637, 8.9,
string_test 8.9
# array_test 2, 2.8, 3.7, 4.1, 4.7, 5.2, 6.2, 7.2, 7.9, 8.1, 8.4, 8.7, 8.9, 9.2, 9.4, 9.7, 10, 10, 10, 10,
array_test 10
# Tests for zsh-5.1.1-dev-0-clean
# string_test 5.7, 8, 9.5, 11, 11, 10, 10, 11, 11, 11, 12, 11, 10,
string_test 10
# array_test 2, 2.9, 3.5, 3.9, 4.3, 4.6, 5.6, 6.6, 7.3, 7.7, 8, 8.2, 8.5, 8.7, 8.9, 9.1, 9.3, 9.5, 9.7, 10, 10,
array_test 10
[-- Attachment #3: mem-test.png --]
[-- Type: image/png, Size: 89441 bytes --]
[-- Attachment #4: mem-test.zsh --]
[-- Type: application/octet-stream, Size: 3014 bytes --]
#!/bin/zsh
emulate -L zsh
setopt extendedglob
zshs=( /bin/zsh zsh-head-both zsh-newheaps-three-patches zsh-5.1.1-dev-0-clean )
# Convert sizes to number of megabytes
to_mbytes() {
local size="$1"
if [[ "$size" = [0-9]#[Mm]* ]]; then
size="${size%[Mm]*}"
elif [[ "$size" = [0-9]#[Kk]* ]]; then
size="${size%[Kk]*}"
(( size = size / 1024.0 ))
elif [[ "$size" = [0-9]# ]]; then
case $( uname ) in
*Linux*)
(( size = size / 1024.0 ))
;;
*)
(( size = size / (1024.0 * 1024.0) ))
;;
esac
else
echo "Bad size occured: $size"
fi
REPLY="$size"
}
#
# Children administration
#
trap "finished" SIGUSR1
finished() {
FINISHED=1
}
# Gets memory size of the zsh that runs the test
get_mem() {
case $( uname ) in
*Darwin*)
output=( "${(@f)"$( top -pid "$SUB_PID" -stats mem -l 1 )"}" )
to_mbytes "$output[-1]"
;;
*Linux*)
output=( "${(@f)"$( top -p "$SUB_PID" -bn 1 )"}" )
output=$output[-1]
output=( $=output )
to_mbytes "$output[6]"
;;
esac
}
# Waits for signal from child process
wait_get_mem() {
local number
echo -n "# $TEST "
while [ "$FINISHED" -eq 0 ]; do
sleep 3
get_mem
number=`LANG=C printf "%.1f" "$REPLY"`
echo -n "${number%.0}, "
done
get_mem
kill -15 "$SUB_PID"
# Suitable for gnuplot - X Y
number=`LANG=C printf "%.1f" "$REPLY"`
echo
echo "$TEST" "${number%.0}"
}
_finished_signal_wait() {
kill -SIGUSR1 "$MAIN_PID"
sleep 60
}
#
# Tests
#
tests=( string_test array_test )
string_test() {
local a=""
integer i=150000
repeat $i; do a+="$i"; done
_finished_signal_wait
}
array_test() {
typeset -a a
integer i=30000
repeat $i; do a+=( $i ); done
_finished_signal_wait
}
#
# Main code
#
# Detect main vs. for-test invocation
if [ -z "$1" ]; then
for current_zsh in "$zshs[@]"; do
type "$current_zsh" 2>/dev/null 1>&2 || { echo >&2 "Skipping non-accessible $current_zsh"; continue }
zsh_binary="${current_zsh##*/}"
echo "# Tests for $zsh_binary"
for test in "$tests[@]"; do
FINISHED=0
TEST="$test"
"$current_zsh" -c "source ./$0 $$ \"$current_zsh\" $test" &
SUB_PID=$!
wait_get_mem
done
echo
echo
done
# Example gnuplot invocation:
#set style data histogram
#set style fill solid border rgb "black"
#plot "result" index 0 using 2: xtic(1), "result" index 1 using 2: xtic(1), "result" index 2 using 2: xtic(1)
else
MAIN_PID="$1"
zsh_binary="${2##*/}"
shift
shift
# Echo status only when output is not to terminal
[ ! -t 1 ] && echo >&2 "Running [$zsh_binary]: $@"
# Run the test
eval "run_test() { $@ }"
run_test
fi
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-17 9:12 ` Sebastian Gniazdowski
@ 2015-10-17 9:24 ` Sebastian Gniazdowski
2015-10-18 16:19 ` Bart Schaefer
1 sibling, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-17 9:24 UTC (permalink / raw)
To: zsh-workers
On 17 October 2015 at 11:12, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> (5.1.1-dev-0-clean) keeps the memory always low. What's good news is
> that the memory usage is lower than in 5.0.2.
Also, the patched versions drop down memory usage as soon as test
function finishes doing its work.
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-17 9:12 ` Sebastian Gniazdowski
2015-10-17 9:24 ` Sebastian Gniazdowski
@ 2015-10-18 16:19 ` Bart Schaefer
2015-10-18 20:40 ` Sebastian Gniazdowski
1 sibling, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-18 16:19 UTC (permalink / raw)
To: Sebastian Gniazdowski; +Cc: Zsh hackers list
On Sat, Oct 17, 2015 at 2:12 AM, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> On 16 October 2015 at 02:35, Bart Schaefer <schaefer@brasslantern.com> wrote:
>> The array test only iterates 10000 times, the string test 50000 ? Does
>> that not account for the difference?
>
> Apparently not. I've modified the mem-test script so that it queries
> usage of memory also during the tests. It can be seen that for strings
> (and zsh 5.0.2) the memory is high from the beginning.
I'm strongly inclined to think that's more likely an artifact of the
testing method than of the test itself -- have you tried running the
tests in the opposite order? You also haven't explained your
reasoning for having the string test iterate so many more times than
the array test. Does it have to do with keeping total run time
similar?
> It can be also seen that the recent three patches increase memory
> usage. A clean, free from the patches version of zsh
> (5.1.1-dev-0-clean) keeps the memory always low.
That's entirely to be expected -- all three patches increase speed by
avoiding the work necessary to minimize memory footprint, until (as
you noted in next message) the execution stack is unwound.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-18 16:19 ` Bart Schaefer
@ 2015-10-18 20:40 ` Sebastian Gniazdowski
2015-10-18 21:07 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-18 20:40 UTC (permalink / raw)
To: zsh-workers
On 18 October 2015 at 18:19, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sat, Oct 17, 2015 at 2:12 AM, Sebastian Gniazdowski
> <sgniazdowski@gmail.com> wrote:
>> On 16 October 2015 at 02:35, Bart Schaefer <schaefer@brasslantern.com> wrote:
>>> The array test only iterates 10000 times, the string test 50000 ? Does
>>> that not account for the difference?
>>
>> Apparently not. I've modified the mem-test script so that it queries
>> usage of memory also during the tests. It can be seen that for strings
>> (and zsh 5.0.2) the memory is high from the beginning.
>
> I'm strongly inclined to think that's more likely an artifact of the
> testing method than of the test itself have you tried running the
> tests in the opposite order?
The test script is quite well written. Here is how a test is being run:
"$current_zsh" -c "source ./$0 $$ \"$current_zsh\" $test" &
It starts new zsh process – the one in "$current_zsh" variable – and
feeds it with the same script file, giving in arguments pid of main
Zsh, name of tested Zsh and name of test function to run. This should
imply full separation of the tests. That's why order shouldn't matter.
However I did reverse the order to just check and results are the
same. The memory profile of three-patches version is thus:
# string_test 41, 164, 279, 388, 504, 602, 707, last: 11
You are quite right that the memory usage isn't "high from the
beginning". It starts from 41 MB, goes to 164, 279, etc. in quite
controlled way. Previous version of the script awaited 3 seconds
before doing first check, that's why it started from values of 100 MB,
and I rather misinterpreted this. How to interpret these still high
values is now open.
> You also haven't explained your
> reasoning for having the string test iterate so many more times than
> the array test. Does it have to do with keeping total run time
> similar?
Yes that's the reason. I now do even more iterations to make the test
last. During this time I query memory size, obtaining a nice sequence
of memory usages. It nicely illustrates behavior of Zsh.
>> It can be also seen that the recent three patches increase memory
>> usage. A clean, free from the patches version of zsh
>> (5.1.1-dev-0-clean) keeps the memory always low.
>
> That's entirely to be expected -- all three patches increase speed by
> avoiding the work necessary to minimize memory footprint, until (as
> you noted in next message) the execution stack is unwound.
The script now computes averages. Average of 5.0.2 is 622.3, of
two-patches version 327.7, of the three patches (third is the new
heaps) 329.7. So it's better than without the patches, having in mind
also the quick drop after stack is unwound.
Tried a new test function:
function_test() {
local count
if [ -z "$1" ]; then
repeat 10000; do function_test 100; done
_finished_signal_wait
else
count="$1"
fi
if (( count -- > 0 )); then
function_test "$count"
fi
}
Its memory usage is low for all Zsh versions: ~2.5 MB, (e.g. 2.7, 2.9,
2.8, 2.6, 2.9, 2.8, 2.6, 2.9, 2.9, 2.8).
Best Regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-18 20:40 ` Sebastian Gniazdowski
@ 2015-10-18 21:07 ` Bart Schaefer
2015-10-18 21:31 ` Sebastian Gniazdowski
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-18 21:07 UTC (permalink / raw)
To: Sebastian Gniazdowski; +Cc: Zsh hackers list
[-- Attachment #1: Type: text/plain, Size: 1705 bytes --]
On Oct 18, 2015 1:40 PM, "Sebastian Gniazdowski" <sgniazdowski@gmail.com>
wrote:
>
> On 18 October 2015 at 18:19, Bart Schaefer <schaefer@brasslantern.com>
wrote:
> >
> > I'm strongly inclined to think that's more likely an artifact of the
> > testing method than of the test itself have you tried running the
> > tests in the opposite order?
>
> It starts new zsh process – the one in "$current_zsh" variable – and
> feeds it with the same script file, giving in arguments pid of main
> Zsh, name of tested Zsh and name of test function to run. This should
> imply full separation of the tests. That's why order shouldn't matter.
Sorry, I didn't mean order of zsh versions, I meant did you try doing
array_test before string_test within each version.
> You are quite right that the memory usage isn't "high from the
> beginning". It starts from 41 MB, goes to 164, 279, etc. in quite
> controlled way. Previous version of the script awaited 3 seconds
> before doing first check, that's why it started from values of 100 MB,
> and I rather misinterpreted this. How to interpret these still high
> values is now open.
String is stored as one contiguous block of bytes, so each time it exceeds
the size of the current heap block an entire new large-enough block is
allocated, and no previous block can never be re-used because the string
never shrinks.
> Tried a new test function:
>
> Its memory usage is low for all Zsh versions: ~2.5 MB, (e.g. 2.7, 2.9,
> 2.8, 2.6, 2.9, 2.8, 2.6, 2.9, 2.9, 2.8).
For functions the test would also need to compare run time / CPU usage - I
would not expect memory usage to be interesting unless the function
declared locals.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-18 21:07 ` Bart Schaefer
@ 2015-10-18 21:31 ` Sebastian Gniazdowski
2015-10-19 17:21 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-18 21:31 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 943 bytes --]
On 18 October 2015 at 23:07, Bart Schaefer <schaefer@brasslantern.com> wrote:
> Sorry, I didn't mean order of zsh versions, I meant did you try doing
> array_test before string_test within each version.
That's what I did
> String is stored as one contiguous block of bytes, so each time it exceeds
> the size of the current heap block an entire new large-enough block is
> allocated, and no previous block can never be re-used because the string
> never shrinks.
How can 5.1.1-dev-0 with no patches maintain low memory usage for strings then?
# string_test 3.3, 7.8, 9.4, 11, 11, 11, 11, 11, 11, 12, 11, 12, 11,
12, last: 11
> For functions the test would also need to compare run time / CPU usage - I
> would not expect memory usage to be interesting unless the function declared
> locals.
The function test has two samples more (7 vs. 9), taken in intervals
of 3 seconds, so there is a difference.
Best regards,
Sebastian Gniazdowski
[-- Attachment #2: mem-test2.txt --]
[-- Type: text/plain, Size: 1165 bytes --]
# Tests for zsh
# array_test 1.5, 2.6, 3.3, 4, 4.5, 4.8, 5.7, 6.4, 7.5, 7.9, 8.1, 8.4, 8.7, 8.9, last: 9
array_test 5.6
# string_test 361, 509, 608, 676, 728, 771, 806, last: 836
string_test 608.8
# function_test 2.7, 2.7, 2.7, 2.7, 2.7, 2.7, 2.7, last: 2.7
function_test 2.7
# Tests for zsh-head-both
# array_test 1.5, 2.6, 3.2, 4, 4.3, 4.8, 6, 7, 7.7, 8.2, 8.4, 8.7, 9, last: 9.1
array_test 5.5
# string_test 44, 173, 302, 431, 560, 681, last: 10
string_test 302.0
# function_test 2.5, 2.6, 2.6, 2.5, 2.6, 2.6, 2.6, last: 2.5
function_test 2.6
# Tests for zsh-newheaps-three-patches
# array_test 1.5, 2.7, 3.4, 3.9, 4.3, 4.8, 5.9, 6.6, 7.6, 8, 8.3, 8.6, 8.9, 9.1, last: 9.1
array_test 5.7
# string_test 46, 173, 290, 402, 511, 614, 707, last: 10
string_test 339.3
# function_test 2.9, 2.8, 2.6, 2.6, 2.8, 2.7, 2.6, 2.8, 2.9, last: 2.6
function_test 2.7
# Tests for zsh-5.1.1-dev-0-clean
# array_test 1.5, 2.6, 3.3, 4, 4.5, 5, 6.3, 7, 7.9, 8.2, 8.5, 8.7, 9, last: 9.1
array_test 5.6
# string_test 3.1, 7.3, 9, 11, 11, 11, 11, 11, 11, 12, 11, 12, 12, 11, last: 11
string_test 10.2
# function_test 2.6, 2.5, 2.6, 2.6, 2.6, 2.6, 2.5, last: 2.5
function_test 2.6
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-18 21:31 ` Sebastian Gniazdowski
@ 2015-10-19 17:21 ` Bart Schaefer
2015-10-22 12:49 ` Sebastian Gniazdowski
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-19 17:21 UTC (permalink / raw)
To: zsh-workers
On Oct 18, 11:31pm, Sebastian Gniazdowski wrote:
}
} > String is stored as one contiguous block of bytes, so each time it exceeds
} > the size of the current heap block an entire new large-enough block is
} > allocated, and no previous block can never be re-used because the string
} > never shrinks.
}
} How can 5.1.1-dev-0 with no patches maintain low memory usage for
} strings then?
It's probably this:
+ h->next = NULL;
+ } else {
+ /* Leave an empty arena at the end until popped */
+ h->used = 0;
+ fheap = hl = h;
+ break;
+ }
+ if (fheap == h)
We leave that empty arena at the end, but then on the next zhalloc()
it's still not big enough, so another one gets allocated.
You could try removing that "else" to let the trailing block be freed
again, but then we're back to needing to compare speed as well as space
across shell versions.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-19 17:21 ` Bart Schaefer
@ 2015-10-22 12:49 ` Sebastian Gniazdowski
2015-10-22 15:00 ` Bart Schaefer
2015-10-23 6:32 ` Sebastian Gniazdowski
0 siblings, 2 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-22 12:49 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 1990 bytes --]
On 19 October 2015 at 19:21, Bart Schaefer <schaefer@brasslantern.com> wrote:
> We leave that empty arena at the end, but then on the next zhalloc()
> it's still not big enough, so another one gets allocated.
>
> You could try removing that "else" to let the trailing block be freed
> again, but then we're back to needing to compare speed as well as space
> across shell versions.
I removed the else, code works longer and memory is still allocated.
This is seen in mem-test3.txt, where there are many samples and large
numbers, and in perf-test.txt, where there noted is long running time
(numbers are milliseconds)
I wrote perf-test.zsh to automatically perform performance tests:
https://github.com/psprint/zsh-tools/
With it and with mem-test.zsh I tested following zsh variants:
- 5.0.2 osx
- newheaps patch (zsh-newheaps-only)
- newheaps and zhalloc patch 36836 (zsh-newheaps-zhalloc)
- zhalloc and the previous patch 36834 (zsh-head-both-patches)
- all three patches (zsh-newheaps-three-patches)
- a 5.1.1-dev-0 clean zsh
- the empty arena else removed
My conclusions:
- 36834 introduces high memory usage; I would suggest to use only two
optimizations – newheaps and zhalloc; my znavtools are instant fast
with them, what's slower is searching (wrote search_test function,
results attached); basically half of pattern search speedup is lost
with 36834, however what's gained is beautiful memory usage, without
firefox-like values for RES (~700 MB); newheaps is responsible for
instant-responsivity of my script, 36836 for much faster searching –
that's a good compromise
- searching with patterns through very large arrays (700k) allocates
264 MB of memory regardless of zsh version; this probably tells
something
- still not sure why 36834 allocates much memory for the string test
- newheaps makes function calls longer, but it's not a substantial
difference – 21196 ms vs. 18549 ms
Best regards,
Sebastian Gniazdowski
[-- Attachment #2: mem-test3.txt --]
[-- Type: text/plain, Size: 2620 bytes --]
# Tests for zsh-empty-arena-else
# string_test 51, 152, 206, 247, 280, 305, 327, 345, 363, 380, 394, 407, 421, 432, 444, 455, 464, 474, 483, 493, 502, 509, 518, 525, 534, 541, 549, 555, 560, 569, 575, 581, last: 586
string_test 421.3
# array_test 1, 2.7, 2.9, 3.8, 4.3, 5.6, 6.5, 7.6, 8, 8.4, 8.7, 8.9, last: 9.1
array_test 5.4
# function_test 2.6, 2.9, 2.7, 2.9, 2.9, 2.8, 2.9, 2.7, last: 2.6
function_test 2.8
# search_test 110, 215, last: 264
search_test 110.0
# Tests for zsh-newheaps-zhalloc
# string_test 2.9, 7.7, 9.3, 11, 11, 11, 12, 11, 12, 11, 11, 11, 12, last: 11
string_test 10.1
# array_test 1, 2, 3.1, 3.3, 4.5, 5.3, 6.3, 7.5, 7.8, 8.2, 8.5, 8.7, last: 8.9
array_test 5.2
# function_test 2.7, 2.7, 2.8, 2.6, 2.6, 2.9, 2.9, 3, last: 2.6
function_test 2.7
# search_test 86, 158, 200, 237, last: 265
search_test 148.0
# Tests for zsh-newheaps-only
# string_test 3, 7.7, 9.1, 11, 11, 11, 12, 11, 11, 11, 12, 11, 11, last: 10
string_test 10.1
# array_test 1.4, 2, 3, 3.8, 4.4, 5.3, 6.3, 7, 7.7, 8, 8.3, 8.5, last: 8.8
array_test 5.2
# function_test 2.7, 2.6, 2.8, 2.7, 2.9, 2.7, 2.7, 2.7, last: 2.6
function_test 2.7
# search_test 62, 122, 144, 171, 196, 218, 238, 259, last: 266
search_test 164.4
# Tests for zsh-5.0.2-osx
# string_test 2.5, 7.4, 8.9, 11, 11, 12, 11, 11, 11, 11, 12, 12, last: 10
string_test 9.9
# array_test 1.4, 2.2, 3, 4, 4.6, 5.3, 6.3, 7.4, 7.8, 8, 8.4, 8.7, last: 8.8
array_test 5.3
# function_test 2.6, 2.6, 2.6, 2.6, 2.6, 2.6, 2.6, last: 2.6
function_test 2.6
# search_test 59, 122, 152, 179, 204, 226, 245, last: 264
search_test 157.0
# Tests for zsh-head-both-patches
# string_test 40, 182, 318, 456, 589, 725, last: 9.7
string_test 317.0
# array_test 1.1, 2.5, 3, 3.8, 4.3, 5, 6, 7, 7.5, 7.8, 8.2, 8.3, last: 8.5
array_test 5.1
# function_test 2.6, 2.6, 2.6, 2.6, 2.5, 2.6, 2.6, last: 2.5
function_test 2.6
# search_test 122, 217, last: 266
search_test 122.0
# Tests for zsh-newheaps-three-patches
# string_test 44, 188, 331, 471, 612, last: 8.9
string_test 258.5
# array_test 1.1, 2, 3.1, 3.8, 4.4, 5.1, 6.2, 7.2, 7.7, 8, 8.2, 8.6, last: 8.7
array_test 5.2
# function_test 2.7, 2.7, 2.9, 2.8, 2.9, 2.7, 2.9, 2.6, last: 2.6
function_test 2.8
# search_test 116, 216, last: 265
search_test 116.0
# Tests for zsh-5.1.1-dev-0-clean
# string_test 2.6, 7.3, 8.8, 11, 11, 11, 10, 11, 12, 11, 12, 11, last: 10
string_test 9.8
# array_test 1.3, 2.2, 2.3, 3.4, 4.2, 5, 6.1, 6.9, 7.6, 7.8, 8.1, 8.4, last: 8.6
array_test 5.0
# function_test 2.6, 2.6, 2.6, 2.6, 2.6, 2.6, last: 2.6
function_test 2.6
# search_test 62, 121, 151, 177, 201, 223, 243, 264, last: 265
search_test 168.3
[-- Attachment #3: perf-test.txt --]
[-- Type: text/plain, Size: 1487 bytes --]
Running [zsh-empty-arena-else]: string_test 13749,47
Running [zsh-empty-arena-else]: array_test 12182,06
Running [zsh-empty-arena-else]: function_test 21345,66
Running [zsh-empty-arena-else]: search_test 3849,80
Running [zsh-newheaps-zhalloc]: string_test 9990,75
Running [zsh-newheaps-zhalloc]: array_test 12109,69
Running [zsh-newheaps-zhalloc]: function_test 21318,52
Running [zsh-newheaps-zhalloc]: search_test 6842,58
Running [zsh-newheaps-only]: string_test 9999,97
Running [zsh-newheaps-only]: array_test 11925,05
Running [zsh-newheaps-only]: function_test 21196,27
Running [zsh-newheaps-only]: search_test 11670,71
Running [zsh-5.0.2-osx]: string_test 9628,87
Running [zsh-5.0.2-osx]: array_test 12294,85
Running [zsh-5.0.2-osx]: function_test 18668,42
Running [zsh-5.0.2-osx]: search_test 11518,38
Running [zsh-head-both-patches]: string_test 4385,93
Running [zsh-head-both-patches]: array_test 11933,38
Running [zsh-head-both-patches]: function_test 18549,57
Running [zsh-head-both-patches]: search_test 3949,40
Running [zsh-newheaps-three-patches]: string_test 4351,58
Running [zsh-newheaps-three-patches]: array_test 12111,25
Running [zsh-newheaps-three-patches]: function_test 21356,30
Running [zsh-newheaps-three-patches]: search_test 4063,73
Running [zsh-5.1.1-dev-0-clean]: string_test 9651,08
Running [zsh-5.1.1-dev-0-clean]: array_test 12749,21
Running [zsh-5.1.1-dev-0-clean]: function_test 18667,25
Running [zsh-5.1.1-dev-0-clean]: search_test 11386,36
[-- Attachment #4: mem-test.zsh --]
[-- Type: application/octet-stream, Size: 4111 bytes --]
#!/bin/zsh
emulate -L zsh
setopt extendedglob
zshs=( zsh-empty-arena-else zsh-newheaps-zhalloc zsh-newheaps-only )
zshs+=( zsh-5.0.2-osx zsh-head-both-patches zsh-newheaps-three-patches zsh-5.1.1-dev-0-clean )
# Convert sizes to number of megabytes
to_mbytes() {
local size="$1"
#echo Converting: "$1"
if [[ "$size" = [0-9]#[Mm]* ]]; then
size="${size%[Mm]*}"
elif [[ "$size" = [0-9]#[Kk]* ]]; then
size="${size%[Kk]*}"
(( size = size / 1024.0 ))
elif [[ "$size" = [0-9]# ]]; then
case $( uname ) in
*Linux*)
(( size = size / 1024.0 ))
;;
*)
(( size = size / (1024.0 * 1024.0) ))
;;
esac
else
echo "Bad size occured: $size"
fi
REPLY="$size"
}
#
# Children administration
#
trap "finished" SIGUSR1
finished() {
FINISHED=1
}
# Gets memory size of the zsh that runs the test
get_mem() {
case $( uname ) in
*Darwin*)
output=( "${(@f)"$( top -pid "$SUB_PID" -stats mem -l 1 )"}" )
to_mbytes "$output[-1]"
;;
*Linux*)
output=( "${(@f)"$( top -p "$SUB_PID" -bn 1 )"}" )
output=$output[-1]
output=( $=output )
to_mbytes "$output[6]"
;;
esac
}
# Waits for signal from child process
wait_get_mem() {
local number
float average=0.0
integer count=0
echo -n "# $TEST "
sleep 1
while [ "$FINISHED" -eq 0 ]; do
get_mem
number=`LANG=C printf "%.1f" "$REPLY"`
echo -n "${number%.0}, "
average+=number
count+=1
sleep 3
done
# Remove last result - to be certain that the average
# is only above data from actively working test function
if [ "$count" -gt 1 ]; then
average=average-number
count=count-1
fi
average=average/count
# Take the explicit last result, it's telling
# how zsh behaves when computation is stopped
sleep 1
get_mem
number=`LANG=C printf "%.1f" "$REPLY"`
echo "last: ${number%.0}"
kill -15 "$SUB_PID"
# Suitable for gnuplot - X Y
LANG=C printf "$TEST %.1f\n" $average
}
_finished_signal_wait() {
kill -SIGUSR1 "$MAIN_PID"
sleep 60
}
#
# Tests
#
tests=( string_test array_test function_test search_test )
string_test() {
local a=""
integer i=150000
repeat $i; do a+="$i"; done
_finished_signal_wait
}
array_test() {
typeset -a a
integer i=25000
repeat $i; do a+=( $i ); done
_finished_signal_wait
}
function_test() {
local count
if [ -z "$1" ]; then
repeat 10000; do function_test 100; done
_finished_signal_wait
else
count="$1"
fi
if (( count -- > 0 )); then
function_test "$count"
fi
}
search_test() {
integer elements=700000
a="${(r:elements:: _:)b}"
a=( $=a )
a=( "${(@M)a:#(#i)*_*}" )
a=( "${(@)a//(#mi)(_|a)/-${MATCH}-}" )
a=( "${(@)a//(#bi)(_|-)/|${match[1]}|}" )
_finished_signal_wait
}
#
# Main code
#
# Detect main vs. for-test invocation
if [ -z "$1" ]; then
for current_zsh in "$zshs[@]"; do
type "$current_zsh" 2>/dev/null 1>&2 || { echo >&2 "Skipping non-accessible $current_zsh"; continue }
zsh_binary="${current_zsh##*/}"
echo "# Tests for $zsh_binary"
for test in "$tests[@]"; do
FINISHED=0
TEST="$test"
"$current_zsh" -c "source ./$0 $$ \"$current_zsh\" $test" &
SUB_PID=$!
wait_get_mem
done
echo
echo
done
# Example gnuplot invocation:
#set style data histogram
#set style fill solid border rgb "black"
#plot "result" index 0 using 2: xtic(1), "result" index 1 using 2: xtic(1), "result" index 2 using 2: xtic(1)
else
MAIN_PID="$1"
zsh_binary="${2##*/}"
shift
shift
# Echo status only when output is not to terminal
[ ! -t 1 ] && echo >&2 "Running [$zsh_binary]: $@"
# Run the test
eval "run_test() { $@ }"
run_test
fi
[-- Attachment #5: perf-test.zsh --]
[-- Type: application/octet-stream, Size: 2168 bytes --]
#!/bin/zsh
emulate -L zsh
setopt extendedglob
zmodload zsh/zprof
zshs=( zsh-empty-arena-else zsh-newheaps-zhalloc zsh-newheaps-only )
zshs+=( zsh-5.0.2-osx zsh-head-both-patches zsh-newheaps-three-patches zsh-5.1.1-dev-0-clean )
#
# Children administration
#
trap "finished" SIGUSR1
finished() {
FINISHED=1
}
# Waits for signal from child process
wait_for_end_of_test() {
while [ "$FINISHED" -eq 0 ]; do
sleep 1
done
kill -15 "$SUB_PID"
}
_finished_signal_wait() {
kill -SIGUSR1 "$MAIN_PID"
sleep 60
}
#
# Tests
#
tests=( string_test array_test function_test search_test )
float multiplier=0.5
string_test() {
local a=""
integer i=$(( 150000*multiplier ))
repeat $i; do a+="$i"; done
}
array_test() {
typeset -a a
integer i=$(( 25000*multiplier ))
repeat $i; do a+=( $i ); done
}
function_test() {
local count
integer i=$(( 10000*multiplier ))
if [ -z "$1" ]; then
repeat $i; do function_test 100; done
else
count="$1"
fi
if (( count -- > 0 )); then
function_test "$count"
fi
}
search_test() {
integer elements=$(( 800000 * multiplier ))
a="${(r:elements:: _:)b}"
a=( $=a )
a=( "${(@M)a:#(#i)*_*}" )
a=( "${(@)a//(#mi)(_|a)/-${MATCH}-}" )
a=( "${(@)a//(#bi)(_|-)/|${match[1]}|}" )
}
#
# Main code
#
# Detect main vs. for-test invocation
if [ -z "$1" ]; then
for current_zsh in "$zshs[@]"; do
type "$current_zsh" 2>/dev/null 1>&2 || { echo >&2 "Skipping non-accessible $current_zsh"; continue }
zsh_binary="${current_zsh##*/}"
for test in "$tests[@]"; do
FINISHED=0
TEST="$test"
"$current_zsh" -c "source ./$0 $$ \"$current_zsh\" $test" &
SUB_PID=$!
wait_for_end_of_test
done
echo
echo
done
else
MAIN_PID="$1"
zsh_binary="${2##*/}"
shift
shift
echo -n "Running [$zsh_binary]: $@ "
# Run the test
zprof -c
"$@"
zprof_out=( "${(@f)"$( zprof )"}" )
zprof_out="$zprof_out[3]"
zprof_out=( $=zprof_out )
echo "$zprof_out[3]"
_finished_signal_wait
fi
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-22 12:49 ` Sebastian Gniazdowski
@ 2015-10-22 15:00 ` Bart Schaefer
2015-10-22 16:28 ` Sebastian Gniazdowski
2015-10-23 15:40 ` Sebastian Gniazdowski
2015-10-23 6:32 ` Sebastian Gniazdowski
1 sibling, 2 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-22 15:00 UTC (permalink / raw)
To: zsh-workers
On Oct 22, 2:49pm, Sebastian Gniazdowski wrote:
}
} My conclusions:
} - 36834 introduces high memory usage; I would suggest to use only two
} optimizations -- newheaps and zhalloc; my znavtools are instant fast
} with them, what's slower is searching (wrote search_test function,
} results attached)
My apologies, but I'm not sure I followed all of that. Can you produce
a diff against commit 8e9a68ad14655c1949b1a04d5715a5caa8c344ee that
shows the situation you think is the best so far?
} - still not sure why 36834 allocates much memory for the string test
Seeing the aforementioned diff may provide a hint.
} - newheaps makes function calls longer, but it's not a substantial
} difference
14% is pretty substantial, but it's probably all gained back if the
function does anything significant with local variables.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-22 15:00 ` Bart Schaefer
@ 2015-10-22 16:28 ` Sebastian Gniazdowski
2015-10-22 16:33 ` Sebastian Gniazdowski
2015-10-23 15:40 ` Sebastian Gniazdowski
1 sibling, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-22 16:28 UTC (permalink / raw)
To: zsh-workers
On 22 October 2015 at 17:00, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 22, 2:49pm, Sebastian Gniazdowski wrote:
> }
> } My conclusions:
> } - 36834 introduces high memory usage; I would suggest to use only two
> } optimizations -- newheaps and zhalloc; my znavtools are instant fast
> } with them, what's slower is searching (wrote search_test function,
> } results attached)
>
> My apologies, but I'm not sure I followed all of that. Can you produce
> a diff against commit 8e9a68ad14655c1949b1a04d5715a5caa8c344ee that
> shows the situation you think is the best so far?
Patch 9f8e3e8 (36834) and 643aca9 (lost part of 36834) needs to be
reverted. The patches that should stay are 36836 (nicknamed zhalloc by
me) and 36853 (nicknamed newheaps). 9f8e3e8 doesn't revert
automatically, I can probably do it later if you won't.
> } - newheaps makes function calls longer, but it's not a substantial
> } difference
>
> 14% is pretty substantial, but it's probably all gained back if the
> function does anything significant with local variables.
Yes, I strongly looked forward at instant responsivity ignoring the
loss. Wanted to do some real world test – find fully zsh-based
distribution, switch Zsh between newheaps version and no-newheaps
version, compare boot times, but there isn't such distribution. Peter
described one possible real world test, but not sure how to do it, I
might think about it again.
Function calls aren't that fast anyway, are they? When optimizing ZNT
I pasted an expand_tabs() function instead of calling it 30k times
because even with the body of the function literally empty, zprof
still reported 10-15% use of CPU time. I might later revert ZNT to
that commit to compare how this behaves now, it might be surprisingly
better than worse.
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-22 16:28 ` Sebastian Gniazdowski
@ 2015-10-22 16:33 ` Sebastian Gniazdowski
0 siblings, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-22 16:33 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 225 bytes --]
PS. I also attach diff with the two should-stay patches, it has the
state of the code that I used to build zsh-newheaps-zhalloc. It
contains newheaps and zhalloc patch, and not the 36834.
Best regards,
Sebastian Gniazdowski
[-- Attachment #2: newheaps_zhalloc.patch --]
[-- Type: application/octet-stream, Size: 1408 bytes --]
diff --git a/Src/exec.c b/Src/exec.c
index 154bbb8..76af7e9 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -5066,10 +5066,11 @@ doshfunc(Shfunc shfunc, LinkList doshargs, int noreturnval)
#ifdef MAX_FUNCTION_DEPTH
static int funcdepth;
#endif
+ Heap funcheap;
queue_signals(); /* Lots of memory and global state changes coming */
- pushheap();
+ NEWHEAPS(funcheap) {
oargv0 = NULL;
obreaks = breaks;
@@ -5289,7 +5290,8 @@ doshfunc(Shfunc shfunc, LinkList doshargs, int noreturnval)
numpipestats = oldnumpipestats;
memcpy(pipestats, oldpipestats, sizeof(int)*numpipestats);
}
- popheap();
+
+ } OLDHEAPS;
unqueue_signals();
diff --git a/Src/mem.c b/Src/mem.c
index b9569ea..63a5afe 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -543,9 +543,14 @@ zhalloc(size_t size)
/* find a heap with enough free space */
- for (h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
- ? fheap : heaps);
- h; h = h->next) {
+ /*
+ * This previously assigned:
+ * h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
+ * ? fheap : heaps);
+ * but we think that nothing upstream of fheap has more free space,
+ * so why start over at heaps just because fheap has too little?
+ */
+ for (h = (fheap ? fheap : heaps); h; h = h->next) {
if (ARENA_SIZEOF(h) >= (n = size + h->used)) {
void *ret;
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-22 12:49 ` Sebastian Gniazdowski
2015-10-22 15:00 ` Bart Schaefer
@ 2015-10-23 6:32 ` Sebastian Gniazdowski
1 sibling, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-23 6:32 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 2540 bytes --]
I ran the performance tests again with script formatting them
differently – they are aggregated on test function, not on zsh
version. This allows easier interpretation. A new conclusion is that
zhalloc patch doesn't help for string_test. So the first patch (36834)
speeds up string operations and also introduces increased memory
usage.
Best regards,
Sebastian Gniazdowski
On 22 October 2015 at 14:49, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> On 19 October 2015 at 19:21, Bart Schaefer <schaefer@brasslantern.com> wrote:
>> We leave that empty arena at the end, but then on the next zhalloc()
>> it's still not big enough, so another one gets allocated.
>>
>> You could try removing that "else" to let the trailing block be freed
>> again, but then we're back to needing to compare speed as well as space
>> across shell versions.
>
> I removed the else, code works longer and memory is still allocated.
> This is seen in mem-test3.txt, where there are many samples and large
> numbers, and in perf-test.txt, where there noted is long running time
> (numbers are milliseconds)
>
> I wrote perf-test.zsh to automatically perform performance tests:
> https://github.com/psprint/zsh-tools/
>
> With it and with mem-test.zsh I tested following zsh variants:
> - 5.0.2 osx
> - newheaps patch (zsh-newheaps-only)
> - newheaps and zhalloc patch 36836 (zsh-newheaps-zhalloc)
> - zhalloc and the previous patch 36834 (zsh-head-both-patches)
> - all three patches (zsh-newheaps-three-patches)
> - a 5.1.1-dev-0 clean zsh
> - the empty arena else removed
>
> My conclusions:
> - 36834 introduces high memory usage; I would suggest to use only two
> optimizations – newheaps and zhalloc; my znavtools are instant fast
> with them, what's slower is searching (wrote search_test function,
> results attached); basically half of pattern search speedup is lost
> with 36834, however what's gained is beautiful memory usage, without
> firefox-like values for RES (~700 MB); newheaps is responsible for
> instant-responsivity of my script, 36836 for much faster searching –
> that's a good compromise
>
> - searching with patterns through very large arrays (700k) allocates
> 264 MB of memory regardless of zsh version; this probably tells
> something
>
> - still not sure why 36834 allocates much memory for the string test
>
> - newheaps makes function calls longer, but it's not a substantial
> difference – 21196 ms vs. 18549 ms
>
> Best regards,
> Sebastian Gniazdowski
[-- Attachment #2: perf-test2.txt --]
[-- Type: text/plain, Size: 1880 bytes --]
Running [zsh-empty-arena-else]: string_test 24933,14
Running [zsh-newheaps-zhalloc]: string_test 10045,79
Running [zsh-newheaps-only]: string_test 10053,55
Running [zsh-5.0.2-osx]: string_test 9726,88
Running [zsh-head-both-patches]: string_test 4204,30
Running [zsh-newheaps-three-patches]: string_test 4406,47
Running [zsh-5.1.1-dev-0-clean]: string_test 9600,21
Running [zsh-empty-arena-else]: array_test 12689,70
Running [zsh-newheaps-zhalloc]: array_test 12042,77
Running [zsh-newheaps-only]: array_test 11985,55
Running [zsh-5.0.2-osx]: array_test 12206,27
Running [zsh-head-both-patches]: array_test 12320,56
Running [zsh-newheaps-three-patches]: array_test 11788,41
Running [zsh-5.1.1-dev-0-clean]: array_test 12814,19
Running [zsh-empty-arena-else]: function_test 21774,49
Running [zsh-newheaps-zhalloc]: function_test 21981,39
Running [zsh-newheaps-only]: function_test 21268,29
Running [zsh-5.0.2-osx]: function_test 18539,70
Running [zsh-head-both-patches]: function_test 18774,33
Running [zsh-newheaps-three-patches]: function_test 21171,02
Running [zsh-5.1.1-dev-0-clean]: function_test 18940,36
Running [zsh-empty-arena-else]: search_test 3934,60
Running [zsh-newheaps-zhalloc]: search_test 7007,60
Running [zsh-newheaps-only]: search_test 11653,22
Running [zsh-5.0.2-osx]: search_test 11136,82
Running [zsh-head-both-patches]: search_test 3872,39
Running [zsh-newheaps-three-patches]: search_test 3960,67
Running [zsh-5.1.1-dev-0-clean]: search_test 11281,82
[-- Attachment #3: perf-test.zsh --]
[-- Type: application/octet-stream, Size: 2219 bytes --]
#!/bin/zsh
emulate -L zsh
setopt extendedglob
zmodload zsh/zprof
zshs=( zsh-empty-arena-else zsh-newheaps-zhalloc zsh-newheaps-only )
zshs+=( zsh-5.0.2-osx zsh-head-both-patches zsh-newheaps-three-patches zsh-5.1.1-dev-0-clean )
#
# Children administration
#
trap "finished" SIGUSR1
finished() {
FINISHED=1
}
# Waits for signal from child process
wait_for_end_of_test() {
while [ "$FINISHED" -eq 0 ]; do
sleep 1
done
kill -15 "$SUB_PID"
}
_finished_signal_wait() {
kill -SIGUSR1 "$MAIN_PID"
sleep 60
}
#
# Tests
#
tests=( string_test array_test function_test search_test )
float multiplier=0.5
string_test() {
local a=""
integer i=$(( 150000*multiplier ))
repeat $i; do a+="$i"; done
}
array_test() {
typeset -a a
integer i=$(( 25000*multiplier ))
repeat $i; do a+=( $i ); done
}
function_test() {
local count
integer i=$(( 10000*multiplier ))
if [ -z "$1" ]; then
repeat $i; do function_test 100; done
else
count="$1"
fi
if (( count -- > 0 )); then
function_test "$count"
fi
}
search_test() {
integer elements=$(( 800000 * multiplier ))
a="${(r:elements:: _:)b}"
a=( $=a )
a=( "${(@M)a:#(#i)*_*}" )
a=( "${(@)a//(#mi)(_|a)/-${MATCH}-}" )
a=( "${(@)a//(#bi)(_|-)/|${match[1]}|}" )
}
#
# Main code
#
# Detect main vs. for-test invocation
if [ -z "$1" ]; then
for test in "$tests[@]"; do
for current_zsh in "$zshs[@]"; do
type "$current_zsh" 2>/dev/null 1>&2 || { echo >&2 "Skipping non-accessible $current_zsh"; continue }
zsh_binary="${current_zsh##*/}"
FINISHED=0
TEST="$test"
"$current_zsh" -c "source ./$0 $$ \"$current_zsh\" $test" &
SUB_PID=$!
wait_for_end_of_test
done
echo
done
else
MAIN_PID="$1"
zsh_binary="${2##*/}"
shift
shift
txt="Running [$zsh_binary]: "
echo -n "${(r:40:: :)txt}" "${(r:15:: :)*}"
# Run the test
zprof -c
"$@"
zprof_out=( "${(@f)"$( zprof )"}" )
zprof_out="$zprof_out[3]"
zprof_out=( $=zprof_out )
echo "${(l:10:: :)zprof_out[3]}"
_finished_signal_wait
fi
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-22 15:00 ` Bart Schaefer
2015-10-22 16:28 ` Sebastian Gniazdowski
@ 2015-10-23 15:40 ` Sebastian Gniazdowski
2015-10-23 15:57 ` Sebastian Gniazdowski
1 sibling, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-23 15:40 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 1050 bytes --]
I attach the patch that withdraws 9f8e3e8 and 643aca9 (which are 36834)
Best regards,
Sebastian Gniazdowski
On 22 October 2015 at 17:00, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 22, 2:49pm, Sebastian Gniazdowski wrote:
> }
> } My conclusions:
> } - 36834 introduces high memory usage; I would suggest to use only two
> } optimizations -- newheaps and zhalloc; my znavtools are instant fast
> } with them, what's slower is searching (wrote search_test function,
> } results attached)
>
> My apologies, but I'm not sure I followed all of that. Can you produce
> a diff against commit 8e9a68ad14655c1949b1a04d5715a5caa8c344ee that
> shows the situation you think is the best so far?
>
> } - still not sure why 36834 allocates much memory for the string test
>
> Seeing the aforementioned diff may provide a hint.
>
> } - newheaps makes function calls longer, but it's not a substantial
> } difference
>
> 14% is pretty substantial, but it's probably all gained back if the
> function does anything significant with local variables.
[-- Attachment #2: fheaps_revert.patch --]
[-- Type: application/octet-stream, Size: 4413 bytes --]
diff --git a/Src/mem.c b/Src/mem.c
index 68bd767..63a5afe 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -294,7 +294,7 @@ pushheap(void)
#endif
for (h = heaps; h; h = h->next) {
- DPUTS(!h->used && h->next, "BUG: empty heap");
+ DPUTS(!h->used, "BUG: empty heap");
hs = (Heapstack) zalloc(sizeof(*hs));
hs->next = h->sp;
h->sp = hs;
@@ -334,15 +334,17 @@ freeheap(void)
*
* Whenever fheap is NULL here, the loop below sweeps back over the
* entire heap list again, resetting the free space in every arena to
- * the amount stashed by pushheap() and finding the arena with the most
+ * the amount stashed by pushheap() and finding the first arena with
* free space to optimize zhalloc()'s next search. When there's a lot
* of stuff already on the heap, this is an enormous amount of work,
* and performance goes to hell.
*
- * Therefore, we defer freeing the most recently allocated arena until
- * we reach popheap(). This may fail to reclaim some space in earlier
- * arenas.
- *
+ * However, if the arena to which fheap points is unused, we want to
+ * free it, so we have no choice but to do the sweep for a new fheap.
+ */
+ if (fheap && !fheap->sp)
+ fheap = NULL; /* We used to do this unconditionally */
+ /*
* In other cases, either fheap is already correct, or it has never
* been set and this loop will do it, or it'll be reset from scratch
* on the next popheap(). So all that's needed here is to pick up
@@ -359,11 +361,7 @@ freeheap(void)
memset(arena(h) + h->sp->used, 0xff, h->used - h->sp->used);
#endif
h->used = h->sp->used;
- if (!fheap) {
- if (h->used < ARENA_SIZEOF(h))
- fheap = h;
- } else if (ARENA_SIZEOF(h) - h->used >
- ARENA_SIZEOF(fheap) - fheap->used)
+ if (!fheap && h->used < ARENA_SIZEOF(h))
fheap = h;
hl = h;
#ifdef ZSH_HEAP_DEBUG
@@ -386,26 +384,6 @@ freeheap(void)
VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
#endif
} else {
- if (fheap == h)
- fheap = NULL;
- if (h->next) {
- /* We want to cut this out of the arena list if we can */
- if (h == heaps)
- hl = heaps = h->next;
- else if (hl && hl->next == h)
- hl->next = h->next;
- else {
- DPUTS(hl, "hl->next != h when freeing");
- hl = h;
- continue;
- }
- h->next = NULL;
- } else {
- /* Leave an empty arena at the end until popped */
- h->used = 0;
- fheap = hl = h;
- break;
- }
#ifdef USE_MMAP
munmap((void *) h, h->size);
#else
@@ -463,29 +441,12 @@ popheap(void)
#ifdef ZSH_VALGRIND
VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
#endif
- if (!fheap) {
- if (h->used < ARENA_SIZEOF(h))
- fheap = h;
- } else if (ARENA_SIZEOF(h) - h->used >
- ARENA_SIZEOF(fheap) - fheap->used)
+ if (!fheap && h->used < ARENA_SIZEOF(h))
fheap = h;
zfree(hs, sizeof(*hs));
hl = h;
} else {
- if (h->next) {
- /* We want to cut this out of the arena list if we can */
- if (h == heaps)
- hl = heaps = h->next;
- else if (hl && hl->next == h)
- hl->next = h->next;
- else {
- DPUTS(hl, "hl->next != h when popping");
- hl = h;
- continue;
- }
- h->next = NULL;
- }
#ifdef USE_MMAP
munmap((void *) h, h->size);
#else
@@ -563,7 +524,7 @@ zheapptr(void *p)
mod_export void *
zhalloc(size_t size)
{
- Heap h, hp = NULL;
+ Heap h;
size_t n;
#ifdef ZSH_VALGRIND
size_t req_size = size;
@@ -590,7 +551,6 @@ zhalloc(size_t size)
* so why start over at heaps just because fheap has too little?
*/
for (h = (fheap ? fheap : heaps); h; h = h->next) {
- hp = h;
if (ARENA_SIZEOF(h) >= (n = size + h->used)) {
void *ret;
@@ -611,6 +571,7 @@ zhalloc(size_t size)
}
}
{
+ Heap hp;
/* not found, allocate new heap */
#if defined(ZSH_MEM) && !defined(USE_MMAP)
static int called = 0;
@@ -619,6 +580,7 @@ zhalloc(size_t size)
#endif
n = HEAP_ARENA_SIZE > size ? HEAPSIZE : size + sizeof(*h);
+ for (hp = NULL, h = heaps; h; hp = h, h = h->next);
#ifdef USE_MMAP
h = mmap_heap_alloc(&n);
@@ -650,7 +612,6 @@ zhalloc(size_t size)
VALGRIND_MEMPOOL_ALLOC((char *)h, (char *)arena(h), req_size);
#endif
- DPUTS(hp && hp->next, "failed to find end of chain in zhalloc");
if (hp)
hp->next = h;
else
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-23 15:40 ` Sebastian Gniazdowski
@ 2015-10-23 15:57 ` Sebastian Gniazdowski
2015-10-23 19:26 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-23 15:57 UTC (permalink / raw)
To: zsh-workers
On 23 October 2015 at 17:40, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> I attach the patch that withdraws 9f8e3e8 and 643aca9 (which are 36834)
One note is that 643aca9 isn't fully from 36834 because:
if (fheap == h)
fheap = NULL;
Is moved to top of "if (h->next) {". Was this intended?
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-23 15:57 ` Sebastian Gniazdowski
@ 2015-10-23 19:26 ` Bart Schaefer
2015-10-23 23:50 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-23 19:26 UTC (permalink / raw)
To: Sebastian Gniazdowski, zsh-workers
On Oct 23, 8:32am, Sebastian Gniazdowski wrote:
} Subject: Re: Slowdown around 5.0.5-dev-0
}
}
} [ text/plain
} Encoded with "quoted-printable" ] :
}
} I ran the performance tests again with script formatting them
} differently - they are aggregated on test function, not on zsh
} version. This allows easier interpretation. A new conclusion is that
} zhalloc patch doesn't help for string_test. So the first patch (36834)
} speeds up string operations and also introduces increased memory
} usage.
Thanks for the patch. I think you've removed too much; the part about
finding the arena with the most space rather than first arena with any
space should have stayed, as should the changes to zhalloc() that were
included in 36834.
Try the below patch against 8e9a68ad1 ?
On Oct 23, 5:57pm, Sebastian Gniazdowski wrote:
} Subject: Re: Slowdown around 5.0.5-dev-0
}
} On 23 October 2015 at 17:40, Sebastian Gniazdowski
} <sgniazdowski@gmail.com> wrote:
} > I attach the patch that withdraws 9f8e3e8 and 643aca9 (which are 36834)
}
} One note is that 643aca9 isn't fully from 36834 because:
} if (fheap == h)
} fheap = NULL;
}
} Is moved to top of "if (h->next) {". Was this intended?
Yes it was, but irrelevant since that whole block is being backed out.
diff --git a/Src/mem.c b/Src/mem.c
index 68bd767..15d8fd2 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -294,7 +294,7 @@ pushheap(void)
#endif
for (h = heaps; h; h = h->next) {
- DPUTS(!h->used && h->next, "BUG: empty heap");
+ DPUTS(!h->used, "BUG: empty heap");
hs = (Heapstack) zalloc(sizeof(*hs));
hs->next = h->sp;
h->sp = hs;
@@ -339,10 +339,12 @@ freeheap(void)
* of stuff already on the heap, this is an enormous amount of work,
* and performance goes to hell.
*
- * Therefore, we defer freeing the most recently allocated arena until
- * we reach popheap(). This may fail to reclaim some space in earlier
- * arenas.
- *
+ * However, if the arena to which fheap points is unused, we want to
+ * free it, so we have no choice but to do the sweep for a new fheap.
+ */
+ if (fheap && !fheap->sp)
+ fheap = NULL; /* We used to do this unconditionally */
+ /*
* In other cases, either fheap is already correct, or it has never
* been set and this loop will do it, or it'll be reset from scratch
* on the next popheap(). So all that's needed here is to pick up
@@ -386,26 +388,6 @@ freeheap(void)
VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
#endif
} else {
- if (fheap == h)
- fheap = NULL;
- if (h->next) {
- /* We want to cut this out of the arena list if we can */
- if (h == heaps)
- hl = heaps = h->next;
- else if (hl && hl->next == h)
- hl->next = h->next;
- else {
- DPUTS(hl, "hl->next != h when freeing");
- hl = h;
- continue;
- }
- h->next = NULL;
- } else {
- /* Leave an empty arena at the end until popped */
- h->used = 0;
- fheap = hl = h;
- break;
- }
#ifdef USE_MMAP
munmap((void *) h, h->size);
#else
@@ -473,19 +455,6 @@ popheap(void)
hl = h;
} else {
- if (h->next) {
- /* We want to cut this out of the arena list if we can */
- if (h == heaps)
- hl = heaps = h->next;
- else if (hl && hl->next == h)
- hl->next = h->next;
- else {
- DPUTS(hl, "hl->next != h when popping");
- hl = h;
- continue;
- }
- h->next = NULL;
- }
#ifdef USE_MMAP
munmap((void *) h, h->size);
#else
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-23 19:26 ` Bart Schaefer
@ 2015-10-23 23:50 ` Bart Schaefer
2015-10-24 6:09 ` Sebastian Gniazdowski
` (2 more replies)
0 siblings, 3 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-23 23:50 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 1676 bytes --]
On Oct 23, 12:26pm, Bart Schaefer wrote:
}
} Try the below patch against 8e9a68ad1 ?
Attached are my results from running the Sebastian's tests against
whatever binaries I had lying around that had zprof linked, plus
zsh with workers/36926. Output attached.
I then rebuilt git head (8e9a68ad) and ran the tests again. First
thing to note is that the results for the string test changed by
about 20% just because of running the test a second time, possibly
because system state changed as a result of using mmap() for heap.
However, if the results can be considered reliable anyway, note that
36926 retains most of the search_test speed, which I think is an
improvement over backing out all of 36834 (Sebastian's results for
zsh-newheaps-zhalloc, if I'm reading his email correctly).
Also zsh-5.0.7 - zsh-5.1 use a LOT of memory for the search test,
and 8e9a68ad uses almost as much for the string test, but 36926 is
pretty good at keeping both in check (of course the real reason
the search test is better is not mem.c but instead PWS's work in
pattern.c). Also 36926 and 8e9a68ad had noticeably smaller memory
footprint with the function test than 5.1 or earlier, so NEWHEAPS
there is definitely worth the slight loss of speed.
I don't have hard numbers to show for memory because I was watching
"top" ... but each new process seems to inherit some memory from the
previous one, so those mmap()ed segments are probably being shared
between the parent shell (the one driving the test script) and its
children?
Conclusion: 36926 is a good thing. Thanks, Sebastian, for observing
the memory usage issue with 36384. Can you confirm my results?
--
Barton E. Schaefer
[-- Attachment #2: Text --]
[-- Type: text/plain , Size: 2472 bytes --]
Pass 1:
Running [zsh-5.0.7]: string_test 51764.74
Running [zsh-5.0.8]: string_test 51473.90
Running [zsh-5.1]: string_test 52185.28
Running [zsh-36926]: string_test 38504.65
Running [zsh-5.0.7]: array_test 6893.61
Running [zsh-5.0.8]: array_test 6783.31
Running [zsh-5.1]: array_test 7013.65
Running [zsh-36926]: array_test 6214.61
Running [zsh-5.0.7]: function_test 21903.79
Running [zsh-5.0.8]: function_test 23027.64
Running [zsh-5.1]: function_test 22488.52
Running [zsh-36926]: function_test 30285.55
Running [zsh-5.0.7]: search_test 8359.19
Running [zsh-5.0.8]: search_test 8626.48
Running [zsh-5.1]: search_test 8176.59
Running [zsh-36926]: search_test 4394.92
Pass 2 (added build of git head as "zsh"):
Running [zsh-5.0.7]: string_test 41757.54
Running [zsh-5.0.8]: string_test 41443.93
Running [zsh-5.1]: string_test 41697.08
Running [zsh]: string_test 23052.51
Running [zsh-36926]: string_test 26685.68
Running [zsh-5.0.7]: array_test 6956.99
Running [zsh-5.0.8]: array_test 6791.33
Running [zsh-5.1]: array_test 6996.48
Running [zsh]: array_test 6273.76
Running [zsh-36926]: array_test 6326.08
Running [zsh-5.0.7]: function_test 21892.22
Running [zsh-5.0.8]: function_test 22012.86
Running [zsh-5.1]: function_test 22046.14
Running [zsh]: function_test 29858.53
Running [zsh-36926]: function_test 28719.64
Running [zsh-5.0.7]: search_test 8254.72
Running [zsh-5.0.8]: search_test 8508.30
Running [zsh-5.1]: search_test 8599.57
Running [zsh]: search_test 4402.41
Running [zsh-36926]: search_test 4244.59
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-23 23:50 ` Bart Schaefer
@ 2015-10-24 6:09 ` Sebastian Gniazdowski
2015-10-24 7:37 ` Sebastian Gniazdowski
2015-10-24 6:27 ` Sebastian Gniazdowski
2015-10-24 10:54 ` Sebastian Gniazdowski
2 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-24 6:09 UTC (permalink / raw)
To: zsh-workers
On 24 October 2015 at 01:50, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 23, 12:26pm, Bart Schaefer wrote:
> }
> } Try the below patch against 8e9a68ad1 ?
>
> However, if the results can be considered reliable anyway, note that
> 36926 retains most of the search_test speed, which I think is an
> improvement over backing out all of 36834 (Sebastian's results for
> zsh-newheaps-zhalloc, if I'm reading his email correctly).
I ran quick tests yesterday and obtained this too. Good news that not
all gains of 36834 will be withdrawn! :) The newheaps-zhalloc is what
you write, two following patches without 36834.
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-23 23:50 ` Bart Schaefer
2015-10-24 6:09 ` Sebastian Gniazdowski
@ 2015-10-24 6:27 ` Sebastian Gniazdowski
2015-10-24 10:54 ` Sebastian Gniazdowski
2 siblings, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-24 6:27 UTC (permalink / raw)
To: zsh-workers
On 24 October 2015 at 01:50, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 23, 12:26pm, Bart Schaefer wrote:
> }
> } Try the below patch against 8e9a68ad1 ?
>
> Attached are my results from running the Sebastian's tests against
> whatever binaries I had lying around that had zprof linked, plus
> zsh with workers/36926. Output attached.
>
> I then rebuilt git head (8e9a68ad) and ran the tests again. First
> thing to note is that the results for the string test changed by
> about 20% just because of running the test a second time, possibly
> because system state changed as a result of using mmap() for heap.
Maybe some other process was using CPU? My tests are stable and when
there is a difference it's because I switched a window or firefox with
active flash was running in background
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-24 6:09 ` Sebastian Gniazdowski
@ 2015-10-24 7:37 ` Sebastian Gniazdowski
2015-10-24 8:04 ` Sebastian Gniazdowski
2015-10-24 19:39 ` Bart Schaefer
0 siblings, 2 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-24 7:37 UTC (permalink / raw)
To: zsh-workers
On 24 October 2015 at 08:09, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> On 24 October 2015 at 01:50, Bart Schaefer <schaefer@brasslantern.com> wrote:
>> On Oct 23, 12:26pm, Bart Schaefer wrote:
>> }
>> } Try the below patch against 8e9a68ad1 ?
>>
>> However, if the results can be considered reliable anyway, note that
>> 36926 retains most of the search_test speed, which I think is an
>> improvement over backing out all of 36834 (Sebastian's results for
>> zsh-newheaps-zhalloc, if I'm reading his email correctly).
>
> I ran quick tests yesterday and obtained this too. Good news that not
> all gains of 36834 will be withdrawn! :) The newheaps-zhalloc is what
> you write, two following patches without 36834.
I misread this. What I wanted to state, is that your patch from
yesterday retained search_test speed.
The fact that 36836 has its contribution in speeding up search_test is
was described by me in previous email:
> newheaps is responsible for
> instant-responsivity of my script, 36836 for much faster searching –
> that's a good compromise
So the thing is that your patch from yesterday retains search_test
speed up of 36834, while fixing the memory issues of 36834. Together
with 36836, this gives full retain of search_test speedup, nothing is
lost, speed of pattern matching is as high as with no revert from
current head state. What's lost is string_test speedup 36834 gave,
however, it did so with cost of increased memory usage
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-24 7:37 ` Sebastian Gniazdowski
@ 2015-10-24 8:04 ` Sebastian Gniazdowski
2015-10-24 19:39 ` Bart Schaefer
1 sibling, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-24 8:04 UTC (permalink / raw)
To: zsh-workers
Turns out I read this correctly for the first time. You meant 36926,
which is the yesterday's patch. Having this in mind my email still
makes sense. I just described how 36834, 36836, 36926 behave.
On 24 October 2015 at 09:37, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> On 24 October 2015 at 08:09, Sebastian Gniazdowski
> <sgniazdowski@gmail.com> wrote:
>> On 24 October 2015 at 01:50, Bart Schaefer <schaefer@brasslantern.com> wrote:
>>> On Oct 23, 12:26pm, Bart Schaefer wrote:
>>> }
>>> } Try the below patch against 8e9a68ad1 ?
>>>
>>> However, if the results can be considered reliable anyway, note that
>>> 36926 retains most of the search_test speed, which I think is an
>>> improvement over backing out all of 36834 (Sebastian's results for
>>> zsh-newheaps-zhalloc, if I'm reading his email correctly).
>>
>> I ran quick tests yesterday and obtained this too. Good news that not
>> all gains of 36834 will be withdrawn! :) The newheaps-zhalloc is what
>> you write, two following patches without 36834.
>
> I misread this. What I wanted to state, is that your patch from
> yesterday retained search_test speed.
>
> The fact that 36836 has its contribution in speeding up search_test is
> was described by me in previous email:
>
>> newheaps is responsible for
>> instant-responsivity of my script, 36836 for much faster searching –
>> that's a good compromise
>
> So the thing is that your patch from yesterday retains search_test
> speed up of 36834, while fixing the memory issues of 36834. Together
> with 36836, this gives full retain of search_test speedup, nothing is
> lost, speed of pattern matching is as high as with no revert from
> current head state. What's lost is string_test speedup 36834 gave,
> however, it did so with cost of increased memory usage
>
> Best regards,
> Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-23 23:50 ` Bart Schaefer
2015-10-24 6:09 ` Sebastian Gniazdowski
2015-10-24 6:27 ` Sebastian Gniazdowski
@ 2015-10-24 10:54 ` Sebastian Gniazdowski
2015-10-24 11:25 ` Sebastian Gniazdowski
2015-10-24 16:31 ` Bart Schaefer
2 siblings, 2 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-24 10:54 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 1986 bytes --]
I attach my results for the same builds as yours. Tried to calibrate
the running time and I ended up running really long tests (quite
exhausting to move mouse every few minutes as my screensaver is broken
and always turns on after few minutes). Will try to calibrate again
later, and I also am running the same tests on a ubuntu netbook. Few
conclusions:
> Running [zsh-5.0.7]: string_test 48631,87
> Running [zsh-5.0.8]: string_test 86302,14
This says there something happened after 5.0.7 and reveals itself on
OS X, could try to track it with bisect.
> Running [zsh-5.1]: string_test 86698,42
> Running [zsh-head-8e9a68a]: string_test 38887,54
> Running [zsh-36926]: string_test 86497,62
This says 36926 doesn't include string boost up (at cost of memory
use) that current head has.
> Running [zsh-5.1]: search_test 51321,70
> Running [zsh-head-8e9a68a]: search_test 10071,26
> Running [zsh-36926]: search_test 10260,60
This says that pattern maching boost up is retained.
> Running [zsh-5.1]: function_test 49070,83
> Running [zsh-head-8e9a68a]: function_test 56299,14
> Running [zsh-36926]: function_test 54456,34
This informs about slowdown caused by newheaps patch, and points that
36926 can make it smaller.
I also did memory tests:
> # Tests for zsh-head-8e9a68a
> # string_test 42, 182, 318, 457, 590, 720, last: 10
> string_test 317.8
This informs about the 36834 problem with memory.
> # Tests for zsh-36926
> # string_test 2.1, 6.6, 8.6, 10, 10, 11, 9.5, 11, 11, 11, 11, 11, last: 9.6
> string_test 9.3
This shows it's solved.
> # search_test 110, 215, last: 265
> search_test 110.0
This is the high memory usage when matching patterns in large arrays
you mentioned.
Best regards,
Sebastian Gniazdowski
[-- Attachment #2: perf-test-36926.txt --]
[-- Type: text/plain, Size: 2480 bytes --]
2.3 Intel Core i5, 8 GB RAM, OS X 10.9.2
Pass 1:
Running [zsh-5.0.7]: string_test 48631,87
Running [zsh-5.0.8]: string_test 86302,14
Running [zsh-5.1]: string_test 87970,33
Running [zsh-36926]: string_test 86682,35
Running [zsh-5.0.7]: array_test 87699,03
Running [zsh-5.0.8]: array_test 91540,41
Running [zsh-5.1]: array_test 82907,69
Running [zsh-36926]: array_test 82203,27
Running [zsh-5.0.7]: function_test 44150,02
Running [zsh-5.0.8]: function_test 47915,24
Running [zsh-5.1]: function_test 48448,95
Running [zsh-36926]: function_test 57126,86
Running [zsh-5.0.7]: search_test 55862,93
Running [zsh-5.0.8]: search_test 51480,07
Running [zsh-5.1]: search_test 51027,38
Running [zsh-36926]: search_test 10274,11
Pass 2:
Running [zsh-5.0.7]: string_test 48799,25
Running [zsh-5.0.8]: string_test 86682,21
Running [zsh-5.1]: string_test 86698,42
Running [zsh-head-8e9a68a]: string_test 38887,54
Running [zsh-36926]: string_test 86497,62
Running [zsh-5.0.7]: array_test 86754,03
Running [zsh-5.0.8]: array_test 82451,51
Running [zsh-5.1]: array_test 82880,56
Running [zsh-head-8e9a68a]: array_test 82276,80
Running [zsh-36926]: array_test 83075,01
Running [zsh-5.0.7]: function_test 45516,67
Running [zsh-5.0.8]: function_test 49122,46
Running [zsh-5.1]: function_test 49070,83
Running [zsh-head-8e9a68a]: function_test 56299,14
Running [zsh-36926]: function_test 54456,34
Running [zsh-5.0.7]: search_test 57267,29
Running [zsh-5.0.8]: search_test 50841,91
Running [zsh-5.1]: search_test 51321,70
Running [zsh-head-8e9a68a]: search_test 10071,26
Running [zsh-36926]: search_test 10260,60
[-- Attachment #3: mem-test-36926.txt --]
[-- Type: text/plain, Size: 1779 bytes --]
2.3 Intel Core i5, 8 GB RAM, OS X 10.9.2
# Tests for zsh-5.0.7
# string_test 5.8, 10, 12, 11, 11, 12, last: 11
string_test 10.0
# array_test 1.3, 2.3, 3.2, 3.9, 4.3, 5.2, 6.2, 7.3, 7.7, 8.1, 8.3, 8.6, last: 8.7
array_test 5.3
# function_test 2.7, 2.8, 2.8, 2.8, 2.8, 2.8, last: 2.8
function_test 2.8
# search_test 65, 117, 149, 175, 198, 219, 238, last: 257
search_test 153.8
# Tests for zsh-5.0.8
# string_test 2.1, 6.7, 8.6, 10, 10, 11, 9.5, 10, 11, 11, 11, 11, last: 9.6
string_test 9.2
# array_test 1.2, 2.1, 3.3, 3.7, 4.4, 5.4, 6.4, 7.1, 7.8, 8.1, 8.4, 8.6, last: 8.9
array_test 5.3
# function_test 2.5, 2.6, 2.5, 2.6, 2.5, 2.5, last: 2.5
function_test 2.5
# search_test 60, 126, 159, 187, 212, 235, last: 256
search_test 148.8
# Tests for zsh-5.1
# string_test 2.4, 7.1, 8.8, 10, 10, 11, 9.7, 10, 11, 11, 12, 11, last: 10
string_test 9.4
# array_test 1.1, 2.4, 2.9, 3.5, 4.3, 5.3, 6.3, 7.1, 7.8, 8.1, 8.4, 8.7, last: 8.8
array_test 5.2
# function_test 2.5, 2.6, 2.6, 2.6, 2.6, 2.6, 2.6, last: 2.6
function_test 2.6
# search_test 60, 125, 158, 186, 211, 234, 256, last: 257
search_test 162.3
# Tests for zsh-head-8e9a68a
# string_test 42, 182, 318, 457, 590, 720, last: 10
string_test 317.8
# array_test 1, 2.5, 3, 4, 4.5, 5.2, 6.3, 7.3, 7.8, 8.2, 8.4, 8.7, last: 8.8
array_test 5.3
# function_test 2.6, 2.7, 2.6, 2.9, 2.8, 2.9, 2.7, 2.8, last: 2.5
function_test 2.7
# search_test 106, 213, last: 264
search_test 106.0
# Tests for zsh-36926
# string_test 2.1, 6.6, 8.6, 10, 10, 11, 9.5, 11, 11, 11, 11, 11, last: 9.6
string_test 9.3
# array_test 1.2, 2.6, 3.1, 3.7, 4.4, 5.5, 6.5, 7.3, 8, 8.3, 8.5, 8.9, last: 9.1
array_test 5.4
# function_test 2.9, 2.7, 2.7, 2.8, 2.9, 2.9, 2.9, 2.7, last: 2.6
function_test 2.8
# search_test 110, 215, last: 265
search_test 110.0
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-24 10:54 ` Sebastian Gniazdowski
@ 2015-10-24 11:25 ` Sebastian Gniazdowski
2015-10-24 16:31 ` Bart Schaefer
1 sibling, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-24 11:25 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 82 bytes --]
Performance tests from Ubuntu Atom 1.6 laptop
Best regards
Sebastian Gniazdowski
[-- Attachment #2: perf-test-36926-atom.txt --]
[-- Type: text/plain, Size: 2468 bytes --]
Intel Atom CPU 330 @ 1.60GHz
Pass 1:
Running [zsh-5.0.7]: string_test 58573.49
Running [zsh-5.0.8]: string_test 58206.73
Running [zsh-5.1]: string_test 58998.19
Running [zsh-36926]: string_test 58491.39
Running [zsh-5.0.7]: array_test 27821.68
Running [zsh-5.0.8]: array_test 28300.12
Running [zsh-5.1]: array_test 28862.40
Running [zsh-36926]: array_test 27642.97
Running [zsh-5.0.7]: function_test 102523.34
Running [zsh-5.0.8]: function_test 105471.44
Running [zsh-5.1]: function_test 107102.55
Running [zsh-36926]: function_test 122497.27
Running [zsh-5.0.7]: search_test 35029.27
Running [zsh-5.0.8]: search_test 36958.43
Running [zsh-5.1]: search_test 36506.89
Running [zsh-36926]: search_test 27938.21
Pass 2:
Running [zsh-5.0.7]: string_test 58273.59
Running [zsh-5.0.8]: string_test 58166.23
Running [zsh-5.1]: string_test 58190.08
Running [zsh-head-8e9a68a]: string_test 39081.73
Running [zsh-36926]: string_test 58978.04
Running [zsh-5.0.7]: array_test 27647.93
Running [zsh-5.0.8]: array_test 28353.12
Running [zsh-5.1]: array_test 28626.84
Running [zsh-head-8e9a68a]: array_test 27842.07
Running [zsh-36926]: array_test 27620.49
Running [zsh-5.0.7]: function_test 102423.34
Running [zsh-5.0.8]: function_test 105560.15
Running [zsh-5.1]: function_test 107393.99
Running [zsh-head-8e9a68a]: function_test 121731.23
Running [zsh-36926]: function_test 121420.25
Running [zsh-5.0.7]: search_test 34996.94
Running [zsh-5.0.8]: search_test 36811.87
Running [zsh-5.1]: search_test 36438.17
Running [zsh-head-8e9a68a]: search_test 27979.98
Running [zsh-36926]: search_test 27904.23
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-24 10:54 ` Sebastian Gniazdowski
2015-10-24 11:25 ` Sebastian Gniazdowski
@ 2015-10-24 16:31 ` Bart Schaefer
2015-10-24 16:41 ` Bart Schaefer
1 sibling, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-24 16:31 UTC (permalink / raw)
To: zsh-workers
On Oct 24, 12:54pm, Sebastian Gniazdowski wrote:
}
} > Running [zsh-5.0.7]: string_test 48631,87
} > Running [zsh-5.0.8]: string_test 86302,14
}
} This says there something happened after 5.0.7 and reveals itself on
} OS X, could try to track it with bisect.
What happened is this:
34451: use mmap() also on Mac OS X
Let MAP_ANONYMOUS be an alias to MAP_ANON if the former is not
defined.
Which strongly suggests that using the system malloc() rather than our
own mmap() strategy was the correct thing to do on MacOS, and we should
back that out and make note. Also might be worth checking whether the
behavior is better on other platforms if we disable mmap().
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-24 16:31 ` Bart Schaefer
@ 2015-10-24 16:41 ` Bart Schaefer
0 siblings, 0 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-24 16:41 UTC (permalink / raw)
To: zsh-workers
On Oct 24, 9:31am, Bart Schaefer wrote:
}
} Also might be worth checking whether the
} behavior is better on other platforms if we disable mmap().
I set HAVE_MMAP 0 in config.h, recompiled, and compared on my CentOS VM;
there is no significant difference in string test:
Running [zsh-noMMAP]: string_test 26612.85
Running [zsh-36926]: string_test 26688.94
Function test is actually slightly worse:
Running [zsh-noMMAP]: function_test 28053.23
Running [zsh-36926]: function_test 27759.94
No significant diff in the other tests either. So currently this only
seems to apply to MacOS.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-24 7:37 ` Sebastian Gniazdowski
2015-10-24 8:04 ` Sebastian Gniazdowski
@ 2015-10-24 19:39 ` Bart Schaefer
2015-10-25 7:35 ` Sebastian Gniazdowski
1 sibling, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-24 19:39 UTC (permalink / raw)
To: zsh-workers
On Oct 24, 9:37am, Sebastian Gniazdowski wrote:
}
} So the thing is that your patch from yesterday retains search_test
} speed up of 36834, while fixing the memory issues of 36834. Together
} with 36836, this gives full retain of search_test speedup, nothing is
} lost, speed of pattern matching is as high as with no revert from
} current head state. What's lost is string_test speedup 36834 gave,
} however, it did so with cost of increased memory usage
OK, let's try it this way. This retains MOST of 36834, but puts back
scanning the entire heap if we intend to free the block to which the
current fheap pointer refers.
(This is a patch against 8e9a68ad but applies to current head.)
diff --git a/Src/mem.c b/Src/mem.c
index 68bd767..ca9ab96 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -343,6 +343,12 @@ freeheap(void)
* we reach popheap(). This may fail to reclaim some space in earlier
* arenas.
*
+ * However, if the arena to which fheap points is unused, we want to
+ * free it, so we have no choice but to do the sweep for a new fheap.
+ */
+ if (fheap && !fheap->sp)
+ fheap = NULL; /* We used to do this unconditionally */
+ /*
* In other cases, either fheap is already correct, or it has never
* been set and this loop will do it, or it'll be reset from scratch
* on the next popheap(). So all that's needed here is to pick up
Here are my results for that; memory usage is comparable to 36926 while
speed is improved.
Running [zsh-HEAD]: string_test 23203.72
Running [zsh-36926]: string_test 38478.23
Running [zsh-HEAD+rescan]: string_test 28787.44
Running [zsh-HEAD]: array_test 6261.07
Running [zsh-36926]: array_test 6269.27
Running [zsh-HEAD+rescan]: array_test 6269.94
Running [zsh-HEAD]: function_test 29609.59
Running [zsh-36926]: function_test 30133.26
Running [zsh-HEAD+rescan]: function_test 30237.71
Running [zsh-HEAD]: search_test 4209.37
Running [zsh-36926]: search_test 4194.97
Running [zsh-HEAD+rescan]: search_test 4260.73
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-24 19:39 ` Bart Schaefer
@ 2015-10-25 7:35 ` Sebastian Gniazdowski
2015-10-25 17:23 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-25 7:35 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 668 bytes --]
On 24 October 2015 at 21:39, Bart Schaefer <schaefer@brasslantern.com> wrote:
> OK, let's try it this way. This retains MOST of 36834, but puts back
> scanning the entire heap if we intend to free the block to which the
> current fheap pointer refers.
>
> (This is a patch against 8e9a68ad but applies to current head.)
My results. 36943 has string_test speed of current head (so it's
slightly better than in your results). And memory usage is in check,
so that's a fix.
I calibrated for string_test, it's interesting how much slower
array_test is on OS X. My machine is 2.3 GHz, maybe that's some reason
(your is faster IIRC).
Best regards,
Sebastian Gniazdowski
[-- Attachment #2: perf-test-36943.txt --]
[-- Type: text/plain, Size: 808 bytes --]
Running [zsh-head-8e9a68a]: string_test 22081,62
Running [zsh-36926]: string_test 48766,75
Running [zsh-36943]: string_test 21897,12
Running [zsh-head-8e9a68a]: array_test 48814,49
Running [zsh-36926]: array_test 48213,22
Running [zsh-36943]: array_test 49198,79
Running [zsh-head-8e9a68a]: function_test 43091,00
Running [zsh-36926]: function_test 44121,87
Running [zsh-36943]: function_test 42354,93
Running [zsh-head-8e9a68a]: search_test 8396,50
Running [zsh-36926]: search_test 8122,48
Running [zsh-36943]: search_test 7816,17
[-- Attachment #3: mem-test-36943.txt --]
[-- Type: text/plain, Size: 1003 bytes --]
# Tests for zsh-head-8e9a68a
# string_test 43, 179, 318, 451, 583, 713, last: 9.6
string_test 314.8
# array_test 1.3, 2.5, 3.3, 3.7, 4.3, 5.3, 6.3, 7, 7.8, 8, 8.4, 8.6, last: 8.8
array_test 5.3
# function_test 2.7, 2.9, 2.6, 2.9, 2.9, 2.8, 2.7, 2.6, last: 2.6
function_test 2.8
# search_test 111, 213, last: 266
search_test 111.0
# Tests for zsh-36926
# string_test 3.1, 7.6, 9.1, 11, 11, 12, 10, 11, 11, 12, 12, 11, last: 10
string_test 10.0
# array_test 1.5, 2.3, 3.3, 3.7, 4.3, 5, 5.8, 6.8, 7.5, 7.8, 8.1, 8.4, last: 8.6
array_test 5.1
# function_test 2.8, 2.6, 2.6, 2.7, 2.9, 2.8, 2.8, 2.9, last: 2.6
function_test 2.7
# search_test 108, 214, last: 266
search_test 108.0
# Tests for zsh-36943
# string_test 4.6, 8.6, 9.5, 10, 10, 11, last: 8.9
string_test 8.5
# array_test 1.3, 2.4, 2.9, 3.7, 4.3, 5.1, 5.9, 6.9, 7.6, 7.8, 8.2, 8.4, last: 8.7
array_test 5.1
# function_test 2.6, 2.6, 2.7, 2.8, 2.6, 2.9, 2.8, 2.8, last: 2.6
function_test 2.7
# search_test 111, 216, last: 266
search_test 111.0
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-25 7:35 ` Sebastian Gniazdowski
@ 2015-10-25 17:23 ` Bart Schaefer
2015-10-25 20:56 ` Sebastian Gniazdowski
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-25 17:23 UTC (permalink / raw)
To: Sebastian Gniazdowski, zsh-workers
On Oct 25, 8:35am, Sebastian Gniazdowski wrote:
}
} I calibrated for string_test, it's interesting how much slower
} array_test is on OS X. My machine is 2.3 GHz, maybe that's some reason
} (your is faster IIRC).
diff --git a/Src/mem.c b/Src/mem.c
index 158ad96..3138bb1 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -79,9 +79,16 @@
#include <sys/mman.h>
+#if 0
+/*
+ * This change was designed to enable use of memory mapping on MacOS.
+ * However, performance tests indicate that MacOS mapped regions are
+ * significantly slower to allocate than memory from malloc().
+ */
#if defined(MAP_ANON) && !defined(MAP_ANONYMOUS)
#define MAP_ANONYMOUS MAP_ANON
#endif
+#endif /* 0 */
#if defined(MAP_ANONYMOUS) && defined(MAP_PRIVATE)
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-25 17:23 ` Bart Schaefer
@ 2015-10-25 20:56 ` Sebastian Gniazdowski
2015-10-26 0:49 ` Bart Schaefer
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-25 20:56 UTC (permalink / raw)
To: zsh-workers
[-- Attachment #1: Type: text/plain, Size: 61 bytes --]
Turns out that's not it
Best regards,
Sebastian Gniazdowski
[-- Attachment #2: perf-test-36943-nommap.txt --]
[-- Type: text/plain, Size: 1076 bytes --]
Running [zsh-head-8e9a68a]: string_test 22085,99
Running [zsh-36926]: string_test 48248,70
Running [zsh-36943]: string_test 21867,31
Running [zsh-36943-nommap]: string_test 27014,77
Running [zsh-head-8e9a68a]: array_test 49213,94
Running [zsh-36926]: array_test 48707,25
Running [zsh-36943]: array_test 49611,80
Running [zsh-36943-nommap]: array_test 50140,97
Running [zsh-head-8e9a68a]: function_test 42722,47
Running [zsh-36926]: function_test 42292,10
Running [zsh-36943]: function_test 42579,06
Running [zsh-36943-nommap]: function_test 34560,54
Running [zsh-head-8e9a68a]: search_test 7849,94
Running [zsh-36926]: search_test 8117,99
Running [zsh-36943]: search_test 7759,65
Running [zsh-36943-nommap]: search_test 7928,84
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-25 20:56 ` Sebastian Gniazdowski
@ 2015-10-26 0:49 ` Bart Schaefer
2015-10-26 7:41 ` Sebastian Gniazdowski
0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-26 0:49 UTC (permalink / raw)
To: zsh-workers
On Oct 25, 9:56pm, Sebastian Gniazdowski wrote:
}
} Turns out that's not it
Hm. If it shows up between 5.0.7 and 5.0.8 and can't be explained by
different compilers or different compiler flags, then other culprits
might be:
* 36ec763d - 33976: fix overlapping strcpy()
* one of the several commits that added queue_signals()
That mmap change was the only thing in mem.c from 5.0.7 -> 5.0.8.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-26 0:49 ` Bart Schaefer
@ 2015-10-26 7:41 ` Sebastian Gniazdowski
2015-10-26 7:47 ` Sebastian Gniazdowski
0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-26 7:41 UTC (permalink / raw)
To: zsh-workers
On 26 October 2015 at 01:49, Bart Schaefer <schaefer@brasslantern.com> wrote:
> Hm. If it shows up between 5.0.7 and 5.0.8 and can't be explained by
> different compilers or different compiler flags, then other culprits
> might be:
>
> * 36ec763d - 33976: fix overlapping strcpy()
> * one of the several commits that added queue_signals()
>
> That mmap change was the only thing in mem.c from 5.0.7 -> 5.0.8.
I ran git bisect and it pointed to the mmap commit you mentioned
earlier, i.e. to 1f62d8d0be7290883991fb4de640fbecb7714917. Array test
wasn't changing during the search, i.e. was continuously slower.
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Slowdown around 5.0.5-dev-0
2015-10-26 7:41 ` Sebastian Gniazdowski
@ 2015-10-26 7:47 ` Sebastian Gniazdowski
0 siblings, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-26 7:47 UTC (permalink / raw)
To: zsh-workers
PS. My OS is Mavericks, curious how this behaves on Yosemite. Have
quite broken system, planning to upgrade to Yosemite, but maybe
someone has such revision and could test this, and I would stay on
Mavericks as it serves well as testing platform for my scripts and
Zsh.
Best regards,
Sebastian Gniazdowski
^ permalink raw reply [flat|nested] 61+ messages in thread
end of thread, other threads:[~2015-10-26 7:48 UTC | newest]
Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-10 10:54 Slowdown around 5.0.5-dev-0 Sebastian Gniazdowski
2015-10-10 17:58 ` Bart Schaefer
2015-10-10 18:11 ` Sebastian Gniazdowski
2015-10-10 18:32 ` Sebastian Gniazdowski
2015-10-11 0:06 ` Bart Schaefer
2015-10-11 6:20 ` Bart Schaefer
2015-10-11 8:39 ` Sebastian Gniazdowski
2015-10-11 16:17 ` Bart Schaefer
2015-10-11 16:48 ` Sebastian Gniazdowski
2015-10-11 17:31 ` Bart Schaefer
2015-10-11 18:05 ` Sebastian Gniazdowski
2015-10-11 21:22 ` Bart Schaefer
2015-10-12 8:21 ` Sebastian Gniazdowski
2015-10-12 14:01 ` Bart Schaefer
2015-10-12 16:50 ` Sebastian Gniazdowski
2015-10-13 0:33 ` Bart Schaefer
2015-10-13 8:21 ` Sebastian Gniazdowski
2015-10-13 15:52 ` Bart Schaefer
2015-10-14 6:50 ` Sebastian Gniazdowski
2015-10-14 13:27 ` Peter Stephenson
2015-10-14 16:25 ` Bart Schaefer
2015-10-14 16:50 ` Bart Schaefer
2015-10-15 4:32 ` Bart Schaefer
2015-10-15 13:03 ` Sebastian Gniazdowski
2015-10-16 0:35 ` Bart Schaefer
2015-10-17 9:12 ` Sebastian Gniazdowski
2015-10-17 9:24 ` Sebastian Gniazdowski
2015-10-18 16:19 ` Bart Schaefer
2015-10-18 20:40 ` Sebastian Gniazdowski
2015-10-18 21:07 ` Bart Schaefer
2015-10-18 21:31 ` Sebastian Gniazdowski
2015-10-19 17:21 ` Bart Schaefer
2015-10-22 12:49 ` Sebastian Gniazdowski
2015-10-22 15:00 ` Bart Schaefer
2015-10-22 16:28 ` Sebastian Gniazdowski
2015-10-22 16:33 ` Sebastian Gniazdowski
2015-10-23 15:40 ` Sebastian Gniazdowski
2015-10-23 15:57 ` Sebastian Gniazdowski
2015-10-23 19:26 ` Bart Schaefer
2015-10-23 23:50 ` Bart Schaefer
2015-10-24 6:09 ` Sebastian Gniazdowski
2015-10-24 7:37 ` Sebastian Gniazdowski
2015-10-24 8:04 ` Sebastian Gniazdowski
2015-10-24 19:39 ` Bart Schaefer
2015-10-25 7:35 ` Sebastian Gniazdowski
2015-10-25 17:23 ` Bart Schaefer
2015-10-25 20:56 ` Sebastian Gniazdowski
2015-10-26 0:49 ` Bart Schaefer
2015-10-26 7:41 ` Sebastian Gniazdowski
2015-10-26 7:47 ` Sebastian Gniazdowski
2015-10-24 6:27 ` Sebastian Gniazdowski
2015-10-24 10:54 ` Sebastian Gniazdowski
2015-10-24 11:25 ` Sebastian Gniazdowski
2015-10-24 16:31 ` Bart Schaefer
2015-10-24 16:41 ` Bart Schaefer
2015-10-23 6:32 ` Sebastian Gniazdowski
2015-10-16 0:37 ` Bart Schaefer
2015-10-13 13:07 ` Sebastian Gniazdowski
2015-10-13 13:31 ` Sebastian Gniazdowski
2015-10-12 12:05 ` Sebastian Gniazdowski
2015-10-12 15:13 ` Bart Schaefer
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/zsh/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).