zsh-workers
 help / color / mirror / code / Atom feed
* Slowdown around 5.0.5-dev-0
@ 2015-10-10 10:54 Sebastian Gniazdowski
  2015-10-10 17:58 ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-10 10:54 UTC (permalink / raw)
  To: zsh-workers

Hello,
I was testing my scripts and saw that on 5.0.2 having 89k-elements
long is much faster than on 5.1.1-dev-0. I ran git bisect and found
the commit:

2014-01-18 23f98c3 32285: restart the fheap search in freeheap if the
current fheap arena is about to be discarded; fixes crash

I documented the behavior on video:
https://youtu.be/Crfoh05eWKQ

On good commit, the list works fast, with full possible speed, while
on bad commit moving one row up or down takes 1 second or more.

--- more details ---

The good commit can also slow down but always digs itself out of the
"crisis". The bad commit is always slow, and never digs out.


Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-10 10:54 Slowdown around 5.0.5-dev-0 Sebastian Gniazdowski
@ 2015-10-10 17:58 ` Bart Schaefer
  2015-10-10 18:11   ` Sebastian Gniazdowski
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-10 17:58 UTC (permalink / raw)
  To: zsh-workers

On Oct 10, 12:54pm, Sebastian Gniazdowski wrote:
} Subject: Slowdown around 5.0.5-dev-0
}
} I was testing my scripts and saw that on 5.0.2 having 89k-elements
} long is much faster than on 5.1.1-dev-0. I ran git bisect and found
} the commit:
} 
} 2014-01-18 23f98c3 32285: restart the fheap search in freeheap if the
} current fheap arena is about to be discarded; fixes crash

If you look back through the zsh-workers articles, you'll find that
the performance was even worse prior to workers/29175; there was a
huge gain in performance from that patch, but it introduced a crash
bug, so we had to give back some of the performance in workers/32285
to prevent the crash from happening.

Read the large comment in freeheap() for a description of what is
going on there, note particularly the paragraph ending "performance
goes to hell."

I fear a signficant rewrite of the heap-allocation code would be needed
to both avoid the crash and improve the performance.  The current code
was designed with the expectation that zhalloc() would be a much more
common operation than freeheap(), but there are circumstances in which
the latter happens a lot.

It's possible that we could figure out a way to skip the scan that
updates all the h->used values in freeheap(), and pay the price on the
next zhalloc() instead.

I think (but am not 100% sure) that the pathological case happens when
the heap is already maximally filled, so a new arena is always added
to the end of the chain and then dropped again on each alloc/free.

On the other hand the shell really isn't intended to be manipulating
arrays of containing tens of thousands of elements.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-10 17:58 ` Bart Schaefer
@ 2015-10-10 18:11   ` Sebastian Gniazdowski
  2015-10-10 18:32     ` Sebastian Gniazdowski
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-10 18:11 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

On 10 October 2015 at 19:58, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On the other hand the shell really isn't intended to be manipulating
> arrays of containing tens of thousands of elements.

I was afraid that this will be stated, by you as the author of the
patch. It's just such a great thing that patterns are so fast, faster
than regular expressions (from zsh/regex). And that they are so
advanced. I can do fancy things like filtering for multiple words with
*foo*~^*bar*, colorizing output by using (#m) which is faster than
(#b) and provided for someone aware to utilize that gain and
functionality. What I mean is that we shouldn't now slip into "shell
isn't for that, anyway" after providing so much.

PS. Not everything is so great, I was successfully optimizing my code
by changing if[] into [] && || in crucial places. Removing variables
as even a single assignment seems to cost much. But overall there's
plenty in speed and functionality.

Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-10 18:11   ` Sebastian Gniazdowski
@ 2015-10-10 18:32     ` Sebastian Gniazdowski
  2015-10-11  0:06       ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-10 18:32 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

As for the code, it's interesting that it is just existence of large
array that slows down things. Just before spotting the >=5.0.5
slowdown I wrote an optimization:

        if [ "$prev_start_idx" -ne "$NLIST_FROM_WHAT_IDX_LIST_IS_SHOWN" ]; then
            prev_start_idx="$NLIST_FROM_WHAT_IDX_LIST_IS_SHOWN"
            disp_list=(
"${(@)col_list[NLIST_FROM_WHAT_IDX_LIST_IS_SHOWN, end_idx]}" )
        fi

It creates a small array disp_list that is used for display. 23
elements in my case. The large 89k array "sleeps" waiting for change
of page or some other trigger (details not important). The patch
helped for 5.0.2 (except the possible temporal "dig" into slowness)
but for >= 5.0.5 it helps just a little, while it should help
completely, as everything is around 23 elements long array.

Best regards,
Sebastian Gniazdowski


On 10 October 2015 at 20:11, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> On 10 October 2015 at 19:58, Bart Schaefer <schaefer@brasslantern.com> wrote:
>> On the other hand the shell really isn't intended to be manipulating
>> arrays of containing tens of thousands of elements.
>
> I was afraid that this will be stated, by you as the author of the
> patch. It's just such a great thing that patterns are so fast, faster
> than regular expressions (from zsh/regex). And that they are so
> advanced. I can do fancy things like filtering for multiple words with
> *foo*~^*bar*, colorizing output by using (#m) which is faster than
> (#b) and provided for someone aware to utilize that gain and
> functionality. What I mean is that we shouldn't now slip into "shell
> isn't for that, anyway" after providing so much.
>
> PS. Not everything is so great, I was successfully optimizing my code
> by changing if[] into [] && || in crucial places. Removing variables
> as even a single assignment seems to cost much. But overall there's
> plenty in speed and functionality.
>
> Best regards,
> Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-10 18:32     ` Sebastian Gniazdowski
@ 2015-10-11  0:06       ` Bart Schaefer
  2015-10-11  6:20         ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-11  0:06 UTC (permalink / raw)
  To: zsh-workers

On Oct 10,  8:32pm, Sebastian Gniazdowski wrote:
}
} As for the code, it's interesting that it is just existence of large
} array that slows down things.

To the best of my understanding (not having written the original code),
the zsh heap works like this:

The "heaps" and "fheap" pointers start out NULL.

The first zhalloc() creates a block of memory either of the "arena size"
or the size requested, whichever is larger.  This block has a header
that contains a pointer to another block (initially NULL), and a counter
of the amount of allocated space in the current block.

On subsequent zhalloc(), either a pointer to the end of allocated
space in the existing block is returned (if sufficient space), or a
new block is allocated and linked through the pointer in in the header
of the existing block.  Thus a linked list of partially-filled blocks
may be built up.

Every zhalloc() therefore looks for space in the first block it can
find that has enough.  As an optimization, the "fheap" pointer is left
pointed at the earliest block that has any free space, so full blocks
can be skipped by the search at allocation time.  However, if the only
block with available space is the last (most recently allocated) one,
fheap becomes invalid when that block is freed.

On pushheap(), a linked list of free space positions (struct heapstack)
is created for each existing block in the heap.  The effect is that
every block is its own stack of heaps; there isn't a single global
stack of heaps.  On popheap() or freeheap(), the list of free space
positions is traversed and the free pointer in each heap block is
reset to its position at the time it was pushed onto the stack.

It is this reset traversal that causes the slowdown.  If all the existing
heap blocks are completely filled before the pushheap() occurs, fheap is
invalidated on every free, and then we spend a lot of time uselessly
resetting the free space pointers in those existing blocks.

(We also invalidate fheap any time we temporarily switch to a new heap,
which currently happens in completion, so completion operations will
invoke additional scans for arenas with free space.)

} The large 89k array "sleeps" waiting for change

Yes, so it fills the top of the heap and occupies it forever, thereby
causing the heap blocks it occupies to be uselessly traversed seeking
available space every time the smaller array is manipulated.

This suggests a couple of possible fixes, but I've run out of time to
experiment right now.

-- 
Barton E. Schaefer


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-11  0:06       ` Bart Schaefer
@ 2015-10-11  6:20         ` Bart Schaefer
  2015-10-11  8:39           ` Sebastian Gniazdowski
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-11  6:20 UTC (permalink / raw)
  To: zsh-workers

On Oct 10,  5:06pm, Bart Schaefer wrote:
}
} This suggests a couple of possible fixes, but I've run out of time to
} experiment right now.

OK, this is the least intrusive change I can think of that might make
a difference.  What this attempts to do is maintain a second pointer
into the list of heaps that tells freeheap() where to start working.  
It's set on pushheap() and reset on popheap() so it freeheap() should
avoid searching for free space in some arenas that were completely
filled before pushheap() was called.  However, it won't improve the
performance if there are a whole lot of heap arenas each with a small
amount of space available at the time of pushheap().

To improve THAT situation, I think we'd have to be willing to "leak"
those small pieces of heap until a popheap() happens, and instead
use at least one new arena following most calls to pushheap().  This
might mean that zsh's memory footprint grows, but on modern hardware
that may not be an issue.

All tests pass with this change in place, let's see what it does with
Sebastian's 89k array elements.


diff --git a/Src/mem.c b/Src/mem.c
index b9569ea..01072a3 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -127,6 +127,12 @@ static Heap heaps;
 
 static Heap fheap;
 
+/* same as fheap, except in the preceding stack context (it will be the
+ * first heap from which space in the current context may be recovered
+ * when heap space is freed) */
+
+static Heap fpop;
+
 /**/
 #ifdef ZSH_HEAP_DEBUG
 /*
@@ -293,8 +299,11 @@ pushheap(void)
     h_push++;
 #endif
 
+    fpop = NULL;
     for (h = heaps; h; h = h->next) {
 	DPUTS(!h->used, "BUG: empty heap");
+	if (!fpop && h->used < ARENA_SIZEOF(h))
+	    fpop = h;
 	hs = (Heapstack) zalloc(sizeof(*hs));
 	hs->next = h->sp;
 	h->sp = hs;
@@ -341,9 +350,10 @@ freeheap(void)
      *
      * However, if the arena to which fheap points is unused, we want to
      * free it, so we have no choice but to do the sweep for a new fheap.
+     * fpop is the first heap with free space following pushheap().
      */
     if (fheap && !fheap->sp)
-	fheap = NULL;	/* We used to do this unconditionally */
+	fheap = fpop;
     /*
      * In other cases, either fheap is already correct, or it has never
      * been set and this loop will do it, or it'll be reset from scratch
@@ -417,7 +427,7 @@ popheap(void)
     h_pop++;
 #endif
 
-    fheap = NULL;
+    fheap = fpop = NULL;
     for (h = heaps; h; h = hn) {
 	hn = h->next;
 	if ((hs = h->sp)) {
@@ -443,6 +453,8 @@ popheap(void)
 #endif
 	    if (!fheap && h->used < ARENA_SIZEOF(h))
 		fheap = h;
+	    if (!fpop && h->sp && h->sp->used < ARENA_SIZEOF(h))
+		fpop = h;
 	    zfree(hs, sizeof(*hs));
 
 	    hl = h;


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-11  6:20         ` Bart Schaefer
@ 2015-10-11  8:39           ` Sebastian Gniazdowski
  2015-10-11 16:17             ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-11  8:39 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 3248 bytes --]

Zprof says there is no change:
https://youtu.be/6A-AKesvZKM

I also attach zprof result in file. BTW, the patch didn't apply to
head or to 23f98c3, I had to make changes manually.

Best regards,
Sebastian Gniazdowski

On 11 October 2015 at 08:20, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 10,  5:06pm, Bart Schaefer wrote:
> }
> } This suggests a couple of possible fixes, but I've run out of time to
> } experiment right now.
>
> OK, this is the least intrusive change I can think of that might make
> a difference.  What this attempts to do is maintain a second pointer
> into the list of heaps that tells freeheap() where to start working.
> It's set on pushheap() and reset on popheap() so it freeheap() should
> avoid searching for free space in some arenas that were completely
> filled before pushheap() was called.  However, it won't improve the
> performance if there are a whole lot of heap arenas each with a small
> amount of space available at the time of pushheap().
>
> To improve THAT situation, I think we'd have to be willing to "leak"
> those small pieces of heap until a popheap() happens, and instead
> use at least one new arena following most calls to pushheap().  This
> might mean that zsh's memory footprint grows, but on modern hardware
> that may not be an issue.
>
> All tests pass with this change in place, let's see what it does with
> Sebastian's 89k array elements.
>
>
> diff --git a/Src/mem.c b/Src/mem.c
> index b9569ea..01072a3 100644
> --- a/Src/mem.c
> +++ b/Src/mem.c
> @@ -127,6 +127,12 @@ static Heap heaps;
>
>  static Heap fheap;
>
> +/* same as fheap, except in the preceding stack context (it will be the
> + * first heap from which space in the current context may be recovered
> + * when heap space is freed) */
> +
> +static Heap fpop;
> +
>  /**/
>  #ifdef ZSH_HEAP_DEBUG
>  /*
> @@ -293,8 +299,11 @@ pushheap(void)
>      h_push++;
>  #endif
>
> +    fpop = NULL;
>      for (h = heaps; h; h = h->next) {
>         DPUTS(!h->used, "BUG: empty heap");
> +       if (!fpop && h->used < ARENA_SIZEOF(h))
> +           fpop = h;
>         hs = (Heapstack) zalloc(sizeof(*hs));
>         hs->next = h->sp;
>         h->sp = hs;
> @@ -341,9 +350,10 @@ freeheap(void)
>       *
>       * However, if the arena to which fheap points is unused, we want to
>       * free it, so we have no choice but to do the sweep for a new fheap.
> +     * fpop is the first heap with free space following pushheap().
>       */
>      if (fheap && !fheap->sp)
> -       fheap = NULL;   /* We used to do this unconditionally */
> +       fheap = fpop;
>      /*
>       * In other cases, either fheap is already correct, or it has never
>       * been set and this loop will do it, or it'll be reset from scratch
> @@ -417,7 +427,7 @@ popheap(void)
>      h_pop++;
>  #endif
>
> -    fheap = NULL;
> +    fheap = fpop = NULL;
>      for (h = heaps; h; h = hn) {
>         hn = h->next;
>         if ((hs = h->sp)) {
> @@ -443,6 +453,8 @@ popheap(void)
>  #endif
>             if (!fheap && h->used < ARENA_SIZEOF(h))
>                 fheap = h;
> +           if (!fpop && h->sp && h->sp->used < ARENA_SIZEOF(h))
> +               fpop = h;
>             zfree(hs, sizeof(*hs));
>
>             hl = h;

[-- Attachment #2: result.txt --]
[-- Type: text/plain, Size: 1956 bytes --]

patched:
# time ( npanelize < ~/lsoflsof )
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)  161        5453,04    33,87   43,53%   5453,04    33,87   43,53%  _nlist_print_with_ansi
 2)    1       12500,28 12500,28   99,79%   4052,65  4052,65   32,35%  n-list
 3)    7        8041,91  1148,84   64,20%   2588,87   369,84   20,67%  n-list-draw
 4)    7         327,91    46,84    2,62%    322,30    46,04    2,57%  n-list-input
 5)    7          40,77     5,82    0,33%     31,97     4,57    0,26%  _nlist_setup_user_vars
 6)    7          28,91     4,13    0,23%     28,91     4,13    0,23%  _nlist_status_msg
 7)    2          24,94    12,47    0,20%     24,94    12,47    0,20%  _nlist_cursor_visibility
 8)    1          26,79    26,79    0,21%      9,97     9,97    0,08%  _nlist_exit
( n-panelize < ~/lsoflsof; )  14,33s user 0,27s system 94% cpu 15,488 total

clean:
# time ( npanelize < ~/lsoflsof )
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)  161        5410,49    33,61   43,57%   5410,49    33,61   43,57%  _nlist_print_with_ansi
 2)    1       12391,49 12391,49   99,78%   4030,62  4030,62   32,46%  n-list
 3)    7        7961,76  1137,39   64,11%   2551,27   364,47   20,54%  n-list-draw
 4)    7         324,36    46,34    2,61%    318,36    45,48    2,56%  n-list-input
 5)    7          39,22     5,60    0,32%     31,44     4,49    0,25%  _nlist_setup_user_vars
 6)    7          27,86     3,98    0,22%     27,86     3,98    0,22%  _nlist_status_msg
 7)    2          24,35    12,18    0,20%     24,35    12,18    0,20%  _nlist_cursor_visibility
 8)    1          27,05    27,05    0,22%     10,37    10,37    0,08%  _nlist_exit
( n-panelize < ~/lsoflsof; )  14,19s user 0,29s system 94% cpu 15,319 total


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-11  8:39           ` Sebastian Gniazdowski
@ 2015-10-11 16:17             ` Bart Schaefer
  2015-10-11 16:48               ` Sebastian Gniazdowski
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-11 16:17 UTC (permalink / raw)
  To: zsh-workers

On Oct 11, 10:39am, Sebastian Gniazdowski wrote:
}
} Zprof says there is no change:

Well, that's not entirely surprising.  It probably just means that there
is some space in a lot of arenas, so fpop doesn't move far from the head
of the list.  Or it could mean that the real problem is popheap() rather
than freeheap().

} BTW, the patch didn't apply to head or to 23f98c3,

Curious.  "git diff master origin/master" shows no diffs for my local
repository, and the patch was just a straight "git diff".

There were additional changes committed after 23f98c3 so there's no
reason to have expected it would apply to that.

In any case here's another stab at it.  Should apply to 83a1757.  This
should help if there's a lot of freeheap() happening, but not so much
if there's a lot of push/pop (in fact it may be worse in that case).

If it's push/pop that's the issue, the next step would be to try to
identify the crucial pushheap() and replace it with NEWHEAPS() if the
scope permits.


diff --git a/Src/mem.c b/Src/mem.c
index b9569ea..d49f685 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -294,7 +294,7 @@ pushheap(void)
 #endif
 
     for (h = heaps; h; h = h->next) {
-	DPUTS(!h->used, "BUG: empty heap");
+	DPUTS(!h->used && h->next, "BUG: empty heap");
 	hs = (Heapstack) zalloc(sizeof(*hs));
 	hs->next = h->sp;
 	h->sp = hs;
@@ -334,17 +334,15 @@ freeheap(void)
      *
      * Whenever fheap is NULL here, the loop below sweeps back over the
      * entire heap list again, resetting the free space in every arena to
-     * the amount stashed by pushheap() and finding the first arena with
+     * the amount stashed by pushheap() and finding the arena with the most
      * free space to optimize zhalloc()'s next search.  When there's a lot
      * of stuff already on the heap, this is an enormous amount of work,
      * and performance goes to hell.
      *
-     * However, if the arena to which fheap points is unused, we want to
-     * free it, so we have no choice but to do the sweep for a new fheap.
-     */
-    if (fheap && !fheap->sp)
-	fheap = NULL;	/* We used to do this unconditionally */
-    /*
+     * Therefore, we defer freeing the most recently allocated arena until
+     * we reach popheap().  This may fail to reclaim some space in earlier
+     * arenas.
+     *
      * In other cases, either fheap is already correct, or it has never
      * been set and this loop will do it, or it'll be reset from scratch
      * on the next popheap().  So all that's needed here is to pick up
@@ -361,7 +359,11 @@ freeheap(void)
 	    memset(arena(h) + h->sp->used, 0xff, h->used - h->sp->used);
 #endif
 	    h->used = h->sp->used;
-	    if (!fheap && h->used < ARENA_SIZEOF(h))
+	    if (!fheap) {
+		if (h->used < ARENA_SIZEOF(h))
+		    fheap = h;
+	    } else if (ARENA_SIZEOF(h) - h->used >
+		       ARENA_SIZEOF(fheap) - fheap->used)
 		fheap = h;
 	    hl = h;
 #ifdef ZSH_HEAP_DEBUG
@@ -384,6 +386,26 @@ freeheap(void)
 	    VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
 #endif
 	} else {
+	    if (h->next) {
+		/* We want to cut this out of the arena list if we can */
+		if (h == heaps)
+		    hl = heaps = h->next;
+		else if (hl && hl->next == h)
+		    hl->next = h->next;
+		else {
+		    DPUTS(hl, "hl->next != h when freeing");
+		    hl = h;
+		    continue;
+		}
+		h->next = NULL;
+	    } else {
+		/* Leave an empty arena at the end until popped */
+		h->used = 0;
+		fheap = hl = h;
+		break;
+	    }
+	    if (fheap == h)
+		fheap = NULL;
 #ifdef USE_MMAP
 	    munmap((void *) h, h->size);
 #else
@@ -441,12 +463,29 @@ popheap(void)
 #ifdef ZSH_VALGRIND
 	    VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
 #endif
-	    if (!fheap && h->used < ARENA_SIZEOF(h))
+	    if (!fheap) {
+		if (h->used < ARENA_SIZEOF(h))
+		    fheap = h;
+	    } else if (ARENA_SIZEOF(h) - h->used >
+		       ARENA_SIZEOF(fheap) - fheap->used)
 		fheap = h;
 	    zfree(hs, sizeof(*hs));
 
 	    hl = h;
 	} else {
+	    if (h->next) {
+		/* We want to cut this out of the arena list if we can */
+		if (h == heaps)
+		    hl = heaps = h->next;
+		else if (hl && hl->next == h)
+		    hl->next = h->next;
+		else {
+		    DPUTS(hl, "hl->next != h when popping");
+		    hl = h;
+		    continue;
+		}
+		h->next = NULL;
+	    }
 #ifdef USE_MMAP
 	    munmap((void *) h, h->size);
 #else
@@ -524,7 +563,7 @@ zheapptr(void *p)
 mod_export void *
 zhalloc(size_t size)
 {
-    Heap h;
+    Heap h, hp = NULL;
     size_t n;
 #ifdef ZSH_VALGRIND
     size_t req_size = size;
@@ -546,6 +585,7 @@ zhalloc(size_t size)
     for (h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
 	      ? fheap : heaps);
 	 h; h = h->next) {
+	hp = h;
 	if (ARENA_SIZEOF(h) >= (n = size + h->used)) {
 	    void *ret;
 
@@ -566,7 +606,6 @@ zhalloc(size_t size)
 	}
     }
     {
-	Heap hp;
         /* not found, allocate new heap */
 #if defined(ZSH_MEM) && !defined(USE_MMAP)
 	static int called = 0;
@@ -575,7 +614,6 @@ zhalloc(size_t size)
 #endif
 
 	n = HEAP_ARENA_SIZE > size ? HEAPSIZE : size + sizeof(*h);
-	for (hp = NULL, h = heaps; h; hp = h, h = h->next);
 
 #ifdef USE_MMAP
 	h = mmap_heap_alloc(&n);
@@ -607,6 +645,7 @@ zhalloc(size_t size)
 	VALGRIND_MEMPOOL_ALLOC((char *)h, (char *)arena(h), req_size);
 #endif
 
+	DPUTS(hp && hp->next, "failed to find end of chain in zhalloc");
 	if (hp)
 	    hp->next = h;
 	else


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-11 16:17             ` Bart Schaefer
@ 2015-10-11 16:48               ` Sebastian Gniazdowski
  2015-10-11 17:31                 ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-11 16:48 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 6695 bytes --]

It's now much faster, zprof time 11851 vs 3433. That's still strongly
lagging behavior. One other trait is that it slows down after a while.


Best regards,
Sebastian Gniazdowski

On 11 October 2015 at 18:17, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 11, 10:39am, Sebastian Gniazdowski wrote:
> }
> } Zprof says there is no change:
>
> Well, that's not entirely surprising.  It probably just means that there
> is some space in a lot of arenas, so fpop doesn't move far from the head
> of the list.  Or it could mean that the real problem is popheap() rather
> than freeheap().
>
> } BTW, the patch didn't apply to head or to 23f98c3,
>
> Curious.  "git diff master origin/master" shows no diffs for my local
> repository, and the patch was just a straight "git diff".
>
> There were additional changes committed after 23f98c3 so there's no
> reason to have expected it would apply to that.
>
> In any case here's another stab at it.  Should apply to 83a1757.  This
> should help if there's a lot of freeheap() happening, but not so much
> if there's a lot of push/pop (in fact it may be worse in that case).
>
> If it's push/pop that's the issue, the next step would be to try to
> identify the crucial pushheap() and replace it with NEWHEAPS() if the
> scope permits.
>
>
> diff --git a/Src/mem.c b/Src/mem.c
> index b9569ea..d49f685 100644
> --- a/Src/mem.c
> +++ b/Src/mem.c
> @@ -294,7 +294,7 @@ pushheap(void)
>  #endif
>
>      for (h = heaps; h; h = h->next) {
> -       DPUTS(!h->used, "BUG: empty heap");
> +       DPUTS(!h->used && h->next, "BUG: empty heap");
>         hs = (Heapstack) zalloc(sizeof(*hs));
>         hs->next = h->sp;
>         h->sp = hs;
> @@ -334,17 +334,15 @@ freeheap(void)
>       *
>       * Whenever fheap is NULL here, the loop below sweeps back over the
>       * entire heap list again, resetting the free space in every arena to
> -     * the amount stashed by pushheap() and finding the first arena with
> +     * the amount stashed by pushheap() and finding the arena with the most
>       * free space to optimize zhalloc()'s next search.  When there's a lot
>       * of stuff already on the heap, this is an enormous amount of work,
>       * and performance goes to hell.
>       *
> -     * However, if the arena to which fheap points is unused, we want to
> -     * free it, so we have no choice but to do the sweep for a new fheap.
> -     */
> -    if (fheap && !fheap->sp)
> -       fheap = NULL;   /* We used to do this unconditionally */
> -    /*
> +     * Therefore, we defer freeing the most recently allocated arena until
> +     * we reach popheap().  This may fail to reclaim some space in earlier
> +     * arenas.
> +     *
>       * In other cases, either fheap is already correct, or it has never
>       * been set and this loop will do it, or it'll be reset from scratch
>       * on the next popheap().  So all that's needed here is to pick up
> @@ -361,7 +359,11 @@ freeheap(void)
>             memset(arena(h) + h->sp->used, 0xff, h->used - h->sp->used);
>  #endif
>             h->used = h->sp->used;
> -           if (!fheap && h->used < ARENA_SIZEOF(h))
> +           if (!fheap) {
> +               if (h->used < ARENA_SIZEOF(h))
> +                   fheap = h;
> +           } else if (ARENA_SIZEOF(h) - h->used >
> +                      ARENA_SIZEOF(fheap) - fheap->used)
>                 fheap = h;
>             hl = h;
>  #ifdef ZSH_HEAP_DEBUG
> @@ -384,6 +386,26 @@ freeheap(void)
>             VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
>  #endif
>         } else {
> +           if (h->next) {
> +               /* We want to cut this out of the arena list if we can */
> +               if (h == heaps)
> +                   hl = heaps = h->next;
> +               else if (hl && hl->next == h)
> +                   hl->next = h->next;
> +               else {
> +                   DPUTS(hl, "hl->next != h when freeing");
> +                   hl = h;
> +                   continue;
> +               }
> +               h->next = NULL;
> +           } else {
> +               /* Leave an empty arena at the end until popped */
> +               h->used = 0;
> +               fheap = hl = h;
> +               break;
> +           }
> +           if (fheap == h)
> +               fheap = NULL;
>  #ifdef USE_MMAP
>             munmap((void *) h, h->size);
>  #else
> @@ -441,12 +463,29 @@ popheap(void)
>  #ifdef ZSH_VALGRIND
>             VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
>  #endif
> -           if (!fheap && h->used < ARENA_SIZEOF(h))
> +           if (!fheap) {
> +               if (h->used < ARENA_SIZEOF(h))
> +                   fheap = h;
> +           } else if (ARENA_SIZEOF(h) - h->used >
> +                      ARENA_SIZEOF(fheap) - fheap->used)
>                 fheap = h;
>             zfree(hs, sizeof(*hs));
>
>             hl = h;
>         } else {
> +           if (h->next) {
> +               /* We want to cut this out of the arena list if we can */
> +               if (h == heaps)
> +                   hl = heaps = h->next;
> +               else if (hl && hl->next == h)
> +                   hl->next = h->next;
> +               else {
> +                   DPUTS(hl, "hl->next != h when popping");
> +                   hl = h;
> +                   continue;
> +               }
> +               h->next = NULL;
> +           }
>  #ifdef USE_MMAP
>             munmap((void *) h, h->size);
>  #else
> @@ -524,7 +563,7 @@ zheapptr(void *p)
>  mod_export void *
>  zhalloc(size_t size)
>  {
> -    Heap h;
> +    Heap h, hp = NULL;
>      size_t n;
>  #ifdef ZSH_VALGRIND
>      size_t req_size = size;
> @@ -546,6 +585,7 @@ zhalloc(size_t size)
>      for (h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
>               ? fheap : heaps);
>          h; h = h->next) {
> +       hp = h;
>         if (ARENA_SIZEOF(h) >= (n = size + h->used)) {
>             void *ret;
>
> @@ -566,7 +606,6 @@ zhalloc(size_t size)
>         }
>      }
>      {
> -       Heap hp;
>          /* not found, allocate new heap */
>  #if defined(ZSH_MEM) && !defined(USE_MMAP)
>         static int called = 0;
> @@ -575,7 +614,6 @@ zhalloc(size_t size)
>  #endif
>
>         n = HEAP_ARENA_SIZE > size ? HEAPSIZE : size + sizeof(*h);
> -       for (hp = NULL, h = heaps; h; hp = h, h = h->next);
>
>  #ifdef USE_MMAP
>         h = mmap_heap_alloc(&n);
> @@ -607,6 +645,7 @@ zhalloc(size_t size)
>         VALGRIND_MEMPOOL_ALLOC((char *)h, (char *)arena(h), req_size);
>  #endif
>
> +       DPUTS(hp && hp->next, "failed to find end of chain in zhalloc");
>         if (hp)
>             hp->next = h;
>         else

[-- Attachment #2: result2.txt --]
[-- Type: text/plain, Size: 1886 bytes --]

patched:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1        3433,98  3433,98   99,62%   2439,38  2439,38   70,77%  n-list
 2)  161         638,21     3,96   18,52%    638,21     3,96   18,52%  _nlist_print_with_ansi
 3)    7         928,72   132,67   26,94%    290,51    41,50    8,43%  n-list-draw
 4)    7          50,29     7,18    1,46%     50,20     7,17    1,46%  n-list-input
 5)    2          11,39     5,69    0,33%     11,39     5,69    0,33%  _nlist_cursor_visibility
 6)    7          10,79     1,54    0,31%     10,62     1,52    0,31%  _nlist_setup_user_vars
 7)    1          12,96    12,96    0,38%      6,09     6,09    0,18%  _nlist_exit
 8)    7           0,29     0,04    0,01%      0,29     0,04    0,01%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  5,18s user 0,18s system 89% cpu 5,959 total

clean:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)  161        5266,46    32,71   44,35%   5266,46    32,71   44,35%  _nlist_print_with_ansi
 2)    1       11851,19 11851,19   99,79%   3706,12  3706,12   31,21%  n-list
 3)    7        7754,84  1107,83   65,30%   2488,38   355,48   20,95%  n-list-draw
 4)    7         316,40    45,20    2,66%    310,96    44,42    2,62%  n-list-input
 5)    7          38,35     5,48    0,32%     30,59     4,37    0,26%  _nlist_setup_user_vars
 6)    7          26,60     3,80    0,22%     26,60     3,80    0,22%  _nlist_status_msg
 7)    2          23,89    11,95    0,20%     23,89    11,95    0,20%  _nlist_cursor_visibility
 8)    1          24,35    24,35    0,21%      9,34     9,34    0,08%  _nlist_exit
( n-panelize < ~/lsoflsof; )  13,77s user 0,21s system 95% cpu 14,660 total


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-11 16:48               ` Sebastian Gniazdowski
@ 2015-10-11 17:31                 ` Bart Schaefer
  2015-10-11 18:05                   ` Sebastian Gniazdowski
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-11 17:31 UTC (permalink / raw)
  To: zsh-workers

On Oct 11,  6:48pm, Sebastian Gniazdowski wrote:
} 
} It's now much faster, zprof time 11851 vs 3433. That's still strongly
} lagging behavior. One other trait is that it slows down after a while.

The whole algorithm is designed to maximally fill the allocated space
and thereby minimize the memory footprint, not to maximize speed.  So
the more often you grow/free the heap, the more time it spends looking
for unused space in the existing arenas.

Besides zprof, it would be interesting to see the maximum memory usage
of the patched and unpatched shells.

Does this change in zhalloc() make any difference?  This goes farther
in the direction of abandoning small bits of earlier arenas until the
popheap() comes around.

diff --git a/Src/mem.c b/Src/mem.c
index b9569ea..63a5afe 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -543,9 +543,14 @@ zhalloc(size_t size)
 
     /* find a heap with enough free space */
 
-    for (h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
-	      ? fheap : heaps);
-	 h; h = h->next) {
+    /*
+     * This previously assigned:
+     *   h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
+     *	      ? fheap : heaps);
+     * but we think that nothing upstream of fheap has more free space,
+     * so why start over at heaps just because fheap has too little?
+     */
+    for (h = (fheap ? fheap : heaps); h; h = h->next) {
 	if (ARENA_SIZEOF(h) >= (n = size + h->used)) {
 	    void *ret;
 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-11 17:31                 ` Bart Schaefer
@ 2015-10-11 18:05                   ` Sebastian Gniazdowski
  2015-10-11 21:22                     ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-11 18:05 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1940 bytes --]

The patch (alone) much helped, zprof time is now 3027 and there is no
after-while slow down. Htop shows RES of 152 MB. For the first patch
RES is 166 MB. For the clean version it's 152 MB. I didn't combine the
patches (BTW. the problems with applying patches were caused by expand
tabs that occured on copy-pasting).


On 11 October 2015 at 19:31, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 11,  6:48pm, Sebastian Gniazdowski wrote:
> }
> } It's now much faster, zprof time 11851 vs 3433. That's still strongly
> } lagging behavior. One other trait is that it slows down after a while.
>
> The whole algorithm is designed to maximally fill the allocated space
> and thereby minimize the memory footprint, not to maximize speed.  So
> the more often you grow/free the heap, the more time it spends looking
> for unused space in the existing arenas.
>
> Besides zprof, it would be interesting to see the maximum memory usage
> of the patched and unpatched shells.
>
> Does this change in zhalloc() make any difference?  This goes farther
> in the direction of abandoning small bits of earlier arenas until the
> popheap() comes around.
>
> diff --git a/Src/mem.c b/Src/mem.c
> index b9569ea..63a5afe 100644
> --- a/Src/mem.c
> +++ b/Src/mem.c
> @@ -543,9 +543,14 @@ zhalloc(size_t size)
>
>      /* find a heap with enough free space */
>
> -    for (h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
> -             ? fheap : heaps);
> -        h; h = h->next) {
> +    /*
> +     * This previously assigned:
> +     *   h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
> +     *       ? fheap : heaps);
> +     * but we think that nothing upstream of fheap has more free space,
> +     * so why start over at heaps just because fheap has too little?
> +     */
> +    for (h = (fheap ? fheap : heaps); h; h = h->next) {
>         if (ARENA_SIZEOF(h) >= (n = size + h->used)) {
>             void *ret;
>

[-- Attachment #2: result3.txt --]
[-- Type: text/plain, Size: 2856 bytes --]

patched (zhalloc):
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1        3027,41  3027,41   99,53%   2025,50  2025,50   66,59%  n-list
 2)  161         674,02     4,19   22,16%    674,02     4,19   22,16%  _nlist_print_with_ansi
 3)    7         962,52   137,50   31,64%    288,50    41,21    9,48%  n-list-draw
 4)    7          24,50     3,50    0,81%     24,31     3,47    0,80%  n-list-input
 5)    2          13,09     6,54    0,43%     13,09     6,54    0,43%  _nlist_cursor_visibility
 6)    7           9,18     1,31    0,30%      8,99     1,28    0,30%  _nlist_setup_user_vars
 7)    1          14,33    14,33    0,47%      6,23     6,23    0,20%  _nlist_exit
 8)    7           0,71     0,10    0,02%      0,71     0,10    0,02%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  4,62s user 0,18s system 88% cpu 5,432 total

patched (the first patch):
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1        3547,43  3547,43   99,63%   2560,58  2560,58   71,92%  n-list
 2)  161         636,56     3,95   17,88%    636,56     3,95   17,88%  _nlist_print_with_ansi
 3)    7         921,28   131,61   25,88%    284,71    40,67    8,00%  n-list-draw
 4)    7          49,42     7,06    1,39%     49,33     7,05    1,39%  n-list-input
 5)    2          11,96     5,98    0,34%     11,96     5,98    0,34%  _nlist_cursor_visibility
 6)    7          10,80     1,54    0,30%     10,64     1,52    0,30%  _nlist_setup_user_vars
 7)    1          13,05    13,05    0,37%      6,17     6,17    0,17%  _nlist_exit
 8)    7           0,28     0,04    0,01%      0,28     0,04    0,01%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  5,17s user 0,18s system 88% cpu 6,032 total

clean:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)  161        5176,77    32,15   43,84%   5176,77    32,15   43,84%  _nlist_print_with_ansi
 2)    1       11785,16 11785,16   99,79%   3755,23  3755,23   31,80%  n-list
 3)    7        7640,70  1091,53   64,70%   2463,93   351,99   20,86%  n-list-draw
 4)    7         316,45    45,21    2,68%    310,97    44,42    2,63%  n-list-input
 5)    7          37,72     5,39    0,32%     30,00     4,29    0,25%  _nlist_setup_user_vars
 6)    7          26,52     3,79    0,22%     26,52     3,79    0,22%  _nlist_status_msg
 7)    2          23,68    11,84    0,20%     23,68    11,84    0,20%  _nlist_cursor_visibility
 8)    1          24,45    24,45    0,21%      9,32     9,32    0,08%  _nlist_exit
( n-panelize < ~/lsoflsof; )  13,64s user 0,20s system 95% cpu 14,548 total

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-11 18:05                   ` Sebastian Gniazdowski
@ 2015-10-11 21:22                     ` Bart Schaefer
  2015-10-12  8:21                       ` Sebastian Gniazdowski
  2015-10-12 12:05                       ` Sebastian Gniazdowski
  0 siblings, 2 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-11 21:22 UTC (permalink / raw)
  To: zsh-workers

Please don't top-post, or if you do, don't include the entire message to
which you are replying as an excerpt.  If using gmail, you often can't
tell this is happening; click on the little [...] box at the bottom and
selectively delete, or cursor-key down so the [...] is highlighted and
then press delete to kill the entire excerpt.  Thanks.

On Oct 11,  8:05pm, Sebastian Gniazdowski wrote:
}
} The patch (alone) much helped, zprof time is now 3027

Wow, I wasn't expecting that much difference.

} Htop shows RES of 152 MB. For the first patch
} RES is 166 MB. For the clean version it's 152 MB.

OK, this makes sense because the patch from workers/36834 is leaving some
heaps around longer whereas the patch from workers/36836 is just avoiding
a re-scan of the list of heaps at allocation time.

} I didn't combine the patches

It may be interesting to do so, they shouldn't conflict.  From what you
said before --

} > } It's now much faster, zprof time 11851 vs 3433.

-- each of them cut the time by about 2/3, so together they may get it
below 3000.

-- 
Barton E. Schaefer


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-11 21:22                     ` Bart Schaefer
@ 2015-10-12  8:21                       ` Sebastian Gniazdowski
  2015-10-12 14:01                         ` Bart Schaefer
  2015-10-13 13:07                         ` Sebastian Gniazdowski
  2015-10-12 12:05                       ` Sebastian Gniazdowski
  1 sibling, 2 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-12  8:21 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 977 bytes --]

On 11 October 2015 at 23:22, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 11,  8:05pm, Sebastian Gniazdowski wrote:
> } I didn't combine the patches
>
> It may be interesting to do so, they shouldn't conflict.  From what you
> said before --
>
> } > } It's now much faster, zprof time 11851 vs 3433.
>
> -- each of them cut the time by about 2/3, so together they may get it
> below 3000.

With both patches zprof running time is around 1806. RES is 152M and
goes down nicely to 37564 when I quit the script. Didn't observe
after-while slow down. The tool is almost fully usable for 89k
elements (there's still a bit lag), however searching is slow, I
attached zprof results for searching. It is done via two following
invocations:

                list=( "${(@M)list:#(#i)*$~search_pattern*}" )
                col_list=(
"${(@)list//(#mi)($~colsearch_pattern)/$red${MATCH}$reset}" )

Maybe they trigger some specific behavior on heap?

> --
> Barton E. Schaefer

[-- Attachment #2: result4.txt --]
[-- Type: text/plain, Size: 1886 bytes --]

patched:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1        1806,60  1806,60   99,17%   1072,67  1072,67   58,88%  n-list
 2)  161         458,49     2,85   25,17%    458,49     2,85   25,17%  _nlist_print_with_ansi
 3)    7         696,02    99,43   38,21%    237,53    33,93   13,04%  n-list-draw
 4)    7          23,89     3,41    1,31%     23,77     3,40    1,31%  n-list-input
 5)    2          13,03     6,51    0,72%     13,03     6,51    0,72%  _nlist_cursor_visibility
 6)    7           9,15     1,31    0,50%      8,96     1,28    0,49%  _nlist_setup_user_vars
 7)    1          15,03    15,03    0,83%      6,61     6,61    0,36%  _nlist_exit
 8)    7           0,28     0,04    0,02%      0,28     0,04    0,02%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  3,16s user 0,16s system 85% cpu 3,884 total

clean:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)  161        5131,50    31,87   43,12%   5131,50    31,87   43,12%  _nlist_print_with_ansi
 2)    1       11873,99 11873,99   99,78%   3933,58  3933,58   33,06%  n-list
 3)    7        7560,21  1080,03   63,53%   2428,71   346,96   20,41%  n-list-draw
 4)    7         309,00    44,14    2,60%    303,75    43,39    2,55%  n-list-input
 5)    7          37,24     5,32    0,31%     29,64     4,23    0,25%  _nlist_setup_user_vars
 6)    7          26,21     3,74    0,22%     26,21     3,74    0,22%  _nlist_status_msg
 7)    2          23,46    11,73    0,20%     23,46    11,73    0,20%  _nlist_cursor_visibility
 8)    1          26,01    26,01    0,22%     10,30    10,30    0,09%  _nlist_exit
( n-panelize < ~/lsoflsof; )  13,52s user 0,19s system 93% cpu 14,614 total


[-- Attachment #3: result_search.txt --]
[-- Type: text/plain, Size: 3812 bytes --]

the long patch:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1       227359,12 227359,12   99,99%  205528,35 205528,35   90,39%  n-list
 2)  161       19586,67   121,66    8,61%  19586,67   121,66    8,61%  _nlist_print_with_ansi
 3)    7       21154,74  3022,11    9,30%   1568,07   224,01    0,69%  n-list-draw
 4)    7         589,99    84,28    0,26%    565,13    80,73    0,25%  n-list-input
 5)    4          54,45    13,61    0,02%     54,45    13,61    0,02%  _nlist_verify_vars
 6)    4          35,64     8,91    0,02%     35,64     8,91    0,02%  _nlist_cursor_visibility
 7)    7          25,32     3,62    0,01%     25,25     3,61    0,01%  _nlist_setup_user_vars
 8)    1          13,08    13,08    0,01%      6,60     6,60    0,00%  _nlist_exit
( n-panelize < ~/lsoflsof; )  227,77s user 0,92s system 99% cpu 3:49,78 total

zhalloc patch:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1       49381,60 49381,60   99,97%  39748,80 39748,80   80,47%  n-list
 2)  161        8858,04    55,02   17,93%   8858,04    55,02   17,93%  _nlist_print_with_ansi
 3)    7        9560,66  1365,81   19,36%    702,62   100,37    1,42%  n-list-draw
 4)    7          63,54     9,08    0,13%     46,58     6,65    0,09%  n-list-input
 5)    4          28,48     7,12    0,06%     28,48     7,12    0,06%  _nlist_cursor_visibility
 6)    1          14,35    14,35    0,03%      6,35     6,35    0,01%  _nlist_exit
 7)    7           3,38     0,48    0,01%      3,31     0,47    0,01%  _nlist_setup_user_vars
 8)    7           1,49     0,21    0,00%      1,49     0,21    0,00%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  50,29s user 0,72s system 98% cpu 51,757 total

both patches:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)  161       11162,41    69,33   50,81%  11162,41    69,33   50,81%  _nlist_print_with_ansi
 2)    1       21956,87 21956,87   99,94%   9857,85  9857,85   44,87%  n-list
 3)    7       12012,06  1716,01   54,67%    849,65   121,38    3,87%  n-list-draw
 4)    7          79,19    11,31    0,36%     60,36     8,62    0,27%  n-list-input
 5)    4          29,77     7,44    0,14%     29,77     7,44    0,14%  _nlist_cursor_visibility
 6)    1          13,19    13,19    0,06%      6,59     6,59    0,03%  _nlist_exit
 7)    7           2,83     0,40    0,01%      2,76     0,39    0,01%  _nlist_setup_user_vars
 8)    7           0,36     0,05    0,00%      0,36     0,05    0,00%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  22,20s user 1,15s system 97% cpu 24,021 total

clean:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)  161       139846,51   868,61   55,42%  139846,51   868,61   55,42%  _nlist_print_with_ansi
 2)    1       252311,10 252311,10   99,99%  107251,66 107251,66   42,50%  n-list
 3)    7       144256,54 20608,08   57,17%   4410,03   630,00    1,75%  n-list-draw
 4)    7         698,59    99,80    0,28%    626,00    89,43    0,25%  n-list-input
 5)    4          95,33    23,83    0,04%     95,33    23,83    0,04%  _nlist_cursor_visibility
 6)    7          59,63     8,52    0,02%     59,63     8,52    0,02%  _nlist_status_msg
 7)    7          36,60     5,23    0,01%     35,84     5,12    0,01%  _nlist_setup_user_vars
 8)    1          24,74    24,74    0,01%      9,86     9,86    0,00%  _nlist_exit
( n-panelize < ~/lsoflsof; )  253,40s user 0,88s system 99% cpu 4:15,08 total


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-11 21:22                     ` Bart Schaefer
  2015-10-12  8:21                       ` Sebastian Gniazdowski
@ 2015-10-12 12:05                       ` Sebastian Gniazdowski
  2015-10-12 15:13                         ` Bart Schaefer
  1 sibling, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-12 12:05 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

On 11 October 2015 at 23:22, Bart Schaefer <schaefer@brasslantern.com> wrote:
> } Htop shows RES of 152 MB. For the first patch
> } RES is 166 MB. For the clean version it's 152 MB.
>
> OK, this makes sense because the patch from workers/36834 is leaving some
> heaps around longer whereas the patch from workers/36836 is just avoiding
> a re-scan of the list of heaps at allocation time.

Can something be done inside the script to make memory more prepared
for heavy computation? I tried putting:

                # Allocate a large buffer just to free it
                a=""
                a="${(r:2000000:: :)a}"
                a=""

before:
                list=( "${(@M)list:#(#i)*$~search_pattern*}" )

But it doesn't make difference

Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-12  8:21                       ` Sebastian Gniazdowski
@ 2015-10-12 14:01                         ` Bart Schaefer
  2015-10-12 16:50                           ` Sebastian Gniazdowski
  2015-10-13 13:07                         ` Sebastian Gniazdowski
  1 sibling, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-12 14:01 UTC (permalink / raw)
  To: zsh-workers

On Oct 12, 10:21am, Sebastian Gniazdowski wrote:
}
} With both patches zprof running time is around 1806. RES is 152M and
} goes down nicely to 37564 when I quit the script. Didn't observe
} after-while slow down. The tool is almost fully usable for 89k
} elements (there's still a bit lag), however searching is slow,

If you configure --enable-zsh-heap-debug you can edit mem.c to set a
value for the heap_debug_verbosity variable which will cause a LOT of
debugging messages to be printed to stderr.  This may tell you where
it's spending time.

Shell variables are always passed around internally by value rather
than by reference, so operations on a big list will copy it a lot.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-12 12:05                       ` Sebastian Gniazdowski
@ 2015-10-12 15:13                         ` Bart Schaefer
  0 siblings, 0 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-12 15:13 UTC (permalink / raw)
  To: zsh-workers

On Oct 12,  2:05pm, Sebastian Gniazdowski wrote:
}
} Can something be done inside the script to make memory more prepared
} for heavy computation?

Doubtful.  The heap is a layer on top of malloc, so although it might
be possible to get system memory from malloc into the heap list, that
won't remove any of the heap implementation overhead, nor will it
keep zsh from copying the data around internally every time you use
${list} in the script later.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-12 14:01                         ` Bart Schaefer
@ 2015-10-12 16:50                           ` Sebastian Gniazdowski
  2015-10-13  0:33                             ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-12 16:50 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

On 12 October 2015 at 16:01, Bart Schaefer <schaefer@brasslantern.com> wrote:
> If you configure --enable-zsh-heap-debug you can edit mem.c to set a
> value for the heap_debug_verbosity variable which will cause a LOT of
> debugging messages to be printed to stderr.  This may tell you where
> it's spending time.

Did that, set heap_debug_verbosity to 0x7f. When I issue a search,
screen gets constantly outputted with messages about heap pop. One
series started with:
HEAP DEBUG: heap ff604f popped, old heap was ff51af.
HEAP DEBUG: heap ff6050 popped, old heap was ff51b0.
HEAP DEBUG: heap ff6051 popped, old heap was ff51b1.

To then went through:
HEAP DEBUG: heap ff5592 popped, old heap was 52dde.
HEAP DEBUG: heap ff5593 popped, old heap was 52ddf.
HEAP DEBUG: heap 50984 popped, old heap was 4fe96.
HEAP DEBUG: heap 50985 popped, old heap was 4fea7.

And ended at:
HEAP DEBUG: heap 5145f popped, old heap was 50981.
HEAP DEBUG: heap 51460 popped, old heap was 50982.
HEAP DEBUG: heap 51461 popped, old heap was 50983.

Set my terminal scroll back to "unlimited", repeated that. However
scroll back buffer got cut off at some point. At top there was:

HEAP DEBUG: heap fb5f19 popped, old heap was fb5079.
HEAP DEBUG: heap fb5f1a popped, old heap was fb507a.
...

Then:
HEAP DEBUG: heap fb5466 popped, old heap was d35b.
HEAP DEBUG: heap fb5467 popped, old heap was d35c.
HEAP DEBUG: heap a41f popped, old heap was 9931.
HEAP DEBUG: heap a420 popped, old heap was 9942.

And at the end:
HEAP DEBUG: heap aefb popped, old heap was a41d.
HEAP DEBUG: heap aefc popped, old heap was a41e.
HEAP DEBUG: heap 9931 popped, old heap was 98f8.
HEAP DEBUG: heap 98f8 freed, new id is fb71a8.
HEAP DEBUG: heap fb71a8 pushed, new id is fb71a9.
HEAP DEBUG: heap fb71a9 popped, old heap was fb71a8.

Maybe the numbers mean something?

Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-12 16:50                           ` Sebastian Gniazdowski
@ 2015-10-13  0:33                             ` Bart Schaefer
  2015-10-13  8:21                               ` Sebastian Gniazdowski
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-13  0:33 UTC (permalink / raw)
  To: zsh-workers

On Oct 12,  6:50pm, Sebastian Gniazdowski wrote:
} Subject: Re: Slowdown around 5.0.5-dev-0
}
} On 12 October 2015 at 16:01, Bart Schaefer <schaefer@brasslantern.com> wrote:
} > If you configure --enable-zsh-heap-debug you can edit mem.c to set a
} > value for the heap_debug_verbosity variable
} 
} Did that, set heap_debug_verbosity to 0x7f. When I issue a search,
} screen gets constantly outputted with messages about heap pop.

Yes, you probably want to prefix your test with

	{
	  exec {fd}>&2 2>&debug_trace_file.txt
	  # your test here
	} always {
	  exec 2>&$fd
	}

(choose your own file name).  I mean to mention that but forgot.

If you capture the output in a file (it may be a BIG file) you can do
things like "fgrep -c allocated" to count the number of times each heap
event happens.

} Maybe the numbers mean something?

Yes, they're an ID assigned every time a heap is either pushed or freed;
e.g.:

} HEAP DEBUG: heap 98f8 freed, new id is fb71a8.

The heap ID is incremented once at each push or free, so there have been
a minimum of 16439472 heap operations (fb71a8 - 98f8) between the
creation of that heap and the eventual free.  (Note the number of heap
operations is the number of push+free multiplied by the average number
of heap arenas, because a new ID is assigned to each arena every time.)

There should be a message about "pushed" corresponding to each "popped",
like this pair:

} HEAP DEBUG: heap fb71a8 pushed, new id is fb71a9.
} HEAP DEBUG: heap fb71a9 popped, old heap was fb71a8.

You should also have sometimes seen "allocated memory from ..." or
"create new heap ..." messages.  The fact that your scrollback is
almost entirely "popped" messages is an indication that you're really
deep in a recursion at the time your script exits.

-- 
Barton E. Schaefer


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-13  0:33                             ` Bart Schaefer
@ 2015-10-13  8:21                               ` Sebastian Gniazdowski
  2015-10-13 15:52                                 ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-13  8:21 UTC (permalink / raw)
  To: zsh-workers

On 13 October 2015 at 02:33, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 12,  6:50pm, Sebastian Gniazdowski wrote:
> } Maybe the numbers mean something?
>
> Yes, they're an ID assigned every time a heap is either pushed or freed;
> e.g.:

I meant that maybe the numbers yield some optimization strategy. I
think we did a great job speeding up memory-heavy Zsh 6.5 (typical
execution) or 11.5 times (use of patterns on arrays). Thank you for
your effort (my tools greatly benefit), credits are mostly to you. I
would try to implement something more. Have you any more ideas for
optimization strategies? Basing on your description of heap mechanism
I could try to implement it. But maybe there is nothing left to be
done?

Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-12  8:21                       ` Sebastian Gniazdowski
  2015-10-12 14:01                         ` Bart Schaefer
@ 2015-10-13 13:07                         ` Sebastian Gniazdowski
  2015-10-13 13:31                           ` Sebastian Gniazdowski
  1 sibling, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-13 13:07 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1158 bytes --]

On 12 October 2015 at 10:21, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> With both patches zprof running time is around 1806. RES is 152M and
> goes down nicely to 37564 when I quit the script. Didn't observe
> after-while slow down. The tool is almost fully usable for 89k
> elements (there's still a bit lag), however searching is slow, I
> attached zprof results for searching. It is done via two following
> invocations:
>
>                 list=( "${(@M)list:#(#i)*$~search_pattern*}" )
>                 col_list=(
> "${(@)list//(#mi)($~colsearch_pattern)/$red${MATCH}$reset}" )
>
> Maybe they trigger some specific behavior on heap?

Did one more memory and performance test for version with both patches
and without them. Earlier I just started the script and moved 9 lines,
then checked memory. Now I ran search with $search_pattern and
$colsearch_pattern set to "s", single leter, then checked memory. RES
is 433M and 432M, so it's fine. The patched (HEAD) version is 14 times
faster. I also checked 5.0.2 – RES is 400M, and it's ~8 times slower
than the patched version.

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: result_search2.txt --]
[-- Type: text/plain, Size: 3078 bytes --]

both patches (HEAD):
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1       10112,65 10112,65   99,85%   5124,47  5124,47   50,60%  n-list
 2)   92        4597,96    49,98   45,40%   4597,96    49,98   45,40%  _nlist_print_with_ansi
 3)    4        4910,97  1227,74   48,49%    313,01    78,25    3,09%  n-list-draw
 4)    4          69,24    17,31    0,68%     53,26    13,31    0,53%  n-list-input
 5)    4          28,51     7,13    0,28%     28,51     7,13    0,28%  _nlist_cursor_visibility
 6)    1          14,79    14,79    0,15%      7,11     7,11    0,07%  _nlist_exit
 7)    4           2,78     0,70    0,03%      2,72     0,68    0,03%  _nlist_setup_user_vars
 8)    4           0,18     0,05    0,00%      0,18     0,05    0,00%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  10,88s user 0,52s system 93% cpu 12,153 total

RES: 433M

no patches:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)   92       77542,89   842,86   54,42%  77542,89   842,86   54,42%  _nlist_print_with_ansi
 2)    1       142460,54 142460,54   99,98%  61816,42 61816,42   43,38%  n-list
 3)    4       80130,34 20032,59   56,24%   2587,46   646,86    1,82%  n-list-draw
 4)    4         456,02   114,01    0,32%    362,14    90,54    0,25%  n-list-input
 5)    4         117,37    29,34    0,08%    117,37    29,34    0,08%  _nlist_cursor_visibility
 6)    4          28,08     7,02    0,02%     28,08     7,02    0,02%  _nlist_status_msg
 7)    4          19,50     4,88    0,01%     18,71     4,68    0,01%  _nlist_setup_user_vars
 8)    1          26,80    26,80    0,02%     11,03    11,03    0,01%  _nlist_exit
( n-panelize < ~/lsoflsof; )  143,73s user 0,54s system 99% cpu 2:25,15 total

RES: 432M

5.0.2 - the test is biased because I browsed list for ~10 seconds before the
script catched "second breath" and started to work fast. But the user time here
is quite accurate.
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)  828       43962,09    53,09   48,09%  43962,09    53,09   48,09%  _nlist_print_with_ansi
 2)    1       91406,33 91406,33   99,99%  40327,39 40327,39   44,12%  n-list
 3)   36       50295,77  1397,10   55,02%   6333,68   175,94    6,93%  n-list-draw
 4)   36         602,80    16,74    0,66%    538,63    14,96    0,59%  n-list-input
 5)   36         108,53     3,01    0,12%     84,31     2,34    0,09%  _nlist_setup_user_vars
 6)   36          63,53     1,76    0,07%     63,53     1,76    0,07%  _nlist_status_msg
 7)    4          62,24    15,56    0,07%     62,24    15,56    0,07%  _nlist_cursor_visibility
 8)   34          24,22     0,71    0,03%     24,22     0,71    0,03%  _nlist_compute_user_vars_difference
( n-panelize < ~/lsoflsof; )  84,60s user 0,42s system 90% cpu 1:33,78 total

RES: 400M

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-13 13:07                         ` Sebastian Gniazdowski
@ 2015-10-13 13:31                           ` Sebastian Gniazdowski
  0 siblings, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-13 13:31 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1391 bytes --]

PS. test where I don't wait for 5.0.2 to become fast. RES is 392M.

Best regards,
Sebastian Gniazdowski


On 13 October 2015 at 15:07, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> On 12 October 2015 at 10:21, Sebastian Gniazdowski
> <sgniazdowski@gmail.com> wrote:
>> With both patches zprof running time is around 1806. RES is 152M and
>> goes down nicely to 37564 when I quit the script. Didn't observe
>> after-while slow down. The tool is almost fully usable for 89k
>> elements (there's still a bit lag), however searching is slow, I
>> attached zprof results for searching. It is done via two following
>> invocations:
>>
>>                 list=( "${(@M)list:#(#i)*$~search_pattern*}" )
>>                 col_list=(
>> "${(@)list//(#mi)($~colsearch_pattern)/$red${MATCH}$reset}" )
>>
>> Maybe they trigger some specific behavior on heap?
>
> Did one more memory and performance test for version with both patches
> and without them. Earlier I just started the script and moved 9 lines,
> then checked memory. Now I ran search with $search_pattern and
> $colsearch_pattern set to "s", single leter, then checked memory. RES
> is 433M and 432M, so it's fine. The patched (HEAD) version is 14 times
> faster. I also checked 5.0.2 – RES is 400M, and it's ~8 times slower
> than the patched version.
>
> Best regards,
> Sebastian Gniazdowski

[-- Attachment #2: result_search2a.txt --]
[-- Type: text/plain, Size: 1100 bytes --]

5.0.2 - without waiting for "second breath", just ~1 sec to check if it's slow
(i.e. at the first breath). Searching for "s" (lists of 89k elements)

num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)  115       58373,85   507,60   54,48%  58373,85   507,60   54,48%  _nlist_print_with_ansi
 2)    1       107124,50 107124,50   99,98%  46496,94 46496,94   43,40%  n-list
 3)    5       60289,00 12057,80   56,27%   1915,15   383,03    1,79%  n-list-draw
 4)    5         298,42    59,68    0,28%    230,25    46,05    0,21%  n-list-input
 5)    4          88,38    22,09    0,08%     88,38    22,09    0,08%  _nlist_cursor_visibility
 6)    5          17,66     3,53    0,02%     17,66     3,53    0,02%  _nlist_status_msg
 7)    5          14,79     2,96    0,01%     14,21     2,84    0,01%  _nlist_setup_user_vars
 8)    1          23,31    23,31    0,02%      9,11     9,11    0,01%  _nlist_exit
( n-panelize < ~/lsoflsof; )  108,20s user 0,44s system 99% cpu 1:49,53 total

RES: 392M

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-13  8:21                               ` Sebastian Gniazdowski
@ 2015-10-13 15:52                                 ` Bart Schaefer
  2015-10-14  6:50                                   ` Sebastian Gniazdowski
  2015-10-14 13:27                                   ` Peter Stephenson
  0 siblings, 2 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-13 15:52 UTC (permalink / raw)
  To: zsh-workers

On Oct 13, 10:21am, Sebastian Gniazdowski wrote:
}
} I meant that maybe the numbers yield some optimization strategy.
} I could try to implement it. But maybe there is nothing left to be
} done?

I think we're all out of low-level strategies without completely redoing
the algorithm.  As I noted, it's optimized mostly for space rather than
speed.  Based on your later two emails, we've increased memory footprint
by about 25% in order to get a significant increase in speed, so that is
probably an OK trade-off on most modern hardware.

At this point I think any gains would be found higher up -- calls to
push/pop the heap that aren't necessary, or places [similar to the use
in completion code] where we could replace push/pop with a NEWHEAPS()
block to avoid scanning existing heaps for free space.

For example, the following shows what I mean.  This may be a poor
choice if most functions are small, so I won't suggest committing, but
this is the kind of well-isolated push/pop that might benefit.

diff --git a/Src/exec.c b/Src/exec.c
index bcc8065..b9c40df 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -5067,10 +5067,11 @@ doshfunc(Shfunc shfunc, LinkList doshargs, int noreturnval)
 #ifdef MAX_FUNCTION_DEPTH
     static int funcdepth;
 #endif
+    Heap funcheap;
 
     queue_signals();	/* Lots of memory and global state changes coming */
 
-    pushheap();
+    NEWHEAPS(funcheap) {
 
     oargv0 = NULL;
     obreaks = breaks;
@@ -5290,7 +5291,8 @@ doshfunc(Shfunc shfunc, LinkList doshargs, int noreturnval)
 	numpipestats = oldnumpipestats;
 	memcpy(pipestats, oldpipestats, sizeof(int)*numpipestats);
     }
-    popheap();
+
+    } OLDHEAPS;
 
     unqueue_signals();
 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-13 15:52                                 ` Bart Schaefer
@ 2015-10-14  6:50                                   ` Sebastian Gniazdowski
  2015-10-14 13:27                                   ` Peter Stephenson
  1 sibling, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-14  6:50 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 808 bytes --]

With this patch the list is instant fast. It can be seen here:

https://youtu.be/jwBZY6fmQHs

(the zprof result – 1.9x speed up – doesn't fully express how fast is the list)

I attach results for browsing the list and searching through it.
Searching is faster 2.15 times and 1.88 times.

What's interesting is lower RES for the new patch. It's like with the
new patch RES can easier go down after peak. I once saw 179M after
peak of ~400M. The two-patches version can also go down, but maybe not
that easy. So this is quite confusing. I think a more synthetic
testing is needed to recognize how the memory behaves. Now I manually
open a script, give it a work, run htop.

Maybe the patch isn't that bad? Instant fast list is a great thing for me.

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: result_search3.txt --]
[-- Type: text/plain, Size: 3970 bytes --]

=== searching for "sy s" (translated to *sy*~^*s*) ===

new patch:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1       10195,21 10195,21   99,94%   9943,58  9943,58   97,47%  n-list
 2)  161         212,69     1,32    2,08%    212,69     1,32    2,08%  _nlist_print_with_ansi
 3)    4          24,20     6,05    0,24%     24,20     6,05    0,24%  _nlist_cursor_visibility
 4)    7         228,66    32,67    2,24%     15,97     2,28    0,16%  n-list-draw
 5)    7          15,94     2,28    0,16%      3,12     0,45    0,03%  n-list-input
 6)    1           6,22     6,22    0,06%      0,73     0,73    0,01%  _nlist_exit
 7)    7           0,49     0,07    0,00%      0,43     0,06    0,00%  _nlist_setup_user_vars
 8)    7           0,31     0,04    0,00%      0,31     0,04    0,00%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  10,59s user 1,08s system 95% cpu 12,228 total

RES: 318M

two previous patches (from repository):
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)  161       11294,81    70,15   51,38%  11294,81    70,15   51,38%  _nlist_print_with_ansi
 2)    1       21965,97 21965,97   99,93%   9726,11  9726,11   44,25%  n-list
 3)    7       12150,96  1735,85   55,28%    856,14   122,31    3,89%  n-list-draw
 4)    7          79,61    11,37    0,36%     61,09     8,73    0,28%  n-list-input
 5)    4          32,33     8,08    0,15%     32,33     8,08    0,15%  _nlist_cursor_visibility
 6)    1          15,82    15,82    0,07%      7,81     7,81    0,04%  _nlist_exit
 7)    7           2,85     0,41    0,01%      2,78     0,40    0,01%  _nlist_setup_user_vars
 8)    7           0,32     0,05    0,00%      0,32     0,05    0,00%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  22,37s user 0,93s system 97% cpu 23,972 total

RES: 492M

=== searching for "s" ===

new patch:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1        5199,43  5199,43   99,88%   5048,37  5048,37   96,98%  n-list
 2)   92         117,53     1,28    2,26%    117,53     1,28    2,26%  _nlist_print_with_ansi
 3)    4          27,57     6,89    0,53%     27,57     6,89    0,53%  _nlist_cursor_visibility
 4)    4         126,26    31,57    2,43%      8,74     2,18    0,17%  n-list-draw
 5)    4          18,59     4,65    0,36%      2,20     0,55    0,04%  n-list-input
 6)    1           6,29     6,29    0,12%      0,73     0,73    0,01%  _nlist_exit
 7)    4           0,27     0,07    0,01%      0,21     0,05    0,00%  _nlist_setup_user_vars
 8)    4           0,14     0,04    0,00%      0,14     0,04    0,00%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  6,21s user 0,63s system 94% cpu 7,237 total

RES: 446M

two previous patches (from repository):
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1        9781,26  9781,26   99,86%   4805,60  4805,60   49,06%  n-list
 2)   92        4583,54    49,82   46,80%   4583,54    49,82   46,80%  _nlist_print_with_ansi
 3)    4        4897,09  1224,27   50,00%    313,55    78,39    3,20%  n-list-draw
 4)    4          69,30    17,33    0,71%     52,21    13,05    0,53%  n-list-input
 5)    4          29,65     7,41    0,30%     29,65     7,41    0,30%  _nlist_cursor_visibility
 6)    1          13,58    13,58    0,14%      7,17     7,17    0,07%  _nlist_exit
 7)    4           2,79     0,70    0,03%      2,71     0,68    0,03%  _nlist_setup_user_vars
 8)    4           0,18     0,05    0,00%      0,18     0,05    0,00%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  10,88s user 0,52s system 96% cpu 11,779 total

RES: 439M

[-- Attachment #3: result5.txt --]
[-- Type: text/plain, Size: 1913 bytes --]

new patch:
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1         965,93   965,93   99,29%    908,88   908,88   93,42%  n-list
 2)  161          34,56     0,21    3,55%     34,56     0,21    3,55%  _nlist_print_with_ansi
 3)    7          49,01     7,00    5,04%     14,44     2,06    1,48%  n-list-draw
 4)    2          11,36     5,68    1,17%     11,36     5,68    1,17%  _nlist_cursor_visibility
 5)    7           2,18     0,31    0,22%      2,08     0,30    0,21%  n-list-input
 6)    1           6,94     6,94    0,71%      0,53     0,53    0,05%  _nlist_exit
 7)    7           0,55     0,08    0,06%      0,38     0,05    0,04%  _nlist_setup_user_vars
 8)    7           0,25     0,04    0,03%      0,25     0,04    0,03%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  2,50s user 0,17s system 87% cpu 3,055 total

two previous patches (from HEAD):
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1        1861,59  1861,59   99,22%   1082,47  1082,47   57,70%  n-list
 2)  161         486,65     3,02   25,94%    486,65     3,02   25,94%  _nlist_print_with_ansi
 3)    7         738,66   105,52   39,37%    252,01    36,00   13,43%  n-list-draw
 4)    7          25,10     3,59    1,34%     24,97     3,57    1,33%  n-list-input
 5)    2          12,54     6,27    0,67%     12,54     6,27    0,67%  _nlist_cursor_visibility
 6)    7           9,61     1,37    0,51%      9,37     1,34    0,50%  _nlist_setup_user_vars
 7)    1          14,61    14,61    0,78%      7,32     7,32    0,39%  _nlist_exit
 8)    7           0,39     0,06    0,02%      0,39     0,06    0,02%  _nlist_status_msg
( n-panelize < ~/lsoflsof; )  3,20s user 0,17s system 85% cpu 3,928 total


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-13 15:52                                 ` Bart Schaefer
  2015-10-14  6:50                                   ` Sebastian Gniazdowski
@ 2015-10-14 13:27                                   ` Peter Stephenson
  2015-10-14 16:25                                     ` Bart Schaefer
  1 sibling, 1 reply; 61+ messages in thread
From: Peter Stephenson @ 2015-10-14 13:27 UTC (permalink / raw)
  To: zsh-workers

On Tue, 13 Oct 2015 08:52:46 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> For example, the following shows what I mean.  This may be a poor
> choice if most functions are small, so I won't suggest committing, but
> this is the kind of well-isolated push/pop that might benefit.

I'd suggest a quick finger test with the completion system to see if
anything is obviously worse --- if anything is going to exercise large
function stacks with small-to-medium size functions, it's that.

pws


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-14 13:27                                   ` Peter Stephenson
@ 2015-10-14 16:25                                     ` Bart Schaefer
  2015-10-14 16:50                                       ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-14 16:25 UTC (permalink / raw)
  To: zsh-workers

On Oct 14,  8:50am, Sebastian Gniazdowski wrote:
}
} With this patch the list is instant fast.

NEWHEAPS() is O(1) where pushheap() is O(N) where N is the number of
arenas already in the heap.  So it's not suprising that it's faster
in the specific case you're trying (lots of data in the heap for a
relatively long time) -- the question is would it slow down in the
more normal use case of a mostly-empty heap and a lot of function
calls, because the one operation done by NEWHEAPS() is heavier than
any one of the N operations in pushheap().

On the other hand for interactive use the difference in the small-memory
case is probably never going to be noticed.

} What's interesting is lower RES for the new patch. It's like with the
} new patch RES can easier go down after peak.

With pushheap() the code tries to maximally use available space in the
already allocated heap arenas; with NEWHEAPS() that available space is
ignored and we create all new arenas.  It's possible that allows (or
unintentionally forces) the system malloc() to clump new allocations
together better so that they can be released back to the system sooner.
A helpful side-effect if so, but I wouldn't count on it being true for
all malloc libraries.


On Oct 14,  2:27pm, Peter Stephenson wrote:
}
} I'd suggest a quick finger test with the completion system to see if
} anything is obviously worse --- if anything is going to exercise large
} function stacks with small-to-medium size functions, it's that.

Actually the completion system is the place it's most likely to entirely
break down -- completion already uses NEWHEAPS() to get its own heap, and
then frequently employs SWITCHHEAPS() to be able to pass values back on
the regular heap.  If we're also swapping the regular heap around with
NEWHEAPS() it's possible things might get very confused, or that an entire
list of heaps might be leaked.

However, I did just run the Test/Y* tests with valgrind and they all passed
and reported no leaks.  (I did get:

==32081== Warning: invalid file descriptor -1 in syscall close()
==32081== Warning: invalid file descriptor -1 in syscall close()

from both Y01 and Y02 but curiously not Y03.)


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-14 16:25                                     ` Bart Schaefer
@ 2015-10-14 16:50                                       ` Bart Schaefer
  2015-10-15  4:32                                         ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-14 16:50 UTC (permalink / raw)
  To: zsh-workers

On Oct 14,  9:25am, Bart Schaefer wrote:
}
} NEWHEAPS() is O(1) where pushheap() is O(N) where N is the number of
} arenas [...] the question is would it slow down in the more normal use
} case of a mostly-empty heap and a lot of function calls

Just as a data point, I tried swapping out the push/pop pair in the
source() function with NEWHEAPS/OLDHEAPS and "time"-ing a full run of
"make check" each way.  NEWHEAPS was slightly slower and had higher
CPU use -- but the test scripts aren't a typical shell use pattern.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-14 16:50                                       ` Bart Schaefer
@ 2015-10-15  4:32                                         ` Bart Schaefer
  2015-10-15 13:03                                           ` Sebastian Gniazdowski
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-15  4:32 UTC (permalink / raw)
  To: zsh-workers

On Oct 14,  9:50am, Bart Schaefer wrote:
} Subject: Re: Slowdown around 5.0.5-dev-0
}
} On Oct 14,  9:25am, Bart Schaefer wrote:
} }
} } NEWHEAPS() is O(1) where pushheap() is O(N) where N is the number of
} } arenas [...] the question is would it slow down in the more normal use
} } case of a mostly-empty heap and a lot of function calls
} 
} Just as a data point, I tried swapping out the push/pop pair in the
} source() function with NEWHEAPS/OLDHEAPS [...]
} NEWHEAPS was slightly slower

I've now gone through all the pushheap() in the code ... after ruling out
the cases where the code between push and pop couldn't be neatly wrapped
as a { block }, the only other pair it might make sense to replace is in
sort.c:strmetasort() and even that is questionable.

Given that we have direct evidence that it's beneficial to use NEWHEAPS
in doshfunc() in at least one case, I'll go ahead and commit 36853 and
we can see how it goes.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-15  4:32                                         ` Bart Schaefer
@ 2015-10-15 13:03                                           ` Sebastian Gniazdowski
  2015-10-16  0:35                                             ` Bart Schaefer
  2015-10-16  0:37                                             ` Bart Schaefer
  0 siblings, 2 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-15 13:03 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1123 bytes --]

On 15 October 2015 at 06:32, Bart Schaefer <schaefer@brasslantern.com> wrote:
> Given that we have direct evidence that it's beneficial to use NEWHEAPS
> in doshfunc() in at least one case, I'll go ahead and commit 36853 and
> we can see how it goes.

I've written a script that makes some memory testing easy:

https://github.com/psprint/zsh-tools/blob/master/mem-test.zsh

It runs copies of given zshells, which are directed to call specified
test function from the script. The test function then waits for the
main zsh to query memory size (via top). Example execution:

# Tests for zsh
string_test 480
array_test 3.21875


# Tests for zsh-head-both
string_test 5.1953125
array_test 3.05859375


# Tests for zsh-5.1.1-dev-0-clean
string_test 5.76171875
array_test 2.703125

Results are in megabytes of RES memory. The first set is for 5.0.2. It
seems there is something causing high memory usage, which isn't
activated for the array test, only for string_test.

I will write more more stressful functions to grasp any differences.
What test function could stress the recent changes?

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: mem-test.zsh --]
[-- Type: application/octet-stream, Size: 2751 bytes --]

#!/bin/zsh

emulate -L zsh
setopt extendedglob

zshs=( /bin/zsh zsh-head-both zsh-5.1.1-dev-0-clean )

# Convert sizes to number of megabytes
to_mbytes() {
    local size="$1"
    #echo "Converting $1"
    if [[ "$size" = [0-9]#[Mm]* ]]; then
        size="${size%[Mm]*}"
    elif [[ "$size" = [0-9]#[Kk]* ]]; then
        size="${size%[Kk]*}"
        (( size = size / 1024.0 ))
    elif [[ "$size" = [0-9]# ]]; then
        case $( uname ) in
            *Linux*)
                (( size = size / 1024.0 ))
                ;;
            *)
                (( size = size / (1024.0 * 1024.0) ))
                ;;
        esac
    else
        echo "Bad size occured: $size"
    fi

    REPLY="$size"
}

#
# Children administration
#

trap "finished" SIGUSR1

finished() {
    FINISHED=1
}

# Waits for signal from child process
# Gets its memory size
wait_get_mem() {
    while [ "$FINISHED" -eq 0 ]; do
        LANG=C sleep 0.5
    done

    case $( uname ) in
        *Darwin*)
            output=( "${(@f)"$( top -pid "$SUB_PID" -stats mem -l 1 )"}" )
            to_mbytes "$output[-1]"
            ;;
        *Linux*)
            output=( "${(@f)"$( top -p "$SUB_PID" -bn 1 )"}" )
            output=$output[-1]
            output=( $=output )
            to_mbytes "$output[6]"
            ;;
    esac

    kill -15 "$SUB_PID"

    # Suitable for gnuplot - X Y
    echo "$TEST $REPLY"
}

_finished_signal_wait() {
    kill -SIGUSR1 "$MAIN_PID"
    sleep 60
}

#
# Tests
#

tests=( string_test array_test )

string_test() {
    local a=""
    integer i=50000
    while (( i -- )); do a+="$i"; done

    _finished_signal_wait
}

array_test() {
    typeset -a a
    integer i=10000
    while (( i -- )); do a+=( $i ); done

    _finished_signal_wait
}

#
# Main code
#

# Detect main vs. for-test invocation
if [ -z "$1" ]; then
    for current_zsh in "$zshs[@]"; do
        type "$current_zsh" 2>/dev/null 1>&2 || { echo >&2 "Skipping non-accessible $current_zsh"; continue }
        zsh_binary="${current_zsh##*/}"

        echo "# Tests for $zsh_binary"
        for test in "$tests[@]"; do
            FINISHED=0
            TEST="$test"

            "$current_zsh" -c "source ./$0 $$ \"$current_zsh\" $test" &

            SUB_PID=$!
            wait_get_mem
        done
        echo
        echo

    done

    # Example gnuplot invocation:
    #set style data histogram
    #set style fill solid border rgb "black"
    #plot "result" index 0 using 2: xtic(1), "result" index 1 using 2: xtic(1), "result" index 2 using 2: xtic(1)
else
    MAIN_PID="$1"
    zsh_binary="${2##*/}"
    shift
    shift
    # Echo status only when output is not to terminal
    [ ! -t 1 ] && echo >&2 "Running [$zsh_binary]: $@"

    # Run the test
    "$@"
fi

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-15 13:03                                           ` Sebastian Gniazdowski
@ 2015-10-16  0:35                                             ` Bart Schaefer
  2015-10-17  9:12                                               ` Sebastian Gniazdowski
  2015-10-16  0:37                                             ` Bart Schaefer
  1 sibling, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-16  0:35 UTC (permalink / raw)
  To: zsh-workers

On Oct 15,  3:03pm, Sebastian Gniazdowski wrote:
} 
} https://github.com/psprint/zsh-tools/blob/master/mem-test.zsh
} 
} # Tests for zsh-5.1.1-dev-0-clean
} string_test 5.76171875
} array_test 2.703125
} 
} Results are in megabytes of RES memory. The first set is for 5.0.2. It
} seems there is something causing high memory usage, which isn't
} activated for the array test, only for string_test.

The array test only iterates 10000 times, the string test 50000 ?  Does
that not account for the difference?


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-15 13:03                                           ` Sebastian Gniazdowski
  2015-10-16  0:35                                             ` Bart Schaefer
@ 2015-10-16  0:37                                             ` Bart Schaefer
  1 sibling, 0 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-16  0:37 UTC (permalink / raw)
  To: zsh-workers

On Oct 15,  3:03pm, Sebastian Gniazdowski wrote:
}
} What test function could stress the recent changes?

Something that makes a lot of function calls, some with and some
without declaring local variables, and to different depths of recursion.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-16  0:35                                             ` Bart Schaefer
@ 2015-10-17  9:12                                               ` Sebastian Gniazdowski
  2015-10-17  9:24                                                 ` Sebastian Gniazdowski
  2015-10-18 16:19                                                 ` Bart Schaefer
  0 siblings, 2 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-17  9:12 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 659 bytes --]

On 16 October 2015 at 02:35, Bart Schaefer <schaefer@brasslantern.com> wrote:
> The array test only iterates 10000 times, the string test 50000 ?  Does
> that not account for the difference?

Apparently not. I've modified the mem-test script so that it queries
usage of memory also during the tests. It can be seen that for strings
(and zsh 5.0.2) the memory is high from the beginning.

It can be also seen that the recent three patches increase memory
usage. A clean, free from the patches version of zsh
(5.1.1-dev-0-clean) keeps the memory always low. What's good news is
that the memory usage is lower than in 5.0.2.

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: mem-test.txt --]
[-- Type: text/plain, Size: 866 bytes --]

# Tests for zsh
# string_test 453, 571, 650, 711, 758, 800, 835,
string_test 835
# array_test 2.2, 2.9, 3.5, 3.9, 4.7, 5.4, 6.4, 7.5, 7.9, 8.1, 8.6, 8.8, 8.9, 9.2, 9.4, 9.7, 10, 10, 10,
array_test 10


# Tests for zsh-head-both
# string_test 109, 238, 367, 497, 625, 8.9,
string_test 8.9
# array_test 2.1, 3, 3.5, 4, 4.7, 5.8, 6.7, 7.8, 8.2, 8.5, 8.7, 9, 9.2, 9.5, 9.7, 10, 10, 10, 10,
array_test 10


# Tests for zsh-newheaps-three-patches
# string_test 111, 245, 376, 509, 637, 8.9,
string_test 8.9
# array_test 2, 2.8, 3.7, 4.1, 4.7, 5.2, 6.2, 7.2, 7.9, 8.1, 8.4, 8.7, 8.9, 9.2, 9.4, 9.7, 10, 10, 10, 10,
array_test 10


# Tests for zsh-5.1.1-dev-0-clean
# string_test 5.7, 8, 9.5, 11, 11, 10, 10, 11, 11, 11, 12, 11, 10,
string_test 10
# array_test 2, 2.9, 3.5, 3.9, 4.3, 4.6, 5.6, 6.6, 7.3, 7.7, 8, 8.2, 8.5, 8.7, 8.9, 9.1, 9.3, 9.5, 9.7, 10, 10,
array_test 10

[-- Attachment #3: mem-test.png --]
[-- Type: image/png, Size: 89441 bytes --]

[-- Attachment #4: mem-test.zsh --]
[-- Type: application/octet-stream, Size: 3014 bytes --]

#!/bin/zsh

emulate -L zsh
setopt extendedglob

zshs=( /bin/zsh zsh-head-both zsh-newheaps-three-patches zsh-5.1.1-dev-0-clean )

# Convert sizes to number of megabytes
to_mbytes() {
    local size="$1"
    if [[ "$size" = [0-9]#[Mm]* ]]; then
        size="${size%[Mm]*}"
    elif [[ "$size" = [0-9]#[Kk]* ]]; then
        size="${size%[Kk]*}"
        (( size = size / 1024.0 ))
    elif [[ "$size" = [0-9]# ]]; then
        case $( uname ) in
            *Linux*)
                (( size = size / 1024.0 ))
                ;;
            *)
                (( size = size / (1024.0 * 1024.0) ))
                ;;
        esac
    else
        echo "Bad size occured: $size"
    fi

    REPLY="$size"
}

#
# Children administration
#

trap "finished" SIGUSR1

finished() {
    FINISHED=1
}

# Gets memory size of the zsh that runs the test
get_mem() {
    case $( uname ) in
        *Darwin*)
            output=( "${(@f)"$( top -pid "$SUB_PID" -stats mem -l 1 )"}" )
            to_mbytes "$output[-1]"
            ;;
        *Linux*)
            output=( "${(@f)"$( top -p "$SUB_PID" -bn 1 )"}" )
            output=$output[-1]
            output=( $=output )
            to_mbytes "$output[6]"
            ;;
    esac
}

# Waits for signal from child process
wait_get_mem() {
    local number

    echo -n "# $TEST "

    while [ "$FINISHED" -eq 0 ]; do
        sleep 3
        get_mem
        number=`LANG=C printf "%.1f" "$REPLY"`
        echo -n "${number%.0}, "
    done

    get_mem

    kill -15 "$SUB_PID"

    # Suitable for gnuplot - X Y
    number=`LANG=C printf "%.1f" "$REPLY"`
    echo
    echo "$TEST" "${number%.0}"
}

_finished_signal_wait() {
    kill -SIGUSR1 "$MAIN_PID"
    sleep 60
}

#
# Tests
#

tests=( string_test array_test )

string_test() {
    local a=""
    integer i=150000
    repeat $i; do a+="$i"; done

    _finished_signal_wait
}

array_test() {
    typeset -a a
    integer i=30000
    repeat $i; do a+=( $i ); done

    _finished_signal_wait
}

#
# Main code
#

# Detect main vs. for-test invocation
if [ -z "$1" ]; then
    for current_zsh in "$zshs[@]"; do
        type "$current_zsh" 2>/dev/null 1>&2 || { echo >&2 "Skipping non-accessible $current_zsh"; continue }
        zsh_binary="${current_zsh##*/}"

        echo "# Tests for $zsh_binary"
        for test in "$tests[@]"; do
            FINISHED=0
            TEST="$test"

            "$current_zsh" -c "source ./$0 $$ \"$current_zsh\" $test" &

            SUB_PID=$!
            wait_get_mem
        done
        echo
        echo

    done

    # Example gnuplot invocation:
    #set style data histogram
    #set style fill solid border rgb "black"
    #plot "result" index 0 using 2: xtic(1), "result" index 1 using 2: xtic(1), "result" index 2 using 2: xtic(1)
else
    MAIN_PID="$1"
    zsh_binary="${2##*/}"
    shift
    shift
    # Echo status only when output is not to terminal
    [ ! -t 1 ] && echo >&2 "Running [$zsh_binary]: $@"

    # Run the test
    eval "run_test() { $@ }"
    run_test
fi

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-17  9:12                                               ` Sebastian Gniazdowski
@ 2015-10-17  9:24                                                 ` Sebastian Gniazdowski
  2015-10-18 16:19                                                 ` Bart Schaefer
  1 sibling, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-17  9:24 UTC (permalink / raw)
  To: zsh-workers

On 17 October 2015 at 11:12, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> (5.1.1-dev-0-clean) keeps the memory always low. What's good news is
> that the memory usage is lower than in 5.0.2.

Also, the patched versions drop down memory usage as soon as test
function finishes doing its work.

Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-17  9:12                                               ` Sebastian Gniazdowski
  2015-10-17  9:24                                                 ` Sebastian Gniazdowski
@ 2015-10-18 16:19                                                 ` Bart Schaefer
  2015-10-18 20:40                                                   ` Sebastian Gniazdowski
  1 sibling, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-18 16:19 UTC (permalink / raw)
  To: Sebastian Gniazdowski; +Cc: Zsh hackers list

On Sat, Oct 17, 2015 at 2:12 AM, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> On 16 October 2015 at 02:35, Bart Schaefer <schaefer@brasslantern.com> wrote:
>> The array test only iterates 10000 times, the string test 50000 ?  Does
>> that not account for the difference?
>
> Apparently not. I've modified the mem-test script so that it queries
> usage of memory also during the tests. It can be seen that for strings
> (and zsh 5.0.2) the memory is high from the beginning.

I'm strongly inclined to think that's more likely an artifact of the
testing method than of the test itself -- have you tried running the
tests in the opposite order?  You also haven't explained your
reasoning for having the string test iterate so many more times than
the array test.  Does it have to do with keeping total run time
similar?

> It can be also seen that the recent three patches increase memory
> usage. A clean, free from the patches version of zsh
> (5.1.1-dev-0-clean) keeps the memory always low.

That's entirely to be expected -- all three patches increase speed by
avoiding the work necessary to minimize memory footprint, until (as
you noted in next message) the execution stack is unwound.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-18 16:19                                                 ` Bart Schaefer
@ 2015-10-18 20:40                                                   ` Sebastian Gniazdowski
  2015-10-18 21:07                                                     ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-18 20:40 UTC (permalink / raw)
  To: zsh-workers

On 18 October 2015 at 18:19, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sat, Oct 17, 2015 at 2:12 AM, Sebastian Gniazdowski
> <sgniazdowski@gmail.com> wrote:
>> On 16 October 2015 at 02:35, Bart Schaefer <schaefer@brasslantern.com> wrote:
>>> The array test only iterates 10000 times, the string test 50000 ?  Does
>>> that not account for the difference?
>>
>> Apparently not. I've modified the mem-test script so that it queries
>> usage of memory also during the tests. It can be seen that for strings
>> (and zsh 5.0.2) the memory is high from the beginning.
>
> I'm strongly inclined to think that's more likely an artifact of the
> testing method than of the test itself have you tried running the
> tests in the opposite order?

The test script is quite well written. Here is how a test is being run:

      "$current_zsh" -c "source ./$0 $$ \"$current_zsh\" $test" &

It starts new zsh process – the one in "$current_zsh" variable – and
feeds it with the same script file, giving in arguments pid of main
Zsh, name of tested Zsh and name of test function to run. This should
imply full separation of the tests. That's why order shouldn't matter.
However I did reverse the order to just check and results are the
same. The memory profile of three-patches version is thus:

# string_test 41, 164, 279, 388, 504, 602, 707, last: 11

You are quite right that the memory usage isn't "high from the
beginning". It starts from 41 MB, goes to 164, 279, etc. in quite
controlled way. Previous version of the script awaited 3 seconds
before doing first check, that's why it started from values of 100 MB,
and I rather misinterpreted this. How to interpret these still high
values is now open.

> You also haven't explained your
> reasoning for having the string test iterate so many more times than
> the array test.  Does it have to do with keeping total run time
> similar?

Yes that's the reason. I now do even more iterations to make the test
last. During this time I query memory size, obtaining a nice sequence
of memory usages. It nicely illustrates behavior of Zsh.

>> It can be also seen that the recent three patches increase memory
>> usage. A clean, free from the patches version of zsh
>> (5.1.1-dev-0-clean) keeps the memory always low.
>
> That's entirely to be expected -- all three patches increase speed by
> avoiding the work necessary to minimize memory footprint, until (as
> you noted in next message) the execution stack is unwound.

The script now computes averages. Average of 5.0.2 is 622.3, of
two-patches version 327.7, of the three patches (third is the new
heaps) 329.7. So it's better than without the patches, having in mind
also the quick drop after stack is unwound.

Tried a new test function:

function_test() {
    local count

    if [ -z "$1" ]; then
        repeat 10000; do function_test 100; done
        _finished_signal_wait
    else
        count="$1"
    fi

    if (( count -- > 0 )); then
        function_test "$count"
    fi
}

Its memory usage is low for all Zsh versions: ~2.5 MB, (e.g. 2.7, 2.9,
2.8, 2.6, 2.9, 2.8, 2.6, 2.9, 2.9, 2.8).

Best Regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-18 20:40                                                   ` Sebastian Gniazdowski
@ 2015-10-18 21:07                                                     ` Bart Schaefer
  2015-10-18 21:31                                                       ` Sebastian Gniazdowski
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-18 21:07 UTC (permalink / raw)
  To: Sebastian Gniazdowski; +Cc: Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 1705 bytes --]

On Oct 18, 2015 1:40 PM, "Sebastian Gniazdowski" <sgniazdowski@gmail.com>
wrote:
>
> On 18 October 2015 at 18:19, Bart Schaefer <schaefer@brasslantern.com>
wrote:
> >
> > I'm strongly inclined to think that's more likely an artifact of the
> > testing method than of the test itself have you tried running the
> > tests in the opposite order?
>
> It starts new zsh process – the one in "$current_zsh" variable – and
> feeds it with the same script file, giving in arguments pid of main
> Zsh, name of tested Zsh and name of test function to run. This should
> imply full separation of the tests. That's why order shouldn't matter.

Sorry, I didn't mean order of zsh versions, I meant did you try doing
array_test before string_test within each version.

> You are quite right that the memory usage isn't "high from the
> beginning". It starts from 41 MB, goes to 164, 279, etc. in quite
> controlled way. Previous version of the script awaited 3 seconds
> before doing first check, that's why it started from values of 100 MB,
> and I rather misinterpreted this. How to interpret these still high
> values is now open.

String is stored as one contiguous block of bytes, so each time it exceeds
the size of the current heap block an entire new large-enough block is
allocated, and no previous block can never be re-used because the string
never shrinks.

> Tried a new test function:
>
> Its memory usage is low for all Zsh versions: ~2.5 MB, (e.g. 2.7, 2.9,
> 2.8, 2.6, 2.9, 2.8, 2.6, 2.9, 2.9, 2.8).

For functions the test would also need to compare run time / CPU usage - I
would not expect memory usage to be interesting unless the function
declared locals.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-18 21:07                                                     ` Bart Schaefer
@ 2015-10-18 21:31                                                       ` Sebastian Gniazdowski
  2015-10-19 17:21                                                         ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-18 21:31 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 943 bytes --]

On 18 October 2015 at 23:07, Bart Schaefer <schaefer@brasslantern.com> wrote:
> Sorry, I didn't mean order of zsh versions, I meant did you try doing
> array_test before string_test within each version.

That's what I did

> String is stored as one contiguous block of bytes, so each time it exceeds
> the size of the current heap block an entire new large-enough block is
> allocated, and no previous block can never be re-used because the string
> never shrinks.

How can 5.1.1-dev-0 with no patches maintain low memory usage for strings then?

# string_test 3.3, 7.8, 9.4, 11, 11, 11, 11, 11, 11, 12, 11, 12, 11,
12, last: 11

> For functions the test would also need to compare run time / CPU usage - I
> would not expect memory usage to be interesting unless the function declared
> locals.

The function test has two samples more (7 vs. 9), taken in intervals
of 3 seconds, so there is a difference.

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: mem-test2.txt --]
[-- Type: text/plain, Size: 1165 bytes --]

# Tests for zsh
# array_test 1.5, 2.6, 3.3, 4, 4.5, 4.8, 5.7, 6.4, 7.5, 7.9, 8.1, 8.4, 8.7, 8.9, last: 9
array_test 5.6
# string_test 361, 509, 608, 676, 728, 771, 806, last: 836
string_test 608.8
# function_test 2.7, 2.7, 2.7, 2.7, 2.7, 2.7, 2.7, last: 2.7
function_test 2.7


# Tests for zsh-head-both
# array_test 1.5, 2.6, 3.2, 4, 4.3, 4.8, 6, 7, 7.7, 8.2, 8.4, 8.7, 9, last: 9.1
array_test 5.5
# string_test 44, 173, 302, 431, 560, 681, last: 10
string_test 302.0
# function_test 2.5, 2.6, 2.6, 2.5, 2.6, 2.6, 2.6, last: 2.5
function_test 2.6

# Tests for zsh-newheaps-three-patches
# array_test 1.5, 2.7, 3.4, 3.9, 4.3, 4.8, 5.9, 6.6, 7.6, 8, 8.3, 8.6, 8.9, 9.1, last: 9.1
array_test 5.7
# string_test 46, 173, 290, 402, 511, 614, 707, last: 10
string_test 339.3
# function_test 2.9, 2.8, 2.6, 2.6, 2.8, 2.7, 2.6, 2.8, 2.9, last: 2.6
function_test 2.7

# Tests for zsh-5.1.1-dev-0-clean
# array_test 1.5, 2.6, 3.3, 4, 4.5, 5, 6.3, 7, 7.9, 8.2, 8.5, 8.7, 9, last: 9.1
array_test 5.6
# string_test 3.1, 7.3, 9, 11, 11, 11, 11, 11, 11, 12, 11, 12, 12, 11, last: 11
string_test 10.2
# function_test 2.6, 2.5, 2.6, 2.6, 2.6, 2.6, 2.5, last: 2.5
function_test 2.6


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-18 21:31                                                       ` Sebastian Gniazdowski
@ 2015-10-19 17:21                                                         ` Bart Schaefer
  2015-10-22 12:49                                                           ` Sebastian Gniazdowski
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-19 17:21 UTC (permalink / raw)
  To: zsh-workers

On Oct 18, 11:31pm, Sebastian Gniazdowski wrote:
}
} > String is stored as one contiguous block of bytes, so each time it exceeds
} > the size of the current heap block an entire new large-enough block is
} > allocated, and no previous block can never be re-used because the string
} > never shrinks.
} 
} How can 5.1.1-dev-0 with no patches maintain low memory usage for
} strings then?

It's probably this:

+               h->next = NULL;
+           } else {
+               /* Leave an empty arena at the end until popped */
+               h->used = 0;
+               fheap = hl = h;
+               break;
+           }
+           if (fheap == h)

We leave that empty arena at the end, but then on the next zhalloc()
it's still not big enough, so another one gets allocated.

You could try removing that "else" to let the trailing block be freed
again, but then we're back to needing to compare speed as well as space
across shell versions.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-19 17:21                                                         ` Bart Schaefer
@ 2015-10-22 12:49                                                           ` Sebastian Gniazdowski
  2015-10-22 15:00                                                             ` Bart Schaefer
  2015-10-23  6:32                                                             ` Sebastian Gniazdowski
  0 siblings, 2 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-22 12:49 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1990 bytes --]

On 19 October 2015 at 19:21, Bart Schaefer <schaefer@brasslantern.com> wrote:
> We leave that empty arena at the end, but then on the next zhalloc()
> it's still not big enough, so another one gets allocated.
>
> You could try removing that "else" to let the trailing block be freed
> again, but then we're back to needing to compare speed as well as space
> across shell versions.

I removed the else, code works longer and memory is still allocated.
This is seen in mem-test3.txt, where there are many samples and large
numbers, and in perf-test.txt, where there noted is long running time
(numbers are milliseconds)

I wrote perf-test.zsh to automatically perform performance tests:
https://github.com/psprint/zsh-tools/

With it and with mem-test.zsh I tested following zsh variants:
- 5.0.2 osx
- newheaps patch (zsh-newheaps-only)
- newheaps and zhalloc patch 36836 (zsh-newheaps-zhalloc)
- zhalloc and the previous patch 36834 (zsh-head-both-patches)
- all three patches (zsh-newheaps-three-patches)
- a 5.1.1-dev-0 clean zsh
- the empty arena else removed

My conclusions:
- 36834 introduces high memory usage; I would suggest to use only two
optimizations – newheaps and zhalloc; my znavtools are instant fast
with them, what's slower is searching (wrote search_test function,
results attached); basically half of pattern search speedup is lost
with 36834, however what's gained is beautiful memory usage, without
firefox-like values for RES (~700 MB); newheaps is responsible for
instant-responsivity of my script, 36836 for much faster searching –
that's a good compromise

- searching with patterns through very large arrays (700k) allocates
264 MB of memory regardless of zsh version; this probably tells
something

- still not sure why 36834 allocates much memory for the string test

- newheaps makes function calls longer, but it's not a substantial
difference – 21196 ms vs. 18549 ms

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: mem-test3.txt --]
[-- Type: text/plain, Size: 2620 bytes --]

# Tests for zsh-empty-arena-else
# string_test 51, 152, 206, 247, 280, 305, 327, 345, 363, 380, 394, 407, 421, 432, 444, 455, 464, 474, 483, 493, 502, 509, 518, 525, 534, 541, 549, 555, 560, 569, 575, 581, last: 586
string_test 421.3
# array_test 1, 2.7, 2.9, 3.8, 4.3, 5.6, 6.5, 7.6, 8, 8.4, 8.7, 8.9, last: 9.1
array_test 5.4
# function_test 2.6, 2.9, 2.7, 2.9, 2.9, 2.8, 2.9, 2.7, last: 2.6
function_test 2.8
# search_test 110, 215, last: 264
search_test 110.0


# Tests for zsh-newheaps-zhalloc
# string_test 2.9, 7.7, 9.3, 11, 11, 11, 12, 11, 12, 11, 11, 11, 12, last: 11
string_test 10.1
# array_test 1, 2, 3.1, 3.3, 4.5, 5.3, 6.3, 7.5, 7.8, 8.2, 8.5, 8.7, last: 8.9
array_test 5.2
# function_test 2.7, 2.7, 2.8, 2.6, 2.6, 2.9, 2.9, 3, last: 2.6
function_test 2.7
# search_test 86, 158, 200, 237, last: 265
search_test 148.0


# Tests for zsh-newheaps-only
# string_test 3, 7.7, 9.1, 11, 11, 11, 12, 11, 11, 11, 12, 11, 11, last: 10
string_test 10.1
# array_test 1.4, 2, 3, 3.8, 4.4, 5.3, 6.3, 7, 7.7, 8, 8.3, 8.5, last: 8.8
array_test 5.2
# function_test 2.7, 2.6, 2.8, 2.7, 2.9, 2.7, 2.7, 2.7, last: 2.6
function_test 2.7
# search_test 62, 122, 144, 171, 196, 218, 238, 259, last: 266
search_test 164.4


# Tests for zsh-5.0.2-osx
# string_test 2.5, 7.4, 8.9, 11, 11, 12, 11, 11, 11, 11, 12, 12, last: 10
string_test 9.9
# array_test 1.4, 2.2, 3, 4, 4.6, 5.3, 6.3, 7.4, 7.8, 8, 8.4, 8.7, last: 8.8
array_test 5.3
# function_test 2.6, 2.6, 2.6, 2.6, 2.6, 2.6, 2.6, last: 2.6
function_test 2.6
# search_test 59, 122, 152, 179, 204, 226, 245, last: 264
search_test 157.0


# Tests for zsh-head-both-patches
# string_test 40, 182, 318, 456, 589, 725, last: 9.7
string_test 317.0
# array_test 1.1, 2.5, 3, 3.8, 4.3, 5, 6, 7, 7.5, 7.8, 8.2, 8.3, last: 8.5
array_test 5.1
# function_test 2.6, 2.6, 2.6, 2.6, 2.5, 2.6, 2.6, last: 2.5
function_test 2.6
# search_test 122, 217, last: 266
search_test 122.0


# Tests for zsh-newheaps-three-patches
# string_test 44, 188, 331, 471, 612, last: 8.9
string_test 258.5
# array_test 1.1, 2, 3.1, 3.8, 4.4, 5.1, 6.2, 7.2, 7.7, 8, 8.2, 8.6, last: 8.7
array_test 5.2
# function_test 2.7, 2.7, 2.9, 2.8, 2.9, 2.7, 2.9, 2.6, last: 2.6
function_test 2.8
# search_test 116, 216, last: 265
search_test 116.0


# Tests for zsh-5.1.1-dev-0-clean
# string_test 2.6, 7.3, 8.8, 11, 11, 11, 10, 11, 12, 11, 12, 11, last: 10
string_test 9.8
# array_test 1.3, 2.2, 2.3, 3.4, 4.2, 5, 6.1, 6.9, 7.6, 7.8, 8.1, 8.4, last: 8.6
array_test 5.0
# function_test 2.6, 2.6, 2.6, 2.6, 2.6, 2.6, last: 2.6
function_test 2.6
# search_test 62, 121, 151, 177, 201, 223, 243, 264, last: 265
search_test 168.3



[-- Attachment #3: perf-test.txt --]
[-- Type: text/plain, Size: 1487 bytes --]

Running [zsh-empty-arena-else]: string_test 13749,47
Running [zsh-empty-arena-else]: array_test 12182,06
Running [zsh-empty-arena-else]: function_test 21345,66
Running [zsh-empty-arena-else]: search_test 3849,80


Running [zsh-newheaps-zhalloc]: string_test 9990,75
Running [zsh-newheaps-zhalloc]: array_test 12109,69
Running [zsh-newheaps-zhalloc]: function_test 21318,52
Running [zsh-newheaps-zhalloc]: search_test 6842,58


Running [zsh-newheaps-only]: string_test 9999,97
Running [zsh-newheaps-only]: array_test 11925,05
Running [zsh-newheaps-only]: function_test 21196,27
Running [zsh-newheaps-only]: search_test 11670,71


Running [zsh-5.0.2-osx]: string_test 9628,87
Running [zsh-5.0.2-osx]: array_test 12294,85
Running [zsh-5.0.2-osx]: function_test 18668,42
Running [zsh-5.0.2-osx]: search_test 11518,38


Running [zsh-head-both-patches]: string_test 4385,93
Running [zsh-head-both-patches]: array_test 11933,38
Running [zsh-head-both-patches]: function_test 18549,57
Running [zsh-head-both-patches]: search_test 3949,40


Running [zsh-newheaps-three-patches]: string_test 4351,58
Running [zsh-newheaps-three-patches]: array_test 12111,25
Running [zsh-newheaps-three-patches]: function_test 21356,30
Running [zsh-newheaps-three-patches]: search_test 4063,73


Running [zsh-5.1.1-dev-0-clean]: string_test 9651,08
Running [zsh-5.1.1-dev-0-clean]: array_test 12749,21
Running [zsh-5.1.1-dev-0-clean]: function_test 18667,25
Running [zsh-5.1.1-dev-0-clean]: search_test 11386,36



[-- Attachment #4: mem-test.zsh --]
[-- Type: application/octet-stream, Size: 4111 bytes --]

#!/bin/zsh

emulate -L zsh
setopt extendedglob

zshs=( zsh-empty-arena-else zsh-newheaps-zhalloc zsh-newheaps-only )
zshs+=( zsh-5.0.2-osx zsh-head-both-patches zsh-newheaps-three-patches zsh-5.1.1-dev-0-clean )

# Convert sizes to number of megabytes
to_mbytes() {
    local size="$1"
    #echo Converting: "$1"
    if [[ "$size" = [0-9]#[Mm]* ]]; then
        size="${size%[Mm]*}"
    elif [[ "$size" = [0-9]#[Kk]* ]]; then
        size="${size%[Kk]*}"
        (( size = size / 1024.0 ))
    elif [[ "$size" = [0-9]# ]]; then
        case $( uname ) in
            *Linux*)
                (( size = size / 1024.0 ))
                ;;
            *)
                (( size = size / (1024.0 * 1024.0) ))
                ;;
        esac
    else
        echo "Bad size occured: $size"
    fi

    REPLY="$size"
}

#
# Children administration
#

trap "finished" SIGUSR1

finished() {
    FINISHED=1
}

# Gets memory size of the zsh that runs the test
get_mem() {
    case $( uname ) in
        *Darwin*)
            output=( "${(@f)"$( top -pid "$SUB_PID" -stats mem -l 1 )"}" )
            to_mbytes "$output[-1]"
            ;;
        *Linux*)
            output=( "${(@f)"$( top -p "$SUB_PID" -bn 1 )"}" )
            output=$output[-1]
            output=( $=output )
            to_mbytes "$output[6]"
            ;;
    esac
}

# Waits for signal from child process
wait_get_mem() {
    local number
    float average=0.0
    integer count=0

    echo -n "# $TEST "

    sleep 1
    while [ "$FINISHED" -eq 0 ]; do
        get_mem
        number=`LANG=C printf "%.1f" "$REPLY"`
        echo -n "${number%.0}, "

        average+=number
        count+=1
        sleep 3
    done

    # Remove last result - to be certain that the average
    # is only above data from actively working test function
    if [ "$count" -gt 1 ]; then
        average=average-number
        count=count-1
    fi
    average=average/count

    # Take the explicit last result, it's telling
    # how zsh behaves when computation is stopped
    sleep 1
    get_mem
    number=`LANG=C printf "%.1f" "$REPLY"`
    echo "last: ${number%.0}"
    kill -15 "$SUB_PID"

    # Suitable for gnuplot - X Y
    LANG=C printf "$TEST %.1f\n" $average
}

_finished_signal_wait() {
    kill -SIGUSR1 "$MAIN_PID"
    sleep 60
}

#
# Tests
#

tests=( string_test array_test function_test search_test )

string_test() {
    local a=""
    integer i=150000
    repeat $i; do a+="$i"; done

    _finished_signal_wait
}

array_test() {
    typeset -a a
    integer i=25000
    repeat $i; do a+=( $i ); done

    _finished_signal_wait
}

function_test() {
    local count

    if [ -z "$1" ]; then
        repeat 10000; do function_test 100; done
        _finished_signal_wait
    else
        count="$1"
    fi

    if (( count -- > 0 )); then
        function_test "$count"
    fi
}

search_test() {
    integer elements=700000
    a="${(r:elements:: _:)b}"
    a=( $=a )
    a=( "${(@M)a:#(#i)*_*}" )
    a=( "${(@)a//(#mi)(_|a)/-${MATCH}-}" )
    a=( "${(@)a//(#bi)(_|-)/|${match[1]}|}" )

    _finished_signal_wait
}

#
# Main code
#

# Detect main vs. for-test invocation
if [ -z "$1" ]; then
    for current_zsh in "$zshs[@]"; do
        type "$current_zsh" 2>/dev/null 1>&2 || { echo >&2 "Skipping non-accessible $current_zsh"; continue }
        zsh_binary="${current_zsh##*/}"

        echo "# Tests for $zsh_binary"
        for test in "$tests[@]"; do
            FINISHED=0
            TEST="$test"

            "$current_zsh" -c "source ./$0 $$ \"$current_zsh\" $test" &

            SUB_PID=$!
            wait_get_mem
        done
        echo
        echo

    done

    # Example gnuplot invocation:
    #set style data histogram
    #set style fill solid border rgb "black"
    #plot "result" index 0 using 2: xtic(1), "result" index 1 using 2: xtic(1), "result" index 2 using 2: xtic(1)
else
    MAIN_PID="$1"
    zsh_binary="${2##*/}"
    shift
    shift
    # Echo status only when output is not to terminal
    [ ! -t 1 ] && echo >&2 "Running [$zsh_binary]: $@"

    # Run the test
    eval "run_test() { $@ }"
    run_test
fi

[-- Attachment #5: perf-test.zsh --]
[-- Type: application/octet-stream, Size: 2168 bytes --]

#!/bin/zsh

emulate -L zsh
setopt extendedglob

zmodload zsh/zprof

zshs=( zsh-empty-arena-else zsh-newheaps-zhalloc zsh-newheaps-only )
zshs+=( zsh-5.0.2-osx zsh-head-both-patches zsh-newheaps-three-patches zsh-5.1.1-dev-0-clean )

#
# Children administration
#

trap "finished" SIGUSR1

finished() {
    FINISHED=1
}

# Waits for signal from child process
wait_for_end_of_test() {
    while [ "$FINISHED" -eq 0 ]; do
        sleep 1
    done
    kill -15 "$SUB_PID"
}

_finished_signal_wait() {
    kill -SIGUSR1 "$MAIN_PID"
    sleep 60
}

#
# Tests
#

tests=( string_test array_test function_test search_test )

float multiplier=0.5

string_test() {
    local a=""
    integer i=$(( 150000*multiplier ))
    repeat $i; do a+="$i"; done
}

array_test() {
    typeset -a a
    integer i=$(( 25000*multiplier ))
    repeat $i; do a+=( $i ); done
}

function_test() {
    local count
    integer i=$(( 10000*multiplier ))

    if [ -z "$1" ]; then
        repeat $i; do function_test 100; done
    else
        count="$1"
    fi

    if (( count -- > 0 )); then
        function_test "$count"
    fi
}

search_test() {
    integer elements=$(( 800000 * multiplier ))
    a="${(r:elements:: _:)b}"
    a=( $=a )
    a=( "${(@M)a:#(#i)*_*}" )
    a=( "${(@)a//(#mi)(_|a)/-${MATCH}-}" )
    a=( "${(@)a//(#bi)(_|-)/|${match[1]}|}" )
}

#
# Main code
#

# Detect main vs. for-test invocation
if [ -z "$1" ]; then
    for current_zsh in "$zshs[@]"; do
        type "$current_zsh" 2>/dev/null 1>&2 || { echo >&2 "Skipping non-accessible $current_zsh"; continue }
        zsh_binary="${current_zsh##*/}"

        for test in "$tests[@]"; do
            FINISHED=0
            TEST="$test"

            "$current_zsh" -c "source ./$0 $$ \"$current_zsh\" $test" &

            SUB_PID=$!
            wait_for_end_of_test
        done
        echo
        echo

    done
else
    MAIN_PID="$1"
    zsh_binary="${2##*/}"
    shift
    shift
    echo -n "Running [$zsh_binary]: $@ "

    # Run the test
    zprof -c
    "$@"
    zprof_out=( "${(@f)"$( zprof )"}" )
    zprof_out="$zprof_out[3]"
    zprof_out=( $=zprof_out )
    echo "$zprof_out[3]"

    _finished_signal_wait
fi

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-22 12:49                                                           ` Sebastian Gniazdowski
@ 2015-10-22 15:00                                                             ` Bart Schaefer
  2015-10-22 16:28                                                               ` Sebastian Gniazdowski
  2015-10-23 15:40                                                               ` Sebastian Gniazdowski
  2015-10-23  6:32                                                             ` Sebastian Gniazdowski
  1 sibling, 2 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-22 15:00 UTC (permalink / raw)
  To: zsh-workers

On Oct 22,  2:49pm, Sebastian Gniazdowski wrote:
}
} My conclusions:
} - 36834 introduces high memory usage; I would suggest to use only two
} optimizations -- newheaps and zhalloc; my znavtools are instant fast
} with them, what's slower is searching (wrote search_test function,
} results attached)

My apologies, but I'm not sure I followed all of that.  Can you produce
a diff against commit 8e9a68ad14655c1949b1a04d5715a5caa8c344ee that
shows the situation you think is the best so far?

} - still not sure why 36834 allocates much memory for the string test

Seeing the aforementioned diff may provide a hint.

} - newheaps makes function calls longer, but it's not a substantial
} difference

14% is pretty substantial, but it's probably all gained back if the
function does anything significant with local variables.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-22 15:00                                                             ` Bart Schaefer
@ 2015-10-22 16:28                                                               ` Sebastian Gniazdowski
  2015-10-22 16:33                                                                 ` Sebastian Gniazdowski
  2015-10-23 15:40                                                               ` Sebastian Gniazdowski
  1 sibling, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-22 16:28 UTC (permalink / raw)
  To: zsh-workers

On 22 October 2015 at 17:00, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 22,  2:49pm, Sebastian Gniazdowski wrote:
> }
> } My conclusions:
> } - 36834 introduces high memory usage; I would suggest to use only two
> } optimizations -- newheaps and zhalloc; my znavtools are instant fast
> } with them, what's slower is searching (wrote search_test function,
> } results attached)
>
> My apologies, but I'm not sure I followed all of that.  Can you produce
> a diff against commit 8e9a68ad14655c1949b1a04d5715a5caa8c344ee that
> shows the situation you think is the best so far?

Patch 9f8e3e8 (36834) and 643aca9 (lost part of 36834) needs to be
reverted. The patches that should stay are 36836 (nicknamed zhalloc by
me) and 36853 (nicknamed newheaps). 9f8e3e8 doesn't revert
automatically, I can probably do it later if you won't.

> } - newheaps makes function calls longer, but it's not a substantial
> } difference
>
> 14% is pretty substantial, but it's probably all gained back if the
> function does anything significant with local variables.

Yes, I strongly looked forward at instant responsivity ignoring the
loss. Wanted to do some real world test – find fully zsh-based
distribution, switch Zsh between newheaps version and no-newheaps
version, compare boot times, but there isn't such distribution. Peter
described one possible real world test, but not sure how to do it, I
might think about it again.

Function calls aren't that fast anyway, are they? When optimizing ZNT
I pasted an expand_tabs() function instead of calling it 30k times
because even with the body of the function literally empty, zprof
still reported 10-15% use of CPU time. I might later revert ZNT to
that commit to compare how this behaves now, it might be surprisingly
better than worse.

Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-22 16:28                                                               ` Sebastian Gniazdowski
@ 2015-10-22 16:33                                                                 ` Sebastian Gniazdowski
  0 siblings, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-22 16:33 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 225 bytes --]

PS. I also attach diff with the two should-stay patches, it has the
state of the code that I used to build zsh-newheaps-zhalloc. It
contains newheaps and zhalloc patch, and not the 36834.

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: newheaps_zhalloc.patch --]
[-- Type: application/octet-stream, Size: 1408 bytes --]

diff --git a/Src/exec.c b/Src/exec.c
index 154bbb8..76af7e9 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -5066,10 +5066,11 @@ doshfunc(Shfunc shfunc, LinkList doshargs, int noreturnval)
 #ifdef MAX_FUNCTION_DEPTH
     static int funcdepth;
 #endif
+    Heap funcheap;
 
     queue_signals();	/* Lots of memory and global state changes coming */
 
-    pushheap();
+    NEWHEAPS(funcheap) {
 
     oargv0 = NULL;
     obreaks = breaks;
@@ -5289,7 +5290,8 @@ doshfunc(Shfunc shfunc, LinkList doshargs, int noreturnval)
 	numpipestats = oldnumpipestats;
 	memcpy(pipestats, oldpipestats, sizeof(int)*numpipestats);
     }
-    popheap();
+
+    } OLDHEAPS;
 
     unqueue_signals();
 
diff --git a/Src/mem.c b/Src/mem.c
index b9569ea..63a5afe 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -543,9 +543,14 @@ zhalloc(size_t size)
 
     /* find a heap with enough free space */
 
-    for (h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
-	      ? fheap : heaps);
-	 h; h = h->next) {
+    /*
+     * This previously assigned:
+     *   h = ((fheap && ARENA_SIZEOF(fheap) >= (size + fheap->used))
+     *	      ? fheap : heaps);
+     * but we think that nothing upstream of fheap has more free space,
+     * so why start over at heaps just because fheap has too little?
+     */
+    for (h = (fheap ? fheap : heaps); h; h = h->next) {
 	if (ARENA_SIZEOF(h) >= (n = size + h->used)) {
 	    void *ret;
 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-22 12:49                                                           ` Sebastian Gniazdowski
  2015-10-22 15:00                                                             ` Bart Schaefer
@ 2015-10-23  6:32                                                             ` Sebastian Gniazdowski
  1 sibling, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-23  6:32 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 2540 bytes --]

I ran the performance tests again with script formatting them
differently – they are aggregated on test function, not on zsh
version. This allows easier interpretation. A new conclusion is that
zhalloc patch doesn't help for string_test. So the first patch (36834)
speeds up string operations and also introduces increased memory
usage.

Best regards,
Sebastian Gniazdowski

On 22 October 2015 at 14:49, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> On 19 October 2015 at 19:21, Bart Schaefer <schaefer@brasslantern.com> wrote:
>> We leave that empty arena at the end, but then on the next zhalloc()
>> it's still not big enough, so another one gets allocated.
>>
>> You could try removing that "else" to let the trailing block be freed
>> again, but then we're back to needing to compare speed as well as space
>> across shell versions.
>
> I removed the else, code works longer and memory is still allocated.
> This is seen in mem-test3.txt, where there are many samples and large
> numbers, and in perf-test.txt, where there noted is long running time
> (numbers are milliseconds)
>
> I wrote perf-test.zsh to automatically perform performance tests:
> https://github.com/psprint/zsh-tools/
>
> With it and with mem-test.zsh I tested following zsh variants:
> - 5.0.2 osx
> - newheaps patch (zsh-newheaps-only)
> - newheaps and zhalloc patch 36836 (zsh-newheaps-zhalloc)
> - zhalloc and the previous patch 36834 (zsh-head-both-patches)
> - all three patches (zsh-newheaps-three-patches)
> - a 5.1.1-dev-0 clean zsh
> - the empty arena else removed
>
> My conclusions:
> - 36834 introduces high memory usage; I would suggest to use only two
> optimizations – newheaps and zhalloc; my znavtools are instant fast
> with them, what's slower is searching (wrote search_test function,
> results attached); basically half of pattern search speedup is lost
> with 36834, however what's gained is beautiful memory usage, without
> firefox-like values for RES (~700 MB); newheaps is responsible for
> instant-responsivity of my script, 36836 for much faster searching –
> that's a good compromise
>
> - searching with patterns through very large arrays (700k) allocates
> 264 MB of memory regardless of zsh version; this probably tells
> something
>
> - still not sure why 36834 allocates much memory for the string test
>
> - newheaps makes function calls longer, but it's not a substantial
> difference – 21196 ms vs. 18549 ms
>
> Best regards,
> Sebastian Gniazdowski

[-- Attachment #2: perf-test2.txt --]
[-- Type: text/plain, Size: 1880 bytes --]

Running [zsh-empty-arena-else]:          string_test      24933,14
Running [zsh-newheaps-zhalloc]:          string_test      10045,79
Running [zsh-newheaps-only]:             string_test      10053,55
Running [zsh-5.0.2-osx]:                 string_test       9726,88
Running [zsh-head-both-patches]:         string_test       4204,30
Running [zsh-newheaps-three-patches]:    string_test       4406,47
Running [zsh-5.1.1-dev-0-clean]:         string_test       9600,21

Running [zsh-empty-arena-else]:          array_test       12689,70
Running [zsh-newheaps-zhalloc]:          array_test       12042,77
Running [zsh-newheaps-only]:             array_test       11985,55
Running [zsh-5.0.2-osx]:                 array_test       12206,27
Running [zsh-head-both-patches]:         array_test       12320,56
Running [zsh-newheaps-three-patches]:    array_test       11788,41
Running [zsh-5.1.1-dev-0-clean]:         array_test       12814,19

Running [zsh-empty-arena-else]:          function_test    21774,49
Running [zsh-newheaps-zhalloc]:          function_test    21981,39
Running [zsh-newheaps-only]:             function_test    21268,29
Running [zsh-5.0.2-osx]:                 function_test    18539,70
Running [zsh-head-both-patches]:         function_test    18774,33
Running [zsh-newheaps-three-patches]:    function_test    21171,02
Running [zsh-5.1.1-dev-0-clean]:         function_test    18940,36

Running [zsh-empty-arena-else]:          search_test       3934,60
Running [zsh-newheaps-zhalloc]:          search_test       7007,60
Running [zsh-newheaps-only]:             search_test      11653,22
Running [zsh-5.0.2-osx]:                 search_test      11136,82
Running [zsh-head-both-patches]:         search_test       3872,39
Running [zsh-newheaps-three-patches]:    search_test       3960,67
Running [zsh-5.1.1-dev-0-clean]:         search_test      11281,82


[-- Attachment #3: perf-test.zsh --]
[-- Type: application/octet-stream, Size: 2219 bytes --]

#!/bin/zsh

emulate -L zsh
setopt extendedglob

zmodload zsh/zprof

zshs=( zsh-empty-arena-else zsh-newheaps-zhalloc zsh-newheaps-only )
zshs+=( zsh-5.0.2-osx zsh-head-both-patches zsh-newheaps-three-patches zsh-5.1.1-dev-0-clean )

#
# Children administration
#

trap "finished" SIGUSR1

finished() {
    FINISHED=1
}

# Waits for signal from child process
wait_for_end_of_test() {
    while [ "$FINISHED" -eq 0 ]; do
        sleep 1
    done
    kill -15 "$SUB_PID"
}

_finished_signal_wait() {
    kill -SIGUSR1 "$MAIN_PID"
    sleep 60
}

#
# Tests
#

tests=( string_test array_test function_test search_test )

float multiplier=0.5

string_test() {
    local a=""
    integer i=$(( 150000*multiplier ))
    repeat $i; do a+="$i"; done
}

array_test() {
    typeset -a a
    integer i=$(( 25000*multiplier ))
    repeat $i; do a+=( $i ); done
}

function_test() {
    local count
    integer i=$(( 10000*multiplier ))

    if [ -z "$1" ]; then
        repeat $i; do function_test 100; done
    else
        count="$1"
    fi

    if (( count -- > 0 )); then
        function_test "$count"
    fi
}

search_test() {
    integer elements=$(( 800000 * multiplier ))
    a="${(r:elements:: _:)b}"
    a=( $=a )
    a=( "${(@M)a:#(#i)*_*}" )
    a=( "${(@)a//(#mi)(_|a)/-${MATCH}-}" )
    a=( "${(@)a//(#bi)(_|-)/|${match[1]}|}" )
}

#
# Main code
#

# Detect main vs. for-test invocation
if [ -z "$1" ]; then

    for test in "$tests[@]"; do
        for current_zsh in "$zshs[@]"; do

            type "$current_zsh" 2>/dev/null 1>&2 || { echo >&2 "Skipping non-accessible $current_zsh"; continue }
            zsh_binary="${current_zsh##*/}"
            FINISHED=0
            TEST="$test"

            "$current_zsh" -c "source ./$0 $$ \"$current_zsh\" $test" &

            SUB_PID=$!
            wait_for_end_of_test
        done

        echo

    done

else
    MAIN_PID="$1"
    zsh_binary="${2##*/}"
    shift
    shift
    txt="Running [$zsh_binary]: "
    echo -n "${(r:40:: :)txt}" "${(r:15:: :)*}"

    # Run the test
    zprof -c
    "$@"
    zprof_out=( "${(@f)"$( zprof )"}" )
    zprof_out="$zprof_out[3]"
    zprof_out=( $=zprof_out )
    echo "${(l:10:: :)zprof_out[3]}"

    _finished_signal_wait
fi

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-22 15:00                                                             ` Bart Schaefer
  2015-10-22 16:28                                                               ` Sebastian Gniazdowski
@ 2015-10-23 15:40                                                               ` Sebastian Gniazdowski
  2015-10-23 15:57                                                                 ` Sebastian Gniazdowski
  1 sibling, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-23 15:40 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1050 bytes --]

I attach the patch that withdraws 9f8e3e8 and 643aca9 (which are 36834)

Best regards,
Sebastian Gniazdowski

On 22 October 2015 at 17:00, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 22,  2:49pm, Sebastian Gniazdowski wrote:
> }
> } My conclusions:
> } - 36834 introduces high memory usage; I would suggest to use only two
> } optimizations -- newheaps and zhalloc; my znavtools are instant fast
> } with them, what's slower is searching (wrote search_test function,
> } results attached)
>
> My apologies, but I'm not sure I followed all of that.  Can you produce
> a diff against commit 8e9a68ad14655c1949b1a04d5715a5caa8c344ee that
> shows the situation you think is the best so far?
>
> } - still not sure why 36834 allocates much memory for the string test
>
> Seeing the aforementioned diff may provide a hint.
>
> } - newheaps makes function calls longer, but it's not a substantial
> } difference
>
> 14% is pretty substantial, but it's probably all gained back if the
> function does anything significant with local variables.

[-- Attachment #2: fheaps_revert.patch --]
[-- Type: application/octet-stream, Size: 4413 bytes --]

diff --git a/Src/mem.c b/Src/mem.c
index 68bd767..63a5afe 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -294,7 +294,7 @@ pushheap(void)
 #endif
 
     for (h = heaps; h; h = h->next) {
-	DPUTS(!h->used && h->next, "BUG: empty heap");
+	DPUTS(!h->used, "BUG: empty heap");
 	hs = (Heapstack) zalloc(sizeof(*hs));
 	hs->next = h->sp;
 	h->sp = hs;
@@ -334,15 +334,17 @@ freeheap(void)
      *
      * Whenever fheap is NULL here, the loop below sweeps back over the
      * entire heap list again, resetting the free space in every arena to
-     * the amount stashed by pushheap() and finding the arena with the most
+     * the amount stashed by pushheap() and finding the first arena with
      * free space to optimize zhalloc()'s next search.  When there's a lot
      * of stuff already on the heap, this is an enormous amount of work,
      * and performance goes to hell.
      *
-     * Therefore, we defer freeing the most recently allocated arena until
-     * we reach popheap().  This may fail to reclaim some space in earlier
-     * arenas.
-     *
+     * However, if the arena to which fheap points is unused, we want to
+     * free it, so we have no choice but to do the sweep for a new fheap.
+     */
+    if (fheap && !fheap->sp)
+	fheap = NULL;	/* We used to do this unconditionally */
+    /*
      * In other cases, either fheap is already correct, or it has never
      * been set and this loop will do it, or it'll be reset from scratch
      * on the next popheap().  So all that's needed here is to pick up
@@ -359,11 +361,7 @@ freeheap(void)
 	    memset(arena(h) + h->sp->used, 0xff, h->used - h->sp->used);
 #endif
 	    h->used = h->sp->used;
-	    if (!fheap) {
-		if (h->used < ARENA_SIZEOF(h))
-		    fheap = h;
-	    } else if (ARENA_SIZEOF(h) - h->used >
-		       ARENA_SIZEOF(fheap) - fheap->used)
+	    if (!fheap && h->used < ARENA_SIZEOF(h))
 		fheap = h;
 	    hl = h;
 #ifdef ZSH_HEAP_DEBUG
@@ -386,26 +384,6 @@ freeheap(void)
 	    VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
 #endif
 	} else {
-	    if (fheap == h)
-		fheap = NULL;
-	    if (h->next) {
-		/* We want to cut this out of the arena list if we can */
-		if (h == heaps)
-		    hl = heaps = h->next;
-		else if (hl && hl->next == h)
-		    hl->next = h->next;
-		else {
-		    DPUTS(hl, "hl->next != h when freeing");
-		    hl = h;
-		    continue;
-		}
-		h->next = NULL;
-	    } else {
-		/* Leave an empty arena at the end until popped */
-		h->used = 0;
-		fheap = hl = h;
-		break;
-	    }
 #ifdef USE_MMAP
 	    munmap((void *) h, h->size);
 #else
@@ -463,29 +441,12 @@ popheap(void)
 #ifdef ZSH_VALGRIND
 	    VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
 #endif
-	    if (!fheap) {
-		if (h->used < ARENA_SIZEOF(h))
-		    fheap = h;
-	    } else if (ARENA_SIZEOF(h) - h->used >
-		       ARENA_SIZEOF(fheap) - fheap->used)
+	    if (!fheap && h->used < ARENA_SIZEOF(h))
 		fheap = h;
 	    zfree(hs, sizeof(*hs));
 
 	    hl = h;
 	} else {
-	    if (h->next) {
-		/* We want to cut this out of the arena list if we can */
-		if (h == heaps)
-		    hl = heaps = h->next;
-		else if (hl && hl->next == h)
-		    hl->next = h->next;
-		else {
-		    DPUTS(hl, "hl->next != h when popping");
-		    hl = h;
-		    continue;
-		}
-		h->next = NULL;
-	    }
 #ifdef USE_MMAP
 	    munmap((void *) h, h->size);
 #else
@@ -563,7 +524,7 @@ zheapptr(void *p)
 mod_export void *
 zhalloc(size_t size)
 {
-    Heap h, hp = NULL;
+    Heap h;
     size_t n;
 #ifdef ZSH_VALGRIND
     size_t req_size = size;
@@ -590,7 +551,6 @@ zhalloc(size_t size)
      * so why start over at heaps just because fheap has too little?
      */
     for (h = (fheap ? fheap : heaps); h; h = h->next) {
-	hp = h;
 	if (ARENA_SIZEOF(h) >= (n = size + h->used)) {
 	    void *ret;
 
@@ -611,6 +571,7 @@ zhalloc(size_t size)
 	}
     }
     {
+	Heap hp;
         /* not found, allocate new heap */
 #if defined(ZSH_MEM) && !defined(USE_MMAP)
 	static int called = 0;
@@ -619,6 +580,7 @@ zhalloc(size_t size)
 #endif
 
 	n = HEAP_ARENA_SIZE > size ? HEAPSIZE : size + sizeof(*h);
+	for (hp = NULL, h = heaps; h; hp = h, h = h->next);
 
 #ifdef USE_MMAP
 	h = mmap_heap_alloc(&n);
@@ -650,7 +612,6 @@ zhalloc(size_t size)
 	VALGRIND_MEMPOOL_ALLOC((char *)h, (char *)arena(h), req_size);
 #endif
 
-	DPUTS(hp && hp->next, "failed to find end of chain in zhalloc");
 	if (hp)
 	    hp->next = h;
 	else

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-23 15:40                                                               ` Sebastian Gniazdowski
@ 2015-10-23 15:57                                                                 ` Sebastian Gniazdowski
  2015-10-23 19:26                                                                   ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-23 15:57 UTC (permalink / raw)
  To: zsh-workers

On 23 October 2015 at 17:40, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> I attach the patch that withdraws 9f8e3e8 and 643aca9 (which are 36834)

One note is that 643aca9 isn't fully from 36834 because:
           if (fheap == h)
               fheap = NULL;

Is moved to top of "if (h->next) {". Was this intended?

Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-23 15:57                                                                 ` Sebastian Gniazdowski
@ 2015-10-23 19:26                                                                   ` Bart Schaefer
  2015-10-23 23:50                                                                     ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-23 19:26 UTC (permalink / raw)
  To: Sebastian Gniazdowski, zsh-workers

On Oct 23,  8:32am, Sebastian Gniazdowski wrote:
} Subject: Re: Slowdown around 5.0.5-dev-0
}
} 
} [ text/plain
}   Encoded with "quoted-printable" ] :
} 
} I ran the performance tests again with script formatting them
} differently - they are aggregated on test function, not on zsh
} version. This allows easier interpretation. A new conclusion is that
} zhalloc patch doesn't help for string_test. So the first patch (36834)
} speeds up string operations and also introduces increased memory
} usage.

Thanks for the patch.  I think you've removed too much; the part about
finding the arena with the most space rather than first arena with any
space should have stayed, as should the changes to zhalloc() that were
included in 36834.

Try the below patch against 8e9a68ad1 ?


On Oct 23,  5:57pm, Sebastian Gniazdowski wrote:
} Subject: Re: Slowdown around 5.0.5-dev-0
}
} On 23 October 2015 at 17:40, Sebastian Gniazdowski
} <sgniazdowski@gmail.com> wrote:
} > I attach the patch that withdraws 9f8e3e8 and 643aca9 (which are 36834)
} 
} One note is that 643aca9 isn't fully from 36834 because:
}            if (fheap == h)
}                fheap = NULL;
} 
} Is moved to top of "if (h->next) {". Was this intended?

Yes it was, but irrelevant since that whole block is being backed out.


diff --git a/Src/mem.c b/Src/mem.c
index 68bd767..15d8fd2 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -294,7 +294,7 @@ pushheap(void)
 #endif
 
     for (h = heaps; h; h = h->next) {
-	DPUTS(!h->used && h->next, "BUG: empty heap");
+	DPUTS(!h->used, "BUG: empty heap");
 	hs = (Heapstack) zalloc(sizeof(*hs));
 	hs->next = h->sp;
 	h->sp = hs;
@@ -339,10 +339,12 @@ freeheap(void)
      * of stuff already on the heap, this is an enormous amount of work,
      * and performance goes to hell.
      *
-     * Therefore, we defer freeing the most recently allocated arena until
-     * we reach popheap().  This may fail to reclaim some space in earlier
-     * arenas.
-     *
+     * However, if the arena to which fheap points is unused, we want to
+     * free it, so we have no choice but to do the sweep for a new fheap.
+     */
+    if (fheap && !fheap->sp)
+	fheap = NULL;	/* We used to do this unconditionally */
+    /*
      * In other cases, either fheap is already correct, or it has never
      * been set and this loop will do it, or it'll be reset from scratch
      * on the next popheap().  So all that's needed here is to pick up
@@ -386,26 +388,6 @@ freeheap(void)
 	    VALGRIND_MEMPOOL_TRIM((char *)h, (char *)arena(h), h->used);
 #endif
 	} else {
-	    if (fheap == h)
-		fheap = NULL;
-	    if (h->next) {
-		/* We want to cut this out of the arena list if we can */
-		if (h == heaps)
-		    hl = heaps = h->next;
-		else if (hl && hl->next == h)
-		    hl->next = h->next;
-		else {
-		    DPUTS(hl, "hl->next != h when freeing");
-		    hl = h;
-		    continue;
-		}
-		h->next = NULL;
-	    } else {
-		/* Leave an empty arena at the end until popped */
-		h->used = 0;
-		fheap = hl = h;
-		break;
-	    }
 #ifdef USE_MMAP
 	    munmap((void *) h, h->size);
 #else
@@ -473,19 +455,6 @@ popheap(void)
 
 	    hl = h;
 	} else {
-	    if (h->next) {
-		/* We want to cut this out of the arena list if we can */
-		if (h == heaps)
-		    hl = heaps = h->next;
-		else if (hl && hl->next == h)
-		    hl->next = h->next;
-		else {
-		    DPUTS(hl, "hl->next != h when popping");
-		    hl = h;
-		    continue;
-		}
-		h->next = NULL;
-	    }
 #ifdef USE_MMAP
 	    munmap((void *) h, h->size);
 #else


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-23 19:26                                                                   ` Bart Schaefer
@ 2015-10-23 23:50                                                                     ` Bart Schaefer
  2015-10-24  6:09                                                                       ` Sebastian Gniazdowski
                                                                                         ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-23 23:50 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1676 bytes --]

On Oct 23, 12:26pm, Bart Schaefer wrote:
}
} Try the below patch against 8e9a68ad1 ?

Attached are my results from running the Sebastian's tests against
whatever binaries I had lying around that had zprof linked, plus
zsh with workers/36926.  Output attached.

I then rebuilt git head (8e9a68ad) and ran the tests again.  First
thing to note is that the results for the string test changed by
about 20% just because of running the test a second time, possibly
because system state changed as a result of using mmap() for heap.

However, if the results can be considered reliable anyway, note that
36926 retains most of the search_test speed, which I think is an
improvement over backing out all of 36834 (Sebastian's results for
zsh-newheaps-zhalloc, if I'm reading his email correctly).

Also zsh-5.0.7 - zsh-5.1 use a LOT of memory for the search test,
and 8e9a68ad uses almost as much for the string test, but 36926 is
pretty good at keeping both in check (of course the real reason
the search test is better is not mem.c but instead PWS's work in
pattern.c).  Also 36926 and 8e9a68ad had noticeably smaller memory
footprint with the function test than 5.1 or earlier, so NEWHEAPS
there is definitely worth the slight loss of speed.

I don't have hard numbers to show for memory because I was watching
"top" ... but each new process seems to inherit some memory from the
previous one, so those mmap()ed segments are probably being shared
between the parent shell (the one driving the test script) and its
children?

Conclusion: 36926 is a good thing.  Thanks, Sebastian, for observing
the memory usage issue with 36384.  Can you confirm my results?

-- 
Barton E. Schaefer

[-- Attachment #2: Text --]
[-- Type: text/plain , Size: 2472 bytes --]

Pass 1:

Running [zsh-5.0.7]:                     string_test      51764.74
Running [zsh-5.0.8]:                     string_test      51473.90
Running [zsh-5.1]:                       string_test      52185.28
Running [zsh-36926]:                     string_test      38504.65

Running [zsh-5.0.7]:                     array_test        6893.61
Running [zsh-5.0.8]:                     array_test        6783.31
Running [zsh-5.1]:                       array_test        7013.65
Running [zsh-36926]:                     array_test        6214.61

Running [zsh-5.0.7]:                     function_test    21903.79
Running [zsh-5.0.8]:                     function_test    23027.64
Running [zsh-5.1]:                       function_test    22488.52
Running [zsh-36926]:                     function_test    30285.55

Running [zsh-5.0.7]:                     search_test       8359.19
Running [zsh-5.0.8]:                     search_test       8626.48
Running [zsh-5.1]:                       search_test       8176.59
Running [zsh-36926]:                     search_test       4394.92

Pass 2 (added build of git head as "zsh"):

Running [zsh-5.0.7]:                     string_test      41757.54
Running [zsh-5.0.8]:                     string_test      41443.93
Running [zsh-5.1]:                       string_test      41697.08
Running [zsh]:                           string_test      23052.51
Running [zsh-36926]:                     string_test      26685.68

Running [zsh-5.0.7]:                     array_test        6956.99
Running [zsh-5.0.8]:                     array_test        6791.33
Running [zsh-5.1]:                       array_test        6996.48
Running [zsh]:                           array_test        6273.76
Running [zsh-36926]:                     array_test        6326.08

Running [zsh-5.0.7]:                     function_test    21892.22
Running [zsh-5.0.8]:                     function_test    22012.86
Running [zsh-5.1]:                       function_test    22046.14
Running [zsh]:                           function_test    29858.53
Running [zsh-36926]:                     function_test    28719.64

Running [zsh-5.0.7]:                     search_test       8254.72
Running [zsh-5.0.8]:                     search_test       8508.30
Running [zsh-5.1]:                       search_test       8599.57
Running [zsh]:                           search_test       4402.41
Running [zsh-36926]:                     search_test       4244.59

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-23 23:50                                                                     ` Bart Schaefer
@ 2015-10-24  6:09                                                                       ` Sebastian Gniazdowski
  2015-10-24  7:37                                                                         ` Sebastian Gniazdowski
  2015-10-24  6:27                                                                       ` Sebastian Gniazdowski
  2015-10-24 10:54                                                                       ` Sebastian Gniazdowski
  2 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-24  6:09 UTC (permalink / raw)
  To: zsh-workers

On 24 October 2015 at 01:50, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 23, 12:26pm, Bart Schaefer wrote:
> }
> } Try the below patch against 8e9a68ad1 ?
>
> However, if the results can be considered reliable anyway, note that
> 36926 retains most of the search_test speed, which I think is an
> improvement over backing out all of 36834 (Sebastian's results for
> zsh-newheaps-zhalloc, if I'm reading his email correctly).

I ran quick tests yesterday and obtained this too. Good news that not
all gains of 36834 will be withdrawn! :) The newheaps-zhalloc is what
you write, two following patches without 36834.

Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-23 23:50                                                                     ` Bart Schaefer
  2015-10-24  6:09                                                                       ` Sebastian Gniazdowski
@ 2015-10-24  6:27                                                                       ` Sebastian Gniazdowski
  2015-10-24 10:54                                                                       ` Sebastian Gniazdowski
  2 siblings, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-24  6:27 UTC (permalink / raw)
  To: zsh-workers

On 24 October 2015 at 01:50, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 23, 12:26pm, Bart Schaefer wrote:
> }
> } Try the below patch against 8e9a68ad1 ?
>
> Attached are my results from running the Sebastian's tests against
> whatever binaries I had lying around that had zprof linked, plus
> zsh with workers/36926.  Output attached.
>
> I then rebuilt git head (8e9a68ad) and ran the tests again.  First
> thing to note is that the results for the string test changed by
> about 20% just because of running the test a second time, possibly
> because system state changed as a result of using mmap() for heap.

Maybe some other process was using CPU? My tests are stable and when
there is a difference it's because I switched a window or firefox with
active flash was running in background

Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-24  6:09                                                                       ` Sebastian Gniazdowski
@ 2015-10-24  7:37                                                                         ` Sebastian Gniazdowski
  2015-10-24  8:04                                                                           ` Sebastian Gniazdowski
  2015-10-24 19:39                                                                           ` Bart Schaefer
  0 siblings, 2 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-24  7:37 UTC (permalink / raw)
  To: zsh-workers

On 24 October 2015 at 08:09, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> On 24 October 2015 at 01:50, Bart Schaefer <schaefer@brasslantern.com> wrote:
>> On Oct 23, 12:26pm, Bart Schaefer wrote:
>> }
>> } Try the below patch against 8e9a68ad1 ?
>>
>> However, if the results can be considered reliable anyway, note that
>> 36926 retains most of the search_test speed, which I think is an
>> improvement over backing out all of 36834 (Sebastian's results for
>> zsh-newheaps-zhalloc, if I'm reading his email correctly).
>
> I ran quick tests yesterday and obtained this too. Good news that not
> all gains of 36834 will be withdrawn! :) The newheaps-zhalloc is what
> you write, two following patches without 36834.

I misread this. What I wanted to state, is that your patch from
yesterday retained search_test speed.

The fact that 36836 has its contribution in speeding up search_test is
was described by me in previous email:

> newheaps is responsible for
> instant-responsivity of my script, 36836 for much faster searching –
> that's a good compromise

So the thing is that your patch from yesterday retains search_test
speed up of 36834, while fixing the memory issues of 36834. Together
with 36836, this gives full retain of search_test speedup, nothing is
lost, speed of pattern matching is as high as with no revert from
current head state. What's lost is string_test speedup 36834 gave,
however, it did so with cost of increased memory usage

Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-24  7:37                                                                         ` Sebastian Gniazdowski
@ 2015-10-24  8:04                                                                           ` Sebastian Gniazdowski
  2015-10-24 19:39                                                                           ` Bart Schaefer
  1 sibling, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-24  8:04 UTC (permalink / raw)
  To: zsh-workers

Turns out I read this correctly for the first time. You meant 36926,
which is the yesterday's patch. Having this in mind my email still
makes sense. I just described how 36834, 36836, 36926 behave.

On 24 October 2015 at 09:37, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> On 24 October 2015 at 08:09, Sebastian Gniazdowski
> <sgniazdowski@gmail.com> wrote:
>> On 24 October 2015 at 01:50, Bart Schaefer <schaefer@brasslantern.com> wrote:
>>> On Oct 23, 12:26pm, Bart Schaefer wrote:
>>> }
>>> } Try the below patch against 8e9a68ad1 ?
>>>
>>> However, if the results can be considered reliable anyway, note that
>>> 36926 retains most of the search_test speed, which I think is an
>>> improvement over backing out all of 36834 (Sebastian's results for
>>> zsh-newheaps-zhalloc, if I'm reading his email correctly).
>>
>> I ran quick tests yesterday and obtained this too. Good news that not
>> all gains of 36834 will be withdrawn! :) The newheaps-zhalloc is what
>> you write, two following patches without 36834.
>
> I misread this. What I wanted to state, is that your patch from
> yesterday retained search_test speed.
>
> The fact that 36836 has its contribution in speeding up search_test is
> was described by me in previous email:
>
>> newheaps is responsible for
>> instant-responsivity of my script, 36836 for much faster searching –
>> that's a good compromise
>
> So the thing is that your patch from yesterday retains search_test
> speed up of 36834, while fixing the memory issues of 36834. Together
> with 36836, this gives full retain of search_test speedup, nothing is
> lost, speed of pattern matching is as high as with no revert from
> current head state. What's lost is string_test speedup 36834 gave,
> however, it did so with cost of increased memory usage
>
> Best regards,
> Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-23 23:50                                                                     ` Bart Schaefer
  2015-10-24  6:09                                                                       ` Sebastian Gniazdowski
  2015-10-24  6:27                                                                       ` Sebastian Gniazdowski
@ 2015-10-24 10:54                                                                       ` Sebastian Gniazdowski
  2015-10-24 11:25                                                                         ` Sebastian Gniazdowski
  2015-10-24 16:31                                                                         ` Bart Schaefer
  2 siblings, 2 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-24 10:54 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1986 bytes --]

I attach my results for the same builds as yours. Tried to calibrate
the running time and I ended up running really long tests (quite
exhausting to move mouse every few minutes as my screensaver is broken
and always turns on after few minutes). Will try to calibrate again
later, and I also am running the same tests on a ubuntu netbook. Few
conclusions:

> Running [zsh-5.0.7]:                     string_test      48631,87
> Running [zsh-5.0.8]:                     string_test      86302,14

This says there something happened after 5.0.7 and reveals itself on
OS X, could try to track it with bisect.

> Running [zsh-5.1]:                       string_test      86698,42
> Running [zsh-head-8e9a68a]:              string_test      38887,54
> Running [zsh-36926]:                     string_test      86497,62

This says 36926 doesn't include string boost up (at cost of memory
use) that current head has.

> Running [zsh-5.1]:                       search_test      51321,70
> Running [zsh-head-8e9a68a]:              search_test      10071,26
> Running [zsh-36926]:                     search_test      10260,60

This says that pattern maching boost up is retained.

> Running [zsh-5.1]:                       function_test    49070,83
> Running [zsh-head-8e9a68a]:              function_test    56299,14
> Running [zsh-36926]:                     function_test    54456,34

This informs about slowdown caused by newheaps patch, and points that
36926 can make it smaller.


I also did memory tests:

> # Tests for zsh-head-8e9a68a
> # string_test 42, 182, 318, 457, 590, 720, last: 10
> string_test 317.8

This informs about the 36834 problem with memory.

> # Tests for zsh-36926
> # string_test 2.1, 6.6, 8.6, 10, 10, 11, 9.5, 11, 11, 11, 11, 11, last: 9.6
> string_test 9.3

This shows it's solved.

> # search_test 110, 215, last: 265
> search_test 110.0

This is the high memory usage when matching patterns in large arrays
you mentioned.

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: perf-test-36926.txt --]
[-- Type: text/plain, Size: 2480 bytes --]

2.3 Intel Core i5, 8 GB RAM, OS X 10.9.2

Pass 1:

Running [zsh-5.0.7]:                     string_test      48631,87
Running [zsh-5.0.8]:                     string_test      86302,14
Running [zsh-5.1]:                       string_test      87970,33
Running [zsh-36926]:                     string_test      86682,35

Running [zsh-5.0.7]:                     array_test       87699,03
Running [zsh-5.0.8]:                     array_test       91540,41
Running [zsh-5.1]:                       array_test       82907,69
Running [zsh-36926]:                     array_test       82203,27

Running [zsh-5.0.7]:                     function_test    44150,02
Running [zsh-5.0.8]:                     function_test    47915,24
Running [zsh-5.1]:                       function_test    48448,95
Running [zsh-36926]:                     function_test    57126,86

Running [zsh-5.0.7]:                     search_test      55862,93
Running [zsh-5.0.8]:                     search_test      51480,07
Running [zsh-5.1]:                       search_test      51027,38
Running [zsh-36926]:                     search_test      10274,11

Pass 2:

Running [zsh-5.0.7]:                     string_test      48799,25
Running [zsh-5.0.8]:                     string_test      86682,21
Running [zsh-5.1]:                       string_test      86698,42
Running [zsh-head-8e9a68a]:              string_test      38887,54
Running [zsh-36926]:                     string_test      86497,62

Running [zsh-5.0.7]:                     array_test       86754,03
Running [zsh-5.0.8]:                     array_test       82451,51
Running [zsh-5.1]:                       array_test       82880,56
Running [zsh-head-8e9a68a]:              array_test       82276,80
Running [zsh-36926]:                     array_test       83075,01

Running [zsh-5.0.7]:                     function_test    45516,67
Running [zsh-5.0.8]:                     function_test    49122,46
Running [zsh-5.1]:                       function_test    49070,83
Running [zsh-head-8e9a68a]:              function_test    56299,14
Running [zsh-36926]:                     function_test    54456,34

Running [zsh-5.0.7]:                     search_test      57267,29
Running [zsh-5.0.8]:                     search_test      50841,91
Running [zsh-5.1]:                       search_test      51321,70
Running [zsh-head-8e9a68a]:              search_test      10071,26
Running [zsh-36926]:                     search_test      10260,60


[-- Attachment #3: mem-test-36926.txt --]
[-- Type: text/plain, Size: 1779 bytes --]

2.3 Intel Core i5, 8 GB RAM, OS X 10.9.2

# Tests for zsh-5.0.7
# string_test 5.8, 10, 12, 11, 11, 12, last: 11
string_test 10.0
# array_test 1.3, 2.3, 3.2, 3.9, 4.3, 5.2, 6.2, 7.3, 7.7, 8.1, 8.3, 8.6, last: 8.7
array_test 5.3
# function_test 2.7, 2.8, 2.8, 2.8, 2.8, 2.8, last: 2.8
function_test 2.8
# search_test 65, 117, 149, 175, 198, 219, 238, last: 257
search_test 153.8


# Tests for zsh-5.0.8
# string_test 2.1, 6.7, 8.6, 10, 10, 11, 9.5, 10, 11, 11, 11, 11, last: 9.6
string_test 9.2
# array_test 1.2, 2.1, 3.3, 3.7, 4.4, 5.4, 6.4, 7.1, 7.8, 8.1, 8.4, 8.6, last: 8.9
array_test 5.3
# function_test 2.5, 2.6, 2.5, 2.6, 2.5, 2.5, last: 2.5
function_test 2.5
# search_test 60, 126, 159, 187, 212, 235, last: 256
search_test 148.8


# Tests for zsh-5.1
# string_test 2.4, 7.1, 8.8, 10, 10, 11, 9.7, 10, 11, 11, 12, 11, last: 10
string_test 9.4
# array_test 1.1, 2.4, 2.9, 3.5, 4.3, 5.3, 6.3, 7.1, 7.8, 8.1, 8.4, 8.7, last: 8.8
array_test 5.2
# function_test 2.5, 2.6, 2.6, 2.6, 2.6, 2.6, 2.6, last: 2.6
function_test 2.6
# search_test 60, 125, 158, 186, 211, 234, 256, last: 257
search_test 162.3


# Tests for zsh-head-8e9a68a
# string_test 42, 182, 318, 457, 590, 720, last: 10
string_test 317.8
# array_test 1, 2.5, 3, 4, 4.5, 5.2, 6.3, 7.3, 7.8, 8.2, 8.4, 8.7, last: 8.8
array_test 5.3
# function_test 2.6, 2.7, 2.6, 2.9, 2.8, 2.9, 2.7, 2.8, last: 2.5
function_test 2.7
# search_test 106, 213, last: 264
search_test 106.0


# Tests for zsh-36926
# string_test 2.1, 6.6, 8.6, 10, 10, 11, 9.5, 11, 11, 11, 11, 11, last: 9.6
string_test 9.3
# array_test 1.2, 2.6, 3.1, 3.7, 4.4, 5.5, 6.5, 7.3, 8, 8.3, 8.5, 8.9, last: 9.1
array_test 5.4
# function_test 2.9, 2.7, 2.7, 2.8, 2.9, 2.9, 2.9, 2.7, last: 2.6
function_test 2.8
# search_test 110, 215, last: 265
search_test 110.0

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-24 10:54                                                                       ` Sebastian Gniazdowski
@ 2015-10-24 11:25                                                                         ` Sebastian Gniazdowski
  2015-10-24 16:31                                                                         ` Bart Schaefer
  1 sibling, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-24 11:25 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 82 bytes --]

Performance tests from Ubuntu Atom 1.6 laptop

Best regards
Sebastian Gniazdowski

[-- Attachment #2: perf-test-36926-atom.txt --]
[-- Type: text/plain, Size: 2468 bytes --]

Intel Atom CPU 330 @ 1.60GHz

Pass 1:

Running [zsh-5.0.7]:                     string_test      58573.49
Running [zsh-5.0.8]:                     string_test      58206.73
Running [zsh-5.1]:                       string_test      58998.19
Running [zsh-36926]:                     string_test      58491.39

Running [zsh-5.0.7]:                     array_test       27821.68
Running [zsh-5.0.8]:                     array_test       28300.12
Running [zsh-5.1]:                       array_test       28862.40
Running [zsh-36926]:                     array_test       27642.97

Running [zsh-5.0.7]:                     function_test   102523.34
Running [zsh-5.0.8]:                     function_test   105471.44
Running [zsh-5.1]:                       function_test   107102.55
Running [zsh-36926]:                     function_test   122497.27

Running [zsh-5.0.7]:                     search_test      35029.27
Running [zsh-5.0.8]:                     search_test      36958.43
Running [zsh-5.1]:                       search_test      36506.89
Running [zsh-36926]:                     search_test      27938.21

Pass 2:

Running [zsh-5.0.7]:                     string_test      58273.59
Running [zsh-5.0.8]:                     string_test      58166.23
Running [zsh-5.1]:                       string_test      58190.08
Running [zsh-head-8e9a68a]:              string_test      39081.73
Running [zsh-36926]:                     string_test      58978.04

Running [zsh-5.0.7]:                     array_test       27647.93
Running [zsh-5.0.8]:                     array_test       28353.12
Running [zsh-5.1]:                       array_test       28626.84
Running [zsh-head-8e9a68a]:              array_test       27842.07
Running [zsh-36926]:                     array_test       27620.49

Running [zsh-5.0.7]:                     function_test   102423.34
Running [zsh-5.0.8]:                     function_test   105560.15
Running [zsh-5.1]:                       function_test   107393.99
Running [zsh-head-8e9a68a]:              function_test   121731.23
Running [zsh-36926]:                     function_test   121420.25

Running [zsh-5.0.7]:                     search_test      34996.94
Running [zsh-5.0.8]:                     search_test      36811.87
Running [zsh-5.1]:                       search_test      36438.17
Running [zsh-head-8e9a68a]:              search_test      27979.98
Running [zsh-36926]:                     search_test      27904.23


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-24 10:54                                                                       ` Sebastian Gniazdowski
  2015-10-24 11:25                                                                         ` Sebastian Gniazdowski
@ 2015-10-24 16:31                                                                         ` Bart Schaefer
  2015-10-24 16:41                                                                           ` Bart Schaefer
  1 sibling, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-24 16:31 UTC (permalink / raw)
  To: zsh-workers

On Oct 24, 12:54pm, Sebastian Gniazdowski wrote:
} 
} > Running [zsh-5.0.7]:                     string_test      48631,87
} > Running [zsh-5.0.8]:                     string_test      86302,14
} 
} This says there something happened after 5.0.7 and reveals itself on
} OS X, could try to track it with bisect.

What happened is this:

    34451: use mmap() also on Mac OS X
    
    Let MAP_ANONYMOUS be an alias to MAP_ANON if the former is not
    defined.

Which strongly suggests that using the system malloc() rather than our
own mmap() strategy was the correct thing to do on MacOS, and we should
back that out and make note.  Also might be worth checking whether the
behavior is better on other platforms if we disable mmap().


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-24 16:31                                                                         ` Bart Schaefer
@ 2015-10-24 16:41                                                                           ` Bart Schaefer
  0 siblings, 0 replies; 61+ messages in thread
From: Bart Schaefer @ 2015-10-24 16:41 UTC (permalink / raw)
  To: zsh-workers

On Oct 24,  9:31am, Bart Schaefer wrote:
} 
} Also might be worth checking whether the
} behavior is better on other platforms if we disable mmap().

I set HAVE_MMAP 0 in config.h, recompiled, and compared on my CentOS VM;
there is no significant difference in string test:

Running [zsh-noMMAP]:                    string_test      26612.85
Running [zsh-36926]:                     string_test      26688.94

Function test is actually slightly worse:

Running [zsh-noMMAP]:                    function_test    28053.23
Running [zsh-36926]:                     function_test    27759.94

No significant diff in the other tests either.  So currently this only
seems to apply to MacOS.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-24  7:37                                                                         ` Sebastian Gniazdowski
  2015-10-24  8:04                                                                           ` Sebastian Gniazdowski
@ 2015-10-24 19:39                                                                           ` Bart Schaefer
  2015-10-25  7:35                                                                             ` Sebastian Gniazdowski
  1 sibling, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-24 19:39 UTC (permalink / raw)
  To: zsh-workers

On Oct 24,  9:37am, Sebastian Gniazdowski wrote:
}
} So the thing is that your patch from yesterday retains search_test
} speed up of 36834, while fixing the memory issues of 36834. Together
} with 36836, this gives full retain of search_test speedup, nothing is
} lost, speed of pattern matching is as high as with no revert from
} current head state. What's lost is string_test speedup 36834 gave,
} however, it did so with cost of increased memory usage

OK, let's try it this way.  This retains MOST of 36834, but puts back
scanning the entire heap if we intend to free the block to which the
current fheap pointer refers.

(This is a patch against 8e9a68ad but applies to current head.)


diff --git a/Src/mem.c b/Src/mem.c
index 68bd767..ca9ab96 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -343,6 +343,12 @@ freeheap(void)
      * we reach popheap().  This may fail to reclaim some space in earlier
      * arenas.
      *
+     * However, if the arena to which fheap points is unused, we want to
+     * free it, so we have no choice but to do the sweep for a new fheap.
+     */
+    if (fheap && !fheap->sp)
+       fheap = NULL;   /* We used to do this unconditionally */
+    /*
      * In other cases, either fheap is already correct, or it has never
      * been set and this loop will do it, or it'll be reset from scratch
      * on the next popheap().  So all that's needed here is to pick up


Here are my results for that; memory usage is comparable to 36926 while
speed is improved.

Running [zsh-HEAD]:                      string_test      23203.72
Running [zsh-36926]:                     string_test      38478.23
Running [zsh-HEAD+rescan]:               string_test      28787.44

Running [zsh-HEAD]:                      array_test        6261.07
Running [zsh-36926]:                     array_test        6269.27
Running [zsh-HEAD+rescan]:               array_test        6269.94

Running [zsh-HEAD]:                      function_test    29609.59
Running [zsh-36926]:                     function_test    30133.26
Running [zsh-HEAD+rescan]:               function_test    30237.71

Running [zsh-HEAD]:                      search_test       4209.37
Running [zsh-36926]:                     search_test       4194.97
Running [zsh-HEAD+rescan]:               search_test       4260.73


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-24 19:39                                                                           ` Bart Schaefer
@ 2015-10-25  7:35                                                                             ` Sebastian Gniazdowski
  2015-10-25 17:23                                                                               ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-25  7:35 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 668 bytes --]

On 24 October 2015 at 21:39, Bart Schaefer <schaefer@brasslantern.com> wrote:
> OK, let's try it this way.  This retains MOST of 36834, but puts back
> scanning the entire heap if we intend to free the block to which the
> current fheap pointer refers.
>
> (This is a patch against 8e9a68ad but applies to current head.)

My results. 36943 has string_test speed of current head (so it's
slightly better than in your results). And memory usage is in check,
so that's a fix.

I calibrated for string_test, it's interesting how much slower
array_test is on OS X. My machine is 2.3 GHz, maybe that's some reason
(your is faster IIRC).

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: perf-test-36943.txt --]
[-- Type: text/plain, Size: 808 bytes --]

Running [zsh-head-8e9a68a]:              string_test      22081,62
Running [zsh-36926]:                     string_test      48766,75
Running [zsh-36943]:                     string_test      21897,12

Running [zsh-head-8e9a68a]:              array_test       48814,49
Running [zsh-36926]:                     array_test       48213,22
Running [zsh-36943]:                     array_test       49198,79

Running [zsh-head-8e9a68a]:              function_test    43091,00
Running [zsh-36926]:                     function_test    44121,87
Running [zsh-36943]:                     function_test    42354,93

Running [zsh-head-8e9a68a]:              search_test       8396,50
Running [zsh-36926]:                     search_test       8122,48
Running [zsh-36943]:                     search_test       7816,17


[-- Attachment #3: mem-test-36943.txt --]
[-- Type: text/plain, Size: 1003 bytes --]

# Tests for zsh-head-8e9a68a
# string_test 43, 179, 318, 451, 583, 713, last: 9.6
string_test 314.8
# array_test 1.3, 2.5, 3.3, 3.7, 4.3, 5.3, 6.3, 7, 7.8, 8, 8.4, 8.6, last: 8.8
array_test 5.3
# function_test 2.7, 2.9, 2.6, 2.9, 2.9, 2.8, 2.7, 2.6, last: 2.6
function_test 2.8
# search_test 111, 213, last: 266
search_test 111.0


# Tests for zsh-36926
# string_test 3.1, 7.6, 9.1, 11, 11, 12, 10, 11, 11, 12, 12, 11, last: 10
string_test 10.0
# array_test 1.5, 2.3, 3.3, 3.7, 4.3, 5, 5.8, 6.8, 7.5, 7.8, 8.1, 8.4, last: 8.6
array_test 5.1
# function_test 2.8, 2.6, 2.6, 2.7, 2.9, 2.8, 2.8, 2.9, last: 2.6
function_test 2.7
# search_test 108, 214, last: 266
search_test 108.0


# Tests for zsh-36943
# string_test 4.6, 8.6, 9.5, 10, 10, 11, last: 8.9
string_test 8.5
# array_test 1.3, 2.4, 2.9, 3.7, 4.3, 5.1, 5.9, 6.9, 7.6, 7.8, 8.2, 8.4, last: 8.7
array_test 5.1
# function_test 2.6, 2.6, 2.7, 2.8, 2.6, 2.9, 2.8, 2.8, last: 2.6
function_test 2.7
# search_test 111, 216, last: 266
search_test 111.0


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-25  7:35                                                                             ` Sebastian Gniazdowski
@ 2015-10-25 17:23                                                                               ` Bart Schaefer
  2015-10-25 20:56                                                                                 ` Sebastian Gniazdowski
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-25 17:23 UTC (permalink / raw)
  To: Sebastian Gniazdowski, zsh-workers

On Oct 25,  8:35am, Sebastian Gniazdowski wrote:
}
} I calibrated for string_test, it's interesting how much slower
} array_test is on OS X. My machine is 2.3 GHz, maybe that's some reason
} (your is faster IIRC).


diff --git a/Src/mem.c b/Src/mem.c
index 158ad96..3138bb1 100644
--- a/Src/mem.c
+++ b/Src/mem.c
@@ -79,9 +79,16 @@
 
 #include <sys/mman.h>
 
+#if 0
+/*
+ * This change was designed to enable use of memory mapping on MacOS.
+ * However, performance tests indicate that MacOS mapped regions are
+ * significantly slower to allocate than memory from malloc().
+ */
 #if defined(MAP_ANON) && !defined(MAP_ANONYMOUS)
 #define MAP_ANONYMOUS MAP_ANON
 #endif
+#endif /* 0 */
 
 #if defined(MAP_ANONYMOUS) && defined(MAP_PRIVATE)
 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-25 17:23                                                                               ` Bart Schaefer
@ 2015-10-25 20:56                                                                                 ` Sebastian Gniazdowski
  2015-10-26  0:49                                                                                   ` Bart Schaefer
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-25 20:56 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 61 bytes --]

Turns out that's not it

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: perf-test-36943-nommap.txt --]
[-- Type: text/plain, Size: 1076 bytes --]

Running [zsh-head-8e9a68a]:              string_test      22085,99
Running [zsh-36926]:                     string_test      48248,70
Running [zsh-36943]:                     string_test      21867,31
Running [zsh-36943-nommap]:              string_test      27014,77

Running [zsh-head-8e9a68a]:              array_test       49213,94
Running [zsh-36926]:                     array_test       48707,25
Running [zsh-36943]:                     array_test       49611,80
Running [zsh-36943-nommap]:              array_test       50140,97

Running [zsh-head-8e9a68a]:              function_test    42722,47
Running [zsh-36926]:                     function_test    42292,10
Running [zsh-36943]:                     function_test    42579,06
Running [zsh-36943-nommap]:              function_test    34560,54

Running [zsh-head-8e9a68a]:              search_test       7849,94
Running [zsh-36926]:                     search_test       8117,99
Running [zsh-36943]:                     search_test       7759,65
Running [zsh-36943-nommap]:              search_test       7928,84


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-25 20:56                                                                                 ` Sebastian Gniazdowski
@ 2015-10-26  0:49                                                                                   ` Bart Schaefer
  2015-10-26  7:41                                                                                     ` Sebastian Gniazdowski
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Schaefer @ 2015-10-26  0:49 UTC (permalink / raw)
  To: zsh-workers

On Oct 25,  9:56pm, Sebastian Gniazdowski wrote:
}
} Turns out that's not it

Hm.  If it shows up between 5.0.7 and 5.0.8 and can't be explained by
different compilers or different compiler flags, then other culprits
might be:

 *  36ec763d - 33976: fix overlapping strcpy()
 *  one of the several commits that added queue_signals()

That mmap change was the only thing in mem.c from 5.0.7 -> 5.0.8.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-26  0:49                                                                                   ` Bart Schaefer
@ 2015-10-26  7:41                                                                                     ` Sebastian Gniazdowski
  2015-10-26  7:47                                                                                       ` Sebastian Gniazdowski
  0 siblings, 1 reply; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-26  7:41 UTC (permalink / raw)
  To: zsh-workers

On 26 October 2015 at 01:49, Bart Schaefer <schaefer@brasslantern.com> wrote:
> Hm.  If it shows up between 5.0.7 and 5.0.8 and can't be explained by
> different compilers or different compiler flags, then other culprits
> might be:
>
>  *  36ec763d - 33976: fix overlapping strcpy()
>  *  one of the several commits that added queue_signals()
>
> That mmap change was the only thing in mem.c from 5.0.7 -> 5.0.8.

I ran git bisect and it pointed to the mmap commit you mentioned
earlier, i.e. to 1f62d8d0be7290883991fb4de640fbecb7714917. Array test
wasn't changing during the search, i.e. was continuously slower.

Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Slowdown around 5.0.5-dev-0
  2015-10-26  7:41                                                                                     ` Sebastian Gniazdowski
@ 2015-10-26  7:47                                                                                       ` Sebastian Gniazdowski
  0 siblings, 0 replies; 61+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-26  7:47 UTC (permalink / raw)
  To: zsh-workers

PS. My OS is Mavericks, curious how this behaves on Yosemite. Have
quite broken system, planning to upgrade to Yosemite, but maybe
someone has such revision and could test this, and I would stay on
Mavericks as it serves well as testing platform for my scripts and
Zsh.

Best regards,
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2015-10-26  7:48 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-10 10:54 Slowdown around 5.0.5-dev-0 Sebastian Gniazdowski
2015-10-10 17:58 ` Bart Schaefer
2015-10-10 18:11   ` Sebastian Gniazdowski
2015-10-10 18:32     ` Sebastian Gniazdowski
2015-10-11  0:06       ` Bart Schaefer
2015-10-11  6:20         ` Bart Schaefer
2015-10-11  8:39           ` Sebastian Gniazdowski
2015-10-11 16:17             ` Bart Schaefer
2015-10-11 16:48               ` Sebastian Gniazdowski
2015-10-11 17:31                 ` Bart Schaefer
2015-10-11 18:05                   ` Sebastian Gniazdowski
2015-10-11 21:22                     ` Bart Schaefer
2015-10-12  8:21                       ` Sebastian Gniazdowski
2015-10-12 14:01                         ` Bart Schaefer
2015-10-12 16:50                           ` Sebastian Gniazdowski
2015-10-13  0:33                             ` Bart Schaefer
2015-10-13  8:21                               ` Sebastian Gniazdowski
2015-10-13 15:52                                 ` Bart Schaefer
2015-10-14  6:50                                   ` Sebastian Gniazdowski
2015-10-14 13:27                                   ` Peter Stephenson
2015-10-14 16:25                                     ` Bart Schaefer
2015-10-14 16:50                                       ` Bart Schaefer
2015-10-15  4:32                                         ` Bart Schaefer
2015-10-15 13:03                                           ` Sebastian Gniazdowski
2015-10-16  0:35                                             ` Bart Schaefer
2015-10-17  9:12                                               ` Sebastian Gniazdowski
2015-10-17  9:24                                                 ` Sebastian Gniazdowski
2015-10-18 16:19                                                 ` Bart Schaefer
2015-10-18 20:40                                                   ` Sebastian Gniazdowski
2015-10-18 21:07                                                     ` Bart Schaefer
2015-10-18 21:31                                                       ` Sebastian Gniazdowski
2015-10-19 17:21                                                         ` Bart Schaefer
2015-10-22 12:49                                                           ` Sebastian Gniazdowski
2015-10-22 15:00                                                             ` Bart Schaefer
2015-10-22 16:28                                                               ` Sebastian Gniazdowski
2015-10-22 16:33                                                                 ` Sebastian Gniazdowski
2015-10-23 15:40                                                               ` Sebastian Gniazdowski
2015-10-23 15:57                                                                 ` Sebastian Gniazdowski
2015-10-23 19:26                                                                   ` Bart Schaefer
2015-10-23 23:50                                                                     ` Bart Schaefer
2015-10-24  6:09                                                                       ` Sebastian Gniazdowski
2015-10-24  7:37                                                                         ` Sebastian Gniazdowski
2015-10-24  8:04                                                                           ` Sebastian Gniazdowski
2015-10-24 19:39                                                                           ` Bart Schaefer
2015-10-25  7:35                                                                             ` Sebastian Gniazdowski
2015-10-25 17:23                                                                               ` Bart Schaefer
2015-10-25 20:56                                                                                 ` Sebastian Gniazdowski
2015-10-26  0:49                                                                                   ` Bart Schaefer
2015-10-26  7:41                                                                                     ` Sebastian Gniazdowski
2015-10-26  7:47                                                                                       ` Sebastian Gniazdowski
2015-10-24  6:27                                                                       ` Sebastian Gniazdowski
2015-10-24 10:54                                                                       ` Sebastian Gniazdowski
2015-10-24 11:25                                                                         ` Sebastian Gniazdowski
2015-10-24 16:31                                                                         ` Bart Schaefer
2015-10-24 16:41                                                                           ` Bart Schaefer
2015-10-23  6:32                                                             ` Sebastian Gniazdowski
2015-10-16  0:37                                             ` Bart Schaefer
2015-10-13 13:07                         ` Sebastian Gniazdowski
2015-10-13 13:31                           ` Sebastian Gniazdowski
2015-10-12 12:05                       ` Sebastian Gniazdowski
2015-10-12 15:13                         ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).