zsh-workers
 help / color / mirror / code / Atom feed
* the source of slow large for loops
@ 2011-05-07 17:00 Mikael Magnusson
  2011-05-07 19:11 ` Bart Schaefer
  0 siblings, 1 reply; 4+ messages in thread
From: Mikael Magnusson @ 2011-05-07 17:00 UTC (permalink / raw)
  To: zsh workers

I don't know if anyone looked into why things like
for i in {1..700000}; do true; done
is extremely slow in zsh, so I did now. Turns out zhalloc is extremely
slow, and for the above loop, one particular line runs 12049901132
times according to gcov. (It takes around 10 minutes). If I add early
returns of malloc() and realloc() to zhalloc and hrealloc, the loop
runs in 3 seconds (but leaks a couple of hundred megabytes memory).

This is mostly a FYI thing, not really intending to do anything about
it or expecting anything, but I've always wondered where the slowness
was and now I know. :)

-- 
Mikael Magnusson


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: the source of slow large for loops
  2011-05-07 17:00 the source of slow large for loops Mikael Magnusson
@ 2011-05-07 19:11 ` Bart Schaefer
  2011-05-07 19:21   ` Mikael Magnusson
  0 siblings, 1 reply; 4+ messages in thread
From: Bart Schaefer @ 2011-05-07 19:11 UTC (permalink / raw)
  To: zsh workers

On May 7,  7:00pm, Mikael Magnusson wrote:
} Subject: the source of slow large for loops
}
} I don't know if anyone looked into why things like
} for i in {1..700000}; do true; done
} is extremely slow in zsh, so I did now. Turns out zhalloc is extremely
} slow

Hmm ... I think you've misdiagnosed.  zhalloc() is reasonably fast.
What's amazingly slow is freeheap().  If I stick a "return;" at the
top of freeheap() the above loop runs in just over 4 seconds on my
3GHz P4.  (AFAICT the memory is still eventually freed by popheap(),
it just grows a lot more before releasing any.)

} and for the above loop, one particular line runs 12049901132
} times according to gcov.

It would have been nice if you'd told us which line. :-)  However, I
don't think that's actually the problem.  There are a couple of places
where zhalloc() and friends search the heap for free space by doing a
linear scan over a linked list, which will cause a huge number of loop
iterations, but each of those iterations is at most a couple of machine
instructions.  You need to look at how much time something takes, not
just how often it's done.

The real time sink is this bit of freeheap():

    for (h = heaps; h; h = hn) {
	hn = h->next;
	if (h->sp) {
#ifdef ZSH_MEM_DEBUG
	    memset(arena(h) + h->sp->used, 0xff, h->used - h->sp->used);
#endif
	    h->used = h->sp->used;
	    if (!fheap && h->used < ARENA_SIZEOF(h))
		fheap = h;
	    hl = h;
	} else {
#ifdef USE_MMAP
	    munmap((void *) h, h->size);
#else
	    zfree(h, HEAPSIZE);
#endif
	}
    }

Of course if you've used --enable-zsh-mem-debug for configure then this
is also zeroing all the memory on each free and the performance is even
worse, but that isn't really the issue.

Most specifically, take out the stuff inside "if (h->sp)" and the above
700k-iterations loop runs in 11 seconds.

If I understand what's going on here ...

Index: Src/mem.c
===================================================================
RCS file: /extra/cvsroot/zsh/zsh-4.0/Src/mem.c,v
retrieving revision 1.10
diff -c -r1.10 mem.c
--- Src/mem.c	4 Nov 2008 04:47:53 -0000	1.10
+++ Src/mem.c	7 May 2011 19:06:23 -0000
@@ -220,8 +220,28 @@
     h_free++;
 #endif
 
+    /* At this point we used to do:
     fheap = NULL;
-    for (h = heaps; h; h = hn) {
+     *
+     * When pushheap() is called, it sweeps over the entire heaps list of
+     * arenas and marks every one of them with the amount of free space in
+     * that arena at that moment.  zhalloc() is then allowed to grab bits
+     * out of any of those arenas that have free space.
+     *
+     * With the above reset of fheap, the loop below sweeps back over the
+     * entire heap list again, resetting the free space in every arena to
+     * the amount stashed by pushheap() and finding the first arena with
+     * free space to optimize zhalloc()'s next search.  When there's a lot
+     * of stuff already on the heap, this is an enormous amount of work,
+     * and peformance goes to hell.
+     *
+     * However, there doesn't seem to be any reason to reset fheap before
+     * beginning this loop.  Either it's already correct, or it has never
+     * been set and this loop will do it, or it'll be reset from scratch
+     * on the next popheap().  So all that's needed here is to pick up
+     * the scan wherever the last pass [or the last popheap()] left off.
+     */
+    for (h = (fheap ? fheap : heaps); h; h = hn) {
 	hn = h->next;
 	if (h->sp) {
 #ifdef ZSH_MEM_DEBUG
@@ -242,7 +262,7 @@
     if (hl)
 	hl->next = NULL;
     else
-	heaps = NULL;
+	heaps = fheap = NULL;
 
     unqueue_signals();
 }


-- 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: the source of slow large for loops
  2011-05-07 19:11 ` Bart Schaefer
@ 2011-05-07 19:21   ` Mikael Magnusson
  2011-05-07 19:49     ` Bart Schaefer
  0 siblings, 1 reply; 4+ messages in thread
From: Mikael Magnusson @ 2011-05-07 19:21 UTC (permalink / raw)
  To: zsh workers

On 7 May 2011 21:11, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On May 7,  7:00pm, Mikael Magnusson wrote:
> } Subject: the source of slow large for loops
> }
> } I don't know if anyone looked into why things like
> } for i in {1..700000}; do true; done
> } is extremely slow in zsh, so I did now. Turns out zhalloc is extremely
> } slow
>
> Hmm ... I think you've misdiagnosed.  zhalloc() is reasonably fast.
> What's amazingly slow is freeheap().  If I stick a "return;" at the
> top of freeheap() the above loop runs in just over 4 seconds on my
> 3GHz P4.  (AFAICT the memory is still eventually freed by popheap(),
> it just grows a lot more before releasing any.)

Ah, and when I made zhalloc a nop, freeheap() is fast again because
there are no heaps.

> } and for the above loop, one particular line runs 12049901132
> } times according to gcov.
>
> It would have been nice if you'd told us which line. :-)  However, I
> don't think that's actually the problem.  There are a couple of places
> where zhalloc() and friends search the heap for free space by doing a
> linear scan over a linked list, which will cause a huge number of loop
> iterations, but each of those iterations is at most a couple of machine
> instructions.  You need to look at how much time something takes, not
> just how often it's done.

Yeah, I actually tried gprof first, but I was unable to start zsh
then, it just exited with "profile signal" or something like that.

> The real time sink is this bit of freeheap():
>
>    for (h = heaps; h; h = hn) {
>        hn = h->next;
>        if (h->sp) {
> #ifdef ZSH_MEM_DEBUG
>            memset(arena(h) + h->sp->used, 0xff, h->used - h->sp->used);
> #endif
>            h->used = h->sp->used;
>            if (!fheap && h->used < ARENA_SIZEOF(h))
>                fheap = h;
>            hl = h;
>        } else {
> #ifdef USE_MMAP
>            munmap((void *) h, h->size);
> #else
>            zfree(h, HEAPSIZE);
> #endif
>        }
>    }
>
> Of course if you've used --enable-zsh-mem-debug for configure then this
> is also zeroing all the memory on each free and the performance is even
> worse, but that isn't really the issue.
>
> Most specifically, take out the stuff inside "if (h->sp)" and the above
> 700k-iterations loop runs in 11 seconds.
>
> If I understand what's going on here ...
>
> Index: Src/mem.c
> ===================================================================
> RCS file: /extra/cvsroot/zsh/zsh-4.0/Src/mem.c,v
> retrieving revision 1.10
> diff -c -r1.10 mem.c
> --- Src/mem.c   4 Nov 2008 04:47:53 -0000       1.10
> +++ Src/mem.c   7 May 2011 19:06:23 -0000
> @@ -220,8 +220,28 @@
>     h_free++;
>  #endif
>
> +    /* At this point we used to do:
>     fheap = NULL;
> -    for (h = heaps; h; h = hn) {
> +     *
> +     * When pushheap() is called, it sweeps over the entire heaps list of
> +     * arenas and marks every one of them with the amount of free space in
> +     * that arena at that moment.  zhalloc() is then allowed to grab bits
> +     * out of any of those arenas that have free space.
> +     *
> +     * With the above reset of fheap, the loop below sweeps back over the
> +     * entire heap list again, resetting the free space in every arena to
> +     * the amount stashed by pushheap() and finding the first arena with
> +     * free space to optimize zhalloc()'s next search.  When there's a lot
> +     * of stuff already on the heap, this is an enormous amount of work,
> +     * and peformance goes to hell.
> +     *
> +     * However, there doesn't seem to be any reason to reset fheap before
> +     * beginning this loop.  Either it's already correct, or it has never
> +     * been set and this loop will do it, or it'll be reset from scratch
> +     * on the next popheap().  So all that's needed here is to pick up
> +     * the scan wherever the last pass [or the last popheap()] left off.
> +     */
> +    for (h = (fheap ? fheap : heaps); h; h = hn) {
>        hn = h->next;
>        if (h->sp) {
>  #ifdef ZSH_MEM_DEBUG
> @@ -242,7 +262,7 @@
>     if (hl)
>        hl->next = NULL;
>     else
> -       heaps = NULL;
> +       heaps = fheap = NULL;
>
>     unqueue_signals();
>  }

-- 
Mikael Magnusson


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: the source of slow large for loops
  2011-05-07 19:21   ` Mikael Magnusson
@ 2011-05-07 19:49     ` Bart Schaefer
  0 siblings, 0 replies; 4+ messages in thread
From: Bart Schaefer @ 2011-05-07 19:49 UTC (permalink / raw)
  To: zsh workers

On May 7,  9:21pm, Mikael Magnusson wrote:
}
} > You need to look at how much time something takes, not
} > just how often it's done.
} 
} Yeah, I actually tried gprof first, but I was unable to start zsh
} then, it just exited with "profile signal" or something like that.

All I did was start up the loop and then repeatedly connect to the
program with gdb and look at the stack trace.  Five times out of
six it was in freeheap() when I attached. :-)

Unfortunately my patch doesn't help with the pathological behavior
of ary+=(elem).


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-05-07 19:49 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-07 17:00 the source of slow large for loops Mikael Magnusson
2011-05-07 19:11 ` Bart Schaefer
2011-05-07 19:21   ` Mikael Magnusson
2011-05-07 19:49     ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).