zsh-users
* argv subscript range uses too many memory
From: Han Pingtian @ 2012-11-08  8:40 UTC
  To: zsh-user

Hi,

This script uses so much memory on my laptop that it gets killed by the
OOM killer:

arr=(~/**/*)
set -- "${(@)arr}"

while ((ARGC))
do
   print -- "${argv[1,3]}"
   shift 3
done

But if I change the subscript range to single array elements, it works
just fine. So I suspect there is something wrong with subscript
ranges.

Any thoughts?

Thanks in advance.



* Re: argv subscript range uses too many memory
From: Peter Stephenson @ 2012-11-08 10:02 UTC
  To: zsh-user

On Thu, 08 Nov 2012 16:40:01 +0800
Han Pingtian <hanpt@linux.vnet.ibm.com> wrote:
> This script uses so much memory on my laptop that it gets killed by the
> OOM killer:
> 
> arr=(~/**/*)
> set -- "${(@)arr}"
> 
> while ((ARGC))
> do
>    print -- "${argv[1,3]}"
>    shift 3
> done
> 
> But if I change the subscript range to single array elements, it works
> just fine. So I suspect there is something wrong with subscript
> ranges.
> 
> Any thoughts?

I think you're right --- I think that was what was causing the memory
usage for zargs to go haywire for me last week.  I couldn't see anything
obviously wrong when I looked, but it may be a pathology with the way
memory is allocated and freed.

pws



* Re: argv subscript range uses too many memory
From: Han Pingtian @ 2012-11-10 10:58 UTC
  To: zsh-users

When running with 'print -- "$argv[1,3]"', the call trace looks
something like this:

(gdb) bt
#0  mmap_heap_alloc (n=0x7fff0852e880) at mem.c:449
#1  0x000000000045f5f9 in zhalloc (size=1594456) at mem.c:542
#2  0x00000000004a4dfa in arrdup (s=0x313b008) at utils.c:3648
#3  0x0000000000471fa9 in getarrvalue (v=0x7fff0852ea20) at params.c:2174
#4  0x00000000004961bf in paramsubst (l=0x7f51d6dd0bb0, n=0x7f51d6dd0bf8, str=0x7fff0852ee38, qt=1, pf_flags=0) at subst.c:2400
#5  0x0000000000491cbd in stringsubst (list=0x7f51d6dd0bb0, node=0x7f51d6dd0bf8, pf_flags=0, asssub=1) at subst.c:236
#6  0x0000000000491089 in prefork (list=0x7f51d6dd0bb0, flags=1) at subst.c:77
#7  0x000000000042dafb in execcmd (state=0x7fff0852f760, input=0, output=0, how=18, last1=2) at exec.c:2579
#8  0x000000000042b410 in execpline2 (state=0x7fff0852f760, pcode=323, how=18, input=0, output=0, last1=0) at exec.c:1677
#9  0x000000000042a56e in execpline (state=0x7fff0852f760, slcode=5122, how=18, last1=0) at exec.c:1462
#10 0x0000000000429c2c in execlist (state=0x7fff0852f760, dont_change_job=0, exiting=0) at exec.c:1245
#11 0x000000000042968a in execode (p=0x7f51d6dd0af0, dont_change_job=0, exiting=0, context=0x4af417 "toplevel") at exec.c:1057
#12 0x0000000000447dcb in loop (toplevel=1, justonce=0) at init.c:185
#13 0x000000000044b3f4 in zsh_main (argc=1, argv=0x7fff0852f938) at init.c:1616
#14 0x000000000040e034 in main (argc=1, argv=0x7fff0852f938) at ./main.c:93
(gdb)

But when running with 'print -- "$argv[1] $argv[2] $argv[3]"', the call
trace looks something like this:

(gdb) bt
#0  mmap_heap_alloc (n=0x7fff0852f4a0) at mem.c:449
#1  0x000000000045f5f9 in zhalloc (size=16) at mem.c:542
#2  0x0000000000490a1d in dupstring (s=0x4baa95 "%B%S%#%s%b") at string.c:39
#3  0x0000000000488f5c in promptexpand (s=0x4baa95 "%B%S%#%s%b", ns=1, rs=0x0, Rs=0x0, txtchangep=0x0) at prompt.c:185
#4  0x000000000049f6d6 in preprompt () at utils.c:1307
#5  0x0000000000447b11 in loop (toplevel=1, justonce=0) at init.c:121
#6  0x000000000044b3f4 in zsh_main (argc=1, argv=0x7fff0852f938) at init.c:1616
#7  0x000000000040e034 in main (argc=1, argv=0x7fff0852f938) at ./main.c:93

And the output appeared before the breakpoint in mmap_heap_alloc() was hit.



* Re: argv subscript range uses too many memory
From: Bart Schaefer @ 2012-11-10 14:57 UTC
  To: zsh-users

Further discussion probably should be re-routed to zsh-workers.

On Nov 10,  6:58pm, Han Pingtian wrote:
}
} When running with 'print -- "$argv[1,3]"', the call trace looks
} something like this:
} 
} (gdb) bt
} #0  mmap_heap_alloc (n=0x7fff0852e880) at mem.c:449
} #1  0x000000000045f5f9 in zhalloc (size=1594456) at mem.c:542
} #2  0x00000000004a4dfa in arrdup (s=0x313b008) at utils.c:3648
} #3  0x0000000000471fa9 in getarrvalue (v=0x7fff0852ea20) at params.c:2174

Ah, yes.  Array slices are implemented by copying the entire array and
then extracting the desired subset from the copy.  Individual array
elements are string references and therefore copy only the one element.
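A minimal C sketch of the two strategies described above follows. It is not zsh's code: `counted_alloc()`, `slice_via_full_copy()` and `slice_direct()` are hypothetical helpers, malloc() stands in for zhalloc(), and only the pointer vector is counted (zsh's arrdup() also duplicates each string). As a rough check against the backtrace, assuming 8-byte pointers and a NULL terminator, zhalloc(size=1594456) corresponds to 1594456 / 8 = 199307 slots, i.e. about 199,306 elements copied in order to extract three.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Running total of bytes requested: a stand-in for heap growth. */
static size_t bytes_allocated = 0;

static void *counted_alloc(size_t n)
{
    bytes_allocated += n;
    return malloc(n);
}

/* Hypothetical copy-then-extract slice: duplicate the whole pointer
 * vector (len entries plus a NULL terminator), then cut elements
 * start..end (1-based, inclusive, like $argv[start,end]) out of the
 * copy.  *block receives the allocation so the caller can free it. */
static char **slice_via_full_copy(char **arr, size_t len, size_t start,
                                  size_t end, char ***block)
{
    assert(1 <= start && start <= end && end <= len);
    char **copy = counted_alloc((len + 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        copy[i] = arr[i];
    copy[len] = NULL;
    copy[end] = NULL;            /* terminate the slice inside the copy */
    *block = copy;
    return copy + (start - 1);   /* result points into the big copy */
}

/* Hypothetical direct slice: allocate and copy only the wanted pointers. */
static char **slice_direct(char **arr, size_t len, size_t start, size_t end)
{
    assert(1 <= start && start <= end && end <= len);
    size_t count = end - start + 1;
    char **out = counted_alloc((count + 1) * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++)
        out[i] = arr[start - 1 + i];
    out[count] = NULL;
    return out;
}
```

For a 1000-element array, extracting elements 1 to 3 with the copy-then-extract strategy allocates 1001 pointers, while the direct strategy allocates 4 regardless of the array's length.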

Unfortunately this is pretty deeply ingrained in zsh's parameter
expansion implementation and likely requires some serious rewriting to
fix.  It might be easier to come up with a way to garbage-collect more
frequently.

In a loop, the heap allocations are not popped until the loop is done,
IIRC, so you'll end up with a large number of copies of the original
array in the heap with slice results pointing into different parts of
each copy.  Maybe there's a narrower scope in which a pushheap/popheap
could be inserted.
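To see why one full copy per pass leads to the OOM killer rather than just a slowdown, here is a toy C model. It is an assumption-laden sketch, not zsh's mem.c: `toy_mark()` and `toy_release()` are hypothetical analogues of pushheap() and popheap(), and each pass of the hypothetical `simulate_argv_loop()` allocates one (n+1)-pointer copy of the remaining $argv before a clamped `shift 3`. Only pointer vectors are counted, not the duplicated strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy bump allocator standing in for the zsh heap (NOT mem.c):
 * allocating only advances `used`, and nothing is reclaimed until
 * toy_release() rewinds to an earlier mark. */
struct toy_heap {
    uint64_t used;  /* bytes currently allocated */
    uint64_t peak;  /* high-water mark */
};

static void toy_alloc(struct toy_heap *h, uint64_t n)
{
    h->used += n;
    if (h->used > h->peak)
        h->peak = h->used;
}

/* Hypothetical analogue of pushheap(): remember the current position. */
static uint64_t toy_mark(const struct toy_heap *h)
{
    return h->used;
}

/* Hypothetical analogue of popheap(): drop everything since `mark`. */
static void toy_release(struct toy_heap *h, uint64_t mark)
{
    assert(mark <= h->used);
    h->used = mark;
}

/* Model of the reported loop: each pass copies the remaining $argv as
 * a vector of (remaining + 1) pointers, then does a clamped "shift 3".
 * Returns the peak heap usage in bytes. */
static uint64_t simulate_argv_loop(uint64_t nargs, uint64_t ptr_size,
                                   bool scope_per_iteration)
{
    struct toy_heap h = {0, 0};
    uint64_t loop_mark = toy_mark(&h);
    uint64_t remaining = nargs;

    while (remaining > 0) {                        /* while ((ARGC)) */
        uint64_t iter_mark = toy_mark(&h);
        toy_alloc(&h, (remaining + 1) * ptr_size); /* copy of $argv */
        if (scope_per_iteration)
            toy_release(&h, iter_mark);            /* narrower scope */
        remaining = remaining > 3 ? remaining - 3 : 0;
    }
    toy_release(&h, loop_mark);  /* the loop's scope ends: free it all */
    return h.peak;
}
```

If $argv starts with 199,306 elements (the figure inferred from the backtrace, assuming 8-byte pointers), `simulate_argv_loop(199306, 8, false)` returns 52965170896, about 53 GB of pointer vectors alone, because the copies shrink only linearly and the total grows quadratically. With a per-iteration scope the peak is 1594456 bytes, a single copy.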



* Re: argv subscript range uses too many memory
From: Han Pingtian @ 2012-11-20 13:04 UTC
  To: zsh-users; +Cc: schaefer

On Sat, Nov 10, 2012 at 06:57:09AM -0800, Bart Schaefer wrote:
> Further discussion probably should be re-routed to zsh-workers.
> 
> On Nov 10,  6:58pm, Han Pingtian wrote:
> }
> } When running with 'print -- "$argv[1,3]"', the call trace looks
> } something like this:
> } 
> } (gdb) bt
> } #0  mmap_heap_alloc (n=0x7fff0852e880) at mem.c:449
> } #1  0x000000000045f5f9 in zhalloc (size=1594456) at mem.c:542
> } #2  0x00000000004a4dfa in arrdup (s=0x313b008) at utils.c:3648
> } #3  0x0000000000471fa9 in getarrvalue (v=0x7fff0852ea20) at params.c:2174
> 
> Ah, yes.  Array slices are implemented by copying the entire array and
> then extracting the desired subset from the copy.  Individual array
> elements are string references and therefore copy only the one element.
> 
> Unfortunately this is pretty deeply ingrained in zsh's parameter
> expansion implementation and likely requires some serious rewriting to
> fix.  It might be easier to come up with a way to garbage-collect more
> frequently.
> 
> In a loop, the heap allocations are not popped until the loop is done,
> IIRC, so you'll end up with a large number of copies of the original
> array in the heap with slice results pointing into different parts of
> each copy.  Maybe there's a narrower scope in which a pushheap/popheap
> could be inserted.

It looks like I have found the cause of this problem. If I revert this commit:

commit 61505654942cb9895a9811fde1dcbb662fd7d66a
Author: Bart Schaefer <barts@users.sourceforge.net>
Date:   Sat May 7 19:32:57 2011 +0000

    29175: optimize freeheap

Then the problem goes away. Please have a look. Thanks.



* Re: argv subscript range uses too many memory
From: Bart Schaefer @ 2012-11-20 17:03 UTC
  To: zsh-users

[C code discussion proceeds below, so those zsh-users who don't care about
the internals can skip this message.  Once again, we should move the rest
of this thread to zsh-workers, thanks.]

Han, thanks for the diagnosis.

On Nov 20,  9:04pm, Han Pingtian wrote:
} Subject: Re: argv subscript range uses too many memory
}
} On Sat, Nov 10, 2012 at 06:57:09AM -0800, Bart Schaefer wrote:
} > In a loop, the heap allocations are not popped until the loop is done,
} > IIRC, so you'll end up with a large number of copies of the original
} > array in the heap with slice results pointing into different parts of
} > each copy.  Maybe there's a narrower scope in which a pushheap/popheap
} > could be inserted.
} 
} It looks like I have found the cause of this problem. If I revert this commit:
} 
} commit 61505654942cb9895a9811fde1dcbb662fd7d66a
} Author: Bart Schaefer <barts@users.sourceforge.net>
} Date:   Sat May 7 19:32:57 2011 +0000
} 
}     29175: optimize freeheap

Aha; this jibes with both the excerpted text from me above and also with
what PWS said in workers/30791:

: What's puzzling me is that loops, including the "while" involved here,
: execute freeheap() at the end of each iteration.  That should restore
: the pristine state of the loop

According to the comment in workers/29175:

+     * However, there doesn't seem to be any reason to reset fheap before
+     * beginning this loop.  Either it's already correct, or it has never
+     * been set and this loop will do it, or it'll be reset from scratch
+     * on the next popheap().  So all that's needed here is to pick up
+     * the scan wherever the last pass [or the last popheap()] left off.

The consequence of this optimization is that, in the name of speed, we
don't do a full-fledged garbage collection upon freeheap(), only upon
popheap().  So the freeheap() on each loop iteration does not "restore
the pristine state" and "a narrower scope [of] pushheap/popheap" would
be one potential solution.

Unfortunately as far as I can tell these two issues (the speed problem
in last year's "the source of slow large for loops" thread and the space
problem in this thread) are directly in conflict with one another.  The
speed problem requires that the heap not be fully garbage collected on
every loop pass, but the space problem requires that it be collected at
some point before the loop is done.

Maybe there's a hybrid where freeheap() can examine the difference in
position (fheaps - heaps) and do a full garbage collect only when the
heap has become "too full".  The question then is, what difference in
position is large enough to trigger a collection?
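The hybrid can be prototyped in a few lines. The sketch below is hypothetical: it measures growth in bytes since the loop's mark rather than the arena distance (fheaps - heaps) mentioned above, and `run_hybrid()` and its `threshold` parameter are invented for illustration. A threshold of 0 collects on every pass (full cost each iteration, minimal memory); an unbounded threshold never collects inside the loop (cheap, but memory grows with the iteration count).

```c
#include <assert.h>
#include <stddef.h>

/* Results of one simulated loop run. */
struct hybrid_stats {
    size_t peak;         /* highest heap usage seen, in bytes */
    size_t collections;  /* full collections performed */
};

/* Hypothetical hybrid freeheap(): after each pass, do nothing unless
 * usage since the loop's mark exceeds `threshold`, and only then do a
 * full collection back to the mark. */
static struct hybrid_stats run_hybrid(size_t iterations,
                                      size_t bytes_per_iter,
                                      size_t threshold)
{
    struct hybrid_stats s = {0, 0};
    size_t used = 0;  /* bytes allocated since the loop's mark */

    for (size_t i = 0; i < iterations; i++) {
        used += bytes_per_iter;        /* the loop body's allocations */
        if (used > s.peak)
            s.peak = used;
        if (used > threshold) {        /* heap is "too full" */
            used = 0;                  /* full collection to the mark */
            s.collections++;
        }
        /* invariant: between passes, usage never exceeds the threshold */
        assert(used <= threshold);
    }
    return s;
}
```

With 100 passes of 10 bytes each, a threshold of 35 bytes gives 25 full collections and a peak of 40 bytes. In general the peak is at most threshold + bytes_per_iter, so the threshold trades collection frequency directly against worst-case memory.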



Thread overview: 6 messages
2012-11-08  8:40 argv subscript range uses too many memory Han Pingtian
2012-11-08 10:02 ` Peter Stephenson
2012-11-10 10:58   ` Han Pingtian
2012-11-10 14:57     ` Bart Schaefer
2012-11-20 13:04       ` Han Pingtian
2012-11-20 17:03         ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/