* argv subscript range uses too many memory @ 2012-11-08 8:40 Han Pingtian
From: Han Pingtian @ 2012-11-08 8:40 UTC
To: zsh-user

Hi,

This script uses so much memory on my laptop that it gets killed by the
OOM killer:

    arr=(~/**/*)
    set -- "${(@)arr}"

    while ((ARGC))
    do
        print -- "${argv[1,3]}"
        shift 3
    done

But if I change the subscript range to single array elements, it works
just fine. So I suspect there is something wrong with the subscript
range.

Any thoughts?

Thanks in advance.

* Re: argv subscript range uses too many memory
From: Peter Stephenson @ 2012-11-08 10:02 UTC
To: zsh-user

On Thu, 08 Nov 2012 16:40:01 +0800
Han Pingtian <hanpt@linux.vnet.ibm.com> wrote:
> This script on my laptop uses so much memory so that being killed by
> oom-killer:
>
> arr=(~/**/*)
> set -- "${(@)arr}"
>
> while ((ARGC))
> do
> print -- "${argv[1,3]}"
> shift 3
> done
>
> But if change the subscript range to single array elements, it works
> just fine. So I suspect there is something wrong with the subscript
> range.
>
> Any thoughts?

I think you're right --- I think that was what was causing the memory
usage for zargs to go haywire for me last week. I couldn't see anything
obviously wrong when I looked, but it may be a pathology with the way
memory is allocated and freed.

pws

* Re: argv subscript range uses too many memory
From: Han Pingtian @ 2012-11-10 10:58 UTC
To: zsh-users

It looks like when running with 'print -- "$argv[1,3]"', the call trace
is something like this:

    (gdb) bt
    #0  mmap_heap_alloc (n=0x7fff0852e880) at mem.c:449
    #1  0x000000000045f5f9 in zhalloc (size=1594456) at mem.c:542
    #2  0x00000000004a4dfa in arrdup (s=0x313b008) at utils.c:3648
    #3  0x0000000000471fa9 in getarrvalue (v=0x7fff0852ea20) at params.c:2174
    #4  0x00000000004961bf in paramsubst (l=0x7f51d6dd0bb0, n=0x7f51d6dd0bf8,
        str=0x7fff0852ee38, qt=1, pf_flags=0) at subst.c:2400
    #5  0x0000000000491cbd in stringsubst (list=0x7f51d6dd0bb0,
        node=0x7f51d6dd0bf8, pf_flags=0, asssub=1) at subst.c:236
    #6  0x0000000000491089 in prefork (list=0x7f51d6dd0bb0, flags=1) at subst.c:77
    #7  0x000000000042dafb in execcmd (state=0x7fff0852f760, input=0, output=0,
        how=18, last1=2) at exec.c:2579
    #8  0x000000000042b410 in execpline2 (state=0x7fff0852f760, pcode=323,
        how=18, input=0, output=0, last1=0) at exec.c:1677
    #9  0x000000000042a56e in execpline (state=0x7fff0852f760, slcode=5122,
        how=18, last1=0) at exec.c:1462
    #10 0x0000000000429c2c in execlist (state=0x7fff0852f760, dont_change_job=0,
        exiting=0) at exec.c:1245
    #11 0x000000000042968a in execode (p=0x7f51d6dd0af0, dont_change_job=0,
        exiting=0, context=0x4af417 "toplevel") at exec.c:1057
    #12 0x0000000000447dcb in loop (toplevel=1, justonce=0) at init.c:185
    #13 0x000000000044b3f4 in zsh_main (argc=1, argv=0x7fff0852f938) at init.c:1616
    #14 0x000000000040e034 in main (argc=1, argv=0x7fff0852f938) at ./main.c:93
    (gdb)

But when running with 'print -- "$argv[1] $argv[2] $argv[3]"', the call
trace is something like this:

    (gdb) bt
    #0  mmap_heap_alloc (n=0x7fff0852f4a0) at mem.c:449
    #1  0x000000000045f5f9 in zhalloc (size=16) at mem.c:542
    #2  0x0000000000490a1d in dupstring (s=0x4baa95 "%B%S%#%s%b") at string.c:39
    #3  0x0000000000488f5c in promptexpand (s=0x4baa95 "%B%S%#%s%b", ns=1,
        rs=0x0, Rs=0x0, txtchangep=0x0) at prompt.c:185
    #4  0x000000000049f6d6 in preprompt () at utils.c:1307
    #5  0x0000000000447b11 in loop (toplevel=1, justonce=0) at init.c:121
    #6  0x000000000044b3f4 in zsh_main (argc=1, argv=0x7fff0852f938) at init.c:1616
    #7  0x000000000040e034 in main (argc=1, argv=0x7fff0852f938) at ./main.c:93

And in that case the output appeared before the breakpoint at
mmap_heap_alloc() was hit.

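As a rough cross-check on the first trace, the size passed to zhalloc() can be converted to an element count. This assumes a 64-bit build (sizeof(char *) == 8, consistent with the 0x7fff... stack addresses) and assumes arrdup() requests one pointer per element plus a terminating NULL; neither is stated in the message itself.

```latex
\frac{1594456\ \text{bytes}}{8\ \text{bytes per pointer}} = 199307 = 199306\ \text{elements} + 1\ \text{terminator}
```

Under those assumptions, the "${argv[1,3]}" expansion caught by the breakpoint duplicated a vector of roughly 199,000 pointers in order to use three of them. The single-element form shows only a 16-byte allocation, and that one comes from prompt expansion (preprompt()) rather than from the print.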
* Re: argv subscript range uses too many memory
From: Bart Schaefer @ 2012-11-10 14:57 UTC
To: zsh-users

Further discussion probably should be re-routed to zsh-workers.

On Nov 10, 6:58pm, Han Pingtian wrote:
}
} Looks like when running with 'print -- "$argv[1,3]", the call trace is
} something like this:
}
} (gdb) bt
} #0  mmap_heap_alloc (n=0x7fff0852e880) at mem.c:449
} #1  0x000000000045f5f9 in zhalloc (size=1594456) at mem.c:542
} #2  0x00000000004a4dfa in arrdup (s=0x313b008) at utils.c:3648
} #3  0x0000000000471fa9 in getarrvalue (v=0x7fff0852ea20) at params.c:2174

Ah, yes. Array slices are implemented by copying the entire array and
then extracting the desired subset from the copy. Individual array
elements are string references and therefore copy only the one element.

Unfortunately this is pretty deeply ingrained in zsh's parameter
expansion implementation and likely requires some serious rewriting to
fix. It might be easier to come up with a way to garbage-collect more
frequently.

In a loop, the heap allocations are not popped until the loop is done,
IIRC, so you'll end up with a large number of copies of the original
array in the heap with slice results pointing into different parts of
each copy. Maybe there's a narrower scope in which a pushheap/popheap
could be inserted.

* Re: argv subscript range uses too many memory
From: Han Pingtian @ 2012-11-20 13:04 UTC
To: zsh-users; +Cc: schaefer

On Sat, Nov 10, 2012 at 06:57:09AM -0800, Bart Schaefer wrote:
> Further discussion probably should be re-routed to zsh-workers.
>
> On Nov 10, 6:58pm, Han Pingtian wrote:
> }
> } Looks like when running with 'print -- "$argv[1,3]", the call trace is
> } something like this:
> }
> } (gdb) bt
> } #0  mmap_heap_alloc (n=0x7fff0852e880) at mem.c:449
> } #1  0x000000000045f5f9 in zhalloc (size=1594456) at mem.c:542
> } #2  0x00000000004a4dfa in arrdup (s=0x313b008) at utils.c:3648
> } #3  0x0000000000471fa9 in getarrvalue (v=0x7fff0852ea20) at params.c:2174
>
> Ah, yes. Array slices are implemented by copying the entire array and
> then extracting the desired subset from the copy. Individual array
> elements are string references and therefore copy only the one element.
>
> Unfortunately this is pretty deeply ingrained in zsh's parameter
> expansion implementation and likely requires some serious rewriting to
> fix. It might be easier to come up with a way to garbage-collect more
> frequently.
>
> In a loop, the heap allocations are not popped until the loop is done,
> IIRC, so you'll end up with a large number of copies of the original
> array in the heap with slice results pointing into different parts of
> each copy. Maybe there's a narrower scope in which a pushheap/popheap
> could be inserted.

It looks like I have found the cause of this problem. If I revert this
commit:

    commit 61505654942cb9895a9811fde1dcbb662fd7d66a
    Author: Bart Schaefer <barts@users.sourceforge.net>
    Date:   Sat May 7 19:32:57 2011 +0000

        29175: optimize freeheap

then the problem goes away. Please have a look.

Thanks.

* Re: argv subscript range uses too many memory
From: Bart Schaefer @ 2012-11-20 17:03 UTC
To: zsh-users

[C code discussion proceeds below, so those zsh-users who don't care
about the internals can skip this message. Once again, we should move
the rest of this thread to zsh-workers, thanks.]

Han, thanks for the diagnosis.

On Nov 20, 9:04pm, Han Pingtian wrote:
} Subject: Re: argv subscript range uses too many memory
}
} On Sat, Nov 10, 2012 at 06:57:09AM -0800, Bart Schaefer wrote:
} > In a loop, the heap allocations are not popped until the loop is done,
} > IIRC, so you'll end up with a large number of copies of the original
} > array in the heap with slice results pointing into different parts of
} > each copy. Maybe there's a narrower scope in which a pushheap/popheap
} > could be inserted.
}
} Looks like I have found the reason of this problem. If I revert this commit:
}
} commit 61505654942cb9895a9811fde1dcbb662fd7d66a
} Author: Bart Schaefer <barts@users.sourceforge.net>
} Date:   Sat May 7 19:32:57 2011 +0000
}
}     29175: optimize freeheap

Aha; this jibes with both the excerpted text from me above and also with
what PWS said in workers/30791:

: What's puzzling me is that loops, including the "while" involved here,
: execute freeheap() at the end of each iteration. That should restore
: the pristine state of the loop

According to the comment in workers/29175:

+ * However, there doesn't seem to be any reason to reset fheap before
+ * beginning this loop. Either it's already correct, or it has never
+ * been set and this loop will do it, or it'll be reset from scratch
+ * on the next popheap(). So all that's needed here is to pick up
+ * the scan wherever the last pass [or the last popheap()] left off.

The consequence of this optimization is that, in the name of speed, we
don't do a full-fledged garbage collection upon freeheap(), only upon
popheap(). So the freeheap() on each loop iteration does not "restore
the pristine state" and "a narrower scope [of] pushheap/popheap" would
be one potential solution.

Unfortunately as far as I can tell these two issues (the speed problem
in last year's "the source of slow large for loops" thread and the space
problem in this thread) are directly in conflict with one another. The
speed problem requires that the heap not be fully garbage collected on
every loop pass, but the space problem requires that it be collected at
some point before the loop is done.

Maybe there's a hybrid where freeheap() can examine the difference in
position (fheaps - heaps) and do a full garbage collect only when the
heap has become "too full". The question then is, what difference in
position is large enough to trigger a collection?

Thread overview: 6 messages
2012-11-08  8:40 argv subscript range uses too many memory Han Pingtian
2012-11-08 10:02 ` Peter Stephenson
2012-11-10 10:58   ` Han Pingtian
2012-11-10 14:57     ` Bart Schaefer
2012-11-20 13:04       ` Han Pingtian
2012-11-20 17:03         ` Bart Schaefer