* Memory leak when working with undefined associative array keys & problems with unset @ 2017-09-16 20:57 Anssi Palin 2017-09-17 23:15 ` Bart Schaefer 0 siblings, 1 reply; 6+ messages in thread From: Anssi Palin @ 2017-09-16 20:57 UTC (permalink / raw) To: zsh-workers [-- Attachment #1: Type: text/plain, Size: 1986 bytes --] Hello, I've run into two problems with the way Zsh handles associative array keys, both tested on Linux with Zsh 5.4.1 compiled from source and 5.2 from the Ubuntu repositories. Zsh appears to permanently allocate some memory when just checking if a key is defined in an associative array. This behavior resulted in system memory getting exhausted on my machine with a script that checks random generated strings against a word lookup table. The following snippet replicates the bug and results in some 100 megabytes of memory being allocated: typeset -A a for (( i = 0; i < 1000000; i++ )); do (( ${+a[$i]} )) done Iterating over the same set of undefined keys a second time does not seem to cause more memory to be allocated. Moreover, unsetting or emptying the array doesn't appear to free all of the memory even though the array is destroyed or, in the latter case, emptied as expected. On 5.4.1 unsetting or emptying a second time looks to finally be freeing all of the taken up memory. Unsetting individual keys afterwards in a loop similar to the example above has the same problem, but unsetting each key immediately after checking it seems to mitigate this. The second issue I have pertains to special characters in associative array keys when unsetting them individually: $ key='hello * [ world' $ typeset -A a=("$key" val) $ unset "a[$key]" unset: a[hello * [ world]: invalid parameter name Since characters such as '\', '[', ']', '(' and ')' must be escaped but others like space or '*' shouldn't be, using the q or b parameter expansion flags is out of the question: $ unset "a[${(q)key}]" unset: a[hello\ \*\ \[\ world]: invalid parameter name $ typeset -p a typeset -A a=( 'hello * [ world' val ) The latter unset error message only shows up on 5.2 but in both cases the key remains set. Similar problems with special characters seem to affect [['s -v flag when checking for existence of associative array keys. Thank you. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Memory leak when working with undefined associative array keys & problems with unset 2017-09-16 20:57 Memory leak when working with undefined associative array keys & problems with unset Anssi Palin @ 2017-09-17 23:15 ` Bart Schaefer [not found] ` <1505690609.1747008.1109155032.651AC839@webmail.messagingengine.com> 2017-09-23 17:59 ` Peter Stephenson 0 siblings, 2 replies; 6+ messages in thread From: Bart Schaefer @ 2017-09-17 23:15 UTC (permalink / raw) To: Anssi Palin, zsh-workers@zsh.org On Sep 16, 8:57pm, Anssi Palin wrote: } } Zsh appears to permanently allocate some memory when just checking } if a key is defined in an associative array. Hmm. This has to do with order of operations. In order to test whether a[0] is (not) set, the calling code needs a "struct pm" object for which it can attempt to retrieve a value and/or check the PM_UNSET flag. For a hash-typed parameter, there's nowhere to temporarily store such an object except in the hashtable itself. The same level of evaluation that tests ${+...} also handles ${...::=...} which also needs that struct to store into, so we can't NOT allocate it, and because all parameter ops are by value rather than by reference we can't wait until after we've assigned to it to add it to the hash (the knowledge that "a[0]" is part of the hash "a" or even that the key is "0", is lost by the time we're ready to do the assignment -- we have only a handle to the object that resides at $a[0]). The naughty bit is approximately here: #0 getparamnode (ht=0x92c3898, nam=0xb7cb7818 "0") at ../../zsh-5.0/Src/params.c:496 #1 0x080baa56 in createparam (name=0xb7cb7818 "0", flags=570425344) at ../../zsh-5.0/Src/params.c:963 #2 0x080bb869 in getarg (str=0xbfee7eb8, inv=0xbfee7ebc, v=0xbfee8130, a2=0, w=0xbfee7ea8, prevcharlen=0xbfee7e9c, nextcharlen=0xbfee7e98) at ../../zsh-5.0/Src/params.c:1417 #3 0x080bcf10 in getindex (pptr=0xbfee7f14, v=0xbfee8130, flags=0) at ../../zsh-5.0/Src/params.c:1851 #4 0x080bd548 in fetchvalue (v=0xbfee8130, pptr=0xbfee8194, bracks=1, flags=0) at ../../zsh-5.0/Src/params.c:2069 #5 0x080e0193 in paramsubst (l=0xb7cb77a8, n=0xb7cb77c0, str=0xbfee8208, qt=0, pf_flags=0, ret_flags=0xbfee836c) at ../../zsh-5.0/Src/subst.c:2418 I think paramsubst() "knows" that it only wants to check set/unset, but that can't be passed down through fetchvalue() so getarg() doesn't know that it shouldn't create the hashtable element. } The second issue I have pertains to special characters in associative } array keys when unsetting them individually: } } $ key='hello * [ world' } $ typeset -A a=("$key" val) } $ unset "a[$key]" } unset: a[hello * [ world]: invalid parameter name Hmm, when examining ${a[$key]} we enter parse_subscript() with $key tokenized, but there's no way to get the tokenized string to reach the same point in the unset builtin (it's either already expanded, or not tokenized). Also ${a[$key]} passes sub=0 to parse_subscript() whereas "unset" always passes sub=1, but I'm uncertain how that may be relevant. This may argue for making unset a keyword ala typeset, but perhaps someone else has a clever approach. ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <1505690609.1747008.1109155032.651AC839@webmail.messagingengine.com>]
* Re: Memory leak when working with undefined associative array keys & problems with unset [not found] ` <1505690609.1747008.1109155032.651AC839@webmail.messagingengine.com> @ 2017-09-18 0:22 ` Bart Schaefer 0 siblings, 0 replies; 6+ messages in thread From: Bart Schaefer @ 2017-09-18 0:22 UTC (permalink / raw) To: zsh-workers [Replying to an off-list message w/Daniel's permission] On Sep 17, 11:23pm, Daniel Shahaf wrote: } } Bart Schaefer wrote on Sun, 17 Sep 2017 16:15 -0700: } > On Sep 16, 8:57pm, Anssi Palin wrote: } > } $ key='hello * [ world' } > } $ typeset -A a=("$key" val) } > } $ unset "a[$key]" } > } unset: a[hello * [ world]: invalid parameter name } > } > Hmm, when examining ${a[$key]} we enter parse_subscript() with $key } > tokenized, but there's no way to get the tokenized string to reach } > the same point in the unset builtin Incidentally ksh93 has this same problem, at least as of 2012: $ echo $KSH_VERSION Version AJM 93u+ 2012-08-01 $ unset a["$key"] ksh: unset: a[hello * [ world]: invalid variable name $ unset "a[$key]" ksh: unset: a[hello * [ world]: invalid variable name $ unset "a['$key']" ksh: unset: a['hello * [ world']: invalid variable name This is probably why we've ignored this issue, to date. } Couldn't we just change the interface and keep it a builtin? } The point being to pass the subscript separately. I don't know whether e.g. POSIX would squawk about more options to unset. There's also this, now that we've added the [key]=val syntax: torch% a=( ['hello * [ world']=(x y z) ) torch% typeset -p a typeset -A a=( ['hello * [ world']='(x y z)' ) torch% a=( ['hello * [ world']=() ) zsh: parse error near `()' Since the empty parens there are currently a parse error, we could make that work the way it does for plain arrays and unset the key. However, in ksh note there are no quotes around the parenthesized value (is that because ksh allows hash elements to be arrays?) and the empty parens version is permitted: $ a=(['hello * [ world']=(x y z)) $ typeset -p a typeset -A a=(['hello * [ world']=(x y z) ) $ a=(['hello * [ world']=() ) $ typeset -p a typeset -A a=(['hello * [ world']=()) So we may prefer to go with [increased] ksh compatibility here. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Memory leak when working with undefined associative array keys & problems with unset 2017-09-17 23:15 ` Bart Schaefer [not found] ` <1505690609.1747008.1109155032.651AC839@webmail.messagingengine.com> @ 2017-09-23 17:59 ` Peter Stephenson 2017-09-24 1:48 ` Bart Schaefer 1 sibling, 1 reply; 6+ messages in thread From: Peter Stephenson @ 2017-09-23 17:59 UTC (permalink / raw) To: zsh-workers@zsh.org On Sun, 17 Sep 2017 16:15:14 -0700 Bart Schaefer <schaefer@brasslantern.com> wrote: > On Sep 16, 8:57pm, Anssi Palin wrote: > } > } Zsh appears to permanently allocate some memory when just checking > } if a key is defined in an associative array. > > Hmm. > > This has to do with order of operations. In order to test whether a[0] > is (not) set, the calling code needs a "struct pm" object for which it > can attempt to retrieve a value and/or check the PM_UNSET flag. For a > hash-typed parameter, there's nowhere to temporarily store such an > object except in the hashtable itself. The same level of evaluation > that tests ${+...} also handles ${...::=...} which also needs that > struct to store into, so we can't NOT allocate it, and because all > parameter ops are by value rather than by reference we can't wait until > after we've assigned to it to add it to the hash (the knowledge that > "a[0]" is part of the hash "a" or even that the key is "0", is lost > by the time we're ready to do the assignment -- we have only a handle > to the object that resides at $a[0]). Can we really not do something like the patch below? There may well be other cases to check, as the logic is complicated, but I think this is the obvious case. If it breaks anything, then something is missing from the parameter tests. > } The second issue I have pertains to special characters in associative > } array keys when unsetting them individually: > } > } $ key='hello * [ world' > } $ typeset -A a=("$key" val) > } $ unset "a[$key]" > } unset: a[hello * [ world]: invalid parameter name > > Hmm, when examining ${a[$key]} we enter parse_subscript() with $key > tokenized, but there's no way to get the tokenized string to reach > the same point in the unset builtin (it's either already expanded, > or not tokenized). Also ${a[$key]} passes sub=0 to parse_subscript() > whereas "unset" always passes sub=1, but I'm uncertain how that may > be relevant. I gave up on this ages ago, frankly. We certainly a need way of raw key reference, but I don't think we've got one, and each time we tweak it we seem to make it even more tortuous. Possibly we need yet another flag to say "FOR ****'S SAKE DON'T TRY TO BE CLEVER NO I MEAN IT ****" (but without the Tarantino script edit). First hunk is completely gratuitous but I noticed it when debugging. pws diff --git a/Src/hist.c b/Src/hist.c index da5a8b2..177250f 100644 --- a/Src/hist.c +++ b/Src/hist.c @@ -1083,7 +1083,6 @@ hbegin(int dohist) } else histactive = HA_ACTIVE | HA_NOINC; - hf = getsparam("HISTFILE"); /* * For INCAPPENDHISTORYTIME, when interactive, save the history here * as it gives a better estimate of the times of commands. @@ -1104,8 +1103,10 @@ hbegin(int dohist) */ if (isset(INCAPPENDHISTORYTIME) && !isset(SHAREHISTORY) && !isset(INCAPPENDHISTORY) && - !(histactive & HA_NOINC) && !strin && histsave_stack_pos == 0) + !(histactive & HA_NOINC) && !strin && histsave_stack_pos == 0) { + hf = getsparam("HISTFILE"); savehistfile(hf, 0, HFILE_USE_OPTIONS | HFILE_FAST); + } } /**/ diff --git a/Src/params.c b/Src/params.c index d71dfde..8683777 100644 --- a/Src/params.c +++ b/Src/params.c @@ -1204,7 +1204,7 @@ isident(char *s) /**/ static zlong getarg(char **str, int *inv, Value v, int a2, zlong *w, - int *prevcharlen, int *nextcharlen) + int *prevcharlen, int *nextcharlen, int flags) { int hasbeg = 0, word = 0, rev = 0, ind = 0, down = 0, l, i, ishash; int keymatch = 0, needtok = 0, arglen, len, inpar = 0; @@ -1407,6 +1407,8 @@ getarg(char **str, int *inv, Value v, int a2, zlong *w, if (ishash) { HashTable ht = v->pm->gsu.h->getfn(v->pm); if (!ht) { + if (flags & SCANPM_CHECKING) + return isset(KSHARRAYS) ? 1 : 0; ht = newparamtable(17, v->pm->node.nam); v->pm->gsu.h->setfn(v->pm, ht); } @@ -1848,7 +1850,8 @@ getindex(char **pptr, Value v, int flags) zlong we = 0, dummy; int startprevlen, startnextlen; - start = getarg(&s, &inv, v, 0, &we, &startprevlen, &startnextlen); + start = getarg(&s, &inv, v, 0, &we, &startprevlen, &startnextlen, + flags); if (inv) { if (!v->isarr && start != 0) { @@ -1922,7 +1925,7 @@ getindex(char **pptr, Value v, int flags) if ((com = (*s == ','))) { s++; - end = getarg(&s, &inv, v, 1, &dummy, NULL, NULL); + end = getarg(&s, &inv, v, 1, &dummy, NULL, NULL, flags); } else { end = we ? we : start; } diff --git a/Src/subst.c b/Src/subst.c index 5df2a8b..d1827c2 100644 --- a/Src/subst.c +++ b/Src/subst.c @@ -2389,7 +2389,13 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags, */ if (!subexp || aspar) { char *ov = val; - + int scanflags = hkeys | hvals; + if (arrasg) + scanflags |= SCANPM_ASSIGNING; + if (qt) + scanflags |= SCANPM_DQUOTED; + if (chkset) + scanflags |= SCANPM_CHECKING; /* * Second argument: decide whether to use the subexpression or * the string next on the line as the parameter name. @@ -2418,9 +2424,7 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags, if (!(v = fetchvalue(&vbuf, (subexp ? &ov : &s), (wantt ? -1 : ((unset(KSHARRAYS) || inbrace) ? 1 : -1)), - hkeys|hvals| - (arrasg ? SCANPM_ASSIGNING : 0)| - (qt ? SCANPM_DQUOTED : 0))) || + scanflags)) || (v->pm && (v->pm->node.flags & PM_UNSET)) || (v->flags & VALFLAG_EMPTY)) vunset = 1; diff --git a/Src/zsh.h b/Src/zsh.h index 27642f2..76412b8 100644 --- a/Src/zsh.h +++ b/Src/zsh.h @@ -1906,6 +1906,7 @@ struct tieddata { * necessarily want to match multiple * elements */ +#define SCANPM_CHECKING (1<<10) /* Check if set, no need to create */ /* "$foo[@]"-style substitution * Only sign bit is significant */ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Memory leak when working with undefined associative array keys & problems with unset 2017-09-23 17:59 ` Peter Stephenson @ 2017-09-24 1:48 ` Bart Schaefer 2017-09-24 11:37 ` Daniel Shahaf 0 siblings, 1 reply; 6+ messages in thread From: Bart Schaefer @ 2017-09-24 1:48 UTC (permalink / raw) To: zsh-workers@zsh.org On Sep 23, 6:59pm, Peter Stephenson wrote: } } Can we really not do something like the patch below? The point of my later "naughty bit" remark was that we'd have to do something exactly like that patch, so, yeah. } > } $ unset "a[$key]" } > } unset: a[hello * [ world]: invalid parameter name } > } > Hmm, when examining ${a[$key]} we enter parse_subscript() with $key } > tokenized, but there's no way to get the tokenized string to reach } > the same point in the unset builtin } } I gave up on this ages ago, frankly. We certainly a need way of raw key } reference, but I don't think we've got one, and each time we tweak it we } seem to make it even more tortuous. The basic problem is that a[$key]=foo is parsed in assignment context, but unset a[$key] is parsed in ordinary word context. I don't see any way around this other than either making unset a keyword, or adding a command-line option to unset as Daniel suggested. Neither of these alternatives makes me very happy. As noted, ksh93 has (had?) also thrown up its hands on this one. } Possibly we need yet another flag } to say "FOR ****'S SAKE DON'T TRY TO BE CLEVER NO I MEAN IT ****" (but } without the Tarantino script edit). I can't think of anywhere to attach such a flag. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Memory leak when working with undefined associative array keys & problems with unset 2017-09-24 1:48 ` Bart Schaefer @ 2017-09-24 11:37 ` Daniel Shahaf 0 siblings, 0 replies; 6+ messages in thread From: Daniel Shahaf @ 2017-09-24 11:37 UTC (permalink / raw) To: zsh-workers Bart Schaefer wrote on Sat, 23 Sep 2017 18:48 -0700: > On Sep 23, 6:59pm, Peter Stephenson wrote: > } > } $ unset "a[$key]" > } > } unset: a[hello * [ world]: invalid parameter name > } > > } > Hmm, when examining ${a[$key]} we enter parse_subscript() with $key > } > tokenized, but there's no way to get the tokenized string to reach > } > the same point in the unset builtin > } > } I gave up on this ages ago, frankly. We certainly a need way of raw key > } reference, but I don't think we've got one, and each time we tweak it we > } seem to make it even more tortuous. > > The basic problem is that a[$key]=foo is parsed in assignment context, > but unset a[$key] is parsed in ordinary word context. I don't see any > way around this other than either making unset a keyword, or adding a > command-line option to unset as Daniel suggested. Neither of these > alternatives makes me very happy. We could also invent a separate builtin, "zunset". That way we avoid stomping on any options POSIX's 'unset' might grow in the future. This way, scalars, which are standard, are unset by 'unset' which too is standard, and assocs, which are non-standard, are unset by 'zunset' which too is non-standard. > As noted, ksh93 has (had?) also thrown up its hands on this one. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-09-24 11:48 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-09-16 20:57 Memory leak when working with undefined associative array keys & problems with unset Anssi Palin 2017-09-17 23:15 ` Bart Schaefer [not found] ` <1505690609.1747008.1109155032.651AC839@webmail.messagingengine.com> 2017-09-18 0:22 ` Bart Schaefer 2017-09-23 17:59 ` Peter Stephenson 2017-09-24 1:48 ` Bart Schaefer 2017-09-24 11:37 ` Daniel Shahaf
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).