zsh-workers
 help / color / mirror / code / Atom feed
* Memory leak when working with undefined associative array keys & problems with unset
@ 2017-09-16 20:57 Anssi Palin
  2017-09-17 23:15 ` Bart Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Anssi Palin @ 2017-09-16 20:57 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1986 bytes --]

Hello,

I've run into two problems with the way Zsh handles associative array keys, both tested on Linux with Zsh 5.4.1 compiled from source and 5.2 from the Ubuntu repositories.

Zsh appears to permanently allocate some memory when just checking if a key is defined in an associative array. This behavior resulted in system memory getting exhausted on my machine with a script that checks random generated strings against a word lookup table. The following snippet replicates the bug and results in some 100 megabytes of memory being allocated:

typeset -A a
for (( i = 0; i < 1000000; i++ )); do
    (( ${+a[$i]} ))
done

Iterating over the same set of undefined keys a second time does not seem to cause more memory to be allocated. Moreover, unsetting or emptying the array doesn't appear to free all of the memory even though the array is destroyed or, in the latter case, emptied as expected. On 5.4.1 unsetting or emptying a second time looks to finally be freeing all of the taken up memory. Unsetting individual keys afterwards in a loop similar to the example above has the same problem, but unsetting each key immediately after checking it seems to mitigate this.


The second issue I have pertains to special characters in associative array keys when unsetting them individually:

$ key='hello * [ world'
$ typeset -A a=("$key" val)
$ unset "a[$key]"
unset: a[hello * [ world]: invalid parameter name

Since characters such as '\', '[', ']', '(' and ')' must be escaped but others like space or '*' shouldn't be, using the q or b parameter expansion flags is out of the question:

$ unset "a[${(q)key}]"
unset: a[hello\ \*\ \[\ world]: invalid parameter name
$ typeset -p a
typeset -A a=( 'hello * [ world' val )

The latter unset error message only shows up on 5.2 but in both cases the key remains set. Similar problems with special characters seem to affect [['s -v flag when checking for existence of associative array keys.

Thank you.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Memory leak when working with undefined associative array keys & problems with unset
  2017-09-16 20:57 Memory leak when working with undefined associative array keys & problems with unset Anssi Palin
@ 2017-09-17 23:15 ` Bart Schaefer
       [not found]   ` <1505690609.1747008.1109155032.651AC839@webmail.messagingengine.com>
  2017-09-23 17:59   ` Peter Stephenson
  0 siblings, 2 replies; 6+ messages in thread
From: Bart Schaefer @ 2017-09-17 23:15 UTC (permalink / raw)
  To: Anssi Palin, zsh-workers@zsh.org 

On Sep 16,  8:57pm, Anssi Palin wrote:
}
} Zsh appears to permanently allocate some memory when just checking
} if a key is defined in an associative array.

Hmm.

This has to do with order of operations.  In order to test whether a[0]
is (not) set, the calling code needs a "struct pm" object for which it
can attempt to retrieve a value and/or check the PM_UNSET flag.  For a
hash-typed parameter, there's nowhere to temporarily store such an
object except in the hashtable itself.  The same level of evaluation
that tests ${+...} also handles ${...::=...} which also needs that
struct to store into, so we can't NOT allocate it, and because all
parameter ops are by value rather than by reference we can't wait until
after we've assigned to it to add it to the hash (the knowledge that
"a[0]" is part of the hash "a" or even that the key is "0", is lost
by the time we're ready to do the assignment -- we have only a handle
to the object that resides at $a[0]).

The naughty bit is approximately here:

#0  getparamnode (ht=0x92c3898, nam=0xb7cb7818 "0")
    at ../../zsh-5.0/Src/params.c:496
#1  0x080baa56 in createparam (name=0xb7cb7818 "0", flags=570425344)
    at ../../zsh-5.0/Src/params.c:963
#2  0x080bb869 in getarg (str=0xbfee7eb8, inv=0xbfee7ebc, v=0xbfee8130, a2=0, 
    w=0xbfee7ea8, prevcharlen=0xbfee7e9c, nextcharlen=0xbfee7e98)
    at ../../zsh-5.0/Src/params.c:1417
#3  0x080bcf10 in getindex (pptr=0xbfee7f14, v=0xbfee8130, flags=0)
    at ../../zsh-5.0/Src/params.c:1851
#4  0x080bd548 in fetchvalue (v=0xbfee8130, pptr=0xbfee8194, bracks=1, flags=0)
    at ../../zsh-5.0/Src/params.c:2069
#5  0x080e0193 in paramsubst (l=0xb7cb77a8, n=0xb7cb77c0, str=0xbfee8208, 
    qt=0, pf_flags=0, ret_flags=0xbfee836c) at ../../zsh-5.0/Src/subst.c:2418

I think paramsubst() "knows" that it only wants to check set/unset, but
that can't be passed down through fetchvalue() so getarg() doesn't know
that it shouldn't create the hashtable element.

} The second issue I have pertains to special characters in associative
} array keys when unsetting them individually:
}
} $ key='hello * [ world'
} $ typeset -A a=("$key" val)
} $ unset "a[$key]"
} unset: a[hello * [ world]: invalid parameter name

Hmm, when examining ${a[$key]} we enter parse_subscript() with $key
tokenized, but there's no way to get the tokenized string to reach
the same point in the unset builtin (it's either already expanded,
or not tokenized).  Also ${a[$key]} passes sub=0 to parse_subscript()
whereas "unset" always passes sub=1, but I'm uncertain how that may
be relevant.

This may argue for making unset a keyword ala typeset, but perhaps
someone else has a clever approach.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Memory leak when working with undefined associative array keys & problems with unset
       [not found]   ` <1505690609.1747008.1109155032.651AC839@webmail.messagingengine.com>
@ 2017-09-18  0:22     ` Bart Schaefer
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Schaefer @ 2017-09-18  0:22 UTC (permalink / raw)
  To: zsh-workers

[Replying to an off-list message w/Daniel's permission]

On Sep 17, 11:23pm, Daniel Shahaf wrote:
} 
} Bart Schaefer wrote on Sun, 17 Sep 2017 16:15 -0700:
} > On Sep 16,  8:57pm, Anssi Palin wrote:
} > } $ key='hello * [ world'
} > } $ typeset -A a=("$key" val)
} > } $ unset "a[$key]"
} > } unset: a[hello * [ world]: invalid parameter name
} > 
} > Hmm, when examining ${a[$key]} we enter parse_subscript() with $key
} > tokenized, but there's no way to get the tokenized string to reach
} > the same point in the unset builtin

Incidentally ksh93 has this same problem, at least as of 2012:

$ echo $KSH_VERSION
Version AJM 93u+ 2012-08-01
$ unset a["$key"]
ksh: unset: a[hello * [ world]: invalid variable name
$ unset "a[$key]"
ksh: unset: a[hello * [ world]: invalid variable name
$ unset "a['$key']"
ksh: unset: a['hello * [ world']: invalid variable name

This is probably why we've ignored this issue, to date.

} Couldn't we just change the interface and keep it a builtin?
} The point being to pass the subscript separately.

I don't know whether e.g. POSIX would squawk about more options to unset.

There's also this, now that we've added the [key]=val syntax:

torch% a=( ['hello * [ world']=(x y z) )
torch% typeset -p a                     
typeset -A a=( ['hello * [ world']='(x y z)' )
torch% a=( ['hello * [ world']=() )     
zsh: parse error near `()'

Since the empty parens there are currently a parse error, we could make
that work the way it does for plain arrays and unset the key.  However,
in ksh note there are no quotes around the parenthesized value (is that
because ksh allows hash elements to be arrays?) and the empty parens
version is permitted:

$ a=(['hello * [ world']=(x y z))
$ typeset -p a
typeset -A a=(['hello * [ world']=(x y z) )
$ a=(['hello * [ world']=() )    
$ typeset -p a
typeset -A a=(['hello * [ world']=())

So we may prefer to go with [increased] ksh compatibility here.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Memory leak when working with undefined associative array keys & problems with unset
  2017-09-17 23:15 ` Bart Schaefer
       [not found]   ` <1505690609.1747008.1109155032.651AC839@webmail.messagingengine.com>
@ 2017-09-23 17:59   ` Peter Stephenson
  2017-09-24  1:48     ` Bart Schaefer
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Stephenson @ 2017-09-23 17:59 UTC (permalink / raw)
  To: zsh-workers@zsh.org 

On Sun, 17 Sep 2017 16:15:14 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sep 16,  8:57pm, Anssi Palin wrote:
> }
> } Zsh appears to permanently allocate some memory when just checking
> } if a key is defined in an associative array.
> 
> Hmm.
> 
> This has to do with order of operations.  In order to test whether a[0]
> is (not) set, the calling code needs a "struct pm" object for which it
> can attempt to retrieve a value and/or check the PM_UNSET flag.  For a
> hash-typed parameter, there's nowhere to temporarily store such an
> object except in the hashtable itself.  The same level of evaluation
> that tests ${+...} also handles ${...::=...} which also needs that
> struct to store into, so we can't NOT allocate it, and because all
> parameter ops are by value rather than by reference we can't wait until
> after we've assigned to it to add it to the hash (the knowledge that
> "a[0]" is part of the hash "a" or even that the key is "0", is lost
> by the time we're ready to do the assignment -- we have only a handle
> to the object that resides at $a[0]).

Can we really not do something like the patch below?

There may well be other cases to check, as the logic is complicated,
but I think this is the obvious case.  If it breaks anything, then
something is missing from the parameter tests.

> } The second issue I have pertains to special characters in associative
> } array keys when unsetting them individually:
> }
> } $ key='hello * [ world'
> } $ typeset -A a=("$key" val)
> } $ unset "a[$key]"
> } unset: a[hello * [ world]: invalid parameter name
> 
> Hmm, when examining ${a[$key]} we enter parse_subscript() with $key
> tokenized, but there's no way to get the tokenized string to reach
> the same point in the unset builtin (it's either already expanded,
> or not tokenized).  Also ${a[$key]} passes sub=0 to parse_subscript()
> whereas "unset" always passes sub=1, but I'm uncertain how that may
> be relevant.

I gave up on this ages ago, frankly.  We certainly a need way of raw key
reference, but I don't think we've got one, and each time we tweak it we
seem to make it even more tortuous.  Possibly we need yet another flag
to say "FOR ****'S SAKE DON'T TRY TO BE CLEVER NO I MEAN IT ****" (but
without the Tarantino script edit).

First hunk is completely gratuitous but I noticed it when debugging.

pws


diff --git a/Src/hist.c b/Src/hist.c
index da5a8b2..177250f 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -1083,7 +1083,6 @@ hbegin(int dohist)
     } else
 	histactive = HA_ACTIVE | HA_NOINC;
 
-    hf = getsparam("HISTFILE");
     /*
      * For INCAPPENDHISTORYTIME, when interactive, save the history here
      * as it gives a better estimate of the times of commands.
@@ -1104,8 +1103,10 @@ hbegin(int dohist)
      */
     if (isset(INCAPPENDHISTORYTIME) && !isset(SHAREHISTORY) &&
 	!isset(INCAPPENDHISTORY) &&
-	!(histactive & HA_NOINC) && !strin && histsave_stack_pos == 0)
+	!(histactive & HA_NOINC) && !strin && histsave_stack_pos == 0) {
+	hf = getsparam("HISTFILE");
 	savehistfile(hf, 0, HFILE_USE_OPTIONS | HFILE_FAST);
+    }
 }
 
 /**/
diff --git a/Src/params.c b/Src/params.c
index d71dfde..8683777 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -1204,7 +1204,7 @@ isident(char *s)
 /**/
 static zlong
 getarg(char **str, int *inv, Value v, int a2, zlong *w,
-       int *prevcharlen, int *nextcharlen)
+       int *prevcharlen, int *nextcharlen, int flags)
 {
     int hasbeg = 0, word = 0, rev = 0, ind = 0, down = 0, l, i, ishash;
     int keymatch = 0, needtok = 0, arglen, len, inpar = 0;
@@ -1407,6 +1407,8 @@ getarg(char **str, int *inv, Value v, int a2, zlong *w,
 	if (ishash) {
 	    HashTable ht = v->pm->gsu.h->getfn(v->pm);
 	    if (!ht) {
+		if (flags & SCANPM_CHECKING)
+		    return isset(KSHARRAYS) ? 1 : 0;
 		ht = newparamtable(17, v->pm->node.nam);
 		v->pm->gsu.h->setfn(v->pm, ht);
 	    }
@@ -1848,7 +1850,8 @@ getindex(char **pptr, Value v, int flags)
 	zlong we = 0, dummy;
 	int startprevlen, startnextlen;
 
-	start = getarg(&s, &inv, v, 0, &we, &startprevlen, &startnextlen);
+	start = getarg(&s, &inv, v, 0, &we, &startprevlen, &startnextlen,
+		       flags);
 
 	if (inv) {
 	    if (!v->isarr && start != 0) {
@@ -1922,7 +1925,7 @@ getindex(char **pptr, Value v, int flags)
 
 	    if ((com = (*s == ','))) {
 		s++;
-		end = getarg(&s, &inv, v, 1, &dummy, NULL, NULL);
+		end = getarg(&s, &inv, v, 1, &dummy, NULL, NULL, flags);
 	    } else {
 		end = we ? we : start;
 	    }
diff --git a/Src/subst.c b/Src/subst.c
index 5df2a8b..d1827c2 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -2389,7 +2389,13 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
      */
     if (!subexp || aspar) {
 	char *ov = val;
-
+	int scanflags = hkeys | hvals;
+	if (arrasg)
+	    scanflags |= SCANPM_ASSIGNING;
+	if (qt)
+	    scanflags |= SCANPM_DQUOTED;
+	if (chkset)
+	    scanflags |= SCANPM_CHECKING;
 	/*
 	 * Second argument: decide whether to use the subexpression or
 	 *   the string next on the line as the parameter name.
@@ -2418,9 +2424,7 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 	if (!(v = fetchvalue(&vbuf, (subexp ? &ov : &s),
 			     (wantt ? -1 :
 			      ((unset(KSHARRAYS) || inbrace) ? 1 : -1)),
-			     hkeys|hvals|
-			     (arrasg ? SCANPM_ASSIGNING : 0)|
-			     (qt ? SCANPM_DQUOTED : 0))) ||
+			     scanflags)) ||
 	    (v->pm && (v->pm->node.flags & PM_UNSET)) ||
 	    (v->flags & VALFLAG_EMPTY))
 	    vunset = 1;
diff --git a/Src/zsh.h b/Src/zsh.h
index 27642f2..76412b8 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -1906,6 +1906,7 @@ struct tieddata {
 				  * necessarily want to match multiple
 				  * elements
 				  */
+#define SCANPM_CHECKING   (1<<10) /* Check if set, no need to create */
 /* "$foo[@]"-style substitution
  * Only sign bit is significant
  */


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Memory leak when working with undefined associative array keys & problems with unset
  2017-09-23 17:59   ` Peter Stephenson
@ 2017-09-24  1:48     ` Bart Schaefer
  2017-09-24 11:37       ` Daniel Shahaf
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Schaefer @ 2017-09-24  1:48 UTC (permalink / raw)
  To: zsh-workers@zsh.org 

On Sep 23,  6:59pm, Peter Stephenson wrote:
}
} Can we really not do something like the patch below?

The point of my later "naughty bit" remark was that we'd have to do
something exactly like that patch, so, yeah.

} > } $ unset "a[$key]"
} > } unset: a[hello * [ world]: invalid parameter name
} > 
} > Hmm, when examining ${a[$key]} we enter parse_subscript() with $key
} > tokenized, but there's no way to get the tokenized string to reach
} > the same point in the unset builtin
} 
} I gave up on this ages ago, frankly.  We certainly a need way of raw key
} reference, but I don't think we've got one, and each time we tweak it we
} seem to make it even more tortuous.

The basic problem is that a[$key]=foo is parsed in assignment context,
but unset a[$key] is parsed in ordinary word context.  I don't see any
way around this other than either making unset a keyword, or adding a
command-line option to unset as Daniel suggested.  Neither of these
alternatives makes me very happy.

As noted, ksh93 has (had?) also thrown up its hands on this one.

} Possibly we need yet another flag
} to say "FOR ****'S SAKE DON'T TRY TO BE CLEVER NO I MEAN IT ****" (but
} without the Tarantino script edit).

I can't think of anywhere to attach such a flag.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Memory leak when working with undefined associative array keys & problems with unset
  2017-09-24  1:48     ` Bart Schaefer
@ 2017-09-24 11:37       ` Daniel Shahaf
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Shahaf @ 2017-09-24 11:37 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote on Sat, 23 Sep 2017 18:48 -0700:
> On Sep 23,  6:59pm, Peter Stephenson wrote:
> } > } $ unset "a[$key]"
> } > } unset: a[hello * [ world]: invalid parameter name
> } > 
> } > Hmm, when examining ${a[$key]} we enter parse_subscript() with $key
> } > tokenized, but there's no way to get the tokenized string to reach
> } > the same point in the unset builtin
> } 
> } I gave up on this ages ago, frankly.  We certainly a need way of raw key
> } reference, but I don't think we've got one, and each time we tweak it we
> } seem to make it even more tortuous.
> 
> The basic problem is that a[$key]=foo is parsed in assignment context,
> but unset a[$key] is parsed in ordinary word context.  I don't see any
> way around this other than either making unset a keyword, or adding a
> command-line option to unset as Daniel suggested.  Neither of these
> alternatives makes me very happy.

We could also invent a separate builtin, "zunset".  That way we avoid stomping
on any options POSIX's 'unset' might grow in the future.

This way, scalars, which are standard, are unset by 'unset' which too is standard,
and assocs, which are non-standard, are unset by 'zunset' which too is non-standard.

> As noted, ksh93 has (had?) also thrown up its hands on this one.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-09-24 11:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-16 20:57 Memory leak when working with undefined associative array keys & problems with unset Anssi Palin
2017-09-17 23:15 ` Bart Schaefer
     [not found]   ` <1505690609.1747008.1109155032.651AC839@webmail.messagingengine.com>
2017-09-18  0:22     ` Bart Schaefer
2017-09-23 17:59   ` Peter Stephenson
2017-09-24  1:48     ` Bart Schaefer
2017-09-24 11:37       ` Daniel Shahaf

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).