zsh-workers
 help / color / mirror / code / Atom feed
* [PATCH 1/3]: Add named references
@ 2023-02-06  2:21 Bart Schaefer
  2023-02-08  3:45 ` Oliver Kiddle
  0 siblings, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2023-02-06  2:21 UTC (permalink / raw)
  To: Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 2888 bytes --]

This has been my weekend/evening project the past 10 days or so.  I
compared to ksh93 (from the Ubuntu software store) wherever there
weren't conflicts with zsh-native features, even managing to crash
ksh93 several times in the process.

Attached is the patch with the C code changes.  It'd be nice, Roman,
if you could run your benchmarks on this to see if I've slowed down
any other shell operations.  The next two patches will cover tests and
documentation.

When a reference is created, it's necessary to determine the scope in
which to find the referenced parameter.  Since a reference can name an
array element, this involves separating the base parameter name from
the subscript.  I've overloaded the "base" and "width" fields in
struct param to save both the scope (locallevel) and the offset of the
subscript, to avoid having to search for the subscript again when the
reference is expanded.  There is a little redundant scanning to
validate subscripts when the reference is created; this follows ksh93
(the alternative would be to fail at the time of expansion).

One difference from ksh93 is that you can't convert scalars into
namerefs, e.g., both with this and in ksh93:
  typeset -n foo=bar
  typeset +n nameref
leaves behind a scalar $foo with value bar. but only in ksh93:
  foo=bar
  typeset -n foo
  foo=thing
assigns bar=thing.  In this implementation typeset -n first unsets
foo, so instead of foo
"pointing at" bar, it ends up "pointing at" thing.  This eliminates a
bunch of special cases for exported and tied parameters, etc., and it
seems easy enough to write:
  typeset +n foo
  typeset -n foo=$foo
if that's what you wanted.  If I have a sudden inspiration I'll do a
follow-up patch.

Named references can't have any other attributes, and attempting to
add an attribute to a named reference generates a warning.  However,
existing prohibited attribute changes (such as trying to turn a
special scalar into an array) are fatal errors, so I've left comments
wondering whether that would instead be appropriate here.

With respect to zsh/param/private parameters, it's presently
prohibited to have a "public" reference to a private parameter.  It
might be possible to change this if someone can think of a use case.
There is one glitch: if the reference is created before the parameter
is declared private, and the private local is NOT hiding something in
an outer scope, then the reference does not retroactively generate an
error and assignments to the reference are silent no-ops.

I moved around a little bit of not-directly-related signal handling in
params.c so that we're not calling zfree() outside of queue_signals()
blocks.  That makes the order of unqueue_signals() and zfree()
consistent at least throughout assignsparam().

I also removed "kz" from TYPESET_OPTSTR in zsh.h -- as far as I can
tell they should never have been there in the first place.

[-- Attachment #2: nameref-1-code.txt --]
[-- Type: text/plain, Size: 17101 bytes --]

diff --git a/Src/Modules/param_private.c b/Src/Modules/param_private.c
index 065fa63d2..70f36ceb1 100644
--- a/Src/Modules/param_private.c
+++ b/Src/Modules/param_private.c
@@ -512,9 +512,16 @@ static GetNodeFunc getparamnode;
 static HashNode
 getprivatenode(HashTable ht, const char *nam)
 {
-    HashNode hn = getparamnode(ht, nam);
+    /* getparamnode() would follow namerefs, we must not do that here */
+    HashNode hn = gethashnode2(ht, nam);
     Param pm = (Param) hn;
 
+    /* autoload has precedence over nameref, so getparamnode() */
+    if (pm && (pm->node.flags & PM_AUTOLOAD)) {
+	hn = getparamnode(ht, nam);
+	pm = (Param) hn;
+	/* how would an autoloaded private behave?  return here? */
+    }
     while (!fakelevel && pm && locallevel > pm->level && is_private(pm)) {
 	if (!(pm->node.flags & PM_UNSET)) {
 	    /*
@@ -533,6 +540,12 @@ getprivatenode(HashTable ht, const char *nam)
 	}
 	pm = pm->old;
     }
+
+    /* resolve nameref after skipping private parameters */
+    if (pm && (pm->node.flags & PM_NAMEREF) &&
+	(pm->u.str || (pm->node.flags & PM_UNSET)))
+	pm = (Param) resolve_nameref(pm, NULL);
+
     return (HashNode)pm;
 }
 
@@ -571,7 +584,7 @@ printprivatenode(HashNode hn, int printflags)
 
 static struct builtin bintab[] = {
     /* Copied from BUILTIN("local"), "P" added */
-    BUILTIN("private", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_private, 0, -1, 0, "AE:%F:%HL:%PR:%TUZ:%ahi:%lmprtux", "P")
+    BUILTIN("private", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_private, 0, -1, 0, "AE:%F:%HL:%PR:%TUZ:%ahi:%lnmprtux", "P")
 };
 
 static struct features module_features = {
diff --git a/Src/Modules/parameter.c b/Src/Modules/parameter.c
index dbb61e474..5bf675e2a 100644
--- a/Src/Modules/parameter.c
+++ b/Src/Modules/parameter.c
@@ -49,13 +49,15 @@ paramtypestr(Param pm)
 	if (pm->node.flags & PM_AUTOLOAD)
 	    return dupstring("undefined");
 
-	switch (PM_TYPE(f)) {
+	/* For simplicity we treat PM_NAMEREF as PM_TYPE(PM_SCALAR) */
+	switch (PM_TYPE(f)|(f & PM_NAMEREF)) {
 	case PM_SCALAR:  val = "scalar"; break;
 	case PM_ARRAY:   val = "array"; break;
 	case PM_INTEGER: val = "integer"; break;
 	case PM_EFLOAT:
 	case PM_FFLOAT:  val = "float"; break;
 	case PM_HASHED:  val = "association"; break;
+	case PM_NAMEREF: val = "nameref"; break;
 	}
 	DPUTS(!val, "BUG: type not handled in parameter");
 	val = dupstring(val);
diff --git a/Src/builtin.c b/Src/builtin.c
index 4c295d11f..8039b644e 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -55,7 +55,7 @@ static struct builtin builtins[] =
     BUILTIN("cd", BINF_SKIPINVALID | BINF_SKIPDASH | BINF_DASHDASHVALID, bin_cd, 0, 2, BIN_CD, "qsPL", NULL),
     BUILTIN("chdir", BINF_SKIPINVALID | BINF_SKIPDASH | BINF_DASHDASHVALID, bin_cd, 0, 2, BIN_CD, "qsPL", NULL),
     BUILTIN("continue", BINF_PSPECIAL, bin_break, 0, 1, BIN_CONTINUE, NULL, NULL),
-    BUILTIN("declare", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%afghi:%klmp:%rtuxz", NULL),
+    BUILTIN("declare", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%afghi:%klmnp:%rtuxz", NULL),
     BUILTIN("dirs", 0, bin_dirs, 0, -1, 0, "clpv", NULL),
     BUILTIN("disable", 0, bin_enable, 0, -1, BIN_DISABLE, "afmprs", NULL),
     BUILTIN("disown", 0, bin_fg, 0, -1, BIN_DISOWN, NULL, NULL),
@@ -88,7 +88,7 @@ static struct builtin builtins[] =
     BUILTIN("jobs", 0, bin_fg, 0, -1, BIN_JOBS, "dlpZrs", NULL),
     BUILTIN("kill", BINF_HANDLES_OPTS, bin_kill, 0, -1, 0, NULL, NULL),
     BUILTIN("let", 0, bin_let, 1, -1, 0, NULL, NULL),
-    BUILTIN("local", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%ahi:%lp:%rtux", NULL),
+    BUILTIN("local", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%ahi:%lnp:%rtux", NULL),
     BUILTIN("logout", 0, bin_break, 0, 1, BIN_LOGOUT, NULL, NULL),
 
 #if defined(ZSH_MEM) & defined(ZSH_MEM_DEBUG)
@@ -121,7 +121,7 @@ static struct builtin builtins[] =
     BUILTIN("trap", BINF_PSPECIAL | BINF_HANDLES_OPTS, bin_trap, 0, -1, 0, NULL, NULL),
     BUILTIN("true", 0, bin_true, 0, -1, 0, NULL, NULL),
     BUILTIN("type", 0, bin_whence, 0, -1, 0, "ampfsSw", "v"),
-    BUILTIN("typeset", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%afghi:%klp:%rtuxmz", NULL),
+    BUILTIN("typeset", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%afghi:%klp:%rtuxmnz", NULL),
     BUILTIN("umask", 0, bin_umask, 0, 1, 0, "S", NULL),
     BUILTIN("unalias", 0, bin_unhash, 0, -1, BIN_UNALIAS, "ams", NULL),
     BUILTIN("unfunction", 0, bin_unhash, 1, -1, BIN_UNFUNCTION, "m", "f"),
@@ -2030,6 +2030,19 @@ typeset_single(char *cname, char *pname, Param pm, int func,
     int usepm, tc, keeplocal = 0, newspecial = NS_NONE, readonly, dont_set = 0;
     char *subscript;
 
+    if (pm && (pm->node.flags & PM_NAMEREF) && !((off|on) & PM_NAMEREF)) {
+	if (!(off & PM_NAMEREF))
+	    pm = (Param)resolve_nameref(pm, NULL);
+	if (pm && (pm->node.flags & PM_NAMEREF) &&
+	    (on & ~(PM_NAMEREF|PM_LOCAL))) {
+	    /* Changing type of PM_SPECIAL|PM_AUTOLOAD is a fatal error.  *
+	     * Should this be a fatal error as well, rather than warning? */
+	    zwarnnam(cname, "%s: can't change type of a named reference",
+		     pname);
+	    return NULL;
+	}
+    }
+
     /*
      * Do we use the existing pm?  Note that this isn't the end of the
      * story, because if we try and create a new pm at the same
@@ -2406,6 +2419,11 @@ typeset_single(char *cname, char *pname, Param pm, int func,
 		return NULL;
 	}
     } else if ((subscript = strchr(pname, '['))) {
+	if (on & PM_NAMEREF) {
+	    zerrnam(cname,
+		    "%s: reference variable cannot be an array", pname);
+	    return NULL;
+	}
 	if (on & PM_READONLY) {
 	    zerrnam(cname,
 		    "%s: can't create readonly array elements", pname);
@@ -2640,6 +2658,14 @@ bin_typeset(char *name, char **argv, LinkList assigns, Options ops, int func)
 	else if (OPT_PLUS(ops,optval))
 	    off |= bit;
     }
+    if (OPT_MINUS(ops,'n')) {
+	if (on|off) {
+	    zwarnnam(name, "no other attributes allowed with -n");
+	    return 1;
+	}
+	on |= PM_NAMEREF;
+    } else if (OPT_PLUS(ops,'n'))
+	off |= PM_NAMEREF;
     roff = off;
 
     /* Sanity checks on the options.  Remove conflicting options. */
@@ -3022,6 +3048,27 @@ bin_typeset(char *name, char **argv, LinkList assigns, Options ops, int func)
 	    }
 	    continue;
 	}
+
+	if (on & PM_NAMEREF) {
+	    if (asg->value.scalar &&
+		(strcmp(asg->name, asg->value.scalar) == 0 ||
+		 ((pm = (Param)resolve_nameref((Param)hn, asg)) &&
+		  (pm->node.flags & PM_NAMEREF)))) {
+		if (pm->node.flags & PM_SPECIAL)
+		    zwarnnam(name, "%s: invalid reference", pm->node.nam);
+		else
+		    zwarnnam(name, "%s: invalid self reference", asg->name);
+		returnval = 1;
+		continue;
+	    }
+	    if (hn) {
+		/* namerefs always start over fresh */
+		if (((Param)hn)->level >= locallevel)
+		    unsetparam_pm((Param)hn, 0, 1);
+		hn = NULL;
+	    }
+	}
+
 	if (!typeset_single(name, asg->name, (Param)hn,
 			    func, on, off, roff, asg, NULL,
 			    ops, 0))
@@ -3805,7 +3852,8 @@ bin_unset(char *name, char **argv, Options ops, int func)
 		returnval = 1;
 	    }
 	} else {
-	    if (unsetparam_pm(pm, 0, 1))
+	    if ((pm = (Param)resolve_nameref(pm, NULL)) &&
+		unsetparam_pm(pm, 0, 1))
 		returnval = 1;
 	}
 	if (ss)
diff --git a/Src/params.c b/Src/params.c
index 6362b382c..69b7f484f 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -536,6 +536,9 @@ getparamnode(HashTable ht, const char *nam)
 		 nam);
 	}
     }
+
+    if (hn && ht == realparamtab)
+	hn = resolve_nameref(pm, NULL);
     return hn;
 }
 
@@ -993,6 +996,34 @@ createparam(char *name, int flags)
 			 gethashnode2(paramtab, name) :
 			 paramtab->getnode(paramtab, name));
 
+	if (oldpm && (oldpm->node.flags & PM_NAMEREF) &&
+	    !(flags & PM_NAMEREF)) {
+	    Param lastpm;
+	    struct asgment stop;
+	    stop.flags = PM_NAMEREF | (flags & PM_LOCAL);
+	    stop.name = oldpm->node.nam;
+	    stop.value.scalar = oldpm->u.str;
+	    lastpm = (Param)resolve_nameref(oldpm, &stop);
+	    if (lastpm) {
+		if (lastpm->node.flags & PM_NAMEREF) {
+		    if (lastpm->u.str && *(lastpm->u.str)) {
+			name = lastpm->u.str;
+			oldpm = NULL;
+		    } else {
+			if (!(lastpm->node.flags & PM_READONLY))
+			    lastpm->node.flags |= PM_UNSET;
+			return lastpm;
+		    }
+		} else {
+		    /* nameref pointing to an unset local */
+		    DPUTS(!(lastpm->node.flags & PM_UNSET),
+			  "BUG: local parameter is not unset");
+		    oldpm = lastpm;
+		}
+	    } else
+		flags |= PM_NAMEREF;
+	}
+
 	DPUTS(oldpm && oldpm->level > locallevel,
 	      "BUG: old local parameter not deleted");
 	if (oldpm && (oldpm->level == locallevel || !(flags & PM_LOCAL))) {
@@ -2109,6 +2140,23 @@ fetchvalue(Value v, char **pptr, int bracks, int flags)
 	    memset(v, 0, sizeof(*v));
 	else
 	    v = (Value) hcalloc(sizeof *v);
+	if ((pm->node.flags & PM_NAMEREF) && pm->u.str && *(pm->u.str)) {
+	    /* only happens for namerefs pointing to array elements */
+	    char *ref = dupstring(pm->u.str);
+	    char *ss = pm->width ? ref + pm->width : NULL;
+	    if (ss) {
+		sav = *ss;
+		*ss = 0;
+	    }
+	    Param p1 = (Param)gethashnode2(paramtab, ref);
+	    if (!(p1 && (pm = upscope(p1, pm->base))) ||
+		((pm->node.flags & PM_UNSET) &&
+		!(pm->node.flags & PM_DECLARED)))
+		return NULL;
+	    if (ss)
+		*ss = sav;
+	    s = ss;
+	}
 	if (PM_TYPE(pm->node.flags) & (PM_ARRAY|PM_HASHED)) {
 	    /* Overload v->isarr as the flag bits for hashed arrays. */
 	    v->isarr = flags | (isvarat ? SCANPM_ISVAR_AT : 0);
@@ -2677,6 +2725,7 @@ assignstrvalue(Value v, char *val, int flags)
         }
 	break;
     }
+    setscope(v->pm);
     if ((!v->pm->env && !(v->pm->node.flags & PM_EXPORTED) &&
 	 !(isset(ALLEXPORT) && !(v->pm->node.flags & PM_HASHELEM))) ||
 	(v->pm->node.flags & PM_ARRAY) || v->pm->ename)
@@ -3084,11 +3133,20 @@ assignsparam(char *s, char *val, int flags)
 	}
     }
     if (!v && !(v = getvalue(&vbuf, &t, 1))) {
-	unqueue_signals();
 	zsfree(val);
+	unqueue_signals();
 	/* errflag |= ERRFLAG_ERROR; */
 	return NULL;
     }
+    if (*val && (v->pm->node.flags & PM_NAMEREF)) {
+	if (!valid_refname(val)) {
+	    zerr("invalid variable name: %s", val);
+	    zsfree(val);
+	    unqueue_signals();
+	    errflag |= ERRFLAG_ERROR;
+	    return NULL;
+	}
+    }
     if (flags & ASSPM_WARN)
 	check_warn_pm(v->pm, "scalar", created, 1);
     v->pm->node.flags &= ~PM_DEFAULTED;
@@ -3115,8 +3173,8 @@ assignsparam(char *s, char *val, int flags)
 			lhs.u.l = lhs.u.l + (zlong)rhs.u.d;
 		}
 		setnumvalue(v, lhs);
-    	    	unqueue_signals();
 		zsfree(val);
+    	    	unqueue_signals();
 		return v->pm; /* avoid later setstrvalue() call */
 	    case PM_ARRAY:
 	    	if (unset(KSHARRAYS)) {
@@ -3141,9 +3199,9 @@ assignsparam(char *s, char *val, int flags)
 	    case PM_INTEGER:
 	    case PM_EFLOAT:
 	    case PM_FFLOAT:
+		zsfree(val);
 		unqueue_signals();
 		zerr("attempt to add to slice of a numeric variable");
-		zsfree(val);
 		return NULL;
 	    case PM_ARRAY:
 	      kshappend:
@@ -3602,7 +3660,8 @@ unsetparam(char *s)
     if ((pm = (Param) (paramtab == realparamtab ?
 		       /* getnode2() to avoid autoloading */
 		       paramtab->getnode2(paramtab, s) :
-		       paramtab->getnode(paramtab, s))))
+		       paramtab->getnode(paramtab, s))) &&
+	!(pm->node.flags & PM_NAMEREF))
 	unsetparam_pm(pm, 0, 1);
     unqueue_signals();
 }
@@ -5783,7 +5842,8 @@ static const struct paramtypes pmtypes[] = {
     { PM_TAGGED, "tagged", 't', 0},
     { PM_EXPORTED, "exported", 'x', 0},
     { PM_UNIQUE, "unique", 'U', 0},
-    { PM_TIED, "tied", 'T', 0}
+    { PM_TIED, "tied", 'T', 0},
+    { PM_NAMEREF, "namref", 'n', 0}
 };
 
 #define PMTYPES_SIZE ((int)(sizeof(pmtypes)/sizeof(struct paramtypes)))
@@ -6037,3 +6097,123 @@ printparamnode(HashNode hn, int printflags)
     else if (!(printflags & PRINT_KV_PAIR))
 	putchar('\n');
 }
+
+/**/
+mod_export HashNode
+resolve_nameref(Param pm, const Asgment stop)
+{
+    HashNode hn = (HashNode)pm;
+    const char *seek = stop ? stop->value.scalar : NULL;
+
+    if (pm && (pm->node.flags & PM_NAMEREF)) {
+	if (pm && (pm->node.flags & (PM_UNSET|PM_TAGGED))) {
+	    /* Semaphore with createparam() */
+	    pm->node.flags &= ~PM_UNSET;
+	    /* See V10private.ztst end is in scope but private:
+	    if (pm->node.flags & PM_SPECIAL)
+		return NULL;
+	    */
+	    return (HashNode) pm;
+	} else if (pm->u.str) {
+	    if ((pm->node.flags & PM_TAGGED) ||
+		(stop && strcmp(pm->u.str, stop->name) == 0)) {
+		/* zwarnnam(pm->u.str, "invalid self reference"); */
+		return stop ? (HashNode)pm : NULL;
+	    }
+	    if (*(pm->u.str))
+		seek = pm->u.str;
+	}
+    }
+    else if (pm && !(stop && (stop->flags & PM_NAMEREF)))
+	return (HashNode)pm;
+    if (seek) {
+	queue_signals();
+	/* pm->width is the offset of any subscript */
+	if (pm && (pm->node.flags & PM_NAMEREF) && pm->width) {
+	    if (stop) {
+		if (stop->flags & PM_NAMEREF)
+		    hn = (HashNode)pm;
+		else
+		    hn = NULL;
+	    } else {
+		/* this has to be the end of any chain */
+		hn = (HashNode)pm;	/* see fetchvalue() */
+	    }
+	} else if ((hn = gethashnode2(realparamtab, seek))) {
+	    if (pm) {
+		if (!(stop && (stop->flags & (PM_LOCAL))))
+		    hn = (HashNode)upscope((Param)hn,
+					   ((pm->node.flags & PM_NAMEREF) ?
+					    pm->base : ((Param)hn)->level));
+		/* user can't tag a nameref, safe for loop detection */
+		pm->node.flags |= PM_TAGGED;
+	    }
+	    if (hn) {
+		if (hn->flags & PM_AUTOLOAD)
+		    hn = getparamnode(realparamtab, seek);
+		if (!(hn->flags & PM_UNSET))
+		    hn = resolve_nameref((Param)hn, stop);
+	    }
+	    if (pm)
+		pm->node.flags &= ~PM_TAGGED;
+	} else if (stop && (stop->flags & PM_NAMEREF))
+	    hn = (HashNode)pm;
+	unqueue_signals();
+    }
+
+    return hn;
+}
+
+/**/
+static void
+setscope(Param pm)
+{
+    if (pm->node.flags & PM_NAMEREF) {
+	Param basepm;
+	char *t = pm->u.str ? itype_end(pm->u.str, IIDENT, 0) : NULL;
+
+	/* Temporarily change nameref to array parameter itself */
+	if (t && *t == '[')
+	    *t = 0;
+	else
+	    t = 0;
+	basepm = (Param)resolve_nameref(pm, NULL);
+	if (t) {
+	    pm->width = t - pm->u.str;
+	    *t = '[';
+	}
+	if (basepm)
+	    pm->base = ((basepm->node.flags & PM_NAMEREF) ?
+			basepm->base : basepm->level);
+    }
+}
+
+/**/
+mod_export Param
+upscope(Param pm, int reflevel)
+{
+    Param up = pm->old;
+    while (pm && up && up->level >= reflevel) {
+	pm = up;
+	if (up)
+	    up = up->old;
+    }
+    return pm;
+}
+
+/**/
+mod_export int
+valid_refname(char *val)
+{
+    char *t = itype_end(val, IIDENT, 0);
+
+    if (*t != 0) {
+	if (*t == '[') {
+	    tokenize(t = dupstring(t+1));
+	    t = parse_subscript(t, 0, ']');
+	} else {
+	    t = NULL;
+	}
+    }
+    return !!t;
+}
diff --git a/Src/subst.c b/Src/subst.c
index 4ad9fee1a..7ba84cba8 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -2573,13 +2573,14 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 			       !(v->pm->node.flags & PM_UNSET))) {
 		int f = v->pm->node.flags;
 
-		switch (PM_TYPE(f)) {
+		switch (PM_TYPE(f)|(f & PM_NAMEREF)) {
 		case PM_SCALAR:  val = "scalar"; break;
 		case PM_ARRAY:   val = "array"; break;
 		case PM_INTEGER: val = "integer"; break;
 		case PM_EFLOAT:
 		case PM_FFLOAT:  val = "float"; break;
 		case PM_HASHED:  val = "association"; break;
+		case PM_NAMEREF: val = "nameref"; break;
 		}
 		val = dupstring(val);
 		if (v->pm->level)
diff --git a/Src/zsh.h b/Src/zsh.h
index f82e76e4b..1e35bd33e 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -1852,8 +1852,9 @@ struct param {
 	GsuHash h;
     } gsu;
 
-    int base;			/* output base or floating point prec    */
-    int width;			/* field width                           */
+    int base;			/* output base or floating point prec or */
+ 				/* for namerefs, locallevel of reference */
+    int width;			/* field width or nameref subscript idx  */
     char *env;			/* location in environment, if exported  */
     char *ename;		/* name of corresponding environment var */
     Param old;			/* old struct for use with local         */
@@ -1932,9 +1933,10 @@ struct tieddata {
 				 */
 #define PM_HASHELEM     (1<<28) /* is a hash-element */
 #define PM_NAMEDDIR     (1<<29) /* has a corresponding nameddirtab entry    */
+#define PM_NAMEREF      (1<<30) /* pointer to a different parameter         */
 
 /* The option string corresponds to the first of the variables above */
-#define TYPESET_OPTSTR "aiEFALRZlurtxUhHTkz"
+#define TYPESET_OPTSTR "aiEFALRZlurtxUhHT"
 
 /* These typeset options take an optional numeric argument */
 #define TYPESET_OPTNUM "LRZiEF"

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-06  2:21 [PATCH 1/3]: Add named references Bart Schaefer
@ 2023-02-08  3:45 ` Oliver Kiddle
  2023-02-08  4:59   ` Bart Schaefer
  0 siblings, 1 reply; 48+ messages in thread
From: Oliver Kiddle @ 2023-02-08  3:45 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> When a reference is created, it's necessary to determine the scope in

I've run some tests and this looks good. The main issue I noticed was
with a reference to associative array elements.

  typeset -A assoc
  typeset -n ref=assoc[other]
  ref=val

This does work if assoc is assigned an initial value.

The error message from, e.g. `typeset -Z2 ref` is "can't change type
of a named reference". However, this is normally an attempt to change
the type of the target of a reference (-n is not specified). It works
fine for a scalar. So perhaps the error message should complain about
changing the type of [associative] array elements instead.

One ksh feature which this doesn't cover is the special behaviour in ksh
that `typeset -n ref=$1` will use the calling function scope to resolve
the target. This is necessary with the lexical (private) scoping of
ksh93 functions. The positional parameters are special-cased which I
find rather ugly. You could overload -u or -U for indicating Tcl-like
uplevel. The trouble with uplevel is that a called function can't know
how many levels up to specify if people are allowed to write wrappers.
I think a better solution is to take some syntax that isn't a valid
variable name. e.g typeset -n ref=\&1 Repeated use of an initial & would
indicate where another level of scope should be ascended. Wrapper
functions then don't need their own nameref and can just pass, e.g.
\&var as the variable parameter.

> One difference from ksh93 is that you can't convert scalars into
> namerefs, e.g., both with this and in ksh93:

That seems reasonable enough.

> Named references can't have any other attributes, and attempting to
> add an attribute to a named reference generates a warning.  However,

I can't see an actual conflict in permitting -r, making the reference
readonly not the target variable.

> With respect to zsh/param/private parameters, it's presently
> prohibited to have a "public" reference to a private parameter.  It
> might be possible to change this if someone can think of a use case.

To a user it would seem like a fairly arbitrary restriction.

Coming back to the uplevel question, for private to be more useful it'd
be really great to be able to have a private reference to a private
variable from the calling function scope. I'm not sure how easy that
even is with the current implementation of private.

> There is one glitch: if the reference is created before the parameter
> is declared private, and the private local is NOT hiding something in
> an outer scope, then the reference does not retroactively generate an
> error and assignments to the reference are silent no-ops.

Not sure I follow. Sounds like something that ought to be fixed.

> I also removed "kz" from TYPESET_OPTSTR in zsh.h -- as far as I can
> tell they should never have been there in the first place.

typeset -fu is like autoload. I think it was intentional.

Note that recent bash also has namerefs so should perhaps be compared.
They only resolve in local scope and don't include the local level with
the reference.

Oliver

PS. The basic approach is much the same as that which was rejected many
years back with consensus at that time being that a C pointer reference
should be used. I did make a start at the time using reference counting
but real life stuff got in the way and it was never completed. Zsh's
code does a lot of deleting and recreating parameter structures which is
problematic with that approach. I do still have an old patch that last
got rebased as recently as only 7 years ago if you're interested. Aside
from not working with array subscripts, I think I also concluded that it
wouldn't work for private variables because of their somewhat too clever
implementation.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-08  3:45 ` Oliver Kiddle
@ 2023-02-08  4:59   ` Bart Schaefer
  2023-02-08 23:16     ` Bart Schaefer
  2023-02-09  0:47     ` Oliver Kiddle
  0 siblings, 2 replies; 48+ messages in thread
From: Bart Schaefer @ 2023-02-08  4:59 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh hackers list

On Tue, Feb 7, 2023 at 7:45 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> with a reference to associative array elements.
>
>   typeset -A assoc
>   typeset -n ref=assoc[other]
>   ref=val
>
> This does work if assoc is assigned an initial value.

This should work after patch 4/3 that I just sent.

% typeset -n ptr=foo
% typeset -A assoc
% typeset -n ref=assoc[other]
% ref=val
% typeset -p assoc
typeset -A assoc=( [other]=val )

> The error message from, e.g. `typeset -Z2 ref` is "can't change type
> of a named reference". However, this is normally an attempt to change
> the type of the target of a reference (-n is not specified). It works
> fine for a scalar. So perhaps the error message should complain about
> changing the type of [associative] array elements instead.

Hmm, yes, that message should be changed if possible.

> One ksh feature which this doesn't cover is the special behaviour in ksh
> that `typeset -n ref=$1` will use the calling function scope to resolve
> the target.

This should also work after patch 4/3.

% typeset outside=OUTSIDE
% () {
typeset -n ref=$1
print $ref
} outside
OUTSIDE

Or do I misunderstand?

> The positional parameters are special-cased which I
> find rather ugly.

There's nothing explaining this in "man ksh93", can you point me
somewhere?  Not that your description makes this sound particular
attractive.

I think the positionals might actually be best handled by a new
special parameter.  Cogitation needed.

> You could overload -u or -U for indicating Tcl-like
> uplevel.

It'd be easier to overload "-i N" for this since it already populates
the "base" integer in Param.

> The trouble with uplevel is that a called function can't know
> how many levels up to specify if people are allowed to write wrappers.

if the current locallevel were visible (cf. new special) then something like
 typeset -i $((locallevel-1)) -n up=argv
could work.  I think the current implementation of argv breaks this, tho.

I'm still not entirely following what problem this solves.

> I think a better solution is to take some syntax that isn't a valid
> variable name. e.g typeset -n ref=\&1 Repeated use of an initial & would
> indicate where another level of scope should be ascended. Wrapper
> functions then don't need their own nameref and can just pass, e.g.
> \&var as the variable parameter.

You're going to have to write out an example, sorry.

> I can't see an actual conflict in permitting -r, making the reference
> readonly not the target variable.

I'm also thinking about that.  The only use-case would be to prevent
something from doing "typeset +n ref".

> > With respect to zsh/param/private parameters, it's presently
> > prohibited to have a "public" reference to a private parameter.  It
> > might be possible to change this if someone can think of a use case.
>
> To a user it would seem like a fairly arbitrary restriction.

What purpose does it serve?  How does it help to have two names for
the same thing, one of which is "hidden" and the other isn't?  The
only analogy I can think of would be having a writable name for a
read-only variable.

> Coming back to the uplevel question, for private to be more useful it'd
> be really great to be able to have a private reference to a private
> variable from the calling function scope. I'm not sure how easy that
> even is with the current implementation of private.

You presently can't make anything that's private visible down-scope.
If you mean you want a variable that is visible in exactly one nested
scope, I'm back to asking for the use case.

> > There is one glitch: if the reference is created before the parameter
> > is declared private, and the private local is NOT hiding something in
> > an outer scope, then the reference does not retroactively generate an
> > error and assignments to the reference are silent no-ops.
>
> Not sure I follow. Sounds like something that ought to be fixed.

In Test/V10private.ztst it's the test with
 F:See K01typeset.ztst up-reference part 5
 F:Here ptr1 finds private ptr2 by scope mismatch, assignment silently fails

> > I also removed "kz" from TYPESET_OPTSTR in zsh.h -- as far as I can
> > tell they should never have been there in the first place.
>
> typeset -fu is like autoload. I think it was intentional.

Except every letter in TYPESET_OPTSTR has to correspond to the PM_
flag at that same index position, and the PM_ flags matching the k and
z options are way out of sequence for that.

> Note that recent bash also has namerefs so should perhaps be compared.

The special behavior of namerefs in "for" loops and implementing
"unset -n" might be good; the latter isn't mentioned until the "unset"
builtin in the ksh93 doc so I missed it entirely.

> PS. The basic approach is much the same as that which was rejected many
> years back with consensus at that time being that a C pointer reference
> should be used.

And yet we implemented ${(P)...} which is actually a bit grottier.  I
did consider overloading pm->old but was leery of messing up locals so
decided to leave it as a future optimization.

> I did make a start at the time using reference counting

That was the other part that worried me ... everything just went
(relatively) smoothly if most of the code could continue pretending
namerefs were scalars.

> I think I also concluded that it
> wouldn't work for private variables because of their somewhat too clever
> implementation.

They could be a lot less clever if they could move into the main
shell.  Most of the cleverness was necessary for making them modular.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-08  4:59   ` Bart Schaefer
@ 2023-02-08 23:16     ` Bart Schaefer
  2023-02-09  0:47     ` Oliver Kiddle
  1 sibling, 0 replies; 48+ messages in thread
From: Bart Schaefer @ 2023-02-08 23:16 UTC (permalink / raw)
  To: Zsh hackers list

On Tue, Feb 7, 2023 at 8:59 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Tue, Feb 7, 2023 at 7:45 PM Oliver Kiddle <opk@zsh.org> wrote:
> >
> > The error message from, e.g. `typeset -Z2 ref` is "can't change type
> > of a named reference".
>
> Hmm, yes, that message should be changed if possible.

It'd be more effort than it's worth to distinguish the type of array
involved, but my next patch will change this to say "of array
element".

> if the current locallevel were visible (cf. new special) then something like
>  typeset -i $((locallevel-1)) -n up=argv
> could work.  I think the current implementation of argv breaks this, tho.

The up-level argv is a char ** scoped to doshfunc() so it'd be rather
icky to expose it as a shell parameter.

> The special behavior of namerefs in "for" loops and implementing
> "unset -n" might be good; the latter isn't mentioned until the "unset"
> builtin in the ksh93 doc so I missed it entirely.

Next patch will have both of these, too.  I note that execfor() has
quite a bit of room for optimization, but I haven't dived into that.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-08  4:59   ` Bart Schaefer
  2023-02-08 23:16     ` Bart Schaefer
@ 2023-02-09  0:47     ` Oliver Kiddle
  2023-02-09  2:01       ` Oliver Kiddle
  2023-02-09  4:49       ` Bart Schaefer
  1 sibling, 2 replies; 48+ messages in thread
From: Oliver Kiddle @ 2023-02-09  0:47 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> > One ksh feature which this doesn't cover is the special behaviour in ksh
> > that `typeset -n ref=$1` will use the calling function scope to resolve
> > the target.
>
> This should also work after patch 4/3.
>
> % typeset outside=OUTSIDE
> % () {
> typeset -n ref=$1
> print $ref
> } outside
> OUTSIDE
>
> Or do I misunderstand?

In ksh you could pass "ref" as a parameter and the nameref would provide
access to the outside variable despite the name clash.

> > The positional parameters are special-cased which I
> > find rather ugly.
>
> There's nothing explaining this in "man ksh93", can you point me
> somewhere?  Not that your description makes this sound particular
> attractive.

The manual does seem rather vague on this. I've long been aware of it so
the detail may have come to me via the old ast-ksh lists. But to
demonstrate:

% function f {
>   typeset out=99 var=$1
>   nameref ref1=$1
>   nameref ref2=$var
>   echo f: $ref1 $ref2
>   ref1=1
>   ref2=2
> }
% out=0
% f out
f: 0 99
% echo $out
1

So $1 as the target of a nameref is special. I don't especially like
this. Everywhere else in the shell, a variable reference like $1 is no
different from using the text it would expand to explicitly. I don't
think we need to worry about the level of ksh93 compatibility here
because our default dynamic scoping mean a ksh script will work if
variable names don't clash.

> > You could overload -u or -U for indicating Tcl-like
> > uplevel.
>
> It'd be easier to overload "-i N" for this since it already populates
> the "base" integer in Param.

Not with the right number, though. Tcl's uplevel gives a distance (up the
procedure calling stack) to move. 1 being the default. You can prefix
with # to get an absolute level but that doesn't seem especially useful.

So 1 would mean $((locallevel-1)) in absolute terms.
Note that Tcl's uplevel is like a precommand modifier in zsh terms so
is not limited to defining references.

However I still prefer the other solution I mentioned with \&.

> I'm still not entirely following what problem this solves.

It solves the problem of name clashes between a function and the calling
function.

Consider Completion/Base/Utility/_call_function,
It expects the name of a parameter in $1.
It does:
  local _name _ret
  _name="$1"
  ...
  _ret=...
  eval "${_name}=${_ret}"
The underscores are there largely to avoid name clashes. In ksh, this could
instead be:
  nameref ret="$1"
  ret=...
And it wouldn't matter if ret is also the variable name in the calling
function.

> > I think a better solution is to take some syntax that isn't a valid
> > variable name. e.g typeset -n ref=\&1 Repeated use of an initial & would
> > indicate where another level of scope should be ascended. Wrapper
> > functions then don't need their own nameref and can just pass, e.g.
> > \&var as the variable parameter.
>
> You're going to have to write out an example, sorry.

With this idea:
  typeset -n ret=\&1
would be like ksh's:
  typeset -n ret=$1
But you could also do
  typeset -n ret=\&var

The code would strip the initial \&, go up one local level and see what
$var is in that scope. If the result also starts with an &, it would
repeat the process going up a further level. So a wrapper function to,
e.g. _call_function could pass "&$1" directly as the first parameter
instead of needing a local nameref of it's own. You could optionally
allow "&&var" to go up two levels. It doesn't need to be & if you think
a different character is better, it just needs to not be a valid
variable name. Does that make any more sense than the previous
explanation?

> I'm also thinking about that.  The only use-case would be to prevent
> something from doing "typeset +n ref".

Also prevents typeset -n ref=othervar
It is just as useful or useless as -r with other variables.

> > > With respect to zsh/param/private parameters, it's presently
> > > prohibited to have a "public" reference to a private parameter.  It
> > > might be possible to change this if someone can think of a use case.
> >
> > To a user it would seem like a fairly arbitrary restriction.
>
> What purpose does it serve?  How does it help to have two names for
> the same thing, one of which is "hidden" and the other isn't?  The
> only analogy I can think of would be having a writable name for a
> read-only variable.

If two functions are developed independently, they shouldn't need
to worry about the possibility of using the same variable name for
different purposes. This is the main benefit of private in general.
An old function like _call_function (updated to use nameref but still
using local) could be used by someone writing a new function where they
pass a private variable as the first parameter. If they are treating
_call_function as a black box, how are they to know that private
variables are not valid for the passed $1. It'd be an arbitrary
limitation.

> The special behavior of namerefs in "for" loops and implementing
> "unset -n" might be good; the latter isn't mentioned until the "unset"
> builtin in the ksh93 doc so I missed it entirely.

Given that you implemented references to array members, being able to
iterate over array elements with for could be nice. Perhaps something
like: for ref in 'arr[*]'; do
maybe [@] to include empty elements.

On the subject references to array elements, it does seem both powerful and
dangerous that subscripts are evaluated on reference. The subscript
should perhaps be evaluated with the same local level as the array
itself. And it could be wise to limit what can be done as part of the
subscript evaluation to avoid a CVE similar to the last one.

> > I think I also concluded that it
> > wouldn't work for private variables because of their somewhat too clever
> > implementation.
>
> They could be a lot less clever if they could move into the main
> shell.  Most of the cleverness was necessary for making them modular.

I would favour moving them to the main shell. And then making extensive
use of them in completon etc. There are things that could move in the
opposite direction if we want to reduce bloat. Mail checking for
example.

Oliver


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09  0:47     ` Oliver Kiddle
@ 2023-02-09  2:01       ` Oliver Kiddle
  2023-02-09  5:45         ` Bart Schaefer
  2023-02-09  4:49       ` Bart Schaefer
  1 sibling, 1 reply; 48+ messages in thread
From: Oliver Kiddle @ 2023-02-09  2:01 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh workers

This isn't a self-reference because the local levels would be different.
And "self reference" in the error message should be hyphenated.
  typeset -n ref=var
  foo() { typeset -n var=ref }
  foo
  foo:typeset:1: var: invalid self reference
With -g it would be, however.

I wrote:
> On the subject references to array elements, it does seem both powerful and
> dangerous that subscripts are evaluated on reference. The subscript

This is fairly sane:
  typeset -n ref1=arr[ref2] 
  typeset -n ref2=ref1     
  echo $ref1
  zsh: math recursion limit exceeded: ref2

This seems useful:
  arr=()
  typeset -n ref=arr[1,0]
  ref=4
  ref=5

Similar:
  idx=1
  typeset -n ref=arr[idx++]
  echo $ref $ref

This is where I worry about security:
  arr=( 1 2 3 4)
  typeset -n ref='arr[$(echo 4)]'
  echo $ref
  typeset -p ref  # it was only evaluated late thanks to quotes
Maybe the code should have a safer mode for subscript evaluation. This
could be useful but it is asking for trouble.

Also can see this being useful:
  str='one two three'
  typeset -n ref=str[(w)2]
  ref=zwei

We already talked about this error message. But now it is a substring.
  export ref
  export: ref: can't change type of a named reference

Oliver


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09  0:47     ` Oliver Kiddle
  2023-02-09  2:01       ` Oliver Kiddle
@ 2023-02-09  4:49       ` Bart Schaefer
  2023-02-09 20:49         ` Oliver Kiddle
  1 sibling, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2023-02-09  4:49 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh hackers list

On Wed, Feb 8, 2023 at 4:47 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> Bart Schaefer wrote:
> > > One ksh feature which this doesn't cover is the special behaviour in ksh
> > > that `typeset -n ref=$1` will use the calling function scope to resolve
> > > the target.
> >
>
> In ksh you could pass "ref" as a parameter and the nameref would provide
> access to the outside variable despite the name clash.

I think that will start working if I can fix the "invalid self
reference" mentioned in your next message.

> So [in ksh] $1 as the target of a nameref is special. [...] I don't
> think we need to worry about the level of ksh93 compatibility here
> because our default dynamic scoping mean a ksh script will work if
> variable names don't clash.

My general feeling is to be ksh compatible where I don't think ksh is stupid.

> Note that Tcl's uplevel is like a precommand modifier in zsh terms so
> is not limited to defining references.
>
> However I still prefer the other solution I mentioned with \&.

I have some half-formed thoughts about other ways to accomplish this
... I don't think the \& suggestion by itself gets around the problem
that the positionals are C-local to doshfunc().

> > I'm still not entirely following what problem this solves.
>
> It solves the problem of name clashes between a function and the calling
> function.

OK, so if
  typeset ref
  () { typeset -n ref=ref }
works / is not a loop, then we don't need any other magic?

> If two functions are developed independently, they shouldn't need
> to worry about the possibility of using the same variable name for
> different purposes. This is the main benefit of private in general.
> An old function like _call_function (updated to use nameref but still
> using local) could be used by someone writing a new function where they
> pass a private variable as the first parameter. If they are treating
> _call_function as a black box, how are they to know that private
> variables are not valid for the passed $1. It'd be an arbitrary
> limitation.

I'm still not seeing that.  The writer of the caller of _call_function
should know that it's a general rule that another function can't
access a private variable.  Why would the writer pass a private name
as an argument to any function, even a blackbox, if not expecting it
to be referenced?  The rule for a private should be that you always
pass its value.

Correct me if I still misunderstand, but I interpret the example above
as implying that

  f2() {
    typeset -n ref=$1
    # ... do stuff to $ref ...
  }
  f1() {
    private var
    f2 var
  }

should work just because the name is passed in a positional?  I would
disagree with that on the grounds that you're making positionals into
a special case, just a different kind of special case from ksh, that
violates the intent of "private".

The case I was asking about (and the only one I so far consider
possibly viable) is

  f1() {
    private var
    typeset -n ref=var
    f2 ref
  }

That is, the reference has to be declared in the scope where var is
"visible", otherwise you hit the privacy wall when "typeset -n" has to
up-level.

Or perhaps you mean to be tagging on to your other suggestion and it would be
  f2 \&var
which feels like syntactic sugar for my example above, with the
benefit of not needing to declare an extra parameter.  That's an
interesting idea but I would not want this to work:

  f2() {
    typeset -n upref=\&var
    # ... do things to var one level up ...
    typeset -n uptwo=\&\&var
    # ... do things to var two levels up ...
  }

I fear that puts us back to having something magic about positional
parameters, e.g., that unless the "&var" string appears in one of $1,
$2, ... then it's just an invalid name like any other.

Back again to my half-formed thoughts, I can't really elucidate yet.

> Given that you implemented references to array members, being able to
> iterate over array elements with for could be nice. Perhaps something
> like: for ref in 'arr[*]'; do

With a hash that's just:

  typeset -n ref
  for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ...

but since you have to "typeset -n ref" before the for-loop anyway, why not just

  typeset -n ref=hash
  for idx in ${(k)ref}; do ... $ref[$idx] ...

Ordinary arrays are harder because there's no "find all matching
elements" subscript flag.

> On the subject references to array elements, it does seem both powerful and
> dangerous that subscripts are evaluated on reference.

Hm ... I implemented that because ksh93 accepted the syntax, but I
never actually tried it -- in fact (Ubuntu's) ksh93 doesn't actually
expand it to anything sensible and the value of the nameref ends up
nonsense.

> The subscript
> should perhaps be evaluated with the same local level as the array
> itself.

That gets pretty weird, and you can already trivially bypass it with
${(P)...} for many cases.

> And it could be wise to limit what can be done as part of the
> subscript evaluation to avoid a CVE similar to the last one.

validate_refname() is calling parse_subscript() ... would further
examination of the accepted string be sufficient, or something more
needed?  I think the greatest danger is of creating an infinite
recursion, which can't really be caught in advance.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09  2:01       ` Oliver Kiddle
@ 2023-02-09  5:45         ` Bart Schaefer
  0 siblings, 0 replies; 48+ messages in thread
From: Bart Schaefer @ 2023-02-09  5:45 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh workers

On Wed, Feb 8, 2023 at 6:01 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> This isn't a self-reference because the local levels would be different.

Working on that.  Halfway there.

> And "self reference" in the error message should be hyphenated.

I copied that message verbatim from ksh93.  Can still change it, but
does it really matter?

> Maybe the code should have a safer mode for subscript evaluation.

See previous message.

> We already talked about this error message. But now it is a substring.
>   export ref
>   export: ref: can't change type of a named reference

The point at which it's necessary to catch that error would have to
lop off the subscript and re-evaluate the base variable name in order
to determine the thing being subscripted.  That seems like an
unnecessary amount of work.

How about
  can't change type via subscript reference
??


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09  4:49       ` Bart Schaefer
@ 2023-02-09 20:49         ` Oliver Kiddle
  2023-02-09 23:07           ` Bart Schaefer
  0 siblings, 1 reply; 48+ messages in thread
From: Oliver Kiddle @ 2023-02-09 20:49 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> My general feeling is to be ksh compatible where I don't think ksh is stupid.

Sounds reasonable.

> > It solves the problem of name clashes between a function and the calling
> > function.
>
> OK, so if
>   typeset ref
>   () { typeset -n ref=ref }
> works / is not a loop, then we don't need any other magic?

The following is similar:
  var=hello
  typeset -n ref
  () {
    typeset var=x
    ref=var
    echo $ref
  }
  typeset -p ref
  echo $ref

This creates a reference to a variable at a deeper local level. Ksh
prints "global reference cannot refer to local variable". With zsh's
dynamic scoping you could do the same thing with a reference that isn't
"global" but which has a scope that will outlast the variable. Making
var outlive the function is a feature you'd get with a reference
counting approach to namerefs. The best might be if ref returns to being
unset when the function returns but an error like ksh is fine too.

> I'm still not seeing that.  The writer of the caller of _call_function
> should know that it's a general rule that another function can't
> access a private variable.  Why would the writer pass a private name
> as an argument to any function, even a blackbox, if not expecting it
> to be referenced?  The rule for a private should be that you always
> pass its value.

I hadn't really thought about it that way, perhaps because ksh only
has private scoping and I'm used to writing in languages that only
have lexical scoping. Certainly if it is hard to implement, I have no
objection to this approach. It does have the big advantage of making the
whole uplevel question a non-issue. We do lose some orthogonality in
that you can use a private with builtins that take variable names like
read, compadd and printf (-v). Wrappers of those would have an added
limitation. (_approximate relies on a function wrapper of compadd so
privates can't be used with that)

When relying only on dynamic scoping, it would be good practice to
define all the namerefs to passed parameters as early as possible in a
function to reduce the risk of a name clash.

> Correct me if I still misunderstand, but I interpret the example above
> as implying that
>
>   f2() {
>     typeset -n ref=$1
>     # ... do stuff to $ref ...
>   }
>   f1() {
>     private var
>     f2 var
>   }
>
> should work just because the name is passed in a positional?  I would
> disagree with that on the grounds that you're making positionals into
> a special case, just a different kind of special case from ksh, that
> violates the intent of "private".

It isn't about the positionals being special but that it is useful to be
able to write a function that exposes an interface similar to read where
a variable can be named as a parameter. Ksh's making $1, $2.. special
on the rhs of typeset -n really is very ugly.

> The case I was asking about (and the only one I so far consider
> possibly viable) is
>
>   f1() {
>     private var
>     typeset -n ref=var
>     f2 ref
>   }

This puts the complexity into the caller so f2's interface is less
nice. And by needing ref as a local, f1 hasn't really benefitted from
var being able to be private. But if this is easy to implement then it
could be useful. If not, printing an error is fine too.

> Or perhaps you mean to be tagging on to your other suggestion and it would be
>   f2 \&var

My intention with that suggestion is that you'd only do that to refer to
$var from the scope of f1's caller. So in practice that'd sooner be
something like \&$3. For this, it'd be just `f2 var` and f2() would
declare `private -n mine=\&1`

> which feels like syntactic sugar for my example above, with the
> benefit of not needing to declare an extra parameter.  That's an
> interesting idea but I would not want this to work:

Again, the key difference is that the caller needed the extra parameter
where as using some sort of uplevel goes in the callee. You can't wrap
e.g. compadd transparently if the caller needs to change.

> With a hash that's just:
>
>   typeset -n ref
>   for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ...

"just"!?

> > And it could be wise to limit what can be done as part of the
> > subscript evaluation to avoid a CVE similar to the last one.
>
> validate_refname() is calling parse_subscript() ... would further
> examination of the accepted string be sufficient, or something more
> needed?  I think the greatest danger is of creating an infinite
> recursion, which can't really be caught in advance.

So if a function gets the target of a nameref from untrusted input the
function writer needs to know to validate it with something like

  [[ $var = [[:IDENT:]]## ]]

It might deprive us of many clever tricks but parse_subscript() could
gain a flag to disable anything questionable like command substitution
and math evaluation side-effects.

This should be an error perhaps:

  typeset -n ref=arr[1][2]

Currently it isn't possible to create a reference fo $!, $?, $@, $+, $#
and $$. If easy to add, there would be no harm in them.

And in 51388, Bart wrote:
> How about
>   can't change type via subscript reference
> ??

That sounds fine.

Oliver


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09 20:49         ` Oliver Kiddle
@ 2023-02-09 23:07           ` Bart Schaefer
  2023-02-11  3:04             ` Bart Schaefer
  2023-02-12  9:02             ` [PATCH 1/3]: Add named references Oliver Kiddle
  0 siblings, 2 replies; 48+ messages in thread
From: Bart Schaefer @ 2023-02-09 23:07 UTC (permalink / raw)
  To: Zsh hackers list

On Thu, Feb 9, 2023 at 12:49 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> The following is similar:
>   var=hello
>   typeset -n ref
>   () {
>     typeset var=x
>     ref=var
>     echo $ref
>   }
>   typeset -p ref
>   echo $ref
>
> This creates a reference to a variable at a deeper local level.

Only in a pointers/refcounting implementation.  With the assumption of
dynamic scoping, it creates a reference to a name, where the scope
search for the name starts at a lower level.  If that name doesn't
exist at that level (because the whole level doesn't exist any more),
the search climbs up.  So I get with set -x (and a couple of
in-progress patches for looping references):

+Src/zsh:15> var=hello
+Src/zsh:16> typeset -n ref
+Src/zsh:17> '(anon)'
+(anon):1> typeset var=x
+(anon):2> ref=var
+(anon):3> echo x
x
+Src/zsh:22> typeset -p ref
typeset -n ref=var
+Src/zsh:23> echo hello
hello

If I add

  () {
   typeset var=y
   echo $ref
   () {
     typeset var=z
     echo $ref
   }
  }

I get

+Src/zsh:24> '(anon)'
+(anon):0> typeset var=y
+(anon):0> echo y
y
+(anon):1> '(anon)'
+(anon):0> typeset var=z
+(anon):0> echo y
y

> The best might be if ref returns to being
> unset when the function returns but an error like ksh is fine too.

I'm not sure how to do that without scanning the whole parameter
table, but I agree it the above is a little puzzling.

> Ksh prints "global reference cannot refer to local variable".

At what point does that happen?  Upon the assignment ref=var ?

Relatedly, what should happen on any failed assignment?  E.g.

typeset -n xy yx
xy=yx  # OK so far
yx=xy  # Oops, invalid loop

Should yx become unset, or should it remain a nameref with no referent?

> > The rule for a private should be that you always
> > pass its value.
>
> I hadn't really thought about it that way, perhaps because ksh only
> has private scoping and I'm used to writing in languages that only
> have lexical scoping. Certainly if it is hard to implement, I have no
> objection to this approach.

("This approach" meaning "no public refs to private vars"?)  I haven't
tried anything else yet to see how hard it might be.

> We do lose some orthogonality in
> that you can use a private with builtins that take variable names like
> read, compadd and printf (-v). Wrappers of those would have an added
> limitation.

That's true of private already, isn't it?

> When relying only on dynamic scoping, it would be good practice to
> define all the namerefs to passed parameters as early as possible in a
> function to reduce the risk of a name clash.

If you were going to put that in the doc, where would it go?

> It isn't about the positionals being special but that it is useful to be
> able to write a function that exposes an interface similar to read where
> a variable can be named as a parameter. Ksh's making $1, $2.. special
> on the rhs of typeset -n really is very ugly.

This works for my code in current state:

% var=GLOBAL
% typeset -n ref=var
% f() {
function> typeset -n ref=$1
function> print $ref
function> ref=LOCAL
function> }
% f ref
GLOBAL
% print $ref
LOCAL
%

> >   f2 \&var
>
> My intention with that suggestion is that you'd only do that to refer to
> $var from the scope of f1's caller. So in practice that'd sooner be
> something like \&$3. For this, it'd be just `f2 var` and f2() would
> declare `private -n mine=\&1`

Yeah, I don't like the idea that a called function can arbitrarily
grab parameters from its caller just by sticking & in front.  Caller
needs to do do something (even if only make "normal" use of dynamic
scope) to make the parameter "grabbable".

> > With a hash that's just:
> >
> >   typeset -n ref
> >   for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ...
>
> "just"!?

Hah!  Point was that it's do-able without "for"-context-sensitive
special subscript semantics.  I think it would be strange for

ary=( 1 2 3 4 5 )
typeset -n ref='ary[*]
ref=( a b c)

to produce something different than

ary=( 1 2 3 4 5 )
typeset -n ref
for ref in 'ary{*]'; do ref=( a b c ); done

> > > And it could be wise to limit what can be done as part of the
> > > subscript evaluation to avoid a CVE similar to the last one.
> >
> > validate_refname() is calling parse_subscript() ... would further
> > examination of the accepted string be sufficient, or something more
> > needed?  I think the greatest danger is of creating an infinite
> > recursion, which can't really be caught in advance.
>
> So if a function gets the target of a nameref from untrusted input the
> function writer needs to know to validate it

No, I meant, would examining the subscript string in the C code be sufficient.

> This should be an error perhaps:
>
>   typeset -n ref=arr[1][2]

Why?  ${ary[1][2]} isn't an error, it's the 2nd character of the first
word in $ary.

You can keep throwing subscripts on there as long as the resulting
substrings can be indexed-into.

print ${ary[1][3,9][4]} # etc.

> Currently it isn't possible to create a reference fo $!, $?, $@, $+, $#
> and $$. If easy to add, there would be no harm in them.

You can make references to argv and ARGC, but they always refer to the
current argv/ARGC because of the aforementioned implementation of
positionals as C locals.  $* $@ $# would have the same issues.

The others would all have to be special-cased individually.  What is
$+ ?  Do you mean $- ?


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09 23:07           ` Bart Schaefer
@ 2023-02-11  3:04             ` Bart Schaefer
  2023-02-11  3:55               ` Bart Schaefer
  2023-02-11  7:02               ` [PATCH 1/3]: Add named references Oliver Kiddle
  2023-02-12  9:02             ` [PATCH 1/3]: Add named references Oliver Kiddle
  1 sibling, 2 replies; 48+ messages in thread
From: Bart Schaefer @ 2023-02-11  3:04 UTC (permalink / raw)
  To: Zsh hackers list

Closing in on this, I think ... remaining decisions on which input
would be appreciated:

On Thu, Feb 9, 2023 at 3:07 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> > Ksh prints "global reference cannot refer to local variable".
>
> At what point does that happen?  Upon the assignment ref=var ?

At the moment, just toying with this, I have it erroring when
EMULATE_KSH, printing a warning when WARN_NESTED_VAR, and otherwise
silently creating the down-scope reference.  Thoughts?  Perhaps
WARN_CREATE_GLOBAL would actually be better here, or maybe check for
either one.

> Relatedly, what should happen on any failed assignment?  E.g.
>
> typeset -n xy yx
> xy=yx  # OK so far
> yx=xy  # Oops, invalid loop
>
> Should yx become unset, or should it remain a nameref with no referent?

This probably doesn't matter except at the command line as the
referent loop is otherwise fatal.

> > When relying only on dynamic scoping, it would be good practice to
> > define all the namerefs to passed parameters as early as possible in a
> > function to reduce the risk of a name clash.
>
> If you were going to put that in the doc, where would it go?

Options are:
- under Named References in Parameters
- at the description of -n under typeset
- with the scoping example in Expansion
- in a new section under Functions

> > > validate_refname() is calling parse_subscript() ... would further
> > > examination of the accepted string be sufficient, or something more
> > > needed?
> [...] I meant, would examining the subscript string in the C code be sufficient.
>
> > It might deprive us of many clever tricks but parse_subscript() could
> > gain a flag to disable anything questionable like command substitution
> > and math evaluation side-effects.

parse_subscript() is already a bit of a hack, it sets up its own
context and invokes the lexer.  A flag such as that would, I think,
require plugging in an alternate lexer, hence my question about doing
a simpler examination of the string.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11  3:04             ` Bart Schaefer
@ 2023-02-11  3:55               ` Bart Schaefer
  2023-02-11  5:36                 ` Speaking of dangerous referents Bart Schaefer
  2023-02-11  7:02               ` [PATCH 1/3]: Add named references Oliver Kiddle
  1 sibling, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2023-02-11  3:55 UTC (permalink / raw)
  To: Zsh hackers list

On Fri, Feb 10, 2023 at 7:04 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> > > It might deprive us of many clever tricks but parse_subscript() could
> > > gain a flag to disable anything questionable like command substitution
> > > and math evaluation side-effects.

I'm not sure what to do about math side-effects, but I've hit upon the
idea of expanding subscripts that appear in named references with
EXECOPT set to zero (noexec).  That reduces $(...) to the empty
string, except for the special-case $(<file) that is being discussed
in another thread.

Switching to noexec around singsub() in getarg() is quite easy, and as
previously mentioned you can still do fully dynamic subscripts with
${(P)...}.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Speaking of dangerous referents
  2023-02-11  3:55               ` Bart Schaefer
@ 2023-02-11  5:36                 ` Bart Schaefer
  2023-02-12  8:00                   ` Oliver Kiddle
  0 siblings, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2023-02-11  5:36 UTC (permalink / raw)
  To: Zsh hackers list

Oliver wrote:
> > And it could be wise to limit what can be done as part of the
> > subscript evaluation to avoid a CVE similar to the last one.

% print $ZSH_PATCHLEVEL
ubuntu/5.8-3ubuntu1.1
% empty=()
% loop='empty[${(P)loop}]'
% print ${(P)loop}
zsh: segmentation fault (core dumped)  zsh -f


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11  3:04             ` Bart Schaefer
  2023-02-11  3:55               ` Bart Schaefer
@ 2023-02-11  7:02               ` Oliver Kiddle
  2023-02-11  7:45                 ` Bart Schaefer
  1 sibling, 1 reply; 48+ messages in thread
From: Oliver Kiddle @ 2023-02-11  7:02 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> Closing in on this, I think ... remaining decisions on which input
> would be appreciated:
>
> On Thu, Feb 9, 2023 at 3:07 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> >
> > > Ksh prints "global reference cannot refer to local variable".
> >
> > At what point does that happen?  Upon the assignment ref=var ?
>
> At the moment, just toying with this, I have it erroring when
> EMULATE_KSH, printing a warning when WARN_NESTED_VAR, and otherwise
> silently creating the down-scope reference.  Thoughts?  Perhaps
> WARN_CREATE_GLOBAL would actually be better here, or maybe check for
> either one.

I'd make it a fatal error unconditionally. I don't like it if the target
of a nameref can switch to something else following a return. I'd be
fine with the reference becoming unset at the time of the return if it
refers to a variable that is going out of scope. Can that be done as
part of the same code that drops the local variables? (I'm guessing it
is easy for local but hard for private)

The mention of WARN_NESTED_VAR raises something else. Surely a reference
should disable that warning:

    toplevel() {
      local foo="in fn"
      nested
    }
    nested() {
      typeset -n ref=foo
      ref="in nested"
    }         
    setopt warn_nested_var
    toplevel
    nested:2: scalar parameter foo set in enclosing scope in function nested

> > Relatedly, what should happen on any failed assignment?  E.g.
> >
> > typeset -n xy yx
> > xy=yx  # OK so far
> > yx=xy  # Oops, invalid loop
> >
> > Should yx become unset, or should it remain a nameref with no referent?
>
> This probably doesn't matter except at the command line as the
> referent loop is otherwise fatal.

Current code even allows:

  typeset -n ref
  ref=ref

I'd suggest that we need the self-reference check whenever assigning to
a reference. But that entails evaluation of a subscript.

> > > When relying only on dynamic scoping, it would be good practice to
> > > define all the namerefs to passed parameters as early as possible in a
> > > function to reduce the risk of a name clash.
> >
> > If you were going to put that in the doc, where would it go?

Do we to put recommended good practices in the doc at all as opposed to
it being a terse description of the functionality. I'd prefer it out of
the way with examples rather than adding fluff to the places you list.
I'd also consider it bad practice not to always assign a reference when
creating it.

> > > It might deprive us of many clever tricks but parse_subscript() could
> > > gain a flag to disable anything questionable like command substitution
> > > and math evaluation side-effects.
>
> parse_subscript() is already a bit of a hack, it sets up its own
> context and invokes the lexer.  A flag such as that would, I think,
> require plugging in an alternate lexer, hence my question about doing
> a simpler examination of the string.

That's not as robust but certainly an option. The temporary use of
NOEXEC that you mention in the other message certainly goes a long way
to alleviating security concerns.

Oliver


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11  7:02               ` [PATCH 1/3]: Add named references Oliver Kiddle
@ 2023-02-11  7:45                 ` Bart Schaefer
  2023-02-11 23:43                   ` Bart Schaefer
  2024-02-11  7:00                   ` Stephane Chazelas
  0 siblings, 2 replies; 48+ messages in thread
From: Bart Schaefer @ 2023-02-11  7:45 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh hackers list

On Fri, Feb 10, 2023 at 11:02 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> Bart Schaefer wrote:
> > On Thu, Feb 9, 2023 at 3:07 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> > >
> > > > Ksh prints "global reference cannot refer to local variable".
>
> I'd make it a fatal error unconditionally. I don't like it if the target
> of a nameref can switch to something else following a return.

Making it an unconditional error means that this sort of thing doesn't work:

typeset -n ref
() {
  typeset a=A b=B c=C
  for ref in a b c; do ref=${(L)ref}; done
}

> I'd be
> fine with the reference becoming unset at the time of the return if it
> refers to a variable that is going out of scope. Can that be done as
> part of the same code that drops the local variables?

Maybe.  Might look at that tomorrow.

> The mention of WARN_NESTED_VAR raises something else. Surely a reference
> should disable that warning:

Actually there's an argument that an unexpected up-reference is a
really good reason for that warning.  A function that intentionally
uses an up-reference could turn the option off locally.

> > > Relatedly, what should happen on any failed assignment?
>
> Current code even allows:
>
>   typeset -n ref
>   ref=ref

That's fixed in my sandbox.  Patch eventually forthcoming.

> I'd suggest that we need the self-reference check whenever assigning to
> a reference. But that entails evaluation of a subscript.

As I have it right now, self-reference checks do happen at any
assignment.  That's why I wonder if deleting the offending parameter
is OK.

Circular references hidden inside subscripts end up expanding to empty
string, as do command substitutions with the NO_EXEC trick.

> Do we to put recommended good practices in the doc at all as opposed to
> it being a terse description of the functionality. I'd prefer it out of
> the way with examples rather than adding fluff to the places you list.

The Functions chapter was rather too terse.  Here's what I've written to add:

--- 8< ---
Parameters declared by any of the 'typeset' family of commands during
the execution of a function become _local_ to the function unless the
'-g' option is used.  This is the _scope_ of the parameter, which
extends dynamically to any other functions called by the declaring
function.  In most cases, local parameters take the place of any other
parameter having the same name that was assigned or declared in an
earlier function scope.  (See *note Local Parameters::.)

A named parameter declared with the '-n' option to any of the 'typeset'
commands becomes a reference to a parameter in scope at the time of
assignment to the named reference, which may be at a different call
level than the declaring function.  For this reason, it is good practice
to declare a named reference as soon as the referent parameter is in
scope, and as early as possible in the function if the reference is to a
parameter in a calling scope.

A typical use of named references is to pass the name of the referent
as a positional parameter.  For example,

     pop() {
       local -n ref=$1
       local last=$ref[$#ref]
       ref[$#ref]=()
       print -r -- $last
     }
     array=( a list of five values )
     pop array

prints the word values and shortens $array to '( a list of five )'.
There are no local parameters in pop at the time 'ref=$1' is assigned,
so 'ref' becomes a reference to 'array' in the caller.
--- >8 ---

> I'd also consider it bad practice not to always assign a reference when
> creating it.

That also doesn't work with the for-loop example above (even if the
nameref is moved inside the function).


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11  7:45                 ` Bart Schaefer
@ 2023-02-11 23:43                   ` Bart Schaefer
  2023-02-11 23:45                     ` Bart Schaefer
  2023-02-12  7:38                     ` Oliver Kiddle
  2024-02-11  7:00                   ` Stephane Chazelas
  1 sibling, 2 replies; 48+ messages in thread
From: Bart Schaefer @ 2023-02-11 23:43 UTC (permalink / raw)
  To: Zsh hackers list

On Fri, Feb 10, 2023 at 11:45 PM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> On Fri, Feb 10, 2023 at 11:02 PM Oliver Kiddle <opk@zsh.org> wrote:
> >
> > I'd be
> > fine with the reference becoming unset at the time of the return if it
> > refers to a variable that is going out of scope. Can that be done as
> > part of the same code that drops the local variables?

That's actually a little startling given dynamic scoping.  It leads to this:

{
  typeset -n ref=var
  () {
    var=FUNC1
    typeset -p ref
  }
  typeset -p ref
}

Output:
typeset -g -n ref=var
typeset: no such variable: ref

If the called function were a blackbox, it would be extremely puzzling
why "ref" became unset.

The additional complication is that the parameter table is a hash, so
it's scanned in random order; we can't be sure that the referent local
$var will be found and unset before the referring $ref is examined.  I
suppose we could scan twice (ick).

Instead let's think for a moment about how the dynamic scoping works.
"ref" has two properties:  The locallevel at which it was declared,
and the locallevel at which it prefers to find a referent.  Any time
you use $ref (or assign to ref) the search starts at the current
locallevel and proceeds upward until either the preferred level is
reached or the "topmost" defined parameter is found, whichever comes
first.

So ... if endparamscope() walks the preferred referent locallevel
upward as the scope unwinds, we get (using your earlier example with
my addition):

{
  var=hello
  typeset -n ref
  () {
    typeset var=x
    ref=var
    echo $ref
  }
  typeset -p ref
  echo $ref
  () {
   typeset var=y
   echo $ref
   () {
     typeset var=z
     echo $ref
   }
  }
}

Output:
x
typeset -n ref=var
hello
hello
hello

The inner assignment to ref changes the scope , but it then unwinds so
in the second function (which does not assign to ref) it is back to
pointing at the original scope, and finds $var there.

If "var=hello" is replaced by "unset var" (so there is no parameter at
the original scope), then the output is:

x
typeset -n ref=var

y
y

This is because the assignment "ref=var" has initialized the name to
which "ref" points, but there is no such name at the same level, so
the nearest parameter of the same name is chosen.

If "typeset -n ref" at the top level is replaced by "typeset -n
ref=var" then the "ref=var" assignment in the first function changes
the value of either the global $var or the local $var from "x" to
"var" (depending on which existed first) because ref already has a
referent name.  So the output could be either:

# with var=hello
var
typeset -n ref=var
var
var
var

# with var unset
var
typeset -n ref=var

y
y

Given the requirements that (1) assigning a value to a nameref that
has no referent initializes the nameref (required for "for" loop) and
(2) dynamic scoping selects the best referent by name, I think this is
the best I can do without something even more surprising.

The other option that occurs to me is for the assignment ref=var to
automatically create a local $ref (as if "typeset ref=var" had been
used).  That would then be unwound and we'd end up back in the
original state.  But that would also be unexpected behavior in dynamic
scopes.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11 23:43                   ` Bart Schaefer
@ 2023-02-11 23:45                     ` Bart Schaefer
  2023-02-12  7:38                     ` Oliver Kiddle
  1 sibling, 0 replies; 48+ messages in thread
From: Bart Schaefer @ 2023-02-11 23:45 UTC (permalink / raw)
  To: Zsh hackers list

On Sat, Feb 11, 2023 at 3:43 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Fri, Feb 10, 2023 at 11:45 PM Bart Schaefer
> <schaefer@brasslantern.com> wrote:
> >
> > On Fri, Feb 10, 2023 at 11:02 PM Oliver Kiddle <opk@zsh.org> wrote:
> > >
> > > fine with the reference becoming unset at the time of the return if it
> > > refers to a variable that is going out of scope.
>
> That's actually a little startling given dynamic scoping.  It leads to this:
>
> {
>   typeset -n ref=var
>   () {
>     var=FUNC1

# that should of course be "local var=FUNC1" sorry

>     typeset -p ref
>   }
>   typeset -p ref
> }
>
> Output:
> typeset -g -n ref=var
> typeset: no such variable: ref


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11 23:43                   ` Bart Schaefer
  2023-02-11 23:45                     ` Bart Schaefer
@ 2023-02-12  7:38                     ` Oliver Kiddle
  1 sibling, 0 replies; 48+ messages in thread
From: Oliver Kiddle @ 2023-02-12  7:38 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> > > I'd be
> > > fine with the reference becoming unset at the time of the return if it
> > > refers to a variable that is going out of scope. Can that be done as
> > > part of the same code that drops the local variables?
>
> That's actually a little startling given dynamic scoping.  It leads to this:

While I said "unset", what I actually meant was only dropping the value
part of the reference - returning it to the `typeset -n ref` state.

However:

> Instead let's think for a moment about how the dynamic scoping works.
> "ref" has two properties:  The locallevel at which it was declared,
> and the locallevel at which it prefers to find a referent.  Any time
> you use $ref (or assign to ref) the search starts at the current
> locallevel and proceeds upward until either the preferred level is
> reached or the "topmost" defined parameter is found, whichever comes
> first.

You've persuaded me. It is consistent with dynamic scoping in general.
Perhaps it has further convinced me (not that it needed to) that lexical
scoping is generally better.

Also when following and considering your examples, the following
scenario occurred to me. I can't see a reason to object to this
behaviour and some of the ideas I had in mind would mess with it. Note
that the reference starts out pointing to an unset variable:

  bar() {
    typeset var=hello
    echo $ref
  }
  foo() {
    typeset -n ref=var
    bar
    var=foo
    echo $ref
  }
  unset var
  foo

It'd be surprising if calling bar somehow lost the value of the
reference.

Oliver


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Speaking of dangerous referents
  2023-02-11  5:36                 ` Speaking of dangerous referents Bart Schaefer
@ 2023-02-12  8:00                   ` Oliver Kiddle
  2023-02-12  8:34                     ` Bart Schaefer
  0 siblings, 1 reply; 48+ messages in thread
From: Oliver Kiddle @ 2023-02-12  8:00 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> % empty=()
> % loop='empty[${(P)loop}]'
> % print ${(P)loop}
> zsh: segmentation fault (core dumped)  zsh -f

And in 51399 on namerefs:
> Circular references hidden inside subscripts end up expanding to empty
> string, as do command substitutions with the NO_EXEC trick.

That surprises me. I can't see any executions occurring. Would making
(P) a little safer by applying the NO_EXEC trick to it too fix that seg
fault. Or did you have a different fix in mind? Are there sane uses to
(P) with nasty stuff like hidden command substitutions. We have (e) and
eval which are at least explicit in their dangers.

If genuine uses surface later (PP) could turn on the dangerous form. Or
you might argue that we should support (PP) for the safe form.

Making (P) safer could be another use for the FUTURE option I suggested
in the final paragraph of 51281 - perhaps very few people (if any) were
still reading at that point so it may have been overlooked.

Oliver


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Speaking of dangerous referents
  2023-02-12  8:00                   ` Oliver Kiddle
@ 2023-02-12  8:34                     ` Bart Schaefer
  0 siblings, 0 replies; 48+ messages in thread
From: Bart Schaefer @ 2023-02-12  8:34 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, Feb 12, 2023 at 12:00 AM Oliver Kiddle <opk@zsh.org> wrote:
>
> Bart Schaefer wrote:
> > % empty=()
> > % loop='empty[${(P)loop}]'
> > % print ${(P)loop}
> > zsh: segmentation fault (core dumped)  zsh -f
>
> And in 51399 on namerefs:
> > Circular references hidden inside subscripts end up expanding to empty
> > string, as do command substitutions with the NO_EXEC trick.
>
> That surprises me. I can't see any executions occurring.

I apologize, the reference to NO_EXEC only applies to the clause about
command substitutions.  The circular references are handled by using
tagging to detect/abort the loop during nameref resolution.  I'm
uncertain whether a similar approach could be used for (P) and haven't
dug into it, I didn't want to mix a patch for that into the patches
for namerefs.

> Would making
> (P) a little safer by applying the NO_EXEC trick to it too fix that seg
> fault. Or did you have a different fix in mind?

It wouldn't and I didn't, at least not yet.  It would have to be
something along the lines of the "math recursion limit exceeded"
handling.  There are two or three other places in the code where there
are comments rejecting arbitrary limits on recursion so this might be
something we don't want to fix.

> Making (P) safer could be another use for the FUTURE option I suggested
> in the final paragraph of 51281 - perhaps very few people (if any) were
> still reading at that point so it may have been overlooked.

It would be helpful to know whether any of the uses of (P) in the
existing contributed and completion functions etc. actually rely on
expanding command substitutions.  It would be pretty unlikely that
they rely on self-reference.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09 23:07           ` Bart Schaefer
  2023-02-11  3:04             ` Bart Schaefer
@ 2023-02-12  9:02             ` Oliver Kiddle
  2023-02-12 18:59               ` Bart Schaefer
  1 sibling, 1 reply; 48+ messages in thread
From: Oliver Kiddle @ 2023-02-12  9:02 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

On 9 Feb, Bart Schaefer wrote (I've reordered this):
> > This should be an error perhaps:
> >
> >   typeset -n ref=arr[1][2]
>
> Why?  ${ary[1][2]} isn't an error, it's the 2nd character of the first
> word in $ary.

Sorry, should have been clearer. Applying later subscripts is even
better than an error but the following is not ideal:

  % arr=( one two three )
  % typeset -n ref=arr[2][1]
  % echo $ref
  two[1]
  % ref=x
  % typeset -p arr
  typeset -a arr=( one x three )
  % echo $ref
  x[1]
  % typeset -n ref=arr[1]whatever
  % echo $ref
  onewhatever

> Yeah, I don't like the idea that a called function can arbitrarily
> grab parameters from its caller just by sticking & in front.  Caller
> needs to do do something (even if only make "normal" use of dynamic
> scope) to make the parameter "grabbable".

Allowing such things as uplevel statements (Tcl), special syntax, or
introspection modules (Python's inspect), is fairly normal for a
scripting language.

As I've mentioned, we already have the situation where the caller
needs nothing to make a private grabable by a builtin:
  private -a var
  compadd -O var one two three
And writing wrappers to e.g. compadd is useful.

But I don't think we need to address references to privates in other
scopes at this point before the other patches are even applied. May be
better to consider whether there are other advantages in moving private
to the main shell first as that may ease any implementation.

> > >   typeset -n ref
> > >   for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ...
> >
> > "just"!?
>
> Hah!  Point was that it's do-able without "for"-context-sensitive
> special subscript semantics.  I think it would be strange for

I can't think of a sensible way. Was just that it could be useful.
Best long term solution might be to allow arrays of references. Bash
does allow arrays of integers.

> The others would all have to be special-cased individually.  What is
> $+ ?  Do you mean $- ?

Yes.

Oliver


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-12  9:02             ` [PATCH 1/3]: Add named references Oliver Kiddle
@ 2023-02-12 18:59               ` Bart Schaefer
  2023-02-12 19:45                 ` PM_* flags in zsh.h (was Re: [PATCH 1/3]: Add named references) Bart Schaefer
  2023-02-12 22:54                 ` [PATCH 1/3]: Add named references Oliver Kiddle
  0 siblings, 2 replies; 48+ messages in thread
From: Bart Schaefer @ 2023-02-12 18:59 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, Feb 12, 2023 at 1:02 AM Oliver Kiddle <opk@zsh.org> wrote:
>
>   % typeset -n ref=arr[1]whatever
>   % echo $ref
>   onewhatever

Aha.  OK, that's pretty easy to fix.

> As I've mentioned, we already have the situation where the caller
> needs nothing to make a private grabable by a builtin:
>   private -a var
>   compadd -O var one two three
> And writing wrappers to e.g. compadd is useful.
>
> But I don't think we need to address references to privates in other
> scopes at this point before the other patches are even applied. May be
> better to consider whether there are other advantages in moving private
> to the main shell first as that may ease any implementation.

I think it would in fact be necessary to move private to the main
shell in order to fix this.  The second-worst hack in private is the
overloading of some parameter flags, and the internals of e.g.
createparam() et al. need to be able to distinguish the cases of the
original flag meaning from its overloaded meaning.  Builtins have the
advantage of not passing through that code path.

> > > >   typeset -n ref
> > > >   for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ...
>
> I can't think of a sensible way. Was just that it could be useful.

What about adding ${(K)array) to return array[1] array[2] ... ?

for ref in ${(K)hash}; do ...

I took a quick look at the implementation of (k) and (v) and (K) looks
do-able with sufficient messing around in paramsubst().  Which I'm not
feeling like undertaking myself just now, though.

> Best long term solution might be to allow arrays of references. Bash
> does allow arrays of integers.

Bash also has sparse arrays, its implementation is quite different.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* PM_* flags in zsh.h (was Re: [PATCH 1/3]: Add named references)
  2023-02-12 18:59               ` Bart Schaefer
@ 2023-02-12 19:45                 ` Bart Schaefer
  2023-02-12 21:01                   ` Oliver Kiddle
  2023-02-12 22:54                 ` [PATCH 1/3]: Add named references Oliver Kiddle
  1 sibling, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2023-02-12 19:45 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, Feb 12, 2023 at 10:59 AM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> > [...] consider whether there are other advantages in moving private
> > to the main shell first as that may ease any implementation.
>
> [...] The second-worst hack in private is the
> overloading of some parameter flags, and the internals of e.g.
> createparam() et al. need to be able to distinguish the cases of the
> original flag meaning from its overloaded meaning.

We're up to (1<<30) on PM_* flags.  Can we go above (1<<31) ?


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PM_* flags in zsh.h (was Re: [PATCH 1/3]: Add named references)
  2023-02-12 19:45                 ` PM_* flags in zsh.h (was Re: [PATCH 1/3]: Add named references) Bart Schaefer
@ 2023-02-12 21:01                   ` Oliver Kiddle
  0 siblings, 0 replies; 48+ messages in thread
From: Oliver Kiddle @ 2023-02-12 21:01 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> We're up to (1<<30) on PM_* flags.  Can we go above (1<<31) ?

We can't even use 1<<31 without either refactoring to ensure the use of
unsigned int or making it otherwise work for either a 32 or 64-bit int.
Perhaps assigning it as INT_MIN just works but I'm not sure.
See 43674. I had somehow missed the fact that Peter had freed two bits
there.

It might be better to force it up to 64-bits if doing refactoring. There
are a few mutually exclusive flags which would imply that bits could be
freed but the need to correspond to letters in TYPESET_OPTSTR prevents
that.

Oliver


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-12 18:59               ` Bart Schaefer
  2023-02-12 19:45                 ` PM_* flags in zsh.h (was Re: [PATCH 1/3]: Add named references) Bart Schaefer
@ 2023-02-12 22:54                 ` Oliver Kiddle
  1 sibling, 0 replies; 48+ messages in thread
From: Oliver Kiddle @ 2023-02-12 22:54 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> On Sun, Feb 12, 2023 at 1:02 AM Oliver Kiddle <opk@zsh.org> wrote:
> >
> >   % typeset -n ref=arr[1]whatever
> >   % echo $ref
> >   onewhatever
>
> Aha.  OK, that's pretty easy to fix.

That still leaves the oddities with a second subscript:

  typeset -n ref='arr[2][1]'
  arr=( one two three )
  echo $ref
  echo ${ref}
  ref=Z
  typeset -p arr

Only case where the second subscript works is when ref is in braces.

> What about adding ${(K)array) to return array[1] array[2] ... ?
>
> for ref in ${(K)hash}; do ...

That'd work. There's been a few times when I've wished (k) and (kv)
would work for normal arrays. the former is just {1..$#arr} but would
work in nested expansions.

Oliver


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11  7:45                 ` Bart Schaefer
  2023-02-11 23:43                   ` Bart Schaefer
@ 2024-02-11  7:00                   ` Stephane Chazelas
  2024-02-11 16:14                     ` Bart Schaefer
  1 sibling, 1 reply; 48+ messages in thread
From: Stephane Chazelas @ 2024-02-11  7:00 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Oliver Kiddle, Zsh hackers list

2023-02-10 23:45:36 -0800, Bart Schaefer:
[...]
> A named parameter declared with the '-n' option to any of the 'typeset'
> commands becomes a reference to a parameter in scope at the time of
> assignment to the named reference, which may be at a different call
> level than the declaring function.  For this reason, it is good practice
> to declare a named reference as soon as the referent parameter is in
> scope, and as early as possible in the function if the reference is to a
> parameter in a calling scope.
[...]

One difference with ksh93: if the target variable was not set at
the time of the "typeset -n ref=target", then when ref is
assigned it may end up refering to a local target:

$ ./Src/zsh -c 'function f { typeset -n ref=$1; typeset var=foo; ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"'
X var X

$ ./Src/zsh --emulate ksh -c 'function f { typeset -n ref=$1; typeset var=foo; ref=X; echo "$ref ${!ref} $var"; }; f var; echo "$var"'
X var X

$

Compare with ksh:

$ ksh -c 'function f { typeset -n ref=$1; typeset var=foo; ref=X; echo "$ref ${!ref} $var"; }; f var; echo "$var"'
X var foo
X

That's a common use case like for the zslurp function discussed
on zsh-users where one would do:

cmd | zslurp var

To get the output of cmd verbatim into $var with $var potentially undeclared
beforehand, which wouldn't work if var was used locally by zslurp.

It can be worked around with a typeset -g $1 before the typeset -n var=$1

$ ./Src/zsh --emulate ksh -c 'function f { typeset -g $1; typeset -n ref=$1; typeset var=foo; ref=X; echo "$ref ${!ref} $var"; }; f var; echo "$var"'
X var foo
X

But it seems to me it would be better to align with ksh's
behaviour in this instance.

-- 
Stephane


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2024-02-11  7:00                   ` Stephane Chazelas
@ 2024-02-11 16:14                     ` Bart Schaefer
  2024-02-11 16:42                       ` Bart Schaefer
  2024-02-18  3:26                       ` Up-scope named references, vs. ksh Bart Schaefer
  0 siblings, 2 replies; 48+ messages in thread
From: Bart Schaefer @ 2024-02-11 16:14 UTC (permalink / raw)
  To: Bart Schaefer, Oliver Kiddle, Zsh hackers list

On Sun, Feb 11, 2024 at 1:00 AM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> One difference with ksh93: if the target variable was not set at
> the time of the "typeset -n ref=target", then when ref is
> assigned it may end up refering to a local target

Hm.  I think I can prevent that from happening (it wasn't intentional)
but I don't know if it's feasible to actually create a new variable at
the original scope, because of the implemention as a linked list of
scopes for each parameter -- it's not clear how to insert a variable
into the middle of the list.

> It can be worked around with a typeset -g $1 before the typeset -n var=$1

Is that really sufficient? Might the -g not select the wrong scope if
the string $1 was passed down a couple of layers?  Seems to me you
have to avoid re-using a variable regardless, especially if you do

typeset -n var=

(create what I've been calling a "placeholder") and then assign to var
in some nested scope.  This may particularly be true of the "for var
in ..." special case for namerefs.

Something I forgot to mention before regarding "_zslurp_var" -- if
you're going to a assume "typeset -n" then you can also assume
namespace syntax e.g. ".zslurp.var"

But the zslurp I put in the distribution was also meant to be somewhat
backward compatible so I didn't use namrefs or namespaces.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2024-02-11 16:14                     ` Bart Schaefer
@ 2024-02-11 16:42                       ` Bart Schaefer
  2024-02-18  3:26                       ` Up-scope named references, vs. ksh Bart Schaefer
  1 sibling, 0 replies; 48+ messages in thread
From: Bart Schaefer @ 2024-02-11 16:42 UTC (permalink / raw)
  To: Bart Schaefer, Oliver Kiddle, Zsh hackers list

On Sun, Feb 11, 2024 at 10:14 AM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> but I don't know if it's feasible to actually create a new variable at
> the original scope, because of the implemention as a linked list of
> scopes for each parameter -- it's not clear how to insert a variable
> into the middle of the list.

In particular inserting one at global scope is messy if there is
already another at a local scope.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Up-scope named references, vs. ksh
  2024-02-11 16:14                     ` Bart Schaefer
  2024-02-11 16:42                       ` Bart Schaefer
@ 2024-02-18  3:26                       ` Bart Schaefer
  2024-02-20 21:05                         ` Stephane Chazelas
  1 sibling, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2024-02-18  3:26 UTC (permalink / raw)
  To: Zsh hackers list

Re-establishing the original examples:

On Sat, Feb 10, 2024 at 11:00 PM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> $ ./Src/zsh -c 'function f { typeset -n ref=$1; typeset var=foo; ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"'
> X var X
>
> Compare with ksh:
>
> $ ksh -c 'function f { typeset -n ref=$1; typeset var=foo; ref=X; echo "$ref ${!ref} $var"; }; f var; echo "$var"'
> X var foo
> X

On Sun, Feb 11, 2024 at 8:14 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Sun, Feb 11, 2024 at 1:00 AM Stephane Chazelas <stephane@chazelas.org> wrote:
> >
> > One difference with ksh93: if the target variable was not set at
> > the time of the "typeset -n ref=target", then when ref is
> > assigned it may end up refering to a local target
[...]
> > It can be worked around with a typeset -g $1 before the typeset -n var=$1
>
> Is that really sufficient? Might the -g not select the wrong scope if
> the string $1 was passed down a couple of layers?

Here's what I'm concerned about:

 A) Src/zsh -c 'function f { typeset -n ref; ref=$1; typeset var=foo;
ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"'

Note the subtle difference of separating the declaration of "ref" from
assignment to it.  Then consider this variation:

 B) Src/zsh -c 'function f { typeset -n ref; typeset var=foo; ref=$1;
ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"'

Now the assignment is after the declaration of the local. And finally:

C) Src/zsh -c 'function f { typeset var=foo; typeset -n ref=$1;
ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"'

Now the declaration+assignment of the nameref is after the local.

At least for the versions I have installed on Ubuntu 20.04, and with
parens removed from (!) as appropriate,
 A) ksh prints "X var foo\nX"; mksh prints an error and does not
create a nameref; Src/zsh prints "X var X".
 B) ksh prints "X var X"; mksh prints an error and does not create a
nameref; Src/zsh prints "X var X".
 C) ksh prints "X var foo\nX"; mksh prints "X var X"; Src/zsh prints "X var X".

So in part C, ksh "knows" that $1 is not just a string but an object
that belongs to a surrounding scope.  That's not going to happen in
zsh for a long time, if ever.  It's interesting that it matters that
the assignment appears in the argument of "typeset" e.g. part C vs.
part B.

Incidentally if I change the delcaration to
 typeset -n ref=
then ksh crashes in parts A and B.  The behavior of mksh and Src/zsh
is unchanged.

Adding the "typeset -g $1" changes the outcome for zsh only for part
A, where it prints "X var foo\nX" as Stephane indicated.  (The other
shells don't support -g.)

Given that it's not possible to fix part C for zsh, and zsh agrees
with ksh on part B and with mksh on B and C, is it worth making an
effort to fix Stephane's original example along with part A ?

> Seems to me you
> have to avoid re-using a variable regardless, especially if you do
>
> typeset -n var=
>
> (create what I've been calling a "placeholder") and then assign to var
> in some nested scope.

I have not dived into what happens in ksh if the assignments in A and
B happen in a nested function call.

> This may particularly be true of the "for var in ..." special case for namerefs.

The potential for confusion here seems large.  If I do

 typeset -n up=$1
 typeset var=foo
 for up in $2 $3 $4 $5; do ...

what scope am I applying to each of the words in $2, $3, ... ?  (E.g.,
suppose $3 is "var" instead of $1.)  Does it change if I do

 typeset -n up

instead?  Should that act like part B where the declaration is
separate from the assignment?  What if I do

 for up in "$@"

there?  The ksh implementation seems both inconsistent and slightly magical.

Anyway, the way to "fix" this is to implicitly perform the "typeset
-g" when assigning to the nameref, and if that creates a new parameter
then mark it unset.  But that still may put the new parameter at
global scope rather than at the scope from which $1 was passed; the
expansion of $1 is just a string like any other scalar, and there's no
magic to change that.  It seems as though the more consistent behavior
is what zsh is already doing.  I'm unsure whether doing the implicit
typeset only in ksh emulation mode is worthwhile as it gets us only
partway there.

Adding the "typeset -g" changes the behavior of a large number of
tests in K01, too.

Thoughts?


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-02-18  3:26                       ` Up-scope named references, vs. ksh Bart Schaefer
@ 2024-02-20 21:05                         ` Stephane Chazelas
  2024-02-20 22:30                           ` Bart Schaefer
  0 siblings, 1 reply; 48+ messages in thread
From: Stephane Chazelas @ 2024-02-20 21:05 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

2024-02-17 19:26:07 -0800, Bart Schaefer:
[...]
> Here's what I'm concerned about:
> 
>  A) Src/zsh -c 'function f { typeset -n ref; ref=$1; typeset var=foo;
> ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"'
> 
> Note the subtle difference of separating the declaration of "ref" from
> assignment to it.

I wouldn't mind for the typeset -n var=value form to be
required like in mksh, like for typeset -r var=value.

Note that the ksh93 man page mentions:

ksh93> A  nameref  provides a convenient way to refer to the variable inside a
ksh93> function whose name is passed as an argument to a function.  For  exam‐
ksh93> ple,  if  the  name  of a variable is passed as the first argument to a
ksh93> function, the command typeset -n var=$1 (a.k.a.  nameref var=$1) inside
ksh93> the function causes references and assignments to var to be  references
ksh93> and assignments to the variable whose name has been passed to the func‐
ksh93> tion.   Note  that,  for this to work, the positional parameter must be
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ksh93> assigned directly to the nameref as part of the declaration command, as
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ksh93> in the example above; only that idiom can allow one function to  access
ksh93> a  local  variable  of  another.   For instance, typeset -n var; var=$1
ksh93> won't cross that barrier, nor will typeset foo=$1; typeset -n var=foo.

>  B) Src/zsh -c 'function f { typeset -n ref; typeset var=foo; ref=$1;
> ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"'
> 
> Now the assignment is after the declaration of the local. And finally:
> 
> C) Src/zsh -c 'function f { typeset var=foo; typeset -n ref=$1;
> ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"'
> 
> Now the declaration+assignment of the nameref is after the local.

The doc already says:

zsh> A named parameter declared with the '-n' option to any of the 'typeset'
zsh> commands becomes a reference to a parameter in scope at the time of
zsh> assignment to the named reference, which may be at a different call
zsh> level than the declaring function.  For this reason, it is good practice
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
zsh> to declare a named reference as soon as the referent parameter is in
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
zsh> scope, and as early as possible in the function if the reference is to a
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
zsh> parameter in a calling scope.
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Which I'd think covers that.

> At least for the versions I have installed on Ubuntu 20.04, and with
> parens removed from (!) as appropriate,

Note that you don't want to use that one. IIRC, the one on
Ubuntu 20.04 is based on the now discontinued ksh2020 which was
based on the ksh93v- beta release from AT&T. It was deemed too
buggy and abandoned.

You can find a maintained version based on ksh93u+ at
https://github.com/ksh93/ksh which includes hundreds of bug
fixes (including from Solaris, RedHat and some backported from
ksh2020).

[...]
> So in part C, ksh "knows" that $1 is not just a string but an object
> that belongs to a surrounding scope.

Note that ksh93 is special in that it does static scoping.
There's not a stack of scopes like in shells that do dynamic
scoping (ksh88, zsh, bash...), just a global scope and the
local scope of the function. So having namerefs there is
critical as it's the only way for a function to be able to
access the variables of its caller (though one can also export a
local variable to make it available to callees).

I suppose zsh's private variables are similar to that.

[...]
> Given that it's not possible to fix part C for zsh, and zsh agrees
> with ksh on part B and with mksh on B and C, is it worth making an
> effort to fix Stephane's original example along with part A ?
[...]

Seems to me if we have to work around it by namespacing
variables or use argv like in functions/regexp-replace, that
defeats the purpose of those namerefs.

Contrary to mksh or bash, it alread gets this right:

$ ./Src/zsh -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; l=before; f l; echo "outer l=$l"'
inner l=foo
outer l=new
$ ksh -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; l=before; f l; echo "outer l=$l"'
inner l=foo
outer l=new
$ mksh -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; l=before; f l; echo "outer l=$l"'
inner l=new
outer l=before
$ bash -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; l=before; f l; echo "outer l=$l"'
inner l=new
outer l=before

The only remaining problem is when the refered variable
is not set/declared.

$ ksh -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; f l; echo "outer l=$l"'
inner l=foo
outer l=new
$ ./Src/zsh -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; f l; echo "outer l=$l"'
inner l=new
outer l=
$ mksh -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; f l; echo "outer l=$l"'
inner l=new
outer l=
$ bash -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; f l; echo "outer l=$l"'
inner l=new
outer l=

[...]
> The potential for confusion here seems large.  If I do
> 
>  typeset -n up=$1
>  typeset var=foo
>  for up in $2 $3 $4 $5; do ...
> 
> what scope am I applying to each of the words in $2, $3, ... ?  (E.g.,
> suppose $3 is "var" instead of $1.)  Does it change if I do

I find that special behaviour of "for" quite surprising.
I wasn't aware ksh93 and bash did it as well. I find 
mksh's behaviour more consistent though I can see how it can be
difficult to do without.

$ mksh -c 'a=1 b=2; f() { typeset name; for name do typeset -n v=$name; echo "$name=$v"; done; }; f a b'
a=1
b=2

Won't work for "f name".

Seems buggy in ksh93:

$ ksh -c 'a=1 b=2; typeset -n v; for v in a b; do echo "v=$v a=$a b=$b"; done'
v=1 a=1 b=2
v=1 a=1 b=2
$ ksh -c 'a=1 b=2; typeset -n v; for v in a b; do echo "v=$v a=$a b=$b ${!v}"; done'
v=1 a=1 b=2 a
v=2 a=1 b=2 b

I would think that would be nowhere as common a usage as the
ones where a function is meant to return something into the
variable(s) passed as argument.

I can see how it can still be difficult to do without
namespacing. For instance having a function that does option
processing à la print -v var would have to do:

print() {
  local _print_{opt,raw=false}
  while getopts rv: _print_opt; do
    case $_print_opt in
      (r) _print_raw=true;;
      (v) typeset -n result=$OPTARG;;
    esac
  done
}

To avoid clashes (or the equivalent with .print.opt instead
thouh I'm not sure I see the benefit if that means one needs to
use ${.print.opt} instead of just $_print_opt).

But that would already be better than what you get in bash or
mksh.

-- 
Stephane


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-02-20 21:05                         ` Stephane Chazelas
@ 2024-02-20 22:30                           ` Bart Schaefer
  2024-02-21 20:12                             ` Stephane Chazelas
  0 siblings, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2024-02-20 22:30 UTC (permalink / raw)
  To: Zsh hackers list

On Tue, Feb 20, 2024 at 1:05 PM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> 2024-02-17 19:26:07 -0800, Bart Schaefer:
> [...]
> >  A) Src/zsh -c 'function f { typeset -n ref; ref=$1; typeset var=foo;
> > ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"'
>
> Note that the ksh93 man page mentions:
>
> ksh93> Note  that,  for this to work, the positional parameter must be
> ksh93> assigned directly to the nameref as part of the declaration command
> ksh93> [...]   For instance, typeset -n var; var=$1
> ksh93> won't cross that barrier

And yet, after replacing ksh2020 with Version AJM 93u+ 2012-08-01 (I
had already done this on a different workstation but had forgotten
about it on the one where I tested):

ksh -c 'function f { typeset -n ref; ref=$1; typeset var=foo;
ref=X; echo "$ref ${!ref} $var"; }; f var; echo "$var"'
X var foo
X

So it sure looks like the barrier was crossed in a case where the doc
stated it would not be.

With 93u+ "typeset -n ref=;" produces
  ksh[2]: f[1]: typeset: : invalid variable name
rather than crashing.

> Note that ksh93 is special in that it does static scoping.
>
> I suppose zsh's private variables are similar to that.

Sort of ... private variables are never accessible in another scope,
even with namerefs.

> [...]
> > Given that it's not possible to fix part C for zsh, and zsh agrees
> > with ksh on part B and with mksh on B and C, is it worth making an
> > effort to fix Stephane's original example along with part A ?
> [...]
>
> Seems to me if we have to work around it by namespacing
> variables or use argv like in functions/regexp-replace, that
> defeats the purpose of those namerefs.

You don't have to use namespacing, you just have to be sure that
either the caller has declared the parameter that it wants referenced,
or that the called function does the "typeset -g $1" preliminary (and
follows the doc recommendation about doing the "typeset -n" early).
There's nothing really to be done about the other cases, and
particularly not about specifying the (dynamic) scope of the referent.

> The only remaining problem is when the refered variable
> is not set/declared.

It IS possible to make that "typeset -g" implicit, but that has other
potential effects such as creating an (unset) parameter in global
scope (triggering WARN_CREATE_GLOBAL?).  If the parameter already has
been declared in a calling function scope, then even with the -g, the
nameref will point to that one, even if it's more than one level up
the call chain.  Which ought to be what you expect from dynamic
scoping.

> Contrary to mksh or bash, it alread gets this right:

Yes, as soon as an existing parameter is found the nameref records
what scope contains that parameter and looks for it there when
(de)referenced.  The glitchy bit is when there is no existing
parameter from which to obtain a scope.  I'm reluctant to try to make
this too magical, which is why I lean toward asking function authors
to explicitly do the "typeset -g $1" when that's what they mean.
Could certainly add this to the doc.

Or maybe add an option to "typeset" to turn this on or off, though I'm
not sure the existing function call framework provides all the needed
info.

> > The potential for confusion here seems large.  If I do
> >
> >  typeset -n up=$1
> >  typeset var=foo
> >  for up in $2 $3 $4 $5; do ...
> >
> > what scope am I applying
>
> I find that special behaviour of "for" quite surprising.

So did I.  I await with gritted teeth the first time that Ray forgets
to declare his loop variable with a nameref of the same name in scope.

Mixing namerefs with dynamic scoping does have a number of potential gotchas.

> I wasn't aware ksh93 and bash did it as well.

If they didn't, I wouldn't have implemented it.

> Seems buggy in ksh93:

Not all that surprising given the potential confusions.

> $ mksh -c 'a=1 b=2; f() { typeset name; for name do typeset -n v=$name; echo "$name=$v"; done; }; f a b'
> a=1
> b=2
>
> Won't work for "f name".

There's probably a way to do it with "f name a b" + shift, but I'm not
going to work it out.

> I can see how it can still be difficult to do without
> namespacing. For instance having a function that does option
> processing à la print -v var would have to do:

zparseopts and/or a declared assoc can get you a long way in this sort
of example while minimizing the opportunity for clashes.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-02-20 22:30                           ` Bart Schaefer
@ 2024-02-21 20:12                             ` Stephane Chazelas
  2024-02-29  5:16                               ` Bart Schaefer
  0 siblings, 1 reply; 48+ messages in thread
From: Stephane Chazelas @ 2024-02-21 20:12 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

2024-02-20 14:30:56 -0800, Bart Schaefer:
> On Tue, Feb 20, 2024 at 1:05 PM Stephane Chazelas <stephane@chazelas.org> wrote:
> >
> > 2024-02-17 19:26:07 -0800, Bart Schaefer:
> > [...]
> > >  A) Src/zsh -c 'function f { typeset -n ref; ref=$1; typeset var=foo;
> > > ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"'
> >
> > Note that the ksh93 man page mentions:
> >
> > ksh93> Note  that,  for this to work, the positional parameter must be
> > ksh93> assigned directly to the nameref as part of the declaration command
> > ksh93> [...]   For instance, typeset -n var; var=$1
> > ksh93> won't cross that barrier
> 
> And yet, after replacing ksh2020 with Version AJM 93u+ 2012-08-01

With recent maintained versions with all the bug fixes, it would
look something like:

$ ./arch/linux.i386-64/bin/ksh -c 'echo ${.sh.version}'
Version AJM 93u+m/1.1.0-alpha+b8e3d2a4 2024-02-17

The one on Ubuntu 22.04 has:

$ ksh -c 'echo ${.sh.version}'
Version AJM 93u+m/1.0.0-beta.2 2021-12-17

93u+ 2012-08-01 would be the one from AT&T without the later bug
fixes.

> ksh -c 'function f { typeset -n ref; ref=$1; typeset var=foo;
> ref=X; echo "$ref ${!ref} $var"; }; f var; echo "$var"'
> X var foo
> X
> 
> So it sure looks like the barrier was crossed in a case where the doc
> stated it would not be.

But in that case, you're referencing a variable from the global
scope, not the local scope from a parent function.

[...]
> > The only remaining problem is when the refered variable
> > is not set/declared.
> 
> It IS possible to make that "typeset -g" implicit, but that has other
> potential effects such as creating an (unset) parameter in global
> scope (triggering WARN_CREATE_GLOBAL?).  If the parameter already has
> been declared in a calling function scope, then even with the -g, the
> nameref will point to that one, even if it's more than one level up
> the call chain.  Which ought to be what you expect from dynamic
> scoping.
> 
> > Contrary to mksh or bash, it alread gets this right:
> 
> Yes, as soon as an existing parameter is found the nameref records
> what scope contains that parameter and looks for it there when
> (de)referenced.  The glitchy bit is when there is no existing
> parameter from which to obtain a scope.  I'm reluctant to try to make
> this too magical, which is why I lean toward asking function authors
> to explicitly do the "typeset -g $1" when that's what they mean.
> Could certainly add this to the doc.
[...]

I have no notion of how zsh variables are implemented so I'm
probably talking rubish and it's hard to think this whole thing
through, but it seems to me that for a nameref variable, upon
typeset -n ref=var, zsh should record at that point what scope
or level of the stack the var is to be found and keep that
record for the duration of the scope where that nameref is
defined or until the next typeset -n ref=anything in the same
scope.

Like:
- if var is found (declared) in the current scope, record that
  it's that of that scope that it is to get the value of or set
  or unset (and reset again thereafter)
- if not, repeat for parent scope
- if not found at all, record that it's to be at the global
  scope.

Like in:

f() { typeset a; unset a; g; echo "$a"; }
g() { h; }
h() { 
  typeset b=x
  typeset -n r1=a r2=b r3=c
  typeset a=foo c=bar
  r1=x r2=y r3=z
  unset r{1..3}
  r1=x r2=y r3=z
}
f

r1=x sets the $a of f, r2=y sets the $b of h, and r3=z sets a
global $c both times.

[...]
> So did I.  I await with gritted teeth the first time that Ray forgets
> to declare his loop variable with a nameref of the same name in scope.
[...]

Yes, difficult decision to choose between
- cleaner design, align with mksh
- align with original design in ksh93 and bash which already
  copied it (well, that part).

-- 
Stephane


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-02-21 20:12                             ` Stephane Chazelas
@ 2024-02-29  5:16                               ` Bart Schaefer
  2024-03-01 18:22                                 ` Stephane Chazelas
  0 siblings, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2024-02-29  5:16 UTC (permalink / raw)
  To: Zsh hackers list

Lost track of this for a bit ..

On Wed, Feb 21, 2024 at 12:12 PM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> Like:
> - if var is found (declared) in the current scope, record that
>   it's that of that scope that it is to get the value of or set
>   or unset (and reset again thereafter)
> - if not, repeat for parent scope
> - if not found at all, record that it's to be at the global
>   scope.

That's where this whole discussion started:  It's not good enough to
"record that it's to be global" because if a local of the same name is
later declared, there is no way to insert a new global "above" that,
it's akin to attempting to allocate new space in your caller's stack
frame in C (except there's one stack for each parameter name).  The
workaround in shell code is to start with "typeset -g a" (using your
example code) to initialize the top stack frame.  The workaround in
underlying C code would be to have "typeset -n r1=a" (again your
example) implicitly create the global $a as early as possible.

> f() { typeset a; unset a; g; echo "$a"; }
> g() { h; }
> h() {
>   typeset b=x
>   typeset -n r1=a r2=b r3=c
>   typeset a=foo c=bar
>   r1=x r2=y r3=z
>   unset r{1..3}
>   r1=x r2=y r3=z
> }
> f
>
> r1=x sets the $a of f, r2=y sets the $b of h, and r3=z sets a
> global $c both times.

It works that way for "the $a of f" and "the $b of h", but r3 assigns
"the $c of h" both times.  Change that to:

h() {
  typeset b=x
  typeset -n r1=a r2=b r3=c
  typeset a=foo
  r1=x r2=y r3=z
  typeset c=bar
  unset r{1..3}
  r1=x r2=y r3=z
}

And now the first r3=z creates the global $c so the second r3=z
continues to refer to it.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-02-29  5:16                               ` Bart Schaefer
@ 2024-03-01 18:22                                 ` Stephane Chazelas
  2024-03-01 20:34                                   ` Bart Schaefer
  2024-03-01 23:28                                   ` Up-scope named references, vs. ksh Bart Schaefer
  0 siblings, 2 replies; 48+ messages in thread
From: Stephane Chazelas @ 2024-03-01 18:22 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

2024-02-28 21:16:13 -0800, Bart Schaefer:
[...]
> That's where this whole discussion started:  It's not good enough to
> "record that it's to be global" because if a local of the same name is
> later declared, there is no way to insert a new global "above" that,
> it's akin to attempting to allocate new space in your caller's stack
> frame in C (except there's one stack for each parameter name).  The
> workaround in shell code is to start with "typeset -g a" (using your
> example code) to initialize the top stack frame.  The workaround in
> underlying C code would be to have "typeset -n r1=a" (again your
> example) implicitly create the global $a as early as possible.
[...]

Couldn't typeset -n ref=var, when it finds that var is not set
create one in global scope with a "unset-but-referenced" flag
and maybe a reference count associated with it and that can be
unset but not removed until the reference count reaches 0 when
no nameref point to it?

Another (similar) problem:

Imagine a function meant to create a variable whose name is
passed as argument *as a scalar*:

$ ./Src/zsh -c 'f() { typeset -n ref=$1; local var=2; ref=3; }; typeset -A var; f var; echo $var'
f: var: attempt to set slice of associative array

Doesn't work if the variable was previously set as a hash.

Work around would be to unset it first but:

$ ./Src/zsh -c 'f() { typeset -n ref=$1; unset ref; local var=2; ref=3; }; typeset -A var; f var; echo $var'
<empty>

$ref ended up pointing to the local var again instead of the one
in global scope it was intended to reference.

Comparing with ksh93:

$ ksh -c 'function f { typeset -n ref=$1; typeset var=2; ref=3; }; typeset -A var=([a]=b); f var; typeset -p var'
typeset -A var=([0]=3 [a]=b)
$ ksh -c 'function f { typeset -n ref=$1; unset ref; typeset var=2; ref=3; }; typeset -A var=([a]=b); f var; typeset -p var'
var=3

-- 
Stephane


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-03-01 18:22                                 ` Stephane Chazelas
@ 2024-03-01 20:34                                   ` Bart Schaefer
  2024-03-02  7:29                                     ` Bart Schaefer
  2024-03-01 23:28                                   ` Up-scope named references, vs. ksh Bart Schaefer
  1 sibling, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2024-03-01 20:34 UTC (permalink / raw)
  To: Bart Schaefer, Zsh hackers list

On Fri, Mar 1, 2024 at 10:22 AM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> 2024-02-28 21:16:13 -0800, Bart Schaefer:
> > [...] The workaround in
> > underlying C code would be to have "typeset -n r1=a" (again your
> > example) implicitly create the global $a as early as possible.
> [...]
>
> Couldn't typeset -n ref=var, when it finds that var is not set
> create one in global scope with a "unset-but-referenced" flag

Yes, that's what I just said.

> and maybe a reference count associated with it

Reference counting ends up being a pretty significant refactoring of
the parameter code, we started down that path some time ago and didn't
get satisfactory results.

> Comparing with ksh93:

You really can't, because ksh93 uses static scoping of locals where
zsh uses dynamic scoping, and ksh also has special handling of
"typeset -n ref=$1" (and other positionals) that is simply never going
to happen in zsh.

The upshot is that if we go with implicitly creating a variable in
global scope, it doesn't solve the other reference issues -- so the
question is, when we document the pitfalls of using upward-pointing
named references, do we also break the dynamic scope paradigm for the
specific case of nonexistent globals?

> Imagine a function meant to create a variable whose name is passed as argument

That can't cause a serious difficulty because if you have such a
function, the caller must know that the argument is a parameter name,
and therefore must be able to create the parameter in the correct
scope in advance.  It's just not as syntactically sugary as ksh.

Note that all of these situations also exist with builtins like zstat
and even "print -v" that accept parameter names as arguments.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-03-01 18:22                                 ` Stephane Chazelas
  2024-03-01 20:34                                   ` Bart Schaefer
@ 2024-03-01 23:28                                   ` Bart Schaefer
  2024-03-03 13:44                                     ` Stephane Chazelas
  1 sibling, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2024-03-01 23:28 UTC (permalink / raw)
  To: Zsh hackers list

On Fri, Mar 1, 2024 at 10:22 AM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> $ ./Src/zsh -c 'f() { typeset -n ref=$1; unset ref; local var=2; ref=3; }; typeset -A var; f var; echo $var'
> <empty>

That's an actual bug, regardless of the rest of this discussion.

diff --git a/Src/builtin.c b/Src/builtin.c
index 44dfed651..83144677b 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -3932,8 +3932,14 @@ bin_unset(char *name, char **argv, Options ops, int func)
            }
        } else {
            if (!OPT_ISSET(ops,'n')) {
+               int ref = (pm->node.flags & PM_NAMEREF);
                if (!(pm = (Param)resolve_nameref(pm, NULL)))
                    continue;
+               if (ref && pm->level < locallevel) {
+                   /* Just mark unset, do not remove from table */
+                   pm->node.flags |= PM_DECLARED|PM_UNSET;
+                   continue;
+               }
            }
            if (unsetparam_pm(pm, 0, 1))
                returnval = 1;


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-03-01 20:34                                   ` Bart Schaefer
@ 2024-03-02  7:29                                     ` Bart Schaefer
  2024-03-02 23:55                                       ` [PATCH] "typeset -nu" (was Re: Up-scope named references, vs. ksh) Bart Schaefer
  0 siblings, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2024-03-02  7:29 UTC (permalink / raw)
  To: Bart Schaefer, Zsh hackers list

On Wed, Feb 28, 2024 at 9:16 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> it's akin to attempting to allocate new space in your caller's stack
> frame in C (except there's one stack for each parameter name).

On Fri, Mar 1, 2024 at 12:34 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> zsh uses dynamic scoping, and ksh also has special handling of
> "typeset -n ref=$1" (and other positionals) that is simply never going
> to happen in zsh.

How about this:  Instead of magically turning a positional into a
reference into the surrounding scope, let's make it explicit.

I propose that when the -u flag is combined with the -n flag, the
named reference refers to the enclosing scope, even if there's a local
of that name already in scope.  Otherwise (without -u) normal dynamic
scoping applies.  So in an example like this:

y () {
  typeset $1=Y
  typeset -nu upref=$1
  upref=UP
  echo In y:
  typeset -p $1
}
z () {
  local var
  y var
  echo In z:
  typeset -p var
}

The output would be:

In y:
typeset var=Y
In z:
typeset var=UP

However, there's no good way around the stack problem, so removing the
declaration in z --

z () {
  y var
  echo In z:
  typeset -p var
}

-- will cause an error at the "upref=UP" assignment in y.  Or that
assignment could just silently fail, which already happens in a couple
of existing tests.  The decision on that probably rests on what to do
with prefix assignments, e.g., if it's an error then

upref=UP /bin/echo bad

would print a message and not execute at all.  If instead the
assignment does nothing but set $?, then /bin/echo will execute and
its status will determine $?.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH] "typeset -nu" (was Re: Up-scope named references, vs. ksh)
  2024-03-02  7:29                                     ` Bart Schaefer
@ 2024-03-02 23:55                                       ` Bart Schaefer
  0 siblings, 0 replies; 48+ messages in thread
From: Bart Schaefer @ 2024-03-02 23:55 UTC (permalink / raw)
  To: Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 870 bytes --]

On Fri, Mar 1, 2024 at 11:29 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> How about this:  Instead of magically turning a positional into a
> reference into the surrounding scope, let's make it explicit.

Here's a patch to implement this, including doc with an example.

Of particular note is the last sentence here:

+To force a named reference to refer to the outer scope, even if a local
+has already been declared, add the tt(-u) option when declaring the
+named reference.  In this case var(rname) should already exist in the
+outer scope, otherwise the behavior of assignment through var(pname)
+is not defined and may change the scope of the reference or fail with
+a status of 1.

I went with assignment silently failing rather than printing an error
message in the cases where a parameter in the calling scope cannot be
created.

[-- Attachment #2: typeset-nu.txt --]
[-- Type: text/plain, Size: 13162 bytes --]

diff --git a/Doc/Zsh/builtins.yo b/Doc/Zsh/builtins.yo
index 6318053d8..7a9684ac8 100644
--- a/Doc/Zsh/builtins.yo
+++ b/Doc/Zsh/builtins.yo
@@ -2052,13 +2052,17 @@ cindex(named reference)
 cindex(reference, named)
 The flag tt(-n) creates a em(named reference) to another parameter.
 The second parameter need not exist at the time the reference is
-created.  Only tt(-g) and tt(-r) may be used in conjunction with
+created.  Only tt(-g), tt(-u), and tt(-r) may be used in conjunction with
 tt(-n).  The var(name) so created may not be an array element nor use
 a subscript, but the var(value) assigned may be any valid parameter
 name syntax, even a subscripted array element (including an associative
 array element) or an array slice, which is evaluated when the named
 reference is expanded.  It is an error for a named reference to refer
-to itself, even indirectly through a chain of references.
+to itself, even indirectly through a chain of references.  When tt(-u)
+is applied to a named reference, the parameter identified by var(value)
+is always found in the calling function scope rather than the current
+local scope.  In this case, if there is no such parameter in the calling
+scope, assignments to the named reference may fail, setting tt($?) to 1.
 See ifzman(zmanref(zshexpn))ifnzman(noderef(Parameter Expansion)) and
 ifzman(zmanref(zshparam))ifnzman(noderef(Parameters)) for details of the
 behavior of named references.
diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 2acfd08c9..183ca6e03 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -1600,6 +1600,25 @@ example(tt(before local: OUTER)
 tt(by reference: OUTER)
 tt(after func: RESULT))
 
+To force a named reference to refer to the outer scope, even if a local
+has already been declared, add the tt(-u) option when declaring the
+named reference.  In this case var(rname) should already exist in the
+outer scope, otherwise the behavior of assignment through var(pname)
+is not defined and may change the scope of the reference or fail with
+a status of 1.  Example of correct usage:
+ifzman()
+example(tt(caller=OUTER)
+tt(func LPAR()RPAR() {)
+tt(  print before local: $caller)
+tt(  local caller=INNER)
+tt(  print after local: $caller)
+tt(  typeset -n -u outer=$1)
+tt(  print by reference: $outer)
+tt(  outer=RESULT)
+tt(})
+tt(func caller)
+tt(print after func: $caller))
+
 Note, however, that named references to em(special) parameters acquire
 the behavior of the special parameter, regardless of the scope where
 the reference is declared.
diff --git a/Doc/Zsh/func.yo b/Doc/Zsh/func.yo
index d4914df7a..7b71e34e9 100644
--- a/Doc/Zsh/func.yo
+++ b/Doc/Zsh/func.yo
@@ -31,10 +31,12 @@ referent parameter is in scope, and as early as possible in the
 function if the reference is to a parameter in a calling scope.
 
 A typical use of named references is to pass the name
-of the referent as a positional parameter.  For example,
+of the referent as a positional parameter.  In this case it is
+good practice to use the tt(-u) option to reference the calling
+scope.  For example,
 ifzman()
 example(pop+LPAR()RPAR() {
-  local -n ref=$1
+  local -nu ref=$1
   local last=$ref[$#ref]
   ref[$#ref]=LPAR()RPAR()
   print -r -- $last
@@ -43,9 +45,10 @@ array=LPAR() a list of five values RPAR()
 pop array)
 
 prints the word `tt(values)' and shortens `tt($array)' to
-`tt(LPAR() a list of five RPAR())'.  There are no local parameters in
-tt(pop) at the time `tt(ref=$1)' is assigned, so `tt(ref)' becomes a
-reference to `tt(array)' in the caller.
+`tt(LPAR() a list of five RPAR())'.  With tt(-nu), `tt(ref)' becomes a
+reference to `tt(array)' in the caller.  There are no local parameters in
+tt(pop) at the time `tt(ref=$1)' is assigned, so in this example tt(-u)
+could have been omitted, but it makes the intention clear.
 
 Functions execute in the same process as the caller and
 share all files
diff --git a/Doc/Zsh/mod_ksh93.yo b/Doc/Zsh/mod_ksh93.yo
index 9cd708d10..7508758aa 100644
--- a/Doc/Zsh/mod_ksh93.yo
+++ b/Doc/Zsh/mod_ksh93.yo
@@ -12,7 +12,7 @@ The single builtin provided by this module is:
 startitem()
 findex(nameref)
 cindex(named references, creating)
-item(tt(nameref) [ tt(-r) ] var(pname)[tt(=)var(rname)])(
+item(tt(nameref) [ tt(-gur) ] var(pname)[tt(=)var(rname)])(
 Equivalent to tt(typeset -n )var(pname)tt(=)var(rname)
 
 However, tt(nameref) is a builtin command rather than a reserved word,
diff --git a/Etc/FAQ.yo b/Etc/FAQ.yo
index 145ef02c9..4a86050e6 100644
--- a/Etc/FAQ.yo
+++ b/Etc/FAQ.yo
@@ -1025,6 +1025,13 @@ label(210)
     HIT:SPOT
   )
 
+  Dynamic scoping applies to named references, so for example a named
+  reference declared in global scope may be used in function scopes.
+  In ksh, local parameters have static scope, so named references in
+  zsh may have side-effects that do not occur in ksh.  To limit those
+  effects, mytt(zmodload zsh/param/private) and declare all named
+  references mytt(private).
+
   Named references may be used in zsh versions later than 5.9.
 
 sect(What is zsh's support for non-forking command substitution?)
diff --git a/Src/Modules/ksh93.c b/Src/Modules/ksh93.c
index 9af5e1d69..6760cbca0 100644
--- a/Src/Modules/ksh93.c
+++ b/Src/Modules/ksh93.c
@@ -38,7 +38,7 @@
  */
 
 static struct builtin bintab[] = {
-    BUILTIN("nameref", BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "gr", "n")
+    BUILTIN("nameref", BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "gur", "n")
 };
 
 #include "zsh.mdh"
diff --git a/Src/builtin.c b/Src/builtin.c
index 83144677b..ba9cb03c2 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -2699,7 +2699,7 @@ bin_typeset(char *name, char **argv, LinkList assigns, Options ops, int func)
 	    off |= bit;
     }
     if (OPT_MINUS(ops,'n')) {
-	if ((on|off) & ~PM_READONLY) {
+	if ((on|off) & ~(PM_READONLY|PM_UPPER)) {
 	    zwarnnam(name, "no other attributes allowed with -n");
 	    return 1;
 	}
diff --git a/Src/exec.c b/Src/exec.c
index d85adbea5..8be7bf17d 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1322,7 +1322,7 @@ execsimple(Estate state)
 	    fputc('\n', xtrerr);
 	    fflush(xtrerr);
 	}
-	lv = (errflag ? errflag : cmdoutval);
+	lv = (errflag ? errflag : cmdoutval ? cmdoutval : lastval);
     } else {
 	int q = queue_signal_level();
 	dont_queue_signals();
@@ -2604,6 +2604,7 @@ addvars(Estate state, Wordcode pc, int addflags)
 		opts[ALLEXPORT] = allexp;
 	    } else
 	    	pm = assignsparam(name, val, myflags);
+	    lastval = !pm; /* zerr("%s: assignment failed", name); */
 	    if (errflag) {
 		state->pc = opc;
 		return;
@@ -2628,7 +2629,9 @@ addvars(Estate state, Wordcode pc, int addflags)
 	    }
 	    fprintf(xtrerr, ") ");
 	}
-	assignaparam(name, arr, myflags);
+	;
+	lastval = !assignaparam(name, arr, myflags);
+	/* zerr("%s: array assignment failed", name); */
 	if (errflag) {
 	    state->pc = opc;
 	    return;
diff --git a/Src/params.c b/Src/params.c
index 064dbd2bc..790c978f2 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -1060,8 +1060,7 @@ createparam(char *name, int flags)
 			  "BUG: local parameter is not unset");
 		    oldpm = lastpm;
 		}
-	    } else
-		flags |= PM_NAMEREF;
+	    }
 	}
 
 	DPUTS(oldpm && oldpm->level > locallevel,
@@ -6264,10 +6263,12 @@ resolve_nameref(Param pm, const Asgment stop)
 	    }
 	} else if ((hn = gethashnode2(realparamtab, seek))) {
 	    if (pm) {
-		if (!(stop && (stop->flags & (PM_LOCAL))))
-		    hn = (HashNode)upscope((Param)hn,
-					   ((pm->node.flags & PM_NAMEREF) ?
-					    pm->base : ((Param)hn)->level));
+		if (!(stop && (stop->flags & (PM_LOCAL)))) {
+		    int scope = ((pm->node.flags & PM_NAMEREF) ?
+				 ((pm->node.flags & PM_UPPER) ? -1 :
+				  pm->base) : ((Param)hn)->level);
+		    hn = (HashNode)upscope((Param)hn, scope);
+		}
 		/* user can't tag a nameref, safe for loop detection */
 		pm->node.flags |= PM_TAGGED;
 	    }
@@ -6313,11 +6314,13 @@ setloopvar(char *name, char *value)
 static void
 setscope(Param pm)
 {
-    if (pm->node.flags & PM_NAMEREF) {
+    queue_signals();
+    if (pm->node.flags & PM_NAMEREF) do {
 	Param basepm;
 	struct asgment stop;
 	char *refname = GETREFNAME(pm);
 	char *t = refname ? itype_end(refname, INAMESPC, 0) : NULL;
+	int q = queue_signal_level();
 
 	/* Temporarily change nameref to array parameter itself */
 	if (t && *t == '[')
@@ -6327,9 +6330,11 @@ setscope(Param pm)
 	stop.name = "";
 	stop.value.scalar = NULL;
 	stop.flags = PM_NAMEREF;
-	if (locallevel)
+	if (locallevel && !(pm->node.flags & PM_UPPER))
 	    stop.flags |= PM_LOCAL;
+	dont_queue_signals();	/* Prevent unkillable loops */
 	basepm = (Param)resolve_nameref(pm, &stop);
+	restore_queue_signals(q);
 	if (t) {
 	    pm->width = t - refname;
 	    *t = '[';
@@ -6342,7 +6347,7 @@ setscope(Param pm)
 			if (upscope(pm, pm->base) == pm) {
 			    zerr("%s: invalid self reference", refname);
 			    unsetparam_pm(pm, 0, 1);
-			    return;
+			    break;
 			}
 			pm->node.flags &= ~PM_SELFREF;
 		    } else if (pm->base == pm->level) {
@@ -6350,7 +6355,7 @@ setscope(Param pm)
 			    strcmp(pm->node.nam, refname) == 0) {
 			    zerr("%s: invalid self reference", refname);
 			    unsetparam_pm(pm, 0, 1);
-			    return;
+			    break;
 			}
 		    }
 		} else if ((t = GETREFNAME(basepm))) {
@@ -6358,7 +6363,7 @@ setscope(Param pm)
 			strcmp(pm->node.nam, t) == 0) {
 			zerr("%s: invalid self reference", refname);
 			unsetparam_pm(pm, 0, 1);
-			return;
+			break;
 		    }
 		}
 	    } else
@@ -6378,7 +6383,8 @@ setscope(Param pm)
 	    zerr("%s: invalid self reference", refname);
 	    unsetparam_pm(pm, 0, 1);
 	}
-    }
+    } while (0);
+    unqueue_signals();
 }
 
 /**/
@@ -6390,6 +6396,8 @@ upscope(Param pm, int reflevel)
 	pm = up;
 	up = up->old;
     }
+    if (reflevel < 0 && locallevel > 0)
+	return pm->level == locallevel ? up : pm;
     return pm;
 }
 
diff --git a/Test/K01nameref.ztst b/Test/K01nameref.ztst
index ff48e2289..6db819389 100644
--- a/Test/K01nameref.ztst
+++ b/Test/K01nameref.ztst
@@ -492,7 +492,6 @@ F:unexpected side-effects of previous tests
  }
  typeset -p ptr2
 1:up-reference part 5, stacked namerefs, end not in scope
-F:What is the correct behavior for the scope of ptr1?
 >typeset -n ptr1=ptr2
 >typeset -n ptr2
 >ptr1=ptr2
@@ -529,6 +528,49 @@ F:Same test, should part 5 output look like this?
 >typeset -n ptr2
 >typeset ptr2=val
 
+ () {
+   () {
+     local var
+     typeset -nu ptr1=var
+     ptr1=outer && print -u2 assignment expected to fail
+     typeset -n ptr2=var
+     ptr2=inner
+     typeset -n
+     printf "%s=%s\n" ptr1 "$ptr1" ptr2 "$ptr2"
+   }
+   typeset -p var
+ }
+ typeset -p var
+1:up-reference part 7, upscope namerefs, end not in scope
+>ptr1=var
+>ptr2=var
+>ptr1=
+>ptr2=inner
+*?*typeset*: no such variable: var
+?*typeset*: no such variable: var
+
+ typeset var
+ () {
+   () {
+     local var
+     typeset -nu ptr1=var
+     ptr1=outer || print -u2 assignment expected to succeed
+     typeset -n ptr2=var
+     ptr2=inner
+     typeset -n
+     printf "%s=%s\n" ptr1 "$ptr1" ptr2 "$ptr2"
+   }
+   typeset -p var
+ }
+ typeset -p var
+0:up-reference part 8, upscope namerefs, end in scope
+>ptr1=var
+>ptr2=var
+>ptr1=outer
+>ptr2=inner
+>typeset -g var=outer
+>typeset var=outer
+
  if zmodload zsh/parameter; then
  () {
    zmodload -u zsh/parameter
@@ -539,7 +581,7 @@ F:Same test, should part 5 output look like this?
  }
  else ZTST_skip='Cannot zmodload zsh/parameter, skipping autoload test'
  fi
-0:up-reference part 3, autoloading with hidden special
+0:up-reference part 9, autoloading with hidden special
 >nameref-local-nameref-local
 >typeset -h parameters
 
@@ -762,12 +804,17 @@ F:relies on global TYPESET_TO_UNSET in %prep
 
  bar=xx
  typeset -n foo=bar
- () { typeset -n foo; foo=zz; foo=zz; print $bar $zz }
+ () {
+   typeset -n foo; foo=zz
+   foo=zz || print -u2 foo: assignment failed
+   print $bar $zz
+ }
  () { typeset -n foo; foo=zz; local zz; foo=zz; print $bar $zz }
 0:regression: local nameref may not in-scope a global parameter
 F:previously this could create an infinite recursion and crash
 >xx
 >xx zz
+*?*foo: assignment failed
 
  typeset -nm foo=bar
 1:create nameref by pattern match not allowed
diff --git a/Test/V10private.ztst b/Test/V10private.ztst
index efa346002..ed51316f3 100644
--- a/Test/V10private.ztst
+++ b/Test/V10private.ztst
@@ -378,7 +378,7 @@ F:Here ptr1 finds private ptr2 by scope mismatch
    typeset -p ptr1 ptr2
    typeset val=LOCAL
    () {
-     ptr1=val		# This is a silent no-op, why?
+     ptr1=val || print -u2 ptr1: assignment failed
      typeset -n
      printf "%s=%s\n" ptr1 "$ptr1" ptr2 "$ptr2"
    }
@@ -388,7 +388,6 @@ F:Here ptr1 finds private ptr2 by scope mismatch
 1:up-reference for private namerefs, end not in scope
 F:See K01nameref.ztst up-reference part 5
 F:Here ptr1 finds private ptr2 by scope mismatch
-F:Assignment silently fails, is that correct?
 >typeset -n ptr1=ptr2
 >typeset -hn ptr2=''
 >ptr1=ptr2
@@ -396,6 +395,7 @@ F:Assignment silently fails, is that correct?
 >ptr2=
 >typeset -n ptr1=ptr2
 >typeset -hn ptr2=''
+*?*ptr1: assignment failed
 *?*no such variable: ptr2
 
  typeset ptr2

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-03-01 23:28                                   ` Up-scope named references, vs. ksh Bart Schaefer
@ 2024-03-03 13:44                                     ` Stephane Chazelas
  2024-03-03 19:04                                       ` Bart Schaefer
  0 siblings, 1 reply; 48+ messages in thread
From: Stephane Chazelas @ 2024-03-03 13:44 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

2024-03-01 15:28:58 -0800, Bart Schaefer:
> On Fri, Mar 1, 2024 at 10:22 AM Stephane Chazelas <stephane@chazelas.org> wrote:
> >
> > $ ./Src/zsh -c 'f() { typeset -n ref=$1; unset ref; local var=2; ref=3; }; typeset -A var; f var; echo $var'
> > <empty>
> 
> That's an actual bug, regardless of the rest of this discussion.
[...]

If that one can be fixed then why can't the main one, namely:

assign() {
  typeset -n var=$1
  typeset    value=$2
  var=$value
}
assign value x
echo $value

(or assign var x returning an error)

be fixed the same way, that is typeset -n ref=var doing the
equivalent of assigning var and unsetting afterwards if it is
previously not found? 

-- 
Stephane


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-03-03 13:44                                     ` Stephane Chazelas
@ 2024-03-03 19:04                                       ` Bart Schaefer
  2024-03-03 20:27                                         ` Stephane Chazelas
  0 siblings, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2024-03-03 19:04 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, Mar 3, 2024 at 5:44 AM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> If that one can be fixed then why can't the main one

Because the outer-scope parameter object already exists in "that one".
As I keep saying, the problem is with creating a new parameter at a
lower scope (counting 0 as the global scope) when a parameter of that
name already exists at the current (greater than 0) scope.

>   typeset -n var=$1
>   typeset    value=$2

If the parameter is magically created "in advance" (at the point of
"typeset -n") then it only works for global scalars -- if you try to
create an array using "var" you run into baffling stuff like this:

assign:3: value: attempt to assign array value to non-array

I think it's much more consistent for dynamic scoping to apply at the
time $var is referenced or assigned.

The patch in workers/52650 allows something like this to work:

assign() {
  typeset value=$2
  typeset -nu var=$1 # ignore local $value
  case ${(t)var} in
  (*scalar*)  var=$value;;
  (*array*)  var=( $value );;
  (*) print -u2 cannot assign;;
  esac
}
declare -a value
assign value x

If a function wants to be able to actually add things to its calling
scope without fear of name clashes, it has to use unique local names
(e.g., a namespace).  But with dynamic scoping it remains the case
that you can't add a name to the global scope if any function in the
call chain has pre-empted that name -- named references don't change
this basic rule.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-03-03 19:04                                       ` Bart Schaefer
@ 2024-03-03 20:27                                         ` Stephane Chazelas
  2024-03-03 22:58                                           ` Bart Schaefer
  0 siblings, 1 reply; 48+ messages in thread
From: Stephane Chazelas @ 2024-03-03 20:27 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

2024-03-03 11:04:03 -0800, Bart Schaefer:
[...]
> >   typeset -n var=$1
> >   typeset    value=$2
> 
> If the parameter is magically created "in advance" (at the point of
> "typeset -n") then it only works for global scalars

Why would we need it for any other scope? If the parameter is
not found in any scope, then it must be created in global scope.

That's what "read var" does. Many if not most of the usages of
namerefs I've come across were to implement "read"-like commands
that return or can return something in a variable whose name is given by the
caller (and that can create the variable if needed).

> if you try to
> create an array using "var" you run into baffling stuff like this:
> 
> assign:3: value: attempt to assign array value to non-array
> 

assign() {
  if [[ -v $1 ]]; then
    typeset -n var=$1
  else
    eval "$1="
    typeset -n var=$1
    unset var
  fi
  typeset value=$2
  var=(var now set with value $value)
}
assign value x
typeset -p value


Where an approximation of what I'm suggesting is implemented in
zsh seems to work fine (with your patch applied).

It segfaults with zsh -x though

$ ./Src/zsh  ~/a
typeset -a value=( var now set with value x )
$ gdb -q --args ./Src/zsh -x  ~/a
Reading symbols from ./Src/zsh...
(gdb) r
Starting program: /home/chazelas/install/cvs/zsh/Src/zsh -x /home/chazelas/a
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
+/home/chazelas/a:12> assign value x
+assign:1> [[ -v value ]]
+assign:4> eval 'value='
+(eval):1> value=''
+assign:5> typeset -n var=value
+assign:6> unset var
+assign:8> typeset value=x

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7cfb3fe in __GI___libc_free (mem=0x555555600) at ./malloc/malloc.c:3368
3368    ./malloc/malloc.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7cfb3fe in __GI___libc_free (mem=0x555555600) at ./malloc/malloc.c:3368
#1  0x00005555555d5105 in zsfree (p=0x555555600 <error: Cannot access memory at address 0x555555600>) at mem.c:1878
#2  0x0000555555625482 in freearray (s=0x555555683d88) at utils.c:4100
#3  0x00005555555ea0e6 in arrsetfn (pm=0x555555683da0, x=0x555555683f40) at params.c:3996
#4  0x00005555555e5691 in setarrvalue (v=0x7fffffffc360, val=0x555555683f40) at params.c:2889
#5  0x00005555555e8537 in assignaparam (s=0x7ffff7fac733 "", val=0x555555683f40, flags=6) at params.c:3538
#6  0x00005555555943f7 in addvars (state=0x7fffffffcda0, pc=0x55555567d508, addflags=0) at exec.c:2631
#7  0x000055555559639b in execcmd_exec (state=0x7fffffffcda0, eparams=0x7fffffffc9b0, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3363
#8  0x00005555555926c8 in execpline2 (state=0x7fffffffcda0, pcode=643, how=18, input=0, output=0, last1=0) at exec.c:2016
#9  0x000055555559125c in execpline (state=0x7fffffffcda0, slcode=10242, how=18, last1=0) at exec.c:1741
#10 0x000055555559049e in execlist (state=0x7fffffffcda0, dont_change_job=1, exiting=0) at exec.c:1495
#11 0x000055555558fabe in execode (p=0x55555567d3f0, dont_change_job=1, exiting=0, context=0x5555556310da "shfunc") at exec.c:1276
#12 0x000055555559e87a in runshfunc (prog=0x55555567d3f0, wrap=0x0, name=0x7ffff7fac170 "assign") at exec.c:6164
#13 0x000055555559dd8a in doshfunc (shfunc=0x55555567d570, doshargs=0x7ffff7fb7b18, noreturnval=0) at exec.c:6010
#14 0x000055555559ca1d in execshfunc (shf=0x55555567d570, args=0x7ffff7fb7b18) at exec.c:5548
#15 0x0000555555598a9c in execcmd_exec (state=0x7fffffffdad0, eparams=0x7fffffffd6e0, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:4075
#16 0x00005555555926c8 in execpline2 (state=0x7fffffffdad0, pcode=835, how=18, input=0, output=0, last1=0) at exec.c:2016
#17 0x000055555559125c in execpline (state=0x7fffffffdad0, slcode=5122, how=18, last1=0) at exec.c:1741
#18 0x000055555559049e in execlist (state=0x7fffffffdad0, dont_change_job=0, exiting=0) at exec.c:1495
#19 0x000055555558fabe in execode (p=0x7ffff7fb7a30, dont_change_job=0, exiting=0, context=0x555555632b86 "toplevel") at exec.c:1276
#20 0x00005555555b7cb0 in loop (toplevel=1, justonce=0) at init.c:212
#21 0x00005555555bc7a9 in zsh_main (argc=3, argv=0x7fffffffddf8) at init.c:1934
#22 0x000055555556c99d in main (argc=3, argv=0x7fffffffddf8) at ./main.c:93
(gdb)

> I think it's much more consistent for dynamic scoping to apply at the
> time $var is referenced or assigned.
> 
> The patch in workers/52650 allows something like this to work:
> 
> assign() {
>   typeset value=$2
>   typeset -nu var=$1 # ignore local $value
>   case ${(t)var} in
>   (*scalar*)  var=$value;;
>   (*array*)  var=( $value );;
>   (*) print -u2 cannot assign;;
>   esac
> }
> declare -a value
> assign value x
> 
> If a function wants to be able to actually add things to its calling
> scope without fear of name clashes, it has to use unique local names
> (e.g., a namespace).  But with dynamic scoping it remains the case
> that you can't add a name to the global scope if any function in the
> call chain has pre-empted that name -- named references don't change
> this basic rule.

I really don't follow, dynamic scoping should make things easier
here.

If the caller does "local var; assign var value", then obviously
they want value to be assigned to their var, not have one
created in the global scope instead. It's easy in a shell with
dynamic scoping as a function sees the local variables of its
caller anyway (for instance, my [[ -v $1 ]] above will see).

That typeset -nu var is useful indeed as it allows one to
declare local variables *before* tying namerefs to variables,
but having namerefs that can't create globals seems like a very
severe limitations to me, all other shells with namerefs can do
it. zsh can do it except for that possible clash with local
variables declared *after* which it seems to me should be
avoidable.

-- 
Stephane


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-03-03 20:27                                         ` Stephane Chazelas
@ 2024-03-03 22:58                                           ` Bart Schaefer
  2024-03-04 19:59                                             ` Stephane Chazelas
  2024-03-05  1:05                                             ` Oliver Kiddle
  0 siblings, 2 replies; 48+ messages in thread
From: Bart Schaefer @ 2024-03-03 22:58 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, Mar 3, 2024 at 12:27 PM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> 2024-03-03 11:04:03 -0800, Bart Schaefer:
> >
> > If the parameter is magically created "in advance" (at the point of
> > "typeset -n") then it only works for global scalars
>
> Why would we need it for any other scope? If the parameter is
> not found in any scope, then it must be created in global scope.
>
> That's what "read var" does.

The word "scalar" is just as important in that sentence as the word
"global" -- "read" and "zstat" can specify whether the variable is
scalar or array, you can't do that with a nameref.  Furthermore, if
there's a local in scope at the point where "read var" is called, it
assigns to the local, even if the local is unset.  The scope is
determined at the point of assignment.

Locals consume the space allocated in the parameter table for the
corresponding name.  This is a longstanding, conscious decision in the
parameter implementation, I even quoted a code comment to that effect
in the thread about unsetting tied parameters.  With -u, a nameref can
skip over one local, but it can't "make room" for a global that didn't
already exist. I've tried several different ways to explain this, so
I'm just going to ignore this going forward.

> Many if not most of the usages of
> namerefs I've come across were to implement "read"-like commands
> that return or can return something in a variable whose name is given by the
> caller (and that can create the variable if needed).

There's nothing in the current implementation that prevents you from
doing that if you choose your local names appropriately (and if
nothing above you in the call chain hasn't already co-opted the name).

> Where an approximation of what I'm suggesting is implemented in
> zsh seems to work fine (with your patch applied).

But that approach doesn't work in the general case!  If you move
  typeset value=$2
to above the [[ -v $1 ]], then it either bombs, or assigns to the
"wrong" $value.

There's no way to magically get this right in the internal
implementation, it has to be handled in shell code.

> It segfaults with zsh -x though

I'll have a look at that later, thanks.

> > If a function wants to be able to actually add things to its calling
> > scope without fear of name clashes, it has to use unique local names
> > (e.g., a namespace).  But with dynamic scoping it remains the case
> > that you can't add a name to the global scope if any function in the
> > call chain has pre-empted that name -- named references don't change
> > this basic rule.
>
> I really don't follow, dynamic scoping should make things easier
> here.

Consider the case where your function is called 2 or more levels
removed from the global scope.

> but having namerefs that can't create globals seems like a very
> severe limitations to me

It's the same limitation as
() {
  local foo
  unset foo
  () { foo=bar }
}
which despite the appearance of the assignment in the nested function,
doesn't create a global foo, it resurrects the caller's local foo.
Ksh has static local scope (and an entirely different internal
representation of parameters) so the local foo makes no difference to
the nested function.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-03-03 22:58                                           ` Bart Schaefer
@ 2024-03-04 19:59                                             ` Stephane Chazelas
  2024-03-05  1:05                                             ` Oliver Kiddle
  1 sibling, 0 replies; 48+ messages in thread
From: Stephane Chazelas @ 2024-03-04 19:59 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

2024-03-03 14:58:04 -0800, Bart Schaefer:
[...]
> I've tried several different ways to explain this, so
> I'm just going to ignore this going forward.

Sorry for being so thick but I still don't get it. As there's
still a chance it's just we have different expectations, I'll
try a last attempt at clarifying mine below

[...]
> > Many if not most of the usages of
> > namerefs I've come across were to implement "read"-like commands
> > that return or can return something in a variable whose name is given by the
> > caller (and that can create the variable if needed).
> 
> There's nothing in the current implementation that prevents you from
> doing that if you choose your local names appropriately (and if
> nothing above you in the call chain hasn't already co-opted the name).

If we still have to namespace the names of the nameref variables
and any local variables of functions that use namerefs, then we
might as well do like in bash/mksh and just follow by name.

> > Where an approximation of what I'm suggesting is implemented in
> > zsh seems to work fine (with your patch applied).
> 
> But that approach doesn't work in the general case!  If you move
>   typeset value=$2
> to above the [[ -v $1 ]], then it either bombs, or assigns to the
> "wrong" $value.

To me,

typeset -n ref=var

should tie the ref to the var that is currently in scope at the
time ref is assigned var. If there's none in scope at the
moment, then it should be created unset and undeclared at the
global scope at that point, so that at the next ref=value, it's
that one that is set and not whatever variable with the same
name happened to appear at whatever scope that ref=value was
run.

In:

var=0
f() {
  local var=1
  nameref ref=var
  echo "$var"
}
f

I expect ref to poing to the local var and get "1" because
that's the var that was in scope at the time of ref=var

In:

var=0
f() {
  nameref ref=var
  local var=1
  echo "$var"
}
f

I expect 0.

In

f() {
  nameref ref=var
  local var=1
  echo "${var-unset}"
  ref=2
}
f
echo "$var"

I expect unset and 2
 
[...]
> > I really don't follow, dynamic scoping should make things easier
> > here.
> 
> Consider the case where your function is called 2 or more levels
> removed from the global scope.

In:

var=0
f1() {
  local var=1
  f2
}
f2() {
  f3
  local var=2
}
f3() {
  nameref ref=var
  echo "$ref"
}

I expect to see 1 as that's the var that is in scope at the time of ref=var.

> 
> > but having namerefs that can't create globals seems like a very
> > severe limitations to me
> 
> It's the same limitation as
> () {
>   local foo
>   unset foo
>   () { foo=bar }
> }
> which despite the appearance of the assignment in the nested function,
> doesn't create a global foo, it resurrects the caller's local foo.

Which is very much what I expect and agree it would be very wrong for that to
create a global foo.

> Ksh has static local scope (and an entirely different internal
> representation of parameters) so the local foo makes no difference to
> the nested function.

(unless the local foo is expected or the nest function uses
Bourne-style instead of Korn-style function definition) but yes,
I any case I agree with your description of how scoping would
work in ksh.

-- 
Stephane


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-03-03 22:58                                           ` Bart Schaefer
  2024-03-04 19:59                                             ` Stephane Chazelas
@ 2024-03-05  1:05                                             ` Oliver Kiddle
  2024-03-05  2:53                                               ` Bart Schaefer
  1 sibling, 1 reply; 48+ messages in thread
From: Oliver Kiddle @ 2024-03-05  1:05 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> The word "scalar" is just as important in that sentence as the word
> "global" -- "read" and "zstat" can specify whether the variable is
> scalar or array, you can't do that with a nameref.  Furthermore, if

Isn't defaulting to a scalar type fairly normal. For the read use case,
it works for some basic tests when you start the function with:

  [[ -v $1 ]] || typeset -g $1
  typeset -un ref=$1

A different problem I'm finding is with not being able to hide the
reference type of a variable in a called function:

  inner() {
    local var=hello
    typeset -p var
  }
  outer() {
    typeset -n var=value
    inner
  }
  outer

This outputs

  typeset -g -n var=value

The value "hello" is lost but a subsequent assignment within inner()
applies to the reference.

I would expect the `local var` in inner to hide any variable including
references with the same name from an outer scope.

(I'm currently testing with the typeset -nu and Fix crash on
unset-through-nameref patches applied).

Oliver


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-03-05  1:05                                             ` Oliver Kiddle
@ 2024-03-05  2:53                                               ` Bart Schaefer
  2024-03-05  5:43                                                 ` Mikael Magnusson
  0 siblings, 1 reply; 48+ messages in thread
From: Bart Schaefer @ 2024-03-05  2:53 UTC (permalink / raw)
  To: Zsh hackers list

On Mon, Mar 4, 2024 at 5:05 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> Isn't defaulting to a scalar type fairly normal.

Sure, but Stephane wants a global that's a blank slate for anything to
be assigned to.  I don't think we have a way to do that -- the way the
implementation works in the absence of namerefs is to actually
remove/free the parameter and create a new one.  If there are locals
pointing to the global via pm->old, and the global gets removed, all
the locals are also destroyed.

> A different problem I'm finding is with not being able to hide the
> reference type of a variable in a called function:

That's actually "correct" -- from the doc:

 When a _named reference_ is created with 'typeset -n', all uses of PNAME
 in assignments and expansions instead assign to or expand RNAME.  This
 also applies to 'unset PNAME' and to most subsequent uses of 'typeset'
 with the exception of 'typeset -n' and 'typeset +n'

I admit this is a little weird when the named reference is inherited
from another scope.  The most recent FAQ entry that I patched actually
suggests using "private -n var" to prevent downward scope leakage of
namerefs.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-03-05  2:53                                               ` Bart Schaefer
@ 2024-03-05  5:43                                                 ` Mikael Magnusson
  2024-03-05  6:30                                                   ` Stephane Chazelas
  0 siblings, 1 reply; 48+ messages in thread
From: Mikael Magnusson @ 2024-03-05  5:43 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

On 3/5/24, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Mon, Mar 4, 2024 at 5:05 PM Oliver Kiddle <opk@zsh.org> wrote:
>>
>> Isn't defaulting to a scalar type fairly normal.
>
> Sure, but Stephane wants a global that's a blank slate for anything to
> be assigned to.  I don't think we have a way to do that -- the way the
> implementation works in the absence of namerefs is to actually
> remove/free the parameter and create a new one.  If there are locals
> pointing to the global via pm->old, and the global gets removed, all
> the locals are also destroyed.
>
>> A different problem I'm finding is with not being able to hide the
>> reference type of a variable in a called function:
>
> That's actually "correct" -- from the doc:
>
>  When a _named reference_ is created with 'typeset -n', all uses of PNAME
>  in assignments and expansions instead assign to or expand RNAME.  This
>  also applies to 'unset PNAME' and to most subsequent uses of 'typeset'
>  with the exception of 'typeset -n' and 'typeset +n'
>
> I admit this is a little weird when the named reference is inherited
> from another scope.  The most recent FAQ entry that I patched actually
> suggests using "private -n var" to prevent downward scope leakage of
> namerefs.

Is this possible to change? I feel like if "typeset myvar" (or "local
myvar") cannot be depended on to create a local parameter, a lot of
code will no longer be safe that previously was (in the sense that it
doesn't break if calling code / the shell environmnet has certain
parameters defined). Surely we cannot expect everyone to add +n to
every single "local" statement? If the current behavior is wanted, one
can always use typeset -g to get it, presumably? And that would be
more backward compatible, right?

Like say I have typeset -n foo=PS1 in my .zshrc because I don't like
writing $PS1 manually, now any code I call which has a "local foo" is
broken, which is not great. They have explicitly added their "local
foo" to be able to use the name 'foo' regardless of the calling
environment... "private -n foo" works in this case too, but I still
hold my opinion.

-- 
Mikael Magnusson


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Up-scope named references, vs. ksh
  2024-03-05  5:43                                                 ` Mikael Magnusson
@ 2024-03-05  6:30                                                   ` Stephane Chazelas
  2024-03-06  5:04                                                     ` [PATCH] local vs. nameref scoping (was Re: Up-scope named references, vs. ksh) Bart Schaefer
  0 siblings, 1 reply; 48+ messages in thread
From: Stephane Chazelas @ 2024-03-05  6:30 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: Bart Schaefer, Zsh hackers list

2024-03-05 06:43:02 +0100, Mikael Magnusson:
[...]
> > That's actually "correct" -- from the doc:
> >
> >  When a _named reference_ is created with 'typeset -n', all uses of PNAME
> >  in assignments and expansions instead assign to or expand RNAME.  This
> >  also applies to 'unset PNAME' and to most subsequent uses of 'typeset'
> >  with the exception of 'typeset -n' and 'typeset +n'
> >
> > I admit this is a little weird when the named reference is inherited
> > from another scope.  The most recent FAQ entry that I patched actually
> > suggests using "private -n var" to prevent downward scope leakage of
> > namerefs.
> 
> Is this possible to change? I feel like if "typeset myvar" (or "local
> myvar") cannot be depended on to create a local parameter, a lot of
> code will no longer be safe that previously was (in the sense that it
> doesn't break if calling code / the shell environmnet has certain
> parameters defined). Surely we cannot expect everyone to add +n to
> every single "local" statement? If the current behavior is wanted, one
> can always use typeset -g to get it, presumably? And that would be
> more backward compatible, right?
> 
> Like say I have typeset -n foo=PS1 in my .zshrc because I don't like
> writing $PS1 manually, now any code I call which has a "local foo" is
> broken, which is not great. They have explicitly added their "local
> foo" to be able to use the name 'foo' regardless of the calling
> environment... "private -n foo" works in this case too, but I still
> hold my opinion.
[...]

I'd agree that means defining a nameref would/could break any
function that uses local variables by the same name and doesn't
sound acceptable.

See also:

$ nameref action=PS1
$ zmv -n '*' '$f.back'
zsh: segmentation fault  ./Src/zsh

Program received signal SIGSEGV, Segmentation fault.
getstrvalue (v=0x7fffffffaf70) at params.c:2343
2343            s = v->pm->gsu.s->getfn(v->pm);
(gdb) bt
#0  getstrvalue (v=0x7fffffffaf70) at params.c:2343
#1  0x000055555561369d in paramsubst (l=0x7fffffffb230, n=0x7fffffffb250, str=0x7fffffffb040, qt=0, pf_flags=4, ret_flags=0x7fffffffb154) at subst.c:2907
#2  0x000055555560dca1 in stringsubst (list=0x7fffffffb230, node=0x7fffffffb250, pf_flags=4, ret_flags=0x7fffffffb154, asssub=0) at subst.c:322
#3  0x000055555560cf87 in prefork (list=0x7fffffffb230, flags=4, ret_flags=0x7fffffffb154) at subst.c:142
#4  0x000055555560e474 in singsub (s=0x7fffffffb368) at subst.c:520
#5  0x000055555558a80e in cond_subst (strp=0x7fffffffb368, glob_ok=1) at cond.c:53
#6  0x000055555558b029 in evalcond (state=0x7fffffffc0c0, fromtest=0x0) at cond.c:198
#7  0x000055555559ba6a in execcond (state=0x7fffffffc0c0, do_exec=0) at exec.c:5185
#8  0x000055555558fe49 in execsimple (state=0x7fffffffc0c0) at exec.c:1334
#9  0x00005555555902e8 in execlist (state=0x7fffffffc0c0, dont_change_job=1, exiting=0) at exec.c:1462
#10 0x00005555555cd63d in execif (state=0x7fffffffc0c0, do_exec=0) at loop.c:561
#11 0x00005555555988aa in execcmd_exec (state=0x7fffffffc0c0, eparams=0x7fffffffbcd0, input=0, output=0, how=2, last1=2, close_if_forked=-1) at exec.c:4024
#12 0x00005555555926dc in execpline2 (state=0x7fffffffc0c0, pcode=9987, how=2, input=0, output=0, last1=0) at exec.c:2016
#13 0x0000555555591270 in execpline (state=0x7fffffffc0c0, slcode=40962, how=2, last1=0) at exec.c:1741
#14 0x00005555555904b2 in execlist (state=0x7fffffffc0c0, dont_change_job=1, exiting=0) at exec.c:1495
#15 0x000055555558fad2 in execode (p=0x555555689cd0, dont_change_job=1, exiting=0, context=0x555555630f0a "loadautofunc") at exec.c:1276
#16 0x000055555559cca0 in execautofn_basic (state=0x7fffffffc9c0, do_exec=0) at exec.c:5594
#17 0x000055555559886d in execcmd_exec (state=0x7fffffffc9c0, eparams=0x7fffffffc5d0, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:4022
#18 0x00005555555926dc in execpline2 (state=0x7fffffffc9c0, pcode=3, how=18, input=0, output=0, last1=0) at exec.c:2016
#19 0x0000555555591270 in execpline (state=0x7fffffffc9c0, slcode=3074, how=18, last1=0) at exec.c:1741
#20 0x00005555555904b2 in execlist (state=0x7fffffffc9c0, dont_change_job=1, exiting=0) at exec.c:1495
#21 0x000055555558fad2 in execode (p=0x555555689b40, dont_change_job=1, exiting=0, context=0x5555556310da "shfunc") at exec.c:1276
#22 0x000055555559e88e in runshfunc (prog=0x555555689b40, wrap=0x0, name=0x7ffff7fa6168 "zmv") at exec.c:6164
#23 0x000055555559dd9e in doshfunc (shfunc=0x555555678450, doshargs=0x7ffff7fb8080, noreturnval=0) at exec.c:6010
#24 0x000055555559ca31 in execshfunc (shf=0x555555678450, args=0x7ffff7fb8080) at exec.c:5548
#25 0x0000555555598ab0 in execcmd_exec (state=0x7fffffffd6f0, eparams=0x7fffffffd300, input=0, output=0, how=18, last1=1, close_if_forked=-1) at exec.c:4075
#26 0x00005555555926dc in execpline2 (state=0x7fffffffd6f0, pcode=131, how=18, input=0, output=0, last1=1) at exec.c:2016
#27 0x0000555555591270 in execpline (state=0x7fffffffd6f0, slcode=6146, how=18, last1=1) at exec.c:1741
#28 0x00005555555904b2 in execlist (state=0x7fffffffd6f0, dont_change_job=0, exiting=1) at exec.c:1495
#29 0x000055555558fad2 in execode (p=0x7ffff7fb7b60, dont_change_job=0, exiting=1, context=0x555555633326 "cmdarg") at exec.c:1276
#30 0x000055555558f998 in execstring (s=0x7fffffffdd3b "nameref action=PS1; autoload zmv; zmv -n \"*\" \"\\$f.back\"", dont_change_job=0, exiting=1, context=0x555555633326 "cmdarg")
    at exec.c:1242
#31 0x00005555555baf25 in init_misc (cmd=0x7fffffffdd3b "nameref action=PS1; autoload zmv; zmv -n \"*\" \"\\$f.back\"", zsh_name=0x7fffffffdd34 "zsh") at init.c:1530
#32 0x00005555555bc798 in zsh_main (argc=3, argv=0x7fffffffd918) at init.c:1920
#33 0x000055555556c99d in main (argc=3, argv=0x7fffffffd918) at ./main.c:93
(gdb) p *v.pm
$1 = {node = {next = 0x0, nam = 0x555555670480 "PS1", flags = 1048576}, u = {data = 0x55555568bab0, arr = 0x55555568bab0, str = 0x55555568bab0 "", val = 93824993508016,
    valptr = 0x55555568bab0, dval = 4.6355706013588689e-310, hash = 0x55555568bab0}, gsu = {s = 0x0, i = 0x0, f = 0x0, a = 0x0, h = 0x0}, base = 0, width = 0, env = 0x0, ename = 0x0,
  old = 0x0, level = 0}




^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH] local vs. nameref scoping (was Re: Up-scope named references, vs. ksh)
  2024-03-05  6:30                                                   ` Stephane Chazelas
@ 2024-03-06  5:04                                                     ` Bart Schaefer
  0 siblings, 0 replies; 48+ messages in thread
From: Bart Schaefer @ 2024-03-06  5:04 UTC (permalink / raw)
  To: Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 1726 bytes --]

On Mon, Mar 4, 2024 at 9:43 PM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> On 3/5/24, Bart Schaefer <schaefer@brasslantern.com> wrote:
> >
> >  When a _named reference_ is created with 'typeset -n', all uses of PNAME
> >  in assignments and expansions instead assign to or expand RNAME.  This
> >  also applies to 'unset PNAME' and to most subsequent uses of 'typeset'
> >  with the exception of 'typeset -n' and 'typeset +n'
>
> Is this possible to change? I feel like if "typeset myvar" (or "local
> myvar") cannot be depended on to create a local parameter, a lot of
> code will no longer be safe that previously was (in the sense that it
> doesn't break if calling code / the shell environmnet has certain
> parameters defined).

Yes, it's possible, and the attached patch does so, with doc update
and new tests.

The division of labor among bin_typeset(), typeset_single(), and
createparam() is a bit hard to follow (as in, it's not really much of
a "division" at all).

On Mon, Mar 4, 2024 at 10:31 PM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> See also:
>
> $ nameref action=PS1
> $ zmv -n '*' '$f.back'
> zsh: segmentation fault  ./Src/zsh

A simpler reproducer:

% typeset -n foo=PS1
% () { local foo; foo=xx }
==1113679== Invalid read of size 8
==1113679==    at 0x1A16E1: assignstrvalue (params.c:2684)

This patch fixes that, and one other potential crash that valgrind
complains about:

% typeset -n foo=bar
% typeset -n foo
==1113695== Invalid read of size 4
==1113695==    at 0x136041: bin_typeset (builtin.c:3137)

I'll go ahead and push this since there doesn't seem to be any
argument about the change in function and it fixes two crash bugs.

[-- Attachment #2: local-of-nameref.txt --]
[-- Type: text/plain, Size: 4492 bytes --]

diff --git a/Doc/Zsh/params.yo b/Doc/Zsh/params.yo
index 8c5e67e70..d179a0d1d 100644
--- a/Doc/Zsh/params.yo
+++ b/Doc/Zsh/params.yo
@@ -670,8 +670,9 @@ This manual was generated with Zsh tt(version()).
 When a em(named reference) is created with `tt(typeset -n)', all uses
 of var(pname) in assignments and expansions instead assign to or
 expand var(rname).  This also applies to `tt(unset )var(pname)' and to
-most subsequent uses of `tt(typeset)' with the exception of
-`tt(typeset -n)' and `tt(typeset +n)', so to remove a named reference,
+most subsequent uses of `tt(typeset)' with the exceptions of declaring
+a local in a called function, or updating a current-scope parameter with
+`tt(typeset -n)' or `tt(typeset +n)'. Thus to remove a named reference,
 use either `tt(unset -n )var(pname)' (preferred) or one of:
 ifzman()
 example(tt(typeset -n )var(pname=)
diff --git a/Src/builtin.c b/Src/builtin.c
index 6f98990f9..829b899f8 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -2030,11 +2030,10 @@ typeset_single(char *cname, char *pname, Param pm, int func,
     int usepm, tc, keeplocal = 0, newspecial = NS_NONE, readonly, dont_set = 0;
     char *subscript;
 
-    if (pm && (pm->node.flags & PM_NAMEREF) && !((off|on) & PM_NAMEREF)) {
-	if (!(off & PM_NAMEREF)) {
-	    if ((pm = (Param)resolve_nameref(pm, NULL)))
-		pname = pm->node.nam;
-	}
+    if (pm && (pm->node.flags & PM_NAMEREF) && !((off|on) & PM_NAMEREF) &&
+	(pm->level == locallevel || !(on & PM_LOCAL))) {
+	if ((pm = (Param)resolve_nameref(pm, NULL)))
+	    pname = pm->node.nam;
 	if (pm && (pm->node.flags & PM_NAMEREF) &&
 	    (on & ~(PM_NAMEREF|PM_LOCAL|PM_READONLY))) {
 	    /* Changing type of PM_SPECIAL|PM_AUTOLOAD is a fatal error.  *
@@ -3125,8 +3124,10 @@ bin_typeset(char *name, char **argv, LinkList assigns, Options ops, int func)
 			oldpm->u.str)
 			asg->value.scalar = dupstring(oldpm->u.str);
 		    /* Defer read-only error to typeset_single() */
-		    if (!(hn->flags & PM_READONLY))
+		    if (!(hn->flags & PM_READONLY)) {
 			unsetparam_pm(oldpm, 0, 1);
+			hn = NULL;
+		    }
 		}
 		/* Passing a NULL pm to typeset_single() makes the
 		 * nameref read-only before assignment, which breaks
@@ -3134,7 +3135,7 @@ bin_typeset(char *name, char **argv, LinkList assigns, Options ops, int func)
 		 * so this is special-cased to permit that action
 		 * like assign-at-create for other parameter types.
 		 */
-		if (!(hn->flags & PM_READONLY))
+		if (hn && !(hn->flags & PM_READONLY))
 		    hn = NULL;
 	    }
 	}
diff --git a/Src/params.c b/Src/params.c
index 4bcf41c22..973df3fe5 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -1034,7 +1034,8 @@ createparam(char *name, int flags)
 	}
 
 	if (oldpm && !(flags & PM_NAMEREF) &&
-	    (!(oldpm->node.flags & PM_RO_BY_DESIGN) || !(flags & PM_LOCAL)) &&
+	    (oldpm->level == locallevel ?
+	     !(oldpm->node.flags & PM_RO_BY_DESIGN) : !(flags & PM_LOCAL)) &&
 	    (oldpm->node.flags & PM_NAMEREF) &&
 	    (oldpm = upscope(oldpm, oldpm->base))) {
 	    Param lastpm;
diff --git a/Test/K01nameref.ztst b/Test/K01nameref.ztst
index e45b922e2..bb0d11821 100644
--- a/Test/K01nameref.ztst
+++ b/Test/K01nameref.ztst
@@ -51,9 +51,19 @@
 0:remove nameref attribute
 >typeset ptr=var
 
-  typeset -n ptr
-  typeset -t ptr
-  typeset -p ptr
+ typeset -n ptr=gvar
+ () {
+   local ptr
+   typeset -p ptr
+ }
+ typeset -p ptr
+0:Local non-reference hides outside reference
+>typeset ptr
+>typeset -n ptr=gvar
+
+ typeset -n ptr
+ typeset -t ptr
+ typeset -p ptr
 0:change type of a placeholder
 F:Other type changes are fatal errors, should this also be?
 >typeset -n ptr=''
@@ -845,4 +855,39 @@ F:previously this could create an infinite recursion and crash
 1:create nameref by pattern match not allowed
 *?*typeset:1: invalid reference
 
+#
+# The following tests are run in interactive mode, using PS1 as an
+# assignable special with side-effects.  This crashed at one time.
+#
+
+ # Note bypassing TYPESET_TO_UNSET here
+ $ZTST_testdir/../Src/zsh -fis <<<$'
+ typeset -n p=PS1
+ () {
+  typeset -p p
+  local p
+  typeset -p p
+  p=xx
+  typeset -p p
+ }
+ '
+0:regression: assign to local that shadows global named reference
+>typeset -g -n p=PS1
+>typeset p=''
+>typeset p=xx
+*?*
+
+ # Note bypassing TYPESET_TO_UNSET here
+ $ZTST_testdir/../Src/zsh -fis <<<$'
+ () {
+   typeset p=PS1
+   typeset -n p
+   p=zz
+ }
+ typeset -p PS1
+ '
+0:regression - converting a string into a named reference
+>typeset PS1=zz
+*?*
+
 %clean

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2024-03-06  5:05 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-06  2:21 [PATCH 1/3]: Add named references Bart Schaefer
2023-02-08  3:45 ` Oliver Kiddle
2023-02-08  4:59   ` Bart Schaefer
2023-02-08 23:16     ` Bart Schaefer
2023-02-09  0:47     ` Oliver Kiddle
2023-02-09  2:01       ` Oliver Kiddle
2023-02-09  5:45         ` Bart Schaefer
2023-02-09  4:49       ` Bart Schaefer
2023-02-09 20:49         ` Oliver Kiddle
2023-02-09 23:07           ` Bart Schaefer
2023-02-11  3:04             ` Bart Schaefer
2023-02-11  3:55               ` Bart Schaefer
2023-02-11  5:36                 ` Speaking of dangerous referents Bart Schaefer
2023-02-12  8:00                   ` Oliver Kiddle
2023-02-12  8:34                     ` Bart Schaefer
2023-02-11  7:02               ` [PATCH 1/3]: Add named references Oliver Kiddle
2023-02-11  7:45                 ` Bart Schaefer
2023-02-11 23:43                   ` Bart Schaefer
2023-02-11 23:45                     ` Bart Schaefer
2023-02-12  7:38                     ` Oliver Kiddle
2024-02-11  7:00                   ` Stephane Chazelas
2024-02-11 16:14                     ` Bart Schaefer
2024-02-11 16:42                       ` Bart Schaefer
2024-02-18  3:26                       ` Up-scope named references, vs. ksh Bart Schaefer
2024-02-20 21:05                         ` Stephane Chazelas
2024-02-20 22:30                           ` Bart Schaefer
2024-02-21 20:12                             ` Stephane Chazelas
2024-02-29  5:16                               ` Bart Schaefer
2024-03-01 18:22                                 ` Stephane Chazelas
2024-03-01 20:34                                   ` Bart Schaefer
2024-03-02  7:29                                     ` Bart Schaefer
2024-03-02 23:55                                       ` [PATCH] "typeset -nu" (was Re: Up-scope named references, vs. ksh) Bart Schaefer
2024-03-01 23:28                                   ` Up-scope named references, vs. ksh Bart Schaefer
2024-03-03 13:44                                     ` Stephane Chazelas
2024-03-03 19:04                                       ` Bart Schaefer
2024-03-03 20:27                                         ` Stephane Chazelas
2024-03-03 22:58                                           ` Bart Schaefer
2024-03-04 19:59                                             ` Stephane Chazelas
2024-03-05  1:05                                             ` Oliver Kiddle
2024-03-05  2:53                                               ` Bart Schaefer
2024-03-05  5:43                                                 ` Mikael Magnusson
2024-03-05  6:30                                                   ` Stephane Chazelas
2024-03-06  5:04                                                     ` [PATCH] local vs. nameref scoping (was Re: Up-scope named references, vs. ksh) Bart Schaefer
2023-02-12  9:02             ` [PATCH 1/3]: Add named references Oliver Kiddle
2023-02-12 18:59               ` Bart Schaefer
2023-02-12 19:45                 ` PM_* flags in zsh.h (was Re: [PATCH 1/3]: Add named references) Bart Schaefer
2023-02-12 21:01                   ` Oliver Kiddle
2023-02-12 22:54                 ` [PATCH 1/3]: Add named references Oliver Kiddle

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).