zsh-workers
 help / color / mirror / code / Atom feed
* [PATCH 1/3]: Add named references
@ 2023-02-06  2:21 Bart Schaefer
  2023-02-08  3:45 ` Oliver Kiddle
  0 siblings, 1 reply; 25+ messages in thread
From: Bart Schaefer @ 2023-02-06  2:21 UTC (permalink / raw)
  To: Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 2888 bytes --]

This has been my weekend/evening project the past 10 days or so.  I
compared to ksh93 (from the Ubuntu software store) wherever there
weren't conflicts with zsh-native features, even managing to crash
ksh93 several times in the process.

Attached is the patch with the C code changes.  It'd be nice, Roman,
if you could run your benchmarks on this to see if I've slowed down
any other shell operations.  The next two patches will cover tests and
documentation.

When a reference is created, it's necessary to determine the scope in
which to find the referenced parameter.  Since a reference can name an
array element, this involves separating the base parameter name from
the subscript.  I've overloaded the "base" and "width" fields in
struct param to save both the scope (locallevel) and the offset of the
subscript, to avoid having to search for the subscript again when the
reference is expanded.  There is a little redundant scanning to
validate subscripts when the reference is created; this follows ksh93
(the alternative would be to fail at the time of expansion).

One difference from ksh93 is that you can't convert scalars into
namerefs, e.g., both with this and in ksh93:
  typeset -n foo=bar
  typeset +n nameref
leaves behind a scalar $foo with value bar. but only in ksh93:
  foo=bar
  typeset -n foo
  foo=thing
assigns bar=thing.  In this implementation typeset -n first unsets
foo, so instead of foo
"pointing at" bar, it ends up "pointing at" thing.  This eliminates a
bunch of special cases for exported and tied parameters, etc., and it
seems easy enough to write:
  typeset +n foo
  typeset -n foo=$foo
if that's what you wanted.  If I have a sudden inspiration I'll do a
follow-up patch.

Named references can't have any other attributes, and attempting to
add an attribute to a named reference generates a warning.  However,
existing prohibited attribute changes (such as trying to turn a
special scalar into an array) are fatal errors, so I've left comments
wondering whether that would instead be appropriate here.

With respect to zsh/param/private parameters, it's presently
prohibited to have a "public" reference to a private parameter.  It
might be possible to change this if someone can think of a use case.
There is one glitch: if the reference is created before the parameter
is declared private, and the private local is NOT hiding something in
an outer scope, then the reference does not retroactively generate an
error and assignments to the reference are silent no-ops.

I moved around a little bit of not-directly-related signal handling in
params.c so that we're not calling zfree() outside of queue_signals()
blocks.  That makes the order of unqueue_signals() and zfree()
consistent at least throughout assignsparam().

I also removed "kz" from TYPESET_OPTSTR in zsh.h -- as far as I can
tell they should never have been there in the first place.

[-- Attachment #2: nameref-1-code.txt --]
[-- Type: text/plain, Size: 17101 bytes --]

diff --git a/Src/Modules/param_private.c b/Src/Modules/param_private.c
index 065fa63d2..70f36ceb1 100644
--- a/Src/Modules/param_private.c
+++ b/Src/Modules/param_private.c
@@ -512,9 +512,16 @@ static GetNodeFunc getparamnode;
 static HashNode
 getprivatenode(HashTable ht, const char *nam)
 {
-    HashNode hn = getparamnode(ht, nam);
+    /* getparamnode() would follow namerefs, we must not do that here */
+    HashNode hn = gethashnode2(ht, nam);
     Param pm = (Param) hn;
 
+    /* autoload has precedence over nameref, so getparamnode() */
+    if (pm && (pm->node.flags & PM_AUTOLOAD)) {
+	hn = getparamnode(ht, nam);
+	pm = (Param) hn;
+	/* how would an autoloaded private behave?  return here? */
+    }
     while (!fakelevel && pm && locallevel > pm->level && is_private(pm)) {
 	if (!(pm->node.flags & PM_UNSET)) {
 	    /*
@@ -533,6 +540,12 @@ getprivatenode(HashTable ht, const char *nam)
 	}
 	pm = pm->old;
     }
+
+    /* resolve nameref after skipping private parameters */
+    if (pm && (pm->node.flags & PM_NAMEREF) &&
+	(pm->u.str || (pm->node.flags & PM_UNSET)))
+	pm = (Param) resolve_nameref(pm, NULL);
+
     return (HashNode)pm;
 }
 
@@ -571,7 +584,7 @@ printprivatenode(HashNode hn, int printflags)
 
 static struct builtin bintab[] = {
     /* Copied from BUILTIN("local"), "P" added */
-    BUILTIN("private", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_private, 0, -1, 0, "AE:%F:%HL:%PR:%TUZ:%ahi:%lmprtux", "P")
+    BUILTIN("private", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_private, 0, -1, 0, "AE:%F:%HL:%PR:%TUZ:%ahi:%lnmprtux", "P")
 };
 
 static struct features module_features = {
diff --git a/Src/Modules/parameter.c b/Src/Modules/parameter.c
index dbb61e474..5bf675e2a 100644
--- a/Src/Modules/parameter.c
+++ b/Src/Modules/parameter.c
@@ -49,13 +49,15 @@ paramtypestr(Param pm)
 	if (pm->node.flags & PM_AUTOLOAD)
 	    return dupstring("undefined");
 
-	switch (PM_TYPE(f)) {
+	/* For simplicity we treat PM_NAMEREF as PM_TYPE(PM_SCALAR) */
+	switch (PM_TYPE(f)|(f & PM_NAMEREF)) {
 	case PM_SCALAR:  val = "scalar"; break;
 	case PM_ARRAY:   val = "array"; break;
 	case PM_INTEGER: val = "integer"; break;
 	case PM_EFLOAT:
 	case PM_FFLOAT:  val = "float"; break;
 	case PM_HASHED:  val = "association"; break;
+	case PM_NAMEREF: val = "nameref"; break;
 	}
 	DPUTS(!val, "BUG: type not handled in parameter");
 	val = dupstring(val);
diff --git a/Src/builtin.c b/Src/builtin.c
index 4c295d11f..8039b644e 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -55,7 +55,7 @@ static struct builtin builtins[] =
     BUILTIN("cd", BINF_SKIPINVALID | BINF_SKIPDASH | BINF_DASHDASHVALID, bin_cd, 0, 2, BIN_CD, "qsPL", NULL),
     BUILTIN("chdir", BINF_SKIPINVALID | BINF_SKIPDASH | BINF_DASHDASHVALID, bin_cd, 0, 2, BIN_CD, "qsPL", NULL),
     BUILTIN("continue", BINF_PSPECIAL, bin_break, 0, 1, BIN_CONTINUE, NULL, NULL),
-    BUILTIN("declare", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%afghi:%klmp:%rtuxz", NULL),
+    BUILTIN("declare", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%afghi:%klmnp:%rtuxz", NULL),
     BUILTIN("dirs", 0, bin_dirs, 0, -1, 0, "clpv", NULL),
     BUILTIN("disable", 0, bin_enable, 0, -1, BIN_DISABLE, "afmprs", NULL),
     BUILTIN("disown", 0, bin_fg, 0, -1, BIN_DISOWN, NULL, NULL),
@@ -88,7 +88,7 @@ static struct builtin builtins[] =
     BUILTIN("jobs", 0, bin_fg, 0, -1, BIN_JOBS, "dlpZrs", NULL),
     BUILTIN("kill", BINF_HANDLES_OPTS, bin_kill, 0, -1, 0, NULL, NULL),
     BUILTIN("let", 0, bin_let, 1, -1, 0, NULL, NULL),
-    BUILTIN("local", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%ahi:%lp:%rtux", NULL),
+    BUILTIN("local", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%ahi:%lnp:%rtux", NULL),
     BUILTIN("logout", 0, bin_break, 0, 1, BIN_LOGOUT, NULL, NULL),
 
 #if defined(ZSH_MEM) & defined(ZSH_MEM_DEBUG)
@@ -121,7 +121,7 @@ static struct builtin builtins[] =
     BUILTIN("trap", BINF_PSPECIAL | BINF_HANDLES_OPTS, bin_trap, 0, -1, 0, NULL, NULL),
     BUILTIN("true", 0, bin_true, 0, -1, 0, NULL, NULL),
     BUILTIN("type", 0, bin_whence, 0, -1, 0, "ampfsSw", "v"),
-    BUILTIN("typeset", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%afghi:%klp:%rtuxmz", NULL),
+    BUILTIN("typeset", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%afghi:%klp:%rtuxmnz", NULL),
     BUILTIN("umask", 0, bin_umask, 0, 1, 0, "S", NULL),
     BUILTIN("unalias", 0, bin_unhash, 0, -1, BIN_UNALIAS, "ams", NULL),
     BUILTIN("unfunction", 0, bin_unhash, 1, -1, BIN_UNFUNCTION, "m", "f"),
@@ -2030,6 +2030,19 @@ typeset_single(char *cname, char *pname, Param pm, int func,
     int usepm, tc, keeplocal = 0, newspecial = NS_NONE, readonly, dont_set = 0;
     char *subscript;
 
+    if (pm && (pm->node.flags & PM_NAMEREF) && !((off|on) & PM_NAMEREF)) {
+	if (!(off & PM_NAMEREF))
+	    pm = (Param)resolve_nameref(pm, NULL);
+	if (pm && (pm->node.flags & PM_NAMEREF) &&
+	    (on & ~(PM_NAMEREF|PM_LOCAL))) {
+	    /* Changing type of PM_SPECIAL|PM_AUTOLOAD is a fatal error.  *
+	     * Should this be a fatal error as well, rather than warning? */
+	    zwarnnam(cname, "%s: can't change type of a named reference",
+		     pname);
+	    return NULL;
+	}
+    }
+
     /*
      * Do we use the existing pm?  Note that this isn't the end of the
      * story, because if we try and create a new pm at the same
@@ -2406,6 +2419,11 @@ typeset_single(char *cname, char *pname, Param pm, int func,
 		return NULL;
 	}
     } else if ((subscript = strchr(pname, '['))) {
+	if (on & PM_NAMEREF) {
+	    zerrnam(cname,
+		    "%s: reference variable cannot be an array", pname);
+	    return NULL;
+	}
 	if (on & PM_READONLY) {
 	    zerrnam(cname,
 		    "%s: can't create readonly array elements", pname);
@@ -2640,6 +2658,14 @@ bin_typeset(char *name, char **argv, LinkList assigns, Options ops, int func)
 	else if (OPT_PLUS(ops,optval))
 	    off |= bit;
     }
+    if (OPT_MINUS(ops,'n')) {
+	if (on|off) {
+	    zwarnnam(name, "no other attributes allowed with -n");
+	    return 1;
+	}
+	on |= PM_NAMEREF;
+    } else if (OPT_PLUS(ops,'n'))
+	off |= PM_NAMEREF;
     roff = off;
 
     /* Sanity checks on the options.  Remove conflicting options. */
@@ -3022,6 +3048,27 @@ bin_typeset(char *name, char **argv, LinkList assigns, Options ops, int func)
 	    }
 	    continue;
 	}
+
+	if (on & PM_NAMEREF) {
+	    if (asg->value.scalar &&
+		(strcmp(asg->name, asg->value.scalar) == 0 ||
+		 ((pm = (Param)resolve_nameref((Param)hn, asg)) &&
+		  (pm->node.flags & PM_NAMEREF)))) {
+		if (pm->node.flags & PM_SPECIAL)
+		    zwarnnam(name, "%s: invalid reference", pm->node.nam);
+		else
+		    zwarnnam(name, "%s: invalid self reference", asg->name);
+		returnval = 1;
+		continue;
+	    }
+	    if (hn) {
+		/* namerefs always start over fresh */
+		if (((Param)hn)->level >= locallevel)
+		    unsetparam_pm((Param)hn, 0, 1);
+		hn = NULL;
+	    }
+	}
+
 	if (!typeset_single(name, asg->name, (Param)hn,
 			    func, on, off, roff, asg, NULL,
 			    ops, 0))
@@ -3805,7 +3852,8 @@ bin_unset(char *name, char **argv, Options ops, int func)
 		returnval = 1;
 	    }
 	} else {
-	    if (unsetparam_pm(pm, 0, 1))
+	    if ((pm = (Param)resolve_nameref(pm, NULL)) &&
+		unsetparam_pm(pm, 0, 1))
 		returnval = 1;
 	}
 	if (ss)
diff --git a/Src/params.c b/Src/params.c
index 6362b382c..69b7f484f 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -536,6 +536,9 @@ getparamnode(HashTable ht, const char *nam)
 		 nam);
 	}
     }
+
+    if (hn && ht == realparamtab)
+	hn = resolve_nameref(pm, NULL);
     return hn;
 }
 
@@ -993,6 +996,34 @@ createparam(char *name, int flags)
 			 gethashnode2(paramtab, name) :
 			 paramtab->getnode(paramtab, name));
 
+	if (oldpm && (oldpm->node.flags & PM_NAMEREF) &&
+	    !(flags & PM_NAMEREF)) {
+	    Param lastpm;
+	    struct asgment stop;
+	    stop.flags = PM_NAMEREF | (flags & PM_LOCAL);
+	    stop.name = oldpm->node.nam;
+	    stop.value.scalar = oldpm->u.str;
+	    lastpm = (Param)resolve_nameref(oldpm, &stop);
+	    if (lastpm) {
+		if (lastpm->node.flags & PM_NAMEREF) {
+		    if (lastpm->u.str && *(lastpm->u.str)) {
+			name = lastpm->u.str;
+			oldpm = NULL;
+		    } else {
+			if (!(lastpm->node.flags & PM_READONLY))
+			    lastpm->node.flags |= PM_UNSET;
+			return lastpm;
+		    }
+		} else {
+		    /* nameref pointing to an unset local */
+		    DPUTS(!(lastpm->node.flags & PM_UNSET),
+			  "BUG: local parameter is not unset");
+		    oldpm = lastpm;
+		}
+	    } else
+		flags |= PM_NAMEREF;
+	}
+
 	DPUTS(oldpm && oldpm->level > locallevel,
 	      "BUG: old local parameter not deleted");
 	if (oldpm && (oldpm->level == locallevel || !(flags & PM_LOCAL))) {
@@ -2109,6 +2140,23 @@ fetchvalue(Value v, char **pptr, int bracks, int flags)
 	    memset(v, 0, sizeof(*v));
 	else
 	    v = (Value) hcalloc(sizeof *v);
+	if ((pm->node.flags & PM_NAMEREF) && pm->u.str && *(pm->u.str)) {
+	    /* only happens for namerefs pointing to array elements */
+	    char *ref = dupstring(pm->u.str);
+	    char *ss = pm->width ? ref + pm->width : NULL;
+	    if (ss) {
+		sav = *ss;
+		*ss = 0;
+	    }
+	    Param p1 = (Param)gethashnode2(paramtab, ref);
+	    if (!(p1 && (pm = upscope(p1, pm->base))) ||
+		((pm->node.flags & PM_UNSET) &&
+		!(pm->node.flags & PM_DECLARED)))
+		return NULL;
+	    if (ss)
+		*ss = sav;
+	    s = ss;
+	}
 	if (PM_TYPE(pm->node.flags) & (PM_ARRAY|PM_HASHED)) {
 	    /* Overload v->isarr as the flag bits for hashed arrays. */
 	    v->isarr = flags | (isvarat ? SCANPM_ISVAR_AT : 0);
@@ -2677,6 +2725,7 @@ assignstrvalue(Value v, char *val, int flags)
         }
 	break;
     }
+    setscope(v->pm);
     if ((!v->pm->env && !(v->pm->node.flags & PM_EXPORTED) &&
 	 !(isset(ALLEXPORT) && !(v->pm->node.flags & PM_HASHELEM))) ||
 	(v->pm->node.flags & PM_ARRAY) || v->pm->ename)
@@ -3084,11 +3133,20 @@ assignsparam(char *s, char *val, int flags)
 	}
     }
     if (!v && !(v = getvalue(&vbuf, &t, 1))) {
-	unqueue_signals();
 	zsfree(val);
+	unqueue_signals();
 	/* errflag |= ERRFLAG_ERROR; */
 	return NULL;
     }
+    if (*val && (v->pm->node.flags & PM_NAMEREF)) {
+	if (!valid_refname(val)) {
+	    zerr("invalid variable name: %s", val);
+	    zsfree(val);
+	    unqueue_signals();
+	    errflag |= ERRFLAG_ERROR;
+	    return NULL;
+	}
+    }
     if (flags & ASSPM_WARN)
 	check_warn_pm(v->pm, "scalar", created, 1);
     v->pm->node.flags &= ~PM_DEFAULTED;
@@ -3115,8 +3173,8 @@ assignsparam(char *s, char *val, int flags)
 			lhs.u.l = lhs.u.l + (zlong)rhs.u.d;
 		}
 		setnumvalue(v, lhs);
-    	    	unqueue_signals();
 		zsfree(val);
+    	    	unqueue_signals();
 		return v->pm; /* avoid later setstrvalue() call */
 	    case PM_ARRAY:
 	    	if (unset(KSHARRAYS)) {
@@ -3141,9 +3199,9 @@ assignsparam(char *s, char *val, int flags)
 	    case PM_INTEGER:
 	    case PM_EFLOAT:
 	    case PM_FFLOAT:
+		zsfree(val);
 		unqueue_signals();
 		zerr("attempt to add to slice of a numeric variable");
-		zsfree(val);
 		return NULL;
 	    case PM_ARRAY:
 	      kshappend:
@@ -3602,7 +3660,8 @@ unsetparam(char *s)
     if ((pm = (Param) (paramtab == realparamtab ?
 		       /* getnode2() to avoid autoloading */
 		       paramtab->getnode2(paramtab, s) :
-		       paramtab->getnode(paramtab, s))))
+		       paramtab->getnode(paramtab, s))) &&
+	!(pm->node.flags & PM_NAMEREF))
 	unsetparam_pm(pm, 0, 1);
     unqueue_signals();
 }
@@ -5783,7 +5842,8 @@ static const struct paramtypes pmtypes[] = {
     { PM_TAGGED, "tagged", 't', 0},
     { PM_EXPORTED, "exported", 'x', 0},
     { PM_UNIQUE, "unique", 'U', 0},
-    { PM_TIED, "tied", 'T', 0}
+    { PM_TIED, "tied", 'T', 0},
+    { PM_NAMEREF, "namref", 'n', 0}
 };
 
 #define PMTYPES_SIZE ((int)(sizeof(pmtypes)/sizeof(struct paramtypes)))
@@ -6037,3 +6097,123 @@ printparamnode(HashNode hn, int printflags)
     else if (!(printflags & PRINT_KV_PAIR))
 	putchar('\n');
 }
+
+/**/
+mod_export HashNode
+resolve_nameref(Param pm, const Asgment stop)
+{
+    HashNode hn = (HashNode)pm;
+    const char *seek = stop ? stop->value.scalar : NULL;
+
+    if (pm && (pm->node.flags & PM_NAMEREF)) {
+	if (pm && (pm->node.flags & (PM_UNSET|PM_TAGGED))) {
+	    /* Semaphore with createparam() */
+	    pm->node.flags &= ~PM_UNSET;
+	    /* See V10private.ztst end is in scope but private:
+	    if (pm->node.flags & PM_SPECIAL)
+		return NULL;
+	    */
+	    return (HashNode) pm;
+	} else if (pm->u.str) {
+	    if ((pm->node.flags & PM_TAGGED) ||
+		(stop && strcmp(pm->u.str, stop->name) == 0)) {
+		/* zwarnnam(pm->u.str, "invalid self reference"); */
+		return stop ? (HashNode)pm : NULL;
+	    }
+	    if (*(pm->u.str))
+		seek = pm->u.str;
+	}
+    }
+    else if (pm && !(stop && (stop->flags & PM_NAMEREF)))
+	return (HashNode)pm;
+    if (seek) {
+	queue_signals();
+	/* pm->width is the offset of any subscript */
+	if (pm && (pm->node.flags & PM_NAMEREF) && pm->width) {
+	    if (stop) {
+		if (stop->flags & PM_NAMEREF)
+		    hn = (HashNode)pm;
+		else
+		    hn = NULL;
+	    } else {
+		/* this has to be the end of any chain */
+		hn = (HashNode)pm;	/* see fetchvalue() */
+	    }
+	} else if ((hn = gethashnode2(realparamtab, seek))) {
+	    if (pm) {
+		if (!(stop && (stop->flags & (PM_LOCAL))))
+		    hn = (HashNode)upscope((Param)hn,
+					   ((pm->node.flags & PM_NAMEREF) ?
+					    pm->base : ((Param)hn)->level));
+		/* user can't tag a nameref, safe for loop detection */
+		pm->node.flags |= PM_TAGGED;
+	    }
+	    if (hn) {
+		if (hn->flags & PM_AUTOLOAD)
+		    hn = getparamnode(realparamtab, seek);
+		if (!(hn->flags & PM_UNSET))
+		    hn = resolve_nameref((Param)hn, stop);
+	    }
+	    if (pm)
+		pm->node.flags &= ~PM_TAGGED;
+	} else if (stop && (stop->flags & PM_NAMEREF))
+	    hn = (HashNode)pm;
+	unqueue_signals();
+    }
+
+    return hn;
+}
+
+/**/
+static void
+setscope(Param pm)
+{
+    if (pm->node.flags & PM_NAMEREF) {
+	Param basepm;
+	char *t = pm->u.str ? itype_end(pm->u.str, IIDENT, 0) : NULL;
+
+	/* Temporarily change nameref to array parameter itself */
+	if (t && *t == '[')
+	    *t = 0;
+	else
+	    t = 0;
+	basepm = (Param)resolve_nameref(pm, NULL);
+	if (t) {
+	    pm->width = t - pm->u.str;
+	    *t = '[';
+	}
+	if (basepm)
+	    pm->base = ((basepm->node.flags & PM_NAMEREF) ?
+			basepm->base : basepm->level);
+    }
+}
+
+/**/
+mod_export Param
+upscope(Param pm, int reflevel)
+{
+    Param up = pm->old;
+    while (pm && up && up->level >= reflevel) {
+	pm = up;
+	if (up)
+	    up = up->old;
+    }
+    return pm;
+}
+
+/**/
+mod_export int
+valid_refname(char *val)
+{
+    char *t = itype_end(val, IIDENT, 0);
+
+    if (*t != 0) {
+	if (*t == '[') {
+	    tokenize(t = dupstring(t+1));
+	    t = parse_subscript(t, 0, ']');
+	} else {
+	    t = NULL;
+	}
+    }
+    return !!t;
+}
diff --git a/Src/subst.c b/Src/subst.c
index 4ad9fee1a..7ba84cba8 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -2573,13 +2573,14 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 			       !(v->pm->node.flags & PM_UNSET))) {
 		int f = v->pm->node.flags;
 
-		switch (PM_TYPE(f)) {
+		switch (PM_TYPE(f)|(f & PM_NAMEREF)) {
 		case PM_SCALAR:  val = "scalar"; break;
 		case PM_ARRAY:   val = "array"; break;
 		case PM_INTEGER: val = "integer"; break;
 		case PM_EFLOAT:
 		case PM_FFLOAT:  val = "float"; break;
 		case PM_HASHED:  val = "association"; break;
+		case PM_NAMEREF: val = "nameref"; break;
 		}
 		val = dupstring(val);
 		if (v->pm->level)
diff --git a/Src/zsh.h b/Src/zsh.h
index f82e76e4b..1e35bd33e 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -1852,8 +1852,9 @@ struct param {
 	GsuHash h;
     } gsu;
 
-    int base;			/* output base or floating point prec    */
-    int width;			/* field width                           */
+    int base;			/* output base or floating point prec or */
+ 				/* for namerefs, locallevel of reference */
+    int width;			/* field width or nameref subscript idx  */
     char *env;			/* location in environment, if exported  */
     char *ename;		/* name of corresponding environment var */
     Param old;			/* old struct for use with local         */
@@ -1932,9 +1933,10 @@ struct tieddata {
 				 */
 #define PM_HASHELEM     (1<<28) /* is a hash-element */
 #define PM_NAMEDDIR     (1<<29) /* has a corresponding nameddirtab entry    */
+#define PM_NAMEREF      (1<<30) /* pointer to a different parameter         */
 
 /* The option string corresponds to the first of the variables above */
-#define TYPESET_OPTSTR "aiEFALRZlurtxUhHTkz"
+#define TYPESET_OPTSTR "aiEFALRZlurtxUhHT"
 
 /* These typeset options take an optional numeric argument */
 #define TYPESET_OPTNUM "LRZiEF"

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-06  2:21 [PATCH 1/3]: Add named references Bart Schaefer
@ 2023-02-08  3:45 ` Oliver Kiddle
  2023-02-08  4:59   ` Bart Schaefer
  0 siblings, 1 reply; 25+ messages in thread
From: Oliver Kiddle @ 2023-02-08  3:45 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> When a reference is created, it's necessary to determine the scope in

I've run some tests and this looks good. The main issue I noticed was
with a reference to associative array elements.

  typeset -A assoc
  typeset -n ref=assoc[other]
  ref=val

This does work if assoc is assigned an initial value.

The error message from, e.g. `typeset -Z2 ref` is "can't change type
of a named reference". However, this is normally an attempt to change
the type of the target of a reference (-n is not specified). It works
fine for a scalar. So perhaps the error message should complain about
changing the type of [associative] array elements instead.

One ksh feature which this doesn't cover is the special behaviour in ksh
that `typeset -n ref=$1` will use the calling function scope to resolve
the target. This is necessary with the lexical (private) scoping of
ksh93 functions. The positional parameters are special-cased which I
find rather ugly. You could overload -u or -U for indicating Tcl-like
uplevel. The trouble with uplevel is that a called function can't know
how many levels up to specify if people are allowed to write wrappers.
I think a better solution is to take some syntax that isn't a valid
variable name. e.g typeset -n ref=\&1 Repeated use of an initial & would
indicate where another level of scope should be ascended. Wrapper
functions then don't need their own nameref and can just pass, e.g.
\&var as the variable parameter.

> One difference from ksh93 is that you can't convert scalars into
> namerefs, e.g., both with this and in ksh93:

That seems reasonable enough.

> Named references can't have any other attributes, and attempting to
> add an attribute to a named reference generates a warning.  However,

I can't see an actual conflict in permitting -r, making the reference
readonly not the target variable.

> With respect to zsh/param/private parameters, it's presently
> prohibited to have a "public" reference to a private parameter.  It
> might be possible to change this if someone can think of a use case.

To a user it would seem like a fairly arbitrary restriction.

Coming back to the uplevel question, for private to be more useful it'd
be really great to be able to have a private reference to a private
variable from the calling function scope. I'm not sure how easy that
even is with the current implementation of private.

> There is one glitch: if the reference is created before the parameter
> is declared private, and the private local is NOT hiding something in
> an outer scope, then the reference does not retroactively generate an
> error and assignments to the reference are silent no-ops.

Not sure I follow. Sounds like something that ought to be fixed.

> I also removed "kz" from TYPESET_OPTSTR in zsh.h -- as far as I can
> tell they should never have been there in the first place.

typeset -fu is like autoload. I think it was intentional.

Note that recent bash also has namerefs so should perhaps be compared.
They only resolve in local scope and don't include the local level with
the reference.

Oliver

PS. The basic approach is much the same as that which was rejected many
years back with consensus at that time being that a C pointer reference
should be used. I did make a start at the time using reference counting
but real life stuff got in the way and it was never completed. Zsh's
code does a lot of deleting and recreating parameter structures which is
problematic with that approach. I do still have an old patch that last
got rebased as recently as only 7 years ago if you're interested. Aside
from not working with array subscripts, I think I also concluded that it
wouldn't work for private variables because of their somewhat too clever
implementation.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-08  3:45 ` Oliver Kiddle
@ 2023-02-08  4:59   ` Bart Schaefer
  2023-02-08 23:16     ` Bart Schaefer
  2023-02-09  0:47     ` Oliver Kiddle
  0 siblings, 2 replies; 25+ messages in thread
From: Bart Schaefer @ 2023-02-08  4:59 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh hackers list

On Tue, Feb 7, 2023 at 7:45 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> with a reference to associative array elements.
>
>   typeset -A assoc
>   typeset -n ref=assoc[other]
>   ref=val
>
> This does work if assoc is assigned an initial value.

This should work after patch 4/3 that I just sent.

% typeset -n ptr=foo
% typeset -A assoc
% typeset -n ref=assoc[other]
% ref=val
% typeset -p assoc
typeset -A assoc=( [other]=val )

> The error message from, e.g. `typeset -Z2 ref` is "can't change type
> of a named reference". However, this is normally an attempt to change
> the type of the target of a reference (-n is not specified). It works
> fine for a scalar. So perhaps the error message should complain about
> changing the type of [associative] array elements instead.

Hmm, yes, that message should be changed if possible.

> One ksh feature which this doesn't cover is the special behaviour in ksh
> that `typeset -n ref=$1` will use the calling function scope to resolve
> the target.

This should also work after patch 4/3.

% typeset outside=OUTSIDE
% () {
typeset -n ref=$1
print $ref
} outside
OUTSIDE

Or do I misunderstand?

> The positional parameters are special-cased which I
> find rather ugly.

There's nothing explaining this in "man ksh93", can you point me
somewhere?  Not that your description makes this sound particular
attractive.

I think the positionals might actually be best handled by a new
special parameter.  Cogitation needed.

> You could overload -u or -U for indicating Tcl-like
> uplevel.

It'd be easier to overload "-i N" for this since it already populates
the "base" integer in Param.

> The trouble with uplevel is that a called function can't know
> how many levels up to specify if people are allowed to write wrappers.

if the current locallevel were visible (cf. new special) then something like
 typeset -i $((locallevel-1)) -n up=argv
could work.  I think the current implementation of argv breaks this, tho.

I'm still not entirely following what problem this solves.

> I think a better solution is to take some syntax that isn't a valid
> variable name. e.g typeset -n ref=\&1 Repeated use of an initial & would
> indicate where another level of scope should be ascended. Wrapper
> functions then don't need their own nameref and can just pass, e.g.
> \&var as the variable parameter.

You're going to have to write out an example, sorry.

> I can't see an actual conflict in permitting -r, making the reference
> readonly not the target variable.

I'm also thinking about that.  The only use-case would be to prevent
something from doing "typeset +n ref".

> > With respect to zsh/param/private parameters, it's presently
> > prohibited to have a "public" reference to a private parameter.  It
> > might be possible to change this if someone can think of a use case.
>
> To a user it would seem like a fairly arbitrary restriction.

What purpose does it serve?  How does it help to have two names for
the same thing, one of which is "hidden" and the other isn't?  The
only analogy I can think of would be having a writable name for a
read-only variable.

> Coming back to the uplevel question, for private to be more useful it'd
> be really great to be able to have a private reference to a private
> variable from the calling function scope. I'm not sure how easy that
> even is with the current implementation of private.

You presently can't make anything that's private visible down-scope.
If you mean you want a variable that is visible in exactly one nested
scope, I'm back to asking for the use case.

> > There is one glitch: if the reference is created before the parameter
> > is declared private, and the private local is NOT hiding something in
> > an outer scope, then the reference does not retroactively generate an
> > error and assignments to the reference are silent no-ops.
>
> Not sure I follow. Sounds like something that ought to be fixed.

In Test/V10private.ztst it's the test with
 F:See K01typeset.ztst up-reference part 5
 F:Here ptr1 finds private ptr2 by scope mismatch, assignment silently fails

> > I also removed "kz" from TYPESET_OPTSTR in zsh.h -- as far as I can
> > tell they should never have been there in the first place.
>
> typeset -fu is like autoload. I think it was intentional.

Except every letter in TYPESET_OPTSTR has to correspond to the PM_
flag at that same index position, and the PM_ flags matching the k and
z options are way out of sequence for that.

> Note that recent bash also has namerefs so should perhaps be compared.

The special behavior of namerefs in "for" loops and implementing
"unset -n" might be good; the latter isn't mentioned until the "unset"
builtin in the ksh93 doc so I missed it entirely.

> PS. The basic approach is much the same as that which was rejected many
> years back with consensus at that time being that a C pointer reference
> should be used.

And yet we implemented ${(P)...} which is actually a bit grottier.  I
did consider overloading pm->old but was leery of messing up locals so
decided to leave it as a future optimization.

> I did make a start at the time using reference counting

That was the other part that worried me ... everything just went
(relatively) smoothly if most of the code could continue pretending
namerefs were scalars.

> I think I also concluded that it
> wouldn't work for private variables because of their somewhat too clever
> implementation.

They could be a lot less clever if they could move into the main
shell.  Most of the cleverness was necessary for making them modular.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-08  4:59   ` Bart Schaefer
@ 2023-02-08 23:16     ` Bart Schaefer
  2023-02-09  0:47     ` Oliver Kiddle
  1 sibling, 0 replies; 25+ messages in thread
From: Bart Schaefer @ 2023-02-08 23:16 UTC (permalink / raw)
  To: Zsh hackers list

On Tue, Feb 7, 2023 at 8:59 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Tue, Feb 7, 2023 at 7:45 PM Oliver Kiddle <opk@zsh.org> wrote:
> >
> > The error message from, e.g. `typeset -Z2 ref` is "can't change type
> > of a named reference".
>
> Hmm, yes, that message should be changed if possible.

It'd be more effort than it's worth to distinguish the type of array
involved, but my next patch will change this to say "of array
element".

> if the current locallevel were visible (cf. new special) then something like
>  typeset -i $((locallevel-1)) -n up=argv
> could work.  I think the current implementation of argv breaks this, tho.

The up-level argv is a char ** scoped to doshfunc() so it'd be rather
icky to expose it as a shell parameter.

> The special behavior of namerefs in "for" loops and implementing
> "unset -n" might be good; the latter isn't mentioned until the "unset"
> builtin in the ksh93 doc so I missed it entirely.

Next patch will have both of these, too.  I note that execfor() has
quite a bit of room for optimization, but I haven't dived into that.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-08  4:59   ` Bart Schaefer
  2023-02-08 23:16     ` Bart Schaefer
@ 2023-02-09  0:47     ` Oliver Kiddle
  2023-02-09  2:01       ` Oliver Kiddle
  2023-02-09  4:49       ` Bart Schaefer
  1 sibling, 2 replies; 25+ messages in thread
From: Oliver Kiddle @ 2023-02-09  0:47 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> > One ksh feature which this doesn't cover is the special behaviour in ksh
> > that `typeset -n ref=$1` will use the calling function scope to resolve
> > the target.
>
> This should also work after patch 4/3.
>
> % typeset outside=OUTSIDE
> % () {
> typeset -n ref=$1
> print $ref
> } outside
> OUTSIDE
>
> Or do I misunderstand?

In ksh you could pass "ref" as a parameter and the nameref would provide
access to the outside variable despite the name clash.

> > The positional parameters are special-cased which I
> > find rather ugly.
>
> There's nothing explaining this in "man ksh93", can you point me
> somewhere?  Not that your description makes this sound particular
> attractive.

The manual does seem rather vague on this. I've long been aware of it so
the detail may have come to me via the old ast-ksh lists. But to
demonstrate:

% function f {
>   typeset out=99 var=$1
>   nameref ref1=$1
>   nameref ref2=$var
>   echo f: $ref1 $ref2
>   ref1=1
>   ref2=2
> }
% out=0
% f out
f: 0 99
% echo $out
1

So $1 as the target of a nameref is special. I don't especially like
this. Everywhere else in the shell, a variable reference like $1 is no
different from using the text it would expand to explicitly. I don't
think we need to worry about the level of ksh93 compatibility here
because our default dynamic scoping mean a ksh script will work if
variable names don't clash.

> > You could overload -u or -U for indicating Tcl-like
> > uplevel.
>
> It'd be easier to overload "-i N" for this since it already populates
> the "base" integer in Param.

Not with the right number, though. Tcl's uplevel gives a distance (up the
procedure calling stack) to move. 1 being the default. You can prefix
with # to get an absolute level but that doesn't seem especially useful.

So 1 would mean $((locallevel-1)) in absolute terms.
Note that Tcl's uplevel is like a precommand modifier in zsh terms so
is not limited to defining references.

However I still prefer the other solution I mentioned with \&.

> I'm still not entirely following what problem this solves.

It solves the problem of name clashes between a function and the calling
function.

Consider Completion/Base/Utility/_call_function,
It expects the name of a parameter in $1.
It does:
  local _name _ret
  _name="$1"
  ...
  _ret=...
  eval "${_name}=${_ret}"
The underscores are there largely to avoid name clashes. In ksh, this could
instead be:
  nameref ret="$1"
  ret=...
And it wouldn't matter if ret is also the variable name in the calling
function.

> > I think a better solution is to take some syntax that isn't a valid
> > variable name. e.g typeset -n ref=\&1 Repeated use of an initial & would
> > indicate where another level of scope should be ascended. Wrapper
> > functions then don't need their own nameref and can just pass, e.g.
> > \&var as the variable parameter.
>
> You're going to have to write out an example, sorry.

With this idea:
  typeset -n ret=\&1
would be like ksh's:
  typeset -n ret=$1
But you could also do
  typeset -n ret=\&var

The code would strip the initial \&, go up one local level and see what
$var is in that scope. If the result also starts with an &, it would
repeat the process going up a further level. So a wrapper function to,
e.g. _call_function could pass "&$1" directly as the first parameter
instead of needing a local nameref of it's own. You could optionally
allow "&&var" to go up two levels. It doesn't need to be & if you think
a different character is better, it just needs to not be a valid
variable name. Does that make any more sense than the previous
explanation?

> I'm also thinking about that.  The only use-case would be to prevent
> something from doing "typeset +n ref".

Also prevents typeset -n ref=othervar
It is just as useful or useless as -r with other variables.

> > > With respect to zsh/param/private parameters, it's presently
> > > prohibited to have a "public" reference to a private parameter.  It
> > > might be possible to change this if someone can think of a use case.
> >
> > To a user it would seem like a fairly arbitrary restriction.
>
> What purpose does it serve?  How does it help to have two names for
> the same thing, one of which is "hidden" and the other isn't?  The
> only analogy I can think of would be having a writable name for a
> read-only variable.

If two functions are developed independently, they shouldn't need
to worry about the possibility of using the same variable name for
different purposes. This is the main benefit of private in general.
An old function like _call_function (updated to use nameref but still
using local) could be used by someone writing a new function where they
pass a private variable as the first parameter. If they are treating
_call_function as a black box, how are they to know that private
variables are not valid for the passed $1. It'd be an arbitrary
limitation.

> The special behavior of namerefs in "for" loops and implementing
> "unset -n" might be good; the latter isn't mentioned until the "unset"
> builtin in the ksh93 doc so I missed it entirely.

Given that you implemented references to array members, being able to
iterate over array elements with for could be nice. Perhaps something
like: for ref in 'arr[*]'; do
maybe [@] to include empty elements.

On the subject references to array elements, it does seem both powerful and
dangerous that subscripts are evaluated on reference. The subscript
should perhaps be evaluated with the same local level as the array
itself. And it could be wise to limit what can be done as part of the
subscript evaluation to avoid a CVE similar to the last one.

> > I think I also concluded that it
> > wouldn't work for private variables because of their somewhat too clever
> > implementation.
>
> They could be a lot less clever if they could move into the main
> shell.  Most of the cleverness was necessary for making them modular.

I would favour moving them to the main shell. And then making extensive
use of them in completon etc. There are things that could move in the
opposite direction if we want to reduce bloat. Mail checking for
example.

Oliver


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09  0:47     ` Oliver Kiddle
@ 2023-02-09  2:01       ` Oliver Kiddle
  2023-02-09  5:45         ` Bart Schaefer
  2023-02-09  4:49       ` Bart Schaefer
  1 sibling, 1 reply; 25+ messages in thread
From: Oliver Kiddle @ 2023-02-09  2:01 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh workers

This isn't a self-reference because the local levels would be different.
And "self reference" in the error message should be hyphenated.
  typeset -n ref=var
  foo() { typeset -n var=ref }
  foo
  foo:typeset:1: var: invalid self reference
With -g it would be, however.

I wrote:
> On the subject references to array elements, it does seem both powerful and
> dangerous that subscripts are evaluated on reference. The subscript

This is fairly sane:
  typeset -n ref1=arr[ref2] 
  typeset -n ref2=ref1     
  echo $ref1
  zsh: math recursion limit exceeded: ref2

This seems useful:
  arr=()
  typeset -n ref=arr[1,0]
  ref=4
  ref=5

Similar:
  idx=1
  typeset -n ref=arr[idx++]
  echo $ref $ref

This is where I worry about security:
  arr=( 1 2 3 4)
  typeset -n ref='arr[$(echo 4)]'
  echo $ref
  typeset -p ref  # it was only evaluated late thanks to quotes
Maybe the code should have a safer mode for subscript evaluation. This
could be useful but it is asking for trouble.

Also can see this being useful:
  str='one two three'
  typeset -n ref=str[(w)2]
  ref=zwei

We already talked about this error message. But now it is a substring.
  export ref
  export: ref: can't change type of a named reference

Oliver


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09  0:47     ` Oliver Kiddle
  2023-02-09  2:01       ` Oliver Kiddle
@ 2023-02-09  4:49       ` Bart Schaefer
  2023-02-09 20:49         ` Oliver Kiddle
  1 sibling, 1 reply; 25+ messages in thread
From: Bart Schaefer @ 2023-02-09  4:49 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh hackers list

On Wed, Feb 8, 2023 at 4:47 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> Bart Schaefer wrote:
> > > One ksh feature which this doesn't cover is the special behaviour in ksh
> > > that `typeset -n ref=$1` will use the calling function scope to resolve
> > > the target.
> >
>
> In ksh you could pass "ref" as a parameter and the nameref would provide
> access to the outside variable despite the name clash.

I think that will start working if I can fix the "invalid self
reference" mentioned in your next message.

> So [in ksh] $1 as the target of a nameref is special. [...] I don't
> think we need to worry about the level of ksh93 compatibility here
> because our default dynamic scoping mean a ksh script will work if
> variable names don't clash.

My general feeling is to be ksh compatible where I don't think ksh is stupid.

> Note that Tcl's uplevel is like a precommand modifier in zsh terms so
> is not limited to defining references.
>
> However I still prefer the other solution I mentioned with \&.

I have some half-formed thoughts about other ways to accomplish this
... I don't think the \& suggestion by itself gets around the problem
that the positionals are C-local to doshfunc().

> > I'm still not entirely following what problem this solves.
>
> It solves the problem of name clashes between a function and the calling
> function.

OK, so if
  typeset ref
  () { typeset -n ref=ref }
works / is not a loop, then we don't need any other magic?

> If two functions are developed independently, they shouldn't need
> to worry about the possibility of using the same variable name for
> different purposes. This is the main benefit of private in general.
> An old function like _call_function (updated to use nameref but still
> using local) could be used by someone writing a new function where they
> pass a private variable as the first parameter. If they are treating
> _call_function as a black box, how are they to know that private
> variables are not valid for the passed $1. It'd be an arbitrary
> limitation.

I'm still not seeing that.  The writer of the caller of _call_function
should know that it's a general rule that another function can't
access a private variable.  Why would the writer pass a private name
as an argument to any function, even a blackbox, if not expecting it
to be referenced?  The rule for a private should be that you always
pass its value.

Correct me if I still misunderstand, but I interpret the example above
as implying that

  f2() {
    typeset -n ref=$1
    # ... do stuff to $ref ...
  }
  f1() {
    private var
    f2 var
  }

should work just because the name is passed in a positional?  I would
disagree with that on the grounds that you're making positionals into
a special case, just a different kind of special case from ksh, that
violates the intent of "private".

The case I was asking about (and the only one I so far consider
possibly viable) is

  f1() {
    private var
    typeset -n ref=var
    f2 ref
  }

That is, the reference has to be declared in the scope where var is
"visible", otherwise you hit the privacy wall when "typeset -n" has to
up-level.

Or perhaps you mean to be tagging on to your other suggestion and it would be
  f2 \&var
which feels like syntactic sugar for my example above, with the
benefit of not needing to declare an extra parameter.  That's an
interesting idea but I would not want this to work:

  f2() {
    typeset -n upref=\&var
    # ... do things to var one level up ...
    typeset -n uptwo=\&\&var
    # ... do things to var two levels up ...
  }

I fear that puts us back to having something magic about positional
parameters, e.g., that unless the "&var" string appears in one of $1,
$2, ... then it's just an invalid name like any other.

Back again to my half-formed thoughts, I can't really elucidate yet.

> Given that you implemented references to array members, being able to
> iterate over array elements with for could be nice. Perhaps something
> like: for ref in 'arr[*]'; do

With a hash that's just:

  typeset -n ref
  for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ...

but since you have to "typeset -n ref" before the for-loop anyway, why not just

  typeset -n ref=hash
  for idx in ${(k)ref}; do ... $ref[$idx] ...

Ordinary arrays are harder because there's no "find all matching
elements" subscript flag.

> On the subject references to array elements, it does seem both powerful and
> dangerous that subscripts are evaluated on reference.

Hm ... I implemented that because ksh93 accepted the syntax, but I
never actually tried it -- in fact (Ubuntu's) ksh93 doesn't actually
expand it to anything sensible and the value of the nameref ends up
nonsense.

> The subscript
> should perhaps be evaluated with the same local level as the array
> itself.

That gets pretty weird, and you can already trivially bypass it with
${(P)...} for many cases.

> And it could be wise to limit what can be done as part of the
> subscript evaluation to avoid a CVE similar to the last one.

validate_refname() is calling parse_subscript() ... would further
examination of the accepted string be sufficient, or something more
needed?  I think the greatest danger is of creating an infinite
recursion, which can't really be caught in advance.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09  2:01       ` Oliver Kiddle
@ 2023-02-09  5:45         ` Bart Schaefer
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Schaefer @ 2023-02-09  5:45 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh workers

On Wed, Feb 8, 2023 at 6:01 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> This isn't a self-reference because the local levels would be different.

Working on that.  Halfway there.

> And "self reference" in the error message should be hyphenated.

I copied that message verbatim from ksh93.  Can still change it, but
does it really matter?

> Maybe the code should have a safer mode for subscript evaluation.

See previous message.

> We already talked about this error message. But now it is a substring.
>   export ref
>   export: ref: can't change type of a named reference

The point at which it's necessary to catch that error would have to
lop off the subscript and re-evaluate the base variable name in order
to determine the thing being subscripted.  That seems like an
unnecessary amount of work.

How about
  can't change type via subscript reference
??


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09  4:49       ` Bart Schaefer
@ 2023-02-09 20:49         ` Oliver Kiddle
  2023-02-09 23:07           ` Bart Schaefer
  0 siblings, 1 reply; 25+ messages in thread
From: Oliver Kiddle @ 2023-02-09 20:49 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> My general feeling is to be ksh compatible where I don't think ksh is stupid.

Sounds reasonable.

> > It solves the problem of name clashes between a function and the calling
> > function.
>
> OK, so if
>   typeset ref
>   () { typeset -n ref=ref }
> works / is not a loop, then we don't need any other magic?

The following is similar:
  var=hello
  typeset -n ref
  () {
    typeset var=x
    ref=var
    echo $ref
  }
  typeset -p ref
  echo $ref

This creates a reference to a variable at a deeper local level. Ksh
prints "global reference cannot refer to local variable". With zsh's
dynamic scoping you could do the same thing with a reference that isn't
"global" but which has a scope that will outlast the variable. Making
var outlive the function is a feature you'd get with a reference
counting approach to namerefs. The best might be if ref returns to being
unset when the function returns but an error like ksh is fine too.

> I'm still not seeing that.  The writer of the caller of _call_function
> should know that it's a general rule that another function can't
> access a private variable.  Why would the writer pass a private name
> as an argument to any function, even a blackbox, if not expecting it
> to be referenced?  The rule for a private should be that you always
> pass its value.

I hadn't really thought about it that way, perhaps because ksh only
has private scoping and I'm used to writing in languages that only
have lexical scoping. Certainly if it is hard to implement, I have no
objection to this approach. It does have the big advantage of making the
whole uplevel question a non-issue. We do lose some orthogonality in
that you can use a private with builtins that take variable names like
read, compadd and printf (-v). Wrappers of those would have an added
limitation. (_approximate relies on a function wrapper of compadd so
privates can't be used with that)

When relying only on dynamic scoping, it would be good practice to
define all the namerefs to passed parameters as early as possible in a
function to reduce the risk of a name clash.

> Correct me if I still misunderstand, but I interpret the example above
> as implying that
>
>   f2() {
>     typeset -n ref=$1
>     # ... do stuff to $ref ...
>   }
>   f1() {
>     private var
>     f2 var
>   }
>
> should work just because the name is passed in a positional?  I would
> disagree with that on the grounds that you're making positionals into
> a special case, just a different kind of special case from ksh, that
> violates the intent of "private".

It isn't about the positionals being special but that it is useful to be
able to write a function that exposes an interface similar to read where
a variable can be named as a parameter. Ksh's making $1, $2.. special
on the rhs of typeset -n really is very ugly.

> The case I was asking about (and the only one I so far consider
> possibly viable) is
>
>   f1() {
>     private var
>     typeset -n ref=var
>     f2 ref
>   }

This puts the complexity into the caller so f2's interface is less
nice. And by needing ref as a local, f1 hasn't really benefitted from
var being able to be private. But if this is easy to implement then it
could be useful. If not, printing an error is fine too.

> Or perhaps you mean to be tagging on to your other suggestion and it would be
>   f2 \&var

My intention with that suggestion is that you'd only do that to refer to
$var from the scope of f1's caller. So in practice that'd sooner be
something like \&$3. For this, it'd be just `f2 var` and f2() would
declare `private -n mine=\&1`

> which feels like syntactic sugar for my example above, with the
> benefit of not needing to declare an extra parameter.  That's an
> interesting idea but I would not want this to work:

Again, the key difference is that the caller needed the extra parameter
where as using some sort of uplevel goes in the callee. You can't wrap
e.g. compadd transparently if the caller needs to change.

> With a hash that's just:
>
>   typeset -n ref
>   for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ...

"just"!?

> > And it could be wise to limit what can be done as part of the
> > subscript evaluation to avoid a CVE similar to the last one.
>
> validate_refname() is calling parse_subscript() ... would further
> examination of the accepted string be sufficient, or something more
> needed?  I think the greatest danger is of creating an infinite
> recursion, which can't really be caught in advance.

So if a function gets the target of a nameref from untrusted input the
function writer needs to know to validate it with something like

  [[ $var = [[:IDENT:]]## ]]

It might deprive us of many clever tricks but parse_subscript() could
gain a flag to disable anything questionable like command substitution
and math evaluation side-effects.

This should be an error perhaps:

  typeset -n ref=arr[1][2]

Currently it isn't possible to create a reference fo $!, $?, $@, $+, $#
and $$. If easy to add, there would be no harm in them.

And in 51388, Bart wrote:
> How about
>   can't change type via subscript reference
> ??

That sounds fine.

Oliver


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09 20:49         ` Oliver Kiddle
@ 2023-02-09 23:07           ` Bart Schaefer
  2023-02-11  3:04             ` Bart Schaefer
  2023-02-12  9:02             ` Oliver Kiddle
  0 siblings, 2 replies; 25+ messages in thread
From: Bart Schaefer @ 2023-02-09 23:07 UTC (permalink / raw)
  To: Zsh hackers list

On Thu, Feb 9, 2023 at 12:49 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> The following is similar:
>   var=hello
>   typeset -n ref
>   () {
>     typeset var=x
>     ref=var
>     echo $ref
>   }
>   typeset -p ref
>   echo $ref
>
> This creates a reference to a variable at a deeper local level.

Only in a pointers/refcounting implementation.  With the assumption of
dynamic scoping, it creates a reference to a name, where the scope
search for the name starts at a lower level.  If that name doesn't
exist at that level (because the whole level doesn't exist any more),
the search climbs up.  So I get with set -x (and a couple of
in-progress patches for looping references):

+Src/zsh:15> var=hello
+Src/zsh:16> typeset -n ref
+Src/zsh:17> '(anon)'
+(anon):1> typeset var=x
+(anon):2> ref=var
+(anon):3> echo x
x
+Src/zsh:22> typeset -p ref
typeset -n ref=var
+Src/zsh:23> echo hello
hello

If I add

  () {
   typeset var=y
   echo $ref
   () {
     typeset var=z
     echo $ref
   }
  }

I get

+Src/zsh:24> '(anon)'
+(anon):0> typeset var=y
+(anon):0> echo y
y
+(anon):1> '(anon)'
+(anon):0> typeset var=z
+(anon):0> echo y
y

> The best might be if ref returns to being
> unset when the function returns but an error like ksh is fine too.

I'm not sure how to do that without scanning the whole parameter
table, but I agree it the above is a little puzzling.

> Ksh prints "global reference cannot refer to local variable".

At what point does that happen?  Upon the assignment ref=var ?

Relatedly, what should happen on any failed assignment?  E.g.

typeset -n xy yx
xy=yx  # OK so far
yx=xy  # Oops, invalid loop

Should yx become unset, or should it remain a nameref with no referent?

> > The rule for a private should be that you always
> > pass its value.
>
> I hadn't really thought about it that way, perhaps because ksh only
> has private scoping and I'm used to writing in languages that only
> have lexical scoping. Certainly if it is hard to implement, I have no
> objection to this approach.

("This approach" meaning "no public refs to private vars"?)  I haven't
tried anything else yet to see how hard it might be.

> We do lose some orthogonality in
> that you can use a private with builtins that take variable names like
> read, compadd and printf (-v). Wrappers of those would have an added
> limitation.

That's true of private already, isn't it?

> When relying only on dynamic scoping, it would be good practice to
> define all the namerefs to passed parameters as early as possible in a
> function to reduce the risk of a name clash.

If you were going to put that in the doc, where would it go?

> It isn't about the positionals being special but that it is useful to be
> able to write a function that exposes an interface similar to read where
> a variable can be named as a parameter. Ksh's making $1, $2.. special
> on the rhs of typeset -n really is very ugly.

This works for my code in current state:

% var=GLOBAL
% typeset -n ref=var
% f() {
function> typeset -n ref=$1
function> print $ref
function> ref=LOCAL
function> }
% f ref
GLOBAL
% print $ref
LOCAL
%

> >   f2 \&var
>
> My intention with that suggestion is that you'd only do that to refer to
> $var from the scope of f1's caller. So in practice that'd sooner be
> something like \&$3. For this, it'd be just `f2 var` and f2() would
> declare `private -n mine=\&1`

Yeah, I don't like the idea that a called function can arbitrarily
grab parameters from its caller just by sticking & in front.  Caller
needs to do do something (even if only make "normal" use of dynamic
scope) to make the parameter "grabbable".

> > With a hash that's just:
> >
> >   typeset -n ref
> >   for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ...
>
> "just"!?

Hah!  Point was that it's do-able without "for"-context-sensitive
special subscript semantics.  I think it would be strange for

ary=( 1 2 3 4 5 )
typeset -n ref='ary[*]
ref=( a b c)

to produce something different than

ary=( 1 2 3 4 5 )
typeset -n ref
for ref in 'ary{*]'; do ref=( a b c ); done

> > > And it could be wise to limit what can be done as part of the
> > > subscript evaluation to avoid a CVE similar to the last one.
> >
> > validate_refname() is calling parse_subscript() ... would further
> > examination of the accepted string be sufficient, or something more
> > needed?  I think the greatest danger is of creating an infinite
> > recursion, which can't really be caught in advance.
>
> So if a function gets the target of a nameref from untrusted input the
> function writer needs to know to validate it

No, I meant, would examining the subscript string in the C code be sufficient.

> This should be an error perhaps:
>
>   typeset -n ref=arr[1][2]

Why?  ${ary[1][2]} isn't an error, it's the 2nd character of the first
word in $ary.

You can keep throwing subscripts on there as long as the resulting
substrings can be indexed-into.

print ${ary[1][3,9][4]} # etc.

> Currently it isn't possible to create a reference fo $!, $?, $@, $+, $#
> and $$. If easy to add, there would be no harm in them.

You can make references to argv and ARGC, but they always refer to the
current argv/ARGC because of the aforementioned implementation of
positionals as C locals.  $* $@ $# would have the same issues.

The others would all have to be special-cased individually.  What is
$+ ?  Do you mean $- ?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09 23:07           ` Bart Schaefer
@ 2023-02-11  3:04             ` Bart Schaefer
  2023-02-11  3:55               ` Bart Schaefer
  2023-02-11  7:02               ` [PATCH 1/3]: Add named references Oliver Kiddle
  2023-02-12  9:02             ` Oliver Kiddle
  1 sibling, 2 replies; 25+ messages in thread
From: Bart Schaefer @ 2023-02-11  3:04 UTC (permalink / raw)
  To: Zsh hackers list

Closing in on this, I think ... remaining decisions on which input
would be appreciated:

On Thu, Feb 9, 2023 at 3:07 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> > Ksh prints "global reference cannot refer to local variable".
>
> At what point does that happen?  Upon the assignment ref=var ?

At the moment, just toying with this, I have it erroring when
EMULATE_KSH, printing a warning when WARN_NESTED_VAR, and otherwise
silently creating the down-scope reference.  Thoughts?  Perhaps
WARN_CREATE_GLOBAL would actually be better here, or maybe check for
either one.

> Relatedly, what should happen on any failed assignment?  E.g.
>
> typeset -n xy yx
> xy=yx  # OK so far
> yx=xy  # Oops, invalid loop
>
> Should yx become unset, or should it remain a nameref with no referent?

This probably doesn't matter except at the command line as the
referent loop is otherwise fatal.

> > When relying only on dynamic scoping, it would be good practice to
> > define all the namerefs to passed parameters as early as possible in a
> > function to reduce the risk of a name clash.
>
> If you were going to put that in the doc, where would it go?

Options are:
- under Named References in Parameters
- at the description of -n under typeset
- with the scoping example in Expansion
- in a new section under Functions

> > > validate_refname() is calling parse_subscript() ... would further
> > > examination of the accepted string be sufficient, or something more
> > > needed?
> [...] I meant, would examining the subscript string in the C code be sufficient.
>
> > It might deprive us of many clever tricks but parse_subscript() could
> > gain a flag to disable anything questionable like command substitution
> > and math evaluation side-effects.

parse_subscript() is already a bit of a hack, it sets up its own
context and invokes the lexer.  A flag such as that would, I think,
require plugging in an alternate lexer, hence my question about doing
a simpler examination of the string.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11  3:04             ` Bart Schaefer
@ 2023-02-11  3:55               ` Bart Schaefer
  2023-02-11  5:36                 ` Speaking of dangerous referents Bart Schaefer
  2023-02-11  7:02               ` [PATCH 1/3]: Add named references Oliver Kiddle
  1 sibling, 1 reply; 25+ messages in thread
From: Bart Schaefer @ 2023-02-11  3:55 UTC (permalink / raw)
  To: Zsh hackers list

On Fri, Feb 10, 2023 at 7:04 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> > > It might deprive us of many clever tricks but parse_subscript() could
> > > gain a flag to disable anything questionable like command substitution
> > > and math evaluation side-effects.

I'm not sure what to do about math side-effects, but I've hit upon the
idea of expanding subscripts that appear in named references with
EXECOPT set to zero (noexec).  That reduces $(...) to the empty
string, except for the special-case $(<file) that is being discussed
in another thread.

Switching to noexec around singsub() in getarg() is quite easy, and as
previously mentioned you can still do fully dynamic subscripts with
${(P)...}.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Speaking of dangerous referents
  2023-02-11  3:55               ` Bart Schaefer
@ 2023-02-11  5:36                 ` Bart Schaefer
  2023-02-12  8:00                   ` Oliver Kiddle
  0 siblings, 1 reply; 25+ messages in thread
From: Bart Schaefer @ 2023-02-11  5:36 UTC (permalink / raw)
  To: Zsh hackers list

Oliver wrote:
> > And it could be wise to limit what can be done as part of the
> > subscript evaluation to avoid a CVE similar to the last one.

% print $ZSH_PATCHLEVEL
ubuntu/5.8-3ubuntu1.1
% empty=()
% loop='empty[${(P)loop}]'
% print ${(P)loop}
zsh: segmentation fault (core dumped)  zsh -f


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11  3:04             ` Bart Schaefer
  2023-02-11  3:55               ` Bart Schaefer
@ 2023-02-11  7:02               ` Oliver Kiddle
  2023-02-11  7:45                 ` Bart Schaefer
  1 sibling, 1 reply; 25+ messages in thread
From: Oliver Kiddle @ 2023-02-11  7:02 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> Closing in on this, I think ... remaining decisions on which input
> would be appreciated:
>
> On Thu, Feb 9, 2023 at 3:07 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> >
> > > Ksh prints "global reference cannot refer to local variable".
> >
> > At what point does that happen?  Upon the assignment ref=var ?
>
> At the moment, just toying with this, I have it erroring when
> EMULATE_KSH, printing a warning when WARN_NESTED_VAR, and otherwise
> silently creating the down-scope reference.  Thoughts?  Perhaps
> WARN_CREATE_GLOBAL would actually be better here, or maybe check for
> either one.

I'd make it a fatal error unconditionally. I don't like it if the target
of a nameref can switch to something else following a return. I'd be
fine with the reference becoming unset at the time of the return if it
refers to a variable that is going out of scope. Can that be done as
part of the same code that drops the local variables? (I'm guessing it
is easy for local but hard for private)

The mention of WARN_NESTED_VAR raises something else. Surely a reference
should disable that warning:

    toplevel() {
      local foo="in fn"
      nested
    }
    nested() {
      typeset -n ref=foo
      ref="in nested"
    }         
    setopt warn_nested_var
    toplevel
    nested:2: scalar parameter foo set in enclosing scope in function nested

> > Relatedly, what should happen on any failed assignment?  E.g.
> >
> > typeset -n xy yx
> > xy=yx  # OK so far
> > yx=xy  # Oops, invalid loop
> >
> > Should yx become unset, or should it remain a nameref with no referent?
>
> This probably doesn't matter except at the command line as the
> referent loop is otherwise fatal.

Current code even allows:

  typeset -n ref
  ref=ref

I'd suggest that we need the self-reference check whenever assigning to
a reference. But that entails evaluation of a subscript.

> > > When relying only on dynamic scoping, it would be good practice to
> > > define all the namerefs to passed parameters as early as possible in a
> > > function to reduce the risk of a name clash.
> >
> > If you were going to put that in the doc, where would it go?

Do we to put recommended good practices in the doc at all as opposed to
it being a terse description of the functionality. I'd prefer it out of
the way with examples rather than adding fluff to the places you list.
I'd also consider it bad practice not to always assign a reference when
creating it.

> > > It might deprive us of many clever tricks but parse_subscript() could
> > > gain a flag to disable anything questionable like command substitution
> > > and math evaluation side-effects.
>
> parse_subscript() is already a bit of a hack, it sets up its own
> context and invokes the lexer.  A flag such as that would, I think,
> require plugging in an alternate lexer, hence my question about doing
> a simpler examination of the string.

That's not as robust but certainly an option. The temporary use of
NOEXEC that you mention in the other message certainly goes a long way
to alleviating security concerns.

Oliver


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11  7:02               ` [PATCH 1/3]: Add named references Oliver Kiddle
@ 2023-02-11  7:45                 ` Bart Schaefer
  2023-02-11 23:43                   ` Bart Schaefer
  0 siblings, 1 reply; 25+ messages in thread
From: Bart Schaefer @ 2023-02-11  7:45 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh hackers list

On Fri, Feb 10, 2023 at 11:02 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> Bart Schaefer wrote:
> > On Thu, Feb 9, 2023 at 3:07 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> > >
> > > > Ksh prints "global reference cannot refer to local variable".
>
> I'd make it a fatal error unconditionally. I don't like it if the target
> of a nameref can switch to something else following a return.

Making it an unconditional error means that this sort of thing doesn't work:

typeset -n ref
() {
  typeset a=A b=B c=C
  for ref in a b c; do ref=${(L)ref}; done
}

> I'd be
> fine with the reference becoming unset at the time of the return if it
> refers to a variable that is going out of scope. Can that be done as
> part of the same code that drops the local variables?

Maybe.  Might look at that tomorrow.

> The mention of WARN_NESTED_VAR raises something else. Surely a reference
> should disable that warning:

Actually there's an argument that an unexpected up-reference is a
really good reason for that warning.  A function that intentionally
uses an up-reference could turn the option off locally.

> > > Relatedly, what should happen on any failed assignment?
>
> Current code even allows:
>
>   typeset -n ref
>   ref=ref

That's fixed in my sandbox.  Patch eventually forthcoming.

> I'd suggest that we need the self-reference check whenever assigning to
> a reference. But that entails evaluation of a subscript.

As I have it right now, self-reference checks do happen at any
assignment.  That's why I wonder if deleting the offending parameter
is OK.

Circular references hidden inside subscripts end up expanding to empty
string, as do command substitutions with the NO_EXEC trick.

> Do we to put recommended good practices in the doc at all as opposed to
> it being a terse description of the functionality. I'd prefer it out of
> the way with examples rather than adding fluff to the places you list.

The Functions chapter was rather too terse.  Here's what I've written to add:

--- 8< ---
Parameters declared by any of the 'typeset' family of commands during
the execution of a function become _local_ to the function unless the
'-g' option is used.  This is the _scope_ of the parameter, which
extends dynamically to any other functions called by the declaring
function.  In most cases, local parameters take the place of any other
parameter having the same name that was assigned or declared in an
earlier function scope.  (See *note Local Parameters::.)

A named parameter declared with the '-n' option to any of the 'typeset'
commands becomes a reference to a parameter in scope at the time of
assignment to the named reference, which may be at a different call
level than the declaring function.  For this reason, it is good practice
to declare a named reference as soon as the referent parameter is in
scope, and as early as possible in the function if the reference is to a
parameter in a calling scope.

A typical use of named references is to pass the name of the referent
as a positional parameter.  For example,

     pop() {
       local -n ref=$1
       local last=$ref[$#ref]
       ref[$#ref]=()
       print -r -- $last
     }
     array=( a list of five values )
     pop array

prints the word values and shortens $array to '( a list of five )'.
There are no local parameters in pop at the time 'ref=$1' is assigned,
so 'ref' becomes a reference to 'array' in the caller.
--- >8 ---

> I'd also consider it bad practice not to always assign a reference when
> creating it.

That also doesn't work with the for-loop example above (even if the
nameref is moved inside the function).


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11  7:45                 ` Bart Schaefer
@ 2023-02-11 23:43                   ` Bart Schaefer
  2023-02-11 23:45                     ` Bart Schaefer
  2023-02-12  7:38                     ` Oliver Kiddle
  0 siblings, 2 replies; 25+ messages in thread
From: Bart Schaefer @ 2023-02-11 23:43 UTC (permalink / raw)
  To: Zsh hackers list

On Fri, Feb 10, 2023 at 11:45 PM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> On Fri, Feb 10, 2023 at 11:02 PM Oliver Kiddle <opk@zsh.org> wrote:
> >
> > I'd be
> > fine with the reference becoming unset at the time of the return if it
> > refers to a variable that is going out of scope. Can that be done as
> > part of the same code that drops the local variables?

That's actually a little startling given dynamic scoping.  It leads to this:

{
  typeset -n ref=var
  () {
    var=FUNC1
    typeset -p ref
  }
  typeset -p ref
}

Output:
typeset -g -n ref=var
typeset: no such variable: ref

If the called function were a blackbox, it would be extremely puzzling
why "ref" became unset.

The additional complication is that the parameter table is a hash, so
it's scanned in random order; we can't be sure that the referent local
$var will be found and unset before the referring $ref is examined.  I
suppose we could scan twice (ick).

Instead let's think for a moment about how the dynamic scoping works.
"ref" has two properties:  The locallevel at which it was declared,
and the locallevel at which it prefers to find a referent.  Any time
you use $ref (or assign to ref) the search starts at the current
locallevel and proceeds upward until either the preferred level is
reached or the "topmost" defined parameter is found, whichever comes
first.

So ... if endparamscope() walks the preferred referent locallevel
upward as the scope unwinds, we get (using your earlier example with
my addition):

{
  var=hello
  typeset -n ref
  () {
    typeset var=x
    ref=var
    echo $ref
  }
  typeset -p ref
  echo $ref
  () {
   typeset var=y
   echo $ref
   () {
     typeset var=z
     echo $ref
   }
  }
}

Output:
x
typeset -n ref=var
hello
hello
hello

The inner assignment to ref changes the scope , but it then unwinds so
in the second function (which does not assign to ref) it is back to
pointing at the original scope, and finds $var there.

If "var=hello" is replaced by "unset var" (so there is no parameter at
the original scope), then the output is:

x
typeset -n ref=var

y
y

This is because the assignment "ref=var" has initialized the name to
which "ref" points, but there is no such name at the same level, so
the nearest parameter of the same name is chosen.

If "typeset -n ref" at the top level is replaced by "typeset -n
ref=var" then the "ref=var" assignment in the first function changes
the value of either the global $var or the local $var from "x" to
"var" (depending on which existed first) because ref already has a
referent name.  So the output could be either:

# with var=hello
var
typeset -n ref=var
var
var
var

# with var unset
var
typeset -n ref=var

y
y

Given the requirements that (1) assigning a value to a nameref that
has no referent initializes the nameref (required for "for" loop) and
(2) dynamic scoping selects the best referent by name, I think this is
the best I can do without something even more surprising.

The other option that occurs to me is for the assignment ref=var to
automatically create a local $ref (as if "typeset ref=var" had been
used).  That would then be unwound and we'd end up back in the
original state.  But that would also be unexpected behavior in dynamic
scopes.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11 23:43                   ` Bart Schaefer
@ 2023-02-11 23:45                     ` Bart Schaefer
  2023-02-12  7:38                     ` Oliver Kiddle
  1 sibling, 0 replies; 25+ messages in thread
From: Bart Schaefer @ 2023-02-11 23:45 UTC (permalink / raw)
  To: Zsh hackers list

On Sat, Feb 11, 2023 at 3:43 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Fri, Feb 10, 2023 at 11:45 PM Bart Schaefer
> <schaefer@brasslantern.com> wrote:
> >
> > On Fri, Feb 10, 2023 at 11:02 PM Oliver Kiddle <opk@zsh.org> wrote:
> > >
> > > fine with the reference becoming unset at the time of the return if it
> > > refers to a variable that is going out of scope.
>
> That's actually a little startling given dynamic scoping.  It leads to this:
>
> {
>   typeset -n ref=var
>   () {
>     var=FUNC1

# that should of course be "local var=FUNC1" sorry

>     typeset -p ref
>   }
>   typeset -p ref
> }
>
> Output:
> typeset -g -n ref=var
> typeset: no such variable: ref


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-11 23:43                   ` Bart Schaefer
  2023-02-11 23:45                     ` Bart Schaefer
@ 2023-02-12  7:38                     ` Oliver Kiddle
  1 sibling, 0 replies; 25+ messages in thread
From: Oliver Kiddle @ 2023-02-12  7:38 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> > > I'd be
> > > fine with the reference becoming unset at the time of the return if it
> > > refers to a variable that is going out of scope. Can that be done as
> > > part of the same code that drops the local variables?
>
> That's actually a little startling given dynamic scoping.  It leads to this:

While I said "unset", what I actually meant was only dropping the value
part of the reference - returning it to the `typeset -n ref` state.

However:

> Instead let's think for a moment about how the dynamic scoping works.
> "ref" has two properties:  The locallevel at which it was declared,
> and the locallevel at which it prefers to find a referent.  Any time
> you use $ref (or assign to ref) the search starts at the current
> locallevel and proceeds upward until either the preferred level is
> reached or the "topmost" defined parameter is found, whichever comes
> first.

You've persuaded me. It is consistent with dynamic scoping in general.
Perhaps it has further convinced me (not that it needed to) that lexical
scoping is generally better.

Also when following and considering your examples, the following
scenario occurred to me. I can't see a reason to object to this
behaviour and some of the ideas I had in mind would mess with it. Note
that the reference starts out pointing to an unset variable:

  bar() {
    typeset var=hello
    echo $ref
  }
  foo() {
    typeset -n ref=var
    bar
    var=foo
    echo $ref
  }
  unset var
  foo

It'd be surprising if calling bar somehow lost the value of the
reference.

Oliver


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Speaking of dangerous referents
  2023-02-11  5:36                 ` Speaking of dangerous referents Bart Schaefer
@ 2023-02-12  8:00                   ` Oliver Kiddle
  2023-02-12  8:34                     ` Bart Schaefer
  0 siblings, 1 reply; 25+ messages in thread
From: Oliver Kiddle @ 2023-02-12  8:00 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> % empty=()
> % loop='empty[${(P)loop}]'
> % print ${(P)loop}
> zsh: segmentation fault (core dumped)  zsh -f

And in 51399 on namerefs:
> Circular references hidden inside subscripts end up expanding to empty
> string, as do command substitutions with the NO_EXEC trick.

That surprises me. I can't see any executions occurring. Would making
(P) a little safer by applying the NO_EXEC trick to it too fix that seg
fault. Or did you have a different fix in mind? Are there sane uses to
(P) with nasty stuff like hidden command substitutions. We have (e) and
eval which are at least explicit in their dangers.

If genuine uses surface later (PP) could turn on the dangerous form. Or
you might argue that we should support (PP) for the safe form.

Making (P) safer could be another use for the FUTURE option I suggested
in the final paragraph of 51281 - perhaps very few people (if any) were
still reading at that point so it may have been overlooked.

Oliver


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Speaking of dangerous referents
  2023-02-12  8:00                   ` Oliver Kiddle
@ 2023-02-12  8:34                     ` Bart Schaefer
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Schaefer @ 2023-02-12  8:34 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, Feb 12, 2023 at 12:00 AM Oliver Kiddle <opk@zsh.org> wrote:
>
> Bart Schaefer wrote:
> > % empty=()
> > % loop='empty[${(P)loop}]'
> > % print ${(P)loop}
> > zsh: segmentation fault (core dumped)  zsh -f
>
> And in 51399 on namerefs:
> > Circular references hidden inside subscripts end up expanding to empty
> > string, as do command substitutions with the NO_EXEC trick.
>
> That surprises me. I can't see any executions occurring.

I apologize, the reference to NO_EXEC only applies to the clause about
command substitutions.  The circular references are handled by using
tagging to detect/abort the loop during nameref resolution.  I'm
uncertain whether a similar approach could be used for (P) and haven't
dug into it, I didn't want to mix a patch for that into the patches
for namerefs.

> Would making
> (P) a little safer by applying the NO_EXEC trick to it too fix that seg
> fault. Or did you have a different fix in mind?

It wouldn't and I didn't, at least not yet.  It would have to be
something along the lines of the "math recursion limit exceeded"
handling.  There are two or three other places in the code where there
are comments rejecting arbitrary limits on recursion so this might be
something we don't want to fix.

> Making (P) safer could be another use for the FUTURE option I suggested
> in the final paragraph of 51281 - perhaps very few people (if any) were
> still reading at that point so it may have been overlooked.

It would be helpful to know whether any of the uses of (P) in the
existing contributed and completion functions etc. actually rely on
expanding command substitutions.  It would be pretty unlikely that
they rely on self-reference.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-09 23:07           ` Bart Schaefer
  2023-02-11  3:04             ` Bart Schaefer
@ 2023-02-12  9:02             ` Oliver Kiddle
  2023-02-12 18:59               ` Bart Schaefer
  1 sibling, 1 reply; 25+ messages in thread
From: Oliver Kiddle @ 2023-02-12  9:02 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

On 9 Feb, Bart Schaefer wrote (I've reordered this):
> > This should be an error perhaps:
> >
> >   typeset -n ref=arr[1][2]
>
> Why?  ${ary[1][2]} isn't an error, it's the 2nd character of the first
> word in $ary.

Sorry, should have been clearer. Applying later subscripts is even
better than an error but the following is not ideal:

  % arr=( one two three )
  % typeset -n ref=arr[2][1]
  % echo $ref
  two[1]
  % ref=x
  % typeset -p arr
  typeset -a arr=( one x three )
  % echo $ref
  x[1]
  % typeset -n ref=arr[1]whatever
  % echo $ref
  onewhatever

> Yeah, I don't like the idea that a called function can arbitrarily
> grab parameters from its caller just by sticking & in front.  Caller
> needs to do do something (even if only make "normal" use of dynamic
> scope) to make the parameter "grabbable".

Allowing such things as uplevel statements (Tcl), special syntax, or
introspection modules (Python's inspect), is fairly normal for a
scripting language.

As I've mentioned, we already have the situation where the caller
needs nothing to make a private grabable by a builtin:
  private -a var
  compadd -O var one two three
And writing wrappers to e.g. compadd is useful.

But I don't think we need to address references to privates in other
scopes at this point before the other patches are even applied. May be
better to consider whether there are other advantages in moving private
to the main shell first as that may ease any implementation.

> > >   typeset -n ref
> > >   for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ...
> >
> > "just"!?
>
> Hah!  Point was that it's do-able without "for"-context-sensitive
> special subscript semantics.  I think it would be strange for

I can't think of a sensible way. Was just that it could be useful.
Best long term solution might be to allow arrays of references. Bash
does allow arrays of integers.

> The others would all have to be special-cased individually.  What is
> $+ ?  Do you mean $- ?

Yes.

Oliver


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-12  9:02             ` Oliver Kiddle
@ 2023-02-12 18:59               ` Bart Schaefer
  2023-02-12 19:45                 ` PM_* flags in zsh.h (was Re: [PATCH 1/3]: Add named references) Bart Schaefer
  2023-02-12 22:54                 ` [PATCH 1/3]: Add named references Oliver Kiddle
  0 siblings, 2 replies; 25+ messages in thread
From: Bart Schaefer @ 2023-02-12 18:59 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, Feb 12, 2023 at 1:02 AM Oliver Kiddle <opk@zsh.org> wrote:
>
>   % typeset -n ref=arr[1]whatever
>   % echo $ref
>   onewhatever

Aha.  OK, that's pretty easy to fix.

> As I've mentioned, we already have the situation where the caller
> needs nothing to make a private grabable by a builtin:
>   private -a var
>   compadd -O var one two three
> And writing wrappers to e.g. compadd is useful.
>
> But I don't think we need to address references to privates in other
> scopes at this point before the other patches are even applied. May be
> better to consider whether there are other advantages in moving private
> to the main shell first as that may ease any implementation.

I think it would in fact be necessary to move private to the main
shell in order to fix this.  The second-worst hack in private is the
overloading of some parameter flags, and the internals of e.g.
createparam() et al. need to be able to distinguish the cases of the
original flag meaning from its overloaded meaning.  Builtins have the
advantage of not passing through that code path.

> > > >   typeset -n ref
> > > >   for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ...
>
> I can't think of a sensible way. Was just that it could be useful.

What about adding ${(K)array) to return array[1] array[2] ... ?

for ref in ${(K)hash}; do ...

I took a quick look at the implementation of (k) and (v) and (K) looks
do-able with sufficient messing around in paramsubst().  Which I'm not
feeling like undertaking myself just now, though.

> Best long term solution might be to allow arrays of references. Bash
> does allow arrays of integers.

Bash also has sparse arrays, its implementation is quite different.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* PM_* flags in zsh.h (was Re: [PATCH 1/3]: Add named references)
  2023-02-12 18:59               ` Bart Schaefer
@ 2023-02-12 19:45                 ` Bart Schaefer
  2023-02-12 21:01                   ` Oliver Kiddle
  2023-02-12 22:54                 ` [PATCH 1/3]: Add named references Oliver Kiddle
  1 sibling, 1 reply; 25+ messages in thread
From: Bart Schaefer @ 2023-02-12 19:45 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, Feb 12, 2023 at 10:59 AM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> > [...] consider whether there are other advantages in moving private
> > to the main shell first as that may ease any implementation.
>
> [...] The second-worst hack in private is the
> overloading of some parameter flags, and the internals of e.g.
> createparam() et al. need to be able to distinguish the cases of the
> original flag meaning from its overloaded meaning.

We're up to (1<<30) on PM_* flags.  Can we go above (1<<31) ?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: PM_* flags in zsh.h (was Re: [PATCH 1/3]: Add named references)
  2023-02-12 19:45                 ` PM_* flags in zsh.h (was Re: [PATCH 1/3]: Add named references) Bart Schaefer
@ 2023-02-12 21:01                   ` Oliver Kiddle
  0 siblings, 0 replies; 25+ messages in thread
From: Oliver Kiddle @ 2023-02-12 21:01 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> We're up to (1<<30) on PM_* flags.  Can we go above (1<<31) ?

We can't even use 1<<31 without either refactoring to ensure the use of
unsigned int or making it otherwise work for either a 32 or 64-bit int.
Perhaps assigning it as INT_MIN just works but I'm not sure.
See 43674. I had somehow missed the fact that Peter had freed two bits
there.

It might be better to force it up to 64-bits if doing refactoring. There
are a few mutually exclusive flags which would imply that bits could be
freed but the need to correspond to letters in TYPESET_OPTSTR prevents
that.

Oliver


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/3]: Add named references
  2023-02-12 18:59               ` Bart Schaefer
  2023-02-12 19:45                 ` PM_* flags in zsh.h (was Re: [PATCH 1/3]: Add named references) Bart Schaefer
@ 2023-02-12 22:54                 ` Oliver Kiddle
  1 sibling, 0 replies; 25+ messages in thread
From: Oliver Kiddle @ 2023-02-12 22:54 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

Bart Schaefer wrote:
> On Sun, Feb 12, 2023 at 1:02 AM Oliver Kiddle <opk@zsh.org> wrote:
> >
> >   % typeset -n ref=arr[1]whatever
> >   % echo $ref
> >   onewhatever
>
> Aha.  OK, that's pretty easy to fix.

That still leaves the oddities with a second subscript:

  typeset -n ref='arr[2][1]'
  arr=( one two three )
  echo $ref
  echo ${ref}
  ref=Z
  typeset -p arr

Only case where the second subscript works is when ref is in braces.

> What about adding ${(K)array) to return array[1] array[2] ... ?
>
> for ref in ${(K)hash}; do ...

That'd work. There's been a few times when I've wished (k) and (kv)
would work for normal arrays. the former is just {1..$#arr} but would
work in nested expansions.

Oliver


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-02-12 22:54 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-06  2:21 [PATCH 1/3]: Add named references Bart Schaefer
2023-02-08  3:45 ` Oliver Kiddle
2023-02-08  4:59   ` Bart Schaefer
2023-02-08 23:16     ` Bart Schaefer
2023-02-09  0:47     ` Oliver Kiddle
2023-02-09  2:01       ` Oliver Kiddle
2023-02-09  5:45         ` Bart Schaefer
2023-02-09  4:49       ` Bart Schaefer
2023-02-09 20:49         ` Oliver Kiddle
2023-02-09 23:07           ` Bart Schaefer
2023-02-11  3:04             ` Bart Schaefer
2023-02-11  3:55               ` Bart Schaefer
2023-02-11  5:36                 ` Speaking of dangerous referents Bart Schaefer
2023-02-12  8:00                   ` Oliver Kiddle
2023-02-12  8:34                     ` Bart Schaefer
2023-02-11  7:02               ` [PATCH 1/3]: Add named references Oliver Kiddle
2023-02-11  7:45                 ` Bart Schaefer
2023-02-11 23:43                   ` Bart Schaefer
2023-02-11 23:45                     ` Bart Schaefer
2023-02-12  7:38                     ` Oliver Kiddle
2023-02-12  9:02             ` Oliver Kiddle
2023-02-12 18:59               ` Bart Schaefer
2023-02-12 19:45                 ` PM_* flags in zsh.h (was Re: [PATCH 1/3]: Add named references) Bart Schaefer
2023-02-12 21:01                   ` Oliver Kiddle
2023-02-12 22:54                 ` [PATCH 1/3]: Add named references Oliver Kiddle

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).