[-- Attachment #1: Type: text/plain, Size: 2888 bytes --] This has been my weekend/evening project the past 10 days or so. I compared to ksh93 (from the Ubuntu software store) wherever there weren't conflicts with zsh-native features, even managing to crash ksh93 several times in the process. Attached is the patch with the C code changes. It'd be nice, Roman, if you could run your benchmarks on this to see if I've slowed down any other shell operations. The next two patches will cover tests and documentation. When a reference is created, it's necessary to determine the scope in which to find the referenced parameter. Since a reference can name an array element, this involves separating the base parameter name from the subscript. I've overloaded the "base" and "width" fields in struct param to save both the scope (locallevel) and the offset of the subscript, to avoid having to search for the subscript again when the reference is expanded. There is a little redundant scanning to validate subscripts when the reference is created; this follows ksh93 (the alternative would be to fail at the time of expansion). One difference from ksh93 is that you can't convert scalars into namerefs, e.g., both with this and in ksh93: typeset -n foo=bar typeset +n nameref leaves behind a scalar $foo with value bar. but only in ksh93: foo=bar typeset -n foo foo=thing assigns bar=thing. In this implementation typeset -n first unsets foo, so instead of foo "pointing at" bar, it ends up "pointing at" thing. This eliminates a bunch of special cases for exported and tied parameters, etc., and it seems easy enough to write: typeset +n foo typeset -n foo=$foo if that's what you wanted. If I have a sudden inspiration I'll do a follow-up patch. Named references can't have any other attributes, and attempting to add an attribute to a named reference generates a warning. However, existing prohibited attribute changes (such as trying to turn a special scalar into an array) are fatal errors, so I've left comments wondering whether that would instead be appropriate here. With respect to zsh/param/private parameters, it's presently prohibited to have a "public" reference to a private parameter. It might be possible to change this if someone can think of a use case. There is one glitch: if the reference is created before the parameter is declared private, and the private local is NOT hiding something in an outer scope, then the reference does not retroactively generate an error and assignments to the reference are silent no-ops. I moved around a little bit of not-directly-related signal handling in params.c so that we're not calling zfree() outside of queue_signals() blocks. That makes the order of unqueue_signals() and zfree() consistent at least throughout assignsparam(). I also removed "kz" from TYPESET_OPTSTR in zsh.h -- as far as I can tell they should never have been there in the first place. [-- Attachment #2: nameref-1-code.txt --] [-- Type: text/plain, Size: 17101 bytes --] diff --git a/Src/Modules/param_private.c b/Src/Modules/param_private.c index 065fa63d2..70f36ceb1 100644 --- a/Src/Modules/param_private.c +++ b/Src/Modules/param_private.c @@ -512,9 +512,16 @@ static GetNodeFunc getparamnode; static HashNode getprivatenode(HashTable ht, const char *nam) { - HashNode hn = getparamnode(ht, nam); + /* getparamnode() would follow namerefs, we must not do that here */ + HashNode hn = gethashnode2(ht, nam); Param pm = (Param) hn; + /* autoload has precedence over nameref, so getparamnode() */ + if (pm && (pm->node.flags & PM_AUTOLOAD)) { + hn = getparamnode(ht, nam); + pm = (Param) hn; + /* how would an autoloaded private behave? return here? */ + } while (!fakelevel && pm && locallevel > pm->level && is_private(pm)) { if (!(pm->node.flags & PM_UNSET)) { /* @@ -533,6 +540,12 @@ getprivatenode(HashTable ht, const char *nam) } pm = pm->old; } + + /* resolve nameref after skipping private parameters */ + if (pm && (pm->node.flags & PM_NAMEREF) && + (pm->u.str || (pm->node.flags & PM_UNSET))) + pm = (Param) resolve_nameref(pm, NULL); + return (HashNode)pm; } @@ -571,7 +584,7 @@ printprivatenode(HashNode hn, int printflags) static struct builtin bintab[] = { /* Copied from BUILTIN("local"), "P" added */ - BUILTIN("private", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_private, 0, -1, 0, "AE:%F:%HL:%PR:%TUZ:%ahi:%lmprtux", "P") + BUILTIN("private", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_private, 0, -1, 0, "AE:%F:%HL:%PR:%TUZ:%ahi:%lnmprtux", "P") }; static struct features module_features = { diff --git a/Src/Modules/parameter.c b/Src/Modules/parameter.c index dbb61e474..5bf675e2a 100644 --- a/Src/Modules/parameter.c +++ b/Src/Modules/parameter.c @@ -49,13 +49,15 @@ paramtypestr(Param pm) if (pm->node.flags & PM_AUTOLOAD) return dupstring("undefined"); - switch (PM_TYPE(f)) { + /* For simplicity we treat PM_NAMEREF as PM_TYPE(PM_SCALAR) */ + switch (PM_TYPE(f)|(f & PM_NAMEREF)) { case PM_SCALAR: val = "scalar"; break; case PM_ARRAY: val = "array"; break; case PM_INTEGER: val = "integer"; break; case PM_EFLOAT: case PM_FFLOAT: val = "float"; break; case PM_HASHED: val = "association"; break; + case PM_NAMEREF: val = "nameref"; break; } DPUTS(!val, "BUG: type not handled in parameter"); val = dupstring(val); diff --git a/Src/builtin.c b/Src/builtin.c index 4c295d11f..8039b644e 100644 --- a/Src/builtin.c +++ b/Src/builtin.c @@ -55,7 +55,7 @@ static struct builtin builtins[] = BUILTIN("cd", BINF_SKIPINVALID | BINF_SKIPDASH | BINF_DASHDASHVALID, bin_cd, 0, 2, BIN_CD, "qsPL", NULL), BUILTIN("chdir", BINF_SKIPINVALID | BINF_SKIPDASH | BINF_DASHDASHVALID, bin_cd, 0, 2, BIN_CD, "qsPL", NULL), BUILTIN("continue", BINF_PSPECIAL, bin_break, 0, 1, BIN_CONTINUE, NULL, NULL), - BUILTIN("declare", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%afghi:%klmp:%rtuxz", NULL), + BUILTIN("declare", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%afghi:%klmnp:%rtuxz", NULL), BUILTIN("dirs", 0, bin_dirs, 0, -1, 0, "clpv", NULL), BUILTIN("disable", 0, bin_enable, 0, -1, BIN_DISABLE, "afmprs", NULL), BUILTIN("disown", 0, bin_fg, 0, -1, BIN_DISOWN, NULL, NULL), @@ -88,7 +88,7 @@ static struct builtin builtins[] = BUILTIN("jobs", 0, bin_fg, 0, -1, BIN_JOBS, "dlpZrs", NULL), BUILTIN("kill", BINF_HANDLES_OPTS, bin_kill, 0, -1, 0, NULL, NULL), BUILTIN("let", 0, bin_let, 1, -1, 0, NULL, NULL), - BUILTIN("local", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%ahi:%lp:%rtux", NULL), + BUILTIN("local", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%ahi:%lnp:%rtux", NULL), BUILTIN("logout", 0, bin_break, 0, 1, BIN_LOGOUT, NULL, NULL), #if defined(ZSH_MEM) & defined(ZSH_MEM_DEBUG) @@ -121,7 +121,7 @@ static struct builtin builtins[] = BUILTIN("trap", BINF_PSPECIAL | BINF_HANDLES_OPTS, bin_trap, 0, -1, 0, NULL, NULL), BUILTIN("true", 0, bin_true, 0, -1, 0, NULL, NULL), BUILTIN("type", 0, bin_whence, 0, -1, 0, "ampfsSw", "v"), - BUILTIN("typeset", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%afghi:%klp:%rtuxmz", NULL), + BUILTIN("typeset", BINF_PLUSOPTS | BINF_MAGICEQUALS | BINF_PSPECIAL | BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "AE:%F:%HL:%R:%TUZ:%afghi:%klp:%rtuxmnz", NULL), BUILTIN("umask", 0, bin_umask, 0, 1, 0, "S", NULL), BUILTIN("unalias", 0, bin_unhash, 0, -1, BIN_UNALIAS, "ams", NULL), BUILTIN("unfunction", 0, bin_unhash, 1, -1, BIN_UNFUNCTION, "m", "f"), @@ -2030,6 +2030,19 @@ typeset_single(char *cname, char *pname, Param pm, int func, int usepm, tc, keeplocal = 0, newspecial = NS_NONE, readonly, dont_set = 0; char *subscript; + if (pm && (pm->node.flags & PM_NAMEREF) && !((off|on) & PM_NAMEREF)) { + if (!(off & PM_NAMEREF)) + pm = (Param)resolve_nameref(pm, NULL); + if (pm && (pm->node.flags & PM_NAMEREF) && + (on & ~(PM_NAMEREF|PM_LOCAL))) { + /* Changing type of PM_SPECIAL|PM_AUTOLOAD is a fatal error. * + * Should this be a fatal error as well, rather than warning? */ + zwarnnam(cname, "%s: can't change type of a named reference", + pname); + return NULL; + } + } + /* * Do we use the existing pm? Note that this isn't the end of the * story, because if we try and create a new pm at the same @@ -2406,6 +2419,11 @@ typeset_single(char *cname, char *pname, Param pm, int func, return NULL; } } else if ((subscript = strchr(pname, '['))) { + if (on & PM_NAMEREF) { + zerrnam(cname, + "%s: reference variable cannot be an array", pname); + return NULL; + } if (on & PM_READONLY) { zerrnam(cname, "%s: can't create readonly array elements", pname); @@ -2640,6 +2658,14 @@ bin_typeset(char *name, char **argv, LinkList assigns, Options ops, int func) else if (OPT_PLUS(ops,optval)) off |= bit; } + if (OPT_MINUS(ops,'n')) { + if (on|off) { + zwarnnam(name, "no other attributes allowed with -n"); + return 1; + } + on |= PM_NAMEREF; + } else if (OPT_PLUS(ops,'n')) + off |= PM_NAMEREF; roff = off; /* Sanity checks on the options. Remove conflicting options. */ @@ -3022,6 +3048,27 @@ bin_typeset(char *name, char **argv, LinkList assigns, Options ops, int func) } continue; } + + if (on & PM_NAMEREF) { + if (asg->value.scalar && + (strcmp(asg->name, asg->value.scalar) == 0 || + ((pm = (Param)resolve_nameref((Param)hn, asg)) && + (pm->node.flags & PM_NAMEREF)))) { + if (pm->node.flags & PM_SPECIAL) + zwarnnam(name, "%s: invalid reference", pm->node.nam); + else + zwarnnam(name, "%s: invalid self reference", asg->name); + returnval = 1; + continue; + } + if (hn) { + /* namerefs always start over fresh */ + if (((Param)hn)->level >= locallevel) + unsetparam_pm((Param)hn, 0, 1); + hn = NULL; + } + } + if (!typeset_single(name, asg->name, (Param)hn, func, on, off, roff, asg, NULL, ops, 0)) @@ -3805,7 +3852,8 @@ bin_unset(char *name, char **argv, Options ops, int func) returnval = 1; } } else { - if (unsetparam_pm(pm, 0, 1)) + if ((pm = (Param)resolve_nameref(pm, NULL)) && + unsetparam_pm(pm, 0, 1)) returnval = 1; } if (ss) diff --git a/Src/params.c b/Src/params.c index 6362b382c..69b7f484f 100644 --- a/Src/params.c +++ b/Src/params.c @@ -536,6 +536,9 @@ getparamnode(HashTable ht, const char *nam) nam); } } + + if (hn && ht == realparamtab) + hn = resolve_nameref(pm, NULL); return hn; } @@ -993,6 +996,34 @@ createparam(char *name, int flags) gethashnode2(paramtab, name) : paramtab->getnode(paramtab, name)); + if (oldpm && (oldpm->node.flags & PM_NAMEREF) && + !(flags & PM_NAMEREF)) { + Param lastpm; + struct asgment stop; + stop.flags = PM_NAMEREF | (flags & PM_LOCAL); + stop.name = oldpm->node.nam; + stop.value.scalar = oldpm->u.str; + lastpm = (Param)resolve_nameref(oldpm, &stop); + if (lastpm) { + if (lastpm->node.flags & PM_NAMEREF) { + if (lastpm->u.str && *(lastpm->u.str)) { + name = lastpm->u.str; + oldpm = NULL; + } else { + if (!(lastpm->node.flags & PM_READONLY)) + lastpm->node.flags |= PM_UNSET; + return lastpm; + } + } else { + /* nameref pointing to an unset local */ + DPUTS(!(lastpm->node.flags & PM_UNSET), + "BUG: local parameter is not unset"); + oldpm = lastpm; + } + } else + flags |= PM_NAMEREF; + } + DPUTS(oldpm && oldpm->level > locallevel, "BUG: old local parameter not deleted"); if (oldpm && (oldpm->level == locallevel || !(flags & PM_LOCAL))) { @@ -2109,6 +2140,23 @@ fetchvalue(Value v, char **pptr, int bracks, int flags) memset(v, 0, sizeof(*v)); else v = (Value) hcalloc(sizeof *v); + if ((pm->node.flags & PM_NAMEREF) && pm->u.str && *(pm->u.str)) { + /* only happens for namerefs pointing to array elements */ + char *ref = dupstring(pm->u.str); + char *ss = pm->width ? ref + pm->width : NULL; + if (ss) { + sav = *ss; + *ss = 0; + } + Param p1 = (Param)gethashnode2(paramtab, ref); + if (!(p1 && (pm = upscope(p1, pm->base))) || + ((pm->node.flags & PM_UNSET) && + !(pm->node.flags & PM_DECLARED))) + return NULL; + if (ss) + *ss = sav; + s = ss; + } if (PM_TYPE(pm->node.flags) & (PM_ARRAY|PM_HASHED)) { /* Overload v->isarr as the flag bits for hashed arrays. */ v->isarr = flags | (isvarat ? SCANPM_ISVAR_AT : 0); @@ -2677,6 +2725,7 @@ assignstrvalue(Value v, char *val, int flags) } break; } + setscope(v->pm); if ((!v->pm->env && !(v->pm->node.flags & PM_EXPORTED) && !(isset(ALLEXPORT) && !(v->pm->node.flags & PM_HASHELEM))) || (v->pm->node.flags & PM_ARRAY) || v->pm->ename) @@ -3084,11 +3133,20 @@ assignsparam(char *s, char *val, int flags) } } if (!v && !(v = getvalue(&vbuf, &t, 1))) { - unqueue_signals(); zsfree(val); + unqueue_signals(); /* errflag |= ERRFLAG_ERROR; */ return NULL; } + if (*val && (v->pm->node.flags & PM_NAMEREF)) { + if (!valid_refname(val)) { + zerr("invalid variable name: %s", val); + zsfree(val); + unqueue_signals(); + errflag |= ERRFLAG_ERROR; + return NULL; + } + } if (flags & ASSPM_WARN) check_warn_pm(v->pm, "scalar", created, 1); v->pm->node.flags &= ~PM_DEFAULTED; @@ -3115,8 +3173,8 @@ assignsparam(char *s, char *val, int flags) lhs.u.l = lhs.u.l + (zlong)rhs.u.d; } setnumvalue(v, lhs); - unqueue_signals(); zsfree(val); + unqueue_signals(); return v->pm; /* avoid later setstrvalue() call */ case PM_ARRAY: if (unset(KSHARRAYS)) { @@ -3141,9 +3199,9 @@ assignsparam(char *s, char *val, int flags) case PM_INTEGER: case PM_EFLOAT: case PM_FFLOAT: + zsfree(val); unqueue_signals(); zerr("attempt to add to slice of a numeric variable"); - zsfree(val); return NULL; case PM_ARRAY: kshappend: @@ -3602,7 +3660,8 @@ unsetparam(char *s) if ((pm = (Param) (paramtab == realparamtab ? /* getnode2() to avoid autoloading */ paramtab->getnode2(paramtab, s) : - paramtab->getnode(paramtab, s)))) + paramtab->getnode(paramtab, s))) && + !(pm->node.flags & PM_NAMEREF)) unsetparam_pm(pm, 0, 1); unqueue_signals(); } @@ -5783,7 +5842,8 @@ static const struct paramtypes pmtypes[] = { { PM_TAGGED, "tagged", 't', 0}, { PM_EXPORTED, "exported", 'x', 0}, { PM_UNIQUE, "unique", 'U', 0}, - { PM_TIED, "tied", 'T', 0} + { PM_TIED, "tied", 'T', 0}, + { PM_NAMEREF, "namref", 'n', 0} }; #define PMTYPES_SIZE ((int)(sizeof(pmtypes)/sizeof(struct paramtypes))) @@ -6037,3 +6097,123 @@ printparamnode(HashNode hn, int printflags) else if (!(printflags & PRINT_KV_PAIR)) putchar('\n'); } + +/**/ +mod_export HashNode +resolve_nameref(Param pm, const Asgment stop) +{ + HashNode hn = (HashNode)pm; + const char *seek = stop ? stop->value.scalar : NULL; + + if (pm && (pm->node.flags & PM_NAMEREF)) { + if (pm && (pm->node.flags & (PM_UNSET|PM_TAGGED))) { + /* Semaphore with createparam() */ + pm->node.flags &= ~PM_UNSET; + /* See V10private.ztst end is in scope but private: + if (pm->node.flags & PM_SPECIAL) + return NULL; + */ + return (HashNode) pm; + } else if (pm->u.str) { + if ((pm->node.flags & PM_TAGGED) || + (stop && strcmp(pm->u.str, stop->name) == 0)) { + /* zwarnnam(pm->u.str, "invalid self reference"); */ + return stop ? (HashNode)pm : NULL; + } + if (*(pm->u.str)) + seek = pm->u.str; + } + } + else if (pm && !(stop && (stop->flags & PM_NAMEREF))) + return (HashNode)pm; + if (seek) { + queue_signals(); + /* pm->width is the offset of any subscript */ + if (pm && (pm->node.flags & PM_NAMEREF) && pm->width) { + if (stop) { + if (stop->flags & PM_NAMEREF) + hn = (HashNode)pm; + else + hn = NULL; + } else { + /* this has to be the end of any chain */ + hn = (HashNode)pm; /* see fetchvalue() */ + } + } else if ((hn = gethashnode2(realparamtab, seek))) { + if (pm) { + if (!(stop && (stop->flags & (PM_LOCAL)))) + hn = (HashNode)upscope((Param)hn, + ((pm->node.flags & PM_NAMEREF) ? + pm->base : ((Param)hn)->level)); + /* user can't tag a nameref, safe for loop detection */ + pm->node.flags |= PM_TAGGED; + } + if (hn) { + if (hn->flags & PM_AUTOLOAD) + hn = getparamnode(realparamtab, seek); + if (!(hn->flags & PM_UNSET)) + hn = resolve_nameref((Param)hn, stop); + } + if (pm) + pm->node.flags &= ~PM_TAGGED; + } else if (stop && (stop->flags & PM_NAMEREF)) + hn = (HashNode)pm; + unqueue_signals(); + } + + return hn; +} + +/**/ +static void +setscope(Param pm) +{ + if (pm->node.flags & PM_NAMEREF) { + Param basepm; + char *t = pm->u.str ? itype_end(pm->u.str, IIDENT, 0) : NULL; + + /* Temporarily change nameref to array parameter itself */ + if (t && *t == '[') + *t = 0; + else + t = 0; + basepm = (Param)resolve_nameref(pm, NULL); + if (t) { + pm->width = t - pm->u.str; + *t = '['; + } + if (basepm) + pm->base = ((basepm->node.flags & PM_NAMEREF) ? + basepm->base : basepm->level); + } +} + +/**/ +mod_export Param +upscope(Param pm, int reflevel) +{ + Param up = pm->old; + while (pm && up && up->level >= reflevel) { + pm = up; + if (up) + up = up->old; + } + return pm; +} + +/**/ +mod_export int +valid_refname(char *val) +{ + char *t = itype_end(val, IIDENT, 0); + + if (*t != 0) { + if (*t == '[') { + tokenize(t = dupstring(t+1)); + t = parse_subscript(t, 0, ']'); + } else { + t = NULL; + } + } + return !!t; +} diff --git a/Src/subst.c b/Src/subst.c index 4ad9fee1a..7ba84cba8 100644 --- a/Src/subst.c +++ b/Src/subst.c @@ -2573,13 +2573,14 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags, !(v->pm->node.flags & PM_UNSET))) { int f = v->pm->node.flags; - switch (PM_TYPE(f)) { + switch (PM_TYPE(f)|(f & PM_NAMEREF)) { case PM_SCALAR: val = "scalar"; break; case PM_ARRAY: val = "array"; break; case PM_INTEGER: val = "integer"; break; case PM_EFLOAT: case PM_FFLOAT: val = "float"; break; case PM_HASHED: val = "association"; break; + case PM_NAMEREF: val = "nameref"; break; } val = dupstring(val); if (v->pm->level) diff --git a/Src/zsh.h b/Src/zsh.h index f82e76e4b..1e35bd33e 100644 --- a/Src/zsh.h +++ b/Src/zsh.h @@ -1852,8 +1852,9 @@ struct param { GsuHash h; } gsu; - int base; /* output base or floating point prec */ - int width; /* field width */ + int base; /* output base or floating point prec or */ + /* for namerefs, locallevel of reference */ + int width; /* field width or nameref subscript idx */ char *env; /* location in environment, if exported */ char *ename; /* name of corresponding environment var */ Param old; /* old struct for use with local */ @@ -1932,9 +1933,10 @@ struct tieddata { */ #define PM_HASHELEM (1<<28) /* is a hash-element */ #define PM_NAMEDDIR (1<<29) /* has a corresponding nameddirtab entry */ +#define PM_NAMEREF (1<<30) /* pointer to a different parameter */ /* The option string corresponds to the first of the variables above */ -#define TYPESET_OPTSTR "aiEFALRZlurtxUhHTkz" +#define TYPESET_OPTSTR "aiEFALRZlurtxUhHT" /* These typeset options take an optional numeric argument */ #define TYPESET_OPTNUM "LRZiEF"
Bart Schaefer wrote: > When a reference is created, it's necessary to determine the scope in I've run some tests and this looks good. The main issue I noticed was with a reference to associative array elements. typeset -A assoc typeset -n ref=assoc[other] ref=val This does work if assoc is assigned an initial value. The error message from, e.g. `typeset -Z2 ref` is "can't change type of a named reference". However, this is normally an attempt to change the type of the target of a reference (-n is not specified). It works fine for a scalar. So perhaps the error message should complain about changing the type of [associative] array elements instead. One ksh feature which this doesn't cover is the special behaviour in ksh that `typeset -n ref=$1` will use the calling function scope to resolve the target. This is necessary with the lexical (private) scoping of ksh93 functions. The positional parameters are special-cased which I find rather ugly. You could overload -u or -U for indicating Tcl-like uplevel. The trouble with uplevel is that a called function can't know how many levels up to specify if people are allowed to write wrappers. I think a better solution is to take some syntax that isn't a valid variable name. e.g typeset -n ref=\&1 Repeated use of an initial & would indicate where another level of scope should be ascended. Wrapper functions then don't need their own nameref and can just pass, e.g. \&var as the variable parameter. > One difference from ksh93 is that you can't convert scalars into > namerefs, e.g., both with this and in ksh93: That seems reasonable enough. > Named references can't have any other attributes, and attempting to > add an attribute to a named reference generates a warning. However, I can't see an actual conflict in permitting -r, making the reference readonly not the target variable. > With respect to zsh/param/private parameters, it's presently > prohibited to have a "public" reference to a private parameter. It > might be possible to change this if someone can think of a use case. To a user it would seem like a fairly arbitrary restriction. Coming back to the uplevel question, for private to be more useful it'd be really great to be able to have a private reference to a private variable from the calling function scope. I'm not sure how easy that even is with the current implementation of private. > There is one glitch: if the reference is created before the parameter > is declared private, and the private local is NOT hiding something in > an outer scope, then the reference does not retroactively generate an > error and assignments to the reference are silent no-ops. Not sure I follow. Sounds like something that ought to be fixed. > I also removed "kz" from TYPESET_OPTSTR in zsh.h -- as far as I can > tell they should never have been there in the first place. typeset -fu is like autoload. I think it was intentional. Note that recent bash also has namerefs so should perhaps be compared. They only resolve in local scope and don't include the local level with the reference. Oliver PS. The basic approach is much the same as that which was rejected many years back with consensus at that time being that a C pointer reference should be used. I did make a start at the time using reference counting but real life stuff got in the way and it was never completed. Zsh's code does a lot of deleting and recreating parameter structures which is problematic with that approach. I do still have an old patch that last got rebased as recently as only 7 years ago if you're interested. Aside from not working with array subscripts, I think I also concluded that it wouldn't work for private variables because of their somewhat too clever implementation.
On Tue, Feb 7, 2023 at 7:45 PM Oliver Kiddle <opk@zsh.org> wrote: > > with a reference to associative array elements. > > typeset -A assoc > typeset -n ref=assoc[other] > ref=val > > This does work if assoc is assigned an initial value. This should work after patch 4/3 that I just sent. % typeset -n ptr=foo % typeset -A assoc % typeset -n ref=assoc[other] % ref=val % typeset -p assoc typeset -A assoc=( [other]=val ) > The error message from, e.g. `typeset -Z2 ref` is "can't change type > of a named reference". However, this is normally an attempt to change > the type of the target of a reference (-n is not specified). It works > fine for a scalar. So perhaps the error message should complain about > changing the type of [associative] array elements instead. Hmm, yes, that message should be changed if possible. > One ksh feature which this doesn't cover is the special behaviour in ksh > that `typeset -n ref=$1` will use the calling function scope to resolve > the target. This should also work after patch 4/3. % typeset outside=OUTSIDE % () { typeset -n ref=$1 print $ref } outside OUTSIDE Or do I misunderstand? > The positional parameters are special-cased which I > find rather ugly. There's nothing explaining this in "man ksh93", can you point me somewhere? Not that your description makes this sound particular attractive. I think the positionals might actually be best handled by a new special parameter. Cogitation needed. > You could overload -u or -U for indicating Tcl-like > uplevel. It'd be easier to overload "-i N" for this since it already populates the "base" integer in Param. > The trouble with uplevel is that a called function can't know > how many levels up to specify if people are allowed to write wrappers. if the current locallevel were visible (cf. new special) then something like typeset -i $((locallevel-1)) -n up=argv could work. I think the current implementation of argv breaks this, tho. I'm still not entirely following what problem this solves. > I think a better solution is to take some syntax that isn't a valid > variable name. e.g typeset -n ref=\&1 Repeated use of an initial & would > indicate where another level of scope should be ascended. Wrapper > functions then don't need their own nameref and can just pass, e.g. > \&var as the variable parameter. You're going to have to write out an example, sorry. > I can't see an actual conflict in permitting -r, making the reference > readonly not the target variable. I'm also thinking about that. The only use-case would be to prevent something from doing "typeset +n ref". > > With respect to zsh/param/private parameters, it's presently > > prohibited to have a "public" reference to a private parameter. It > > might be possible to change this if someone can think of a use case. > > To a user it would seem like a fairly arbitrary restriction. What purpose does it serve? How does it help to have two names for the same thing, one of which is "hidden" and the other isn't? The only analogy I can think of would be having a writable name for a read-only variable. > Coming back to the uplevel question, for private to be more useful it'd > be really great to be able to have a private reference to a private > variable from the calling function scope. I'm not sure how easy that > even is with the current implementation of private. You presently can't make anything that's private visible down-scope. If you mean you want a variable that is visible in exactly one nested scope, I'm back to asking for the use case. > > There is one glitch: if the reference is created before the parameter > > is declared private, and the private local is NOT hiding something in > > an outer scope, then the reference does not retroactively generate an > > error and assignments to the reference are silent no-ops. > > Not sure I follow. Sounds like something that ought to be fixed. In Test/V10private.ztst it's the test with F:See K01typeset.ztst up-reference part 5 F:Here ptr1 finds private ptr2 by scope mismatch, assignment silently fails > > I also removed "kz" from TYPESET_OPTSTR in zsh.h -- as far as I can > > tell they should never have been there in the first place. > > typeset -fu is like autoload. I think it was intentional. Except every letter in TYPESET_OPTSTR has to correspond to the PM_ flag at that same index position, and the PM_ flags matching the k and z options are way out of sequence for that. > Note that recent bash also has namerefs so should perhaps be compared. The special behavior of namerefs in "for" loops and implementing "unset -n" might be good; the latter isn't mentioned until the "unset" builtin in the ksh93 doc so I missed it entirely. > PS. The basic approach is much the same as that which was rejected many > years back with consensus at that time being that a C pointer reference > should be used. And yet we implemented ${(P)...} which is actually a bit grottier. I did consider overloading pm->old but was leery of messing up locals so decided to leave it as a future optimization. > I did make a start at the time using reference counting That was the other part that worried me ... everything just went (relatively) smoothly if most of the code could continue pretending namerefs were scalars. > I think I also concluded that it > wouldn't work for private variables because of their somewhat too clever > implementation. They could be a lot less clever if they could move into the main shell. Most of the cleverness was necessary for making them modular.
On Tue, Feb 7, 2023 at 8:59 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > On Tue, Feb 7, 2023 at 7:45 PM Oliver Kiddle <opk@zsh.org> wrote: > > > > The error message from, e.g. `typeset -Z2 ref` is "can't change type > > of a named reference". > > Hmm, yes, that message should be changed if possible. It'd be more effort than it's worth to distinguish the type of array involved, but my next patch will change this to say "of array element". > if the current locallevel were visible (cf. new special) then something like > typeset -i $((locallevel-1)) -n up=argv > could work. I think the current implementation of argv breaks this, tho. The up-level argv is a char ** scoped to doshfunc() so it'd be rather icky to expose it as a shell parameter. > The special behavior of namerefs in "for" loops and implementing > "unset -n" might be good; the latter isn't mentioned until the "unset" > builtin in the ksh93 doc so I missed it entirely. Next patch will have both of these, too. I note that execfor() has quite a bit of room for optimization, but I haven't dived into that.
Bart Schaefer wrote: > > One ksh feature which this doesn't cover is the special behaviour in ksh > > that `typeset -n ref=$1` will use the calling function scope to resolve > > the target. > > This should also work after patch 4/3. > > % typeset outside=OUTSIDE > % () { > typeset -n ref=$1 > print $ref > } outside > OUTSIDE > > Or do I misunderstand? In ksh you could pass "ref" as a parameter and the nameref would provide access to the outside variable despite the name clash. > > The positional parameters are special-cased which I > > find rather ugly. > > There's nothing explaining this in "man ksh93", can you point me > somewhere? Not that your description makes this sound particular > attractive. The manual does seem rather vague on this. I've long been aware of it so the detail may have come to me via the old ast-ksh lists. But to demonstrate: % function f { > typeset out=99 var=$1 > nameref ref1=$1 > nameref ref2=$var > echo f: $ref1 $ref2 > ref1=1 > ref2=2 > } % out=0 % f out f: 0 99 % echo $out 1 So $1 as the target of a nameref is special. I don't especially like this. Everywhere else in the shell, a variable reference like $1 is no different from using the text it would expand to explicitly. I don't think we need to worry about the level of ksh93 compatibility here because our default dynamic scoping mean a ksh script will work if variable names don't clash. > > You could overload -u or -U for indicating Tcl-like > > uplevel. > > It'd be easier to overload "-i N" for this since it already populates > the "base" integer in Param. Not with the right number, though. Tcl's uplevel gives a distance (up the procedure calling stack) to move. 1 being the default. You can prefix with # to get an absolute level but that doesn't seem especially useful. So 1 would mean $((locallevel-1)) in absolute terms. Note that Tcl's uplevel is like a precommand modifier in zsh terms so is not limited to defining references. However I still prefer the other solution I mentioned with \&. > I'm still not entirely following what problem this solves. It solves the problem of name clashes between a function and the calling function. Consider Completion/Base/Utility/_call_function, It expects the name of a parameter in $1. It does: local _name _ret _name="$1" ... _ret=... eval "${_name}=${_ret}" The underscores are there largely to avoid name clashes. In ksh, this could instead be: nameref ret="$1" ret=... And it wouldn't matter if ret is also the variable name in the calling function. > > I think a better solution is to take some syntax that isn't a valid > > variable name. e.g typeset -n ref=\&1 Repeated use of an initial & would > > indicate where another level of scope should be ascended. Wrapper > > functions then don't need their own nameref and can just pass, e.g. > > \&var as the variable parameter. > > You're going to have to write out an example, sorry. With this idea: typeset -n ret=\&1 would be like ksh's: typeset -n ret=$1 But you could also do typeset -n ret=\&var The code would strip the initial \&, go up one local level and see what $var is in that scope. If the result also starts with an &, it would repeat the process going up a further level. So a wrapper function to, e.g. _call_function could pass "&$1" directly as the first parameter instead of needing a local nameref of it's own. You could optionally allow "&&var" to go up two levels. It doesn't need to be & if you think a different character is better, it just needs to not be a valid variable name. Does that make any more sense than the previous explanation? > I'm also thinking about that. The only use-case would be to prevent > something from doing "typeset +n ref". Also prevents typeset -n ref=othervar It is just as useful or useless as -r with other variables. > > > With respect to zsh/param/private parameters, it's presently > > > prohibited to have a "public" reference to a private parameter. It > > > might be possible to change this if someone can think of a use case. > > > > To a user it would seem like a fairly arbitrary restriction. > > What purpose does it serve? How does it help to have two names for > the same thing, one of which is "hidden" and the other isn't? The > only analogy I can think of would be having a writable name for a > read-only variable. If two functions are developed independently, they shouldn't need to worry about the possibility of using the same variable name for different purposes. This is the main benefit of private in general. An old function like _call_function (updated to use nameref but still using local) could be used by someone writing a new function where they pass a private variable as the first parameter. If they are treating _call_function as a black box, how are they to know that private variables are not valid for the passed $1. It'd be an arbitrary limitation. > The special behavior of namerefs in "for" loops and implementing > "unset -n" might be good; the latter isn't mentioned until the "unset" > builtin in the ksh93 doc so I missed it entirely. Given that you implemented references to array members, being able to iterate over array elements with for could be nice. Perhaps something like: for ref in 'arr[*]'; do maybe [@] to include empty elements. On the subject references to array elements, it does seem both powerful and dangerous that subscripts are evaluated on reference. The subscript should perhaps be evaluated with the same local level as the array itself. And it could be wise to limit what can be done as part of the subscript evaluation to avoid a CVE similar to the last one. > > I think I also concluded that it > > wouldn't work for private variables because of their somewhat too clever > > implementation. > > They could be a lot less clever if they could move into the main > shell. Most of the cleverness was necessary for making them modular. I would favour moving them to the main shell. And then making extensive use of them in completon etc. There are things that could move in the opposite direction if we want to reduce bloat. Mail checking for example. Oliver
This isn't a self-reference because the local levels would be different.
And "self reference" in the error message should be hyphenated.
typeset -n ref=var
foo() { typeset -n var=ref }
foo
foo:typeset:1: var: invalid self reference
With -g it would be, however.
I wrote:
> On the subject references to array elements, it does seem both powerful and
> dangerous that subscripts are evaluated on reference. The subscript
This is fairly sane:
typeset -n ref1=arr[ref2]
typeset -n ref2=ref1
echo $ref1
zsh: math recursion limit exceeded: ref2
This seems useful:
arr=()
typeset -n ref=arr[1,0]
ref=4
ref=5
Similar:
idx=1
typeset -n ref=arr[idx++]
echo $ref $ref
This is where I worry about security:
arr=( 1 2 3 4)
typeset -n ref='arr[$(echo 4)]'
echo $ref
typeset -p ref # it was only evaluated late thanks to quotes
Maybe the code should have a safer mode for subscript evaluation. This
could be useful but it is asking for trouble.
Also can see this being useful:
str='one two three'
typeset -n ref=str[(w)2]
ref=zwei
We already talked about this error message. But now it is a substring.
export ref
export: ref: can't change type of a named reference
Oliver
On Wed, Feb 8, 2023 at 4:47 PM Oliver Kiddle <opk@zsh.org> wrote: > > Bart Schaefer wrote: > > > One ksh feature which this doesn't cover is the special behaviour in ksh > > > that `typeset -n ref=$1` will use the calling function scope to resolve > > > the target. > > > > In ksh you could pass "ref" as a parameter and the nameref would provide > access to the outside variable despite the name clash. I think that will start working if I can fix the "invalid self reference" mentioned in your next message. > So [in ksh] $1 as the target of a nameref is special. [...] I don't > think we need to worry about the level of ksh93 compatibility here > because our default dynamic scoping mean a ksh script will work if > variable names don't clash. My general feeling is to be ksh compatible where I don't think ksh is stupid. > Note that Tcl's uplevel is like a precommand modifier in zsh terms so > is not limited to defining references. > > However I still prefer the other solution I mentioned with \&. I have some half-formed thoughts about other ways to accomplish this ... I don't think the \& suggestion by itself gets around the problem that the positionals are C-local to doshfunc(). > > I'm still not entirely following what problem this solves. > > It solves the problem of name clashes between a function and the calling > function. OK, so if typeset ref () { typeset -n ref=ref } works / is not a loop, then we don't need any other magic? > If two functions are developed independently, they shouldn't need > to worry about the possibility of using the same variable name for > different purposes. This is the main benefit of private in general. > An old function like _call_function (updated to use nameref but still > using local) could be used by someone writing a new function where they > pass a private variable as the first parameter. If they are treating > _call_function as a black box, how are they to know that private > variables are not valid for the passed $1. It'd be an arbitrary > limitation. I'm still not seeing that. The writer of the caller of _call_function should know that it's a general rule that another function can't access a private variable. Why would the writer pass a private name as an argument to any function, even a blackbox, if not expecting it to be referenced? The rule for a private should be that you always pass its value. Correct me if I still misunderstand, but I interpret the example above as implying that f2() { typeset -n ref=$1 # ... do stuff to $ref ... } f1() { private var f2 var } should work just because the name is passed in a positional? I would disagree with that on the grounds that you're making positionals into a special case, just a different kind of special case from ksh, that violates the intent of "private". The case I was asking about (and the only one I so far consider possibly viable) is f1() { private var typeset -n ref=var f2 ref } That is, the reference has to be declared in the scope where var is "visible", otherwise you hit the privacy wall when "typeset -n" has to up-level. Or perhaps you mean to be tagging on to your other suggestion and it would be f2 \&var which feels like syntactic sugar for my example above, with the benefit of not needing to declare an extra parameter. That's an interesting idea but I would not want this to work: f2() { typeset -n upref=\&var # ... do things to var one level up ... typeset -n uptwo=\&\&var # ... do things to var two levels up ... } I fear that puts us back to having something magic about positional parameters, e.g., that unless the "&var" string appears in one of $1, $2, ... then it's just an invalid name like any other. Back again to my half-formed thoughts, I can't really elucidate yet. > Given that you implemented references to array members, being able to > iterate over array elements with for could be nice. Perhaps something > like: for ref in 'arr[*]'; do With a hash that's just: typeset -n ref for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ... but since you have to "typeset -n ref" before the for-loop anyway, why not just typeset -n ref=hash for idx in ${(k)ref}; do ... $ref[$idx] ... Ordinary arrays are harder because there's no "find all matching elements" subscript flag. > On the subject references to array elements, it does seem both powerful and > dangerous that subscripts are evaluated on reference. Hm ... I implemented that because ksh93 accepted the syntax, but I never actually tried it -- in fact (Ubuntu's) ksh93 doesn't actually expand it to anything sensible and the value of the nameref ends up nonsense. > The subscript > should perhaps be evaluated with the same local level as the array > itself. That gets pretty weird, and you can already trivially bypass it with ${(P)...} for many cases. > And it could be wise to limit what can be done as part of the > subscript evaluation to avoid a CVE similar to the last one. validate_refname() is calling parse_subscript() ... would further examination of the accepted string be sufficient, or something more needed? I think the greatest danger is of creating an infinite recursion, which can't really be caught in advance.
On Wed, Feb 8, 2023 at 6:01 PM Oliver Kiddle <opk@zsh.org> wrote: > > This isn't a self-reference because the local levels would be different. Working on that. Halfway there. > And "self reference" in the error message should be hyphenated. I copied that message verbatim from ksh93. Can still change it, but does it really matter? > Maybe the code should have a safer mode for subscript evaluation. See previous message. > We already talked about this error message. But now it is a substring. > export ref > export: ref: can't change type of a named reference The point at which it's necessary to catch that error would have to lop off the subscript and re-evaluate the base variable name in order to determine the thing being subscripted. That seems like an unnecessary amount of work. How about can't change type via subscript reference ??
Bart Schaefer wrote: > My general feeling is to be ksh compatible where I don't think ksh is stupid. Sounds reasonable. > > It solves the problem of name clashes between a function and the calling > > function. > > OK, so if > typeset ref > () { typeset -n ref=ref } > works / is not a loop, then we don't need any other magic? The following is similar: var=hello typeset -n ref () { typeset var=x ref=var echo $ref } typeset -p ref echo $ref This creates a reference to a variable at a deeper local level. Ksh prints "global reference cannot refer to local variable". With zsh's dynamic scoping you could do the same thing with a reference that isn't "global" but which has a scope that will outlast the variable. Making var outlive the function is a feature you'd get with a reference counting approach to namerefs. The best might be if ref returns to being unset when the function returns but an error like ksh is fine too. > I'm still not seeing that. The writer of the caller of _call_function > should know that it's a general rule that another function can't > access a private variable. Why would the writer pass a private name > as an argument to any function, even a blackbox, if not expecting it > to be referenced? The rule for a private should be that you always > pass its value. I hadn't really thought about it that way, perhaps because ksh only has private scoping and I'm used to writing in languages that only have lexical scoping. Certainly if it is hard to implement, I have no objection to this approach. It does have the big advantage of making the whole uplevel question a non-issue. We do lose some orthogonality in that you can use a private with builtins that take variable names like read, compadd and printf (-v). Wrappers of those would have an added limitation. (_approximate relies on a function wrapper of compadd so privates can't be used with that) When relying only on dynamic scoping, it would be good practice to define all the namerefs to passed parameters as early as possible in a function to reduce the risk of a name clash. > Correct me if I still misunderstand, but I interpret the example above > as implying that > > f2() { > typeset -n ref=$1 > # ... do stuff to $ref ... > } > f1() { > private var > f2 var > } > > should work just because the name is passed in a positional? I would > disagree with that on the grounds that you're making positionals into > a special case, just a different kind of special case from ksh, that > violates the intent of "private". It isn't about the positionals being special but that it is useful to be able to write a function that exposes an interface similar to read where a variable can be named as a parameter. Ksh's making $1, $2.. special on the rhs of typeset -n really is very ugly. > The case I was asking about (and the only one I so far consider > possibly viable) is > > f1() { > private var > typeset -n ref=var > f2 ref > } This puts the complexity into the caller so f2's interface is less nice. And by needing ref as a local, f1 hasn't really benefitted from var being able to be private. But if this is easy to implement then it could be useful. If not, printing an error is fine too. > Or perhaps you mean to be tagging on to your other suggestion and it would be > f2 \&var My intention with that suggestion is that you'd only do that to refer to $var from the scope of f1's caller. So in practice that'd sooner be something like \&$3. For this, it'd be just `f2 var` and f2() would declare `private -n mine=\&1` > which feels like syntactic sugar for my example above, with the > benefit of not needing to declare an extra parameter. That's an > interesting idea but I would not want this to work: Again, the key difference is that the caller needed the extra parameter where as using some sort of uplevel goes in the callee. You can't wrap e.g. compadd transparently if the caller needs to change. > With a hash that's just: > > typeset -n ref > for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ... "just"!? > > And it could be wise to limit what can be done as part of the > > subscript evaluation to avoid a CVE similar to the last one. > > validate_refname() is calling parse_subscript() ... would further > examination of the accepted string be sufficient, or something more > needed? I think the greatest danger is of creating an infinite > recursion, which can't really be caught in advance. So if a function gets the target of a nameref from untrusted input the function writer needs to know to validate it with something like [[ $var = [[:IDENT:]]## ]] It might deprive us of many clever tricks but parse_subscript() could gain a flag to disable anything questionable like command substitution and math evaluation side-effects. This should be an error perhaps: typeset -n ref=arr[1][2] Currently it isn't possible to create a reference fo $!, $?, $@, $+, $# and $$. If easy to add, there would be no harm in them. And in 51388, Bart wrote: > How about > can't change type via subscript reference > ?? That sounds fine. Oliver
On Thu, Feb 9, 2023 at 12:49 PM Oliver Kiddle <opk@zsh.org> wrote: > > The following is similar: > var=hello > typeset -n ref > () { > typeset var=x > ref=var > echo $ref > } > typeset -p ref > echo $ref > > This creates a reference to a variable at a deeper local level. Only in a pointers/refcounting implementation. With the assumption of dynamic scoping, it creates a reference to a name, where the scope search for the name starts at a lower level. If that name doesn't exist at that level (because the whole level doesn't exist any more), the search climbs up. So I get with set -x (and a couple of in-progress patches for looping references): +Src/zsh:15> var=hello +Src/zsh:16> typeset -n ref +Src/zsh:17> '(anon)' +(anon):1> typeset var=x +(anon):2> ref=var +(anon):3> echo x x +Src/zsh:22> typeset -p ref typeset -n ref=var +Src/zsh:23> echo hello hello If I add () { typeset var=y echo $ref () { typeset var=z echo $ref } } I get +Src/zsh:24> '(anon)' +(anon):0> typeset var=y +(anon):0> echo y y +(anon):1> '(anon)' +(anon):0> typeset var=z +(anon):0> echo y y > The best might be if ref returns to being > unset when the function returns but an error like ksh is fine too. I'm not sure how to do that without scanning the whole parameter table, but I agree it the above is a little puzzling. > Ksh prints "global reference cannot refer to local variable". At what point does that happen? Upon the assignment ref=var ? Relatedly, what should happen on any failed assignment? E.g. typeset -n xy yx xy=yx # OK so far yx=xy # Oops, invalid loop Should yx become unset, or should it remain a nameref with no referent? > > The rule for a private should be that you always > > pass its value. > > I hadn't really thought about it that way, perhaps because ksh only > has private scoping and I'm used to writing in languages that only > have lexical scoping. Certainly if it is hard to implement, I have no > objection to this approach. ("This approach" meaning "no public refs to private vars"?) I haven't tried anything else yet to see how hard it might be. > We do lose some orthogonality in > that you can use a private with builtins that take variable names like > read, compadd and printf (-v). Wrappers of those would have an added > limitation. That's true of private already, isn't it? > When relying only on dynamic scoping, it would be good practice to > define all the namerefs to passed parameters as early as possible in a > function to reduce the risk of a name clash. If you were going to put that in the doc, where would it go? > It isn't about the positionals being special but that it is useful to be > able to write a function that exposes an interface similar to read where > a variable can be named as a parameter. Ksh's making $1, $2.. special > on the rhs of typeset -n really is very ugly. This works for my code in current state: % var=GLOBAL % typeset -n ref=var % f() { function> typeset -n ref=$1 function> print $ref function> ref=LOCAL function> } % f ref GLOBAL % print $ref LOCAL % > > f2 \&var > > My intention with that suggestion is that you'd only do that to refer to > $var from the scope of f1's caller. So in practice that'd sooner be > something like \&$3. For this, it'd be just `f2 var` and f2() would > declare `private -n mine=\&1` Yeah, I don't like the idea that a called function can arbitrarily grab parameters from its caller just by sticking & in front. Caller needs to do do something (even if only make "normal" use of dynamic scope) to make the parameter "grabbable". > > With a hash that's just: > > > > typeset -n ref > > for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ... > > "just"!? Hah! Point was that it's do-able without "for"-context-sensitive special subscript semantics. I think it would be strange for ary=( 1 2 3 4 5 ) typeset -n ref='ary[*] ref=( a b c) to produce something different than ary=( 1 2 3 4 5 ) typeset -n ref for ref in 'ary{*]'; do ref=( a b c ); done > > > And it could be wise to limit what can be done as part of the > > > subscript evaluation to avoid a CVE similar to the last one. > > > > validate_refname() is calling parse_subscript() ... would further > > examination of the accepted string be sufficient, or something more > > needed? I think the greatest danger is of creating an infinite > > recursion, which can't really be caught in advance. > > So if a function gets the target of a nameref from untrusted input the > function writer needs to know to validate it No, I meant, would examining the subscript string in the C code be sufficient. > This should be an error perhaps: > > typeset -n ref=arr[1][2] Why? ${ary[1][2]} isn't an error, it's the 2nd character of the first word in $ary. You can keep throwing subscripts on there as long as the resulting substrings can be indexed-into. print ${ary[1][3,9][4]} # etc. > Currently it isn't possible to create a reference fo $!, $?, $@, $+, $# > and $$. If easy to add, there would be no harm in them. You can make references to argv and ARGC, but they always refer to the current argv/ARGC because of the aforementioned implementation of positionals as C locals. $* $@ $# would have the same issues. The others would all have to be special-cased individually. What is $+ ? Do you mean $- ?
Closing in on this, I think ... remaining decisions on which input would be appreciated: On Thu, Feb 9, 2023 at 3:07 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > > Ksh prints "global reference cannot refer to local variable". > > At what point does that happen? Upon the assignment ref=var ? At the moment, just toying with this, I have it erroring when EMULATE_KSH, printing a warning when WARN_NESTED_VAR, and otherwise silently creating the down-scope reference. Thoughts? Perhaps WARN_CREATE_GLOBAL would actually be better here, or maybe check for either one. > Relatedly, what should happen on any failed assignment? E.g. > > typeset -n xy yx > xy=yx # OK so far > yx=xy # Oops, invalid loop > > Should yx become unset, or should it remain a nameref with no referent? This probably doesn't matter except at the command line as the referent loop is otherwise fatal. > > When relying only on dynamic scoping, it would be good practice to > > define all the namerefs to passed parameters as early as possible in a > > function to reduce the risk of a name clash. > > If you were going to put that in the doc, where would it go? Options are: - under Named References in Parameters - at the description of -n under typeset - with the scoping example in Expansion - in a new section under Functions > > > validate_refname() is calling parse_subscript() ... would further > > > examination of the accepted string be sufficient, or something more > > > needed? > [...] I meant, would examining the subscript string in the C code be sufficient. > > > It might deprive us of many clever tricks but parse_subscript() could > > gain a flag to disable anything questionable like command substitution > > and math evaluation side-effects. parse_subscript() is already a bit of a hack, it sets up its own context and invokes the lexer. A flag such as that would, I think, require plugging in an alternate lexer, hence my question about doing a simpler examination of the string.
On Fri, Feb 10, 2023 at 7:04 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> > > It might deprive us of many clever tricks but parse_subscript() could
> > > gain a flag to disable anything questionable like command substitution
> > > and math evaluation side-effects.
I'm not sure what to do about math side-effects, but I've hit upon the
idea of expanding subscripts that appear in named references with
EXECOPT set to zero (noexec). That reduces $(...) to the empty
string, except for the special-case $(<file) that is being discussed
in another thread.
Switching to noexec around singsub() in getarg() is quite easy, and as
previously mentioned you can still do fully dynamic subscripts with
${(P)...}.
Oliver wrote:
> > And it could be wise to limit what can be done as part of the
> > subscript evaluation to avoid a CVE similar to the last one.
% print $ZSH_PATCHLEVEL
ubuntu/5.8-3ubuntu1.1
% empty=()
% loop='empty[${(P)loop}]'
% print ${(P)loop}
zsh: segmentation fault (core dumped) zsh -f
Bart Schaefer wrote: > Closing in on this, I think ... remaining decisions on which input > would be appreciated: > > On Thu, Feb 9, 2023 at 3:07 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > > > > Ksh prints "global reference cannot refer to local variable". > > > > At what point does that happen? Upon the assignment ref=var ? > > At the moment, just toying with this, I have it erroring when > EMULATE_KSH, printing a warning when WARN_NESTED_VAR, and otherwise > silently creating the down-scope reference. Thoughts? Perhaps > WARN_CREATE_GLOBAL would actually be better here, or maybe check for > either one. I'd make it a fatal error unconditionally. I don't like it if the target of a nameref can switch to something else following a return. I'd be fine with the reference becoming unset at the time of the return if it refers to a variable that is going out of scope. Can that be done as part of the same code that drops the local variables? (I'm guessing it is easy for local but hard for private) The mention of WARN_NESTED_VAR raises something else. Surely a reference should disable that warning: toplevel() { local foo="in fn" nested } nested() { typeset -n ref=foo ref="in nested" } setopt warn_nested_var toplevel nested:2: scalar parameter foo set in enclosing scope in function nested > > Relatedly, what should happen on any failed assignment? E.g. > > > > typeset -n xy yx > > xy=yx # OK so far > > yx=xy # Oops, invalid loop > > > > Should yx become unset, or should it remain a nameref with no referent? > > This probably doesn't matter except at the command line as the > referent loop is otherwise fatal. Current code even allows: typeset -n ref ref=ref I'd suggest that we need the self-reference check whenever assigning to a reference. But that entails evaluation of a subscript. > > > When relying only on dynamic scoping, it would be good practice to > > > define all the namerefs to passed parameters as early as possible in a > > > function to reduce the risk of a name clash. > > > > If you were going to put that in the doc, where would it go? Do we to put recommended good practices in the doc at all as opposed to it being a terse description of the functionality. I'd prefer it out of the way with examples rather than adding fluff to the places you list. I'd also consider it bad practice not to always assign a reference when creating it. > > > It might deprive us of many clever tricks but parse_subscript() could > > > gain a flag to disable anything questionable like command substitution > > > and math evaluation side-effects. > > parse_subscript() is already a bit of a hack, it sets up its own > context and invokes the lexer. A flag such as that would, I think, > require plugging in an alternate lexer, hence my question about doing > a simpler examination of the string. That's not as robust but certainly an option. The temporary use of NOEXEC that you mention in the other message certainly goes a long way to alleviating security concerns. Oliver
On Fri, Feb 10, 2023 at 11:02 PM Oliver Kiddle <opk@zsh.org> wrote: > > Bart Schaefer wrote: > > On Thu, Feb 9, 2023 at 3:07 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > > > > > > Ksh prints "global reference cannot refer to local variable". > > I'd make it a fatal error unconditionally. I don't like it if the target > of a nameref can switch to something else following a return. Making it an unconditional error means that this sort of thing doesn't work: typeset -n ref () { typeset a=A b=B c=C for ref in a b c; do ref=${(L)ref}; done } > I'd be > fine with the reference becoming unset at the time of the return if it > refers to a variable that is going out of scope. Can that be done as > part of the same code that drops the local variables? Maybe. Might look at that tomorrow. > The mention of WARN_NESTED_VAR raises something else. Surely a reference > should disable that warning: Actually there's an argument that an unexpected up-reference is a really good reason for that warning. A function that intentionally uses an up-reference could turn the option off locally. > > > Relatedly, what should happen on any failed assignment? > > Current code even allows: > > typeset -n ref > ref=ref That's fixed in my sandbox. Patch eventually forthcoming. > I'd suggest that we need the self-reference check whenever assigning to > a reference. But that entails evaluation of a subscript. As I have it right now, self-reference checks do happen at any assignment. That's why I wonder if deleting the offending parameter is OK. Circular references hidden inside subscripts end up expanding to empty string, as do command substitutions with the NO_EXEC trick. > Do we to put recommended good practices in the doc at all as opposed to > it being a terse description of the functionality. I'd prefer it out of > the way with examples rather than adding fluff to the places you list. The Functions chapter was rather too terse. Here's what I've written to add: --- 8< --- Parameters declared by any of the 'typeset' family of commands during the execution of a function become _local_ to the function unless the '-g' option is used. This is the _scope_ of the parameter, which extends dynamically to any other functions called by the declaring function. In most cases, local parameters take the place of any other parameter having the same name that was assigned or declared in an earlier function scope. (See *note Local Parameters::.) A named parameter declared with the '-n' option to any of the 'typeset' commands becomes a reference to a parameter in scope at the time of assignment to the named reference, which may be at a different call level than the declaring function. For this reason, it is good practice to declare a named reference as soon as the referent parameter is in scope, and as early as possible in the function if the reference is to a parameter in a calling scope. A typical use of named references is to pass the name of the referent as a positional parameter. For example, pop() { local -n ref=$1 local last=$ref[$#ref] ref[$#ref]=() print -r -- $last } array=( a list of five values ) pop array prints the word values and shortens $array to '( a list of five )'. There are no local parameters in pop at the time 'ref=$1' is assigned, so 'ref' becomes a reference to 'array' in the caller. --- >8 --- > I'd also consider it bad practice not to always assign a reference when > creating it. That also doesn't work with the for-loop example above (even if the nameref is moved inside the function).
On Fri, Feb 10, 2023 at 11:45 PM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> On Fri, Feb 10, 2023 at 11:02 PM Oliver Kiddle <opk@zsh.org> wrote:
> >
> > I'd be
> > fine with the reference becoming unset at the time of the return if it
> > refers to a variable that is going out of scope. Can that be done as
> > part of the same code that drops the local variables?
That's actually a little startling given dynamic scoping. It leads to this:
{
typeset -n ref=var
() {
var=FUNC1
typeset -p ref
}
typeset -p ref
}
Output:
typeset -g -n ref=var
typeset: no such variable: ref
If the called function were a blackbox, it would be extremely puzzling
why "ref" became unset.
The additional complication is that the parameter table is a hash, so
it's scanned in random order; we can't be sure that the referent local
$var will be found and unset before the referring $ref is examined. I
suppose we could scan twice (ick).
Instead let's think for a moment about how the dynamic scoping works.
"ref" has two properties: The locallevel at which it was declared,
and the locallevel at which it prefers to find a referent. Any time
you use $ref (or assign to ref) the search starts at the current
locallevel and proceeds upward until either the preferred level is
reached or the "topmost" defined parameter is found, whichever comes
first.
So ... if endparamscope() walks the preferred referent locallevel
upward as the scope unwinds, we get (using your earlier example with
my addition):
{
var=hello
typeset -n ref
() {
typeset var=x
ref=var
echo $ref
}
typeset -p ref
echo $ref
() {
typeset var=y
echo $ref
() {
typeset var=z
echo $ref
}
}
}
Output:
x
typeset -n ref=var
hello
hello
hello
The inner assignment to ref changes the scope , but it then unwinds so
in the second function (which does not assign to ref) it is back to
pointing at the original scope, and finds $var there.
If "var=hello" is replaced by "unset var" (so there is no parameter at
the original scope), then the output is:
x
typeset -n ref=var
y
y
This is because the assignment "ref=var" has initialized the name to
which "ref" points, but there is no such name at the same level, so
the nearest parameter of the same name is chosen.
If "typeset -n ref" at the top level is replaced by "typeset -n
ref=var" then the "ref=var" assignment in the first function changes
the value of either the global $var or the local $var from "x" to
"var" (depending on which existed first) because ref already has a
referent name. So the output could be either:
# with var=hello
var
typeset -n ref=var
var
var
var
# with var unset
var
typeset -n ref=var
y
y
Given the requirements that (1) assigning a value to a nameref that
has no referent initializes the nameref (required for "for" loop) and
(2) dynamic scoping selects the best referent by name, I think this is
the best I can do without something even more surprising.
The other option that occurs to me is for the assignment ref=var to
automatically create a local $ref (as if "typeset ref=var" had been
used). That would then be unwound and we'd end up back in the
original state. But that would also be unexpected behavior in dynamic
scopes.
On Sat, Feb 11, 2023 at 3:43 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > On Fri, Feb 10, 2023 at 11:45 PM Bart Schaefer > <schaefer@brasslantern.com> wrote: > > > > On Fri, Feb 10, 2023 at 11:02 PM Oliver Kiddle <opk@zsh.org> wrote: > > > > > > fine with the reference becoming unset at the time of the return if it > > > refers to a variable that is going out of scope. > > That's actually a little startling given dynamic scoping. It leads to this: > > { > typeset -n ref=var > () { > var=FUNC1 # that should of course be "local var=FUNC1" sorry > typeset -p ref > } > typeset -p ref > } > > Output: > typeset -g -n ref=var > typeset: no such variable: ref
Bart Schaefer wrote: > > > I'd be > > > fine with the reference becoming unset at the time of the return if it > > > refers to a variable that is going out of scope. Can that be done as > > > part of the same code that drops the local variables? > > That's actually a little startling given dynamic scoping. It leads to this: While I said "unset", what I actually meant was only dropping the value part of the reference - returning it to the `typeset -n ref` state. However: > Instead let's think for a moment about how the dynamic scoping works. > "ref" has two properties: The locallevel at which it was declared, > and the locallevel at which it prefers to find a referent. Any time > you use $ref (or assign to ref) the search starts at the current > locallevel and proceeds upward until either the preferred level is > reached or the "topmost" defined parameter is found, whichever comes > first. You've persuaded me. It is consistent with dynamic scoping in general. Perhaps it has further convinced me (not that it needed to) that lexical scoping is generally better. Also when following and considering your examples, the following scenario occurred to me. I can't see a reason to object to this behaviour and some of the ideas I had in mind would mess with it. Note that the reference starts out pointing to an unset variable: bar() { typeset var=hello echo $ref } foo() { typeset -n ref=var bar var=foo echo $ref } unset var foo It'd be surprising if calling bar somehow lost the value of the reference. Oliver
Bart Schaefer wrote: > % empty=() > % loop='empty[${(P)loop}]' > % print ${(P)loop} > zsh: segmentation fault (core dumped) zsh -f And in 51399 on namerefs: > Circular references hidden inside subscripts end up expanding to empty > string, as do command substitutions with the NO_EXEC trick. That surprises me. I can't see any executions occurring. Would making (P) a little safer by applying the NO_EXEC trick to it too fix that seg fault. Or did you have a different fix in mind? Are there sane uses to (P) with nasty stuff like hidden command substitutions. We have (e) and eval which are at least explicit in their dangers. If genuine uses surface later (PP) could turn on the dangerous form. Or you might argue that we should support (PP) for the safe form. Making (P) safer could be another use for the FUTURE option I suggested in the final paragraph of 51281 - perhaps very few people (if any) were still reading at that point so it may have been overlooked. Oliver
On Sun, Feb 12, 2023 at 12:00 AM Oliver Kiddle <opk@zsh.org> wrote: > > Bart Schaefer wrote: > > % empty=() > > % loop='empty[${(P)loop}]' > > % print ${(P)loop} > > zsh: segmentation fault (core dumped) zsh -f > > And in 51399 on namerefs: > > Circular references hidden inside subscripts end up expanding to empty > > string, as do command substitutions with the NO_EXEC trick. > > That surprises me. I can't see any executions occurring. I apologize, the reference to NO_EXEC only applies to the clause about command substitutions. The circular references are handled by using tagging to detect/abort the loop during nameref resolution. I'm uncertain whether a similar approach could be used for (P) and haven't dug into it, I didn't want to mix a patch for that into the patches for namerefs. > Would making > (P) a little safer by applying the NO_EXEC trick to it too fix that seg > fault. Or did you have a different fix in mind? It wouldn't and I didn't, at least not yet. It would have to be something along the lines of the "math recursion limit exceeded" handling. There are two or three other places in the code where there are comments rejecting arbitrary limits on recursion so this might be something we don't want to fix. > Making (P) safer could be another use for the FUTURE option I suggested > in the final paragraph of 51281 - perhaps very few people (if any) were > still reading at that point so it may have been overlooked. It would be helpful to know whether any of the uses of (P) in the existing contributed and completion functions etc. actually rely on expanding command substitutions. It would be pretty unlikely that they rely on self-reference.
On 9 Feb, Bart Schaefer wrote (I've reordered this): > > This should be an error perhaps: > > > > typeset -n ref=arr[1][2] > > Why? ${ary[1][2]} isn't an error, it's the 2nd character of the first > word in $ary. Sorry, should have been clearer. Applying later subscripts is even better than an error but the following is not ideal: % arr=( one two three ) % typeset -n ref=arr[2][1] % echo $ref two[1] % ref=x % typeset -p arr typeset -a arr=( one x three ) % echo $ref x[1] % typeset -n ref=arr[1]whatever % echo $ref onewhatever > Yeah, I don't like the idea that a called function can arbitrarily > grab parameters from its caller just by sticking & in front. Caller > needs to do do something (even if only make "normal" use of dynamic > scope) to make the parameter "grabbable". Allowing such things as uplevel statements (Tcl), special syntax, or introspection modules (Python's inspect), is fairly normal for a scripting language. As I've mentioned, we already have the situation where the caller needs nothing to make a private grabable by a builtin: private -a var compadd -O var one two three And writing wrappers to e.g. compadd is useful. But I don't think we need to address references to privates in other scopes at this point before the other patches are even applied. May be better to consider whether there are other advantages in moving private to the main shell first as that may ease any implementation. > > > typeset -n ref > > > for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ... > > > > "just"!? > > Hah! Point was that it's do-able without "for"-context-sensitive > special subscript semantics. I think it would be strange for I can't think of a sensible way. Was just that it could be useful. Best long term solution might be to allow arrays of references. Bash does allow arrays of integers. > The others would all have to be special-cased individually. What is > $+ ? Do you mean $- ? Yes. Oliver
On Sun, Feb 12, 2023 at 1:02 AM Oliver Kiddle <opk@zsh.org> wrote: > > % typeset -n ref=arr[1]whatever > % echo $ref > onewhatever Aha. OK, that's pretty easy to fix. > As I've mentioned, we already have the situation where the caller > needs nothing to make a private grabable by a builtin: > private -a var > compadd -O var one two three > And writing wrappers to e.g. compadd is useful. > > But I don't think we need to address references to privates in other > scopes at this point before the other patches are even applied. May be > better to consider whether there are other advantages in moving private > to the main shell first as that may ease any implementation. I think it would in fact be necessary to move private to the main shell in order to fix this. The second-worst hack in private is the overloading of some parameter flags, and the internals of e.g. createparam() et al. need to be able to distinguish the cases of the original flag meaning from its overloaded meaning. Builtins have the advantage of not passing through that code path. > > > > typeset -n ref > > > > for ref in 'hash[(e)'${(k)^hash[(R)?*]}']'; do ... > > I can't think of a sensible way. Was just that it could be useful. What about adding ${(K)array) to return array[1] array[2] ... ? for ref in ${(K)hash}; do ... I took a quick look at the implementation of (k) and (v) and (K) looks do-able with sufficient messing around in paramsubst(). Which I'm not feeling like undertaking myself just now, though. > Best long term solution might be to allow arrays of references. Bash > does allow arrays of integers. Bash also has sparse arrays, its implementation is quite different.
On Sun, Feb 12, 2023 at 10:59 AM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> > [...] consider whether there are other advantages in moving private
> > to the main shell first as that may ease any implementation.
>
> [...] The second-worst hack in private is the
> overloading of some parameter flags, and the internals of e.g.
> createparam() et al. need to be able to distinguish the cases of the
> original flag meaning from its overloaded meaning.
We're up to (1<<30) on PM_* flags. Can we go above (1<<31) ?
Bart Schaefer wrote:
> We're up to (1<<30) on PM_* flags. Can we go above (1<<31) ?
We can't even use 1<<31 without either refactoring to ensure the use of
unsigned int or making it otherwise work for either a 32 or 64-bit int.
Perhaps assigning it as INT_MIN just works but I'm not sure.
See 43674. I had somehow missed the fact that Peter had freed two bits
there.
It might be better to force it up to 64-bits if doing refactoring. There
are a few mutually exclusive flags which would imply that bits could be
freed but the need to correspond to letters in TYPESET_OPTSTR prevents
that.
Oliver
Bart Schaefer wrote: > On Sun, Feb 12, 2023 at 1:02 AM Oliver Kiddle <opk@zsh.org> wrote: > > > > % typeset -n ref=arr[1]whatever > > % echo $ref > > onewhatever > > Aha. OK, that's pretty easy to fix. That still leaves the oddities with a second subscript: typeset -n ref='arr[2][1]' arr=( one two three ) echo $ref echo ${ref} ref=Z typeset -p arr Only case where the second subscript works is when ref is in braces. > What about adding ${(K)array) to return array[1] array[2] ... ? > > for ref in ${(K)hash}; do ... That'd work. There's been a few times when I've wished (k) and (kv) would work for normal arrays. the former is just {1..$#arr} but would work in nested expansions. Oliver
2023-02-10 23:45:36 -0800, Bart Schaefer:
[...]
> A named parameter declared with the '-n' option to any of the 'typeset'
> commands becomes a reference to a parameter in scope at the time of
> assignment to the named reference, which may be at a different call
> level than the declaring function. For this reason, it is good practice
> to declare a named reference as soon as the referent parameter is in
> scope, and as early as possible in the function if the reference is to a
> parameter in a calling scope.
[...]
One difference with ksh93: if the target variable was not set at
the time of the "typeset -n ref=target", then when ref is
assigned it may end up refering to a local target:
$ ./Src/zsh -c 'function f { typeset -n ref=$1; typeset var=foo; ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"'
X var X
$ ./Src/zsh --emulate ksh -c 'function f { typeset -n ref=$1; typeset var=foo; ref=X; echo "$ref ${!ref} $var"; }; f var; echo "$var"'
X var X
$
Compare with ksh:
$ ksh -c 'function f { typeset -n ref=$1; typeset var=foo; ref=X; echo "$ref ${!ref} $var"; }; f var; echo "$var"'
X var foo
X
That's a common use case like for the zslurp function discussed
on zsh-users where one would do:
cmd | zslurp var
To get the output of cmd verbatim into $var with $var potentially undeclared
beforehand, which wouldn't work if var was used locally by zslurp.
It can be worked around with a typeset -g $1 before the typeset -n var=$1
$ ./Src/zsh --emulate ksh -c 'function f { typeset -g $1; typeset -n ref=$1; typeset var=foo; ref=X; echo "$ref ${!ref} $var"; }; f var; echo "$var"'
X var foo
X
But it seems to me it would be better to align with ksh's
behaviour in this instance.
--
Stephane
On Sun, Feb 11, 2024 at 1:00 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > One difference with ksh93: if the target variable was not set at > the time of the "typeset -n ref=target", then when ref is > assigned it may end up refering to a local target Hm. I think I can prevent that from happening (it wasn't intentional) but I don't know if it's feasible to actually create a new variable at the original scope, because of the implemention as a linked list of scopes for each parameter -- it's not clear how to insert a variable into the middle of the list. > It can be worked around with a typeset -g $1 before the typeset -n var=$1 Is that really sufficient? Might the -g not select the wrong scope if the string $1 was passed down a couple of layers? Seems to me you have to avoid re-using a variable regardless, especially if you do typeset -n var= (create what I've been calling a "placeholder") and then assign to var in some nested scope. This may particularly be true of the "for var in ..." special case for namerefs. Something I forgot to mention before regarding "_zslurp_var" -- if you're going to a assume "typeset -n" then you can also assume namespace syntax e.g. ".zslurp.var" But the zslurp I put in the distribution was also meant to be somewhat backward compatible so I didn't use namrefs or namespaces.
On Sun, Feb 11, 2024 at 10:14 AM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> but I don't know if it's feasible to actually create a new variable at
> the original scope, because of the implemention as a linked list of
> scopes for each parameter -- it's not clear how to insert a variable
> into the middle of the list.
In particular inserting one at global scope is messy if there is
already another at a local scope.
Re-establishing the original examples: On Sat, Feb 10, 2024 at 11:00 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > $ ./Src/zsh -c 'function f { typeset -n ref=$1; typeset var=foo; ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"' > X var X > > Compare with ksh: > > $ ksh -c 'function f { typeset -n ref=$1; typeset var=foo; ref=X; echo "$ref ${!ref} $var"; }; f var; echo "$var"' > X var foo > X On Sun, Feb 11, 2024 at 8:14 AM Bart Schaefer <schaefer@brasslantern.com> wrote: > > On Sun, Feb 11, 2024 at 1:00 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > > > One difference with ksh93: if the target variable was not set at > > the time of the "typeset -n ref=target", then when ref is > > assigned it may end up refering to a local target [...] > > It can be worked around with a typeset -g $1 before the typeset -n var=$1 > > Is that really sufficient? Might the -g not select the wrong scope if > the string $1 was passed down a couple of layers? Here's what I'm concerned about: A) Src/zsh -c 'function f { typeset -n ref; ref=$1; typeset var=foo; ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"' Note the subtle difference of separating the declaration of "ref" from assignment to it. Then consider this variation: B) Src/zsh -c 'function f { typeset -n ref; typeset var=foo; ref=$1; ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"' Now the assignment is after the declaration of the local. And finally: C) Src/zsh -c 'function f { typeset var=foo; typeset -n ref=$1; ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"' Now the declaration+assignment of the nameref is after the local. At least for the versions I have installed on Ubuntu 20.04, and with parens removed from (!) as appropriate, A) ksh prints "X var foo\nX"; mksh prints an error and does not create a nameref; Src/zsh prints "X var X". B) ksh prints "X var X"; mksh prints an error and does not create a nameref; Src/zsh prints "X var X". C) ksh prints "X var foo\nX"; mksh prints "X var X"; Src/zsh prints "X var X". So in part C, ksh "knows" that $1 is not just a string but an object that belongs to a surrounding scope. That's not going to happen in zsh for a long time, if ever. It's interesting that it matters that the assignment appears in the argument of "typeset" e.g. part C vs. part B. Incidentally if I change the delcaration to typeset -n ref= then ksh crashes in parts A and B. The behavior of mksh and Src/zsh is unchanged. Adding the "typeset -g $1" changes the outcome for zsh only for part A, where it prints "X var foo\nX" as Stephane indicated. (The other shells don't support -g.) Given that it's not possible to fix part C for zsh, and zsh agrees with ksh on part B and with mksh on B and C, is it worth making an effort to fix Stephane's original example along with part A ? > Seems to me you > have to avoid re-using a variable regardless, especially if you do > > typeset -n var= > > (create what I've been calling a "placeholder") and then assign to var > in some nested scope. I have not dived into what happens in ksh if the assignments in A and B happen in a nested function call. > This may particularly be true of the "for var in ..." special case for namerefs. The potential for confusion here seems large. If I do typeset -n up=$1 typeset var=foo for up in $2 $3 $4 $5; do ... what scope am I applying to each of the words in $2, $3, ... ? (E.g., suppose $3 is "var" instead of $1.) Does it change if I do typeset -n up instead? Should that act like part B where the declaration is separate from the assignment? What if I do for up in "$@" there? The ksh implementation seems both inconsistent and slightly magical. Anyway, the way to "fix" this is to implicitly perform the "typeset -g" when assigning to the nameref, and if that creates a new parameter then mark it unset. But that still may put the new parameter at global scope rather than at the scope from which $1 was passed; the expansion of $1 is just a string like any other scalar, and there's no magic to change that. It seems as though the more consistent behavior is what zsh is already doing. I'm unsure whether doing the implicit typeset only in ksh emulation mode is worthwhile as it gets us only partway there. Adding the "typeset -g" changes the behavior of a large number of tests in K01, too. Thoughts?
2024-02-17 19:26:07 -0800, Bart Schaefer: [...] > Here's what I'm concerned about: > > A) Src/zsh -c 'function f { typeset -n ref; ref=$1; typeset var=foo; > ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"' > > Note the subtle difference of separating the declaration of "ref" from > assignment to it. I wouldn't mind for the typeset -n var=value form to be required like in mksh, like for typeset -r var=value. Note that the ksh93 man page mentions: ksh93> A nameref provides a convenient way to refer to the variable inside a ksh93> function whose name is passed as an argument to a function. For exam‐ ksh93> ple, if the name of a variable is passed as the first argument to a ksh93> function, the command typeset -n var=$1 (a.k.a. nameref var=$1) inside ksh93> the function causes references and assignments to var to be references ksh93> and assignments to the variable whose name has been passed to the func‐ ksh93> tion. Note that, for this to work, the positional parameter must be ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ksh93> assigned directly to the nameref as part of the declaration command, as ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ksh93> in the example above; only that idiom can allow one function to access ksh93> a local variable of another. For instance, typeset -n var; var=$1 ksh93> won't cross that barrier, nor will typeset foo=$1; typeset -n var=foo. > B) Src/zsh -c 'function f { typeset -n ref; typeset var=foo; ref=$1; > ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"' > > Now the assignment is after the declaration of the local. And finally: > > C) Src/zsh -c 'function f { typeset var=foo; typeset -n ref=$1; > ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"' > > Now the declaration+assignment of the nameref is after the local. The doc already says: zsh> A named parameter declared with the '-n' option to any of the 'typeset' zsh> commands becomes a reference to a parameter in scope at the time of zsh> assignment to the named reference, which may be at a different call zsh> level than the declaring function. For this reason, it is good practice ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ zsh> to declare a named reference as soon as the referent parameter is in ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ zsh> scope, and as early as possible in the function if the reference is to a ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ zsh> parameter in a calling scope. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Which I'd think covers that. > At least for the versions I have installed on Ubuntu 20.04, and with > parens removed from (!) as appropriate, Note that you don't want to use that one. IIRC, the one on Ubuntu 20.04 is based on the now discontinued ksh2020 which was based on the ksh93v- beta release from AT&T. It was deemed too buggy and abandoned. You can find a maintained version based on ksh93u+ at https://github.com/ksh93/ksh which includes hundreds of bug fixes (including from Solaris, RedHat and some backported from ksh2020). [...] > So in part C, ksh "knows" that $1 is not just a string but an object > that belongs to a surrounding scope. Note that ksh93 is special in that it does static scoping. There's not a stack of scopes like in shells that do dynamic scoping (ksh88, zsh, bash...), just a global scope and the local scope of the function. So having namerefs there is critical as it's the only way for a function to be able to access the variables of its caller (though one can also export a local variable to make it available to callees). I suppose zsh's private variables are similar to that. [...] > Given that it's not possible to fix part C for zsh, and zsh agrees > with ksh on part B and with mksh on B and C, is it worth making an > effort to fix Stephane's original example along with part A ? [...] Seems to me if we have to work around it by namespacing variables or use argv like in functions/regexp-replace, that defeats the purpose of those namerefs. Contrary to mksh or bash, it alread gets this right: $ ./Src/zsh -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; l=before; f l; echo "outer l=$l"' inner l=foo outer l=new $ ksh -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; l=before; f l; echo "outer l=$l"' inner l=foo outer l=new $ mksh -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; l=before; f l; echo "outer l=$l"' inner l=new outer l=before $ bash -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; l=before; f l; echo "outer l=$l"' inner l=new outer l=before The only remaining problem is when the refered variable is not set/declared. $ ksh -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; f l; echo "outer l=$l"' inner l=foo outer l=new $ ./Src/zsh -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; f l; echo "outer l=$l"' inner l=new outer l= $ mksh -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; f l; echo "outer l=$l"' inner l=new outer l= $ bash -c 'function f { typeset -n v=$1; typeset l=foo; v=new; echo "inner l=$l";}; f l; echo "outer l=$l"' inner l=new outer l= [...] > The potential for confusion here seems large. If I do > > typeset -n up=$1 > typeset var=foo > for up in $2 $3 $4 $5; do ... > > what scope am I applying to each of the words in $2, $3, ... ? (E.g., > suppose $3 is "var" instead of $1.) Does it change if I do I find that special behaviour of "for" quite surprising. I wasn't aware ksh93 and bash did it as well. I find mksh's behaviour more consistent though I can see how it can be difficult to do without. $ mksh -c 'a=1 b=2; f() { typeset name; for name do typeset -n v=$name; echo "$name=$v"; done; }; f a b' a=1 b=2 Won't work for "f name". Seems buggy in ksh93: $ ksh -c 'a=1 b=2; typeset -n v; for v in a b; do echo "v=$v a=$a b=$b"; done' v=1 a=1 b=2 v=1 a=1 b=2 $ ksh -c 'a=1 b=2; typeset -n v; for v in a b; do echo "v=$v a=$a b=$b ${!v}"; done' v=1 a=1 b=2 a v=2 a=1 b=2 b I would think that would be nowhere as common a usage as the ones where a function is meant to return something into the variable(s) passed as argument. I can see how it can still be difficult to do without namespacing. For instance having a function that does option processing à la print -v var would have to do: print() { local _print_{opt,raw=false} while getopts rv: _print_opt; do case $_print_opt in (r) _print_raw=true;; (v) typeset -n result=$OPTARG;; esac done } To avoid clashes (or the equivalent with .print.opt instead thouh I'm not sure I see the benefit if that means one needs to use ${.print.opt} instead of just $_print_opt). But that would already be better than what you get in bash or mksh. -- Stephane
On Tue, Feb 20, 2024 at 1:05 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > 2024-02-17 19:26:07 -0800, Bart Schaefer: > [...] > > A) Src/zsh -c 'function f { typeset -n ref; ref=$1; typeset var=foo; > > ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"' > > Note that the ksh93 man page mentions: > > ksh93> Note that, for this to work, the positional parameter must be > ksh93> assigned directly to the nameref as part of the declaration command > ksh93> [...] For instance, typeset -n var; var=$1 > ksh93> won't cross that barrier And yet, after replacing ksh2020 with Version AJM 93u+ 2012-08-01 (I had already done this on a different workstation but had forgotten about it on the one where I tested): ksh -c 'function f { typeset -n ref; ref=$1; typeset var=foo; ref=X; echo "$ref ${!ref} $var"; }; f var; echo "$var"' X var foo X So it sure looks like the barrier was crossed in a case where the doc stated it would not be. With 93u+ "typeset -n ref=;" produces ksh[2]: f[1]: typeset: : invalid variable name rather than crashing. > Note that ksh93 is special in that it does static scoping. > > I suppose zsh's private variables are similar to that. Sort of ... private variables are never accessible in another scope, even with namerefs. > [...] > > Given that it's not possible to fix part C for zsh, and zsh agrees > > with ksh on part B and with mksh on B and C, is it worth making an > > effort to fix Stephane's original example along with part A ? > [...] > > Seems to me if we have to work around it by namespacing > variables or use argv like in functions/regexp-replace, that > defeats the purpose of those namerefs. You don't have to use namespacing, you just have to be sure that either the caller has declared the parameter that it wants referenced, or that the called function does the "typeset -g $1" preliminary (and follows the doc recommendation about doing the "typeset -n" early). There's nothing really to be done about the other cases, and particularly not about specifying the (dynamic) scope of the referent. > The only remaining problem is when the refered variable > is not set/declared. It IS possible to make that "typeset -g" implicit, but that has other potential effects such as creating an (unset) parameter in global scope (triggering WARN_CREATE_GLOBAL?). If the parameter already has been declared in a calling function scope, then even with the -g, the nameref will point to that one, even if it's more than one level up the call chain. Which ought to be what you expect from dynamic scoping. > Contrary to mksh or bash, it alread gets this right: Yes, as soon as an existing parameter is found the nameref records what scope contains that parameter and looks for it there when (de)referenced. The glitchy bit is when there is no existing parameter from which to obtain a scope. I'm reluctant to try to make this too magical, which is why I lean toward asking function authors to explicitly do the "typeset -g $1" when that's what they mean. Could certainly add this to the doc. Or maybe add an option to "typeset" to turn this on or off, though I'm not sure the existing function call framework provides all the needed info. > > The potential for confusion here seems large. If I do > > > > typeset -n up=$1 > > typeset var=foo > > for up in $2 $3 $4 $5; do ... > > > > what scope am I applying > > I find that special behaviour of "for" quite surprising. So did I. I await with gritted teeth the first time that Ray forgets to declare his loop variable with a nameref of the same name in scope. Mixing namerefs with dynamic scoping does have a number of potential gotchas. > I wasn't aware ksh93 and bash did it as well. If they didn't, I wouldn't have implemented it. > Seems buggy in ksh93: Not all that surprising given the potential confusions. > $ mksh -c 'a=1 b=2; f() { typeset name; for name do typeset -n v=$name; echo "$name=$v"; done; }; f a b' > a=1 > b=2 > > Won't work for "f name". There's probably a way to do it with "f name a b" + shift, but I'm not going to work it out. > I can see how it can still be difficult to do without > namespacing. For instance having a function that does option > processing à la print -v var would have to do: zparseopts and/or a declared assoc can get you a long way in this sort of example while minimizing the opportunity for clashes.
2024-02-20 14:30:56 -0800, Bart Schaefer: > On Tue, Feb 20, 2024 at 1:05 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > > > 2024-02-17 19:26:07 -0800, Bart Schaefer: > > [...] > > > A) Src/zsh -c 'function f { typeset -n ref; ref=$1; typeset var=foo; > > > ref=X; echo "$ref ${(!)ref} $var"; }; f var; echo "$var"' > > > > Note that the ksh93 man page mentions: > > > > ksh93> Note that, for this to work, the positional parameter must be > > ksh93> assigned directly to the nameref as part of the declaration command > > ksh93> [...] For instance, typeset -n var; var=$1 > > ksh93> won't cross that barrier > > And yet, after replacing ksh2020 with Version AJM 93u+ 2012-08-01 With recent maintained versions with all the bug fixes, it would look something like: $ ./arch/linux.i386-64/bin/ksh -c 'echo ${.sh.version}' Version AJM 93u+m/1.1.0-alpha+b8e3d2a4 2024-02-17 The one on Ubuntu 22.04 has: $ ksh -c 'echo ${.sh.version}' Version AJM 93u+m/1.0.0-beta.2 2021-12-17 93u+ 2012-08-01 would be the one from AT&T without the later bug fixes. > ksh -c 'function f { typeset -n ref; ref=$1; typeset var=foo; > ref=X; echo "$ref ${!ref} $var"; }; f var; echo "$var"' > X var foo > X > > So it sure looks like the barrier was crossed in a case where the doc > stated it would not be. But in that case, you're referencing a variable from the global scope, not the local scope from a parent function. [...] > > The only remaining problem is when the refered variable > > is not set/declared. > > It IS possible to make that "typeset -g" implicit, but that has other > potential effects such as creating an (unset) parameter in global > scope (triggering WARN_CREATE_GLOBAL?). If the parameter already has > been declared in a calling function scope, then even with the -g, the > nameref will point to that one, even if it's more than one level up > the call chain. Which ought to be what you expect from dynamic > scoping. > > > Contrary to mksh or bash, it alread gets this right: > > Yes, as soon as an existing parameter is found the nameref records > what scope contains that parameter and looks for it there when > (de)referenced. The glitchy bit is when there is no existing > parameter from which to obtain a scope. I'm reluctant to try to make > this too magical, which is why I lean toward asking function authors > to explicitly do the "typeset -g $1" when that's what they mean. > Could certainly add this to the doc. [...] I have no notion of how zsh variables are implemented so I'm probably talking rubish and it's hard to think this whole thing through, but it seems to me that for a nameref variable, upon typeset -n ref=var, zsh should record at that point what scope or level of the stack the var is to be found and keep that record for the duration of the scope where that nameref is defined or until the next typeset -n ref=anything in the same scope. Like: - if var is found (declared) in the current scope, record that it's that of that scope that it is to get the value of or set or unset (and reset again thereafter) - if not, repeat for parent scope - if not found at all, record that it's to be at the global scope. Like in: f() { typeset a; unset a; g; echo "$a"; } g() { h; } h() { typeset b=x typeset -n r1=a r2=b r3=c typeset a=foo c=bar r1=x r2=y r3=z unset r{1..3} r1=x r2=y r3=z } f r1=x sets the $a of f, r2=y sets the $b of h, and r3=z sets a global $c both times. [...] > So did I. I await with gritted teeth the first time that Ray forgets > to declare his loop variable with a nameref of the same name in scope. [...] Yes, difficult decision to choose between - cleaner design, align with mksh - align with original design in ksh93 and bash which already copied it (well, that part). -- Stephane
Lost track of this for a bit .. On Wed, Feb 21, 2024 at 12:12 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > Like: > - if var is found (declared) in the current scope, record that > it's that of that scope that it is to get the value of or set > or unset (and reset again thereafter) > - if not, repeat for parent scope > - if not found at all, record that it's to be at the global > scope. That's where this whole discussion started: It's not good enough to "record that it's to be global" because if a local of the same name is later declared, there is no way to insert a new global "above" that, it's akin to attempting to allocate new space in your caller's stack frame in C (except there's one stack for each parameter name). The workaround in shell code is to start with "typeset -g a" (using your example code) to initialize the top stack frame. The workaround in underlying C code would be to have "typeset -n r1=a" (again your example) implicitly create the global $a as early as possible. > f() { typeset a; unset a; g; echo "$a"; } > g() { h; } > h() { > typeset b=x > typeset -n r1=a r2=b r3=c > typeset a=foo c=bar > r1=x r2=y r3=z > unset r{1..3} > r1=x r2=y r3=z > } > f > > r1=x sets the $a of f, r2=y sets the $b of h, and r3=z sets a > global $c both times. It works that way for "the $a of f" and "the $b of h", but r3 assigns "the $c of h" both times. Change that to: h() { typeset b=x typeset -n r1=a r2=b r3=c typeset a=foo r1=x r2=y r3=z typeset c=bar unset r{1..3} r1=x r2=y r3=z } And now the first r3=z creates the global $c so the second r3=z continues to refer to it.
2024-02-28 21:16:13 -0800, Bart Schaefer:
[...]
> That's where this whole discussion started: It's not good enough to
> "record that it's to be global" because if a local of the same name is
> later declared, there is no way to insert a new global "above" that,
> it's akin to attempting to allocate new space in your caller's stack
> frame in C (except there's one stack for each parameter name). The
> workaround in shell code is to start with "typeset -g a" (using your
> example code) to initialize the top stack frame. The workaround in
> underlying C code would be to have "typeset -n r1=a" (again your
> example) implicitly create the global $a as early as possible.
[...]
Couldn't typeset -n ref=var, when it finds that var is not set
create one in global scope with a "unset-but-referenced" flag
and maybe a reference count associated with it and that can be
unset but not removed until the reference count reaches 0 when
no nameref point to it?
Another (similar) problem:
Imagine a function meant to create a variable whose name is
passed as argument *as a scalar*:
$ ./Src/zsh -c 'f() { typeset -n ref=$1; local var=2; ref=3; }; typeset -A var; f var; echo $var'
f: var: attempt to set slice of associative array
Doesn't work if the variable was previously set as a hash.
Work around would be to unset it first but:
$ ./Src/zsh -c 'f() { typeset -n ref=$1; unset ref; local var=2; ref=3; }; typeset -A var; f var; echo $var'
<empty>
$ref ended up pointing to the local var again instead of the one
in global scope it was intended to reference.
Comparing with ksh93:
$ ksh -c 'function f { typeset -n ref=$1; typeset var=2; ref=3; }; typeset -A var=([a]=b); f var; typeset -p var'
typeset -A var=([0]=3 [a]=b)
$ ksh -c 'function f { typeset -n ref=$1; unset ref; typeset var=2; ref=3; }; typeset -A var=([a]=b); f var; typeset -p var'
var=3
--
Stephane
On Fri, Mar 1, 2024 at 10:22 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > 2024-02-28 21:16:13 -0800, Bart Schaefer: > > [...] The workaround in > > underlying C code would be to have "typeset -n r1=a" (again your > > example) implicitly create the global $a as early as possible. > [...] > > Couldn't typeset -n ref=var, when it finds that var is not set > create one in global scope with a "unset-but-referenced" flag Yes, that's what I just said. > and maybe a reference count associated with it Reference counting ends up being a pretty significant refactoring of the parameter code, we started down that path some time ago and didn't get satisfactory results. > Comparing with ksh93: You really can't, because ksh93 uses static scoping of locals where zsh uses dynamic scoping, and ksh also has special handling of "typeset -n ref=$1" (and other positionals) that is simply never going to happen in zsh. The upshot is that if we go with implicitly creating a variable in global scope, it doesn't solve the other reference issues -- so the question is, when we document the pitfalls of using upward-pointing named references, do we also break the dynamic scope paradigm for the specific case of nonexistent globals? > Imagine a function meant to create a variable whose name is passed as argument That can't cause a serious difficulty because if you have such a function, the caller must know that the argument is a parameter name, and therefore must be able to create the parameter in the correct scope in advance. It's just not as syntactically sugary as ksh. Note that all of these situations also exist with builtins like zstat and even "print -v" that accept parameter names as arguments.
On Fri, Mar 1, 2024 at 10:22 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > $ ./Src/zsh -c 'f() { typeset -n ref=$1; unset ref; local var=2; ref=3; }; typeset -A var; f var; echo $var' > <empty> That's an actual bug, regardless of the rest of this discussion. diff --git a/Src/builtin.c b/Src/builtin.c index 44dfed651..83144677b 100644 --- a/Src/builtin.c +++ b/Src/builtin.c @@ -3932,8 +3932,14 @@ bin_unset(char *name, char **argv, Options ops, int func) } } else { if (!OPT_ISSET(ops,'n')) { + int ref = (pm->node.flags & PM_NAMEREF); if (!(pm = (Param)resolve_nameref(pm, NULL))) continue; + if (ref && pm->level < locallevel) { + /* Just mark unset, do not remove from table */ + pm->node.flags |= PM_DECLARED|PM_UNSET; + continue; + } } if (unsetparam_pm(pm, 0, 1)) returnval = 1;
On Wed, Feb 28, 2024 at 9:16 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > it's akin to attempting to allocate new space in your caller's stack > frame in C (except there's one stack for each parameter name). On Fri, Mar 1, 2024 at 12:34 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > zsh uses dynamic scoping, and ksh also has special handling of > "typeset -n ref=$1" (and other positionals) that is simply never going > to happen in zsh. How about this: Instead of magically turning a positional into a reference into the surrounding scope, let's make it explicit. I propose that when the -u flag is combined with the -n flag, the named reference refers to the enclosing scope, even if there's a local of that name already in scope. Otherwise (without -u) normal dynamic scoping applies. So in an example like this: y () { typeset $1=Y typeset -nu upref=$1 upref=UP echo In y: typeset -p $1 } z () { local var y var echo In z: typeset -p var } The output would be: In y: typeset var=Y In z: typeset var=UP However, there's no good way around the stack problem, so removing the declaration in z -- z () { y var echo In z: typeset -p var } -- will cause an error at the "upref=UP" assignment in y. Or that assignment could just silently fail, which already happens in a couple of existing tests. The decision on that probably rests on what to do with prefix assignments, e.g., if it's an error then upref=UP /bin/echo bad would print a message and not execute at all. If instead the assignment does nothing but set $?, then /bin/echo will execute and its status will determine $?.
[-- Attachment #1: Type: text/plain, Size: 870 bytes --] On Fri, Mar 1, 2024 at 11:29 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > How about this: Instead of magically turning a positional into a > reference into the surrounding scope, let's make it explicit. Here's a patch to implement this, including doc with an example. Of particular note is the last sentence here: +To force a named reference to refer to the outer scope, even if a local +has already been declared, add the tt(-u) option when declaring the +named reference. In this case var(rname) should already exist in the +outer scope, otherwise the behavior of assignment through var(pname) +is not defined and may change the scope of the reference or fail with +a status of 1. I went with assignment silently failing rather than printing an error message in the cases where a parameter in the calling scope cannot be created. [-- Attachment #2: typeset-nu.txt --] [-- Type: text/plain, Size: 13162 bytes --] diff --git a/Doc/Zsh/builtins.yo b/Doc/Zsh/builtins.yo index 6318053d8..7a9684ac8 100644 --- a/Doc/Zsh/builtins.yo +++ b/Doc/Zsh/builtins.yo @@ -2052,13 +2052,17 @@ cindex(named reference) cindex(reference, named) The flag tt(-n) creates a em(named reference) to another parameter. The second parameter need not exist at the time the reference is -created. Only tt(-g) and tt(-r) may be used in conjunction with +created. Only tt(-g), tt(-u), and tt(-r) may be used in conjunction with tt(-n). The var(name) so created may not be an array element nor use a subscript, but the var(value) assigned may be any valid parameter name syntax, even a subscripted array element (including an associative array element) or an array slice, which is evaluated when the named reference is expanded. It is an error for a named reference to refer -to itself, even indirectly through a chain of references. +to itself, even indirectly through a chain of references. When tt(-u) +is applied to a named reference, the parameter identified by var(value) +is always found in the calling function scope rather than the current +local scope. In this case, if there is no such parameter in the calling +scope, assignments to the named reference may fail, setting tt($?) to 1. See ifzman(zmanref(zshexpn))ifnzman(noderef(Parameter Expansion)) and ifzman(zmanref(zshparam))ifnzman(noderef(Parameters)) for details of the behavior of named references. diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo index 2acfd08c9..183ca6e03 100644 --- a/Doc/Zsh/expn.yo +++ b/Doc/Zsh/expn.yo @@ -1600,6 +1600,25 @@ example(tt(before local: OUTER) tt(by reference: OUTER) tt(after func: RESULT)) +To force a named reference to refer to the outer scope, even if a local +has already been declared, add the tt(-u) option when declaring the +named reference. In this case var(rname) should already exist in the +outer scope, otherwise the behavior of assignment through var(pname) +is not defined and may change the scope of the reference or fail with +a status of 1. Example of correct usage: +ifzman() +example(tt(caller=OUTER) +tt(func LPAR()RPAR() {) +tt( print before local: $caller) +tt( local caller=INNER) +tt( print after local: $caller) +tt( typeset -n -u outer=$1) +tt( print by reference: $outer) +tt( outer=RESULT) +tt(}) +tt(func caller) +tt(print after func: $caller)) + Note, however, that named references to em(special) parameters acquire the behavior of the special parameter, regardless of the scope where the reference is declared. diff --git a/Doc/Zsh/func.yo b/Doc/Zsh/func.yo index d4914df7a..7b71e34e9 100644 --- a/Doc/Zsh/func.yo +++ b/Doc/Zsh/func.yo @@ -31,10 +31,12 @@ referent parameter is in scope, and as early as possible in the function if the reference is to a parameter in a calling scope. A typical use of named references is to pass the name -of the referent as a positional parameter. For example, +of the referent as a positional parameter. In this case it is +good practice to use the tt(-u) option to reference the calling +scope. For example, ifzman() example(pop+LPAR()RPAR() { - local -n ref=$1 + local -nu ref=$1 local last=$ref[$#ref] ref[$#ref]=LPAR()RPAR() print -r -- $last @@ -43,9 +45,10 @@ array=LPAR() a list of five values RPAR() pop array) prints the word `tt(values)' and shortens `tt($array)' to -`tt(LPAR() a list of five RPAR())'. There are no local parameters in -tt(pop) at the time `tt(ref=$1)' is assigned, so `tt(ref)' becomes a -reference to `tt(array)' in the caller. +`tt(LPAR() a list of five RPAR())'. With tt(-nu), `tt(ref)' becomes a +reference to `tt(array)' in the caller. There are no local parameters in +tt(pop) at the time `tt(ref=$1)' is assigned, so in this example tt(-u) +could have been omitted, but it makes the intention clear. Functions execute in the same process as the caller and share all files diff --git a/Doc/Zsh/mod_ksh93.yo b/Doc/Zsh/mod_ksh93.yo index 9cd708d10..7508758aa 100644 --- a/Doc/Zsh/mod_ksh93.yo +++ b/Doc/Zsh/mod_ksh93.yo @@ -12,7 +12,7 @@ The single builtin provided by this module is: startitem() findex(nameref) cindex(named references, creating) -item(tt(nameref) [ tt(-r) ] var(pname)[tt(=)var(rname)])( +item(tt(nameref) [ tt(-gur) ] var(pname)[tt(=)var(rname)])( Equivalent to tt(typeset -n )var(pname)tt(=)var(rname) However, tt(nameref) is a builtin command rather than a reserved word, diff --git a/Etc/FAQ.yo b/Etc/FAQ.yo index 145ef02c9..4a86050e6 100644 --- a/Etc/FAQ.yo +++ b/Etc/FAQ.yo @@ -1025,6 +1025,13 @@ label(210) HIT:SPOT ) + Dynamic scoping applies to named references, so for example a named + reference declared in global scope may be used in function scopes. + In ksh, local parameters have static scope, so named references in + zsh may have side-effects that do not occur in ksh. To limit those + effects, mytt(zmodload zsh/param/private) and declare all named + references mytt(private). + Named references may be used in zsh versions later than 5.9. sect(What is zsh's support for non-forking command substitution?) diff --git a/Src/Modules/ksh93.c b/Src/Modules/ksh93.c index 9af5e1d69..6760cbca0 100644 --- a/Src/Modules/ksh93.c +++ b/Src/Modules/ksh93.c @@ -38,7 +38,7 @@ */ static struct builtin bintab[] = { - BUILTIN("nameref", BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "gr", "n") + BUILTIN("nameref", BINF_ASSIGN, (HandlerFunc)bin_typeset, 0, -1, 0, "gur", "n") }; #include "zsh.mdh" diff --git a/Src/builtin.c b/Src/builtin.c index 83144677b..ba9cb03c2 100644 --- a/Src/builtin.c +++ b/Src/builtin.c @@ -2699,7 +2699,7 @@ bin_typeset(char *name, char **argv, LinkList assigns, Options ops, int func) off |= bit; } if (OPT_MINUS(ops,'n')) { - if ((on|off) & ~PM_READONLY) { + if ((on|off) & ~(PM_READONLY|PM_UPPER)) { zwarnnam(name, "no other attributes allowed with -n"); return 1; } diff --git a/Src/exec.c b/Src/exec.c index d85adbea5..8be7bf17d 100644 --- a/Src/exec.c +++ b/Src/exec.c @@ -1322,7 +1322,7 @@ execsimple(Estate state) fputc('\n', xtrerr); fflush(xtrerr); } - lv = (errflag ? errflag : cmdoutval); + lv = (errflag ? errflag : cmdoutval ? cmdoutval : lastval); } else { int q = queue_signal_level(); dont_queue_signals(); @@ -2604,6 +2604,7 @@ addvars(Estate state, Wordcode pc, int addflags) opts[ALLEXPORT] = allexp; } else pm = assignsparam(name, val, myflags); + lastval = !pm; /* zerr("%s: assignment failed", name); */ if (errflag) { state->pc = opc; return; @@ -2628,7 +2629,9 @@ addvars(Estate state, Wordcode pc, int addflags) } fprintf(xtrerr, ") "); } - assignaparam(name, arr, myflags); + ; + lastval = !assignaparam(name, arr, myflags); + /* zerr("%s: array assignment failed", name); */ if (errflag) { state->pc = opc; return; diff --git a/Src/params.c b/Src/params.c index 064dbd2bc..790c978f2 100644 --- a/Src/params.c +++ b/Src/params.c @@ -1060,8 +1060,7 @@ createparam(char *name, int flags) "BUG: local parameter is not unset"); oldpm = lastpm; } - } else - flags |= PM_NAMEREF; + } } DPUTS(oldpm && oldpm->level > locallevel, @@ -6264,10 +6263,12 @@ resolve_nameref(Param pm, const Asgment stop) } } else if ((hn = gethashnode2(realparamtab, seek))) { if (pm) { - if (!(stop && (stop->flags & (PM_LOCAL)))) - hn = (HashNode)upscope((Param)hn, - ((pm->node.flags & PM_NAMEREF) ? - pm->base : ((Param)hn)->level)); + if (!(stop && (stop->flags & (PM_LOCAL)))) { + int scope = ((pm->node.flags & PM_NAMEREF) ? + ((pm->node.flags & PM_UPPER) ? -1 : + pm->base) : ((Param)hn)->level); + hn = (HashNode)upscope((Param)hn, scope); + } /* user can't tag a nameref, safe for loop detection */ pm->node.flags |= PM_TAGGED; } @@ -6313,11 +6314,13 @@ setloopvar(char *name, char *value) static void setscope(Param pm) { - if (pm->node.flags & PM_NAMEREF) { + queue_signals(); + if (pm->node.flags & PM_NAMEREF) do { Param basepm; struct asgment stop; char *refname = GETREFNAME(pm); char *t = refname ? itype_end(refname, INAMESPC, 0) : NULL; + int q = queue_signal_level(); /* Temporarily change nameref to array parameter itself */ if (t && *t == '[') @@ -6327,9 +6330,11 @@ setscope(Param pm) stop.name = ""; stop.value.scalar = NULL; stop.flags = PM_NAMEREF; - if (locallevel) + if (locallevel && !(pm->node.flags & PM_UPPER)) stop.flags |= PM_LOCAL; + dont_queue_signals(); /* Prevent unkillable loops */ basepm = (Param)resolve_nameref(pm, &stop); + restore_queue_signals(q); if (t) { pm->width = t - refname; *t = '['; @@ -6342,7 +6347,7 @@ setscope(Param pm) if (upscope(pm, pm->base) == pm) { zerr("%s: invalid self reference", refname); unsetparam_pm(pm, 0, 1); - return; + break; } pm->node.flags &= ~PM_SELFREF; } else if (pm->base == pm->level) { @@ -6350,7 +6355,7 @@ setscope(Param pm) strcmp(pm->node.nam, refname) == 0) { zerr("%s: invalid self reference", refname); unsetparam_pm(pm, 0, 1); - return; + break; } } } else if ((t = GETREFNAME(basepm))) { @@ -6358,7 +6363,7 @@ setscope(Param pm) strcmp(pm->node.nam, t) == 0) { zerr("%s: invalid self reference", refname); unsetparam_pm(pm, 0, 1); - return; + break; } } } else @@ -6378,7 +6383,8 @@ setscope(Param pm) zerr("%s: invalid self reference", refname); unsetparam_pm(pm, 0, 1); } - } + } while (0); + unqueue_signals(); } /**/ @@ -6390,6 +6396,8 @@ upscope(Param pm, int reflevel) pm = up; up = up->old; } + if (reflevel < 0 && locallevel > 0) + return pm->level == locallevel ? up : pm; return pm; } diff --git a/Test/K01nameref.ztst b/Test/K01nameref.ztst index ff48e2289..6db819389 100644 --- a/Test/K01nameref.ztst +++ b/Test/K01nameref.ztst @@ -492,7 +492,6 @@ F:unexpected side-effects of previous tests } typeset -p ptr2 1:up-reference part 5, stacked namerefs, end not in scope -F:What is the correct behavior for the scope of ptr1? >typeset -n ptr1=ptr2 >typeset -n ptr2 >ptr1=ptr2 @@ -529,6 +528,49 @@ F:Same test, should part 5 output look like this? >typeset -n ptr2 >typeset ptr2=val + () { + () { + local var + typeset -nu ptr1=var + ptr1=outer && print -u2 assignment expected to fail + typeset -n ptr2=var + ptr2=inner + typeset -n + printf "%s=%s\n" ptr1 "$ptr1" ptr2 "$ptr2" + } + typeset -p var + } + typeset -p var +1:up-reference part 7, upscope namerefs, end not in scope +>ptr1=var +>ptr2=var +>ptr1= +>ptr2=inner +*?*typeset*: no such variable: var +?*typeset*: no such variable: var + + typeset var + () { + () { + local var + typeset -nu ptr1=var + ptr1=outer || print -u2 assignment expected to succeed + typeset -n ptr2=var + ptr2=inner + typeset -n + printf "%s=%s\n" ptr1 "$ptr1" ptr2 "$ptr2" + } + typeset -p var + } + typeset -p var +0:up-reference part 8, upscope namerefs, end in scope +>ptr1=var +>ptr2=var +>ptr1=outer +>ptr2=inner +>typeset -g var=outer +>typeset var=outer + if zmodload zsh/parameter; then () { zmodload -u zsh/parameter @@ -539,7 +581,7 @@ F:Same test, should part 5 output look like this? } else ZTST_skip='Cannot zmodload zsh/parameter, skipping autoload test' fi -0:up-reference part 3, autoloading with hidden special +0:up-reference part 9, autoloading with hidden special >nameref-local-nameref-local >typeset -h parameters @@ -762,12 +804,17 @@ F:relies on global TYPESET_TO_UNSET in %prep bar=xx typeset -n foo=bar - () { typeset -n foo; foo=zz; foo=zz; print $bar $zz } + () { + typeset -n foo; foo=zz + foo=zz || print -u2 foo: assignment failed + print $bar $zz + } () { typeset -n foo; foo=zz; local zz; foo=zz; print $bar $zz } 0:regression: local nameref may not in-scope a global parameter F:previously this could create an infinite recursion and crash >xx >xx zz +*?*foo: assignment failed typeset -nm foo=bar 1:create nameref by pattern match not allowed diff --git a/Test/V10private.ztst b/Test/V10private.ztst index efa346002..ed51316f3 100644 --- a/Test/V10private.ztst +++ b/Test/V10private.ztst @@ -378,7 +378,7 @@ F:Here ptr1 finds private ptr2 by scope mismatch typeset -p ptr1 ptr2 typeset val=LOCAL () { - ptr1=val # This is a silent no-op, why? + ptr1=val || print -u2 ptr1: assignment failed typeset -n printf "%s=%s\n" ptr1 "$ptr1" ptr2 "$ptr2" } @@ -388,7 +388,6 @@ F:Here ptr1 finds private ptr2 by scope mismatch 1:up-reference for private namerefs, end not in scope F:See K01nameref.ztst up-reference part 5 F:Here ptr1 finds private ptr2 by scope mismatch -F:Assignment silently fails, is that correct? >typeset -n ptr1=ptr2 >typeset -hn ptr2='' >ptr1=ptr2 @@ -396,6 +395,7 @@ F:Assignment silently fails, is that correct? >ptr2= >typeset -n ptr1=ptr2 >typeset -hn ptr2='' +*?*ptr1: assignment failed *?*no such variable: ptr2 typeset ptr2
2024-03-01 15:28:58 -0800, Bart Schaefer:
> On Fri, Mar 1, 2024 at 10:22 AM Stephane Chazelas <stephane@chazelas.org> wrote:
> >
> > $ ./Src/zsh -c 'f() { typeset -n ref=$1; unset ref; local var=2; ref=3; }; typeset -A var; f var; echo $var'
> > <empty>
>
> That's an actual bug, regardless of the rest of this discussion.
[...]
If that one can be fixed then why can't the main one, namely:
assign() {
typeset -n var=$1
typeset value=$2
var=$value
}
assign value x
echo $value
(or assign var x returning an error)
be fixed the same way, that is typeset -n ref=var doing the
equivalent of assigning var and unsetting afterwards if it is
previously not found?
--
Stephane
On Sun, Mar 3, 2024 at 5:44 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > If that one can be fixed then why can't the main one Because the outer-scope parameter object already exists in "that one". As I keep saying, the problem is with creating a new parameter at a lower scope (counting 0 as the global scope) when a parameter of that name already exists at the current (greater than 0) scope. > typeset -n var=$1 > typeset value=$2 If the parameter is magically created "in advance" (at the point of "typeset -n") then it only works for global scalars -- if you try to create an array using "var" you run into baffling stuff like this: assign:3: value: attempt to assign array value to non-array I think it's much more consistent for dynamic scoping to apply at the time $var is referenced or assigned. The patch in workers/52650 allows something like this to work: assign() { typeset value=$2 typeset -nu var=$1 # ignore local $value case ${(t)var} in (*scalar*) var=$value;; (*array*) var=( $value );; (*) print -u2 cannot assign;; esac } declare -a value assign value x If a function wants to be able to actually add things to its calling scope without fear of name clashes, it has to use unique local names (e.g., a namespace). But with dynamic scoping it remains the case that you can't add a name to the global scope if any function in the call chain has pre-empted that name -- named references don't change this basic rule.
2024-03-03 11:04:03 -0800, Bart Schaefer: [...] > > typeset -n var=$1 > > typeset value=$2 > > If the parameter is magically created "in advance" (at the point of > "typeset -n") then it only works for global scalars Why would we need it for any other scope? If the parameter is not found in any scope, then it must be created in global scope. That's what "read var" does. Many if not most of the usages of namerefs I've come across were to implement "read"-like commands that return or can return something in a variable whose name is given by the caller (and that can create the variable if needed). > if you try to > create an array using "var" you run into baffling stuff like this: > > assign:3: value: attempt to assign array value to non-array > assign() { if [[ -v $1 ]]; then typeset -n var=$1 else eval "$1=" typeset -n var=$1 unset var fi typeset value=$2 var=(var now set with value $value) } assign value x typeset -p value Where an approximation of what I'm suggesting is implemented in zsh seems to work fine (with your patch applied). It segfaults with zsh -x though $ ./Src/zsh ~/a typeset -a value=( var now set with value x ) $ gdb -q --args ./Src/zsh -x ~/a Reading symbols from ./Src/zsh... (gdb) r Starting program: /home/chazelas/install/cvs/zsh/Src/zsh -x /home/chazelas/a [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". +/home/chazelas/a:12> assign value x +assign:1> [[ -v value ]] +assign:4> eval 'value=' +(eval):1> value='' +assign:5> typeset -n var=value +assign:6> unset var +assign:8> typeset value=x Program received signal SIGSEGV, Segmentation fault. 0x00007ffff7cfb3fe in __GI___libc_free (mem=0x555555600) at ./malloc/malloc.c:3368 3368 ./malloc/malloc.c: No such file or directory. (gdb) bt #0 0x00007ffff7cfb3fe in __GI___libc_free (mem=0x555555600) at ./malloc/malloc.c:3368 #1 0x00005555555d5105 in zsfree (p=0x555555600 <error: Cannot access memory at address 0x555555600>) at mem.c:1878 #2 0x0000555555625482 in freearray (s=0x555555683d88) at utils.c:4100 #3 0x00005555555ea0e6 in arrsetfn (pm=0x555555683da0, x=0x555555683f40) at params.c:3996 #4 0x00005555555e5691 in setarrvalue (v=0x7fffffffc360, val=0x555555683f40) at params.c:2889 #5 0x00005555555e8537 in assignaparam (s=0x7ffff7fac733 "", val=0x555555683f40, flags=6) at params.c:3538 #6 0x00005555555943f7 in addvars (state=0x7fffffffcda0, pc=0x55555567d508, addflags=0) at exec.c:2631 #7 0x000055555559639b in execcmd_exec (state=0x7fffffffcda0, eparams=0x7fffffffc9b0, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3363 #8 0x00005555555926c8 in execpline2 (state=0x7fffffffcda0, pcode=643, how=18, input=0, output=0, last1=0) at exec.c:2016 #9 0x000055555559125c in execpline (state=0x7fffffffcda0, slcode=10242, how=18, last1=0) at exec.c:1741 #10 0x000055555559049e in execlist (state=0x7fffffffcda0, dont_change_job=1, exiting=0) at exec.c:1495 #11 0x000055555558fabe in execode (p=0x55555567d3f0, dont_change_job=1, exiting=0, context=0x5555556310da "shfunc") at exec.c:1276 #12 0x000055555559e87a in runshfunc (prog=0x55555567d3f0, wrap=0x0, name=0x7ffff7fac170 "assign") at exec.c:6164 #13 0x000055555559dd8a in doshfunc (shfunc=0x55555567d570, doshargs=0x7ffff7fb7b18, noreturnval=0) at exec.c:6010 #14 0x000055555559ca1d in execshfunc (shf=0x55555567d570, args=0x7ffff7fb7b18) at exec.c:5548 #15 0x0000555555598a9c in execcmd_exec (state=0x7fffffffdad0, eparams=0x7fffffffd6e0, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:4075 #16 0x00005555555926c8 in execpline2 (state=0x7fffffffdad0, pcode=835, how=18, input=0, output=0, last1=0) at exec.c:2016 #17 0x000055555559125c in execpline (state=0x7fffffffdad0, slcode=5122, how=18, last1=0) at exec.c:1741 #18 0x000055555559049e in execlist (state=0x7fffffffdad0, dont_change_job=0, exiting=0) at exec.c:1495 #19 0x000055555558fabe in execode (p=0x7ffff7fb7a30, dont_change_job=0, exiting=0, context=0x555555632b86 "toplevel") at exec.c:1276 #20 0x00005555555b7cb0 in loop (toplevel=1, justonce=0) at init.c:212 #21 0x00005555555bc7a9 in zsh_main (argc=3, argv=0x7fffffffddf8) at init.c:1934 #22 0x000055555556c99d in main (argc=3, argv=0x7fffffffddf8) at ./main.c:93 (gdb) > I think it's much more consistent for dynamic scoping to apply at the > time $var is referenced or assigned. > > The patch in workers/52650 allows something like this to work: > > assign() { > typeset value=$2 > typeset -nu var=$1 # ignore local $value > case ${(t)var} in > (*scalar*) var=$value;; > (*array*) var=( $value );; > (*) print -u2 cannot assign;; > esac > } > declare -a value > assign value x > > If a function wants to be able to actually add things to its calling > scope without fear of name clashes, it has to use unique local names > (e.g., a namespace). But with dynamic scoping it remains the case > that you can't add a name to the global scope if any function in the > call chain has pre-empted that name -- named references don't change > this basic rule. I really don't follow, dynamic scoping should make things easier here. If the caller does "local var; assign var value", then obviously they want value to be assigned to their var, not have one created in the global scope instead. It's easy in a shell with dynamic scoping as a function sees the local variables of its caller anyway (for instance, my [[ -v $1 ]] above will see). That typeset -nu var is useful indeed as it allows one to declare local variables *before* tying namerefs to variables, but having namerefs that can't create globals seems like a very severe limitations to me, all other shells with namerefs can do it. zsh can do it except for that possible clash with local variables declared *after* which it seems to me should be avoidable. -- Stephane
On Sun, Mar 3, 2024 at 12:27 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > 2024-03-03 11:04:03 -0800, Bart Schaefer: > > > > If the parameter is magically created "in advance" (at the point of > > "typeset -n") then it only works for global scalars > > Why would we need it for any other scope? If the parameter is > not found in any scope, then it must be created in global scope. > > That's what "read var" does. The word "scalar" is just as important in that sentence as the word "global" -- "read" and "zstat" can specify whether the variable is scalar or array, you can't do that with a nameref. Furthermore, if there's a local in scope at the point where "read var" is called, it assigns to the local, even if the local is unset. The scope is determined at the point of assignment. Locals consume the space allocated in the parameter table for the corresponding name. This is a longstanding, conscious decision in the parameter implementation, I even quoted a code comment to that effect in the thread about unsetting tied parameters. With -u, a nameref can skip over one local, but it can't "make room" for a global that didn't already exist. I've tried several different ways to explain this, so I'm just going to ignore this going forward. > Many if not most of the usages of > namerefs I've come across were to implement "read"-like commands > that return or can return something in a variable whose name is given by the > caller (and that can create the variable if needed). There's nothing in the current implementation that prevents you from doing that if you choose your local names appropriately (and if nothing above you in the call chain hasn't already co-opted the name). > Where an approximation of what I'm suggesting is implemented in > zsh seems to work fine (with your patch applied). But that approach doesn't work in the general case! If you move typeset value=$2 to above the [[ -v $1 ]], then it either bombs, or assigns to the "wrong" $value. There's no way to magically get this right in the internal implementation, it has to be handled in shell code. > It segfaults with zsh -x though I'll have a look at that later, thanks. > > If a function wants to be able to actually add things to its calling > > scope without fear of name clashes, it has to use unique local names > > (e.g., a namespace). But with dynamic scoping it remains the case > > that you can't add a name to the global scope if any function in the > > call chain has pre-empted that name -- named references don't change > > this basic rule. > > I really don't follow, dynamic scoping should make things easier > here. Consider the case where your function is called 2 or more levels removed from the global scope. > but having namerefs that can't create globals seems like a very > severe limitations to me It's the same limitation as () { local foo unset foo () { foo=bar } } which despite the appearance of the assignment in the nested function, doesn't create a global foo, it resurrects the caller's local foo. Ksh has static local scope (and an entirely different internal representation of parameters) so the local foo makes no difference to the nested function.
2024-03-03 14:58:04 -0800, Bart Schaefer: [...] > I've tried several different ways to explain this, so > I'm just going to ignore this going forward. Sorry for being so thick but I still don't get it. As there's still a chance it's just we have different expectations, I'll try a last attempt at clarifying mine below [...] > > Many if not most of the usages of > > namerefs I've come across were to implement "read"-like commands > > that return or can return something in a variable whose name is given by the > > caller (and that can create the variable if needed). > > There's nothing in the current implementation that prevents you from > doing that if you choose your local names appropriately (and if > nothing above you in the call chain hasn't already co-opted the name). If we still have to namespace the names of the nameref variables and any local variables of functions that use namerefs, then we might as well do like in bash/mksh and just follow by name. > > Where an approximation of what I'm suggesting is implemented in > > zsh seems to work fine (with your patch applied). > > But that approach doesn't work in the general case! If you move > typeset value=$2 > to above the [[ -v $1 ]], then it either bombs, or assigns to the > "wrong" $value. To me, typeset -n ref=var should tie the ref to the var that is currently in scope at the time ref is assigned var. If there's none in scope at the moment, then it should be created unset and undeclared at the global scope at that point, so that at the next ref=value, it's that one that is set and not whatever variable with the same name happened to appear at whatever scope that ref=value was run. In: var=0 f() { local var=1 nameref ref=var echo "$var" } f I expect ref to poing to the local var and get "1" because that's the var that was in scope at the time of ref=var In: var=0 f() { nameref ref=var local var=1 echo "$var" } f I expect 0. In f() { nameref ref=var local var=1 echo "${var-unset}" ref=2 } f echo "$var" I expect unset and 2 [...] > > I really don't follow, dynamic scoping should make things easier > > here. > > Consider the case where your function is called 2 or more levels > removed from the global scope. In: var=0 f1() { local var=1 f2 } f2() { f3 local var=2 } f3() { nameref ref=var echo "$ref" } I expect to see 1 as that's the var that is in scope at the time of ref=var. > > > but having namerefs that can't create globals seems like a very > > severe limitations to me > > It's the same limitation as > () { > local foo > unset foo > () { foo=bar } > } > which despite the appearance of the assignment in the nested function, > doesn't create a global foo, it resurrects the caller's local foo. Which is very much what I expect and agree it would be very wrong for that to create a global foo. > Ksh has static local scope (and an entirely different internal > representation of parameters) so the local foo makes no difference to > the nested function. (unless the local foo is expected or the nest function uses Bourne-style instead of Korn-style function definition) but yes, I any case I agree with your description of how scoping would work in ksh. -- Stephane
Bart Schaefer wrote:
> The word "scalar" is just as important in that sentence as the word
> "global" -- "read" and "zstat" can specify whether the variable is
> scalar or array, you can't do that with a nameref. Furthermore, if
Isn't defaulting to a scalar type fairly normal. For the read use case,
it works for some basic tests when you start the function with:
[[ -v $1 ]] || typeset -g $1
typeset -un ref=$1
A different problem I'm finding is with not being able to hide the
reference type of a variable in a called function:
inner() {
local var=hello
typeset -p var
}
outer() {
typeset -n var=value
inner
}
outer
This outputs
typeset -g -n var=value
The value "hello" is lost but a subsequent assignment within inner()
applies to the reference.
I would expect the `local var` in inner to hide any variable including
references with the same name from an outer scope.
(I'm currently testing with the typeset -nu and Fix crash on
unset-through-nameref patches applied).
Oliver
On Mon, Mar 4, 2024 at 5:05 PM Oliver Kiddle <opk@zsh.org> wrote: > > Isn't defaulting to a scalar type fairly normal. Sure, but Stephane wants a global that's a blank slate for anything to be assigned to. I don't think we have a way to do that -- the way the implementation works in the absence of namerefs is to actually remove/free the parameter and create a new one. If there are locals pointing to the global via pm->old, and the global gets removed, all the locals are also destroyed. > A different problem I'm finding is with not being able to hide the > reference type of a variable in a called function: That's actually "correct" -- from the doc: When a _named reference_ is created with 'typeset -n', all uses of PNAME in assignments and expansions instead assign to or expand RNAME. This also applies to 'unset PNAME' and to most subsequent uses of 'typeset' with the exception of 'typeset -n' and 'typeset +n' I admit this is a little weird when the named reference is inherited from another scope. The most recent FAQ entry that I patched actually suggests using "private -n var" to prevent downward scope leakage of namerefs.
On 3/5/24, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Mon, Mar 4, 2024 at 5:05 PM Oliver Kiddle <opk@zsh.org> wrote:
>>
>> Isn't defaulting to a scalar type fairly normal.
>
> Sure, but Stephane wants a global that's a blank slate for anything to
> be assigned to. I don't think we have a way to do that -- the way the
> implementation works in the absence of namerefs is to actually
> remove/free the parameter and create a new one. If there are locals
> pointing to the global via pm->old, and the global gets removed, all
> the locals are also destroyed.
>
>> A different problem I'm finding is with not being able to hide the
>> reference type of a variable in a called function:
>
> That's actually "correct" -- from the doc:
>
> When a _named reference_ is created with 'typeset -n', all uses of PNAME
> in assignments and expansions instead assign to or expand RNAME. This
> also applies to 'unset PNAME' and to most subsequent uses of 'typeset'
> with the exception of 'typeset -n' and 'typeset +n'
>
> I admit this is a little weird when the named reference is inherited
> from another scope. The most recent FAQ entry that I patched actually
> suggests using "private -n var" to prevent downward scope leakage of
> namerefs.
Is this possible to change? I feel like if "typeset myvar" (or "local
myvar") cannot be depended on to create a local parameter, a lot of
code will no longer be safe that previously was (in the sense that it
doesn't break if calling code / the shell environmnet has certain
parameters defined). Surely we cannot expect everyone to add +n to
every single "local" statement? If the current behavior is wanted, one
can always use typeset -g to get it, presumably? And that would be
more backward compatible, right?
Like say I have typeset -n foo=PS1 in my .zshrc because I don't like
writing $PS1 manually, now any code I call which has a "local foo" is
broken, which is not great. They have explicitly added their "local
foo" to be able to use the name 'foo' regardless of the calling
environment... "private -n foo" works in this case too, but I still
hold my opinion.
--
Mikael Magnusson
2024-03-05 06:43:02 +0100, Mikael Magnusson:
[...]
> > That's actually "correct" -- from the doc:
> >
> > When a _named reference_ is created with 'typeset -n', all uses of PNAME
> > in assignments and expansions instead assign to or expand RNAME. This
> > also applies to 'unset PNAME' and to most subsequent uses of 'typeset'
> > with the exception of 'typeset -n' and 'typeset +n'
> >
> > I admit this is a little weird when the named reference is inherited
> > from another scope. The most recent FAQ entry that I patched actually
> > suggests using "private -n var" to prevent downward scope leakage of
> > namerefs.
>
> Is this possible to change? I feel like if "typeset myvar" (or "local
> myvar") cannot be depended on to create a local parameter, a lot of
> code will no longer be safe that previously was (in the sense that it
> doesn't break if calling code / the shell environmnet has certain
> parameters defined). Surely we cannot expect everyone to add +n to
> every single "local" statement? If the current behavior is wanted, one
> can always use typeset -g to get it, presumably? And that would be
> more backward compatible, right?
>
> Like say I have typeset -n foo=PS1 in my .zshrc because I don't like
> writing $PS1 manually, now any code I call which has a "local foo" is
> broken, which is not great. They have explicitly added their "local
> foo" to be able to use the name 'foo' regardless of the calling
> environment... "private -n foo" works in this case too, but I still
> hold my opinion.
[...]
I'd agree that means defining a nameref would/could break any
function that uses local variables by the same name and doesn't
sound acceptable.
See also:
$ nameref action=PS1
$ zmv -n '*' '$f.back'
zsh: segmentation fault ./Src/zsh
Program received signal SIGSEGV, Segmentation fault.
getstrvalue (v=0x7fffffffaf70) at params.c:2343
2343 s = v->pm->gsu.s->getfn(v->pm);
(gdb) bt
#0 getstrvalue (v=0x7fffffffaf70) at params.c:2343
#1 0x000055555561369d in paramsubst (l=0x7fffffffb230, n=0x7fffffffb250, str=0x7fffffffb040, qt=0, pf_flags=4, ret_flags=0x7fffffffb154) at subst.c:2907
#2 0x000055555560dca1 in stringsubst (list=0x7fffffffb230, node=0x7fffffffb250, pf_flags=4, ret_flags=0x7fffffffb154, asssub=0) at subst.c:322
#3 0x000055555560cf87 in prefork (list=0x7fffffffb230, flags=4, ret_flags=0x7fffffffb154) at subst.c:142
#4 0x000055555560e474 in singsub (s=0x7fffffffb368) at subst.c:520
#5 0x000055555558a80e in cond_subst (strp=0x7fffffffb368, glob_ok=1) at cond.c:53
#6 0x000055555558b029 in evalcond (state=0x7fffffffc0c0, fromtest=0x0) at cond.c:198
#7 0x000055555559ba6a in execcond (state=0x7fffffffc0c0, do_exec=0) at exec.c:5185
#8 0x000055555558fe49 in execsimple (state=0x7fffffffc0c0) at exec.c:1334
#9 0x00005555555902e8 in execlist (state=0x7fffffffc0c0, dont_change_job=1, exiting=0) at exec.c:1462
#10 0x00005555555cd63d in execif (state=0x7fffffffc0c0, do_exec=0) at loop.c:561
#11 0x00005555555988aa in execcmd_exec (state=0x7fffffffc0c0, eparams=0x7fffffffbcd0, input=0, output=0, how=2, last1=2, close_if_forked=-1) at exec.c:4024
#12 0x00005555555926dc in execpline2 (state=0x7fffffffc0c0, pcode=9987, how=2, input=0, output=0, last1=0) at exec.c:2016
#13 0x0000555555591270 in execpline (state=0x7fffffffc0c0, slcode=40962, how=2, last1=0) at exec.c:1741
#14 0x00005555555904b2 in execlist (state=0x7fffffffc0c0, dont_change_job=1, exiting=0) at exec.c:1495
#15 0x000055555558fad2 in execode (p=0x555555689cd0, dont_change_job=1, exiting=0, context=0x555555630f0a "loadautofunc") at exec.c:1276
#16 0x000055555559cca0 in execautofn_basic (state=0x7fffffffc9c0, do_exec=0) at exec.c:5594
#17 0x000055555559886d in execcmd_exec (state=0x7fffffffc9c0, eparams=0x7fffffffc5d0, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:4022
#18 0x00005555555926dc in execpline2 (state=0x7fffffffc9c0, pcode=3, how=18, input=0, output=0, last1=0) at exec.c:2016
#19 0x0000555555591270 in execpline (state=0x7fffffffc9c0, slcode=3074, how=18, last1=0) at exec.c:1741
#20 0x00005555555904b2 in execlist (state=0x7fffffffc9c0, dont_change_job=1, exiting=0) at exec.c:1495
#21 0x000055555558fad2 in execode (p=0x555555689b40, dont_change_job=1, exiting=0, context=0x5555556310da "shfunc") at exec.c:1276
#22 0x000055555559e88e in runshfunc (prog=0x555555689b40, wrap=0x0, name=0x7ffff7fa6168 "zmv") at exec.c:6164
#23 0x000055555559dd9e in doshfunc (shfunc=0x555555678450, doshargs=0x7ffff7fb8080, noreturnval=0) at exec.c:6010
#24 0x000055555559ca31 in execshfunc (shf=0x555555678450, args=0x7ffff7fb8080) at exec.c:5548
#25 0x0000555555598ab0 in execcmd_exec (state=0x7fffffffd6f0, eparams=0x7fffffffd300, input=0, output=0, how=18, last1=1, close_if_forked=-1) at exec.c:4075
#26 0x00005555555926dc in execpline2 (state=0x7fffffffd6f0, pcode=131, how=18, input=0, output=0, last1=1) at exec.c:2016
#27 0x0000555555591270 in execpline (state=0x7fffffffd6f0, slcode=6146, how=18, last1=1) at exec.c:1741
#28 0x00005555555904b2 in execlist (state=0x7fffffffd6f0, dont_change_job=0, exiting=1) at exec.c:1495
#29 0x000055555558fad2 in execode (p=0x7ffff7fb7b60, dont_change_job=0, exiting=1, context=0x555555633326 "cmdarg") at exec.c:1276
#30 0x000055555558f998 in execstring (s=0x7fffffffdd3b "nameref action=PS1; autoload zmv; zmv -n \"*\" \"\\$f.back\"", dont_change_job=0, exiting=1, context=0x555555633326 "cmdarg")
at exec.c:1242
#31 0x00005555555baf25 in init_misc (cmd=0x7fffffffdd3b "nameref action=PS1; autoload zmv; zmv -n \"*\" \"\\$f.back\"", zsh_name=0x7fffffffdd34 "zsh") at init.c:1530
#32 0x00005555555bc798 in zsh_main (argc=3, argv=0x7fffffffd918) at init.c:1920
#33 0x000055555556c99d in main (argc=3, argv=0x7fffffffd918) at ./main.c:93
(gdb) p *v.pm
$1 = {node = {next = 0x0, nam = 0x555555670480 "PS1", flags = 1048576}, u = {data = 0x55555568bab0, arr = 0x55555568bab0, str = 0x55555568bab0 "", val = 93824993508016,
valptr = 0x55555568bab0, dval = 4.6355706013588689e-310, hash = 0x55555568bab0}, gsu = {s = 0x0, i = 0x0, f = 0x0, a = 0x0, h = 0x0}, base = 0, width = 0, env = 0x0, ename = 0x0,
old = 0x0, level = 0}
[-- Attachment #1: Type: text/plain, Size: 1726 bytes --] On Mon, Mar 4, 2024 at 9:43 PM Mikael Magnusson <mikachu@gmail.com> wrote: > > On 3/5/24, Bart Schaefer <schaefer@brasslantern.com> wrote: > > > > When a _named reference_ is created with 'typeset -n', all uses of PNAME > > in assignments and expansions instead assign to or expand RNAME. This > > also applies to 'unset PNAME' and to most subsequent uses of 'typeset' > > with the exception of 'typeset -n' and 'typeset +n' > > Is this possible to change? I feel like if "typeset myvar" (or "local > myvar") cannot be depended on to create a local parameter, a lot of > code will no longer be safe that previously was (in the sense that it > doesn't break if calling code / the shell environmnet has certain > parameters defined). Yes, it's possible, and the attached patch does so, with doc update and new tests. The division of labor among bin_typeset(), typeset_single(), and createparam() is a bit hard to follow (as in, it's not really much of a "division" at all). On Mon, Mar 4, 2024 at 10:31 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > See also: > > $ nameref action=PS1 > $ zmv -n '*' '$f.back' > zsh: segmentation fault ./Src/zsh A simpler reproducer: % typeset -n foo=PS1 % () { local foo; foo=xx } ==1113679== Invalid read of size 8 ==1113679== at 0x1A16E1: assignstrvalue (params.c:2684) This patch fixes that, and one other potential crash that valgrind complains about: % typeset -n foo=bar % typeset -n foo ==1113695== Invalid read of size 4 ==1113695== at 0x136041: bin_typeset (builtin.c:3137) I'll go ahead and push this since there doesn't seem to be any argument about the change in function and it fixes two crash bugs. [-- Attachment #2: local-of-nameref.txt --] [-- Type: text/plain, Size: 4492 bytes --] diff --git a/Doc/Zsh/params.yo b/Doc/Zsh/params.yo index 8c5e67e70..d179a0d1d 100644 --- a/Doc/Zsh/params.yo +++ b/Doc/Zsh/params.yo @@ -670,8 +670,9 @@ This manual was generated with Zsh tt(version()). When a em(named reference) is created with `tt(typeset -n)', all uses of var(pname) in assignments and expansions instead assign to or expand var(rname). This also applies to `tt(unset )var(pname)' and to -most subsequent uses of `tt(typeset)' with the exception of -`tt(typeset -n)' and `tt(typeset +n)', so to remove a named reference, +most subsequent uses of `tt(typeset)' with the exceptions of declaring +a local in a called function, or updating a current-scope parameter with +`tt(typeset -n)' or `tt(typeset +n)'. Thus to remove a named reference, use either `tt(unset -n )var(pname)' (preferred) or one of: ifzman() example(tt(typeset -n )var(pname=) diff --git a/Src/builtin.c b/Src/builtin.c index 6f98990f9..829b899f8 100644 --- a/Src/builtin.c +++ b/Src/builtin.c @@ -2030,11 +2030,10 @@ typeset_single(char *cname, char *pname, Param pm, int func, int usepm, tc, keeplocal = 0, newspecial = NS_NONE, readonly, dont_set = 0; char *subscript; - if (pm && (pm->node.flags & PM_NAMEREF) && !((off|on) & PM_NAMEREF)) { - if (!(off & PM_NAMEREF)) { - if ((pm = (Param)resolve_nameref(pm, NULL))) - pname = pm->node.nam; - } + if (pm && (pm->node.flags & PM_NAMEREF) && !((off|on) & PM_NAMEREF) && + (pm->level == locallevel || !(on & PM_LOCAL))) { + if ((pm = (Param)resolve_nameref(pm, NULL))) + pname = pm->node.nam; if (pm && (pm->node.flags & PM_NAMEREF) && (on & ~(PM_NAMEREF|PM_LOCAL|PM_READONLY))) { /* Changing type of PM_SPECIAL|PM_AUTOLOAD is a fatal error. * @@ -3125,8 +3124,10 @@ bin_typeset(char *name, char **argv, LinkList assigns, Options ops, int func) oldpm->u.str) asg->value.scalar = dupstring(oldpm->u.str); /* Defer read-only error to typeset_single() */ - if (!(hn->flags & PM_READONLY)) + if (!(hn->flags & PM_READONLY)) { unsetparam_pm(oldpm, 0, 1); + hn = NULL; + } } /* Passing a NULL pm to typeset_single() makes the * nameref read-only before assignment, which breaks @@ -3134,7 +3135,7 @@ bin_typeset(char *name, char **argv, LinkList assigns, Options ops, int func) * so this is special-cased to permit that action * like assign-at-create for other parameter types. */ - if (!(hn->flags & PM_READONLY)) + if (hn && !(hn->flags & PM_READONLY)) hn = NULL; } } diff --git a/Src/params.c b/Src/params.c index 4bcf41c22..973df3fe5 100644 --- a/Src/params.c +++ b/Src/params.c @@ -1034,7 +1034,8 @@ createparam(char *name, int flags) } if (oldpm && !(flags & PM_NAMEREF) && - (!(oldpm->node.flags & PM_RO_BY_DESIGN) || !(flags & PM_LOCAL)) && + (oldpm->level == locallevel ? + !(oldpm->node.flags & PM_RO_BY_DESIGN) : !(flags & PM_LOCAL)) && (oldpm->node.flags & PM_NAMEREF) && (oldpm = upscope(oldpm, oldpm->base))) { Param lastpm; diff --git a/Test/K01nameref.ztst b/Test/K01nameref.ztst index e45b922e2..bb0d11821 100644 --- a/Test/K01nameref.ztst +++ b/Test/K01nameref.ztst @@ -51,9 +51,19 @@ 0:remove nameref attribute >typeset ptr=var - typeset -n ptr - typeset -t ptr - typeset -p ptr + typeset -n ptr=gvar + () { + local ptr + typeset -p ptr + } + typeset -p ptr +0:Local non-reference hides outside reference +>typeset ptr +>typeset -n ptr=gvar + + typeset -n ptr + typeset -t ptr + typeset -p ptr 0:change type of a placeholder F:Other type changes are fatal errors, should this also be? >typeset -n ptr='' @@ -845,4 +855,39 @@ F:previously this could create an infinite recursion and crash 1:create nameref by pattern match not allowed *?*typeset:1: invalid reference +# +# The following tests are run in interactive mode, using PS1 as an +# assignable special with side-effects. This crashed at one time. +# + + # Note bypassing TYPESET_TO_UNSET here + $ZTST_testdir/../Src/zsh -fis <<<$' + typeset -n p=PS1 + () { + typeset -p p + local p + typeset -p p + p=xx + typeset -p p + } + ' +0:regression: assign to local that shadows global named reference +>typeset -g -n p=PS1 +>typeset p='' +>typeset p=xx +*?* + + # Note bypassing TYPESET_TO_UNSET here + $ZTST_testdir/../Src/zsh -fis <<<$' + () { + typeset p=PS1 + typeset -n p + p=zz + } + typeset -p PS1 + ' +0:regression - converting a string into a named reference +>typeset PS1=zz +*?* + %clean