zsh-workers
* PATCH: parameter substitution for exclusion by array
@ 2012-04-20 19:20 Peter Stephenson
  2012-04-21 21:05 ` Peter Stephenson
  2012-04-22 12:55 ` Mikael Magnusson
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Stephenson @ 2012-04-20 19:20 UTC
  To: Zsh Hackers' List

I was completing after "autoload" the other day and got an error message
about a pattern.  I didn't notice in detail what the problem in the
pattern was, but the error came from this:

      # Filter out functions already loaded or marked for autoload.
      args=(${args:#(${(kj.|.)~functions})})

Function names can actually include pattern characters; it's not usually
a good idea, but it can happen by mistake, so they would need quoting
here.  Quoting patterns is a bit icky, since you
can't quote characters that aren't special to the pattern (even if they
are otherwise special to the shell), so using (q) doesn't quite do what
you want.  What's more, although the expression above is about the best
we can do if you want to use a pattern, it's very inefficient for
removing elements from an array, effectively performing a quadratic
search since we don't try to optimise patterns.

So I thought about an operation that would remove elements of an array
using another array and, serendipitously, the new implementation of
uniqarray(), or rather the hash table that goes with it, is exactly
what's needed to do this efficiently, since it's the same problem but
with two arrays instead of one.

I thought of using ${...:~...} for this, but unfortunately ~ can
introduce an arithmetic expression, so this introduces an ambiguity with
index offset notation.  So I picked ${...:|...}, which has a double
mnemonic: bar the "or" of the array and currently always produces an
invalid expression.

Then I realised it was a trivial modification to get the intersection of
two arrays instead.  For that, ${...:&...} might be nice, but :& can be
a history modifier.  So I picked ${...:*...} as being the nearest thing
that wasn't used.

It might be nice to allow ${...:&(k)...} to use the keys of an
associative array, but that can wait.
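
As a rough illustration of why a hash table helps here, the following C
sketch filters one NULL-terminated array of strings against another in a
single pass.  It is not the code from the patch: struct strset,
str_hash(), strset_init(), strset_add(), strset_has() and
filter_by_array() are hypothetical names standing in for zsh's own hash
table routines, and the open-addressing table is an assumption made to
keep the example self-contained.  The intersect flag shows that
intersection is the same loop with the membership test inverted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Tiny open-addressing set of borrowed string pointers (hypothetical
 * stand-in for zsh's hash table; not the real HashTable API). */
struct strset {
    const char **slot;
    size_t cap;             /* power of two, at least 2x the element count */
};

static size_t str_hash(const char *s)
{
    size_t h = 5381;        /* djb2-style string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static int strset_init(struct strset *set, size_t n)
{
    set->cap = 8;
    while (set->cap < 2 * n)
        set->cap *= 2;
    set->slot = calloc(set->cap, sizeof *set->slot);
    return set->slot != NULL;
}

static void strset_add(struct strset *set, const char *s)
{
    size_t i = str_hash(s) & (set->cap - 1);
    /* Linear probing; stop at an empty slot or at an equal string. */
    while (set->slot[i] && strcmp(set->slot[i], s) != 0)
        i = (i + 1) & (set->cap - 1);
    set->slot[i] = s;
}

static int strset_has(const struct strset *set, const char *s)
{
    size_t i = str_hash(s) & (set->cap - 1);
    while (set->slot[i]) {
        if (strcmp(set->slot[i], s) == 0)
            return 1;
        i = (i + 1) & (set->cap - 1);
    }
    return 0;
}

/*
 * Compact the NULL-terminated array `src` in place.
 *   intersect == 0: drop elements found in `cmp`      (like ${src:|cmp})
 *   intersect != 0: keep only elements found in `cmp` (like ${src:*cmp})
 * Returns the new length, or -1 on allocation failure.
 */
static long filter_by_array(const char **src, const char **cmp, int intersect)
{
    struct strset set;
    const char **in, **out;
    size_t n = 0;

    assert(src && cmp);
    while (cmp[n])
        n++;
    if (!strset_init(&set, n))
        return -1;
    for (in = cmp; *in; in++)
        strset_add(&set, *in);

    for (in = out = src; *in; in++) {
        int present = strset_has(&set, *in);
        if (intersect ? present : !present)
            *out++ = *in;
    }
    *out = NULL;
    free(set.slot);
    return (long)(out - src);
}
```

Called on the args and mod arrays from the D04parameter.ztst case in this
patch, filter_by_array(args, mod, 0) would keep "one" and "two", and
filter_by_array(args, mod, 1) would keep "#foo", "(bar" and "'three'".
Each lookup is expected constant time, so the cost grows with the combined
length of the two arrays rather than with their product.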

Index: Completion/Zsh/Command/_typeset
===================================================================
RCS file: /cvsroot/zsh/zsh/Completion/Zsh/Command/_typeset,v
retrieving revision 1.4
diff -p -u -r1.4 _typeset
--- Completion/Zsh/Command/_typeset	6 Nov 2006 17:15:00 -0000	1.4
+++ Completion/Zsh/Command/_typeset	20 Apr 2012 19:15:06 -0000
@@ -77,7 +77,9 @@ if [[ "$state" = vars_eq ]]; then
     elif [[ $service = autoload || -n $opt_args[(i)-[uU]] ]]; then
       args=(${^fpath}/*(:t))
       # Filter out functions already loaded or marked for autoload.
-      args=(${args:#(${(kj.|.)~functions})})
+      local -a funckeys
+      funckeys=(${(k)functions})
+      args=${args:|funckeys}
       _wanted functions expl 'shell function' compadd -a args
     else
       _functions
Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.141
diff -p -u -r1.141 expn.yo
--- Doc/Zsh/expn.yo	11 Dec 2011 17:22:59 -0000	1.141
+++ Doc/Zsh/expn.yo	20 Apr 2012 19:15:07 -0000
@@ -604,6 +604,19 @@ If var(name) is an array
 the matching array elements are removed (use the `tt((M))' flag to
 remove the non-matched elements).
 )
+item(tt(${)var(name)tt(:|)var(arrayname)tt(}))(
+If var(arrayname) is the name (N.B., not contents) of an array
+variable, then any elements contained in var(arrayname) are removed
+from the substitution of var(name).  If the substitution is scalar,
+either because var(name) is a scalar variable or the expression is
+quoted, the elements of var(arrayname) are instead tested against the
+entire expression.
+)
+item(tt(${)var(name)tt(:*)var(arrayname)tt(}))(
+Similar to the preceding substitution, but in the opposite sense,
+so that entries present in both the original substitution and as
+elements of var(arrayname) are retained and others removed.
+)
 xitem(tt(${)var(name)tt(:)var(offset)tt(}))
 item(tt(${)var(name)tt(:)var(offset)tt(:)var(length)tt(}))(
 This syntax gives effects similar to parameter subscripting
Index: Src/params.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/params.c,v
retrieving revision 1.180
diff -p -u -r1.180 params.c
--- Src/params.c	13 Apr 2012 16:01:23 -0000	1.180
+++ Src/params.c	20 Apr 2012 19:15:07 -0000
@@ -3493,7 +3493,7 @@ arrayuniq_freenode(HashNode hn)
 }
 
 /**/
-static HashTable
+HashTable
 newuniqtable(zlong size)
 {
     HashTable ht = newhashtable((int)size, "arrayuniq", NULL);
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.133
diff -p -u -r1.133 subst.c
--- Src/subst.c	29 Feb 2012 09:57:41 -0000	1.133
+++ Src/subst.c	20 Apr 2012 19:15:07 -0000
@@ -2872,6 +2872,49 @@ paramsubst(LinkList l, LinkNode n, char 
 	    }
 	    break;
 	}
+    } else if (inbrace && (*s == '|' || *s == Bar ||
+			   *s == '*' || *s == Star)) {
+	int intersect = (*s == '*' || *s == Star);
+	char **compare = getaparam(++s), **ap, **apsrc;
+	if (compare) {
+	    HashTable ht = newuniqtable(arrlen(compare)+1);
+	    int present;
+	    for (ap = compare; *ap; ap++)
+		(void)addhashnode2(ht, *ap, (HashNode)
+				   zhalloc(sizeof(struct hashnode)));
+	    if (!vunset && isarr) {
+		if (!copied) {
+		    aval = arrdup(aval);
+		    copied = 1;
+		}
+		for (ap = apsrc = aval; *apsrc; apsrc++) {
+		    untokenize(*apsrc);
+		    present = (gethashnode2(ht, *apsrc) != NULL);
+		    if (intersect ? present : !present) {
+			if (ap != apsrc) {
+			    *ap = *apsrc;
+			}
+			ap++;
+		    }
+		}
+		*ap = NULL;
+	    } else {
+		if (vunset) {
+		    if (unset(UNSET)) {
+			*idend = '\0';
+			zerr("%s: parameter not set", idbeg);
+			deletehashtable(ht);
+			return NULL;
+		    }
+		    val = dupstring("");
+		} else {
+		    present = (gethashnode2(ht, val) != NULL);
+		    if (intersect ? !present : present)
+			val = dupstring("");
+		}
+	    }
+	    deletehashtable(ht);
+	}
     } else {			/* no ${...=...} or anything, but possible modifiers. */
 	/*
 	 * Handler ${+...}.  TODO: strange, why do we handle this only
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.64
diff -p -u -r1.64 D04parameter.ztst
--- Test/D04parameter.ztst	10 Apr 2012 01:17:03 -0000	1.64
+++ Test/D04parameter.ztst	20 Apr 2012 19:15:07 -0000
@@ -169,6 +169,29 @@
 >a-string-with-slashes
 >a-string-with-slashes
 
+  args=('one' '#foo' '(bar' "'three'" two)
+  mod=('#foo' '(bar' "'three'" sir_not_appearing_in_this_film)
+  print ${args:|mod}
+  print ${args:*mod}
+  print "${(@)args:|mod}"
+  print "${(@)args:*mod}"
+  args=(two words)
+  mod=('one word' 'two words')
+  print "${args:|mod}"
+  print "${args:*mod}"
+  scalar='two words'
+  print ${scalar:|mod}
+  print ${scalar:*mod}
+0:"|" array exclusion and "*" array intersection
+>one two
+>#foo (bar 'three'
+>one two
+>#foo (bar 'three'
+>
+>two words
+>
+>two words
+
   str1='twocubed'
   array=(the number of protons in an oxygen nucleus)
   print $#str1 ${#str1} "$#str1 ${#str1}" $#array ${#array} "$#array ${#array}"


-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


* Re: PATCH: parameter substitution for exclusion by array
  2012-04-20 19:20 PATCH: parameter substitution for exclusion by array Peter Stephenson
@ 2012-04-21 21:05 ` Peter Stephenson
  2012-04-21 21:48   ` Bart Schaefer
  2012-04-22 12:55 ` Mikael Magnusson
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Stephenson @ 2012-04-21 21:05 UTC
  To: Zsh Hackers' List

On Fri, 20 Apr 2012 20:20:51 +0100
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> Index: Completion/Zsh/Command/_typeset
>        args=(${^fpath}/*(:t))

I wonder if it would be OK to fix this to

      args=(${^fpath}/*(-.:t))

or is that going to give problems with network paths?  Presumably zsh
function paths should never get quite as hairy as system executable paths.
I could base it on HASH_EXECUTABLES_ONLY (except they're not hashed and
aren't executables).

> +      args=${args:|funckeys}

I'll fix the embarrassing lack of parentheses here.

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


* Re: PATCH: parameter substitution for exclusion by array
  2012-04-21 21:05 ` Peter Stephenson
@ 2012-04-21 21:48   ` Bart Schaefer
  2012-04-21 22:06     ` Peter Stephenson
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Schaefer @ 2012-04-21 21:48 UTC
  To: Zsh Hackers' List

On Apr 20,  8:20pm, Peter Stephenson wrote:
}
}       # Filter out functions already loaded or marked for autoload.
}       args=(${args:#(${(kj.|.)~functions})})
} 
} So I thought about an operation that would remove elements of an array
} using another array and, serendipitously, the new implementation of
} uniqarray(), or rather the hash table that goes with it, is exactly
} what's needed to do this efficiently, since it's the same problem but
} with two arrays instead of one.

There isn't anything magic about the hash created by newuniqtable(),
unless you mean that it never frees any of the entries put into it.

(Based on 30391 newuniqtable could lose disablehashnode/enablehashnode
as well?  Not that they hurt anything by being there.)

} I thought of using ${...:~...} for this, but unfortunately ~ can
} introduce an arithmetic expression, so this introduces an ambiguity with
} index offset notation.  So I picked ${...:|...}, which has a double
} mnemonic: bar the "or" of the array and currently always produces an
} invalid expression.

This is nice and there are probably a LOT of places in completion that
could benefit from it.  Particularly in _git, as I recall, where we
fell back to external processes for cache deduplication efficiency.

On Apr 21, 10:05pm, Peter Stephenson wrote:
}
} > Index: Completion/Zsh/Command/_typeset
} >        args=(${^fpath}/*(:t))
} 
} I wonder if it would be OK to fix this to
} 
}       args=(${^fpath}/*(-.:t))
}  
} or is that going to give problems with network paths?

As we're already globbing in the directory, the only difference between
*(-.:t) and *(.:t) is for directories that contain a bunch of symlinks;
any symlink to the directory itself must already have been followed.  So
I don't think network paths are a huge concern.


* Re: PATCH: parameter substitution for exclusion by array
  2012-04-21 21:48   ` Bart Schaefer
@ 2012-04-21 22:06     ` Peter Stephenson
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Stephenson @ 2012-04-21 22:06 UTC
  To: Zsh Hackers' List

On Sat, 21 Apr 2012 14:48:17 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Apr 21, 10:05pm, Peter Stephenson wrote:
> }
> } > Index: Completion/Zsh/Command/_typeset
> } >        args=(${^fpath}/*(:t))
> } 
> } I wonder if it would be OK to fix this to
> } 
> }       args=(${^fpath}/*(-.:t))
> }  
> } or is that going to give problems with network paths?
> 
> As we're already globbing in the directory, the only difference between
> *(-.:t) and *(.:t) is for directories that contain a bunch of symlinks;
> any symlink to the directory itself must already have been followed.  So
> I don't think network paths are a huge concern.

The point is there wasn't even a "." there before, so we're going from a
pure readdir() loop (plus a bit of string handling with no implications
for system interaction) to performing a stat() on each file.  What
really triggered my noticing this was we allow completion of
subdirectories of $fpath as autoloadable functions, which is a bit
bizarre.

Hmm... we could make this more efficient with some trickery: remove
functions already marked for autoload first, then regenerate the paths to
the remaining (presumably, though not necessarily, much smaller) set of
matches, and only then test the types of those.  However, I'm not sure
there's an efficient way of getting from the pruned $args back to the
full paths, or equivalently an efficient way of removing full paths
where only the last component matches a function name... or whether it's
worth even attempting.

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


* Re: PATCH: parameter substitution for exclusion by array
  2012-04-20 19:20 PATCH: parameter substitution for exclusion by array Peter Stephenson
  2012-04-21 21:05 ` Peter Stephenson
@ 2012-04-22 12:55 ` Mikael Magnusson
  2012-04-22 18:20   ` Peter Stephenson
  1 sibling, 1 reply; 7+ messages in thread
From: Mikael Magnusson @ 2012-04-22 12:55 UTC
  To: Peter Stephenson; +Cc: Zsh Hackers' List

On 2012-04-20, Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
>
> Then I realised it was a trivial modification to get the intersection of
> two arrays instead.  For that, ${...:&...} might be nice, but :& can be
> a history modifier.  So I picked ${...:*...} as being the nearest thing
> that wasn't used.

Using ${path:*notexist} prints all elements of $path, not sure if that
is what one should expect or not. ${path:|notexist} does the same but
that makes a lot more sense to me :). ${path:*SCALAR} and :| act the
same way, substitute all elements in $path.

-- 
Mikael Magnusson


* Re: PATCH: parameter substitution for exclusion by array
  2012-04-22 12:55 ` Mikael Magnusson
@ 2012-04-22 18:20   ` Peter Stephenson
  2012-04-23 13:54     ` PATCH: NEWS: mention new array operations Mikael Magnusson
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Stephenson @ 2012-04-22 18:20 UTC
  To: Zsh Hackers' List

On Sun, 22 Apr 2012 14:55:15 +0200
Mikael Magnusson <mikachu@gmail.com> wrote:
> Using ${path:*notexist} prints all elements of $path, not sure if that
> is what one should expect or not. ${path:|notexist} does the same but
> that makes a lot more sense to me :). ${path:*SCALAR} and :| act the
> same way, substitute all elements in $path.

You're right that using :* with something that doesn't exist or isn't an
array should produce an empty result; I hadn't picked up this
difference from :|.
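
The intended behaviour when the second name is unset or not an array can
be modelled with a small C sketch.  This is an illustration under
assumptions, not the patched code: filter_or_missing() and contains() are
hypothetical names, a NULL cmp stands for a name that getaparam() could
not resolve to an array, and a linear scan replaces the hash table purely
for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Linear membership test; fine for a sketch, the patch uses a hash table. */
static int contains(const char **arr, const char *s)
{
    for (; *arr; arr++)
        if (strcmp(*arr, s) == 0)
            return 1;
    return 0;
}

/*
 * Filter the NULL-terminated array `src` in place against `cmp`, which
 * may be NULL to model a name that is unset or not an array.
 *   intersect == 0 (":|"): a missing `cmp` removes nothing.
 *   intersect != 0 (":*"): a missing `cmp` leaves nothing.
 * Returns the number of elements kept.
 */
static size_t filter_or_missing(const char **src, const char **cmp,
                                int intersect)
{
    const char **in, **out;

    assert(src != NULL);
    for (in = out = src; *in; in++) {
        /* Nothing can be "present" in an array that does not exist. */
        int present = cmp ? contains(cmp, *in) : 0;
        if (intersect ? present : !present)
            *out++ = *in;
    }
    *out = NULL;
    return (size_t)(out - src);
}
```

Treating a missing array as an empty set makes both operators consistent:
excluding nothing leaves the source untouched, while intersecting with
nothing leaves it empty.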

Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.135
diff -p -u -r1.135 subst.c
--- Src/subst.c	22 Apr 2012 18:10:43 -0000	1.135
+++ Src/subst.c	22 Apr 2012 18:18:18 -0000
@@ -2918,6 +2918,19 @@ paramsubst(LinkList l, LinkNode n, char 
 		}
 	    }
 	    deletehashtable(ht);
+	} else if (intersect) {
+	    /*
+	     * The intersection with nothing is nothing...
+	     * Seems a bit pointless complaining that the first
+	     * expression is unset here if the second is, too.
+	     */
+	    if (!vunset) {
+		if (isarr) {
+		    aval = mkarray(NULL);
+		} else {
+		    val = dupstring("");
+		}
+	    }
 	}
     } else {			/* no ${...=...} or anything, but possible modifiers. */
 	/*
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.66
diff -p -u -r1.66 D04parameter.ztst
--- Test/D04parameter.ztst	22 Apr 2012 18:10:43 -0000	1.66
+++ Test/D04parameter.ztst	22 Apr 2012 18:18:18 -0000
@@ -182,6 +182,9 @@
   scalar='two words'
   print ${scalar:|mod}
   print ${scalar:*mod}
+  print ${args:*nonexistent}
+  empty=
+  print ${args:*empty}
 0:"|" array exclusion and "*" array intersection
 >one two
 >#foo (bar 'three'
@@ -191,6 +194,8 @@
 >two words
 >
 >two words
+>
+>
 
   str1='twocubed'
   array=(the number of protons in an oxygen nucleus)

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


* PATCH: NEWS: mention new array operations
  2012-04-22 18:20   ` Peter Stephenson
@ 2012-04-23 13:54     ` Mikael Magnusson
  0 siblings, 0 replies; 7+ messages in thread
From: Mikael Magnusson @ 2012-04-23 13:54 UTC
  To: zsh-workers

---
 NEWS |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/NEWS b/NEWS
index 2c05c63..ed0b567 100644
--- a/NEWS
+++ b/NEWS
@@ -87,6 +87,12 @@ Expansion (parameters, globbing, etc.) and redirection
   as a string, never a pattern, e.g. ${array[(ie)*]} looks for the
   index of the array element containing the literal string "*".
 
+- The operators :| and :* in parameter expansion allow for array
+  subtraction and intersection in the form ${name:|array}. With the :|
+  operator, all entries in $name that are also in $array will be removed
+  from the substitution. Conversely for the :* operation only the
+  entries that are in both arrays will be substituted.
+
 - Numeric expansions can have a positive or negative step
   e.g. "{3..9..2}".  Negative start and end of ranges are also now
   supported.
-- 
1.7.10.GIT

