zsh-workers
 help / color / mirror / code / Atom feed
* [RFC or so] Add HASH_LOOKUP option
@ 2010-08-23 14:09 Mikael Magnusson
  2010-08-23 14:55 ` Peter Stephenson
  2010-08-23 17:46 ` Bart Schaefer
  0 siblings, 2 replies; 6+ messages in thread
From: Mikael Magnusson @ 2010-08-23 14:09 UTC (permalink / raw)
  To: zsh workers

When this is unset, external commands are always resolved with a full
path search, but still inserted into the hash for spell correction if
those options are on.

Caveat: I have not verified all possible paths that use the hash check
for this, but at least =cmd and which -p cmd and actually running cmd
do check it.
---

This was motivated by wycats (Yehuda Katz) on IRC having troubles with
his users running gem install bundler or so, and then getting
segfaults because of mismatches between ruby versions and whatnot, and
they didn't know to run rehash. Adding this option of course doesn't
help him that much since I'm not sure if A) we want to 1) have it at
all 2) default it to on B) it would reach distros anytime soon.

Also by the fact that doing a full path search is probably not that
slow for most people, and I know that some people have resorted to
putting hash -r in their precmd() functions which is of course slower
than just doing the path search since HASH_DIRS defaults to on.

If it turns out to be a good idea, I will of course include
documentation in the followup patch, but for now I won't bother.

 Src/exec.c    |   10 ++++++++--
 Src/options.c |    1 +
 Src/zsh.h     |    1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/Src/exec.c b/Src/exec.c
index 93d1b26..9a488fe 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -754,7 +754,7 @@ findcmd(char *arg0, int docopy)
 	    }
 	    break;
 	}
-    if (cn) {
+    if (cn && isset(HASHLOOKUP)) {
 	char nn[PATH_MAX];

 	if (cn->node.flags & HASHED)
@@ -2708,7 +2708,13 @@ execcmd(Estate state, int input, int output,
int how, int last1)
 	    char **checkpath = pathchecked;
 	    int dohashcmd = isset(HASHCMDS);

-	    hn = cmdnamtab->getnode(cmdnamtab, cmdarg);
+            if (isset(HASHLOOKUP))
+		hn = cmdnamtab->getnode(cmdnamtab, cmdarg);
+            else {
+                hn = NULL;
+                dohashcmd = 1;
+                checkpath = path;
+            }
 	    if (hn && trycd && !isreallycom((Cmdnam)hn)) {
 		if (!(((Cmdnam)hn)->node.flags & HASHED)) {
 		    checkpath = path;
diff --git a/Src/options.c b/Src/options.c
index dedbf0c..fe720fd 100644
--- a/Src/options.c
+++ b/Src/options.c
@@ -141,6 +141,7 @@ static struct optname optns[] = {
 {{NULL, "hashcmds",	      OPT_ALL},			 HASHCMDS},
 {{NULL, "hashdirs",	      OPT_ALL},			 HASHDIRS},
 {{NULL, "hashlistall",	      OPT_ALL},			 HASHLISTALL},
+{{NULL, "hashlookup",         OPT_ALL},                  HASHLOOKUP},
 {{NULL, "histallowclobber",   0},			 HISTALLOWCLOBBER},
 {{NULL, "histbeep",	      OPT_ALL},			 HISTBEEP},
 {{NULL, "histexpiredupsfirst",0},			 HISTEXPIREDUPSFIRST},
diff --git a/Src/zsh.h b/Src/zsh.h
index 4485bd3..1dd5760 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -1928,6 +1928,7 @@ enum {
     HASHCMDS,
     HASHDIRS,
     HASHLISTALL,
+    HASHLOOKUP,
     HISTALLOWCLOBBER,
     HISTBEEP,
     HISTEXPIREDUPSFIRST,
-- 
1.7.2

http://git.mika.l3ib.org/?p=zsh-cvs.git;a=patch;h=5e75cf283aa470ac28c94c6c3dd9e2bb0ea9cec1
if anyone actually wants to try it since I usually post garbled
patches.

-- 
Mikael Magnusson


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC or so] Add HASH_LOOKUP option
  2010-08-23 14:09 [RFC or so] Add HASH_LOOKUP option Mikael Magnusson
@ 2010-08-23 14:55 ` Peter Stephenson
  2010-08-23 17:46 ` Bart Schaefer
  1 sibling, 0 replies; 6+ messages in thread
From: Peter Stephenson @ 2010-08-23 14:55 UTC (permalink / raw)
  To: zsh workers

On Mon, 23 Aug 2010 16:09:45 +0200
Mikael Magnusson <mikachu@gmail.com> wrote:
> When this is unset, external commands are always resolved with a full
> path search, but still inserted into the hash for spell correction if
> those options are on.

Looks a reasonable thing to be able to do.  The word on the streets (for a
tiny, tiny subset of streets) has for some time been that modern operating
systems do their own optimisation of path lookup such that hashing in the
shell doesn't gain you very much.  The main reason we haven't done anything
about it is that we need the hash for other reasons.  You've sidestepped
that simply by always searching and still hashing it (depending on the
existing options).  That's a very minor performance hit beyond the path
search, so I can't see any objection.

Telling people to turn it off is probably the big problem.  I'd be
sympathetic to turning it off by default if there's enough evidence the
performance hit is minor, but that's all rather system-specific, so don't
hold your breath.  I'd be wary of turning it off by default on Cygwin with
paths to remote shares, for example.

It would be useful if you could write some tests to go in
Test/E01options.ztst.  Basically you just need to test the right version
of a command is found with it on and off.  It would be good to do a test
along the lines of

  mkdir dir1 dir2
  mkcmd() {
    print "#!/bin/sh\necho Command $1" >dir$1/cmd
    chmod +x dir$1/cmd
  }
  mkcmd 2
  for opt in on off on off; do
    if [[ $opt = on ]]; then
      setopt hashlookup
    else
      unsetopt hashlookup
    fi
    rm -rf dir1/cmd
    (path=($PWD/dir1 $PWD/dir2)
     cmd
     mkcmd 1
     cmd)
  done

and verify that when the option is off the second cmd prints
"Command 1" each time round and when it's on it doesn't.  (I haven't tested
this code.)

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK


Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC or so] Add HASH_LOOKUP option
  2010-08-23 14:09 [RFC or so] Add HASH_LOOKUP option Mikael Magnusson
  2010-08-23 14:55 ` Peter Stephenson
@ 2010-08-23 17:46 ` Bart Schaefer
  2010-08-23 18:25   ` Mikael Magnusson
  2011-06-02 15:11   ` Mikael Magnusson
  1 sibling, 2 replies; 6+ messages in thread
From: Bart Schaefer @ 2010-08-23 17:46 UTC (permalink / raw)
  To: zsh workers

On Aug 23,  4:09pm, Mikael Magnusson wrote:
}
} When this is unset, external commands are always resolved with a full
} path search, but still inserted into the hash for spell correction if
} those options are on.
} 
} diff --git a/Src/exec.c b/Src/exec.c
} index 93d1b26..9a488fe 100644
} --- a/Src/exec.c
} +++ b/Src/exec.c
} @@ -754,7 +754,7 @@ findcmd(char *arg0, int docopy)
}  	    }
}  	    break;
}  	}
} -    if (cn) {
} +    if (cn && isset(HASHLOOKUP)) {
}  	char nn[PATH_MAX];
} 
}  	if (cn->node.flags & HASHED)

I think there may be  problem with this in the event that the "hash"
command has been used to deliberately insert an entry into the hash
table.  That's a documented mechanism for overriding path search.
Also I think you've missed that execute() also uses the hash table;
did you intend that "command foo" ignores NO_HASHLOOKUP?

In looking more closely, there are a number of things about findcmd()
that look a bit fishy to me.  Correct me where I've gone wrong?

It starts (if the command is not already in the hash table and HASHCMDS
is set) by doing hashcmd(), which performs a search of only absolute
directory names in the path (not, e.g., ".", "..", or other relative
names).  That search itself seems a bit off because if *arg0 == '/'
then it's going to prepend the path component anyway. [*]

It then ignores what it got back from the search and, if arg0 contains
a '/', checks whether arg0 is executable without path traversal.  This
seems a bit sideways to me; why not do that first?  HASHDIRS, I guess?
[This disagrees with execute(), which does the directly-executable test
first.]  In any case this can return NULL, leaving the ignored results
from the previous search in the hash table, unless PATHDIRS is set.

Then, if it DID find the command either in the hashtable already or
via hashcmd() but the hashtable entry does not have the HASHED bit [**]
it performs a search of only relative directory names in the path ahead
of the previously-found absolute directory.

Finally, if all of the above failed, we do a full path search of both
relative and absolute directories again, which is redundant in the
case of HASHCMDS unless by some unlikely race condition the command
has appeared in one of the directories in the path in the meantime.


[*] And indeed, from zsh -f:

torch% hash
torch% print =/X11/xterm
zsh: /X11/xterm not found
torch% hash | grep X11
/X11/xterm=/usr/bin//X11/xterm


[**] The HASHED bit means the command was deliberately inserted into
the hash table with the "hash" builtin, rather than found by search.
In this case the both findcmd() and execute() are forced to believe
what the hash table tells them.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC or so] Add HASH_LOOKUP option
  2010-08-23 17:46 ` Bart Schaefer
@ 2010-08-23 18:25   ` Mikael Magnusson
  2010-08-24  8:37     ` Bart Schaefer
  2011-06-02 15:11   ` Mikael Magnusson
  1 sibling, 1 reply; 6+ messages in thread
From: Mikael Magnusson @ 2010-08-23 18:25 UTC (permalink / raw)
  To: zsh workers

On 23 August 2010 19:46, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Aug 23,  4:09pm, Mikael Magnusson wrote:
> }
> } When this is unset, external commands are always resolved with a full
> } path search, but still inserted into the hash for spell correction if
> } those options are on.
> }
> } diff --git a/Src/exec.c b/Src/exec.c
> } index 93d1b26..9a488fe 100644
> } --- a/Src/exec.c
> } +++ b/Src/exec.c
> } @@ -754,7 +754,7 @@ findcmd(char *arg0, int docopy)
> }           }
> }           break;
> }       }
> } -    if (cn) {
> } +    if (cn && isset(HASHLOOKUP)) {
> }       char nn[PATH_MAX];
> }
> }       if (cn->node.flags & HASHED)
>
> I think there may be  problem with this in the event that the "hash"
> command has been used to deliberately insert an entry into the hash
> table.  That's a documented mechanism for overriding path search.
> Also I think you've missed that execute() also uses the hash table;
> did you intend that "command foo" ignores NO_HASHLOOKUP?

As I said, I didn't check everywhere yet to see if there are more
duplicated codepaths, so no, that is not the intent. However, command
foo for me doesn't appear to ignore it for me.

> In looking more closely, there are a number of things about findcmd()
> that look a bit fishy to me.  Correct me where I've gone wrong?

A lot of things looked fishy to me while poking at this, but I
couldn't be sure which were actually fishy and which were just arcane
reasons.

> It starts (if the command is not already in the hash table and HASHCMDS
> is set) by doing hashcmd(), which performs a search of only absolute
> directory names in the path (not, e.g., ".", "..", or other relative
> names).  That search itself seems a bit off because if *arg0 == '/'
> then it's going to prepend the path component anyway. [*]

That's the first fishy thing I noticed, why is this for loop over
$path copied to at least four places, two of which are in the same
function, with slight variations?

The fact that 'command' supposedly does a hash lookup while 'command
-p' doesn't seems to be rather undocumented too.

> It then ignores what it got back from the search and, if arg0 contains
> a '/', checks whether arg0 is executable without path traversal.  This
> seems a bit sideways to me; why not do that first?  HASHDIRS, I guess?
> [This disagrees with execute(), which does the directly-executable test
> first.]  In any case this can return NULL, leaving the ignored results
> from the previous search in the hash table, unless PATHDIRS is set.
>
> Then, if it DID find the command either in the hashtable already or
> via hashcmd() but the hashtable entry does not have the HASHED bit [**]
> it performs a search of only relative directory names in the path ahead
> of the previously-found absolute directory.

I guess in this specific function I could move my isset(HASHLOOKUP) to
be in the & HASHED if() instead of the cn one. Which is to say if the
whole thing isn't broken in the first place.

> Finally, if all of the above failed, we do a full path search of both
> relative and absolute directories again, which is redundant in the
> case of HASHCMDS unless by some unlikely race condition the command
> has appeared in one of the directories in the path in the meantime.
>
>
> [*] And indeed, from zsh -f:
>
> torch% hash
> torch% print =/X11/xterm
> zsh: /X11/xterm not found
> torch% hash | grep X11
> /X11/xterm=/usr/bin//X11/xterm

Even funner:
% print =./ls
/bin/./ls

> [**] The HASHED bit means the command was deliberately inserted into
> the hash table with the "hash" builtin, rather than found by search.
> In this case the both findcmd() and execute() are forced to believe
> what the hash table tells them.

Could this be used, theoretically, to still allow users to override
the path, and still do a full path search for search hashed entries? I
note you didn't include execcmd() in the function listing there, but
I'm not clear on how/if execute() and execcmd() relate to eachother.
When I looked at the definition for the struct hashnode earlier when
writing the patch, I found the rather helpful
  int flags;        /* various flags      */
and proceeded to ignore it :P.

Another thing I was vaguely wondering about is how is HASH_CMDS forced
on by CORRECT? I was unable to grep up anything about it.

-- 
Mikael Magnusson


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC or so] Add HASH_LOOKUP option
  2010-08-23 18:25   ` Mikael Magnusson
@ 2010-08-24  8:37     ` Bart Schaefer
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Schaefer @ 2010-08-24  8:37 UTC (permalink / raw)
  To: zsh workers

On Aug 23,  8:25pm, Mikael Magnusson wrote:
}
} > I think there may be  problem with this in the event that the "hash"
} > command has been used to deliberately insert an entry into the hash
} > table.  That's a documented mechanism for overriding path search.
} > Also I think you've missed that execute() also uses the hash table;
} > did you intend that "command foo" ignores NO_HASHLOOKUP?
} 
} As I said, I didn't check everywhere yet to see if there are more
} duplicated codepaths, so no, that is not the intent. However, command
} foo for me doesn't appear to ignore it for me.

execute() gets called for all WC_SIMPLE commands that aren't builtins
or functions, including the "command" and "exec" precommand modifiers:

    /* for command -p, search the default path */ 
    if (defpath) {
        ...
    } else {
   
	if ((cn = (Cmdnam) cmdnamtab->getnode(cmdnamtab, arg0))) {
            ...
        }
        ... usual path search ...
    }

} The fact that 'command' supposedly does a hash lookup while 'command
} -p' doesn't seems to be rather undocumented too.

It'd hardly be possible to search the default POSIX $PATH via a hash
lookup, given that the hash table is filled from what is most commonly
a non-default $path.
 
} > It starts (if the command is not already in the hash table and HASHCMDS
} > is set) by doing hashcmd(), which performs a search of only absolute
} > directory names in the path (not, e.g., ".", "..", or other relative
} > names).  That search itself seems a bit off because if *arg0 == '/'
} > then it's going to prepend the path component anyway. [*]
} 
} That's the first fishy thing I noticed, why is this for loop over
} $path copied to at least four places, two of which are in the same
} function, with slight variations?
 
My recollection (which I haven't bothered to augment by attempting to
search the list archives) is that the path is searched first so that
the hash table is populated so spelling corrections can be suggested,
or something like that.  The first such search, though, has to skip
relative directories, lest "." cause NO_PATH_DIRS to be violated.

Then it's necessary to repeat the search for just the directories that
previously were skipped, in case one of them appears earlier in the
path than the directory where the command was previously found.  So
that's two variations.

Then execute() has to search either the default $PATH (third variant)
or the normal $path (fourth variant), but doesn't have to worry about
separating absolute and relative directories; it can just use whichever
it finds first, just like the final fallback attempt in findcmd().
 
} I guess in this specific function I could move my isset(HASHLOOKUP) to
} be in the & HASHED if() instead of the cn one. Which is to say if the
} whole thing isn't broken in the first place.

Except for the part about putting goofy stuff in the hash table in a
few edge cases, I think it's not actually broken, just disorganized.

} > [**] The HASHED bit means the command was deliberately inserted into
} > the hash table with the "hash" builtin, rather than found by search.
} > In this case the both findcmd() and execute() are forced to believe
} > what the hash table tells them.
} 
} Could this be used, theoretically, to still allow users to override
} the path, and still do a full path search for search hashed entries?

I'm not sure what you mean, but on my best guess:  If you poke into
the hash table with "hash foo=/xyz/foo" then the "foo" command will
run /xyz/foo even if ${^path}/foo(N) is non-empty.

However, and this might be considered a bug, changing the value of
$path discards the entire hash table, including entries that have been
explicitly inserted with "hash".

} Another thing I was vaguely wondering about is how is HASH_CMDS forced
} on by CORRECT? I was unable to grep up anything about it.

It isn't actually forced on by CORRECT, but during the actual event of
correcting, spckword() calls cmdnamtab->filltable() which has the same
effect.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC or so] Add HASH_LOOKUP option
  2010-08-23 17:46 ` Bart Schaefer
  2010-08-23 18:25   ` Mikael Magnusson
@ 2011-06-02 15:11   ` Mikael Magnusson
  1 sibling, 0 replies; 6+ messages in thread
From: Mikael Magnusson @ 2011-06-02 15:11 UTC (permalink / raw)
  To: zsh workers

So I'm trying to remember why I put off finishing this. :) Sorry if I
ask something I've already asked, it appears to have been a while. I
haven't yet looked at the actual code again.

On 23 August 2010 19:46, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Aug 23,  4:09pm, Mikael Magnusson wrote:
> }
> } When this is unset, external commands are always resolved with a full
> } path search, but still inserted into the hash for spell correction if
> } those options are on.
> }
> } diff --git a/Src/exec.c b/Src/exec.c
> } index 93d1b26..9a488fe 100644
> } --- a/Src/exec.c
> } +++ b/Src/exec.c
> } @@ -754,7 +754,7 @@ findcmd(char *arg0, int docopy)
> }           }
> }           break;
> }       }
> } -    if (cn) {
> } +    if (cn && isset(HASHLOOKUP)) {
> }       char nn[PATH_MAX];
> }
> }       if (cn->node.flags & HASHED)
>
> I think there may be  problem with this in the event that the "hash"
> command has been used to deliberately insert an entry into the hash
> table.  That's a documented mechanism for overriding path search.

[and in a later reply]
> However, and this might be considered a bug, changing the value of
> $path discards the entire hash table, including entries that have been
> explicitly inserted with "hash".

Setting the rehash style to true also overrides this mechanism.
% zstyle \* rehash true
% echo $options[hashlookup]
on
% hash urxvt=/bin/false
% urxvt
[returns false, no terminal appears]
% urxv<tab, t appears><enter>
[terminal pops up]
Without the rehash style set at this point, it will still return false
instead of starting urxvt.

> } > [**] The HASHED bit means the command was deliberately inserted into
> } > the hash table with the "hash" builtin, rather than found by search.
> } > In this case the both findcmd() and execute() are forced to believe
> } > what the hash table tells them.
> }
> } Could this be used, theoretically, to still allow users to override
> } the path, and still do a full path search for search hashed entries?
>
> I'm not sure what you mean, but on my best guess:  If you poke into
> the hash table with "hash foo=/xyz/foo" then the "foo" command will
> run /xyz/foo even if ${^path}/foo(N) is non-empty.

What I mean is that with the patch, and setopt nohashlookup active,
doing hash urxvt=/bin/ls and then attempting to run urxvt, will still
run /usr/bin/urxvt (or wherever it's found). It would possibly be nice
if nohashlookup meant that if you add a binary earlier in your $path,
then it is automatically picked up despite being automatically hashed,
but if you inserted it manually with the hash builtin, it is not
picked up.

It would (maybe?) be nice if we would do a path search for a command
not in the hash before attempting to correct it, to avoid the problem
where you install something similar-sounding and zsh wants to correct
it to something else, and you have to answer 'n'. Maybe this is
orthogonal and should happen regardless of the setting of HASH_LOOKUP?

-- 
Mikael Magnusson


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-06-02 15:11 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-23 14:09 [RFC or so] Add HASH_LOOKUP option Mikael Magnusson
2010-08-23 14:55 ` Peter Stephenson
2010-08-23 17:46 ` Bart Schaefer
2010-08-23 18:25   ` Mikael Magnusson
2010-08-24  8:37     ` Bart Schaefer
2011-06-02 15:11   ` Mikael Magnusson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).