zsh-workers
* Possible bug: HASH_CMDS has no observable effect
@ 2020-09-11  8:21 Roman Perepelitsa
  2020-09-11 14:48 ` Phil Pennock
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Perepelitsa @ 2020-09-11  8:21 UTC (permalink / raw)
  To: Zsh hackers list

From the documentation for HASH_CMDS option:

  Note the location of each command the first time it is executed.
  Subsequent invocations of the same command will use the saved
  location, avoiding a path search. If this option is unset, no path
  hashing is done at all. However, when CORRECT is set, commands
  whose names do not appear in the functions or aliases hash tables
  are hashed in order to avoid reporting them as spelling errors.

I took this to mean that after installing rsync and invoking it,
${commands[rsync]} will be set and running `hash` will display an
entry for rsync. This, however, is not the case.

  % sudo docker run -e TERM -it --rm zshusers/zsh:5.8
  # print $options[hash_cmds]
  on
  # rsync
  zsh: command not found: rsync
  # apt-get update && apt-get install -y rsync
  # rsync
  rsync  version 3.1.2  protocol version 31
  # print $+commands[rsync]
  0
  # hash | grep rsync
  #

Discovered during the discussion of
https://github.com/zsh-users/zsh-syntax-highlighting/pull/764.

Roman.



* Re: Possible bug: HASH_CMDS has no observable effect
  2020-09-11  8:21 Possible bug: HASH_CMDS has no observable effect Roman Perepelitsa
@ 2020-09-11 14:48 ` Phil Pennock
  2020-09-11 15:01   ` Roman Perepelitsa
  0 siblings, 1 reply; 13+ messages in thread
From: Phil Pennock @ 2020-09-11 14:48 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Zsh hackers list

On 2020-09-11 at 10:21 +0200, Roman Perepelitsa wrote:
> From the documentation for HASH_CMDS option:
> 
>   Note the location of each command the first time it is executed.
>   Subsequent invocations of the same command will use the saved
>   location, avoiding a path search. If this option is unset, no path
>   hashing is done at all. However, when CORRECT is set, commands
>   whose names do not appear in the functions or aliases hash tables
>   are hashed in order to avoid reporting them as spelling errors.

So, if and only if CORRECT is set, then non-present commands will be
remembered as not present.

>  % sudo docker run -e TERM -it --rm zshusers/zsh:5.8
>   # print $options[hash_cmds]
>   on

% docker run -e TERM -it --rm zshusers/zsh:5.8
7190647e021a# print $options[hash_cmds] $options[correct]
on off
7190647e021a# setopt correct
7190647e021a# print $options[hash_cmds] $options[correct]
on on
7190647e021a# rsync
zsh: correct 'rsync' to 'sync' [nyae]? n
zsh: command not found: rsync
7190647e021a# apt-get update && apt-get install -y rsync
[...]
7190647e021a# rsync
zsh: correct 'rsync' to 'sync' [nyae]? 

At that point, if you type 'n', then zsh will go ahead and try to run
rsync, and it will succeed.

You will need to run `rehash` (or whatever) before zsh will remember,
for correction purposes, that the command does exist after all.

-Phil



* Re: Possible bug: HASH_CMDS has no observable effect
  2020-09-11 14:48 ` Phil Pennock
@ 2020-09-11 15:01   ` Roman Perepelitsa
  2020-09-11 16:10     ` Phil Pennock
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Perepelitsa @ 2020-09-11 15:01 UTC (permalink / raw)
  To: Phil Pennock; +Cc: Zsh hackers list

On Fri, Sep 11, 2020 at 4:48 PM Phil Pennock
<zsh-workers+phil.pennock@spodhuis.org> wrote:
>
> On 2020-09-11 at 10:21 +0200, Roman Perepelitsa wrote:
> > From the documentation for HASH_CMDS option:
> >
> >   Note the location of each command the first time it is executed.
> >   Subsequent invocations of the same command will use the saved
> >   location, avoiding a path search. If this option is unset, no path
> >   hashing is done at all. However, when CORRECT is set, commands
> >   whose names do not appear in the functions or aliases hash tables
> >   are hashed in order to avoid reporting them as spelling errors.
>
> So, if and only if CORRECT is set, then non-present commands will be
> remembered as not present.

Could you clarify how this statement is related to my bug report?

In case this wasn't clear, in my bug report the output of the
following two commands is not what I expect:

  # print $+commands[rsync]
  0
  # hash | grep rsync
  #

The expected output:

  # print $+commands[rsync]
  1
  # hash | grep rsync
  rsync=/usr/bin/rsync
  #

The reason why I expect this output is because I've invoked rsync
right before these two commands while HASH_CMDS was in effect.

Roman.



* Re: Possible bug: HASH_CMDS has no observable effect
  2020-09-11 15:01   ` Roman Perepelitsa
@ 2020-09-11 16:10     ` Phil Pennock
  2020-09-11 16:33       ` Roman Perepelitsa
  0 siblings, 1 reply; 13+ messages in thread
From: Phil Pennock @ 2020-09-11 16:10 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Zsh hackers list

On 2020-09-11 at 17:01 +0200, Roman Perepelitsa wrote:
> Could you clarify how this statement is related to my bug report?

Sure thing.  You quoted documentation which also covered a combination
not in effect, reported Zsh's behavior, and wrote:

} I took this to mean that after installing rsync and invoking it,
} ${commands[rsync]} will be set and running `hash` will display an
} entry for rsync. This, however, is not the case.

I explained the observed behavior, relative to the quoted documentation,
and what was going on.

> In case this wasn't clear, in my bug report the output of the
> following two commands is not what I expect:
> 
>   # print $+commands[rsync]
>   0
>   # hash | grep rsync
>   #
> 
> The expected output:
> 
>   # print $+commands[rsync]
>   1
>   # hash | grep rsync
>   rsync=/usr/bin/rsync
>   #
> 
> The reason why I expect this output is because I've invoked rsync
> right before these two commands while HASH_CMDS was in effect.

There are two issues here, and it does look to me like the docs are out of
date.

Per the documentation, the first time you invoked `rsync`, an entry was
added to the cache and thereafter when you invoked rsync, the cached
entry was used.  So the quoted examples don't make the documentation
wrong.

Except that's not what's going on, because even if rsync is installed
before you first try to run it, the same thing happens.  So it looks
like zsh is preemptively building the command hash and not remembering
when you do first run it.

If you install rsync before first trying to access it, the docs imply
that it should just work for you.  Instead, you need to `rehash` first.


Can anyone speak to whether we should change the documentation or the
behavior?  If the documentation, can someone who's not a maintainer
suggest text which is sufficiently clear to them?

-Phil



* Re: Possible bug: HASH_CMDS has no observable effect
  2020-09-11 16:10     ` Phil Pennock
@ 2020-09-11 16:33       ` Roman Perepelitsa
  2020-09-11 21:10         ` Bart Schaefer
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Perepelitsa @ 2020-09-11 16:33 UTC (permalink / raw)
  To: Phil Pennock; +Cc: Zsh hackers list

On Fri, Sep 11, 2020 at 6:10 PM Phil Pennock
<zsh-workers+phil.pennock@spodhuis.org> wrote:
>
> On 2020-09-11 at 17:01 +0200, Roman Perepelitsa wrote:
> > Could you clarify how this statement is related to my bug report?
>
> Sure thing.  You quoted documentation which also covered a combination
> not in effect, reported Zsh's behavior, and wrote:
>
> } I took this to mean that after installing rsync and invoking it,
> } ${commands[rsync]} will be set and running `hash` will display an
> } entry for rsync. This, however, is not the case.
>
> I explained the observed behavior, relative to the quoted documentation,
> and what was going on.

Thanks for the clarification. I've read your post once again and I
don't understand how it explains the observed behavior.

Firstly, let's dispense with the long option description to avoid the
red herring of CORRECT. Here's a shortened version that removes
clauses that don't apply:

  Note the location of each command the first time it is executed.
  Subsequent invocations of the same command will use the saved
  location, avoiding a path search. If this option is unset,
  [irrelevant because this option is set]. However, when CORRECT is
  set, [irrelevant because CORRECT is not set].

And here are the important parts of the commands I ran:

1. The first execution of rsync. It prints version plus help; I've
truncated the latter.

  # rsync
  rsync  version 3.1.2  protocol version 31

2. Check whether rsync is hashed. It appears not to be.

  # print $+commands[rsync]
  0
  # hash | grep rsync
  #

Why isn't rsync hashed the first time it's executed?

Roman.



* Re: Possible bug: HASH_CMDS has no observable effect
  2020-09-11 16:33       ` Roman Perepelitsa
@ 2020-09-11 21:10         ` Bart Schaefer
  2020-09-12  7:02           ` Roman Perepelitsa
  0 siblings, 1 reply; 13+ messages in thread
From: Bart Schaefer @ 2020-09-11 21:10 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Zsh hackers list

On Fri, Sep 11, 2020 at 9:33 AM Roman Perepelitsa
<roman.perepelitsa@gmail.com> wrote:
>
> Why isn't rsync hashed the first time it's executed?

This is usually because of HASH_DIRS:
              Whenever a command name is hashed, hash the directory containing
              it, as well as all directories that occur earlier in the path.

Once the cache has been pre-populated by hashing a directory,
HASH_CMDS stops attempting to search that directory for new additions.
A "rehash" is needed to discard the cached directory and reload any
new commands that are now contained therein.

If you unset HASH_DIRS you might get what you want.

A complication of this is that the completion system also invokes
command hashing in order to be able to use the $commands associative
array.  So if you use completion at all, you might also find that
command (re)hashing works differently than in a "zsh -f" shell.  Refer
to the description of "rehash" under "Standard Styles".
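
The comparison suggested above can be tried non-interactively. What follows
is a minimal sketch, not something posted in the thread: it assumes `zsh` and
`mktemp` are installed, and the helper name `hash_dirs_demo` and the commands
`hashdemo-first` and `hashdemo-second` are hypothetical. It runs one command
from a directory, then adds a second command to that same directory, runs it,
and prints $+commands for it.

```shell
# Hypothetical helper: repeat the experiment in a fresh `zsh -f`,
# with HASH_DIRS left on ("on") or unset ("off").
hash_dirs_demo() {
  zsh -f -c '
    [[ $1 == off ]] && unsetopt HASH_DIRS
    dir=$(mktemp -d) || exit 1

    # A command that already exists when its directory joins the path.
    print -l "#!/bin/sh" "echo first" > $dir/hashdemo-first
    chmod +x $dir/hashdemo-first
    path+=($dir)
    hashdemo-first >/dev/null

    # A command that appears in the same directory afterwards.
    print -l "#!/bin/sh" "echo second" > $dir/hashdemo-second
    chmod +x $dir/hashdemo-second
    hashdemo-second >/dev/null

    # 1 if the new command is in the command hash table, 0 otherwise.
    print ${+commands[hashdemo-second]}
    rm -rf $dir
  ' zsh "$1"
}
```

Comparing `hash_dirs_demo on` with `hash_dirs_demo off` shows whether
HASH_DIRS changes the result. Roman's follow-up in this thread reports that
with HASH_DIRS unset the newly installed command did get hashed.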



* Re: Possible bug: HASH_CMDS has no observable effect
  2020-09-11 21:10         ` Bart Schaefer
@ 2020-09-12  7:02           ` Roman Perepelitsa
  2020-09-12  8:35             ` Bart Schaefer
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Perepelitsa @ 2020-09-12  7:02 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

On Fri, Sep 11, 2020 at 11:10 PM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> On Fri, Sep 11, 2020 at 9:33 AM Roman Perepelitsa
> <roman.perepelitsa@gmail.com> wrote:
> >
> > Why isn't rsync hashed the first time it's executed?
>
> This is usually because of HASH_DIRS
> [...]
> If you unset HASH_DIRS you might get what you want.

I can confirm that unsetting HASH_DIRS makes HASH_CMDS behave as I
expect. That is, after a successful invocation of rsync,
rsync=/usr/bin/rsync gets hashed.

This implies that invoking any non-existing command with HASH_DIRS set
effectively disables HASH_CMDS. The following test confirms it:

  % sudo docker run -e TERM -it --rm zshusers/zsh:5.8
  # does-not-exist
  zsh: command not found: does-not-exist
  # apt-get update && apt-get install -y rsync
  # rsync
  rsync  version 3.1.2  protocol version 31
  # print $+commands[rsync]
  0

Do I understand it correctly that this is working as intended? It
appears to contradict the documentation for HASH_CMDS, which states
that commands are hashed when they are invoked for the first time.

Roman.



* Re: Possible bug: HASH_CMDS has no observable effect
  2020-09-12  7:02           ` Roman Perepelitsa
@ 2020-09-12  8:35             ` Bart Schaefer
  2020-09-12  8:49               ` Roman Perepelitsa
  0 siblings, 1 reply; 13+ messages in thread
From: Bart Schaefer @ 2020-09-12  8:35 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Zsh hackers list

On Sat, Sep 12, 2020 at 12:02 AM Roman Perepelitsa
<roman.perepelitsa@gmail.com> wrote:
>
> Do I understand it correctly that this is working as intended? It
> appears to contradict the documentation for HASH_CMDS, which states
> that commands are hashed when they are invoked for the first time.

What it really means is that commands are hashed the first time a path
search finds them.

If you have only HASH_CMDS, then "a path search finds" just the single
command you invoked.  If you tend to be using one, or a few, commands
repeatedly, and never invoke anything else, this is all you need to
avoid path search overhead for commands that exist.  It remains
"expensive" to discover that a command does not exist.

If you also have HASH_DIRS set, then "a path search finds" every entry
in every directory that is searched.  This eliminates the search
overhead for any external command you might use in the future.  In
order to avoid the expense of a useless search when for example you
make a typo, the assumption is made that if the command is not already
in the cache then it must not be in any part of the path that has
previously been searched.  The whole search-and-cache process is
short-circuited.

This really is the intended behavior, because for most people most of
the time new commands do not appear in the path during a shell
session.  It's also the reason that HASH_EXECUTABLES_ONLY eventually
got added, because the "every entry in every directory" part tends to
be too aggressive for some path elements.

Historically, HASH_CMDS predates HASH_DIRS by several years, and the
documentation for the former still uses the wording that encompassed
that state.
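
The difference between the two cases described above can be made visible by
counting the entries that `hash` lists after a single external command has
run. This is a rough sketch under stated assumptions: `zsh` is installed, `ls`
is an external command found through the path, and the helper name
`hash_table_size` is hypothetical.

```shell
# Hypothetical helper: count command hash table entries after running
# exactly one external command, with HASH_DIRS on ("on") or unset ("off").
hash_table_size() {
  zsh -f -c '
    [[ $1 == off ]] && unsetopt HASH_DIRS
    ls / >/dev/null                  # the only external command invoked
    entries=( ${(f)"$(hash)"} )      # one array element per line of `hash`
    print $#entries
    ' zsh "$1"
}
```

With only HASH_CMDS in effect the count should stay small, since just the
invoked command is recorded. With HASH_DIRS also set it should be much
larger, because whole directories are added along with the command.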



* Re: Possible bug: HASH_CMDS has no observable effect
  2020-09-12  8:35             ` Bart Schaefer
@ 2020-09-12  8:49               ` Roman Perepelitsa
  2020-09-12 20:41                 ` Bart Schaefer
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Perepelitsa @ 2020-09-12  8:49 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

On Sat, Sep 12, 2020 at 10:35 AM Bart Schaefer
<schaefer@brasslantern.com> wrote:
> This eliminates the search
> overhead for any external command you might use in the future.  In
> order to avoid the expense of a useless search when for example you
> make a typo, the assumption is made that if the command is not already
> in the cache then it must not be in any part of the path that has
> previously been searched.  The whole search-and-cache process is
> short-circuited.

Doesn't this search happen anyway? When I type `rsync`, it gets
resolved as /usr/bin/rsync and gets executed. This requires searching
for rsync in all path directories. I believe this is done in execute()
in Src/exec.c. Wouldn't it be better to search for `rsync` in the
parent shell (before forking) and hash the result? This would make the
behavior of HASH_CMDS match the documentation (and my intuition),
would make the invocation of newly installed commands faster, and
wouldn't slow anything down. Am I missing something?

Roman.



* Re: Possible bug: HASH_CMDS has no observable effect
  2020-09-12  8:49               ` Roman Perepelitsa
@ 2020-09-12 20:41                 ` Bart Schaefer
  2020-09-12 20:43                   ` Bart Schaefer
  2020-09-13  9:31                   ` Roman Perepelitsa
  0 siblings, 2 replies; 13+ messages in thread
From: Bart Schaefer @ 2020-09-12 20:41 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Zsh hackers list

On Sat, Sep 12, 2020 at 1:49 AM Roman Perepelitsa
<roman.perepelitsa@gmail.com> wrote:
>
> On Sat, Sep 12, 2020 at 10:35 AM Bart Schaefer
> <schaefer@brasslantern.com> wrote:
> > The whole search-and-cache process is
> > short-circuited.
>
> Doesn't this search happen anyway? When I type `rsync`, it gets
> resolved as /usr/bin/rsync and gets executed. This requires searching
> for rsync in all path directories.

The cache is a mapping from command names to locations.  So, if
"rsync" is in the cache, then the value of that cache entry is
"/usr/bin/rsync" and zsh just executes that without scanning the path
again.  Conversely if HASH_DIRS is set and "rsync" is NOT in the
cache, but everything else from /usr/bin IS in the cache (as a
byproduct of some previous search), zsh reports command not found,
again without re-scanning the path.

HASH_DIRS is done incrementally, e.g., if your path is "/bin:/usr/bin"
and you first execute a command from /bin, zsh only populates the
cache with commands in /bin, and remembers that it has not looked in
/usr/bin yet.  If you then execute "rsync", a cache miss causes
HASH_DIRS to scan /usr/bin, add everything there to the cache, and
execute /usr/bin/rsync.

Eventually every directory in the path will have been scanned, and
cache misses become immediately "not found".  If HASH_DIRS is not set
(but HASH_CMDS is).

> I believe this is done in execute()
> in Src/exec.c. Wouldn't it be better to search for `rsync` in the
> parent shell (before forking) and hash the result?

That's exactly how it works.  The search in execute() is only done if
an absolute file location for the command isn't provided by the
parent, which only occurs if HASH_CMDS is not set.

This does mean that if a command is removed from (or moved within) the
path, execute() will be handed an invalid file location and will fail.
Again, for most users this never happens.



* Re: Possible bug: HASH_CMDS has no observable effect
  2020-09-12 20:41                 ` Bart Schaefer
@ 2020-09-12 20:43                   ` Bart Schaefer
  2020-09-13  9:31                   ` Roman Perepelitsa
  1 sibling, 0 replies; 13+ messages in thread
From: Bart Schaefer @ 2020-09-12 20:43 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Zsh hackers list

On Sat, Sep 12, 2020 at 1:41 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> Eventually every directory in the path will have been scanned, and
> cache misses become immediately "not found".  If HASH_DIRS is not set
> (but HASH_CMDS is).

Sorry, unintentional sentence fragment.  "... then all the directories
are always rescanned on a cache miss."



* Re: Possible bug: HASH_CMDS has no observable effect
  2020-09-12 20:41                 ` Bart Schaefer
  2020-09-12 20:43                   ` Bart Schaefer
@ 2020-09-13  9:31                   ` Roman Perepelitsa
  2020-09-13 22:24                     ` Bart Schaefer
  1 sibling, 1 reply; 13+ messages in thread
From: Roman Perepelitsa @ 2020-09-13  9:31 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

On Sat, Sep 12, 2020 at 10:41 PM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> That's exactly how it works.  The search in execute() is only done if
> an absolute file location for the command isn't provided by the
> parent, which only occurs if HASH_CMDS is not set.

This is what I expected to happen but that's not what actually
happens. Sometimes, when HASH_CMDS is set, a successfully invoked
command does not get hashed.

Here's a new test case that doesn't use docker.

  % zsh -f
  % mkdir /tmp/foo
  % path+=(/tmp/foo)
  % print $+commands[bar]
  0
  % print 'echo hello' >/tmp/foo/bar
  % chmod +x /tmp/foo/bar
  % bar
  hello
  % print $+commands[bar]
  0

The output of the last command should be "1", right? Everything else
looks as expected.
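
The test case above can also be scripted, with the `rehash` step mentioned
earlier in the thread appended at the end. This is a sketch only: it assumes
`zsh` and `mktemp` are installed, uses a temporary directory instead of
/tmp/foo, and the names `rehash_demo` and `hashdemo-bar` are hypothetical.

```shell
# Hypothetical helper: same sequence as the test case, followed by `rehash`.
rehash_demo() {
  zsh -f -c '
    dir=$(mktemp -d) || exit 1
    path+=($dir)
    print "before creation: ${+commands[hashdemo-bar]}"

    # Create the command only after its directory is already on the path.
    print -l "#!/bin/sh" "echo hello" > $dir/hashdemo-bar
    chmod +x $dir/hashdemo-bar
    hashdemo-bar >/dev/null
    print "after invocation: ${+commands[hashdemo-bar]}"

    # Discard the command hash table so the path is scanned again.
    rehash
    print "after rehash: ${+commands[hashdemo-bar]}"
    rm -rf $dir
  '
}
```

In the interactive session above, the check after invocation printed 0. The
last line is there to see whether `rehash` makes the command visible in
$commands.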

Roman.



* Re: Possible bug: HASH_CMDS has no observable effect
  2020-09-13  9:31                   ` Roman Perepelitsa
@ 2020-09-13 22:24                     ` Bart Schaefer
  0 siblings, 0 replies; 13+ messages in thread
From: Bart Schaefer @ 2020-09-13 22:24 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Zsh hackers list

On Sun, Sep 13, 2020 at 2:31 AM Roman Perepelitsa
<roman.perepelitsa@gmail.com> wrote:
>
> This is what I expected to happen but that's not what actually
> happens. Sometimes, when HASH_CMDS is set, a successfully invoked
> command does not get hashed.

Ah ... recall that several messages back, I wrote:
> A complication of this is that the completion system also invokes
> command hashing in order to be able to use the $commands associative
> array.  So if you use completion at all, you might also find that
> command (re)hashing works differently than in a "zsh -f" shell.

I forgot that even with zsh -f, interactive shells load the
zsh/compctl and zsh/complete modules.

Those are causing the $commands hash to be immediately repopulated as
soon as $path is changed, and HASH_DIRS is also set by default, so
further updates to the hash table do not occur.  If you were to create
/tmp/foo/bar before adding /tmp/foo to the path, you would see what
you expect.

It appears that throughout this thread I've been conflating the
command being "not found" by correction [spckword()] with the command
being "not found" by execute().  Sorry about that.
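
The reordering suggested above (create the command first, then add its
directory to the path) can be sketched as follows. The assumptions are the
same as for any scripted variant of this test: `zsh` and `mktemp` are
installed, and the names `create_first_demo` and `hashdemo-bar` are
hypothetical rather than taken from the thread.

```shell
# Hypothetical helper: the command exists before its directory joins the path.
create_first_demo() {
  zsh -f -c '
    dir=$(mktemp -d) || exit 1
    print -l "#!/bin/sh" "echo hello" > $dir/hashdemo-bar
    chmod +x $dir/hashdemo-bar

    # Only now does the directory become part of the path.
    path+=($dir)
    hashdemo-bar >/dev/null

    # 1 if the command is in the command hash table, 0 otherwise.
    print ${+commands[hashdemo-bar]}
    rm -rf $dir
  '
}
```

If the explanation above holds, this ordering should print 1, in contrast
to the 0 seen when the command is created after the path change.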


