> n=($^fpath(e^'n=($REPLY/*(N.)); reply=("$#n $REPLY")'^))
> print -l ${${(On)n}[1,3]}

And this continues to demonstrate that Zsh is the only language where, the more I learn, the less readable my code becomes.  I really do appreciate you demonstrating the most-Zsh way to achieve the desired result.

There really ought to be an explainshell.com equivalent for Zsh expressions / expansions / modifiers / etc.  The information is already nicely codified in Zsh's completion system (e.g. ${(<TAB>); it would be nice to feed in expressions like the above and get back a sane explanation.
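In that spirit, here is my own hand-annotated reading of the expression quoted above -- the comments are my interpretation, not output from any real explainer, so take them with a grain of salt:

```shell
# Count autoloadable (plain) files per $fpath directory, zsh-style.
n=($^fpath(e^'n=($REPLY/*(N.)); reply=("$#n $REPLY")'^))
# $^fpath         -- RC_EXPAND_PARAM form: glob each array element separately
# (e^'...'^)      -- glob qualifier: run the code for each match, with $REPLY
#                    set to the matched path; whatever is left in $reply
#                    replaces the match in the final result
# $REPLY/*(N.)    -- plain files (.) in that directory; (N) makes an empty
#                    match expand to nothing instead of raising an error
# $#n             -- the count of those files
# Net effect: n becomes an array of "count directory" strings.

print -l ${${(On)n}[1,3]}
# (On)            -- sort descending (O), numerically (n)
# [1,3]           -- take the first three elements
# print -l        -- print one element per line
```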

I actually intend to tackle this as a baby's-first-Rust project, so we'll see how far along I get.  The MVP of the project is to take a glob / path expansion expression (e.g. foo/**/bar(^/.) ) and convert it into a BSD find expression.
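To make the idea concrete, here is the kind of translation I have in mind -- hand-written, based on my own reading of the qualifiers, not output from any existing tool:

```shell
# The zsh glob:
#   foo/**/bar(^/.)
# matches anything named "bar" at any depth under foo/ that is neither
# a directory (/) nor a plain file (.) -- the ^ negates both qualifiers
# following it, leaving symlinks, fifos, sockets, etc.
# A find(1) equivalent (works with both BSD and GNU find):
find foo -mindepth 1 -name bar ! -type d ! -type f
```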

>> for d in $fpath; do n=$(ls $d/* | wc -l); echo "$n $d"; done | sort -nr | head -3
> Good heavens, so many processes and pipes.

Pipes are nicely composable, and maintainable by others who aren't intimately familiar with section 14 (Expansion) of the Zsh manual.  The Unix philosophy still applies -- do one thing and do it well.  Shells are good at connecting inputs and outputs and transforming them.

Sure, two processes per $fpath entry plus sort(1) and head(1) is heavy compared to an array assignment and a builtin -- but I don't think anybody is writing shell scripts for performance.

This is more apparent when using e.g. the Rust tool fd via `fd --type f .` instead of recursive globbing with **/*(.):  8.2 seconds vs. 328 seconds (sample size: 1,641,649 files).  The only trade-off is that fd does not guarantee any output ordering.  Skipping fd's internal sorting flags and piping directly to sort(1) gives a runtime of 30 seconds -- still roughly ten times faster than the glob (a speedup close to the number of CPU cores I have).

$ hyperfine --runs 2 "zsh -il -c 'echo **/*(.)'" "zsh -il -c 'fd --type f .'"
Benchmark 1: zsh -il -c 'echo **/*(.)'
  Time (mean ± σ):     328.973 s ±  2.153 s    [User: 198.746 s, System: 86.629 s]
  Range (min … max):   327.451 s … 330.496 s    2 runs

Benchmark 2: zsh -il -c 'fd --type f .'
  Time (mean ± σ):      8.281 s ±  0.703 s    [User: 17.441 s, System: 47.829 s]
  Range (min … max):    7.784 s …  8.778 s    2 runs

Shells ultimately exist to spawn processes and create pipes.  I'd wager that (A) below is more maintainable than (B).

A. | sort | head -3
B. ${${(On)n}[1,3]}

If there's a sufficient performance benefit to in-shell-process computation, I would love to see a standard library of Zsh functions reimplementing common GNU/BSD utilities.
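As a sketch of what I mean -- zhead is a name I made up, and this is only a toy stand-in for head -n using nothing but builtins:

```shell
# Hypothetical library function: print the first N lines of stdin
# (default 10) using only zsh builtins, no external head(1).
zhead() {
  local -i n=${1:-10}
  local line
  while (( n-- > 0 )) && IFS= read -r line; do
    print -r -- "$line"
  done
}

# Usage: some-command | zhead 3
```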

Zach Riggle


On Mon, Nov 29, 2021 at 10:12 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
On Mon, Nov 29, 2021 at 6:30 PM Zach Riggle <zachriggle@gmail.com> wrote:
>
> I would expect that the md5sum of a file is reasonably fast, and could be stored in the .zwc for sanity checking, instead of just the "newer than" check.

To what are you comparing that checksum?  It could tell you if the
.zwc file were corrupted, but not whether the file differs from all
the component files that were compiled into it.  Even if you could
somehow tell they were different, that doesn't answer the question of
whether the .zwc contains newer versions of any of those functions.
The .zwc does contain a check that it matches the parser version of
the shell that's trying to read it.

> I expect that I have more $fpath entries than usual, but the total number of autoloadable functions is much more.

That's exactly the point:  You're unlikely to ever execute most of
those functions, so storing an autoload entry for them is much more
space-efficient (and startup-time faster) than actually parsing and
storing the function definitions themselves.

> $ for d in $fpath; do n=$(ls $d/* | wc -l); echo "$n $d"; done | sort -nr | head -3

Good heavens, so many processes and pipes.

n=($^fpath(e^'n=($REPLY/*(N.)); reply=("$#n $REPLY")'^))
print -l ${${(On)n}[1,3]}