Most frequent history words

zsh-users

* Most frequent history words
@ 2016-04-25 10:36 Sebastian Gniazdowski
  2016-04-25 19:49 ` Bart Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Gniazdowski @ 2016-04-25 10:36 UTC
  To: Zsh Users

Hello,
can the following be made less coreutils dependent, i.e. more pure-Zsh code?

print -rl "${historywords[@]}" | sort | uniq -c | sort -k1,1nr -k2,2  | head

The goal is to list most frequent history words.

Best regards,
Sebastian Gniazdowski



* Re: Most frequent history words
  2016-04-25 10:36 Most frequent history words Sebastian Gniazdowski
@ 2016-04-25 19:49 ` Bart Schaefer
  2016-04-26 11:03   ` Sebastian Gniazdowski
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Schaefer @ 2016-04-25 19:49 UTC
  To: Zsh Users

On Apr 25, 12:36pm, Sebastian Gniazdowski wrote:
} Subject: Most frequent history words
}
} Hello,
} can the following be made less coreutils dependent, i.e. more pure-Zsh code?
} 
} print -rl "${historywords[@]}" | sort | uniq -c | sort -k1,1nr -k2,2  | head

It can, but it's probably not very efficient.

The pipe to sort can be replaced with

    print -rl -- ${(o)historywords[@]}

The "uniq -c" would have to be replaced by a loop building a hash whose
keys are words and whose values are the count thereof (making the initial
sort irrelevant).

    typeset -A uniq
    for k in ${historywords[@]}
    do uniq[$k]=$(( ${uniq[$k]:-0} + 1 ))
    done

Some quoting of $k, such as ${(b)k}, is probably required there; this is
the shakiest part of the process.

Then the final "sort -k..." would have to be done by iterating over the
hash, with "head" just taking an array slice.

    vk=()
    for k v in ${(kv)uniq}
    do vk+="$v=$k"
    done
    print -rl -- ${${${(on)vk}#<->=}[1,10]}

Plus unwrapping there whatever quoting on $k you did in the first loop.


* Re: Most frequent history words
  2016-04-25 19:49 ` Bart Schaefer
@ 2016-04-26 11:03   ` Sebastian Gniazdowski
  2016-04-26 19:39     ` Bart Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Gniazdowski @ 2016-04-26 11:03 UTC
  To: Bart Schaefer; +Cc: Zsh Users

The code works, thanks. I only had to change (on) to (On). Interesting
trick with the sorting on "$v=$k". Here is a complete version for
anyone to reuse quickly:


typeset -A uniq
for k in ${historywords[@]}; do
    uniq[$k]=$(( ${uniq[$k]:-0} + 1 ))
done

vk=()
for k v in ${(kv)uniq}; do
    vk+="$v=$k"
done
print -rl -- ${${${(On)vk}#<->=}[1,10]}


Interestingly, changing ${historywords[@]} to ${history[@]} or
"${history[@]}" doesn't change the script's output: it still lists the
most frequent words, not history entries.

Best regards,
Sebastian Gniazdowski





* Re: Most frequent history words
  2016-04-26 11:03   ` Sebastian Gniazdowski
@ 2016-04-26 19:39     ` Bart Schaefer
  2016-04-27  9:14       ` Sebastian Gniazdowski
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Schaefer @ 2016-04-26 19:39 UTC
  To: Zsh Users

On Apr 26,  1:03pm, Sebastian Gniazdowski wrote:
}
} Interestingly, changing ${historywords[@]} to ${history[@]} and
} "${history[@]}" doesn't change script's output, it still outputs most
} frequent words, not history entries.

I get complete history entries with "${history[@]}" (with or without
the surrounding quotes).  Are you sure this isn't being affected by
some other setting such as shwordsplit?


* Re: Most frequent history words
  2016-04-26 19:39     ` Bart Schaefer
@ 2016-04-27  9:14       ` Sebastian Gniazdowski
  2016-04-27 15:59         ` Bart Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Gniazdowski @ 2016-04-27  9:14 UTC
  To: Bart Schaefer; +Cc: Zsh Users

On 26 April 2016 at 21:39, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Apr 26,  1:03pm, Sebastian Gniazdowski wrote:
> }
> } Interestingly, changing ${historywords[@]} to ${history[@]} and
> } "${history[@]}" doesn't change script's output, it still outputs most
> } frequent words, not history entries.
>
> I get complete history entries with "${history[@]}" (with or without
> the surrounding quotes).  Are you sure this isn't being affected by
> some other setting such as shwordsplit?

True, it does work. I was misled by the fact that the first 10 entries
were the same in both cases: single-word commands like "ls" and "git".

The code is slower than the coreutils pipeline, but since I want to keep
it portable pure-Zsh code, I've found a nice way around that: a cache
that is regenerated periodically:

https://asciinema.org/a/4apqm5hzfz2pgci4u7371ophh

This makes the snippet a fine piece of code in general.

Best regards,
Sebastian Gniazdowski



* Re: Most frequent history words
  2016-04-27  9:14       ` Sebastian Gniazdowski
@ 2016-04-27 15:59         ` Bart Schaefer
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Schaefer @ 2016-04-27 15:59 UTC
  To: Sebastian Gniazdowski; +Cc: Zsh Users

On Wed, Apr 27, 2016 at 2:14 AM, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
>
> The code is slower than core utils pipeline but I've found a nice way
> to solve this, wanting to preserve Zsh portability – a cache

You could update the cache in a zshaddhistory hook to make it
continuously accurate.


