zsh-users
* Memory usage of history?
@ 2016-06-24 13:47 Dominik Vogt
  2016-06-24 22:57 ` Eric Cook
  2016-06-25  1:47 ` Bart Schaefer
  0 siblings, 2 replies; 7+ messages in thread
From: Dominik Vogt @ 2016-06-24 13:47 UTC (permalink / raw)
  To: Zsh Users; +Cc: Robin Dapp

Could someone please explain the implications of having a large
history file?  Does an interactive zsh read the history file into
private memory upon startup, or how does this work?  Is there a
way to reduce memory and cpu consumption somehow?  (A colleague
says his zshs use 200 MB memory each with a history size of a
million lines).

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



* Re: Memory usage of history?
  2016-06-24 13:47 Memory usage of history? Dominik Vogt
@ 2016-06-24 22:57 ` Eric Cook
  2016-06-25  1:47 ` Bart Schaefer
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Cook @ 2016-06-24 22:57 UTC (permalink / raw)
  To: zsh-users

On 06/24/2016 09:47 AM, Dominik Vogt wrote:
> Could someone please explain the implications of having a large
> history file?
Uses more disk space
> Does an interactive zsh read the history file into
> private memory upon startup, or how does this work?
zsh reads $HISTSIZE lines from $HISTFILE upon startup.  That value
may be the same as $SAVEHIST, which controls the number of lines
kept in $HISTFILE.
> Is there a way to reduce memory and cpu consumption somehow?  (A colleague
> says his zshs use 200 MB memory each with a history size of a
> million lines).

Reducing the number of lines you tell zsh to keep in memory via
HISTSIZE would help.
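
A minimal sketch of those settings, assuming they go in ~/.zshrc. The
file path and the numbers are only illustrative, not values anyone in
this thread has recommended:

```shell
# Illustrative values only; pick whatever fits your memory budget.
HISTFILE=~/.zsh_history   # hypothetical location of the history file
HISTSIZE=20000            # number of entries zsh keeps in memory
SAVEHIST=20000            # number of entries kept in $HISTFILE
```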



* Re: Memory usage of history?
  2016-06-24 13:47 Memory usage of history? Dominik Vogt
  2016-06-24 22:57 ` Eric Cook
@ 2016-06-25  1:47 ` Bart Schaefer
  2016-06-25 17:33   ` Nikolay Aleksandrovich Pavlov (ZyX)
  1 sibling, 1 reply; 7+ messages in thread
From: Bart Schaefer @ 2016-06-25  1:47 UTC (permalink / raw)
  To: Zsh Users, Robin Dapp

On Fri, Jun 24, 2016 at 6:47 AM, Dominik Vogt <vogt@linux.vnet.ibm.com> wrote:
>
> (A colleague
> says his zshs use 200 MB memory each with a history size of a
> million lines).

To expand on Eric's answer, zsh reads the entire $HISTFILE and retains
the last $HISTSIZE entries.  So a large $HISTFILE also slows down
startup, even if it doesn't consume lots of memory.

I can't imagine anyone having a million useful lines of history.  A
few tens of thousands at most.  Things he might consider that would
allow him to reduce SAVEHIST and/or HISTSIZE without losing too much
information:
* Set the hist_ignore_all_dups option, if he doesn't already.
* Set the hist_save_no_dups option, similarly.
* Define a zshaddhistory function to filter out commands that are
unlikely to be used again.

If he isn't already ignoring / not saving duplicates, an interesting
experiment might be to add hist_ignore_all_dups without changing
HISTSIZE, then run zsh and see how many lines of history actually end
up being retained.
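
A rough way to estimate that number without restarting the shell is
sketched below.  It is only an approximation: the function name is made
up, it treats every line of the file as one entry (so multi-line
commands are miscounted), and it assumes any EXTENDED_HISTORY prefix
has the usual ": <start>:<elapsed>;" form.  Inside zsh itself,
something like "fc -l 1 | wc -l" after a restart should give the real
figure.

```shell
# Hypothetical helper: count distinct command lines in a history file.
count_unique_history() {
  awk '{
    # Strip the EXTENDED_HISTORY prefix, if present.
    sub(/^: [0-9]+:[0-9]+;/, "")
    # Count each distinct command line only once.
    if (!($0 in seen)) { seen[$0] = 1; n++ }
  } END { print n + 0 }' "$1"
}
```

Usage would be along the lines of: count_unique_history ~/.zsh_history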



* Re: Memory usage of history?
  2016-06-25  1:47 ` Bart Schaefer
@ 2016-06-25 17:33   ` Nikolay Aleksandrovich Pavlov (ZyX)
  2016-06-25 17:46     ` Bart Schaefer
  2016-06-26 23:29     ` Bart Schaefer
  0 siblings, 2 replies; 7+ messages in thread
From: Nikolay Aleksandrovich Pavlov (ZyX) @ 2016-06-25 17:33 UTC (permalink / raw)
  To: Bart Schaefer, Zsh Users, Robin Dapp

25.06.2016, 04:49, "Bart Schaefer" <schaefer@brasslantern.com>:
> On Fri, Jun 24, 2016 at 6:47 AM, Dominik Vogt <vogt@linux.vnet.ibm.com> wrote:
>>  (A colleague
>>  says his zshs use 200 MB memory each with a history size of a
>>  million lines).
>
> To expand on Eric's answer, zsh reads the entire $HISTFILE and retains
> the last $HISTSIZE entries. So a large $HISTFILE also slows down
> startup, even if it doesn't consume lots of memory.
>
> I can't imagine anyone having a million useful lines of history. A
> few tens of thousands at most. Things he might consider that would
> allow him to reduce SAVEHIST and/or HISTSIZE without losing too much
> information:
> * Set the hist_ignore_all_dups option, if he doesn't already.
> * Set the hist_save_no_dups option, similarly.
> * Define a zshaddhistory function to filter out commands that are
> unlikely to be used again.
>
> If he isn't already ignoring / not saving duplicates, an interesting
> experiment might be to add hist_ignore_all_dups without changing
> HISTSIZE, then run zsh and see how many lines of history actually end
> up being retained.

Actually, there may be a better solution: consider the case where zsh

1. allows saving user-defined metadata in the history file and
2. allows the user to control exactly what will be removed.

Specifically, the first may be used to save information about

1. How often the command is used (total number of uses; anything else, like “uses per month”, would be harder to determine).
2. The time it took to type the command (when it was typed for the first time), i.e. the time between the first self-insert (or $*BUFFER modification, if it was constructed by a widget) and accept-line.
3. The last time the command was run.
4. The time it took the command to finish (average among all runs).
5. The exit code (a hash mapping each exit code to the number of times it occurred).

The second is supposed to be a function like `zshhistkey` that works basically like the function used for `(o+)`: it accepts a history entry with its attached metadata (passed through arguments or via a local parameter that is an associative array; metadata saved by EXTENDED_HISTORY should also be passed) and saves something in $REPLY. History entries with the lowest values in $REPLY will be removed.

On this basis it would be possible to construct a more useful filter. I guess the first three would be enough: when removing history lines, find the least often used and then the fastest to type commands and remove them first, but always keep commands typed during the last hour: `zshhistkey() { REPLY="$(printf "%u-%020u-%020.2g" $[$(date +%s) - $metadata[last_run_time] < 60 * 60 ? 1 : 0] $metadata[num_runs] $metadata[type_duration])" }`. EXTENDED_HISTORY already provides 4 (though I do not think it provides an “average”) and 3, but I do not find that very useful (especially 4; 3 is needed to protect the most recent commands).

Without something like this, the “set $HISTSIZE and $SAVEHIST to a rather large number” strategy (in addition to the options you mentioned) is the best option; I personally have both set to 65536. I have no idea how one may construct a “zshaddhistory function” that “filters out commands that are unlikely to be used again” without somehow knowing what these stats will be in the future.



* Re: Memory usage of history?
  2016-06-25 17:33   ` Nikolay Aleksandrovich Pavlov (ZyX)
@ 2016-06-25 17:46     ` Bart Schaefer
  2016-06-26 23:29     ` Bart Schaefer
  1 sibling, 0 replies; 7+ messages in thread
From: Bart Schaefer @ 2016-06-25 17:46 UTC (permalink / raw)
  To: ZyX; +Cc: Robin Dapp, Zsh Users


On Jun 25, 2016 10:33 AM, "Nikolay Aleksandrovich Pavlov (ZyX)" <kp-pav@yandex.ru> wrote:
> I have no idea how one may construct “zshaddhistory function” that
> “filters out commands that are unlikely to be used again” without somehow
> knowing what these stats will be in the future.

As an example, if any command I type contains an argument "foo", "bar", or
"baz", it is probably a throwaway.  There must be other similar criteria,
e.g., why ever save "cd .."?
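
A minimal sketch of such a filter, assuming exactly the two criteria
above and nothing more.  zsh passes the command line (with its trailing
newline) as the first argument to zshaddhistory, and a return status of
1 keeps the line out of the history; the particular patterns here are
only examples.

```shell
# Sketch of a zshaddhistory hook: return 1 to skip saving the line.
zshaddhistory() {
  local line nl='
'
  line=${1%"$nl"}            # drop the trailing newline zsh passes along
  case $line in
    "cd ..") return 1 ;;     # trivial to retype, not worth keeping
  esac
  case " $line " in
    *" foo "*|*" bar "*|*" baz "*) return 1 ;;   # placeholder arguments
  esac
  return 0                   # everything else is saved as usual
}
```

Note that the match is on whole space-separated words, so "ls food"
would still be saved while "touch foo" would not.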


* Re: Memory usage of history?
  2016-06-25 17:33   ` Nikolay Aleksandrovich Pavlov (ZyX)
  2016-06-25 17:46     ` Bart Schaefer
@ 2016-06-26 23:29     ` Bart Schaefer
  2016-06-27  0:23       ` Nikolay Aleksandrovich Pavlov (ZyX)
  1 sibling, 1 reply; 7+ messages in thread
From: Bart Schaefer @ 2016-06-26 23:29 UTC (permalink / raw)
  To: Zsh Users

On Jun 25,  8:33pm, Nikolay Aleksandrovich Pavlov (ZyX) wrote:
} Subject: Re: Memory usage of history?
}
} 1. allows saving user-defined metadata in history file and

I'm not sure the answer to the history file being too large is to
make it even larger by cramming in all sorts of other data.  This
would be even slower to parse at load time as well.

} 2. allows user to get control over what exactly will be removed.

In addition to all the other stuff I mentioned, I forgot about the
relatively recent addition of the HISTORY_IGNORE variable, which can
be a pattern that matches lines to leave out.  That would be the
best way to handle my "foo is a throwaway" and similar criteria.
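
For instance (an illustrative pattern only, assuming a zsh new enough
to have HISTORY_IGNORE; as far as I recall, the pattern is applied
when the history file is written):

```shell
# Skip "cd .." and any line whose last word is foo, bar or baz.
HISTORY_IGNORE='(cd ..|* foo|* bar|* baz)'
```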

} Specifically first may be used to save information about
} 
} 1. How often the command is used (total number of uses, anything else like
}    "uses per month" would be harder to determine).
} 2. Time it took command to type (when it was typed for the first time)
}    (time between first self-insert (or $*BUFFER modification if it was
}    constructed by a widget) and accept-line).
} 3. Last time command was run.
} 4. Time it took command to finish (average among all runs).
} 5. What was the exit code (hash exit code - number of times it occurred).

I find these to be very unlikely criteria for deciding what's interesting
in the history?

For one thing, "time it took to type" is going to be really hard to get
right; multi-line commands have multiple accept-line calls, and you'd
have to filter out commands that were recalled from the history or you'd
get an average much too small.

Larger number of uses would be skewed towards really simple things, and
in fact (at least in my own case) the LESS often I use a command, the
more likely I am to want it from the history (unless it's one of those
throwaways I mentioned in another message), because I can remember the
ones I use a lot without zsh's help.  If I use it often enough, I can
make an alias or keybinding for it and not need to search history.

How long the command took to run seems entirely unrelated to whether
it is history-worthy (and also doesn't work with shared/incremental
history). What would you use the exit code for, except maybe weeding
out typos?

I like Christian Neukirchen's idea of maintaining a daily archive.
Adding a function / keybinding to search through an alternate history
store seems more manageable than either having a huge history always
in memory or a complicated AI for storing only interesting bits.
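
A rough sketch of what I have in mind, not Christian's actual setup:
the directory and function names below are hypothetical, and the
archive is simply one plain-text file per day.

```shell
# Hypothetical location for the per-day archive files.
HIST_ARCHIVE_DIR=${HIST_ARCHIVE_DIR:-$HOME/.zsh_history.d}

# Append one command line to today's archive file.  Always returns 0 so
# that, if hooked into zshaddhistory, it never blocks normal saving.
archive_history_line() {
  local nl='
'
  mkdir -p "$HIST_ARCHIVE_DIR" || return 0
  printf '%s\n' "${1%"$nl"}" >> "$HIST_ARCHIVE_DIR/$(date +%Y-%m-%d)"
  return 0
}

# Search every archived day for a pattern (basic regular expression).
search_history_archive() {
  grep -rh -e "$1" "$HIST_ARCHIVE_DIR"
}
```

In zsh the first function could presumably be registered with
"add-zsh-hook zshaddhistory archive_history_line", which would allow
HISTSIZE and SAVEHIST to stay small.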



* Re: Memory usage of history?
  2016-06-26 23:29     ` Bart Schaefer
@ 2016-06-27  0:23       ` Nikolay Aleksandrovich Pavlov (ZyX)
  0 siblings, 0 replies; 7+ messages in thread
From: Nikolay Aleksandrovich Pavlov (ZyX) @ 2016-06-27  0:23 UTC (permalink / raw)
  To: Bart Schaefer, Zsh Users



27.06.2016, 02:30, "Bart Schaefer" <schaefer@brasslantern.com>:
> On Jun 25, 8:33pm, Nikolay Aleksandrovich Pavlov (ZyX) wrote:
> } Subject: Re: Memory usage of history?
> }
> } 1. allows saving user-defined metadata in history file and
>
> I'm not sure the answer to the history file being too large is to
> make it even larger by cramming in all sorts of other data. This
> would be even slower to parse at load time as well.

The idea is that the history file need not be too large, but without more advanced criteria a big SAVEHIST value is needed so as not to miss useful but uncommon entries. So adding metadata would reduce the history size, not because metadata reduces the size of a history entry, but because a smaller SAVEHIST suffices.

>
> } 2. allows user to get control over what exactly will be removed.
>
> In addition to all the other stuff I mentioned, I forgot about the
> relatively recent addition of the HISTORY_IGNORE variable, which can
> be a pattern that matches lines to leave out. That would be the
> best way to handle my "foo is a throwaway" and similar criteria.

I cannot say I have any such patterns.

>
> } Specifically first may be used to save information about
> }
> } 1. How often the command is used (total number of uses, anything else like
> } "uses per month" would be harder to determine).
> } 2. Time it took command to type (when it was typed for the first time)
> } (time between first self-insert (or $*BUFFER modification if it was
> } constructed by a widget) and accept-line).
> } 3. Last time command was run.
> } 4. Time it took command to finish (average among all runs).
> } 5. What was the exit code (hash exit code - number of times it occurred).
>
> I find these to be very unlikely criteria for deciding what's interesting
> in the history?
>
> For one thing, "time it took to type" is going to be really hard to get
> right; multi-line commands have multiple accept-line calls, and you'd
> have to filter out commands that were recalled from the history or you'd
> get an average much too small.

Filtering out commands that were recalled from history is not hard: there are not too many widgets that do this. Though for commands that were recalled from history and then modified, it may be practical to add the “time it took to type” of the original command to that of the modified one.

Also, some prompt %format allows determining whether a command is a continuation, so saving the previous time and adding it on the next accept-line is not a problem.

>
> Larger number of uses would be skewed towards really simple things, and
> in fact (at least in my own case) the LESS often I use a command, the
> more likely I am to want it from the history (unless it's one of those
> throwaways I mentioned in another message), because I can remember the
> ones I use a lot without zsh's help. If I use it often enough, I can
> make an alias or keybinding for it and not need to search history.

My main point was to provide a custom function that allows adjusting the criteria. I suggested this because I tend to keep in history some commands which are rather easy to retype, but which I need fast, and I do not want to have 100500 aliases for easy commands in my zshrc.

>
> How long the command took to run seems entirely unrelated to whether
> it is history-worthy (and also doesn't work with shared/incremental
> history). What would you use the exit code for, except maybe weeding
> out typos?

The exit code is for typos, and it was put in last place because it is not very useful for other purposes. How long a command took to run is the second least useful, but it still has something to do with history: usually I do not want to repeat long-running commands; if needed, they are the first candidates to be run using `screen`/`tmux`/…, which will be another history entry.

>
> I like Christian Neukirchen's idea of maintaining a daily archive.
> Adding a function / keybinding to search through an alternate history
> store seems more manageable than either having a huge history always
> in memory or a complicated AI for storing only interesting bits.

So far I am fine with my variant, “just use large SAVEHIST and HISTSIZE”. I just suggested a way that came to my mind to reduce the number of entries that need to be stored.


