From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-workers-return-6218-mason-zsh=primenet.com.au@sunsite.auc.dk>
Received: (qmail 2470 invoked from network); 4 May 1999 19:42:23 -0000
Received: from sunsite.auc.dk (130.225.51.30)
  by ns1.primenet.com.au with SMTP; 4 May 1999 19:42:23 -0000
Received: (qmail 13677 invoked by alias); 4 May 1999 19:42:18 -0000
Mailing-List: contact zsh-workers-help@sunsite.auc.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 6218
Received: (qmail 13670 invoked from network); 4 May 1999 19:42:17 -0000
Message-Id: <199905041942.MAA11235@bebop.clari.net>
To: zsh-workers@sunsite.auc.dk
Subject: Upcoming: handy history extensions
Date: Tue, 04 May 1999 12:42:14 -0700
From: Wayne Davison <wayne@clari.net>

I spent some time over the weekend adding some history features that
I've been thinking about for some time, and I wanted to send some
email about what I'm doing to (1) get any feedback you might have,
and (2) ask a few questions.

My primary motivation was an annoyance that the myriad of "normal"
commands that I type over and over crowd out older and more useful
commands that I typed long ago, especially when exiting multiple
shells in a row.  I figured that if the history routines favored
unique commands, I could store more commands of relevance.  I also
wanted to be able to have multiple shells incrementally adding their
history to the history file (instead of dumping it all at the end),
and I wanted to be able to have searching find only unique commands
(never showing me the same command twice, even if the duplicates are
not consecutive).  While I was at it, I also decided to allow
simultaneously-running shells to share their history data.

To do this, I decided to make a number of options so that people
could customize this to their liking.  Here's what I came up with:

HIST_SAVE_NO_DUPS causes only "unique" commands to be written to the
history file (unique after ignoring white-space differences).  The
setting of APPEND_HISTORY determines if the file is a unique set of
the internal history (off) or the unique set of the old history file
combined with the newly-appended lines (on).

HIST_FIND_NO_DUPS causes the search commands to only visit unique
lines in your internal history buffer.  It does not affect commands
that move up and down in the history buffer.

HIST_EXPIRE_DUPS_FIRST will make the internal history list preserve
a unique command at the tail end of the list when there is a newer,
duplicated command that can be deleted instead.

HIST_IGNORE_ALL_DUPS is like HIST_IGNORE_DUPS, but it removes all
duplicates of newer commands from your history buffer, even if they
aren't adjacent.  Having this set renders the options
HIST_FIND_NO_DUPS, HIST_SAVE_NO_DUPS, and HIST_EXPIRE_DUPS_FIRST
irrelevant (since there are no duplicates in the internal history
data).

INCREMENTAL_APPEND_HISTORY is just like APPEND_HISTORY, except that
it adds the lines to the history file as you enter them.  It also
periodically re-writes the file in addition to re-writing on exit.
I'd be willing to merge this new option into the old one, if people
feel that the slight change in behavior would not disturb upgraders.

SHARE_HISTORY enables the reading of history lines that other shells
have appended to the current history file.  Setting this option also
causes the shell to behave as if INCREMENTAL_APPEND_HISTORY and
EXTENDED_HISTORY are set (writing the timestamps to the history
file helps us find the point where we left off reading history
lines, especially after the file gets rewritten by some other
shell).

Sharing history data brings up some tricky user-interface issues
that I've attempted to solve, but I'd appreciate feedback on how
the combined history data should be presented to the user.

My code currently marks all imported lines with a "foreign" flag,
allowing the lines to be conditionally ignored, and I have the
commands that move up and down in the history list ignoring them.
This means that it's primarily the search commands that will find
foreign data.  However, I also decided that once a line gets shown
to the user, it gets assimilated into the local history data.  This
means that if you run the "history" command, you'll be shown the most
recent lines in your history buffer (both local and foreign), all of
which will now be accessible via up/down movement (I should probably
make this showing of foreign data an option, though).

Also, I've made the importing of foreign commands ignore any commands
that are already in our local history buffer.  This avoids weird
cases where a command coming from a foreign shell might remove the
non-foreign version (if dups are deleted), causing the line to
vanish from up/down movement.

Implementation details:

In order to make the unduplication of history lines easy, I decided
to get rid of the array of history structures and turn it into a
hash table.  Since the new options can cause the history numbers to
become non-consecutive, I decided to link the data into a
doubly-linked list rather than using an array of pointers to store
the history order (note that this list is also how the duplicated
lines are remembered after they vanish from the hash).  As a result,
any command that iterates over the history list needed to be
modified to follow the linked list and (sometimes) check the flags.
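
To illustrate the list half of that design, here is a minimal sketch
of a doubly-linked history list whose entries carry flags.  It is not
the code from the patch: the struct layout, the HIST_FOREIGN flag
value, and the hist_* function names are all hypothetical, and the
hash-table side is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HIST_FOREIGN 0x1  /* hypothetical flag: line came from another shell */

struct histent {
    struct histent *older, *newer;  /* neighbours in history order */
    char *text;                     /* private copy of the command line */
    int histnum;                    /* may have gaps once dups are removed */
    int flags;
};

struct histlist {
    struct histent *oldest, *newest;
};

/* Append an entry at the newest end.  Returns NULL if allocation fails. */
struct histent *hist_append(struct histlist *hl, const char *text,
                            int histnum, int flags)
{
    struct histent *he;

    assert(hl != NULL && text != NULL);
    he = malloc(sizeof *he);
    if (!he)
        return NULL;
    he->text = malloc(strlen(text) + 1);
    if (!he->text) {
        free(he);
        return NULL;
    }
    strcpy(he->text, text);
    he->histnum = histnum;
    he->flags = flags;
    he->newer = NULL;
    he->older = hl->newest;
    if (hl->newest)
        hl->newest->newer = he;
    else
        hl->oldest = he;
    hl->newest = he;
    return he;
}

/* Unlink an entry (for example a duplicate) and free it. */
void hist_remove(struct histlist *hl, struct histent *he)
{
    if (he->older)
        he->older->newer = he->newer;
    else
        hl->oldest = he->newer;
    if (he->newer)
        he->newer->older = he->older;
    else
        hl->newest = he->older;
    free(he->text);
    free(he);
}

/* Step to the next older entry, skipping entries whose flags
 * intersect skipmask.  Returns NULL at the end of the list. */
struct histent *hist_older(struct histent *he, int skipmask)
{
    for (he = he->older; he && (he->flags & skipmask); he = he->older)
        ;
    return he;
}
```

With this layout, up/down movement would call hist_older(he,
HIST_FOREIGN) while a search would pass 0 as the mask, and removing
an entry from the middle never requires renumbering or shifting.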

The current readhistfile() function only breaks up words by spaces.
I changed this to also break up words by tabs and newlines (using the
inblank() function), but it would be nice if we could actually call
the lexer on the lines we read from the file.  Anyone know how hard
that would be?

Since a read-from-file line can now be intermixed with locally-added
lines, I changed the histcmp() function to use a white-space ignoring
string-compare instead of using the word arrays.  If this wasn't
done, the command "echo hi >file" read from a file wouldn't match
the same command typed on the command line.  However, it does mean
that spaces inside quotes are no longer treated as word characters.

I made a couple of minor extensions to the hash-table code.  One is that
the string-compare function is now a member variable, allowing the
history hash to use the space-ignoring string compare to match
lines.  The other is that I added a new node-adding function,
addhashnode2(), that returns the duplicated hash node rather than
freeing it (in reality addhashnode() is implemented as a wrapper
that calls addhashnode2() and frees any returned duplicates).
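
The relationship between the two add functions can be sketched with a
toy table.  Everything here (the tiny_* names, the struct fields, the
bucket count) is made up for illustration and does not reflect the
real hash-table structures or signatures in the zsh source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TINY_NBUCKETS 31

struct tiny_node {
    struct tiny_node *next;
    const char *key;          /* not owned by the table in this sketch */
};

struct tiny_table {
    struct tiny_node *bucket[TINY_NBUCKETS];
    unsigned (*hash)(const char *);
    /* The comparison is a member, so each table can pick its own.
     * It must agree with hash: keys that compare equal must hash alike. */
    int (*cmp)(const char *, const char *);
    void (*freenode)(struct tiny_node *);
};

unsigned tiny_hash(const char *s)
{
    unsigned h = 0;

    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h;
}

struct tiny_node *tiny_new(const char *key)
{
    struct tiny_node *n = malloc(sizeof *n);

    if (n) {
        n->next = NULL;
        n->key = key;
    }
    return n;
}

void tiny_free_node(struct tiny_node *n)
{
    free(n);
}

/* Insert n.  If a node with an equal key is already present, n takes
 * its place and the old node is returned to the caller, who now owns
 * it.  Returns NULL when there was no duplicate. */
struct tiny_node *tiny_add2(struct tiny_table *t, struct tiny_node *n)
{
    struct tiny_node **pp;

    assert(t != NULL && n != NULL && n->key != NULL);
    pp = &t->bucket[t->hash(n->key) % TINY_NBUCKETS];
    for (; *pp; pp = &(*pp)->next) {
        if (t->cmp((*pp)->key, n->key) == 0) {
            struct tiny_node *old = *pp;

            n->next = old->next;
            *pp = n;
            old->next = NULL;
            return old;
        }
    }
    n->next = NULL;
    *pp = n;                  /* no duplicate: append to the chain */
    return NULL;
}

/* The plain add is a wrapper that discards any displaced duplicate. */
void tiny_add(struct tiny_table *t, struct tiny_node *n)
{
    struct tiny_node *old = tiny_add2(t, n);

    if (old)
        t->freenode(old);
}
```

Returning the displaced node lets a caller such as the history code
keep the old entry alive (for instance on a linked list) instead of
losing it when the hash slot is taken over.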

I'm also wondering if we should be using a better string-hash
function than the one in hasher().  I haven't yet analyzed the
distribution of nodes within a hash table, but I was thinking that a
generic CRC function might give us better results.  Also, is there
any interest in rounding up the history table sizes to the nearest
prime number?  DBZ has a simple (but not terribly efficient)
algorithm for this, for instance.
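
For reference, rounding a table size up to a prime can be done with
plain trial division, as in the sketch below.  This is not DBZ's
code, and the function names are hypothetical; it is only meant to
show how little is involved for sizes in the range a history table
would use.

```c
/* Trial-division primality test for odd n >= 3. */
static int is_odd_prime(unsigned long n)
{
    unsigned long d;

    for (d = 3; d <= n / d; d += 2)   /* d <= n / d avoids overflow in d * d */
        if (n % d == 0)
            return 0;
    return 1;
}

/* Return the smallest prime that is >= n.  Intended for modest table
 * sizes; it does not guard against wrap-around near ULONG_MAX. */
unsigned long next_prime(unsigned long n)
{
    if (n <= 2)
        return 2;
    if (n % 2 == 0)
        n++;
    while (!is_odd_prime(n))
        n += 2;
    return n;
}
```

For example, a requested size of 1000 would be rounded up to 1009.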

Another issue is history-file locking.  I think we want to lock the
file when doing incremental updating (especially since re-writing
now occurs much more frequently).  I'm not familiar with how to get
metaconfig to look for the current machine's file-locking functions,
though.  Anyone want to help me out?  If not, I'll look it up later.
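
As one possible approach, on systems that provide POSIX fcntl()
advisory locks the incremental append could be wrapped as sketched
below.  Which locking mechanism the patch ends up using is still
open; the function names and the 0600 file mode are assumptions made
for this example only.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Take (F_WRLCK) or release (F_UNLCK) an advisory lock covering the
 * whole file.  Blocks until the lock is granted.  Returns 0 on
 * success and -1 on error. */
int hist_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "through end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Append one line to the history file while holding the lock.
 * Returns 0 on success and -1 on error. */
int hist_append_line(const char *path, const char *line)
{
    size_t len;
    int fd, ok;

    assert(path != NULL && line != NULL);
    len = strlen(line);
    fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (hist_lock(fd, F_WRLCK) < 0) {
        close(fd);
        return -1;
    }
    ok = write(fd, line, len) == (ssize_t)len && write(fd, "\n", 1) == 1;
    hist_lock(fd, F_UNLCK);
    close(fd);
    return ok ? 0 : -1;
}
```

A shell that rewrites the whole file would take the same lock around
the rewrite, so that appends from other shells cannot land in the
middle of it.  These locks are advisory, so they only help if every
writer cooperates.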

My current code tweaks the EXTENDED_HISTORY file format to change
the current ending time (measured in seconds since the epoch) to an
elapsed time (measured in seconds since the start time).  The new
code can still read the older format, and older shells shouldn't be
adversely affected (they just get the finish time wrong).  Anyone
else like this?
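
To make the compatibility point concrete, here is a sketch of a
reader that accepts both layouts of the ": <start>:<n>;<command>"
prefix.  The rule used to tell them apart (a second field that is at
least as large as the start time is taken to be an old-style ending
time) is an assumption of this example, as are the struct and
function names.

```c
#include <assert.h>
#include <stdio.h>

struct hist_times {
    long start;     /* seconds since the epoch */
    long elapsed;   /* seconds the command ran */
};

/* Parse the ": <start>:<n>;" prefix of an extended-history line.
 * Returns the offset of the command text, or -1 if the line does
 * not carry such a prefix. */
int parse_ext_prefix(const char *line, struct hist_times *out)
{
    long start, second;
    int used = 0;

    assert(line != NULL && out != NULL);
    /* %n is only stored if the ';' after the second number matched. */
    if (sscanf(line, ": %ld:%ld;%n", &start, &second, &used) != 2
        || used == 0)
        return -1;
    out->start = start;
    if (start > 0 && second >= start)
        out->elapsed = second - start;   /* old format: ending time */
    else
        out->elapsed = second;           /* new format: elapsed time */
    return used;
}
```

A line written as ": 925846934:12;make all" and one written as
": 925846934:925846946;make all" both come out with an elapsed time
of 12 seconds under this rule.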

Side note:  I think there's a memory leak in the current
resizehistents() function when it makes the list smaller, but I
didn't diagnose this too much (I replaced it with a very simple
list-trimming routine).

So, there's still some more finishing work to do, and I need to
debug it a while longer, but the code is mostly done.  Comments?

I'm planning to send a patch for pws 17 when things solidify (I'll
also have a patch available for 3.0.6* when I get done, since I
haven't switched over to running 3.1.x on a regular basis yet).

..wayne..