From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 2470 invoked from network); 4 May 1999 19:42:23 -0000
Received: from sunsite.auc.dk (130.225.51.30) by ns1.primenet.com.au with SMTP; 4 May 1999 19:42:23 -0000
Received: (qmail 13677 invoked by alias); 4 May 1999 19:42:18 -0000
Mailing-List: contact zsh-workers-help@sunsite.auc.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 6218
Received: (qmail 13670 invoked from network); 4 May 1999 19:42:17 -0000
Message-Id: <199905041942.MAA11235@bebop.clari.net>
To: zsh-workers@sunsite.auc.dk
Subject: Upcoming: handy history extensions
Date: Tue, 04 May 1999 12:42:14 -0700
From: Wayne Davison

I spent some time over the weekend adding some history features that I've been thinking about for some time, and I wanted to send some email about what I'm doing to (1) get any feedback you might have, and (2) ask a few questions.

My primary motivation was an annoyance that the myriad of "normal" commands that I type over and over crowd out older and more useful commands that I typed long ago, especially when exiting multiple shells in a row. I figured that if the history routines favored unique commands, I could store more commands of relevance. I also wanted to be able to have multiple shells incrementally adding their history to the history file (instead of dumping it all at the end), and I wanted to be able to have searching find only unique commands (never showing me the same command twice, even if the duplicates are not consecutive). While I was at it, I also decided to allow simultaneously-running shells to share their history data.

To do this, I decided to make a number of options so that people could customize this to their liking. Here's what I came up with:

HIST_SAVE_NO_DUPS causes only "unique" commands to be written to the history file (unique after ignoring white-space differences).
The setting of APPEND_HISTORY determines if the file is a unique set of the internal history (off) or the unique set of the old history file combined with the newly-appended lines (on).

HIST_FIND_NO_DUPS causes the search commands to only visit unique lines in your internal history buffer. It does not affect the commands that move up and down in the history buffer.

HIST_EXPIRE_DUPS_FIRST will make the internal history list preserve a unique command at the tail end of the list when there is a newer, duplicated command that can be deleted instead.

HIST_IGNORE_ALL_DUPS is like HIST_IGNORE_DUPS, but it removes all duplicates of newer commands from your history buffer, even if they aren't adjacent. Having this set renders the options HIST_FIND_NO_DUPS, HIST_SAVE_NO_DUPS, and HIST_EXPIRE_DUPS_FIRST irrelevant (since there are no duplicates in the internal history data).

INCREMENTAL_APPEND_HISTORY is just like APPEND_HISTORY, except that it adds the lines to the history file as you enter them. It also periodically re-writes the file in addition to re-writing on exit. I'd be willing to merge this new option into the old one, if people feel that the slight change in behavior would not disturb upgraders.

SHARE_HISTORY enables the reading of the history lines that other shells have appended to the current history file. Setting this option also causes the shell to behave as if INCREMENTAL_APPEND_HISTORY and EXTENDED_HISTORY are set (the writing of the timestamps to the history file helps us to find the point we left off in the reading of history lines, especially after it gets rewritten by some other shell).

Sharing history data brings up some tricky user-interface issues that I've attempted to solve, but I'd appreciate feedback on how the combined history data should be presented to the user. My code currently marks all imported lines with a "foreign" flag, allowing the lines to be conditionally ignored, and I have the commands that move up and down in the history list ignoring them.
This means that it's primarily the search commands that will find foreign data. However, I also decided that once a line gets shown to the user, it gets assimilated into the local history data. This means that if you run the "history" command, you'll be shown the most recent lines in your history buffer (both local and foreign), all of which will now be accessible via up/down movement (I should probably make this showing of foreign data an option, though).

Also, I've made the importing of foreign commands ignore any commands that are already in our local history buffer. This avoids weird cases where a command coming from a foreign shell might remove the non-foreign version (if dups are deleted), causing the line to vanish from up/down movement.

Implementation details:

In order to make the unduplication of history lines easy, I decided to get rid of the array of history structures and turn it into a hash table. Since the new options can cause the history numbers to become non-consecutive, I decided to link the data into a doubly-linked list rather than using an array of pointers to store the history order (note that this list is also how the duplicated lines are remembered after they vanish from the hash). As a result, any command that iterates over the history list needed to be modified to follow the linked list and (sometimes) check the flags.

The current readhistfile() function only breaks up words by spaces. I changed this to also break up words by tabs and newlines (using the inblank() function), but it would be nice if we could actually call the lexer on the lines we read from the file. Anyone know how hard that would be?

Since a read-from-file line can now be intermixed with locally-added lines, I changed the histcmp() function to use a white-space ignoring string-compare instead of using the word arrays. If this wasn't done, the command "echo hi >file" read from a file wouldn't match the same command typed on the command line.
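
To make the white-space ignoring compare concrete, here is a rough sketch of the idea. It is not the code from the patch: the names hist_ws_cmp() and skip_blanks() are made up for illustration, isspace() stands in for inblank(), and it assumes that "ignoring white space" means that any run of blanks matches any other run and that leading and trailing blanks don't count.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: advance past a run of blanks (space, tab, newline). */
static const char *skip_blanks(const char *s)
{
    while (*s && isspace((unsigned char)*s))
        s++;
    return s;
}

/*
 * Hypothetical sketch, not the patch's histcmp(): returns 0 when the two
 * lines differ only in the amount or kind of white space between words,
 * and a non-zero value otherwise.
 */
int hist_ws_cmp(const char *a, const char *b)
{
    assert(a && b);

    /* Leading blanks never matter. */
    a = skip_blanks(a);
    b = skip_blanks(b);

    while (*a && *b) {
        int a_blank = isspace((unsigned char)*a) != 0;
        int b_blank = isspace((unsigned char)*b) != 0;

        if (a_blank || b_blank) {
            if (a_blank != b_blank)
                break;              /* blank on one side only: mismatch */
            a = skip_blanks(a);     /* any run of blanks matches any other */
            b = skip_blanks(b);
        } else {
            if (*a != *b)
                break;              /* ordinary character mismatch */
            a++;
            b++;
        }
    }

    /* Trailing blanks are ignored once the other line is exhausted. */
    if (!*a)
        b = skip_blanks(b);
    if (!*b)
        a = skip_blanks(a);

    return (unsigned char)*a - (unsigned char)*b;
}
```

With this sketch, "echo hi >file" and "echo   hi	>file" compare equal, while "echo hi" and "echohi" do not. Note that the comparison knows nothing about quoting, so two lines that differ only in the number of spaces inside a quoted string also compare equal.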
However, it does mean that spaces inside quotes are no longer treated as word characters.

I made a couple of minor extensions to the hash-table code. One is that the string-compare function is now a member variable, allowing the history hash to use the space-ignoring string compare to match lines. The other is that I added a new node-adding function, addhashnode2(), that returns the duplicated hash node rather than freeing it (in reality addhashnode() is implemented as a wrapper that calls addhashnode2() and frees any returned duplicates).

I'm also wondering if we should be using a better string-hash function than the one in hasher(). I haven't yet analyzed the distribution of nodes within a hash table, but I was thinking that a generic CRC function might give us better results. Also, is there any interest in rounding up the history table sizes to the nearest prime number? DBZ has a simple (but not terribly efficient) algorithm for this, for instance.

Another issue is history-file locking. I think we want to lock the file when doing incremental updating (especially since re-writing now occurs much more frequently). I'm not familiar with how to get metaconfig to look for the current machine's file-locking functions, though. Anyone want to help me out? If not, I'll look it up later.

My current code tweaks the EXTENDED_HISTORY file format to change the current ending time (measured in seconds since the epoch) to an elapsed time (measured in seconds since the start time). The new code can still read the older format, and older shells shouldn't be adversely affected (they just get the finish time wrong). Anyone else like this?

Side note: I think there's a memory leak in the current resizehistents() function when it makes the list smaller, but I didn't diagnose this too much (I replaced it with a very simple list-trimming routine).

So, there's still some more finishing work to do, and I need to debug it a while longer, but the code is mostly done. Comments?
I'm planning to send a patch for pws 17 when things solidify (I'll also have a patch available for 3.0.6* when I get done, since I haven't switched over to running 3.1.x on a regular basis yet).

..wayne..