(bug report) Starting zsh takes ~5 seconds with the latest commits after 20th of September with big history file

zsh-workers
 help / color / mirror / code / Atom feed

* (bug report) Starting zsh takes ~5 seconds with the latest commits after 20th of September with big history file
@ 2010-09-26 18:12 Evgeny Burmentyev
  2010-09-26 20:15 ` Bart Schaefer
  0 siblings, 1 reply; 8+ messages in thread
From: Evgeny Burmentyev @ 2010-09-26 18:12 UTC (permalink / raw)
  To: zsh-workers

Linux arch 2.6.35-ARCH #1 SMP PREEMPT Tue Sep 21 11:00:23 UTC 2010 i686
Intel(R) Core(TM)2 Duo CPU T5870 @ 2.00GHz GenuineIntel GNU/Linux
HISTSIZE=50000
SAVEHIST=50000

With the current version from cvs:
Output of $strace zsh
http://sprunge.us/bKCj
Output of $time zsh (and ^D right after it)
zsh  10.35s user 0.06s system 82% cpu 12.613 total

With the version from cvs prior to 2010-09-10:
Output of $strace zsh
http://sprunge.us/behC
Output of $time zsh (and ^D right after it)
zsh  1.29s user 0.03s system 72% cpu 1.825 total

I did only these tests, but I'm pretty sure, it began not before 20th of
September. I use zsh on daily basis and try not to miss new commits.
And this huge delay happens not only when starting zsh, but after
logging off it as well (^D). Otherwise ok.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: (bug report) Starting zsh takes ~5 seconds with the latest commits after 20th of September with big history file
  2010-09-26 18:12 (bug report) Starting zsh takes ~5 seconds with the latest commits after 20th of September with big history file Evgeny Burmentyev
@ 2010-09-26 20:15 ` Bart Schaefer
  2010-09-27  1:26   ` Benjamin R. Haskell
  2010-10-02 18:58   ` Peter Stephenson
  0 siblings, 2 replies; 8+ messages in thread
From: Bart Schaefer @ 2010-09-26 20:15 UTC (permalink / raw)
  To: zsh-workers

I was somewhat concerned that this was going to happen.  This must be
because of zsh-workers/28295 (use lexer to split words from history).

The logging-off part is actually of more concern, it's probably because
you are sharing history and/or have some of the duplicate-suppression
options set.

Probably the right thing to do is:

(1) have an option for the new behavior (HIST_LEX_WORDS ?);

(2) implicitly turn that option off on the way through shell exit, so
    history rewrites/comparisons use faster processing.

-- 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: (bug report) Starting zsh takes ~5 seconds with the latest commits after 20th of September with big history file
  2010-09-26 20:15 ` Bart Schaefer
@ 2010-09-27  1:26   ` Benjamin R. Haskell
  2010-10-02 19:41     ` PATCH " Bart Schaefer
  2010-10-02 18:58   ` Peter Stephenson
  1 sibling, 1 reply; 8+ messages in thread
From: Benjamin R. Haskell @ 2010-09-27  1:26 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

On Sun, 26 Sep 2010, Bart Schaefer wrote:

> I was somewhat concerned that this was going to happen.  This must be 
> because of zsh-workers/28295 (use lexer to split words from history).
>
> The logging-off part is actually of more concern, it's probably 
> because you are sharing history and/or have some of the 
> duplicate-suppression options set.
>
> Probably the right thing to do is:
>
> (1) have an option for the new behavior (HIST_LEX_WORDS ?);
>
> (2) implicitly turn that option off on the way through shell exit, so
>    history rewrites/comparisons use faster processing.
>

Not that confirmation was needed, but yes, that's the commit that did 
it:

$ git tag workers-28295 5ca08ec

# see git-build-zsh script below...

$ git build-zsh workers-28295~1
$ time /tmp/zsh-git-workers-28295\~1/bin/zsh -i =(print -l exit) > /dev/null
[...] 1.224 total

$ git build-zsh workers-28295
$ time /tmp/zsh-git-workers-28295/bin/zsh -i =(print -l exit) > /dev/null
[...] 5.108 total

Personally, I don't think I'd ever turn on a HIST_LEX_WORDS option.  (As 
mentioned in the other thread, I was just glad there was a simple 
explanation for the seemingly buggy behavior.)

Starting to regret noticing the problem in the first place. :-\

On the other hand it might be yet-another kick in the pants I need to 
start hacking a DB-based history mechanism.

[The first kick:]

I like to keep my entire history (HISTSIZE=50000000, SAVEHIST=$HISTSIZE, 
setopt extended_history).  But, occasionally I'll let my home dir 
partition fill up (usually from forgetting that I have a Wireshark 
instance up in the background, slowly chewing up disk space).  After I 
clear out some space, the entire history file gets its timestamps reset 
to the current time.  I check it in to git daily, but still, it's 
frustrating to lose any history.

-- 
Best,
Ben

==> git-build-zsh <==
#!/bin/zsh

die () { echo "\e[31m"$argv"\e[0m" ; exit 1 }
(( $#argv )) || die "Usage: $0 revision"

script=$0:t
rev=$1
TRAPZERR () { die "Failed: $last_cmd" }
TRAPDEBUG () {
 	last_cmd=$ZSH_DEBUG_CMD
 	print -n "\e]2;$last_cmd - $script $rev\a"
}

git co $rev
autoreconf
./configure --prefix=/tmp/zsh-git-$rev --enable-multibyte
make -j5
make install

^ permalink raw reply	[flat|nested] 8+ messages in thread

* PATCH Re: (bug report) Starting zsh takes ~5 seconds with the latest commits after 20th of September with big history file
  2010-09-27  1:26   ` Benjamin R. Haskell
@ 2010-10-02 19:41     ` Bart Schaefer
  2010-10-02 19:54       ` Bart Schaefer
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Schaefer @ 2010-10-02 19:41 UTC (permalink / raw)
  To: zsh-workers

On Sep 26,  9:26pm, Benjamin R. Haskell wrote:
} Subject: Re: (bug report) Starting zsh takes ~5 seconds with the latest co
}
} On Sun, 26 Sep 2010, Bart Schaefer wrote:
} 
} > Probably the right thing to do is:
} >
} > (1) have an option for the new behavior (HIST_LEX_WORDS ?);
} >
} > (2) implicitly turn that option off on the way through shell exit, so
} >    history rewrites/comparisons use faster processing.
} 
} Personally, I don't think I'd ever turn on a HIST_LEX_WORDS option.
} 
} Starting to regret noticing the problem in the first place. :-\

I'm sure it's best that somebody did.

Here's a patch for part (2) of the suggestion above, I'll leave it to
someone else to add the option -- it should be pretty obvious where to
check for it.

Most of the diff is re-indentation.  This patch *follows* 28295 (plus
the unposted tweak that PWS committed).

Index: Src/hist.c
--- ../current/Src/hist.c	2010-09-26 13:05:43.000000000 -0700
+++ Src/hist.c	2010-10-02 12:33:08.000000000 -0700
@@ -2335,61 +2335,84 @@
 	    else
 		he->ftim = ftim;
 
-	    /*
-	     * Divide up the words.  Attempt to do this using the lexer.
-	     */
 	    nwordpos = 0;
 	    start = pt;
-	    wordlist = bufferwords(NULL, pt, NULL);
-	    he->nwords = countlinknodes(wordlist);
-	    if (2*he->nwords > nwords) {
-		nwords = 2*he->nwords;
-		words = (short *)realloc(words, nwords*sizeof(short));
-	    }
-	    while (firstnode(wordlist)) {
-		char *word = uremnode(wordlist, firstnode(wordlist));
-		
-		while (inblank(*pt))
-		    pt++;
-		if (!strpfx(word, pt)) {
-		    int bad = 0;
-		    /*
-		     * Oddity 1: newlines turn into semicolons.
-		     */
-		    if (!strcmp(word, ";"))
-			continue;
-		    /*
-		     * Oddity 2: !'s turn into |'s.
-		     */
-		    while (*pt) {
-			if (!*word) {
-			    bad = 1;
-			    break;
-			}
-			if (*pt == *word ||
-			    (*pt == '!' && *word == '|')) {
+
+	    if (readflags & HFILE_FAST) {
+		/*
+		 * Divide up the words.  We don't care how it lexes,
+		 * so just look for white-space.
+		 */
+		do {
+		    while (inblank(*pt))
+			pt++;
+		    if (*pt) {
+			if (nwordpos >= nwords)
+			    words = (short *) realloc(words,
+						      (nwords += 64)*sizeof(short));
+			words[nwordpos++] = pt - start;
+			while (*pt && !inblank(*pt))
 			    pt++;
-			    word++;
-			} else {
-			    bad = 1;
-			    break;
-			}
+			words[nwordpos++] = pt - start;
 		    }
-		    if (bad) {
+		} while (*pt);
+
+		he->nwords = nwordpos/2;
+	    } else {
+		/*
+		 * Divide up the words.  Attempt to do this using the lexer.
+		 */
+		wordlist = bufferwords(NULL, pt, NULL);
+		he->nwords = countlinknodes(wordlist);
+		if (2*he->nwords > nwords) {
+		    nwords = 2*he->nwords;
+		    words = (short *)realloc(words, nwords*sizeof(short));
+		}
+		while (firstnode(wordlist)) {
+		    char *word = uremnode(wordlist, firstnode(wordlist));
+		
+		    while (inblank(*pt))
+			pt++;
+		    if (!strpfx(word, pt)) {
+			int bad = 0;
+			/*
+			 * Oddity 1: newlines turn into semicolons.
+			 */
+			if (!strcmp(word, ";"))
+			    continue;
+			/*
+			 * Oddity 2: !'s turn into |'s.
+			 */
+			while (*pt) {
+			    if (!*word) {
+				bad = 1;
+				break;
+			    }
+			    if (*pt == *word ||
+				(*pt == '!' && *word == '|')) {
+				pt++;
+				word++;
+			    } else {
+				bad = 1;
+				break;
+			    }
+			}
+			if (bad) {
 #ifdef DEBUG
-			dputs(ERRMSG("bad wordsplit reading history: %s\nat: %s"
-				     "\nword: %s"),
-			      start, pt, word);
+			    dputs(ERRMSG("bad wordsplit reading history: %s"
+					 "\nat: %s\nword: %s"),
+				  start, pt, word);
 #endif
-			words[nwordpos++] = pt - start;
-			pt += strlen(pt);
-			words[nwordpos++] = pt - start;
-			break;
+			    words[nwordpos++] = pt - start;
+			    pt += strlen(pt);
+			    words[nwordpos++] = pt - start;
+			    break;
+			}
 		    }
+		    words[nwordpos++] = pt - start;
+		    pt += strlen(word);
+		    words[nwordpos++] = pt - start;
 		}
-		words[nwordpos++] = pt - start;
-		pt += strlen(word);
-		words[nwordpos++] = pt - start;
 	    }
 
 	    if (he->nwords) {

-- 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PATCH Re: (bug report) Starting zsh takes ~5 seconds with the latest commits after 20th of September with big history file
  2010-10-02 19:41     ` PATCH " Bart Schaefer
@ 2010-10-02 19:54       ` Bart Schaefer
  0 siblings, 0 replies; 8+ messages in thread
From: Bart Schaefer @ 2010-10-02 19:54 UTC (permalink / raw)
  To: zsh-workers

On Oct 2, 12:41pm, Bart Schaefer wrote:
}
} Here's a patch for part (2) of the suggestion above, I'll leave it to
} someone else to add the option

Ah, hrm.  Apparently PWS and I were doing the same thing at the same
instant, eight timezones apart ... I'll withdraw my patch in favor of
28310.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: (bug report) Starting zsh takes ~5 seconds with the latest commits after 20th of September with big history file
  2010-09-26 20:15 ` Bart Schaefer
  2010-09-27  1:26   ` Benjamin R. Haskell
@ 2010-10-02 18:58   ` Peter Stephenson
  2010-10-02 20:04     ` Bart Schaefer
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Stephenson @ 2010-10-02 18:58 UTC (permalink / raw)
  To: zsh-workers

On Sun, 26 Sep 2010 13:15:27 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> (1) have an option for the new behavior (HIST_LEX_WORDS ?);

This adds that.

> (2) implicitly turn that option off on the way through shell exit, so
>     history rewrites/comparisons use faster processing.

Sounds like the shell is already doing too much work here.

Index: Doc/Zsh/options.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/options.yo,v
retrieving revision 1.95
diff -p -u -r1.95 options.yo
--- Doc/Zsh/options.yo	12 Sep 2010 18:56:40 -0000	1.95
+++ Doc/Zsh/options.yo	2 Oct 2010 18:56:16 -0000
@@ -836,6 +836,21 @@ command is entered before it vanishes, a
 or edit the line.  If you want to make it vanish right away without
 entering another command, type a space and press return.
 )
+pindex(HIST_LEX_WORDS)
+pindex(NO_HIST_LEX_WORDS)
+pindex(HISTLEXWORDS)
+pindex(NOHISTLEXWORDS)
+item(tt(HIST_LEX_WORDS))(
+By default, shell history that is read in from files is split into
+words on all white space.  This means that arguments with quoted
+whitespace are not correctly handled, with the consequence that
+references to words in history lines that have been read from a file
+may be inaccurate.  When this option is set, words read in from a
+history file are divided up in a similar fashion to normal shell
+command line handling.  Although this produces more accurately delimited
+words, if the size of the history file is large this can be slow.  Trial
+and error is necessary to decide.
+)
 pindex(HIST_NO_FUNCTIONS)
 pindex(NO_HIST_NO_FUNCTIONS)
 pindex(HISTNOFUNCTIONS)
Index: Src/hist.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/hist.c,v
retrieving revision 1.100
diff -p -u -r1.100 hist.c
--- Src/hist.c	25 Sep 2010 18:49:37 -0000	1.100
+++ Src/hist.c	2 Oct 2010 18:56:17 -0000
@@ -2232,7 +2232,6 @@ readhistfile(char *fn, int err, int read
     struct stat sb;
     int nwordpos, nwords, bufsiz;
     int searching, newflags, l, ret;
-    LinkList wordlist;
    
     if (!fn && !(fn = getsparam("HISTFILE")))
 	return;
@@ -2336,60 +2335,81 @@ readhistfile(char *fn, int err, int read
 		he->ftim = ftim;
 
 	    /*
-	     * Divide up the words.  Attempt to do this using the lexer.
+	     * Divide up the words.
 	     */
 	    nwordpos = 0;
 	    start = pt;
-	    wordlist = bufferwords(NULL, pt, NULL);
-	    he->nwords = countlinknodes(wordlist);
-	    if (2*he->nwords > nwords) {
-		nwords = 2*he->nwords;
-		words = (short *)realloc(words, nwords*sizeof(short));
-	    }
-	    while (firstnode(wordlist)) {
-		char *word = uremnode(wordlist, firstnode(wordlist));
+	    if (isset(HISTLEXWORDS)) {
+		/*
+		 * Attempt to do this using the lexer.
+		 */
+		LinkList wordlist = bufferwords(NULL, pt, NULL);
+		he->nwords = countlinknodes(wordlist);
+		if (2*he->nwords > nwords) {
+		    nwords = 2*he->nwords;
+		    words = (short *)realloc(words, nwords*sizeof(short));
+		}
+		while (firstnode(wordlist)) {
+		    char *word = uremnode(wordlist, firstnode(wordlist));
 		
-		while (inblank(*pt))
-		    pt++;
-		if (!strpfx(word, pt)) {
-		    int bad = 0;
-		    /*
-		     * Oddity 1: newlines turn into semicolons.
-		     */
-		    if (!strcmp(word, ";"))
-			continue;
-		    /*
-		     * Oddity 2: !'s turn into |'s.
-		     */
-		    while (*pt) {
-			if (!*word) {
-			    bad = 1;
-			    break;
+		    while (inblank(*pt))
+			pt++;
+		    if (!strpfx(word, pt)) {
+			int bad = 0;
+			/*
+			 * Oddity 1: newlines turn into semicolons.
+			 */
+			if (!strcmp(word, ";"))
+			    continue;
+			/*
+			 * Oddity 2: !'s turn into |'s.
+			 */
+			while (*pt) {
+			    if (!*word) {
+				bad = 1;
+				break;
+			    }
+			    if (*pt == *word ||
+				(*pt == '!' && *word == '|')) {
+				pt++;
+				word++;
+			    } else {
+				bad = 1;
+				break;
+			    }
 			}
-			if (*pt == *word ||
-			    (*pt == '!' && *word == '|')) {
-			    pt++;
-			    word++;
-			} else {
-			    bad = 1;
+			if (bad) {
+#ifdef DEBUG
+			    dputs(ERRMSG("bad wordsplit reading history: "
+					 "%s\nat: %s\nword: %s"),
+				  start, pt, word);
+#endif
+			    words[nwordpos++] = pt - start;
+			    pt += strlen(pt);
+			    words[nwordpos++] = pt - start;
 			    break;
 			}
 		    }
-		    if (bad) {
-#ifdef DEBUG
-			dputs(ERRMSG("bad wordsplit reading history: %s\nat: %s"
-				     "\nword: %s"),
-			      start, pt, word);
-#endif
+		    words[nwordpos++] = pt - start;
+		    pt += strlen(word);
+		    words[nwordpos++] = pt - start;
+		}
+	    } else {
+		do {
+		    while (inblank(*pt))
+			pt++;
+		    if (*pt) {
+			if (nwordpos >= nwords)
+			    words = (short *)
+				realloc(words, (nwords += 64)*sizeof(short));
 			words[nwordpos++] = pt - start;
-			pt += strlen(pt);
+			while (*pt && !inblank(*pt))
+			    pt++;
 			words[nwordpos++] = pt - start;
-			break;
 		    }
-		}
-		words[nwordpos++] = pt - start;
-		pt += strlen(word);
-		words[nwordpos++] = pt - start;
+		} while (*pt);
+
+		he->nwords = nwordpos/2;
 	    }
 
 	    if (he->nwords) {
Index: Src/options.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/options.c,v
retrieving revision 1.55
diff -p -u -r1.55 options.c
--- Src/options.c	12 Sep 2010 18:56:41 -0000	1.55
+++ Src/options.c	2 Oct 2010 18:56:17 -0000
@@ -149,6 +149,7 @@ static struct optname optns[] = {
 {{NULL, "histignorealldups",  0},			 HISTIGNOREALLDUPS},
 {{NULL, "histignoredups",     0},			 HISTIGNOREDUPS},
 {{NULL, "histignorespace",    0},			 HISTIGNORESPACE},
+{{NULL, "histlexwords",	      0},			 HISTLEXWORDS},
 {{NULL, "histnofunctions",    0},			 HISTNOFUNCTIONS},
 {{NULL, "histnostore",	      0},			 HISTNOSTORE},
 {{NULL, "histsubstpattern",   OPT_EMULATE},              HISTSUBSTPATTERN},
Index: Src/zsh.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/zsh.h,v
retrieving revision 1.168
diff -p -u -r1.168 zsh.h
--- Src/zsh.h	14 Sep 2010 14:46:26 -0000	1.168
+++ Src/zsh.h	2 Oct 2010 18:56:18 -0000
@@ -1942,6 +1942,7 @@ enum {
     HISTIGNOREALLDUPS,
     HISTIGNOREDUPS,
     HISTIGNORESPACE,
+    HISTLEXWORDS,
     HISTNOFUNCTIONS,
     HISTNOSTORE,
     HISTREDUCEBLANKS,

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: (bug report) Starting zsh takes ~5 seconds with the latest commits after 20th of September with big history file
  2010-10-02 18:58   ` Peter Stephenson
@ 2010-10-02 20:04     ` Bart Schaefer
  2010-10-02 21:06       ` Peter Stephenson
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Schaefer @ 2010-10-02 20:04 UTC (permalink / raw)
  To: zsh-workers

On Oct 2,  7:58pm, Peter Stephenson wrote:
}
} On Sun, 26 Sep 2010 13:15:27 -0700
} Bart Schaefer <schaefer@brasslantern.com> wrote:
} > (2) implicitly turn that option off on the way through shell exit, so
} >     history rewrites/comparisons use faster processing.
} 
} Sounds like the shell is already doing too much work here.

I think by adding the (readflags & HFILE_FAST) test to the check for
isset(HISTLEXWORDS), both problems are solved.  Do that for commit?

-- 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: (bug report) Starting zsh takes ~5 seconds with the latest commits after 20th of September with big history file
  2010-10-02 20:04     ` Bart Schaefer
@ 2010-10-02 21:06       ` Peter Stephenson
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Stephenson @ 2010-10-02 21:06 UTC (permalink / raw)
  To: zsh-workers

On Sat, 02 Oct 2010 13:04:54 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 2,  7:58pm, Peter Stephenson wrote:
> }
> } On Sun, 26 Sep 2010 13:15:27 -0700
> } Bart Schaefer <schaefer@brasslantern.com> wrote:
> } > (2) implicitly turn that option off on the way through shell exit, so
> } >     history rewrites/comparisons use faster processing.
> } 
> } Sounds like the shell is already doing too much work here.
> 
> I think by adding the (readflags & HFILE_FAST) test to the check for
> isset(HISTLEXWORDS), both problems are solved.  Do that for commit?

i.e. !(readflags & HFILE_FAST) in that branch.  I've done that.  Should
have come back from holiday an hour later.

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-10-02 21:07 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-26 18:12 (bug report) Starting zsh takes ~5 seconds with the latest commits after 20th of September with big history file Evgeny Burmentyev
2010-09-26 20:15 ` Bart Schaefer
2010-09-27  1:26   ` Benjamin R. Haskell
2010-10-02 19:41     ` PATCH " Bart Schaefer
2010-10-02 19:54       ` Bart Schaefer
2010-10-02 18:58   ` Peter Stephenson
2010-10-02 20:04     ` Bart Schaefer
2010-10-02 21:06       ` Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).