zsh-workers
 help / color / mirror / Atom feed
* [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
@ 2020-02-07  7:47 Michael Stapelberg
  2020-02-07 11:30 ` Mikael Magnusson
  2020-02-08  5:54 ` Bart Schaefer
  0 siblings, 2 replies; 5+ messages in thread
From: Michael Stapelberg @ 2020-02-07  7:47 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 651 bytes --]

[please cc me in replies, as I’m not subscribed to this list]

Before this change, zsh startup time was dominated by lseek(2) system calls on
the history file, as shown by strace -c:

time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97,35    1,112890           1    697153         1 lseek
  0,99    0,011314           2      5277           read
[…]

This change keeps track of read bytes and the position within the file, removing
all of these system calls.

I verified correctness of the change by comparing fpos with ftell(in) in every
loop iteration.

[-- Attachment #2: 0001-readhistfile-avoid-thousands-of-lseek-2-syscalls-via.patch --]
[-- Type: text/x-patch, Size: 2980 bytes --]

From 55ca0751f11432efab667451cc9b63ec0becc038 Mon Sep 17 00:00:00 2001
From: Michael Stapelberg <stapelberg@google.com>
Date: Fri, 7 Feb 2020 08:41:26 +0100
Subject: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via
 ftell()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Before this change, zsh startup time was dominated by lseek(2) system calls on
the history file, as shown by strace -c:

time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97,35    1,112890           1    697153         1 lseek
  0,99    0,011314           2      5277           read
[…]

This change keeps track of read bytes and the position within the file, removing
all of these system calls.

I verified correctness of the change by comparing fpos with ftell(in) in every
loop iteration.
---
 Src/hist.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/Src/hist.c b/Src/hist.c
index 5281e8718..12eef5294 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -2575,10 +2575,11 @@ resizehistents(void)
 }
 
 static int
-readhistline(int start, char **bufp, int *bufsiz, FILE *in)
+readhistline(int start, char **bufp, int *bufsiz, FILE *in, int *readbytes)
 {
     char *buf = *bufp;
     if (fgets(buf + start, *bufsiz - start, in)) {
+	*readbytes += strlen(buf + start);
 	int len = start + strlen(buf + start);
 	if (len == start)
 	    return -1;
@@ -2588,7 +2589,7 @@ readhistline(int start, char **bufp, int *bufsiz, FILE *in)
 		    return -1;
 		*bufp = zrealloc(buf, 2 * (*bufsiz));
 		*bufsiz = 2 * (*bufsiz);
-		return readhistline(len, bufp, bufsiz, in);
+		return readhistline(len, bufp, bufsiz, in, readbytes);
 	    }
 	}
 	else {
@@ -2596,7 +2597,7 @@ readhistline(int start, char **bufp, int *bufsiz, FILE *in)
 	    if (len > 1 && buf[len - 2] == '\\') {
 		buf[--len - 1] = '\n';
 		if (!feof(in))
-		    return readhistline(len, bufp, bufsiz, in);
+		    return readhistline(len, bufp, bufsiz, in, readbytes);
 	    }
 	}
 	return len;
@@ -2616,7 +2617,7 @@ readhistfile(char *fn, int err, int readflags)
     short *words;
     struct stat sb;
     int nwordpos, nwords, bufsiz;
-    int searching, newflags, l, ret, uselex;
+    int searching, newflags, l, ret, uselex, readbytes;
 
     if (!fn && !(fn = getsparam("HISTFILE")))
 	return;
@@ -2658,13 +2659,15 @@ readhistfile(char *fn, int err, int readflags)
 	} else
 	    searching = 0;
 
+	fpos = ftell(in);
+	readbytes = 0;
 	newflags = HIST_OLD | HIST_READ;
 	if (readflags & HFILE_FAST)
 	    newflags |= HIST_FOREIGN;
 	if (readflags & HFILE_SKIPOLD
 	 || (hist_ignore_all_dups && newflags & hist_skip_flags))
 	    newflags |= HIST_MAKEUNIQUE;
-	while (fpos = ftell(in), (l = readhistline(0, &buf, &bufsiz, in))) {
+	while (fpos += readbytes, readbytes = 0, (l = readhistline(0, &buf, &bufsiz, in, &readbytes))) {
 	    char *pt;
 	    int remeta = 0;
 
-- 
2.25.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
  2020-02-07  7:47 [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell() Michael Stapelberg
@ 2020-02-07 11:30 ` Mikael Magnusson
  2020-02-07 11:33   ` Michael Stapelberg
  2020-02-08  5:54 ` Bart Schaefer
  1 sibling, 1 reply; 5+ messages in thread
From: Mikael Magnusson @ 2020-02-07 11:30 UTC (permalink / raw)
  To: Michael Stapelberg; +Cc: zsh-workers

On 2/7/20, Michael Stapelberg <michael+zsh@stapelberg.ch> wrote:
> [please cc me in replies, as I’m not subscribed to this list]
>
> Before this change, zsh startup time was dominated by lseek(2) system calls
> on
> the history file, as shown by strace -c:
>
> time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  97,35    1,112890           1    697153         1 lseek
>   0,99    0,011314           2      5277           read
> […]
>
> This change keeps track of read bytes and the position within the file,
> removing
> all of these system calls.
>
> I verified correctness of the change by comparing fpos with ftell(in) in
> every
> loop iteration.

> +	*readbytes += strlen(buf + start);
>  	int len = start + strlen(buf + start);

int len = strlen(buf + start);
*readbytes += len;
len += start;

Even if you're convinced the compiler will optimize the double strlen
calls, the declaration has to come before code (I don't think we
require new enough C to not require this yet?).

-- 
Mikael Magnusson

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
  2020-02-07 11:30 ` Mikael Magnusson
@ 2020-02-07 11:33   ` Michael Stapelberg
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Stapelberg @ 2020-02-07 11:33 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1290 bytes --]

Done, thanks.

On Fri, Feb 7, 2020 at 12:30 PM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> On 2/7/20, Michael Stapelberg <michael+zsh@stapelberg.ch> wrote:
> > [please cc me in replies, as I’m not subscribed to this list]
> >
> > Before this change, zsh startup time was dominated by lseek(2) system calls
> > on
> > the history file, as shown by strace -c:
> >
> > time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ----------------
> >  97,35    1,112890           1    697153         1 lseek
> >   0,99    0,011314           2      5277           read
> > […]
> >
> > This change keeps track of read bytes and the position within the file,
> > removing
> > all of these system calls.
> >
> > I verified correctness of the change by comparing fpos with ftell(in) in
> > every
> > loop iteration.
>
> > +     *readbytes += strlen(buf + start);
> >       int len = start + strlen(buf + start);
>
> int len = strlen(buf + start);
> *readbytes += len;
> len += start;
>
> Even if you're convinced the compiler will optimize the double strlen
> calls, the declaration has to come before code (I don't think we
> require new enough C to not require this yet?).
>
> --
> Mikael Magnusson

[-- Attachment #2: 0001-readhistfile-avoid-thousands-of-lseek-2-syscalls-via.patch --]
[-- Type: text/x-patch, Size: 3047 bytes --]

From 4a110807581ebafeed8178fd177e9987f334ce9c Mon Sep 17 00:00:00 2001
From: Michael Stapelberg <stapelberg@google.com>
Date: Fri, 7 Feb 2020 08:41:26 +0100
Subject: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via
 ftell()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Before this change, zsh startup time was dominated by lseek(2) system calls on
the history file, as shown by strace -c:

time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97,35    1,112890           1    697153         1 lseek
  0,99    0,011314           2      5277           read
[…]

This change keeps track of read bytes and the position within the file, removing
all of these system calls.

I verified correctness of the change by comparing fpos with ftell(in) in every
loop iteration.
---
 Src/hist.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/Src/hist.c b/Src/hist.c
index 5281e8718..b44463e1b 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -2575,11 +2575,13 @@ resizehistents(void)
 }
 
 static int
-readhistline(int start, char **bufp, int *bufsiz, FILE *in)
+readhistline(int start, char **bufp, int *bufsiz, FILE *in, int *readbytes)
 {
     char *buf = *bufp;
     if (fgets(buf + start, *bufsiz - start, in)) {
-	int len = start + strlen(buf + start);
+	int len = strlen(buf + start);
+	*readbytes += len;
+	len += start;
 	if (len == start)
 	    return -1;
 	if (buf[len - 1] != '\n') {
@@ -2588,7 +2590,7 @@ readhistline(int start, char **bufp, int *bufsiz, FILE *in)
 		    return -1;
 		*bufp = zrealloc(buf, 2 * (*bufsiz));
 		*bufsiz = 2 * (*bufsiz);
-		return readhistline(len, bufp, bufsiz, in);
+		return readhistline(len, bufp, bufsiz, in, readbytes);
 	    }
 	}
 	else {
@@ -2596,7 +2598,7 @@ readhistline(int start, char **bufp, int *bufsiz, FILE *in)
 	    if (len > 1 && buf[len - 2] == '\\') {
 		buf[--len - 1] = '\n';
 		if (!feof(in))
-		    return readhistline(len, bufp, bufsiz, in);
+		    return readhistline(len, bufp, bufsiz, in, readbytes);
 	    }
 	}
 	return len;
@@ -2616,7 +2618,7 @@ readhistfile(char *fn, int err, int readflags)
     short *words;
     struct stat sb;
     int nwordpos, nwords, bufsiz;
-    int searching, newflags, l, ret, uselex;
+    int searching, newflags, l, ret, uselex, readbytes;
 
     if (!fn && !(fn = getsparam("HISTFILE")))
 	return;
@@ -2658,13 +2660,15 @@ readhistfile(char *fn, int err, int readflags)
 	} else
 	    searching = 0;
 
+	fpos = ftell(in);
+	readbytes = 0;
 	newflags = HIST_OLD | HIST_READ;
 	if (readflags & HFILE_FAST)
 	    newflags |= HIST_FOREIGN;
 	if (readflags & HFILE_SKIPOLD
 	 || (hist_ignore_all_dups && newflags & hist_skip_flags))
 	    newflags |= HIST_MAKEUNIQUE;
-	while (fpos = ftell(in), (l = readhistline(0, &buf, &bufsiz, in))) {
+	while (fpos += readbytes, readbytes = 0, (l = readhistline(0, &buf, &bufsiz, in, &readbytes))) {
 	    char *pt;
 	    int remeta = 0;
 
-- 
2.25.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
  2020-02-07  7:47 [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell() Michael Stapelberg
  2020-02-07 11:30 ` Mikael Magnusson
@ 2020-02-08  5:54 ` Bart Schaefer
  2020-02-08  7:47   ` Michael Stapelberg
  1 sibling, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2020-02-08  5:54 UTC (permalink / raw)
  To: Michael Stapelberg; +Cc: zsh-workers

On Thu, Feb 6, 2020 at 11:48 PM Michael Stapelberg
<michael+zsh@stapelberg.ch> wrote:
>
> Before this change, zsh startup time was dominated by lseek(2) system calls on
> the history file

Curious.  I would have thought stdio buffering would mask most of the
need for lseek()ing by ftell().

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
  2020-02-08  5:54 ` Bart Schaefer
@ 2020-02-08  7:47   ` Michael Stapelberg
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Stapelberg @ 2020-02-08  7:47 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

Apparently not :). I tried looking into the corresponding glibc code
to see if anything changed here, but it’s really hard to follow with
all its macros. AFAICT, though, ftell is just never buffered.

On Sat, Feb 8, 2020 at 6:54 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Thu, Feb 6, 2020 at 11:48 PM Michael Stapelberg
> <michael+zsh@stapelberg.ch> wrote:
> >
> > Before this change, zsh startup time was dominated by lseek(2) system calls on
> > the history file
>
> Curious.  I would have thought stdio buffering would mask most of the
> need for lseek()ing by ftell().

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-02-08  7:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-07  7:47 [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell() Michael Stapelberg
2020-02-07 11:30 ` Mikael Magnusson
2020-02-07 11:33   ` Michael Stapelberg
2020-02-08  5:54 ` Bart Schaefer
2020-02-08  7:47   ` Michael Stapelberg

zsh-workers

This inbox may be cloned and mirrored by anyone:

	git clone --mirror http://inbox.vuxu.org/zsh-workers

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 zsh-workers zsh-workers/ http://inbox.vuxu.org/zsh-workers \
		zsh-workers@zsh.org
	public-inbox-index zsh-workers

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.vuxu.org/vuxu.archive.zsh.workers


code repositories for the project(s) associated with this inbox:

	https://git.vuxu.org/mirror/zsh/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git