zsh-workers
 help / color / mirror / code / Atom feed
* [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
@ 2020-02-07  7:47 Michael Stapelberg
  2020-02-07 11:30 ` Mikael Magnusson
  2020-02-08  5:54 ` Bart Schaefer
  0 siblings, 2 replies; 7+ messages in thread
From: Michael Stapelberg @ 2020-02-07  7:47 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 651 bytes --]

[please cc me in replies, as I’m not subscribed to this list]

Before this change, zsh startup time was dominated by lseek(2) system calls on
the history file, as shown by strace -c:

time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97,35    1,112890           1    697153         1 lseek
  0,99    0,011314           2      5277           read
[…]

This change keeps track of read bytes and the position within the file, removing
all of these system calls.

I verified correctness of the change by comparing fpos with ftell(in) in every
loop iteration.

[-- Attachment #2: 0001-readhistfile-avoid-thousands-of-lseek-2-syscalls-via.patch --]
[-- Type: text/x-patch, Size: 2980 bytes --]

From 55ca0751f11432efab667451cc9b63ec0becc038 Mon Sep 17 00:00:00 2001
From: Michael Stapelberg <stapelberg@google.com>
Date: Fri, 7 Feb 2020 08:41:26 +0100
Subject: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via
 ftell()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Before this change, zsh startup time was dominated by lseek(2) system calls on
the history file, as shown by strace -c:

time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97,35    1,112890           1    697153         1 lseek
  0,99    0,011314           2      5277           read
[…]

This change keeps track of read bytes and the position within the file, removing
all of these system calls.

I verified correctness of the change by comparing fpos with ftell(in) in every
loop iteration.
---
 Src/hist.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/Src/hist.c b/Src/hist.c
index 5281e8718..12eef5294 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -2575,10 +2575,11 @@ resizehistents(void)
 }
 
 static int
-readhistline(int start, char **bufp, int *bufsiz, FILE *in)
+readhistline(int start, char **bufp, int *bufsiz, FILE *in, int *readbytes)
 {
     char *buf = *bufp;
     if (fgets(buf + start, *bufsiz - start, in)) {
+	*readbytes += strlen(buf + start);
 	int len = start + strlen(buf + start);
 	if (len == start)
 	    return -1;
@@ -2588,7 +2589,7 @@ readhistline(int start, char **bufp, int *bufsiz, FILE *in)
 		    return -1;
 		*bufp = zrealloc(buf, 2 * (*bufsiz));
 		*bufsiz = 2 * (*bufsiz);
-		return readhistline(len, bufp, bufsiz, in);
+		return readhistline(len, bufp, bufsiz, in, readbytes);
 	    }
 	}
 	else {
@@ -2596,7 +2597,7 @@ readhistline(int start, char **bufp, int *bufsiz, FILE *in)
 	    if (len > 1 && buf[len - 2] == '\\') {
 		buf[--len - 1] = '\n';
 		if (!feof(in))
-		    return readhistline(len, bufp, bufsiz, in);
+		    return readhistline(len, bufp, bufsiz, in, readbytes);
 	    }
 	}
 	return len;
@@ -2616,7 +2617,7 @@ readhistfile(char *fn, int err, int readflags)
     short *words;
     struct stat sb;
     int nwordpos, nwords, bufsiz;
-    int searching, newflags, l, ret, uselex;
+    int searching, newflags, l, ret, uselex, readbytes;
 
     if (!fn && !(fn = getsparam("HISTFILE")))
 	return;
@@ -2658,13 +2659,15 @@ readhistfile(char *fn, int err, int readflags)
 	} else
 	    searching = 0;
 
+	fpos = ftell(in);
+	readbytes = 0;
 	newflags = HIST_OLD | HIST_READ;
 	if (readflags & HFILE_FAST)
 	    newflags |= HIST_FOREIGN;
 	if (readflags & HFILE_SKIPOLD
 	 || (hist_ignore_all_dups && newflags & hist_skip_flags))
 	    newflags |= HIST_MAKEUNIQUE;
-	while (fpos = ftell(in), (l = readhistline(0, &buf, &bufsiz, in))) {
+	while (fpos += readbytes, readbytes = 0, (l = readhistline(0, &buf, &bufsiz, in, &readbytes))) {
 	    char *pt;
 	    int remeta = 0;
 
-- 
2.25.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
  2020-02-07  7:47 [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell() Michael Stapelberg
@ 2020-02-07 11:30 ` Mikael Magnusson
  2020-02-07 11:33   ` Michael Stapelberg
  2020-02-08  5:54 ` Bart Schaefer
  1 sibling, 1 reply; 7+ messages in thread
From: Mikael Magnusson @ 2020-02-07 11:30 UTC (permalink / raw)
  To: Michael Stapelberg; +Cc: zsh-workers

On 2/7/20, Michael Stapelberg <michael+zsh@stapelberg.ch> wrote:
> [please cc me in replies, as I’m not subscribed to this list]
>
> Before this change, zsh startup time was dominated by lseek(2) system calls
> on
> the history file, as shown by strace -c:
>
> time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  97,35    1,112890           1    697153         1 lseek
>   0,99    0,011314           2      5277           read
> […]
>
> This change keeps track of read bytes and the position within the file,
> removing
> all of these system calls.
>
> I verified correctness of the change by comparing fpos with ftell(in) in
> every
> loop iteration.

> +	*readbytes += strlen(buf + start);
>  	int len = start + strlen(buf + start);

int len = strlen(buf + start);
*readbytes += len;
len += start;

Even if you're convinced the compiler will optimize the double strlen
calls, the declaration has to come before code (I don't think we
require new enough C to not require this yet?).

-- 
Mikael Magnusson

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
  2020-02-07 11:30 ` Mikael Magnusson
@ 2020-02-07 11:33   ` Michael Stapelberg
  2021-04-10  0:54     ` Oliver Kiddle
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Stapelberg @ 2020-02-07 11:33 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1290 bytes --]

Done, thanks.

On Fri, Feb 7, 2020 at 12:30 PM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> On 2/7/20, Michael Stapelberg <michael+zsh@stapelberg.ch> wrote:
> > [please cc me in replies, as I’m not subscribed to this list]
> >
> > Before this change, zsh startup time was dominated by lseek(2) system calls
> > on
> > the history file, as shown by strace -c:
> >
> > time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ----------------
> >  97,35    1,112890           1    697153         1 lseek
> >   0,99    0,011314           2      5277           read
> > […]
> >
> > This change keeps track of read bytes and the position within the file,
> > removing
> > all of these system calls.
> >
> > I verified correctness of the change by comparing fpos with ftell(in) in
> > every
> > loop iteration.
>
> > +     *readbytes += strlen(buf + start);
> >       int len = start + strlen(buf + start);
>
> int len = strlen(buf + start);
> *readbytes += len;
> len += start;
>
> Even if you're convinced the compiler will optimize the double strlen
> calls, the declaration has to come before code (I don't think we
> require new enough C to not require this yet?).
>
> --
> Mikael Magnusson

[-- Attachment #2: 0001-readhistfile-avoid-thousands-of-lseek-2-syscalls-via.patch --]
[-- Type: text/x-patch, Size: 3047 bytes --]

From 4a110807581ebafeed8178fd177e9987f334ce9c Mon Sep 17 00:00:00 2001
From: Michael Stapelberg <stapelberg@google.com>
Date: Fri, 7 Feb 2020 08:41:26 +0100
Subject: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via
 ftell()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Before this change, zsh startup time was dominated by lseek(2) system calls on
the history file, as shown by strace -c:

time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97,35    1,112890           1    697153         1 lseek
  0,99    0,011314           2      5277           read
[…]

This change keeps track of read bytes and the position within the file, removing
all of these system calls.

I verified correctness of the change by comparing fpos with ftell(in) in every
loop iteration.
---
 Src/hist.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/Src/hist.c b/Src/hist.c
index 5281e8718..b44463e1b 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -2575,11 +2575,13 @@ resizehistents(void)
 }
 
 static int
-readhistline(int start, char **bufp, int *bufsiz, FILE *in)
+readhistline(int start, char **bufp, int *bufsiz, FILE *in, int *readbytes)
 {
     char *buf = *bufp;
     if (fgets(buf + start, *bufsiz - start, in)) {
-	int len = start + strlen(buf + start);
+	int len = strlen(buf + start);
+	*readbytes += len;
+	len += start;
 	if (len == start)
 	    return -1;
 	if (buf[len - 1] != '\n') {
@@ -2588,7 +2590,7 @@ readhistline(int start, char **bufp, int *bufsiz, FILE *in)
 		    return -1;
 		*bufp = zrealloc(buf, 2 * (*bufsiz));
 		*bufsiz = 2 * (*bufsiz);
-		return readhistline(len, bufp, bufsiz, in);
+		return readhistline(len, bufp, bufsiz, in, readbytes);
 	    }
 	}
 	else {
@@ -2596,7 +2598,7 @@ readhistline(int start, char **bufp, int *bufsiz, FILE *in)
 	    if (len > 1 && buf[len - 2] == '\\') {
 		buf[--len - 1] = '\n';
 		if (!feof(in))
-		    return readhistline(len, bufp, bufsiz, in);
+		    return readhistline(len, bufp, bufsiz, in, readbytes);
 	    }
 	}
 	return len;
@@ -2616,7 +2618,7 @@ readhistfile(char *fn, int err, int readflags)
     short *words;
     struct stat sb;
     int nwordpos, nwords, bufsiz;
-    int searching, newflags, l, ret, uselex;
+    int searching, newflags, l, ret, uselex, readbytes;
 
     if (!fn && !(fn = getsparam("HISTFILE")))
 	return;
@@ -2658,13 +2660,15 @@ readhistfile(char *fn, int err, int readflags)
 	} else
 	    searching = 0;
 
+	fpos = ftell(in);
+	readbytes = 0;
 	newflags = HIST_OLD | HIST_READ;
 	if (readflags & HFILE_FAST)
 	    newflags |= HIST_FOREIGN;
 	if (readflags & HFILE_SKIPOLD
 	 || (hist_ignore_all_dups && newflags & hist_skip_flags))
 	    newflags |= HIST_MAKEUNIQUE;
-	while (fpos = ftell(in), (l = readhistline(0, &buf, &bufsiz, in))) {
+	while (fpos += readbytes, readbytes = 0, (l = readhistline(0, &buf, &bufsiz, in, &readbytes))) {
 	    char *pt;
 	    int remeta = 0;
 
-- 
2.25.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
  2020-02-07  7:47 [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell() Michael Stapelberg
  2020-02-07 11:30 ` Mikael Magnusson
@ 2020-02-08  5:54 ` Bart Schaefer
  2020-02-08  7:47   ` Michael Stapelberg
  1 sibling, 1 reply; 7+ messages in thread
From: Bart Schaefer @ 2020-02-08  5:54 UTC (permalink / raw)
  To: Michael Stapelberg; +Cc: zsh-workers

On Thu, Feb 6, 2020 at 11:48 PM Michael Stapelberg
<michael+zsh@stapelberg.ch> wrote:
>
> Before this change, zsh startup time was dominated by lseek(2) system calls on
> the history file

Curious.  I would have thought stdio buffering would mask most of the
need for lseek()ing by ftell().

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
  2020-02-08  5:54 ` Bart Schaefer
@ 2020-02-08  7:47   ` Michael Stapelberg
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Stapelberg @ 2020-02-08  7:47 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

Apparently not :). I tried looking into the corresponding glibc code
to see if anything changed here, but it’s really hard to follow with
all its macros. AFAICT, though, ftell is just never buffered.

On Sat, Feb 8, 2020 at 6:54 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Thu, Feb 6, 2020 at 11:48 PM Michael Stapelberg
> <michael+zsh@stapelberg.ch> wrote:
> >
> > Before this change, zsh startup time was dominated by lseek(2) system calls on
> > the history file
>
> Curious.  I would have thought stdio buffering would mask most of the
> need for lseek()ing by ftell().

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
  2020-02-07 11:33   ` Michael Stapelberg
@ 2021-04-10  0:54     ` Oliver Kiddle
  2021-04-10 17:31       ` Bart Schaefer
  0 siblings, 1 reply; 7+ messages in thread
From: Oliver Kiddle @ 2021-04-10  0:54 UTC (permalink / raw)
  To: zsh-workers

On 7 Feb 2020, Michael Stapelberg wrote:
> Before this change, zsh startup time was dominated by lseek(2) system calls on
> the history file, as shown by strace -c:

This patch never got applied. 45768 is almost identical.

I tried timing the readhistline function with dtrace on FreeBSD to see
if this is perhaps only relevant with glibc. It didn't appear to result
in particularly great savings however when measuring a few times, I did
consistently get better performance with the patch than without. That was
with a fast local disk and of the order of microseconds - certainly not
enough to be discernable. Over NFS, timings were erratic.

Having read through both patches, nothing caught my attention as being
a concern. 45396 keeps track of number of bytes read while 45768 does
the same with file position in an off_t variable. The first doesn't
duplicate a strlen() call which is preferable but the second patch could
be tweaked to do likewise. Does anyone have a preference between them?
If nobody else has any particular views on the patch(es), I'm inclined to
apply, probably the first one.

Oliver


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
  2021-04-10  0:54     ` Oliver Kiddle
@ 2021-04-10 17:31       ` Bart Schaefer
  0 siblings, 0 replies; 7+ messages in thread
From: Bart Schaefer @ 2021-04-10 17:31 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh hackers list

On Fri, Apr 9, 2021 at 6:06 PM Oliver Kiddle <okiddle@yahoo.co.uk> wrote:
>
> If nobody else has any particular views on the patch(es), I'm inclined to
> apply, probably the first one.

I think that's fine.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-04-10 17:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-07  7:47 [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell() Michael Stapelberg
2020-02-07 11:30 ` Mikael Magnusson
2020-02-07 11:33   ` Michael Stapelberg
2021-04-10  0:54     ` Oliver Kiddle
2021-04-10 17:31       ` Bart Schaefer
2020-02-08  5:54 ` Bart Schaefer
2020-02-08  7:47   ` Michael Stapelberg

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).