From: Michael Stapelberg <michael+zsh@stapelberg.ch>
To: Mikael Magnusson <mikachu@gmail.com>
Cc: zsh-workers@zsh.org
Subject: Re: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via ftell()
Date: Fri, 7 Feb 2020 12:33:32 +0100
Message-ID: <CANnVG6mmK+i87hSu1KUxs9APyeXVHiwf1oZpokCgYSEu4UP_tQ@mail.gmail.com>
In-Reply-To: <CAHYJk3T+4VwToio67q0UeQ4vmcTJopvZiwYXP18+usfyY+LYoA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1290 bytes --]
Done, thanks.
On Fri, Feb 7, 2020 at 12:30 PM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> On 2/7/20, Michael Stapelberg <michael+zsh@stapelberg.ch> wrote:
> > [please cc me in replies, as I’m not subscribed to this list]
> >
> > Before this change, zsh startup time was dominated by lseek(2) system calls
> > on the history file, as shown by strace -c:
> >
> > % time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ----------------
> >  97,35    1,112890           1    697153         1 lseek
> >   0,99    0,011314           2      5277           read
> > […]
> >
> > This change keeps track of read bytes and the position within the file,
> > removing all of these system calls.
> >
> > I verified correctness of the change by comparing fpos with ftell(in) in
> > every loop iteration.
>
> > + *readbytes += strlen(buf + start);
> > int len = start + strlen(buf + start);
>
> int len = strlen(buf + start);
> *readbytes += len;
> len += start;
>
> Even if you're convinced the compiler will optimize the double strlen
> calls, the declaration has to come before code (I don't think we
> require new enough C to not require this yet?).
>
> --
> Mikael Magnusson
[-- Attachment #2: 0001-readhistfile-avoid-thousands-of-lseek-2-syscalls-via.patch --]
[-- Type: text/x-patch, Size: 3047 bytes --]
From 4a110807581ebafeed8178fd177e9987f334ce9c Mon Sep 17 00:00:00 2001
From: Michael Stapelberg <stapelberg@google.com>
Date: Fri, 7 Feb 2020 08:41:26 +0100
Subject: [PATCH] readhistfile: avoid thousands of lseek(2) syscalls via
 ftell()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Before this change, zsh startup time was dominated by lseek(2) system calls on
the history file, as shown by strace -c:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97,35    1,112890           1    697153         1 lseek
  0,99    0,011314           2      5277           read
[…]

This change keeps track of read bytes and the position within the file, removing
all of these system calls.

I verified correctness of the change by comparing fpos with ftell(in) in every
loop iteration.
---
Src/hist.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/Src/hist.c b/Src/hist.c
index 5281e8718..b44463e1b 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -2575,11 +2575,13 @@ resizehistents(void)
 }
 
 static int
-readhistline(int start, char **bufp, int *bufsiz, FILE *in)
+readhistline(int start, char **bufp, int *bufsiz, FILE *in, int *readbytes)
 {
     char *buf = *bufp;
     if (fgets(buf + start, *bufsiz - start, in)) {
-        int len = start + strlen(buf + start);
+        int len = strlen(buf + start);
+        *readbytes += len;
+        len += start;
         if (len == start)
             return -1;
         if (buf[len - 1] != '\n') {
@@ -2588,7 +2590,7 @@ readhistline(int start, char **bufp, int *bufsiz, FILE *in)
                     return -1;
                 *bufp = zrealloc(buf, 2 * (*bufsiz));
                 *bufsiz = 2 * (*bufsiz);
-                return readhistline(len, bufp, bufsiz, in);
+                return readhistline(len, bufp, bufsiz, in, readbytes);
             }
         }
         else {
@@ -2596,7 +2598,7 @@ readhistline(int start, char **bufp, int *bufsiz, FILE *in)
             if (len > 1 && buf[len - 2] == '\\') {
                 buf[--len - 1] = '\n';
                 if (!feof(in))
-                    return readhistline(len, bufp, bufsiz, in);
+                    return readhistline(len, bufp, bufsiz, in, readbytes);
             }
         }
         return len;
@@ -2616,7 +2618,7 @@ readhistfile(char *fn, int err, int readflags)
     short *words;
     struct stat sb;
     int nwordpos, nwords, bufsiz;
-    int searching, newflags, l, ret, uselex;
+    int searching, newflags, l, ret, uselex, readbytes;
 
     if (!fn && !(fn = getsparam("HISTFILE")))
         return;
@@ -2658,13 +2660,15 @@ readhistfile(char *fn, int err, int readflags)
         } else
             searching = 0;
 
+        fpos = ftell(in);
+        readbytes = 0;
         newflags = HIST_OLD | HIST_READ;
         if (readflags & HFILE_FAST)
             newflags |= HIST_FOREIGN;
         if (readflags & HFILE_SKIPOLD
          || (hist_ignore_all_dups && newflags & hist_skip_flags))
             newflags |= HIST_MAKEUNIQUE;
-        while (fpos = ftell(in), (l = readhistline(0, &buf, &bufsiz, in))) {
+        while (fpos += readbytes, readbytes = 0, (l = readhistline(0, &buf, &bufsiz, in, &readbytes))) {
             char *pt;
             int remeta = 0;
 
--
2.25.0