zsh-users
 help / color / mirror / code / Atom feed
* No fsync on history file? I lost my history
@ 2018-09-23  8:52 lilydjwg
  2018-09-23 13:35 ` Daniel Shahaf
  0 siblings, 1 reply; 10+ messages in thread
From: lilydjwg @ 2018-09-23  8:52 UTC (permalink / raw)
  To: zsh-users

(Resending to zsh-users because my message to zsh-workers seemed to be
ignored.)

Hi there!

I just experienced a kernel oops. When I was back after a hard reset,
I lost my entire zsh history: the file was truncated to zero bytes.

I noticed zsh writes to a temporary file first, then renames, but
doesn't call fsync after finishing writing. I talked with my friend
knowing a lot about filesystems and I think a fsync will prevent this
kind of thing (or it'll be a kernel bug).

What do you think? Would you add the fsync call or make an option for
that? I recovered from a backup but it was not great experience.

-- 
Best regards,
lilydjwg

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: No fsync on history file? I lost my history
  2018-09-23  8:52 No fsync on history file? I lost my history lilydjwg
@ 2018-09-23 13:35 ` Daniel Shahaf
  2018-09-23 14:22   ` lilydjwg
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Shahaf @ 2018-09-23 13:35 UTC (permalink / raw)
  To: lilydjwg, zsh-users

lilydjwg wrote on Sun, 23 Sep 2018 16:52 +0800:
> Would you add the fsync call or make an option for that?

I'd be happy to accept patches for this.

Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: No fsync on history file? I lost my history
  2018-09-23 13:35 ` Daniel Shahaf
@ 2018-09-23 14:22   ` lilydjwg
  2018-09-23 14:46     ` Daniel Shahaf
  0 siblings, 1 reply; 10+ messages in thread
From: lilydjwg @ 2018-09-23 14:22 UTC (permalink / raw)
  To: Daniel Shahaf; +Cc: zsh-users

[-- Attachment #1: Type: text/plain, Size: 342 bytes --]

On Sun, Sep 23, 2018 at 01:35:47PM +0000, Daniel Shahaf wrote:
> lilydjwg wrote on Sun, 23 Sep 2018 16:52 +0800:
> > Would you add the fsync call or make an option for that?
> 
> I'd be happy to accept patches for this.

Here it is. I've checked that it works on my Linux system, but not sure
about other systems.

-- 
Best regards,
lilydjwg

[-- Attachment #2: 0001-Call-fsync-after-writing-out-new-histfile.patch --]
[-- Type: text/plain, Size: 746 bytes --]

From 78b1df1329b2c7bb2b6e78e5b7d70be4bbf103e2 Mon Sep 17 00:00:00 2001
From: lilydjwg <lilydjwg@gmail.com>
Date: Sun, 23 Sep 2018 22:12:56 +0800
Subject: [PATCH] Call fsync after writing out new histfile

to ensure the data is on disk before the rename in case of a system crash.
---
 Src/hist.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Src/hist.c b/Src/hist.c
index dbdc1e4e5..e128e9f14 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -2933,6 +2933,9 @@ savehistfile(char *fn, int err, int writeflags)
 		lasthist.text = ztrdup(start);
 	    }
 	}
+	fflush(out); /* need to flush before fsync */
+	if (fsync(fileno(out)) < 0 && ret >= 0)
+	    ret = -1;
 	if (fclose(out) < 0 && ret >= 0)
 	    ret = -1;
 	if (ret >= 0) {
-- 
2.19.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: No fsync on history file? I lost my history
  2018-09-23 14:22   ` lilydjwg
@ 2018-09-23 14:46     ` Daniel Shahaf
  2018-09-23 15:25       ` lilydjwg
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Shahaf @ 2018-09-23 14:46 UTC (permalink / raw)
  To: lilydjwg; +Cc: zsh-users

lilydjwg wrote on Sun, 23 Sep 2018 22:22 +0800:
> Here it is. I've checked that it works on my Linux system, but not sure
> about other systems.

fsync() is in POSIX.  I assume we can just call it, but if somebody complains
we'll need to use an HAVE_FSYNC guard.

> +++ b/Src/hist.c
> @@ -2933,6 +2933,9 @@ savehistfile(char *fn, int err, int writeflags)
>  		lasthist.text = ztrdup(start);
>  	    }
>  	}
> +	fflush(out); /* need to flush before fsync */

Isn't the fflush() on line 2927 sufficient?  (Even if it isn't, I would have
expected a ret>=0 guard around this call.)

> +	if (fsync(fileno(out)) < 0 && ret >= 0)
> +	    ret = -1;

fileno() can return -1.

Shouldn't the ret>=0 check happen before the calls to fileno() and fsync()?

>  	if (fclose(out) < 0 && ret >= 0)
>  	    ret = -1;
>  	if (ret >= 0) {

Cheers,

Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: No fsync on history file? I lost my history
  2018-09-23 14:46     ` Daniel Shahaf
@ 2018-09-23 15:25       ` lilydjwg
  2018-09-23 15:45         ` Daniel Shahaf
  2018-09-23 18:02         ` Mikael Magnusson
  0 siblings, 2 replies; 10+ messages in thread
From: lilydjwg @ 2018-09-23 15:25 UTC (permalink / raw)
  To: Daniel Shahaf; +Cc: zsh-users

[-- Attachment #1: Type: text/plain, Size: 1194 bytes --]

I'm sending an updated patch.

On Sun, Sep 23, 2018 at 02:46:51PM +0000, Daniel Shahaf wrote:
> fsync() is in POSIX.  I assume we can just call it, but if somebody complains
> we'll need to use an HAVE_FSYNC guard.

I don't know how to add a HAVE_FSYNC macro to the build system, sorry.

> > +++ b/Src/hist.c
> > @@ -2933,6 +2933,9 @@ savehistfile(char *fn, int err, int writeflags)
> >  		lasthist.text = ztrdup(start);
> >  	    }
> >  	}
> > +	fflush(out); /* need to flush before fsync */
> 
> Isn't the fflush() on line 2927 sufficient?  (Even if it isn't, I would have
> expected a ret>=0 guard around this call.)

It should call write(2) to write out the buffered data. Then the kernel
can fsync the data to disk. A guard has been added.

> > +	if (fsync(fileno(out)) < 0 && ret >= 0)
> > +	    ret = -1;
> 
> fileno() can return -1.

It shouldn't matter, fsync will return EBADF for -1. Other parts of the
code don't check for this either, and I can't think a case when fileno
would fail after so many successful I/O operations on it (corrupted memory?)

> Shouldn't the ret>=0 check happen before the calls to fileno() and fsync()?

Yes, I've changed that.

-- 
Best regards,
lilydjwg

[-- Attachment #2: 0001-Call-fsync-after-writing-out-new-histfile.patch --]
[-- Type: text/plain, Size: 774 bytes --]

From 3c6c07733f12176c737d1f610f0dceafd07437df Mon Sep 17 00:00:00 2001
From: lilydjwg <lilydjwg@gmail.com>
Date: Sun, 23 Sep 2018 22:12:56 +0800
Subject: [PATCH] Call fsync after writing out new histfile

to ensure the data is on disk before the rename in case of a system crash.
---
 Src/hist.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Src/hist.c b/Src/hist.c
index dbdc1e4e5..d3370252d 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -2933,6 +2933,10 @@ savehistfile(char *fn, int err, int writeflags)
 		lasthist.text = ztrdup(start);
 	    }
 	}
+	if (ret >= 0)
+	    ret = fflush(out); /* need to flush before fsync */
+	if (ret >= 0 && fsync(fileno(out)) < 0)
+	    ret = -1;
 	if (fclose(out) < 0 && ret >= 0)
 	    ret = -1;
 	if (ret >= 0) {
-- 
2.19.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: No fsync on history file? I lost my history
  2018-09-23 15:25       ` lilydjwg
@ 2018-09-23 15:45         ` Daniel Shahaf
  2018-09-23 19:40           ` Vincent Lefevre
  2018-09-23 18:02         ` Mikael Magnusson
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel Shahaf @ 2018-09-23 15:45 UTC (permalink / raw)
  To: lilydjwg; +Cc: zsh-users

lilydjwg wrote on Sun, 23 Sep 2018 23:25 +0800:
> On Sun, Sep 23, 2018 at 02:46:51PM +0000, Daniel Shahaf wrote:
> > fsync() is in POSIX.  I assume we can just call it, but if somebody complains
> > we'll need to use an HAVE_FSYNC guard.
> 
> I don't know how to add a HAVE_FSYNC macro to the build system, sorry.
> 

I doubt that's needed. I had a quick look and found that libapr's
apr_file_sync() call calls fsync() without a guard, implying that
fsync() exists on all platforms libapr is used on.

If we do need it, we'll add 'fsync' to the AC_CHECK_FUNCS call,
regenerate configure (using Util/preconfig), and use #ifdef HAVE_FSYNC.

> > > +++ b/Src/hist.c
> > > @@ -2933,6 +2933,9 @@ savehistfile(char *fn, int err, int writeflags)
> > >  		lasthist.text = ztrdup(start);
> > >  	    }
> > >  	}
> > > +	fflush(out); /* need to flush before fsync */
> > 
> > Isn't the fflush() on line 2927 sufficient?  (Even if it isn't, I would have
> > expected a ret>=0 guard around this call.)
> 
> It should call write(2) to write out the buffered data. Then the kernel
> can fsync the data to disk. A guard has been added.
> 

Yes, I understand that fflush(3) must be called in order to flush data
from libc's buffers into kernel buffers, which fsync(2) will later flush
to disk.  My question was whether the fflush() call being added was
redundant because of the existing call on line 2927.  It would be odd
to have two fflush() calls in a row without fwrite() between them; and
to have only one of them update lasthist.

Maybe it's all correct and it's just my lack of familiarity of hist.c
that's showing.

> > > +	if (fsync(fileno(out)) < 0 && ret >= 0)
> > > +	    ret = -1;
> > 
> > fileno() can return -1.
> 
> It shouldn't matter, fsync will return EBADF for -1. Other parts of the
> code don't check for this either, and I can't think a case when fileno
> would fail after so many successful I/O operations on it (corrupted memory?)
> 

Fair enough.

> > Shouldn't the ret>=0 check happen before the calls to fileno() and fsync()?
> 
> Yes, I've changed that.

Thanks.

Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: No fsync on history file? I lost my history
  2018-09-23 15:25       ` lilydjwg
  2018-09-23 15:45         ` Daniel Shahaf
@ 2018-09-23 18:02         ` Mikael Magnusson
  2018-09-24  3:04           ` lilydjwg
  1 sibling, 1 reply; 10+ messages in thread
From: Mikael Magnusson @ 2018-09-23 18:02 UTC (permalink / raw)
  To: lilydjwg; +Cc: Daniel Shahaf, Zsh Users

On Sun, Sep 23, 2018 at 5:25 PM, lilydjwg <lilydjwg@gmail.com> wrote:
> I'm sending an updated patch.
>
> On Sun, Sep 23, 2018 at 02:46:51PM +0000, Daniel Shahaf wrote:
>> fsync() is in POSIX.  I assume we can just call it, but if somebody complains
>> we'll need to use an HAVE_FSYNC guard.
>
> I don't know how to add a HAVE_FSYNC macro to the build system, sorry.
>
>> > +++ b/Src/hist.c
>> > @@ -2933,6 +2933,9 @@ savehistfile(char *fn, int err, int writeflags)
>> >             lasthist.text = ztrdup(start);
>> >         }
>> >     }
>> > +   fflush(out); /* need to flush before fsync */
>>
>> Isn't the fflush() on line 2927 sufficient?  (Even if it isn't, I would have
>> expected a ret>=0 guard around this call.)
>
> It should call write(2) to write out the buffered data. Then the kernel
> can fsync the data to disk. A guard has been added.
>
>> > +   if (fsync(fileno(out)) < 0 && ret >= 0)
>> > +       ret = -1;
>>
>> fileno() can return -1.
>
> It shouldn't matter, fsync will return EBADF for -1. Other parts of the
> code don't check for this either, and I can't think a case when fileno
> would fail after so many successful I/O operations on it (corrupted memory?)
>
>> Shouldn't the ret>=0 check happen before the calls to fileno() and fsync()?
>
> Yes, I've changed that.
>
> --
> Best regards,
> lilydjwg
>
> From 3c6c07733f12176c737d1f610f0dceafd07437df Mon Sep 17 00:00:00 2001
> From: lilydjwg <lilydjwg@gmail.com>
> Date: Sun, 23 Sep 2018 22:12:56 +0800
> Subject: [PATCH] Call fsync after writing out new histfile
>
> to ensure the data is on disk before the rename in case of a system crash.
> ---
>  Src/hist.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/Src/hist.c b/Src/hist.c
> index dbdc1e4e5..d3370252d 100644
> --- a/Src/hist.c
> +++ b/Src/hist.c
> @@ -2933,6 +2933,10 @@ savehistfile(char *fn, int err, int writeflags)
>                 lasthist.text = ztrdup(start);
>             }
>         }
> +       if (ret >= 0)
> +           ret = fflush(out); /* need to flush before fsync */
> +       if (ret >= 0 && fsync(fileno(out)) < 0)
> +           ret = -1;
>         if (fclose(out) < 0 && ret >= 0)
>             ret = -1;
>         if (ret >= 0) {

Please only do this for the tmpfile case, or interactive usage is
going to be unbearable under load with incappendhistory.

-- 
Mikael Magnusson

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: No fsync on history file? I lost my history
  2018-09-23 15:45         ` Daniel Shahaf
@ 2018-09-23 19:40           ` Vincent Lefevre
  2018-09-24  3:03             ` lilydjwg
  0 siblings, 1 reply; 10+ messages in thread
From: Vincent Lefevre @ 2018-09-23 19:40 UTC (permalink / raw)
  To: zsh-users

On 2018-09-23 15:45:17 +0000, Daniel Shahaf wrote:
> > > > +++ b/Src/hist.c
> > > > @@ -2933,6 +2933,9 @@ savehistfile(char *fn, int err, int writeflags)
> > > >  		lasthist.text = ztrdup(start);
> > > >  	    }
> > > >  	}
> > > > +	fflush(out); /* need to flush before fsync */
> > > 
> > > Isn't the fflush() on line 2927 sufficient?  (Even if it isn't, I would have
> > > expected a ret>=0 guard around this call.)
> > 
> > It should call write(2) to write out the buffered data. Then the kernel
> > can fsync the data to disk. A guard has been added.
> 
> Yes, I understand that fflush(3) must be called in order to flush data
> from libc's buffers into kernel buffers, which fsync(2) will later flush
> to disk.  My question was whether the fflush() call being added was
> redundant because of the existing call on line 2927.  It would be odd
> to have two fflush() calls in a row without fwrite() between them; and
> to have only one of them update lasthist.

But the first one is in a "if", thus might have not been called. Or
there is an inconsistency in the conditions, which should be fixed.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: No fsync on history file? I lost my history
  2018-09-23 19:40           ` Vincent Lefevre
@ 2018-09-24  3:03             ` lilydjwg
  0 siblings, 0 replies; 10+ messages in thread
From: lilydjwg @ 2018-09-24  3:03 UTC (permalink / raw)
  To: Daniel Shahaf; +Cc: zsh-users

On 2018-09-23 15:45:17 +0000, Daniel Shahaf wrote:
> Yes, I understand that fflush(3) must be called in order to flush data
> from libc's buffers into kernel buffers, which fsync(2) will later flush
> to disk.  My question was whether the fflush() call being added was
> redundant because of the existing call on line 2927.  It would be odd
> to have two fflush() calls in a row without fwrite() between them; and
> to have only one of them update lasthist.

Oh sorry I misread your message. The fflush() above isn't sufficient
because it's not called in my case. I don't know when the "if" will be
true but calling two fflush() is better than calling none.

Maybe calling only once above them is better, but this will leave the
fflush() a little far from where it's needed.

On Sun, Sep 23, 2018 at 09:40:19PM +0200, Vincent Lefevre wrote:
> But the first one is in a "if", thus might have not been called. Or
> there is an inconsistency in the conditions, which should be fixed.

What does HFILE_USE_OPTIONS mean? This seems to be unset when zsh exits.

-- 
Best regards,
lilydjwg

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: No fsync on history file? I lost my history
  2018-09-23 18:02         ` Mikael Magnusson
@ 2018-09-24  3:04           ` lilydjwg
  0 siblings, 0 replies; 10+ messages in thread
From: lilydjwg @ 2018-09-24  3:04 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: Daniel Shahaf, Zsh Users

[-- Attachment #1: Type: text/plain, Size: 246 bytes --]

On Sun, Sep 23, 2018 at 08:02:44PM +0200, Mikael Magnusson wrote:
> Please only do this for the tmpfile case, or interactive usage is
> going to be unbearable under load with incappendhistory.

Yes, indeeded. Updated.

-- 
Best regards,
lilydjwg

[-- Attachment #2: 0001-Call-fsync-after-writing-out-new-histfile.patch --]
[-- Type: text/plain, Size: 813 bytes --]

From 19cfeacfb8338f06577ac8f1d546f5b196903d4c Mon Sep 17 00:00:00 2001
From: lilydjwg <lilydjwg@gmail.com>
Date: Sun, 23 Sep 2018 22:12:56 +0800
Subject: [PATCH] Call fsync after writing out new histfile

to ensure the data is on disk before the rename in case of a system crash.
---
 Src/hist.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Src/hist.c b/Src/hist.c
index dbdc1e4e5..64e5b4766 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -2933,6 +2933,12 @@ savehistfile(char *fn, int err, int writeflags)
 		lasthist.text = ztrdup(start);
 	    }
 	}
+	if (tmpfile) {
+	    if (ret >= 0)
+	        ret = fflush(out); /* need to flush before fsync */
+	    if (ret >= 0 && fsync(fileno(out)) < 0)
+	        ret = -1;
+	}
 	if (fclose(out) < 0 && ret >= 0)
 	    ret = -1;
 	if (ret >= 0) {
-- 
2.19.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-09-24  3:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-23  8:52 No fsync on history file? I lost my history lilydjwg
2018-09-23 13:35 ` Daniel Shahaf
2018-09-23 14:22   ` lilydjwg
2018-09-23 14:46     ` Daniel Shahaf
2018-09-23 15:25       ` lilydjwg
2018-09-23 15:45         ` Daniel Shahaf
2018-09-23 19:40           ` Vincent Lefevre
2018-09-24  3:03             ` lilydjwg
2018-09-23 18:02         ` Mikael Magnusson
2018-09-24  3:04           ` lilydjwg

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).