zsh-workers
 help / color / mirror / code / Atom feed
* zsh got stuck without any message because of history lock file
@ 2014-04-24 16:48 Andreas
  2014-04-24 17:02 ` Bart Schaefer
  2014-04-24 17:58 ` Andreas
  0 siblings, 2 replies; 10+ messages in thread
From: Andreas @ 2014-04-24 16:48 UTC (permalink / raw)
  To: zsh-workers

My zsh got stuck without any error message, because the 
~/.zsh-history.LOCK file wasn't deleted after a system crash. This is a 
real problem if one uses zsh as default shell and therefore can't 
execute any command. Luckily i was able to open a bash terminal and used 
strace to determine why zsh doesn't respond.
Please add an error message or prompt the user, if the lock should be 
deleted.

I'm using zsh "5.0.5 (x86_64-unknown-linux-gnu)" on Arch Linux with 
Kernel 3.14.1-1-ARCH x86_64.

Andreas Linz


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: zsh got stuck without any message because of history lock file
  2014-04-24 16:48 zsh got stuck without any message because of history lock file Andreas
@ 2014-04-24 17:02 ` Bart Schaefer
  2014-04-25  5:02   ` Bart Schaefer
  2014-04-24 17:58 ` Andreas
  1 sibling, 1 reply; 10+ messages in thread
From: Bart Schaefer @ 2014-04-24 17:02 UTC (permalink / raw)
  To: zsh-workers

On Apr 24,  6:48pm, Andreas wrote:
} Subject: zsh got stuck without any message because of history lock file
}
} My zsh got stuck without any error message, because the 
} ~/.zsh-history.LOCK file wasn't deleted after a system crash.

The history mechanism should time out on lock attempts after a few
seconds ... oh.  Was the system clock perhaps wrong, or did the crash
result in a lock file with a timestamp in the future?

The way the code is currently written the shell will wait until 10
seconds after the modification time stamp of the file.  If for some
reason the time stamp is wrong (clocks out of sync with NFS server,
for example) then it may end up waiting for longer.

This should get fixed.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Re: zsh got stuck without any message because of history lock file
  2014-04-24 16:48 zsh got stuck without any message because of history lock file Andreas
  2014-04-24 17:02 ` Bart Schaefer
@ 2014-04-24 17:58 ` Andreas
  1 sibling, 0 replies; 10+ messages in thread
From: Andreas @ 2014-04-24 17:58 UTC (permalink / raw)
  To: zsh-workers

On Thu, 24 Apr 2014 10:02:28, Bart wrote:
> ... If for some
> reason the time stamp is wrong (clocks out of sync with NFS server,
> for example) then it may end up waiting for longer.

I deleted the file already, but i remember that the timestamp was in the future, maybe the system crash got the file to be corrupt.

Andreas


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: zsh got stuck without any message because of history lock file
  2014-04-24 17:02 ` Bart Schaefer
@ 2014-04-25  5:02   ` Bart Schaefer
  2014-04-25  8:35     ` Peter Stephenson
  2014-06-26 20:46     ` Mikael Magnusson
  0 siblings, 2 replies; 10+ messages in thread
From: Bart Schaefer @ 2014-04-25  5:02 UTC (permalink / raw)
  To: zsh-workers

On Apr 24, 10:02am, Bart Schaefer wrote:
}
} The way the code is currently written the shell will wait until 10
} seconds after the modification time stamp of the file.  If for some
} reason the time stamp is wrong (clocks out of sync with NFS server,
} for example) then it may end up waiting for longer.

So ... here is a patch that checks for future timestamps.  If it finds
one, it leaves errno set to EEXIST, which results in:

zsh: locking failed for /home/schaefer/.zhistory: file exists

This isn't perfect, but at least it provides a clue.

There's a controversial bit in this patch:  It appears HIST_FCNTL_LOCK
may have been broken ever since workers/28047 in 2010.  That patch
changed flockhistfile() [which is misnamed, it uses fcntl] to return 0
on success, but then also this:

	ret = flockhistfile(fn, keep_trying);
	if (ret)
	    return ret;

That means that we only stop when flockhistfile() FAILS.  On success,
we go on and lock with BOTH fcntl AND a .LOCK file.  This completely
contradicts the documentation claim that "this may provide better
performance."  So the second hunk below makes flockhistfile() the
only lock applied when HIST_FCNTL_LOCK is set.

However, this would mean that every zsh that is accessing the same
HISTFILE has to be using the same locking options.  Is that OK?


diff --git a/Src/hist.c b/Src/hist.c
index 1624912..1182994 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -2720,6 +2720,25 @@ savehistfile(char *fn, int err, int writeflags)
 
 static int lockhistct;
 
+static int
+checklocktime(char *lockfile, time_t then)
+{
+    time_t now = time(NULL);
+
+    if (now + 10 < then) {
+	/* File is more than 10 seconds in the future? */
+	errno = EEXIST;
+	return -1;
+    }
+
+    if (now - then < 10)
+	sleep(1);
+    else
+	unlink(lockfile);
+
+    return 0;
+}
+
 /*
  * Lock history file.  Return 0 on success, 1 on failure to lock this
  * time, 2 on permanent failure (e.g. permission).
@@ -2737,9 +2756,7 @@ lockhistfile(char *fn, int keep_trying)
 
 #ifdef HAVE_FCNTL_H
     if (isset(HISTFCNTLLOCK) && flock_fd < 0) {
-	ret = flockhistfile(fn, keep_trying);
-	if (ret)
-	    return ret;
+	return flockhistfile(fn, keep_trying);
     }
 #endif
 
@@ -2776,10 +2793,10 @@ lockhistfile(char *fn, int keep_trying)
 		    continue;
 		break;
 	    }
-	    if (time(NULL) - sb.st_mtime < 10)
-		sleep(1);
-	    else
-		unlink(lockfile);
+	    if (checklocktime(lockfile, sb.st_mtime) < 0) {
+		ret = 1;
+		break;
+	    }
 	}
 	if (fd < 0)
 	    lockhistct--;
@@ -2804,10 +2821,10 @@ lockhistfile(char *fn, int keep_trying)
 			continue;
 		    ret = 2;
 		} else {
-		    if (time(NULL) - sb.st_mtime < 10)
-			sleep(1);
-		    else
-			unlink(lockfile);
+		    if (checklocktime(lockfile, sb.st_mtime) < 0) {
+			ret = 1;
+			break;
+		    }
 		    continue;
 		}
 		lockhistct--;
@@ -2832,10 +2849,10 @@ lockhistfile(char *fn, int keep_trying)
 		ret = 2;
 		break;
 	    }
-	    if (time(NULL) - sb.st_mtime < 10)
-		sleep(1);
-	    else
-		unlink(lockfile);
+	    if (checklocktime(lockfile, sb.st_mtime) < 0) {
+		ret = 1;
+		break;
+	    }
 	}
 	if (fd < 0)
 	    lockhistct--;


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: zsh got stuck without any message because of history lock file
  2014-04-25  5:02   ` Bart Schaefer
@ 2014-04-25  8:35     ` Peter Stephenson
  2014-04-25 15:26       ` Bart Schaefer
  2014-04-25 15:53       ` Bart Schaefer
  2014-06-26 20:46     ` Mikael Magnusson
  1 sibling, 2 replies; 10+ messages in thread
From: Peter Stephenson @ 2014-04-25  8:35 UTC (permalink / raw)
  To: zsh-workers

On Thu, 24 Apr 2014 22:02:32 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> However, this would mean that every zsh that is accessing the same
> HISTFILE has to be using the same locking options.  Is that OK?

Sounds highly plausible.  It's hard to construct an argument where
locking a single file with multiple different mechanisms is the right
answer.

It's fairly easy to get inconsistent shell options, however.  I suppose
we could think about simple tests to see if it looks like it's been
locked a different way and print a warning.

pws


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: zsh got stuck without any message because of history lock file
  2014-04-25  8:35     ` Peter Stephenson
@ 2014-04-25 15:26       ` Bart Schaefer
  2014-04-25 15:53       ` Bart Schaefer
  1 sibling, 0 replies; 10+ messages in thread
From: Bart Schaefer @ 2014-04-25 15:26 UTC (permalink / raw)
  To: zsh-workers

On Apr 25,  9:35am, Peter Stephenson wrote:
}
} On Thu, 24 Apr 2014 22:02:32 -0700
} Bart Schaefer <schaefer@brasslantern.com> wrote:
} > However, this would mean that every zsh that is accessing the same
} > HISTFILE has to be using the same locking options.  Is that OK?
} 
} It's fairly easy to get inconsistent shell options, however.  I suppose
} we could think about simple tests to see if it looks like it's been
} locked a different way and print a warning.

That shouldn't be too difficult code-wise, but will require calling
lstat() on the .LOCK file name (ignoring failure thereof).

I'm going to go ahead and commit 32580 while we think about it.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: zsh got stuck without any message because of history lock file
  2014-04-25  8:35     ` Peter Stephenson
  2014-04-25 15:26       ` Bart Schaefer
@ 2014-04-25 15:53       ` Bart Schaefer
  1 sibling, 0 replies; 10+ messages in thread
From: Bart Schaefer @ 2014-04-25 15:53 UTC (permalink / raw)
  To: zsh-workers

On Apr 25,  9:35am, Peter Stephenson wrote:
}
} It's hard to construct an argument where locking a single file with
} multiple different mechanisms is the right answer.

The argument goes something like this:

Consider a .zhistory on a shared filesystem (NFS or Samba).  Mutliple
hosts access this file system.  Some of the operating systems involved
support fcntl(), some do not.

In this situation the only correct thing is to use the .LOCK file, but
that might be expensive for some of the hosts that have fcntl().  Posit
that these are the busiest hosts, most likely to have lock contention.

In this circumstance it may be beneficial to "fail faster" by attempting
fcntl() [which blocks out all the other zsh that support it], then only
create the .LOCK file [to block out the rest] after fcntl() succeeds.

I don't claim this scenario is likely or that we have to account for it,
but prior to 32580 zsh *has* supported it.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: zsh got stuck without any message because of history lock file
  2014-04-25  5:02   ` Bart Schaefer
  2014-04-25  8:35     ` Peter Stephenson
@ 2014-06-26 20:46     ` Mikael Magnusson
  2014-09-06  9:03       ` Mikael Magnusson
  1 sibling, 1 reply; 10+ messages in thread
From: Mikael Magnusson @ 2014-06-26 20:46 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh workers

On 25 April 2014 07:02, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Apr 24, 10:02am, Bart Schaefer wrote:
> }
> } The way the code is currently written the shell will wait until 10
> } seconds after the modification time stamp of the file.  If for some
> } reason the time stamp is wrong (clocks out of sync with NFS server,
> } for example) then it may end up waiting for longer.
>
> So ... here is a patch that checks for future timestamps.  If it finds
> one, it leaves errno set to EEXIST, which results in:
>
> zsh: locking failed for /home/schaefer/.zhistory: file exists
>
> This isn't perfect, but at least it provides a clue.
>
> There's a controversial bit in this patch:  It appears HIST_FCNTL_LOCK
> may have been broken ever since workers/28047 in 2010.  That patch
> changed flockhistfile() [which is misnamed, it uses fcntl] to return 0
> on success, but then also this:
>
>         ret = flockhistfile(fn, keep_trying);
>         if (ret)
>             return ret;
>
> That means that we only stop when flockhistfile() FAILS.  On success,
> we go on and lock with BOTH fcntl AND a .LOCK file.  This completely
> contradicts the documentation claim that "this may provide better
> performance."  So the second hunk below makes flockhistfile() the
> only lock applied when HIST_FCNTL_LOCK is set.
>
> However, this would mean that every zsh that is accessing the same
> HISTFILE has to be using the same locking options.  Is that OK?

I'm not 100% sure it's because of this change, but recently, I've had
new terminals stuck forever on startup until I exit all open terminals
logged in as that user (or possibly just a specific one of them). It
happens very spuriously, maybe a few times per week. I have
histfcntllock set. On one occasion I tried moving the .history file
instead of exiting, and this also immediately unstuck all new
terminals.

-- 
Mikael Magnusson


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: zsh got stuck without any message because of history lock file
  2014-06-26 20:46     ` Mikael Magnusson
@ 2014-09-06  9:03       ` Mikael Magnusson
  2014-09-06 18:51         ` Bart Schaefer
  0 siblings, 1 reply; 10+ messages in thread
From: Mikael Magnusson @ 2014-09-06  9:03 UTC (permalink / raw)
  To: zsh workers

On 26 June 2014 22:46, Mikael Magnusson <mikachu@gmail.com> wrote:
> On 25 April 2014 07:02, Bart Schaefer <schaefer@brasslantern.com> wrote:
>> On Apr 24, 10:02am, Bart Schaefer wrote:
>> }
>> } The way the code is currently written the shell will wait until 10
>> } seconds after the modification time stamp of the file.  If for some
>> } reason the time stamp is wrong (clocks out of sync with NFS server,
>> } for example) then it may end up waiting for longer.
>>
>> So ... here is a patch that checks for future timestamps.  If it finds
>> one, it leaves errno set to EEXIST, which results in:
>>
>> zsh: locking failed for /home/schaefer/.zhistory: file exists
>>
>> This isn't perfect, but at least it provides a clue.
>>
>> There's a controversial bit in this patch:  It appears HIST_FCNTL_LOCK
>> may have been broken ever since workers/28047 in 2010.  That patch
>> changed flockhistfile() [which is misnamed, it uses fcntl] to return 0
>> on success, but then also this:
>>
>>         ret = flockhistfile(fn, keep_trying);
>>         if (ret)
>>             return ret;
>>
>> That means that we only stop when flockhistfile() FAILS.  On success,
>> we go on and lock with BOTH fcntl AND a .LOCK file.  This completely
>> contradicts the documentation claim that "this may provide better
>> performance."  So the second hunk below makes flockhistfile() the
>> only lock applied when HIST_FCNTL_LOCK is set.
>>
>> However, this would mean that every zsh that is accessing the same
>> HISTFILE has to be using the same locking options.  Is that OK?
>
> I'm not 100% sure it's because of this change, but recently, I've had
> new terminals stuck forever on startup until I exit all open terminals
> logged in as that user (or possibly just a specific one of them). It
> happens very spuriously, maybe a few times per week. I have
> histfcntllock set. On one occasion I tried moving the .history file
> instead of exiting, and this also immediately unstuck all new
> terminals.

I'm still seeing reports of this every now and then in #zsh (and I
have the option unset so I'm not sure if it happens to me anymore.) If
nobody fixed this, it's probably still a problem.

-- 
Mikael Magnusson


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: zsh got stuck without any message because of history lock file
  2014-09-06  9:03       ` Mikael Magnusson
@ 2014-09-06 18:51         ` Bart Schaefer
  0 siblings, 0 replies; 10+ messages in thread
From: Bart Schaefer @ 2014-09-06 18:51 UTC (permalink / raw)
  To: zsh workers

On Sep 6, 11:03am, Mikael Magnusson wrote:
}
} > I'm not 100% sure it's because of this change, but recently, I've had
} > new terminals stuck forever on startup until I exit all open terminals
} > logged in as that user (or possibly just a specific one of them). It
} > happens very spuriously, maybe a few times per week. I have
} > histfcntllock set. On one occasion I tried moving the .history file
} > instead of exiting, and this also immediately unstuck all new
} > terminals.
} 
} I'm still seeing reports of this every now and then in #zsh (and I
} have the option unset so I'm not sure if it happens to me anymore.) If
} nobody fixed this, it's probably still a problem.

So, a few things.

The unlockhistfile() operation assumes that closing the flock_fd is
enough to release the lock.  I can't quote chapter and verse for this
from e.g. the POSIX spec, but I believe that to be correct.

However, lockhistfile() does not call flockhistfile() when it has
already been locked (flock_fd >= 0), and in that case it falls through
to file-based locking again.  This isn't a problem except when hend()
is deciding whether to call savehistfile(); it only considers whether
the file is already locked in the case of SHARE_HISTORY, but may call
savehistfile() for both SHARE_HISTORY and INC_APPEND_HISTORY et al.,
and savehistfile() calls lockhistfile() again.

Additionally, lockhistfile() only increments the lockhistct counter
when file locking, not when fcntl locking, which leads to an incorrect
answer from histfileIsLocked().

So I think there's a potential problem where INC_APPEND_HISTORY could
apply both locking mechanisms at different times, leading to a race.

These were all assumptions about both kinds of locking being applied
that I missed before.  The "better performance" from HIST_FCNTL_LOCK
I guess just mean that it would fail faster on permission problems or
lock collisions, not that it would improve performance overall.

We can either apply the additional patch below, or back out some of
the last one to restore the "always locks both ways" behavior.


diff --git a/Src/hist.c b/Src/hist.c
index 770d559..d29a65a 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -2490,6 +2490,9 @@ flockhistfile(char *fn, int keep_trying)
     struct flock lck;
     int ctr = keep_trying ? 9 : 0;
 
+    if (flock_fd >= 0)
+	return 0; /* already locked */
+
     if ((flock_fd = open(unmeta(fn), O_RDWR | O_NOCTTY)) < 0)
 	return errno == ENOENT ? 0 : 2; /* "successfully" locked missing file */
 
@@ -2768,12 +2771,6 @@ lockhistfile(char *fn, int keep_trying)
     if (!fn && !(fn = getsparam("HISTFILE")))
 	return 1;
 
-#ifdef HAVE_FCNTL_H
-    if (isset(HISTFCNTLLOCK) && flock_fd < 0) {
-	return flockhistfile(fn, keep_trying);
-    }
-#endif
-
     if (!lockhistct++) {
 	struct stat sb;
 	int fd;
@@ -2786,6 +2783,11 @@ lockhistfile(char *fn, int keep_trying)
 # endif
 #endif
 
+#ifdef HAVE_FCNTL_H
+	if (isset(HISTFCNTLLOCK))
+	    return flockhistfile(fn, keep_trying);
+#endif
+
 	lockfile = bicat(unmeta(fn), ".LOCK");
 	/* NOTE: only use symlink locking on a link()-having host in order to
 	 * avoid a change from open()-based locking to symlink()-based. */


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-09-06 18:52 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-24 16:48 zsh got stuck without any message because of history lock file Andreas
2014-04-24 17:02 ` Bart Schaefer
2014-04-25  5:02   ` Bart Schaefer
2014-04-25  8:35     ` Peter Stephenson
2014-04-25 15:26       ` Bart Schaefer
2014-04-25 15:53       ` Bart Schaefer
2014-06-26 20:46     ` Mikael Magnusson
2014-09-06  9:03       ` Mikael Magnusson
2014-09-06 18:51         ` Bart Schaefer
2014-04-24 17:58 ` Andreas

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).