Message-ID: <717dfbf28e1b56d070ad0038f0367e3d2ab99464.camel@fifi.org>
Subject: Issues with fcntl() history file locking
From: Philippe Troin
To: Zsh hackers list
Date: Wed, 27 Feb 2019 10:30:35 -0800

Hi,

I've been using zsh with share_history for many years and never had any real issues on several networks where my home directory is mounted over NFS. Recently it's been giving me trouble, possibly since I bumped my history file size up to 10k entries.

    Terminal 1 on host1:          Terminal 2 on host2:
    host1% echo 1 2 3             host2%
    1 2 3
    host1%                        host2%
                                  host2% unrelated command

I'd expect that on pressing UP on host2, my last host1 command would show up. It does most of the time, but not reliably enough to make me completely happy :-)

I then discovered hist_fcntl_lock, which I had never set, and turned it on. It didn't improve anything.

After a bit of stracing and subsequent reading of the code, I found out that the history file is opened and closed many times during history file manipulation, while the lock is held on only one of the open descriptors. Unfortunately, POSIX states that an fcntl() lock is released as soon as any descriptor to the file is closed. Quoth 'man -s 3p fcntl':

    All locks associated with a file for a given process shall be
    removed when a file descriptor for that file is closed by that
    process or the process holding that file descriptor terminates.

The key word here is "a", in "a file descriptor ... is closed". Don't you love standardese?

If you look at Src/hist.c, you'll see that locks are sprinkled everywhere, and that readhistfile() and writehistfile() both open the history file and are mutually recursive. We can totally end up in a case where:

 * flockhistfile opens and puts a write lock on the history file.
 * writehistfile opens a new fd to the same file.
 * history needs to be merged/trimmed or whatever else, leading to a
   recursive call to...
 * readhistfile, which opens another fd to the same file and closes it,
   at which point the lock is lost.
 * writehistfile writes the history file without a lock.
 * ...

Now I'm not sure if that's what's causing my mysterious shared history lapses, but fixing that problem shouldn't hurt.

After contemplating Src/hist.c for a bit, I don't think it will be a trivial fix. I see two ways: the right and hard way, and the easy messy way.
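
As an aside before the two options: the lock loss described above can be reproduced outside zsh with a small C program. This is only a sketch, not code from Src/hist.c. The function names lock_lost_after_second_close() and locked_by_other() are made up for illustration, and the check is done from a forked child because F_GETLK does not report locks held by the calling process itself.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask, from a forked child, whether some other process holds a lock
 * that conflicts with a write lock on the whole of `path`.
 * Returns 1 if locked, 0 if not, -1 on error. */
static int locked_by_other(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        int fd = open(path, O_RDWR);
        if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        /* F_GETLK sets l_type to F_UNLCK when nothing conflicts. */
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int rc = WEXITSTATUS(status);
    return rc == 2 ? -1 : rc;
}

/* Returns 1 if a lock taken on one descriptor vanished after an
 * unrelated descriptor to the same file was closed, 0 if it survived,
 * -1 on error. */
int lock_lost_after_second_close(const char *path)
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    int lock_fd = open(path, O_RDWR | O_CREAT, 0600);
    if (lock_fd < 0)
        return -1;
    if (fcntl(lock_fd, F_SETLK, &fl) < 0) {
        close(lock_fd);
        return -1;
    }

    int before = locked_by_other(path);   /* lock should be visible */

    /* Mimic a nested open/close of the same file in the same process. */
    int other_fd = open(path, O_RDONLY);
    if (other_fd < 0) {
        close(lock_fd);
        return -1;
    }
    close(other_fd);                       /* POSIX: all our locks go away */

    int after = locked_by_other(path);    /* lock should be gone */
    close(lock_fd);

    if (before < 0 || after < 0)
        return -1;
    return before == 1 && after == 0;
}
```

On a system that follows the POSIX wording quoted above, calling lock_lost_after_second_close() on a scratch file on a local filesystem is expected to return 1. The sketch says nothing about how a particular NFS client or server behaves.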

The right and hard way is to have the various places that open() the history file use the flock_fd lock file descriptor instead (and not close it when done with it, leaving that to unlockhistfile()). I think we can open the descriptor in flockhistfile() with O_APPEND, since I haven't spotted any location where we do not write at the end of the file. O_APPEND can't hurt if we don't write in the middle of the file.

That leaves the issue of truncating the file when needed. We cannot open(...O_TRUNC...) for the same reason (closing that extra descriptor would drop the lock), so we will need ftruncate(), which we've avoided all these years :-) and probably an autoconf feature test. We may also have to keep track of (or reset) the seek pointer depending on the sequencing of the calls; I haven't fully investigated the need for that. The whole thing will certainly not improve the clarity of the code :-/

The easy messy way is to keep track of all the open descriptors to the history file in a global variable, and delay the actual close until unlockhistfile() is called. While it's conceptually fugly, it's arguably easier to maintain and understand.

Any opinions?

Phil.

=== Full history options and variables

% echo $ZSH_VERSION
5.6.2
% setopt | grep hist
noappendhistory       off
nobanghist            on
cshjunkiehistory      off
extendedhistory       on
histallowclobber      off
nohistbeep            off
histexpiredupsfirst   on
histfcntllock         on
histfindnodups        off
histignorealldups     off
histignoredups        on
histignorespace       off
histlexwords          off
histnofunctions       off
histnostore           off
histreduceblanks      on
nohistsavebycopy      off
histsavenodups        off
histsubstpattern      off
histverify            off
incappendhistory      off
incappendhistorytime  off
sharehistory          on
% set | grep -i hist
HISTCHARS='!^#'
HISTCMD=10264
HISTFILE=/home/phil/.zhistory
HISTSIZE=40960
SAVEHIST=10240
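
=== Sketch of the easy messy way

To make the "easy messy way" concrete, here is a minimal sketch of a deferred-close table. It is not a patch against Src/hist.c: sketch_lock(), sketch_close() and sketch_unlock() are hypothetical stand-ins for the lock, close and unlockhistfile() call sites, the table size is arbitrary, and it assumes raw file descriptors. Wherever the real code goes through stdio streams, buffered data would still have to be flushed at the original close point.

```c
#include <fcntl.h>
#include <unistd.h>

#define MAX_DEFERRED 16            /* arbitrary cap for this sketch */

static int deferred_fds[MAX_DEFERRED];
static int deferred_count;         /* descriptors awaiting a real close */
static int hist_locked;            /* nonzero while the lock is held */

/* Hypothetical stand-in for the point where the fcntl() lock is taken. */
void sketch_lock(void)
{
    hist_locked = 1;
}

/* Use instead of close() on any descriptor to the history file.
 * While the lock is held the descriptor is only parked, so the lock
 * survives; otherwise it is closed right away.
 * Returns 0 on success, -1 on error or if the table is full. */
int sketch_close(int fd)
{
    if (!hist_locked)
        return close(fd);
    if (deferred_count == MAX_DEFERRED)
        return -1;
    deferred_fds[deferred_count++] = fd;
    return 0;
}

/* Hypothetical stand-in for unlockhistfile(): really close everything
 * that was parked, then mark the lock as released.
 * Returns the number of descriptors closed. */
int sketch_unlock(void)
{
    int closed = 0;
    while (deferred_count > 0) {
        close(deferred_fds[--deferred_count]);
        closed++;
    }
    hist_locked = 0;
    return closed;
}
```

The point of the design is that nothing referring to the history file is closed between lock and unlock, so the POSIX close rule never gets a chance to fire. The cost is that descriptors stay open longer than the code reads, which is the "fugly" part.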