From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Deadlock when calling fflush/fclose in multiple threads
Date: Fri, 2 Nov 2018 11:45:28 -0400
Message-ID: <20181102154528.GI5150@brightrain.aerifal.cx>
References: <2018110213110009300613@kooiot.com> <20181102142915.GG5150@brightrain.aerifal.cx>
In-Reply-To: <20181102142915.GG5150@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: "dirk@kooiot.com"
Cc: musl

On Fri, Nov 02, 2018 at 10:29:15AM -0400, Rich Felker wrote:
> By removing the FILE being closed from the open file list (and
> unlocking the open file list, without which the removal can't be seen)
> before it's flushed and closed, fclose creates a race window where
> fflush(NULL) or exit() from another thread can complete without this
> file being flushed, potentially causing data loss.
>
> I think we just have to move the __ofl_lock to the top of the
> function, before FLOCK, and the __ofl_unlock to after the
> fflush/close. Unfortunately this makes fclose much more serializing
> than it was before, but I don't see any way to avoid it.

Another possibility seems to be moving the ofl lock, and the list
removal it protects, to after the fflush, close, and FUNLOCK, but
before the free. This leaves a 'dead' FILE in the open file list
momentarily, but the only things that can act on it are
pthread_create's init_file_lock, __stdio_exit's close_file, and
fflush(NULL), and none of these can have any side effects except on a
FILE with buffered data (which the FILE being closed can't have at
this point).

I think I like this solution better, and I think it's necessary to do
something other than the above-quoted idea: holding the ofl lock
during the flush can itself cause deadlock, since the flush could
block, and forward progress of whatever has it blocked (e.g. the other
end of a socket or pipe) could depend on forward progress of fclose,
fopen, etc. in another thread.

Also, in light of having added support for application-provided
buffers with setvbuf, even on regular files the flush operation could
take a long time.

Rich
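The ordering proposed above (flush, close, and FUNLOCK first; then take the ofl lock only long enough to unlink; then free) can be sketched with a toy model. This is a minimal illustration under stated assumptions, not musl's code: `struct toy_file`, `toy_fopen`, `toy_fclose`, `toy_flush_all` (standing in for fflush(NULL)), and `toy_open_count` are all hypothetical names, and plain pthread mutexes stand in for FLOCK and __ofl_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model of a stdio FILE, NOT musl's real struct.
 * 'pending' stands in for buffered bytes not yet written out;
 * 'written' counts bytes that reached the (imaginary) fd. */
struct toy_file {
	pthread_mutex_t lock;          /* per-FILE lock, like FLOCK/FUNLOCK */
	size_t pending;
	size_t written;
	int closed;
	struct toy_file *prev, *next;  /* links in the open file list */
};

/* Hypothetical analogues of the open file list and __ofl_lock. */
static pthread_mutex_t ofl_lock = PTHREAD_MUTEX_INITIALIZER;
static struct toy_file *ofl_head;

/* Caller must hold f->lock. Moves buffered data to the "fd". */
static void toy_flush_locked(struct toy_file *f)
{
	f->written += f->pending;
	f->pending = 0;
}

struct toy_file *toy_fopen(void)
{
	struct toy_file *f = calloc(1, sizeof *f);
	if (!f) return NULL;
	pthread_mutex_init(&f->lock, NULL);
	pthread_mutex_lock(&ofl_lock);
	f->prev = NULL;
	f->next = ofl_head;
	if (ofl_head) ofl_head->prev = f;
	ofl_head = f;
	pthread_mutex_unlock(&ofl_lock);
	return f;
}

/* Analogue of fflush(NULL): list lock first, then each per-FILE lock.
 * A dead (already closed) entry has pending == 0, so visiting it has
 * no side effects. */
void toy_flush_all(void)
{
	pthread_mutex_lock(&ofl_lock);
	for (struct toy_file *f = ofl_head; f; f = f->next) {
		pthread_mutex_lock(&f->lock);
		if (f->pending) toy_flush_locked(f);
		pthread_mutex_unlock(&f->lock);
	}
	pthread_mutex_unlock(&ofl_lock);
}

int toy_fclose(struct toy_file *f)
{
	/* Flush and close holding only the per-FILE lock. The list lock is
	 * not held, so a slow or blocking flush cannot stall fopen/fclose
	 * in other threads. */
	pthread_mutex_lock(&f->lock);
	toy_flush_locked(f);
	f->closed = 1;
	pthread_mutex_unlock(&f->lock);

	/* f is now dead but still listed, with no buffered data, so a
	 * concurrent toy_flush_all that reaches it does nothing. */
	pthread_mutex_lock(&ofl_lock);
	if (f->prev) f->prev->next = f->next;
	if (f->next) f->next->prev = f->prev;
	if (ofl_head == f) ofl_head = f->next;
	pthread_mutex_unlock(&ofl_lock);

	/* Unlinked under the list lock, so no list walker can still be
	 * referencing f: only now is it safe to free. */
	pthread_mutex_destroy(&f->lock);
	free(f);
	return 0;
}

/* Test helper: number of entries on the open file list. */
size_t toy_open_count(void)
{
	size_t n = 0;
	pthread_mutex_lock(&ofl_lock);
	for (struct toy_file *f = ofl_head; f; f = f->next) n++;
	pthread_mutex_unlock(&ofl_lock);
	return n;
}
```

In this sketch toy_fclose never holds a per-FILE lock while taking the list lock, and toy_flush_all always takes the list lock before any per-FILE lock, so there is no lock-order inversion between them. The price is that a dead entry can be visited briefly, which is harmless only because it carries no buffered data. By contrast, the quoted alternative (taking the list lock before FLOCK) would keep the list lock held across the flush itself.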