From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Deadlock when calling fflush/fclose in multiple threads
Date: Fri, 2 Nov 2018 10:29:15 -0400
Message-ID: <20181102142915.GG5150@brightrain.aerifal.cx>
References: <2018110213110009300613@kooiot.com>
Reply-To: musl@lists.openwall.com
To: "dirk@kooiot.com"
Cc: musl

On Fri, Nov 02, 2018 at 01:11:00PM +0800, dirk@kooiot.com wrote:
> Hi,
>
> We got deadlock on fflush/fclose with musl-1.1.19 (openwrt 18.06).
> Actually we using lua's popen in mutiple threads, following is gdb
> trace.
>
> I am new to musl libc source code, fflush(NULL) will call __ofl_lock
> and then try to lock and flush every stream, fclose will lock the
> stream and then __ofl_lock. The question is the fflush/fclose api
> thread-safe? What i have got from man document is that linux
> fflush/fclose is thread-safe api.

Your analysis is exactly correct. Calling fflush(NULL) frequently (or
at all) is a really bad idea because of how it scales and how
serializing it is, but it is valid, and the deadlock is a bug in musl.

The current placement of the ofl update seems to have been based on
minimizing how serializing fclose is, and on avoiding taking the
global lock for F_PERM (stdin/out/err) FILEs (which is largely a
useless optimization since the operation can happen at most 3 times).
Just moving it above the FLOCK (and making it not conditional on
F_PERM, to avoid data races) would solve this, but there's a deeper
bug here too.

By removing the FILE being closed from the open file list (and
unlocking the open file list, without which the removal can't be seen)
before it's flushed and closed, fclose creates a race window where
fflush(NULL) or exit() from another thread can complete without this
file being flushed, potentially causing data loss.

I think we just have to move the __ofl_lock to the top of the
function, before FLOCK, and the __ofl_unlock to after the
fflush/close. Unfortunately this makes fclose much more serializing
than it was before, but I don't see any way to avoid it.

Rich
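
To make the proposed ordering concrete, here is a minimal sketch using
a toy model, not musl's actual code. Every name in it (toy_file,
list_lock, toy_open, toy_write, toy_flush_all, toy_close) is
hypothetical: list_lock plays the role of __ofl_lock and the
per-stream mutex plays the role of FLOCK. Both toy_flush_all and
toy_close take the list lock first and the stream lock second, so
there is no lock-order inversion, and toy_close keeps the list lock
held until the stream has been flushed, so a concurrent flush-all
cannot return while that stream still has unflushed data.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy stand-in for a FILE; none of these names come from musl. */
struct toy_file {
    pthread_mutex_t lock;          /* per-stream lock (role of FLOCK) */
    struct toy_file *prev, *next;  /* links in the open-file list */
    int pending;                   /* bytes buffered, not yet "written" */
    int written;                   /* bytes "written" so far */
};

/* Global open-file list and its lock (role of __ofl_lock). */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct toy_file *list_head;

/* Initialize a stream and link it at the head of the open-file list. */
void toy_open(struct toy_file *f)
{
    pthread_mutex_init(&f->lock, NULL);
    f->pending = f->written = 0;
    f->prev = NULL;
    pthread_mutex_lock(&list_lock);
    f->next = list_head;
    if (list_head) list_head->prev = f;
    list_head = f;
    pthread_mutex_unlock(&list_lock);
}

/* Buffer n bytes on one stream; only the stream lock is needed. */
void toy_write(struct toy_file *f, int n)
{
    pthread_mutex_lock(&f->lock);
    f->pending += n;
    pthread_mutex_unlock(&f->lock);
}

/* Move buffered bytes to "written"; caller must hold f->lock. */
static void flush_locked(struct toy_file *f)
{
    assert(f->pending >= 0);
    f->written += f->pending;
    f->pending = 0;
}

/* Analogue of fflush(NULL): list lock first, then each stream lock.
 * Returns the number of streams visited. */
int toy_flush_all(void)
{
    int count = 0;
    pthread_mutex_lock(&list_lock);
    for (struct toy_file *f = list_head; f; f = f->next) {
        pthread_mutex_lock(&f->lock);
        flush_locked(f);
        pthread_mutex_unlock(&f->lock);
        count++;
    }
    pthread_mutex_unlock(&list_lock);
    return count;
}

/* Analogue of the proposed fclose: same lock order as toy_flush_all,
 * and the list lock is released only after the flush and unlink. */
void toy_close(struct toy_file *f)
{
    pthread_mutex_lock(&list_lock);
    pthread_mutex_lock(&f->lock);
    flush_locked(f);
    if (f->prev) f->prev->next = f->next;
    else list_head = f->next;
    if (f->next) f->next->prev = f->prev;
    pthread_mutex_unlock(&f->lock);
    pthread_mutex_unlock(&list_lock);
    pthread_mutex_destroy(&f->lock);
}
```

The cost is the one noted above: every toy_close serializes against
every other close and against toy_flush_all for the full duration of
the flush, since the global lock is held throughout.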