mailing list of musl libc
 help / color / mirror / code / Atom feed
* Re: Deadlock when calling fflush/fclose in multiple threads
@ 2018-11-02 15:59 dirk
  0 siblings, 0 replies; 8+ messages in thread
From: dirk @ 2018-11-02 15:59 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

[-- Attachment #1: Type: text/html, Size: 2977 bytes --]

[-- Attachment #2: Type: text/plain, Size: 2757 bytes --]

In our case, fflush(NULL) is called in lua's popen. free/df/loadavg will be executed in popen to collect device status in multiple threads, for each five seconds.  And the case for me to reproduce this bug is calling popen In infinite loop in two threads, the deadlock happens within one second in our application.

This deadlock happens on x86-64 openwrt (virtual machine), and we haven't meet this deadlock in nxp imx6ul(arm-cortex-a9) and tplink 702n(mips24_kc). I was thought it might caused by 64bits, and we used glibc for those amd64 vms.  We started port our application to Raspberry 3B+from last month, we got same deaklock(the executed commands become zombie). And i got chance to loop at this issue again.



来自 魅族 PRO 6s

-------- 原始邮件 --------
发件人:Rich Felker <dalias@libc.org>
时间:2018年11月2日 22:29
收件人:dirk@kooiot.com
抄送:musl <musl@lists.openwall.com>
主题:Re: [musl] Deadlock when calling fflush/fclose in multiple threads

>On Fri, Nov 02, 2018 at 01:11:00PM +0800, dirk@kooiot.com wrote:
>> Hi,
>> 
>> We got deadlock on fflush/fclose with musl-1.1.19 (openwrt 18.06).
>> Actually we using lua's popen in mutiple threads, following is gdb
>> trace.
>> 
>> I am new to musl libc source code, fflush(NULL) will call __ofl_lock
>> and then try to lock and flush every stream, fclose will lock the
>> stream and then __ofl_lock. The question is the fflush/fclose api
>> thread-safe? What i have got from man document is that linux
>> fflush/fclose is thread-safe api.
>
>Your analysis is exactly correct. Calling fflush(NULL) frequently (or
>at all) is a really bad idea because of how it scales and how
>serializing it is, but it is valid, and the deadlock is a bug in musl.
>
>The current placement of the ofl update seems to have been based on
>minimizing how serializing fclose is, and on avoiding taking the
>global lock for F_PERM (stdin/out/err) FILEs (which is largely a
>useless optimization since the operation can happen at most 3 times).
>Just moving it above the FLOCK (and making it not conditional on
>F_PERM, to avoid data races) would solve this, but there's a deeper
>bug here too.
>
>By removing the FILE being closed from the open file list (and
>unlocking the open file list, without which the removal can't be seen)
>before it's flushed and closed, fclose creates a race window where
>fflush(NULL) or exit() from another thread can complete without this
>file being flushed, potentially causing data loss.
>
>I think we just have to move the __ofl_lock to the top of the
>function, before FLOCK, and the __ofl_unlock to after the
>fflush/close. Unfortunately this makes fclose much more serializing
>than it was before, but I don't see any way to avoid it.
>
>Rich
>

^ permalink raw reply	[flat|nested] 8+ messages in thread
[parent not found: <2018110213110009300613@kooiot.com>]
* Deadlock when calling fflush/fclose in multiple threads
@ 2018-11-02  5:11 dirk
  0 siblings, 0 replies; 8+ messages in thread
From: dirk @ 2018-11-02  5:11 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 7195 bytes --]

Hi,

We got deadlock on fflush/fclose with musl-1.1.19 (openwrt 18.06). Actually we using lua's popen in mutiple threads, following is gdb trace.

I am new to musl libc source code, fflush(NULL) will call __ofl_lock and then try to lock and flush every stream, fclose will lock the stream and then __ofl_lock.  The question is the fflush/fclose api thread-safe?  What i have got from man document is that linux fflush/fclose is thread-safe api.

(gdb) info thread
  Id   Target Id         Frame 
* 1    LWP 2772 "skynet" __cp_end () at src/thread/i386/syscall_cp.s:31
  2    LWP 2776 "skynet" __cp_end () at src/thread/i386/syscall_cp.s:31
  3    LWP 2777 "skynet" __cp_end () at src/thread/i386/syscall_cp.s:31
  4    LWP 2778 "skynet" 0xb7f95ad9 in __kernel_vsyscall ()
  5    LWP 2779 "skynet" __cp_end () at src/thread/i386/syscall_cp.s:31
  6    LWP 2780 "skynet" 0xb7f95ad9 in __kernel_vsyscall ()
  7    LWP 2781 "skynet" 0xb7f95ad9 in __kernel_vsyscall ()
  8    LWP 2782 "skynet" __cp_end () at src/thread/i386/syscall_cp.s:31
(gdb) thread 6
[Switching to thread 6 (LWP 2780)]
#0  0xb7f95ad9 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7f95ad9 in __kernel_vsyscall ()
#1  0xb7fb1314 in __vsyscall () at src/internal/i386/syscall.s:19
#2  0xb7fe29a9 in __syscall4 (a4=0, a3=2781, a2=128, a1=136262812, n=240) at ./arch/i386/syscall_arch.h:37
#3  __wait (addr=0x81f349c, waiters=0x81f34a0, val=2781, priv=128) at src/thread/__wait.c:13
#4  0xb7fd7a52 in __lockfile (f=0x81f3450) at src/stdio/__lockfile.c:10
#5  0xb7fd829d in fflush (f=0x81f3450) at src/stdio/fflush.c:13
#6  0x0807129e in io_popen (L=0x81e5f04) at liolib.c:282
#7  0x0805a064 in luaD_precall (L=0x81e5f04, func=0x81e638c, nresults=2) at ldo.c:434
#8  0x080661f0 in luaV_execute (L=0x81e5f04) at lvm.c:1146
#9  0x08059dc1 in unroll (L=0x81e5f04, ud=0x0) at ldo.c:556
#10 0x080596bb in luaD_rawrunprotected (L=0x81e5f04, f=0x805a250 <resume>, ud=0xb778a5f8) at ldo.c:142
#11 0x0805a46d in lua_resume (L=0x81e5f04, from=0x81d8bd4, nargs=<optimized out>) at ldo.c:664
#12 0x0806f252 in auxresume (L=L@entry=0x81d8bd4, co=co@entry=0x81e5f04, narg=3) at lcorolib.c:39
#13 0x0806f5a5 in luaB_coresume (L=0x81d8bd4) at lcorolib.c:60
#14 0xb7732ca9 in timing_resume (L=0x81d8bd4) at lualib-src/lua-profile.c:134
#15 0x0805a064 in luaD_precall (L=0x81d8bd4, func=0x81e4144, nresults=-1) at ldo.c:434
#16 0x08066686 in luaV_execute (L=0x81d8bd4) at lvm.c:1162
#17 0x0805a348 in luaD_call (L=0x81d8bd4, func=0x81e4084, nResults=-1) at ldo.c:499
#18 0x0805a3ac in luaD_callnoyield (L=0x81d8bd4, func=0x81e4084, nResults=-1) at ldo.c:509
#19 0x080560e8 in f_call (L=0x81d8bd4, ud=0xb778a8e8) at lapi.c:944
#20 0x080596bb in luaD_rawrunprotected (L=0x81d8bd4, f=0x80560d0 <f_call>, ud=0xb778a8e8) at ldo.c:142
#21 0x0805a656 in luaD_pcall (L=0x81d8bd4, func=0x80560d0 <f_call>, u=0xb778a8e8, old_top=132, ef=0) at ldo.c:729
#22 0x08057969 in lua_pcallk (L=0x81d8bd4, nargs=5, nresults=-1, errfunc=0, ctx=0, k=0x806f070 <finishpcall>) at lapi.c:970
#23 0x0806f116 in luaB_pcall (L=0x81d8bd4) at lbaselib.c:424
#24 0x0805a064 in luaD_precall (L=0x81d8bd4, func=0x81e406c, nresults=2) at ldo.c:434
#25 0x080661f0 in luaV_execute (L=0x81d8bd4) at lvm.c:1146
#26 0x0805a348 in luaD_call (L=0x81d8bd4, func=0x81e4024, nResults=0) at ldo.c:499
#27 0x0805a3ac in luaD_callnoyield (L=0x81d8bd4, func=0x81e4024, nResults=0) at ldo.c:509
#28 0x080560e8 in f_call (L=0x81d8bd4, ud=0xb778abb8) at lapi.c:944
#29 0x080596bb in luaD_rawrunprotected (L=0x81d8bd4, f=0x80560d0 <f_call>, ud=0xb778abb8) at ldo.c:142
#30 0x0805a656 in luaD_pcall (L=0x81d8bd4, func=0x80560d0 <f_call>, u=0xb778abb8, old_top=36, ef=12) at ldo.c:729
#31 0x08057969 in lua_pcallk (L=0x81d8bd4, nargs=5, nresults=0, errfunc=1, ctx=0, k=0x0) at lapi.c:970
#32 0xb772978e in _cb (context=0x81da400, ud=0x81d8bd4, type=1, session=2, source=3, msg=0x81b1080, sz=2) at lualib-src/lua-skynet.c:75
#33 0x0804ecb1 in dispatch_message (ctx=ctx@entry=0x81da400, msg=msg@entry=0xb778ac80) at skynet-src/skynet_server.c:274
#34 0x0804f97a in skynet_context_message_dispatch (sm=0x815b2a0, q=0x81da470, weight=-1) at skynet-src/skynet_server.c:334
#35 0x080501a9 in thread_worker (p=0xbffffb9c) at skynet-src/skynet_start.c:163
#36 0xb7fe3fc5 in start (p=0xb778ad54) at src/thread/pthread_create.c:150
#37 0xb7fe2b3a in __clone () at src/thread/i386/clone.s:36
#38 0x00000000 in ?? ()
(gdb) thread 7
[Switching to thread 7 (LWP 2781)]
#0  0xb7f95ad9 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7f95ad9 in __kernel_vsyscall ()
#1  0xb7fb1314 in __vsyscall () at src/internal/i386/syscall.s:19
#2  0xb7fe278b in __syscall3 (a3=-2147483646, a2=128, a1=-1207963496, n=240) at ./arch/i386/syscall_arch.h:30
#3  __futexwait (priv=128, val=-2147483646, addr=0xb7fff098 <ofl_lock>) at ./src/internal/pthread_impl.h:150
#4  __lock (l=0xb7fff098 <ofl_lock>) at src/thread/__lock.c:42
#5  0xb7fd9f30 in __ofl_lock () at src/stdio/ofl.c:9
#6  0xb7fd8144 in fclose (f=0x81f3450) at src/stdio/fclose.c:17
#7  0xb7fda4e2 in pclose (f=0x81f3450) at src/stdio/pclose.c:9
#8  0x0807130e in io_pclose (L=0x81f3034) at liolib.c:273
#9  0x0807090a in aux_close (L=0x81f3034) at liolib.c:205
#10 0x0805a064 in luaD_precall (L=0x81f3034, func=0x81f312c, nresults=0) at ldo.c:434
#11 0x080661f0 in luaV_execute (L=0x81f3034) at lvm.c:1146
#12 0x080596bb in luaD_rawrunprotected (L=0x81f3034, f=0x805a250 <resume>, ud=0xb77748c8) at ldo.c:142
#13 0x0805a46d in lua_resume (L=0x81f3034, from=0x81e6654, nargs=<optimized out>) at ldo.c:664
#14 0x0806f252 in auxresume (L=L@entry=0x81e6654, co=co@entry=0x81f3034, narg=0) at lcorolib.c:39
#15 0x0806f5a5 in luaB_coresume (L=0x81e6654) at lcorolib.c:60
#16 0xb7732ca9 in timing_resume (L=0x81e6654) at lualib-src/lua-profile.c:134
#17 0x0805a064 in luaD_precall (L=0x81e6654, func=0x81f09e4, nresults=-1) at ldo.c:434
#18 0x08066686 in luaV_execute (L=0x81e6654) at lvm.c:1162
#19 0x0805a348 in luaD_call (L=0x81e6654, func=0x81f0924, nResults=0) at ldo.c:499
#20 0x0805a3ac in luaD_callnoyield (L=0x81e6654, func=0x81f0924, nResults=0) at ldo.c:509
#21 0x080560e8 in f_call (L=0x81e6654, ud=0xb7774bb8) at lapi.c:944
#22 0x080596bb in luaD_rawrunprotected (L=0x81e6654, f=0x80560d0 <f_call>, ud=0xb7774bb8) at ldo.c:142
#23 0x0805a656 in luaD_pcall (L=0x81e6654, func=0x80560d0 <f_call>, u=0xb7774bb8, old_top=36, ef=12) at ldo.c:729
#24 0x08057969 in lua_pcallk (L=0x81e6654, nargs=5, nresults=0, errfunc=1, ctx=0, k=0x0) at lapi.c:970
#25 0xb772978e in _cb (context=0x81e6e00, ud=0x81e6654, type=1, session=1, source=0, msg=0x0, sz=0) at lualib-src/lua-skynet.c:75
#26 0x0804ecb1 in dispatch_message (ctx=ctx@entry=0x81e6e00, msg=msg@entry=0xb7774c80) at skynet-src/skynet_server.c:274
#27 0x0804f97a in skynet_context_message_dispatch (sm=0x815b2c0, q=0x81e6ec0, weight=-1) at skynet-src/skynet_server.c:334
#28 0x080501a9 in thread_worker (p=0xbffffba8) at skynet-src/skynet_start.c:163
#29 0xb7fe3fc5 in start (p=0xb7774d54) at src/thread/pthread_create.c:150
#30 0xb7fe2b3a in __clone () at src/thread/i386/clone.s:36
#31 0x00000000 in ?? ()
(gdb) 



dirk@kooiot.com

[-- Attachment #2: Type: text/html, Size: 9518 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-11-02 15:59 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-02 15:59 Deadlock when calling fflush/fclose in multiple threads dirk
     [not found] <2018110213110009300613@kooiot.com>
2018-11-02 14:29 ` Rich Felker
2018-11-02 15:15   ` John Starks
2018-11-02 15:33     ` Rich Felker
2018-11-02 15:37       ` John Starks
2018-11-02 15:46         ` Rich Felker
2018-11-02 15:45   ` Rich Felker
  -- strict thread matches above, loose matches on Subject: below --
2018-11-02  5:11 dirk

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).