From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: aio_cancel segmentation fault for in progress write requests
Date: Fri, 7 Dec 2018 18:50:40 -0500
Message-ID: <20181207235040.GK23599@brightrain.aerifal.cx>

On Fri, Dec 07, 2018 at 04:51:03PM -0600, A. Wilcox wrote:
> On 12/07/18 14:35, Markus Wichmann wrote:
> > On Fri, Dec 07, 2018 at 01:13:44PM -0600, A. Wilcox wrote:
> >> So, my best theory is that running inside a debugger (gdb, valgrind)
> >> makes it slow enough that it no longer races.
> >
> > Two ideas to investigate further. 1: Produce a coredump ("ulimit -c
> > unlimited"). That won't interfere with timing, but I have no clue if
> > coredumps work with multithreading.
>
> Core was generated by `./aioWrite '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  __cp_end () at src/thread/powerpc64/syscall_cp.s:32
> 32      src/thread/powerpc64/syscall_cp.s: No such file or directory.
> [Current thread is 1 (LWP 5507)]
> (gdb) bt
> #0  __cp_end () at src/thread/powerpc64/syscall_cp.s:32
> #1  0x00003fffa768f2a4 in __syscall_cp_c (nr=180, u=512512, v=0, w=0,
>     x=0, y=0, z=0) at src/thread/pthread_cancel.c:35
> #2  0x00003fffa768e008 in __syscall_cp (nr=<optimized out>,
>     u=<optimized out>, v=<optimized out>, w=<optimized out>,
>     x=<optimized out>, y=<optimized out>, z=<optimized out>)
>     at src/thread/__syscall_cp.c:20
> #3  0x00003fffa76969f4 in pwrite (fd=<optimized out>,
>     buf=<optimized out>, size=<optimized out>, ofs=<optimized out>)
>     at src/unistd/pwrite.c:7
> #4  0x00003fffa763eddc in io_thread_func (ctx=<optimized out>)
>     at src/aio/aio.c:240
> #5  0x00003fffa768f76c in start (p=0x3fffa76e8af8)
>     at src/thread/pthread_create.c:147
> #6  0x00003fffa769b608 in __clone () at src/thread/powerpc64/clone.s:43
> (gdb) thread 2
> [Switching to thread 2 (LWP 5506)]
> #0  0x00003fffa7637144 in __syscall4 (d=0, c=-1, b=128, a=512, n=221)
>     at ./arch/powerpc64/syscall_arch.h:54
> 54      ./arch/powerpc64/syscall_arch.h: No such file or directory.
> (gdb) bt
> #0  0x00003fffa7637144 in __syscall4 (d=0, c=-1, b=128, a=512, n=221)
>     at ./arch/powerpc64/syscall_arch.h:54
> #1  __wait (addr=0x200, waiters=0x0, val=<optimized out>,
>     priv=<optimized out>) at src/thread/__wait.c:13
> #2  0x00003fffa763f07c in aio_cancel (fd=<optimized out>,
>     cb=0x3fffffafd2b8) at src/aio/aio.c:356
> #3  0x000000012034c044 in main ()
>
> 221 is SYS_futex.

Wow, that looks wrong. I don't think thread 2 (odd numbering; it looks
like the main thread) is relevant to the crash; it has already
proceeded past whatever was happening when thread 1 (the io thread)
started crashing.

I'm guessing it's a stack overflow. Can you dump the registers (to see
the stack pointer value) and info about memory ranges? That should show
how much space is left on the stack at the point of the crash. If the
crash is the signal handler trying to run, there will probably be some
space left, but less than the size of a signal frame, and the kernel
will probably refrain from moving the stack pointer down to include the
signal frame.

Rich
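For reference, a minimal gdb session along these lines should produce
the dumps Rich asks for. This is a sketch, assuming the `aioWrite`
binary and core file from the trace above; `r1` is the stack pointer
register in the powerpc64 ABI, and `thread 1` selects the crashing io
thread (LWP 5507):

    $ gdb ./aioWrite core
    (gdb) thread 1            # the crashing io thread (LWP 5507)
    (gdb) info registers r1   # stack pointer at the point of the crash
    (gdb) info proc mappings  # memory ranges; find the one containing r1

Subtracting the start address of the mapping that contains r1 from r1
itself gives the number of bytes the thread had left before running off
the bottom of its stack. One caveat: for a core file, gdb reconstructs
mappings from the NT_FILE note, which covers only file-backed mappings,
so an anonymous thread stack may not be listed; in that case,
reproducing the crash live under gdb and reading /proc/<pid>/maps shows
the full set of ranges.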