From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: aio_cancel segmentation fault for in progress write requests
Date: Fri, 7 Dec 2018 18:50:40 -0500
Message-ID: <20181207235040.GK23599@brightrain.aerifal.cx>

On Fri, Dec 07, 2018 at 04:51:03PM -0600, A. Wilcox wrote:
> On 12/07/18 14:35, Markus Wichmann wrote:
> > On Fri, Dec 07, 2018 at 01:13:44PM -0600, A. Wilcox wrote:
> >> So, my best theory is that running inside a debugger (gdb, valgrind)
> >> makes it slow enough that it no longer races.
> >
> > Two ideas to investigate further. 1: Produce a coredump ("ulimit -c
> > unlimited"). That won't interfere with timing, but I have no clue if
> > coredumps work with multithreading.
>
> Core was generated by `./aioWrite '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  __cp_end () at src/thread/powerpc64/syscall_cp.s:32
> 32      src/thread/powerpc64/syscall_cp.s: No such file or directory.
> [Current thread is 1 (LWP 5507)]
> (gdb) bt
> #0  __cp_end () at src/thread/powerpc64/syscall_cp.s:32
> #1  0x00003fffa768f2a4 in __syscall_cp_c (nr=180, u=512512, v=0, w=0,
>     x=0, y=0, z=0) at src/thread/pthread_cancel.c:35
> #2  0x00003fffa768e008 in __syscall_cp (nr=<optimized out>,
>     u=<optimized out>, v=<optimized out>, w=<optimized out>,
>     x=<optimized out>, y=<optimized out>, z=<optimized out>)
>     at src/thread/__syscall_cp.c:20
> #3  0x00003fffa76969f4 in pwrite (fd=<optimized out>,
>     buf=<optimized out>, size=<optimized out>, ofs=<optimized out>)
>     at src/unistd/pwrite.c:7
> #4  0x00003fffa763eddc in io_thread_func (ctx=<optimized out>)
>     at src/aio/aio.c:240
> #5  0x00003fffa768f76c in start (p=0x3fffa76e8af8)
>     at src/thread/pthread_create.c:147
> #6  0x00003fffa769b608 in __clone () at src/thread/powerpc64/clone.s:43
> (gdb) thread 2
> [Switching to thread 2 (LWP 5506)]
> #0  0x00003fffa7637144 in __syscall4 (d=0, c=-1, b=128, a=512, n=221)
>     at ./arch/powerpc64/syscall_arch.h:54
> 54      ./arch/powerpc64/syscall_arch.h: No such file or directory.
> (gdb) bt
> #0  0x00003fffa7637144 in __syscall4 (d=0, c=-1, b=128, a=512, n=221)
>     at ./arch/powerpc64/syscall_arch.h:54
> #1  __wait (addr=0x200, waiters=0x0, val=<optimized out>,
>     priv=<optimized out>) at src/thread/__wait.c:13
> #2  0x00003fffa763f07c in aio_cancel (fd=<optimized out>,
>     cb=0x3fffffafd2b8) at src/aio/aio.c:356
> #3  0x000000012034c044 in main ()
>
> 221 is SYS_futex.

Wow, that looks wrong. I don't think thread 2 (odd numbering; it looks
like the main thread) is relevant to the crash; it has already
proceeded past whatever was happening when thread 1 (the io thread)
started crashing.

I'm guessing it's a stack overflow. Can you dump the registers (to see
the stack pointer value) and info about memory ranges? That should show
how much space is left on the stack at the point of the crash. If the
crash is the signal handler trying to run, there will probably be some
space left, but less than the size of a signal frame, and the kernel
will probably refrain from moving the stack pointer down to include the
signal frame.

Rich
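For reference, a minimal gdb session along these lines should produce
the dumps Rich asks for. This is a sketch, assuming the `aioWrite`
binary and core file from the trace above; `r1` is the stack pointer
register in the powerpc64 ABI, and `thread 1` selects the crashing io
thread (LWP 5507):

    $ gdb ./aioWrite core
    (gdb) thread 1            # the crashing io thread (LWP 5507)
    (gdb) info registers r1   # stack pointer at the point of the crash
    (gdb) info proc mappings  # memory ranges; find the one containing r1

Subtracting the start address of the mapping that contains r1 from r1
itself gives the number of bytes the thread had left before running off
the bottom of its stack. One caveat: for a core file, gdb reconstructs
mappings from the NT_FILE note, which covers only file-backed mappings,
so an anonymous thread stack may not be listed; in that case,
reproducing the crash live under gdb and reading /proc/<pid>/maps shows
the full set of ranges.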