From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/13535
Path: news.gmane.org!.POSTED!not-for-mail
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: aio_cancel segmentation fault for in progress write requests
Date: Fri, 7 Dec 2018 15:21:14 -0500
Message-ID: <20181207202114.GI23599@brightrain.aerifal.cx>
References: <20181207154419.GD23599@brightrain.aerifal.cx>
 <20181207165217.GE23599@brightrain.aerifal.cx>
 <54b4d253-1660-3207-5d59-f23f1c25b2b9@adelielinux.org>
 <20181207182650.GF23599@brightrain.aerifal.cx>
 <03a5f237-87cd-5580-4148-a29fa22d3ef0@adelielinux.org>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: blaine.gmane.org 1544213962 28755 195.159.176.226 (7 Dec 2018 20:19:22 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Fri, 7 Dec 2018 20:19:22 +0000 (UTC)
User-Agent: Mutt/1.5.21 (2010-09-15)
To: musl@lists.openwall.com
Original-X-From: musl-return-13551-gllmg-musl=m.gmane.org@lists.openwall.com Fri Dec 07 21:19:18 2018
Return-path:
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1gVMaj-0007NC-BQ for gllmg-musl@m.gmane.org; Fri, 07 Dec 2018 21:19:17 +0100
Original-Received: (qmail 5846 invoked by uid 550); 7 Dec 2018 20:21:26 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-ID:
Original-Received: (qmail 5828 invoked from network); 7 Dec 2018 20:21:26 -0000
Content-Disposition: inline
In-Reply-To: <03a5f237-87cd-5580-4148-a29fa22d3ef0@adelielinux.org>
Original-Sender: Rich Felker
Xref: news.gmane.org gmane.linux.lib.musl.general:13535
Archived-At:

On Fri, Dec 07, 2018 at 01:13:44PM -0600, A.
Wilcox wrote:
> Okay, it's a race of some kind:
>
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so
> musl libc (powerpc64)
> Version 1.1.20-git-156-gb1c58cb9
> Dynamic Program Loader
> Usage: lib/libc.so [options] [--] pathname [args]
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
>
> zsh: segmentation fault lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> aio_write/1-1.c cancelationStatus : 2
> Test PASSED
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault lib/libc.so ~/aioWrite
>
>
> So, my best theory is that running inside a debugger (gdb, valgrind)
> makes it slow enough that it no longer races.

OK, here's a theory. Based on my reply just now to Florian, the signal
context would have to get really big to make the expected code path
overflow -- io_thread_func() has a very small stack frame and so does
cleanup(). However, early in io_thread_func, it calls
__aio_get_queue(), which calls calloc() if the tables at each level
don't already exist, which is certainly the case for the first call.
During this call, the margin will be somewhat smaller, and maybe it's
enough to make kernels that break the MINSIGSTKSZ contract cause an
overflow.

The right action here is probably calling __aio_get_queue with the fd
number *before* calling pthread_create, so that it's guaranteed that
__aio_get_queue takes the fast path in the io thread and doesn't call
calloc. This is especially important in light of the newish allowance
that malloc be interposed, where we would be running
application-provided malloc code in a thread with a tiny stack.

I'm still not sure this is the source of the reported crash, but I
think it needs to be changed either way.

Rich
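
To make the proposed ordering concrete, here is a minimal sketch in C. It
is not musl's code: get_queue(), io_worker(), submit_and_wait(), struct
queue and the NFD limit are all hypothetical stand-ins, and the worker
here runs on a default-size stack rather than the tiny one discussed
above. It only illustrates the idea that the submitting thread performs
the allocating lookup before pthread_create, so the worker's own lookup
finds the entry already present and never reaches calloc.

```c
#include <pthread.h>
#include <stdlib.h>

#define NFD 64                      /* demo limit, hypothetical */

struct queue { int fd; };           /* stand-in, not musl's aio queue */

static struct queue *table[NFD];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Look up the queue for fd, allocating only if need is nonzero.
 * *allocated is set to 1 when the slow path (calloc) was taken. */
struct queue *get_queue(int fd, int need, int *allocated)
{
    struct queue *q;

    *allocated = 0;
    if (fd < 0 || fd >= NFD) return NULL;
    pthread_mutex_lock(&lock);
    q = table[fd];
    if (!q && need && (q = calloc(1, sizeof *q))) {
        q->fd = fd;
        table[fd] = q;
        *allocated = 1;
    }
    pthread_mutex_unlock(&lock);
    return q;
}

struct request { int fd; int worker_allocated; };

static void *io_worker(void *p)
{
    struct request *r = p;

    /* Fast path expected: the submitter already created the queue. */
    get_queue(r->fd, 1, &r->worker_allocated);
    return NULL;
}

/* Returns 0 if the worker stayed on the fast path, 1 if it had to
 * allocate, and -1 on error. */
int submit_and_wait(int fd)
{
    struct request r = { fd, 0 };
    pthread_t td;
    int allocated;

    /* Any slow path runs here, on the caller's normal stack. */
    if (!get_queue(fd, 1, &allocated)) return -1;
    if (pthread_create(&td, NULL, io_worker, &r)) return -1;
    pthread_join(td, NULL);
    return r.worker_allocated;
}
```

With this ordering, submit_and_wait(fd) should report that the worker
did not allocate, whether or not the fd has been seen before, because
the only allocating call happens in the submitting thread.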