Date: Tue, 18 Oct 2022 20:07:34 +0300
From: Alexey Izbyshev
To: musl@lists.openwall.com
Cc: Ben Hillis, Brian Perkins, Peter Martincic, Pierre Boulay
Subject: Re: [musl] Potential deadlock when fork() is called while __aio_get_queue() is allocating memory

On 2022-10-18 19:51, Pierre Boulay wrote:
> Hello,
>
> I'm working on the Windows Subsystem for Linux, which uses MUSL, and
> I'm currently working on solving "WSL complete freeze" (Issue #8824,
> microsoft/WSL on github.com).
>
> I believe that this freeze is caused by a deadlock bug in MUSL.
>
> If we look at this sample program:
>
> #include <aio.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <pthread.h>
> #include <stdio.h>
> #include <sys/stat.h>
> #include <unistd.h>
>
> void* thread2(void* unused) // Thread 2
> {
>     unsigned char buf[256] = {0};
>     struct aiocb aiocb = {};
>     aiocb.aio_fildes = open("/proc/self/cmdline", O_RDONLY, S_IRUSR | S_IWUSR);
>     aiocb.aio_buf = buf;
>     aiocb.aio_nbytes = sizeof(buf) - 1;
>     aiocb.aio_lio_opcode = LIO_READ;
>
>     if (aio_read(&aiocb) < 0)
>     {
>         printf("aio_read failed, %i\n", errno);
>         return NULL;
>     }
>
>     int error = -1;
>     // Parenthesized so that error holds the aio_error() result,
>     // not the result of the comparison.
>     while ((error = aio_error(&aiocb)) == EINPROGRESS)
>     {
>         printf("In progress...\n");
>         sleep(1);
>     }
>
>     if (error != 0)
>     {
>         printf("aio_error: %i\n", error);
>     }
>
>     printf("aio result: %s\n", buf);
>     return NULL;
> }
>
> int main()
> {
>     pthread_t thread;
>     pthread_create(&thread, NULL, &thread2, NULL);
>
>     if (fork() > 0) // Thread 1
>     {
>         pthread_join(thread, NULL);
>         printf("aio complete");
>     }
> }
>
> Here we have two threads. The main thread is calling fork() and the
> other is scheduling an asynchronous I/O.
>
> If we look at what the main thread is doing:
>
> * After calling pthread_create(), the main thread enters fork(), in
>   fork.c.
> * Before calling _Fork(), this thread acquires the __malloc_lock
>   (through __malloc_atfork()).
> * Then _Fork() is called, which calls __aio_atfork(), which waits for
>   the maplock.
>
> In the meantime, the second thread will:
>
> * Call aio_read(), which calls submit(), in aio.c.
> * submit() then calls __aio_get_queue(), which acquires the maplock.
> * At this point the map structure needs to be allocated, so
>   __aio_get_queue() calls calloc().
> * calloc() calls malloc(), in malloc.c.
> * If the allocation overflows, malloc() then calls wrlock(), which
>   waits for the __malloc_lock.
>
> So to summarize: Thread 1 holds __malloc_lock and waits for maplock,
> and thread 2 holds maplock and waits for __malloc_lock. We have a
> deadlock.
>
> This is my understanding of the issue, but I have a (very) limited
> knowledge of MUSL's codebase, so let me know if it's correct.
>

Yes. Please see https://www.openwall.com/lists/musl/2022/10/06/2 and the
following discussion.

Alexey

> If it is, I think this could be solved by moving __aio_atfork() from
> _Fork() to fork(), before __malloc_atfork() so the locks are acquired
> in the correct order to prevent the deadlock.
>
> I do see that _Fork() calls __block_all_sigs before calling
> __aio_atfork() so I wonder if the order of these calls is important
> though (let me know if that's the case).
>
> If this solution makes sense to you, I'm happy to submit a contribution
> with the fix.
>
> Thank you,
>
> --
>
> Pierre Boulay
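
The ordering idea in the quoted proposal (take the AIO map lock before the
malloc lock on every path) can be illustrated in isolation. This is a
minimal sketch, not musl code and not a patch: the mutexes and all names
(demo_maplock, demo_malloc_lock, ordered_path, demo_ordered_locking,
DEMO_ROUNDS) are hypothetical stand-ins, and it says nothing about the
signal-blocking question raised above.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the two locks; NOT musl's real objects. */
static pthread_mutex_t demo_maplock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_malloc_lock = PTHREAD_MUTEX_INITIALIZER;
static long demo_counter;  /* protected by demo_malloc_lock */

enum { DEMO_ROUNDS = 10000 };

/*
 * Both the fork()-like and the aio-like thread run this function, so
 * both take demo_maplock first and demo_malloc_lock second. With one
 * global order, no thread can hold the second lock while waiting for
 * the first, so a wait cycle cannot form.
 */
static void *ordered_path(void *unused)
{
    (void)unused;
    for (int i = 0; i < DEMO_ROUNDS; i++) {
        pthread_mutex_lock(&demo_maplock);
        pthread_mutex_lock(&demo_malloc_lock);
        demo_counter++;
        pthread_mutex_unlock(&demo_malloc_lock);
        pthread_mutex_unlock(&demo_maplock);
    }
    return NULL;
}

/* Runs both paths concurrently; returns the final counter value. */
long demo_ordered_locking(void)
{
    pthread_t fork_like, aio_like;

    demo_counter = 0;
    pthread_create(&fork_like, NULL, ordered_path, NULL);
    pthread_create(&aio_like, NULL, ordered_path, NULL);
    pthread_join(fork_like, NULL);
    pthread_join(aio_like, NULL);
    return demo_counter;
}
```

With blocking locks on both paths, demo_ordered_locking() still returns,
and the result equals 2 * DEMO_ROUNDS. Whether the same reordering is safe
inside fork() itself depends on constraints this sketch does not model.
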