Hello,

I’m working on the Windows Subsystem for Linux (WSL), which uses musl, and I’m currently investigating the “WSL complete freeze” issue (microsoft/WSL issue #8824 on GitHub).

I believe this freeze is caused by a deadlock in musl.

Consider the following sample program:

#include <unistd.h>
#include <pthread.h>
#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <errno.h>

// Thread 2: submits an asynchronous read, then polls until it completes.
void* thread2(void* unused)
{
    char buf[256] = {0};
    struct aiocb aiocb = {0};

    // The mode argument only matters with O_CREAT, so it is not passed here.
    aiocb.aio_fildes = open("/proc/self/cmdline", O_RDONLY);
    aiocb.aio_buf = buf;
    aiocb.aio_nbytes = sizeof(buf) - 1; // keep a terminating NUL for printf
    aiocb.aio_lio_opcode = LIO_READ;

    if (aio_read(&aiocb) < 0)
    {
        printf("aio_read failed, %i\n", errno);
        return NULL;
    }

    // The parentheses are required: without them, error would be assigned the
    // result of the comparison (0 or 1) instead of aio_error()'s return value.
    int error;
    while ((error = aio_error(&aiocb)) == EINPROGRESS)
    {
        printf("In progress...\n");
        sleep(1);
    }

    if (error != 0)
    {
        printf("aio_error: %i\n", error);
    }

    printf("aio result: %s\n", buf);
    return NULL;
}

int main()
{
    pthread_t thread;
    pthread_create(&thread, NULL, &thread2, NULL);

    // Thread 1: calls fork() while thread 2 may be inside aio_read().
    if (fork() > 0)
    {
        // Parent process: wait for thread 2 to finish.
        pthread_join(thread, NULL);
        printf("aio complete\n");
    }

    return 0;
}

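Here we have two threads: the second thread (thread2) schedules an asynchronous I/O, while the main thread calls fork().

The main thread (Thread 1) does the following:

- fork() calls __malloc_atfork(), which acquires __malloc_lock.
- fork() then calls _Fork(), which calls __aio_atfork(), which waits for maplock.

In the meantime, the second thread (Thread 2) does the following inside aio_read():

- It acquires maplock.
- While still holding maplock, it allocates memory, which waits for __malloc_lock.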

To summarize: Thread 1 holds __malloc_lock and waits for maplock, while Thread 2 holds maplock and waits for __malloc_lock. Because the two threads acquire the same two locks in opposite orders, neither can make progress: we have a deadlock.
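To check the lock-order argument independently of musl, the inversion can be modeled in user space. This is only a sketch under stated assumptions, not musl's code: malloc_lock and map_lock are hypothetical plain mutexes standing in for __malloc_lock and maplock, model_worker(), lock_with_timeout() and run_current_order_model() are hypothetical helpers, barriers force the unlucky interleaving that the real program only hits occasionally, and pthread_mutex_timedlock() turns the would-be hang into an ETIMEDOUT result.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for musl's internal __malloc_lock and maplock.
   These are plain mutexes, not the real musl locks. */
static pthread_mutex_t malloc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t first_locks_held;
static pthread_barrier_t attempts_done;

struct model_thread {
    pthread_mutex_t *first;  /* lock taken first */
    pthread_mutex_t *second; /* lock requested while holding the first */
    int result;              /* 0 if acquired, ETIMEDOUT if it would block forever */
};

/* Like pthread_mutex_lock(), but gives up after 200 ms instead of hanging. */
static int lock_with_timeout(pthread_mutex_t *m)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 200000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_mutex_timedlock(m, &deadline);
}

static void *model_worker(void *arg)
{
    struct model_thread *t = arg;
    pthread_mutex_lock(t->first);
    /* Force the bad interleaving: both threads now hold their first lock. */
    pthread_barrier_wait(&first_locks_held);
    t->result = lock_with_timeout(t->second);
    /* Keep holding the first lock until both attempts have finished. */
    pthread_barrier_wait(&attempts_done);
    if (t->result == 0)
        pthread_mutex_unlock(t->second);
    pthread_mutex_unlock(t->first);
    return NULL;
}

/* Returns how many threads timed out; 2 means the model deadlocked. */
int run_current_order_model(void)
{
    /* Thread 1 (fork): malloc lock first, then maplock via __aio_atfork(). */
    struct model_thread forker = { &malloc_lock, &map_lock, -1 };
    /* Thread 2 (aio_read): maplock first, then the malloc lock. */
    struct model_thread aio = { &map_lock, &malloc_lock, -1 };
    pthread_t a, b;

    pthread_barrier_init(&first_locks_held, NULL, 2);
    pthread_barrier_init(&attempts_done, NULL, 2);
    pthread_create(&a, NULL, model_worker, &forker);
    pthread_create(&b, NULL, model_worker, &aio);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_barrier_destroy(&first_locks_held);
    pthread_barrier_destroy(&attempts_done);
    return (forker.result == ETIMEDOUT) + (aio.result == ETIMEDOUT);
}
```

Because each thread keeps its first lock until both attempts are over, run_current_order_model() returns 2: each thread is left waiting for the lock the other one holds, which is the situation in the summary above.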

This is my understanding of the issue, but my knowledge of musl’s codebase is (very) limited, so please let me know whether it is correct.

If it is, I think this could be solved by moving the __aio_atfork() call from _Fork() to fork(), before the call to __malloc_atfork(), so that fork() acquires maplock before __malloc_lock. That is the same order aio_read() uses, which prevents the deadlock.
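The proposed ordering can be checked with a small user-space model. This is a sketch under stated assumptions, not musl's code: map_lock and malloc_lock are hypothetical plain mutexes standing in for maplock and __malloc_lock, fork_model() is a hypothetical helper that mimics a fork() taking maplock (through __aio_atfork()) before the malloc lock, and a barrier recreates the interleaving in which the aio thread already holds maplock when fork() starts. Timed locks report a regression as ETIMEDOUT instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for musl's maplock and __malloc_lock. */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t malloc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t aio_holds_map_lock;

/* Lock with a 2-second timeout so a deadlock is reported, not a hang. */
static int lock_with_timeout(pthread_mutex_t *m)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 2;
    return pthread_mutex_timedlock(m, &deadline);
}

/* Thread 2 model: aio_read() holds maplock, then allocates (malloc lock). */
static void *aio_model(void *result)
{
    pthread_mutex_lock(&map_lock);
    pthread_barrier_wait(&aio_holds_map_lock);
    int rc = lock_with_timeout(&malloc_lock);
    if (rc == 0)
        pthread_mutex_unlock(&malloc_lock);
    pthread_mutex_unlock(&map_lock);
    *(int *)result = rc;
    return NULL;
}

/* Thread 1 model with the proposed order: maplock first, then malloc lock. */
static void *fork_model(void *result)
{
    pthread_barrier_wait(&aio_holds_map_lock);
    int rc = lock_with_timeout(&map_lock); /* waits for the aio thread */
    if (rc == 0) {
        rc = lock_with_timeout(&malloc_lock);
        if (rc == 0)
            pthread_mutex_unlock(&malloc_lock);
        pthread_mutex_unlock(&map_lock);
    }
    *(int *)result = rc;
    return NULL;
}

/* Returns how many of the two threads failed to get both locks (0 expected). */
int run_proposed_order_model(void)
{
    int aio_rc = -1, fork_rc = -1;
    pthread_t aio, forker;

    pthread_barrier_init(&aio_holds_map_lock, NULL, 2);
    pthread_create(&aio, NULL, aio_model, &aio_rc);
    pthread_create(&forker, NULL, fork_model, &fork_rc);
    pthread_join(aio, NULL);
    pthread_join(forker, NULL);
    pthread_barrier_destroy(&aio_holds_map_lock);
    return (aio_rc != 0) + (fork_rc != 0);
}
```

With both threads taking maplock first, the forking thread simply waits until the aio thread has finished its allocation and released maplock, so run_proposed_order_model() returns 0.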

I do see that _Fork() calls __block_all_sigs() before calling __aio_atfork(), so I wonder whether the order of these calls matters (please let me know if that’s the case).

If this solution makes sense to you, I’m happy to submit a contribution with the fix.

Thank you,

--

Pierre Boulay