Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Reply-To: musl@lists.openwall.com
From: Gavin Howard
Date: Fri, 11 Mar 2022 22:45:52 -0700
To: Rich Felker
Cc: musl@lists.openwall.com
In-Reply-To: <20220311202151.GW7074@brightrain.aerifal.cx>
Subject: Re: [musl] Possible PEBKAC or Bug in musl Semaphores and/or Mutexes

> I don't think there's enough detail to debug your issue based on the
> high-level description you gave. The bug is almost surely in something
> we can't see. As long as all modification to and reading from the flag
> is done with the mutex held, you don't need to worry about data races
> on it.

While I can't show a lot of the code due to licensing issues, I will
show what I can.

Main Thread #1 (after joining all other threads):

```
s = y_strucon_mutex_lock(&procs_lock);
if (y_err(s != y_STATUS_SUCCESS)) return;

done = 1;

y_strucon_sem_post(&procs_sem, 1);
y_strucon_mutex_unlock(&procs_lock);

r = pthread_join(childthread->sysid, &data);
if (y_err(r != 0)) {
	y_panica("Deadlock in thread join");
}
```

When you see a `y_` prefix, that is my own code. The `y_strucon_mutex_*`
functions are wrappers around the `pthread_mutex_*` functions, and
likewise `y_strucon_sem_*` for semaphores. The variable `s` is the
standard return type of most of my functions; it is a status code.
Main Thread #2:

```
do {
	int status;

	s = y_strucon_sem_wait(&procs_sem);
	if (y_err(s != y_STATUS_SUCCESS)) return s;

	s = y_strucon_mutex_lock(&procs_lock);
	if (y_err(s != y_STATUS_SUCCESS)) return s;

	if (y_unlikely(done != 0)) {
		y_strucon_mutex_unlock(&procs_lock);
		pthread_exit((void*) (uintptr_t) s);
	}

	y_strucon_mutex_unlock(&procs_lock);

	do {
		r = waitpid(-1, &status, 0);
	} while (r < 0 && errno == EINTR);

	y_assert(r > 0, "Child process does not exist");

	y_procchildinfo info;
	info.pid = r;
	info.code = status;

	s = y_strucon_mutex_lock(&procs_lock);
	if (y_err(s != y_STATUS_SUCCESS)) return s;

	y_fd* fdptr;

	// This function returns a pointer to the fd in fdptr, if the
	// process has been registered before.
	if (y_map_exists_v(procs, &info.pid, &fdptr)) {

		yc_assert(fdptr != NULL, YC_ASSERT_NULL);

		s = y_io_bareio_write(*fdptr, (y_uchar*) &info,
		                      sizeof(y_procchildinfo));
		if (y_err(s != y_STATUS_SUCCESS)) {
			y_panicva("Could not generate child status notification "
			          "using fd %d", *fdptr);
		}

		s = y_map_remove(procs, &info.pid);
		if (y_err(s != y_STATUS_SUCCESS)) return s;
	}
	else {
		s = y_vec_push(&unregprocs, &info);
		if (y_err(s != y_STATUS_SUCCESS)) return s;
	}

	y_strucon_mutex_unlock(&procs_lock);

} while (s == y_STATUS_SUCCESS);
```

What a worker thread does to start and register a process:

```
pid_t chpid = fork();

if (chpid < 0) return y_STATUS_CHILD_ERR;

if (chpid == 0) {
	// Setup and exec() child. Abort if exec() fails.
	abort();
}

s = y_strucon_mutex_lock(&procs_lock);
if (y_err(s != y_STATUS_SUCCESS)) return s;

size_t i;
size_t len = y_vec_len(&unregprocs);

for (i = 0; i < len; ++i) {

	y_procchildinfo* ptr = (y_procchildinfo*) y_vec_get(&unregprocs, i);

	if (ptr->pid == chpid) {

		s = y_io_bareio_write(fd, (y_uchar*) ptr, sizeof(y_procchildinfo));
		if (y_err(s != y_STATUS_SUCCESS)) {
			y_panica("Could not generate child status notification");
		}

		s = y_vec_popAt(&unregprocs, i);

		y_strucon_sem_post(&procs_sem, 1);
		y_strucon_mutex_unlock(&procs_lock);

		return s;
	}
}

s = y_map_insert(procs, &chpid, &fd);

y_strucon_sem_post(&procs_sem, 1);
y_strucon_mutex_unlock(&procs_lock);
```

As you can see, except for the presence of a vector (resizable array)
to handle the possibility of waitpid() returning a child process before
it has been "registered," the code matches my high-level description
pretty closely. Notice that all accesses to the map and vector are
protected by the mutex, so there is no data race on them.

(I have another problem with musl's rwlocks on another map, but I'll
send a separate message to the mailing list about that.)

> You should probably look at the strace of your program to see what's
> going on with the ECHILD. It's not clear whether you're calling
> waidpid with an argument of -1 or a specific pid, but in the latter
> case you could look for whether the child was already reaped and
> whether the pid was ever valid (from the strace log). Also, be aware
> that ignoring SIGCHLD may break waiting for it, in case you're doing
> that.

As shown above, I am always calling waitpid(-1, ...), mostly because I
don't know what order child processes will finish in. And this thread
is the only one calling waitpid(), so the child could not have been
reaped by another thread by accident. SIGCHLD is not blocked. In fact,
a signal handler is explicitly registered for it in Main Thread #2,
and that handler, if it gets a SIGCHLD, does nothing.
I do this because I want to be sure that the problem you mention does
not happen, but also because, after attempting to use SIGCHLD for this
same purpose, I found out that Linux will drop SIGCHLDs without a
second thought. I know this because the code using them would deadlock
nearly every time.

> As an aside, overall, the whole setup looks like you were looking for
> condition variables but skipped over them for some reason.

I skipped over condition variables for a *very* important reason: they
would cause deadlock.

When a condition variable is signalled, the signal does nothing if
there are no threads waiting on it. At least, that is what POSIX
condition variables are guaranteed to do. ("The pthread_cond_broadcast()
and pthread_cond_signal() functions shall have no effect if there are
no threads currently blocked on cond.")

What this means is that if another thread creates a child and signals
the condition variable while Main Thread #2 is blocked in waitpid(),
the signal is lost, and Main Thread #2 will block on the condition
variable after it returns from waitpid(). If another thread creates
another child, sure, it will unblock, but eventually, when the build is
nearly done, no more child processes will be created, and Main Thread
#2 will block on the condition variable permanently.

In turn, this will cause a deadlock because the worker threads all
poll() on the read end of the pipe (the write end was put into the map
for Main Thread #2), so if Main Thread #2 blocks on the condition
variable, they will all block waiting for their child(ren) to be
reaped, and the build system will deadlock.

Likewise, having Main Thread #2 hold the mutex except while waiting on
the condition variable means that worker threads can only register
children while Main Thread #2 is blocked on the condition variable. And
if Main Thread #2 holds the mutex while in waitpid(), the build system
can only run one command at a time.
If you release the lock just before calling waitpid(), allowing worker
threads to make progress, you still have the problem I mentioned above
if the condition variable is signalled during that call.

A semaphore is the right choice because it can be posted while Main
Thread #2 is not blocked on it, and Main Thread #2 will still get that
post the next time it waits. And it can be posted many times, which
tells Main Thread #2 how many children it needs to reap.

There is one thing I can think of that might cause the problem: the
processor reordering the waitpid() syscall to before the
`if (done != 0)` check. This doesn't make sense to me because 1) the
arch I'm on is the mostly-TSO x86_64, and 2) while a mutex unlock may
allow some memory operations after it to be reordered back into the
critical section, I don't know that a syscall would be reordered. In
fact, I don't know how syscalls are treated in such cases; are they
treated as memory operations at all? I have no clue.

Thank you for your time.

Gavin Howard