From: Gavin Howard
Date: Sat, 12 Mar 2022 10:10:47 -0700
To: Rich Felker
Cc: musl@lists.openwall.com
In-Reply-To: <20220312150132.GX7074@brightrain.aerifal.cx>
References: <20220311202151.GW7074@brightrain.aerifal.cx> <20220312150132.GX7074@brightrain.aerifal.cx>
Subject: Re: [musl] Possible PEBKAC or Bug in musl Semaphores and/or Mutexes

> This is why condition variables necessarily have an associated
> predicate (in your case, involving your flag and possibly other
> state). You can *never* just do pthread_cond_wait. The only correct
> usage is:
>
> while (!predicate(...)) pthread_cond_wait(mtx,cnd);
>
> Correct use of condition variables ensures that you see any relevant
> changes to the state the predicate is evaluated on, without requiring
> you to explicitly work out a way to do the same thing with semaphores
> or other lower-level primitives. It does not matter whatsoever whether
> there are already waiters when you call signal. If not, they'll
> necessarily see the state change *before* they wait. The state change
> cannot happen between evaluation of the predicate and the wait because
> the calling thread holds the mutex.

Ah, I misunderstood.

> > What this means is that if Main Thread #2 is blocked on waitpid(), then
> > if another thread creates a child and signals the condition variable,
> > then after Main Thread #2 returns from waitpid(), it will block on the
>                                                     ^^^^^^^^^^^^^^^^^^^^
> > condition variable. If another thread creates another child, sure, it
>   ^^^^^^^^^^^^^^^^^^
>
> No it won't, because it evaluates the predicate that tells it there's
> something to do before calling wait. If you're not doing that, you're
> using cond vars in a fundamentally wrong way.

I have now implemented the system using the mutex, changing the semaphore to a condition variable guarded by the flag plus two other variables tracking how many children have been created versus how many have been reaped. We'll see if anything shows up while I run it over and over again.

> No, this is absolutely not what's happening. Neither the processor nor
> the compiler (in general the latter is at least as much of a concern
> if not more when you have missing/incorrect synchronization) can
> reorder things like that.

Good to know, thank you.

> On some level they're like any other external function call to a
> function whose definition the layer optimizing the caller can't see:
> they must be assumed to be able to store to or load from any memory
> whose address has been exposed, and thus cannot be reordered
> whatsoever.

Okay, I wondered if that might be the case.

> Your problem is not mysterious reordering. Your problem is in your
> program's logic somewhere. Please use the debugging tools at your
> disposal, especially strace which will probably reveal the problem to
> you right away (by letting you see the exact sequence of events that
> happened and trace through it to figure out where it mismatches your
> expectation).

I did use strace. It revealed nothing out of the ordinary; that's why I did not mention it, but I probably should have. I've also used GDB to inspect the core dumps. I did try.

Perhaps while I'm learning and making a fool of myself, I'll mention my problem with rwlocks. The relevant code is:

```
do {
    bool rel = (strchr((char*) cmd->a, '/') != NULL);

    cont = false;

    // We only need to do something when the cmd is not a relative path.
    if (!rel) {

        s = y_strucon_rwlock_rdlock(&r->env.lock);
        if (y_err(s != y_STATUS_SUCCESS)) goto err;

        exists = y_map_existsStrings_v(&r->env.exec_map, (char*) cmd->a, &res);

        // We have to hold the lock until we have copied the result because it
        // could be moved by a write to the map.
        if (exists) {

            // Just move the value from res to cmd. I can do this because
            // the string in res is heap allocated and is not affected by
            // edits to the map.
            cmd->len = res->len;
            cmd->a = res->a;

            // Release the lock as soon as possible.
            y_strucon_rwlock_rdunlock(&r->env.lock);
        }
        else {

            // Release the lock as soon as possible.
            y_strucon_rwlock_rdunlock(&r->env.lock);

            s = y_strucon_rwlock_wrlock(&r->env.lock);
            if (y_err(s != y_STATUS_SUCCESS)) {
                y_str_free(&str);
                goto err;
            }

            // Make sure someone didn't get there first.
            if (!y_map_existsStrings(&r->env.exec_map, (char*) cmd->a)) {

                s = y_map_insertStrings(&r->env.exec_map, (char*) cmd->a,
                                        (char*) str.a);
                if (s == y_STATUS_ELEM_EXISTS) {
                    y_panica("Element already exists");
                }
            }

            y_strucon_rwlock_wrunlock(&r->env.lock);

            // Always free first.
            y_str_free(&str);
            y_stackpool_free(pool);

            if (y_err(s != y_STATUS_SUCCESS)) goto err;

            cont = true;
        }
    }
} while (cont);

err:
```

Besides initialization (before any other threads are created) and destruction (after all other threads are joined), these are the only references to the map in question. The `y_strucon_rwlock_*` functions are just wrappers around POSIX rwlocks; in this case, I'm not doing anything fancy.

However, the panic about the element already existing is triggered consistently within about 5 minutes, on both musl and glibc, so it's probably PEBKAC. If I change the read lock near the beginning to a write lock, I still get the same issue. However, if I swap the rwlock for a mutex, updating all locking and unlocking to match, the issue goes away. In this case, strace shows nothing out of the ordinary that I can see. Yet I can't see how I am doing anything wrong.
I've double-checked that `y_map_existsStrings{,_v}` do not edit the map at all.

So where's my stupidity on this one?

Gavin Howard