From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-10.9 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 9455 invoked from network); 14 Mar 2022 16:10:00 -0000 Received: from mother.openwall.net (195.42.179.200) by inbox.vuxu.org with ESMTPUTF8; 14 Mar 2022 16:10:00 -0000 Received: (qmail 28009 invoked by uid 550); 14 Mar 2022 16:09:50 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 27972 invoked from network); 14 Mar 2022 16:09:49 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=xFI9+FWP8q9uTFnDyr5KnljOZyNlQbtfu/gmysmpGYE=; b=n3/QyjThbtT3o/7Mc2icXC0sib+Qp5HLC1YxQvv6JtacmkkQXfsvIy+MLg33QY2rMB FaKEJEm+cbDEuxlgRmsAUc/UZcTSPHruG4S47w+bsvneReUw70PKuFgxhHnK6HPRpZe1 LI7Et4UITg8KDfB+suBT4/WHLPfIfLTYKPhDrzodw2TQ2uqa2D6U6i3O/SIdHdtcR2Jl vp3Lbi964gR4BI4GcxUb6K3c2g0SeBRzyTh/B8ud9AbIQXIUtkVHWWXRko/UF4RGeqJ4 DL8obYypldGxRWm6icTWaJ7Bh1HPYdHsp01YxHaK9dSqtjNM0PnForfVBJLrARZwAnf2 cfqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=xFI9+FWP8q9uTFnDyr5KnljOZyNlQbtfu/gmysmpGYE=; b=W2Pnp3LQnRl493+P+4TPuy6La/bQA8X9vCHYv/7p6FcWqT9X0QlRqtRggJ3gol7gfS Vxqt3kBX13MzCLrQY96TK6TVlLfQDLO+3+kai2U1zFUinUXir8P4B08dNy2W9HNkK/DN 7+7X7pk0Vyo6RGSZqAdgU4qP91CeU+G5VbZ7YkixrEVAR5ryTU777TM5dWfUIg+FnN4v HLJXcwo1fSfUFLzkTjW8NKtqvYbcQCCqLv6FKihK81xFukBmQd5d5VgB8pxYSnMwGYSw FOr1MHQP0i7F6FlqNJ4BldviNce7j2dmOI7ZAEfpRGyLGaMA36/W9IyYqaaYGP/OqnXL Igpw== X-Gm-Message-State: AOAM533vGvwekiVkYosAqSuffrRTYUvuIy4rDh429wZiUCRrDZFWas0O pYUj4TZ7Vn8zlhH2CO8PWnFN2yLFAf4JJLnS8saDuGbcQTEbLg== X-Google-Smtp-Source: ABdhPJw28zoyCRzOa4HK0+RaYEPSo1+hfiYv/+wKCxUfOInxQY6kpVFk5fNZoj+Q9eLhi3XCEADIlYutxQLIk2u4YOc= X-Received: by 2002:a05:690c:398:b0:2e5:59ab:7c6a with SMTP id bh24-20020a05690c039800b002e559ab7c6amr5126750ywb.123.1647274177511; Mon, 14 Mar 2022 09:09:37 -0700 (PDT) MIME-Version: 1.0 References: <20220311202151.GW7074@brightrain.aerifal.cx> <20220312150132.GX7074@brightrain.aerifal.cx> In-Reply-To: From: enh Date: Mon, 14 Mar 2022 09:09:26 -0700 Message-ID: To: musl@lists.openwall.com Cc: Rich Felker Content-Type: multipart/alternative; boundary="0000000000003337ee05da2fe81a" Subject: Re: [musl] Possible PEBKAC or Bug in musl Semaphores and/or Mutexes --0000000000003337ee05da2fe81a Content-Type: text/plain; charset="UTF-8" On Sat, Mar 12, 2022 at 9:11 AM Gavin Howard wrote: > > This is why condition variables necessarily have an associated > > predicate (in your case, involving your flag and possibly other > > state). You can *never* just do pthread_cond_wait. The only correct > > usage is: > > > > while (!predicate(...)) pthread_cond_wait(mtx,cnd); > > > > Correct use of condition variables ensures that you see any relevant > > changes to the state the predicate is evaluated on, without requiring > > you to explicitly work out a way to do the same thing with semaphores > > or other lower-level primitives. It does not matter whatsoever whether > > there are already waiters when you call signal. If not, they'll > > necessarily see the state change *before* they wait. The state change > > cannot happen between evaluation of the predicate and the wait because > > the calling thread holds the mutex. > > Ah, I misunderstood. > > > > What this means is that if Main Thread #2 is blocked on waitpid(), then > > > if another thread creates a child and signals the condition variable, > > > then after Main Thread #2 returns from waitpid(), it will block on the > > ^^^^^^^^^^^^^^^^^^^^ > > > condition variable. If another thread creates another child, sure, it > > ^^^^^^^^^^^^^^^^^^ > > > > No it won't, because it evaluates the predicate that tells it there's > > something to do before calling wait. If you're not doing that, you're > > using cond vars in a fundamentally wrong way. > > I have now implemented the system using the mutex and changing the > semaphore to a condition variable with the flag and two other variables > for how many children there are versus how many children have been > reaped. We'll see if anything shows up while I run it over and over > again. > > > No, this is absolutely not what's happening. Neither the processor nor > > the compiler (in general the latter is at least as much of a concern > > if not more when you have missing/incorrect synchronization) can > > reorder things like that. > > Good to know, thank you. > > > On some level they're like any other external function call to a > > function whose definition the layer optimizing the caller can't see: > > they must be assumed to be able to store to or load from any memory > > whose address has been exposed, and thus cannot be reordered > > whatsoever. > > Okay, I wondered if that might be the case. > > > Your problem is not mysterious reordering. Your problem is in your > > program's logic somewhere. Please use the debugging tools at your > > disposal, especially strace which will probably reveal the problem to > > you right away (by letting you see the exact sequence of events that > > happened and trace through it to figure out where it mismatches your > > expectation). > > I did use strace. It revealed nothing out of the ordinary. That's why I > did not mention it, but I probably should have. I've also used GDB to > inspect the core dumps. I did try. > you might find https://clang.llvm.org/docs/ThreadSanitizer.html useful for bugs like this. > Perhaps while I'm learning and making a fool of myself, I'll mention my > problem with rwlocks. > > The relevant code is: > > ``` > do > { > bool rel = (strchr((char*) cmd->a, '/') != NULL); > > cont = false; > > // We only need to do something when the cmd is not a relative path. > if (!rel) > { > s = y_strucon_rwlock_rdlock(&r->env.lock); > if (y_err(s != y_STATUS_SUCCESS)) goto err; > > exists = y_map_existsStrings_v(&r->env.exec_map, (char*) cmd->a, > &res); > > // We have to hold the lock until we have copied the result > because it > // could be moved by a write to the map. > if (exists) > { > // Just move the value from res to cmd. I can do this because > // the string in res is heap allocated and is not affected by > // edits to the map. > cmd->len = res->len; > cmd->a = res->a; > > // Release the lock as soon as possible. > y_strucon_rwlock_rdunlock(&r->env.lock); > } > else > { > // Release the lock as soon as possible. > y_strucon_rwlock_rdunlock(&r->env.lock); > > > > s = y_strucon_rwlock_wrlock(&r->env.lock); > if (y_err(s != y_STATUS_SUCCESS)) > { > y_str_free(&str); > goto err; > } > > // Make sure someone didn't get there first. > if (!y_map_existsStrings(&r->env.exec_map, (char*) cmd->a)) > { > s = y_map_insertStrings(&r->env.exec_map, (char*) cmd->a, > (char*) str.a); > if (s == y_STATUS_ELEM_EXISTS) > { > y_panica("Element already exists"); > } > } > > y_strucon_rwlock_wrunlock(&r->env.lock); > > // Always free first. > y_str_free(&str); > y_stackpool_free(pool); > > if (y_err(s != y_STATUS_SUCCESS)) goto err; > > cont = true; > } > } > } > while (cont); > > err: > > ``` > > Besides initialization (before any other threads are created) and > destruction (after all other threads are joined), these are the only > references to the map in question. > > `y_strucon_rwlock_*` functions are just wrappers around POSIX rwlocks. > > In this case, I'm not doing anything fancy; it's just rwlocks. However, > that abort about the element already existing will be triggered > consistently within about 5 minutes, both on musl and glibc, so probably > PEBKAC. > > If I change the read lock near the beginning with a write lock, I still > get the same issue. However, if I change the rwlock for a mutex, and > update all locking and unlocking to match, I don't get the issue. > > In this case, the strace shows nothing out of the ordinary that I can > see. > > Yet I can't see how I am doing anything wrong. I've double checked that > y_map_existsStrings{,_v} do not edit the map at all. > > So where's my stupidity on this one? > > Gavin Howard > --0000000000003337ee05da2fe81a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Sat, Mar 12, 2022 at 9:11 AM Gavin= Howard <gavin.d.howard@gmai= l.com> wrote:
> This is why condition variables necessarily have an associated > predicate (in your case, involving your flag and possibly other
> state). You can *never* just do pthread_cond_wait. The only correct > usage is:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0while (!predicate(...)) pthread_cond_= wait(mtx,cnd);
>
> Correct use of condition variables ensures that you see any relevant > changes to the state the predicate is evaluated on, without requiring<= br> > you to explicitly work out a way to do the same thing with semaphores<= br> > or other lower-level primitives. It does not matter whatsoever whether=
> there are already waiters when you call signal. If not, they'll > necessarily see the state change *before* they wait. The state change<= br> > cannot happen between evaluation of the predicate and the wait because=
> the calling thread holds the mutex.

Ah, I misunderstood.

> > What this means is that if Main Thread #2 is blocked on waitpid()= , then
> > if another thread creates a child and signals the condition varia= ble,
> > then after Main Thread #2 returns from waitpid(), it will block o= n the
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0^^^^^^^^^^^^^^^^^^^^
> > condition variable. If another thread creates another child, sure= , it
>=C2=A0 =C2=A0^^^^^^^^^^^^^^^^^^
>
> No it won't, because it evaluates the predicate that tells it ther= e's
> something to do before calling wait. If you're not doing that, you= 're
> using cond vars in a fundamentally wrong way.

I have now implemented the system using the mutex and changing the
semaphore to a condition variable with the flag and two other variables
for how many children there are versus how many children have been
reaped. We'll see if anything shows up while I run it over and over
again.

> No, this is absolutely not what's happening. Neither the processor= nor
> the compiler (in general the latter is at least as much of a concern > if not more when you have missing/incorrect synchronization) can
> reorder things like that.

Good to know, thank you.

> On some level they're like any other external function call to a > function whose definition the layer optimizing the caller can't se= e:
> they must be assumed to be able to store to or load from any memory > whose address has been exposed, and thus cannot be reordered
> whatsoever.

Okay, I wondered if that might be the case.

> Your problem is not mysterious reordering. Your problem is in your
> program's logic somewhere. Please use the debugging tools at your<= br> > disposal, especially strace which will probably reveal the problem to<= br> > you right away (by letting you see the exact sequence of events that > happened and trace through it to figure out where it mismatches your > expectation).

I did use strace. It revealed nothing out of the ordinary. That's why I=
did not mention it, but I probably should have. I've also used GDB to inspect the core dumps. I did try.

you = might find=C2=A0https://clang.llvm.org/docs/ThreadSanitizer.html useful for bugs like= this.
=C2=A0
Perhaps while I'm learning and making a fool of myself, I'll mentio= n my
problem with rwlocks.

The relevant code is:

```
do
{
=C2=A0 =C2=A0 bool rel =3D (strchr((char*) cmd->a, '/') !=3D NUL= L);

=C2=A0 =C2=A0 cont =3D false;

=C2=A0 =C2=A0 // We only need to do something when the cmd is not a relativ= e path.
=C2=A0 =C2=A0 if (!rel)
=C2=A0 =C2=A0 {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 s =3D y_strucon_rwlock_rdlock(&r->env.lo= ck);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (y_err(s !=3D y_STATUS_SUCCESS)) goto err;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 exists =3D y_map_existsStrings_v(&r->env= .exec_map, (char*) cmd->a,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&res)= ;

=C2=A0 =C2=A0 =C2=A0 =C2=A0 // We have to hold the lock until we have copie= d the result because it
=C2=A0 =C2=A0 =C2=A0 =C2=A0 // could be moved by a write to the map.
=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (exists)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 // Just move the value from res t= o cmd. I can do this because
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 // the string in res is heap allo= cated and is not affected by
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 // edits to the map.
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 cmd->len =3D res->len;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 cmd->a =3D res->a;

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 // Release the lock as soon as po= ssible.
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 y_strucon_rwlock_rdunlock(&r-= >env.lock);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
=C2=A0 =C2=A0 =C2=A0 =C2=A0 else
=C2=A0 =C2=A0 =C2=A0 =C2=A0 {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 // Release the lock as soon as po= ssible.
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 y_strucon_rwlock_rdunlock(&r-= >env.lock);

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 <Find executable, error if non= -existent, and prepare entry>

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s =3D y_strucon_rwlock_wrlock(&am= p;r->env.lock);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (y_err(s !=3D y_STATUS_SUCCESS= ))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 y_str_free(&str= );
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 goto err;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 // Make sure someone didn't g= et there first.
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!y_map_existsStrings(&r-&= gt;env.exec_map, (char*) cmd->a))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 s =3D y_map_insertS= trings(&r->env.exec_map, (char*) cmd->a,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (char*) = str.a);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (s =3D=3D y_STAT= US_ELEM_EXISTS)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 y_pan= ica("Element already exists");
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 y_strucon_rwlock_wrunlock(&r-= >env.lock);

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 // Always free first.
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 y_str_free(&str);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 y_stackpool_free(pool);

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (y_err(s !=3D y_STATUS_SUCCESS= )) goto err;

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 cont =3D true;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
=C2=A0 =C2=A0 }
}
while (cont);

err:
=C2=A0 =C2=A0 <Do some error handling>
```

Besides initialization (before any other threads are created) and
destruction (after all other threads are joined), these are the only
references to the map in question.

`y_strucon_rwlock_*` functions are just wrappers around POSIX rwlocks.

In this case, I'm not doing anything fancy; it's just rwlocks. Howe= ver,
that abort about the element already existing will be triggered
consistently within about 5 minutes, both on musl and glibc, so probably PEBKAC.

If I change the read lock near the beginning with a write lock, I still
get the same issue. However, if I change the rwlock for a mutex, and
update all locking and unlocking to match, I don't get the issue.

In this case, the strace shows nothing out of the ordinary that I can
see.

Yet I can't see how I am doing anything wrong. I've double checked = that
y_map_existsStrings{,_v} do not edit the map at all.

So where's my stupidity on this one?

Gavin Howard
--0000000000003337ee05da2fe81a--