mailing list of musl libc
 help / color / mirror / code / Atom feed
From: enh <enh@google.com>
To: musl@lists.openwall.com
Cc: Rich Felker <dalias@libc.org>
Subject: Re: [musl] Possible PEBKAC or Bug in musl Semaphores and/or Mutexes
Date: Mon, 14 Mar 2022 09:09:26 -0700	[thread overview]
Message-ID: <CAJgzZoq9R25RJX0Y8TvVbjkeQ42oNiRXHkqgJUzXGHgRSqh+aQ@mail.gmail.com> (raw)
In-Reply-To: <CAF=dzRMjtCkZn0Vx31Bc14-+cnUDi_fP41=tf7eMShAchE-tNw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6412 bytes --]

On Sat, Mar 12, 2022 at 9:11 AM Gavin Howard <gavin.d.howard@gmail.com>
wrote:

> > This is why condition variables necessarily have an associated
> > predicate (in your case, involving your flag and possibly other
> > state). You can *never* just do pthread_cond_wait. The only correct
> > usage is:
> >
> >         while (!predicate(...)) pthread_cond_wait(mtx,cnd);
> >
> > Correct use of condition variables ensures that you see any relevant
> > changes to the state the predicate is evaluated on, without requiring
> > you to explicitly work out a way to do the same thing with semaphores
> > or other lower-level primitives. It does not matter whatsoever whether
> > there are already waiters when you call signal. If not, they'll
> > necessarily see the state change *before* they wait. The state change
> > cannot happen between evaluation of the predicate and the wait because
> > the calling thread holds the mutex.
>
> Ah, I misunderstood.
>
> > > What this means is that if Main Thread #2 is blocked on waitpid(), then
> > > if another thread creates a child and signals the condition variable,
> > > then after Main Thread #2 returns from waitpid(), it will block on the
> >                                                     ^^^^^^^^^^^^^^^^^^^^
> > > condition variable. If another thread creates another child, sure, it
> >   ^^^^^^^^^^^^^^^^^^
> >
> > No it won't, because it evaluates the predicate that tells it there's
> > something to do before calling wait. If you're not doing that, you're
> > using cond vars in a fundamentally wrong way.
>
> I have now implemented the system using the mutex and changing the
> semaphore to a condition variable with the flag and two other variables
> for how many children there are versus how many children have been
> reaped. We'll see if anything shows up while I run it over and over
> again.
>
> > No, this is absolutely not what's happening. Neither the processor nor
> > the compiler (in general the latter is at least as much of a concern
> > if not more when you have missing/incorrect synchronization) can
> > reorder things like that.
>
> Good to know, thank you.
>
> > On some level they're like any other external function call to a
> > function whose definition the layer optimizing the caller can't see:
> > they must be assumed to be able to store to or load from any memory
> > whose address has been exposed, and thus cannot be reordered
> > whatsoever.
>
> Okay, I wondered if that might be the case.
>
> > Your problem is not mysterious reordering. Your problem is in your
> > program's logic somewhere. Please use the debugging tools at your
> > disposal, especially strace which will probably reveal the problem to
> > you right away (by letting you see the exact sequence of events that
> > happened and trace through it to figure out where it mismatches your
> > expectation).
>
> I did use strace. It revealed nothing out of the ordinary. That's why I
> did not mention it, but I probably should have. I've also used GDB to
> inspect the core dumps. I did try.
>

you might find https://clang.llvm.org/docs/ThreadSanitizer.html useful for
bugs like this.


> Perhaps while I'm learning and making a fool of myself, I'll mention my
> problem with rwlocks.
>
> The relevant code is:
>
> ```
> do
> {
>     bool rel = (strchr((char*) cmd->a, '/') != NULL);
>
>     cont = false;
>
>     // We only need to do something when the cmd is not a relative path.
>     if (!rel)
>     {
>         s = y_strucon_rwlock_rdlock(&r->env.lock);
>         if (y_err(s != y_STATUS_SUCCESS)) goto err;
>
>         exists = y_map_existsStrings_v(&r->env.exec_map, (char*) cmd->a,
>                                        &res);
>
>         // We have to hold the lock until we have copied the result
> because it
>         // could be moved by a write to the map.
>         if (exists)
>         {
>             // Just move the value from res to cmd. I can do this because
>             // the string in res is heap allocated and is not affected by
>             // edits to the map.
>             cmd->len = res->len;
>             cmd->a = res->a;
>
>             // Release the lock as soon as possible.
>             y_strucon_rwlock_rdunlock(&r->env.lock);
>         }
>         else
>         {
>             // Release the lock as soon as possible.
>             y_strucon_rwlock_rdunlock(&r->env.lock);
>
>             <Find executable, error if non-existent, and prepare entry>
>
>             s = y_strucon_rwlock_wrlock(&r->env.lock);
>             if (y_err(s != y_STATUS_SUCCESS))
>             {
>                 y_str_free(&str);
>                 goto err;
>             }
>
>             // Make sure someone didn't get there first.
>             if (!y_map_existsStrings(&r->env.exec_map, (char*) cmd->a))
>             {
>                 s = y_map_insertStrings(&r->env.exec_map, (char*) cmd->a,
>                                         (char*) str.a);
>                 if (s == y_STATUS_ELEM_EXISTS)
>                 {
>                     y_panica("Element already exists");
>                 }
>             }
>
>             y_strucon_rwlock_wrunlock(&r->env.lock);
>
>             // Always free first.
>             y_str_free(&str);
>             y_stackpool_free(pool);
>
>             if (y_err(s != y_STATUS_SUCCESS)) goto err;
>
>             cont = true;
>         }
>     }
> }
> while (cont);
>
> err:
>     <Do some error handling>
> ```
>
> Besides initialization (before any other threads are created) and
> destruction (after all other threads are joined), these are the only
> references to the map in question.
>
> `y_strucon_rwlock_*` functions are just wrappers around POSIX rwlocks.
>
> In this case, I'm not doing anything fancy; it's just rwlocks. However,
> that abort about the element already existing will be triggered
> consistently within about 5 minutes, both on musl and glibc, so probably
> PEBKAC.
>
> If I change the read lock near the beginning with a write lock, I still
> get the same issue. However, if I change the rwlock for a mutex, and
> update all locking and unlocking to match, I don't get the issue.
>
> In this case, the strace shows nothing out of the ordinary that I can
> see.
>
> Yet I can't see how I am doing anything wrong. I've double checked that
> y_map_existsStrings{,_v} do not edit the map at all.
>
> So where's my stupidity on this one?
>
> Gavin Howard
>

[-- Attachment #2: Type: text/html, Size: 8137 bytes --]

  reply	other threads:[~2022-03-14 16:10 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-11 19:41 Gavin Howard
2022-03-11 20:21 ` Rich Felker
2022-03-12  5:45   ` Gavin Howard
2022-03-12 15:01     ` Rich Felker
2022-03-12 17:10       ` Gavin Howard
2022-03-14 16:09         ` enh [this message]
2022-03-14 16:12           ` Gavin Howard
2022-03-14 16:23             ` Rich Felker
2022-03-15 18:55               ` Gavin Howard
2022-03-15 19:50                 ` Markus Wichmann
2022-03-15 20:35                   ` Gavin Howard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJgzZoq9R25RJX0Y8TvVbjkeQ42oNiRXHkqgJUzXGHgRSqh+aQ@mail.gmail.com \
    --to=enh@google.com \
    --cc=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).