Upon successful completion, pthread_create() shall store the ID

"Upon successfully completion" is somewhat vague wording IMO which could result in misinterpretation? You seem to have interpreted it as "Upon function completion", but surely if clone() succeeds, and the new thread starts running, then there is already "successful completion" even before pthread_create() has returned to caller?

It's also interesting that the latest man pages for pthread_create() [2] have changed the wording substantially:

Before returning, a successful call to pthread_create() stores the ID of the new thread in the buffer pointed to by thread; this identifier is used to refer to the thread in subsequent calls to other pthreads functions.

This seems more specific than "Upon successful completion" but still doesn't exactly tie down whether the thread ID should be written before the new thread starts execution, only saying (somewhen) "Before returning" the thread ID should be written by pthread_create(). In a way, it's backing up the musl thread ID store implementation because it says 'subsequent calls to other pthreads functions' which implies calls after pthread_create() returns, whether in the same thread or other threads including the newly created thread.

any code relying on it is non-portable/racy.

I checked the glibc pthread_create() implementation and it appears to set the thread ID before the new thread executes. So from that POV any code using glibc relying on this is arguably non-portable, but not racy since there is no race with the glibc implementation?

Are there reasons you still think the alternate behavior would be preferable?

I'm not advocating for C/C++ code to access the thread ID like this. However, assuming (a big if...) that long existing code relying on this -- that hasn't been compiled with musl yet -- exists and is not racy and works fine with glibc, but will intermittently fail with the musl implementation, wouldn't it be 'defensive programming' [3] to move the thread ID store in the musl pthread_create() implementation? Worse, a binary compiled with musl and this issue might already exist and be "working", but like the code I provided, only fails if run by gdb or strace or specific timing situations... and maybe gdb or strace have never been run on it to date? So it's a sort of 'sleeper' issue?

Personally I think the proper fix is to (a) write the thread ID before the new thread starts executing because it's common sense that things should work that way, and existing code written with locks will continue to work, and existing code written without locks will also just work, and (b) whether fixing or not, try to update the various pthread_create() documentation to account for this edge case, so that it's obvious one way or the other. This would (a) fix the cause of the confusion, (b) avoid users of pthread_create() having to fall back to common sense, and (c) optionally make all existing caller implementations more robust?

If the new thread needs its own id, there's an easy and portable way to obtain it: pthread_self().

Interestingly, my initial implementation with pthread_self() actually failed via glibc, or at least one of the pthread functions using it seemed to fail... I cannot remember the exact circumstances unfortunately. Which is why I started using the thread ID returned by pthread_create() in the first place, because that didn't fail and no lock was needed. My personal philosophy with threads and locks is to try and minimize their use where-ever possible, and that's the path which prompted this email thread in the first place because I developed the working code with glibc and only later tried compiling and running with musl and to my surprise it didn't work...

Anyway, I hope this is food for thought and no worries if you decide not to change anything. It's been fun bringing this to your attention and interacting with you! And if there's anything more I can do to help further then please let me know!

--
Simon

[1] https://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_create.html
[2] http://man7.org/linux/man-pages/man3/pthread_create.3.html
[3] https://en.wikipedia.org/wiki/Defensive_programming

On Mon, Jan 27, 2020 at 9:51 AM Rich Felker <dalias@libc.org> wrote:
On Sun, Jan 26, 2020 at 04:33:57PM -0800, Simon wrote:
> Hello! I recently had some C code which works normally with glibc but seg
> faults sometimes with musl. I managed to reproduce the seg fault via
> musl-gcc and Alpine Linux and document it here [1]. Seems to be some kind
> of race condition, so hopefully you guys also get a seg fault when you
> follow my reproduction steps. Hope this helps and looking forward to any
> feedback or helping further if possible, Simon
>
> [1] https://gist.github.com/simonhf/6d8097f4d6caa572cc42354f494b20ef

This behavior was originally intentional. In general, if a function is
specified to modify pointed-to memory as output as a side effect of
success, that does not give it license to modify it on failure. And
since pthread_create can't commit to success until after the thread is
created, it would have to hold back start of the new thread
unnecessarily to guarantee that the result is written before the new
thread starts. (Note that it can't simply write the value from both
the caller and the new thread; the latter could end up writing to the
pthread_t object after the end of its lifetime.)

Moreover, there is no expectation from the application that it should
be able to read the result object from the new thread without
additional synchronization. The wording of the spec is:

    "Upon successful completion, pthread_create() shall store the ID
     ^^^^^^^^^^^^^^^^^^^^^^^^^^
    of the created thread in the location referenced by thread."

Until completion of (observation of & synchronization with return
from) pthread_create, nothing can be said about value of the object;
access to it is unsynchronized.

With that said, the specification for pthread_create does *allow*
implementations that store the value speculatively before success:

    "If pthread_create() fails, no new thread is created and the
    contents of the location referenced by thread are undefined."

I was not aware of this when writing it. So we could change it, but it
doesn't seem like a very good idea to do so; any code relying on it is
non-portable/racy. If the new thread needs its own id, there's an easy
and portable way to obtain it: pthread_self().

Are there reasons you still think the alternate behavior would be
preferable?

Rich