From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL autolearn=ham autolearn_force=no version=3.4.2 Received: from mother.openwall.net (mother.openwall.net [195.42.179.200]) by inbox.vuxu.org (OpenSMTPD) with SMTP id 29119889 for ; Mon, 27 Jan 2020 19:59:50 +0000 (UTC) Received: (qmail 22341 invoked by uid 550); 27 Jan 2020 19:59:48 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 22323 invoked from network); 27 Jan 2020 19:59:48 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=KIl0g37EADs7MuInx3F869hR5ZVLpa4B4lA1ewzz+44=; b=HtJ/REhtfMQgpyeFmPYaa6rWAupXdIIgQi6kfMzB5YVlE6l0WusYb6dI/nmL7TgDk8 3mMsDIqtCUhMO+WlixCRYK5RnOEagYnlKeivXWdd4R8AH9ilXquSg2JnFfFyFIXzBH55 y98NHgChqxQlXbcWWktOKK2E5AMYh/JOUEYCJ4LlsazqiTZr6rGwqCdzQaz++ICUK+Aj ZOc+PuTCLFmTD/lMXj4jDnPo6BIW73cMIaD9IyjGJyHQ6ts2uh2iHbz6Gj8b+XMchJQY lX3gJF3l8FVeGuernJanvwdOlD53e6P3wrxq63nLwSMtIJ4dQ2wCFID4sBxk5WGgwdQi xUuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=KIl0g37EADs7MuInx3F869hR5ZVLpa4B4lA1ewzz+44=; b=FaiLvPxn63uvksHtaVP3IXXSDpuozTCSa7OKL8HSNSLrpNS9dmz5rU2YHz1rYe3Vf7 4UounmNZbRfhPw1li8DBNOWJQ2HY96DKY4MzEAFqSYuOn9xZNs6SQ9M1Q7t2oGY6GE5x EJx5t1cOaRCznFPsPNs5rXcZnu8p+7RUkPNuNUMmB5LUSBrlUmZzaLJZx9FhtXRRACjE m9lu498G5cVY9am0n3xmLD87lZbdi6OyWCqPWhhyT192crO5x1cFWaH9oVL09lNcyH47 Q7xhIlJDDVnh9my+Dtdn2gAsJaZ6uH5WEFbwVC1kUoDm78ZFOckCd9FENxhF+GflW+Fk GyeQ== X-Gm-Message-State: APjAAAUvdwhs+zV67vxrO8gpVM1ZmomEzdjcq1KdLwTo9E+9URctWXtL Ptv6Rbq196OFeuAfG3V9zyqvkzaErSZE2IckN2Y= X-Google-Smtp-Source: APXvYqychQZYgoVkhKs39MxxBHMlcu9vpiYbjmOEgAxEGt6ctcHjDqiZdsX8iDPsoIkJtEDdBuQGheH9hIx2YB7il7c= X-Received: by 2002:a05:6402:13c6:: with SMTP id a6mr362062edx.67.1580155176592; Mon, 27 Jan 2020 11:59:36 -0800 (PST) MIME-Version: 1.0 References: <20200127175154.GA30412@brightrain.aerifal.cx> In-Reply-To: <20200127175154.GA30412@brightrain.aerifal.cx> From: Simon Date: Mon, 27 Jan 2020 11:59:24 -0800 Message-ID: To: Rich Felker Cc: musl@lists.openwall.com Content-Type: multipart/alternative; boundary="000000000000fdaaab059d248b01" Subject: Re: [musl] Bug report: Reproduction of seg fault caused by musl thread creation race condition --000000000000fdaaab059d248b01 Content-Type: text/plain; charset="UTF-8" > > Upon successful completion, pthread_create() shall store the ID > "Upon successfully completion" is somewhat vague wording IMO which could result in misinterpretation? You seem to have interpreted it as "Upon function completion", but surely if clone() succeeds, and the new thread starts running, then there is already "successful completion" even before pthread_create() has returned to caller? It's also interesting that the latest man pages for pthread_create() [2] have changed the wording substantially: Before returning, a successful call to pthread_create() stores the ID of > the new thread in the buffer pointed to by thread; this identifier is used > to refer to the thread in subsequent calls to other pthreads functions. > This seems more specific than "Upon successful completion" but still doesn't exactly tie down whether the thread ID should be written before the new thread starts execution, only saying (somewhen) "Before returning" the thread ID should be written by pthread_create(). In a way, it's backing up the musl thread ID store implementation because it says 'subsequent calls to other pthreads functions' which implies calls after pthread_create() returns, whether in the same thread or other threads including the newly created thread. any code relying on it is non-portable/racy. > I checked the glibc pthread_create() implementation and it appears to set the thread ID before the new thread executes. So from that POV any code using glibc relying on this is arguably non-portable, but not racy since there is no race with the glibc implementation? Are there reasons you still think the alternate behavior would be > preferable? > I'm not advocating for C/C++ code to access the thread ID like this. However, assuming (a big if...) that long existing code relying on this -- that hasn't been compiled with musl yet -- exists and is not racy and works fine with glibc, but will intermittently fail with the musl implementation, wouldn't it be 'defensive programming' [3] to move the thread ID store in the musl pthread_create() implementation? Worse, a binary compiled with musl and this issue might already exist and be "working", but like the code I provided, only fails if run by gdb or strace or specific timing situations... and maybe gdb or strace have never been run on it to date? So it's a sort of 'sleeper' issue? Personally I think the proper fix is to (a) write the thread ID before the new thread starts executing because it's common sense that things should work that way, and existing code written with locks will continue to work, and existing code written without locks will also just work, and (b) whether fixing or not, try to update the various pthread_create() documentation to account for this edge case, so that it's obvious one way or the other. This would (a) fix the cause of the confusion, (b) avoid users of pthread_create() having to fall back to common sense, and (c) optionally make all existing caller implementations more robust? If the new thread needs its own id, there's an easy and portable way to > obtain it: pthread_self(). > Interestingly, my initial implementation with pthread_self() actually failed via glibc, or at least one of the pthread functions using it seemed to fail... I cannot remember the exact circumstances unfortunately. Which is why I started using the thread ID returned by pthread_create() in the first place, because that didn't fail and no lock was needed. My personal philosophy with threads and locks is to try and minimize their use where-ever possible, and that's the path which prompted this email thread in the first place because I developed the working code with glibc and only later tried compiling and running with musl and to my surprise it didn't work... Anyway, I hope this is food for thought and no worries if you decide not to change anything. It's been fun bringing this to your attention and interacting with you! And if there's anything more I can do to help further then please let me know! -- Simon [1] https://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_create.html [2] http://man7.org/linux/man-pages/man3/pthread_create.3.html [3] https://en.wikipedia.org/wiki/Defensive_programming On Mon, Jan 27, 2020 at 9:51 AM Rich Felker wrote: > On Sun, Jan 26, 2020 at 04:33:57PM -0800, Simon wrote: > > Hello! I recently had some C code which works normally with glibc but seg > > faults sometimes with musl. I managed to reproduce the seg fault via > > musl-gcc and Alpine Linux and document it here [1]. Seems to be some kind > > of race condition, so hopefully you guys also get a seg fault when you > > follow my reproduction steps. Hope this helps and looking forward to any > > feedback or helping further if possible, Simon > > > > [1] https://gist.github.com/simonhf/6d8097f4d6caa572cc42354f494b20ef > > This behavior was originally intentional. In general, if a function is > specified to modify pointed-to memory as output as a side effect of > success, that does not give it license to modify it on failure. And > since pthread_create can't commit to success until after the thread is > created, it would have to hold back start of the new thread > unnecessarily to guarantee that the result is written before the new > thread starts. (Note that it can't simply write the value from both > the caller and the new thread; the latter could end up writing to the > pthread_t object after the end of its lifetime.) > > Moreover, there is no expectation from the application that it should > be able to read the result object from the new thread without > additional synchronization. The wording of the spec is: > > "Upon successful completion, pthread_create() shall store the ID > ^^^^^^^^^^^^^^^^^^^^^^^^^^ > of the created thread in the location referenced by thread." > > Until completion of (observation of & synchronization with return > from) pthread_create, nothing can be said about value of the object; > access to it is unsynchronized. > > With that said, the specification for pthread_create does *allow* > implementations that store the value speculatively before success: > > "If pthread_create() fails, no new thread is created and the > contents of the location referenced by thread are undefined." > > I was not aware of this when writing it. So we could change it, but it > doesn't seem like a very good idea to do so; any code relying on it is > non-portable/racy. If the new thread needs its own id, there's an easy > and portable way to obtain it: pthread_self(). > > Are there reasons you still think the alternate behavior would be > preferable? > > Rich > --000000000000fdaaab059d248b01 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Upo= n successful completion, pthread_create() shall store the ID

"Upon successfully completion" is somewha= t vague wording IMO which could result in misinterpretation? You seem to ha= ve interpreted it as "Upon function completion", but surely if cl= one() succeeds, and the new thread starts running, then there is already &q= uot;successful completion" even before pthread_create() has returned t= o caller?

It's also interesting that the lates= t man pages for pthread_create() [2] have changed the wording substantially= :

Before returning, a successful call to pthread_create() stores the ID of = the new thread in the buffer pointed to by thread; this identifier is used = to refer to the thread in subsequent calls to other pthreads functions.

This seems more specific than "Upon= successful completion" but still doesn't exactly tie down whether= the thread ID should be written before the new thread starts execution, on= ly saying (somewhen) "Before returning" the thread ID should be w= ritten by pthread_create(). In a way, it's backing up the musl thread I= D store implementation because it says 'subsequent calls to other pthre= ads functions' which implies calls after pthread_create() returns, whet= her in the same thread or other threads including the newly created thread.=

=
any code relying on it is non-portable/racy.
I checked the glibc pthread_create() implementation and it app= ears to set the thread ID before the new thread executes. So from that POV = any code using glibc relying on this is arguably non-portable, but not racy= since there is no race with the glibc implementation?

Are there reason= s you still think the alternate behavior would be preferable?

I'm not advocating for C/C++ code to access th= e thread ID like this. However, assuming (a big if...) that long existing c= ode relying on this -- that hasn't been compiled with musl yet -- exist= s and is not racy and works fine with glibc, but will intermittently fail w= ith the musl implementation, wouldn't it be 'defensive programming&= #39; [3] to move the thread ID store in the musl pthread_create() implement= ation? Worse, a binary compiled with musl and this issue might already exis= t and be "working", but like the code I provided, only fails if r= un by gdb or strace or specific timing situations... and maybe gdb or strac= e have never been run on it to date? So it's a sort of 'sleeper'= ; issue?

Personally I think the proper fix is to (= a) write the thread ID before the new thread starts executing because it= 9;s common sense that things should work that way, and existing code writte= n with locks will continue to work, and existing code written without locks= will also just work, and (b) whether fixing or not, try to update the vari= ous pthread_create() documentation to account for this edge case, so that i= t's obvious one way or the other. This would (a) fix the cause of the c= onfusion, (b) avoid users of pthread_create() having to fall back to common= sense, and (c) optionally make all existing caller implementations more ro= bust?

If the new thread needs its own id, there's an easy and porta= ble way to obtain it: pthread_self().

Interestingly, my initial implementation with pthread_self() actually fail= ed via glibc, or at least one of the pthread functions using it seemed to f= ail... I cannot remember the exact circumstances unfortunately. Which is wh= y I started using the thread ID returned by pthread_create() in the first p= lace, because that didn't fail and no lock was needed. My personal phil= osophy with threads and locks is to try and minimize their use where-ever p= ossible, and that's the path which prompted this email thread in the fi= rst place because I developed the working code with glibc and only later tr= ied compiling and running with musl and to my surprise it didn't work..= .

Anyway, I hope this is food for thought and = no worries if you decide not to change anything. It's been fun bringing= this to your attention and interacting with you! And if there's anythi= ng more I can do to help further then please let me know!
--
Simon

[1] ht= tps://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_create.html
On Mon, Jan 27= , 2020 at 9:51 AM Rich Felker <dalias= @libc.org> wrote:
On Sun, Jan 26, 2020 at 04:33:57PM -0800, Simon wrote:
> Hello! I recently had some C code which works normally with glibc but = seg
> faults sometimes with musl. I managed to reproduce the seg fault via > musl-gcc and Alpine Linux and document it here [1]. Seems to be some k= ind
> of race condition, so hopefully you guys also get a seg fault when you=
> follow my reproduction steps. Hope this helps and looking forward to a= ny
> feedback or helping further if possible, Simon
>
> [1] https://gist.github.com/sim= onhf/6d8097f4d6caa572cc42354f494b20ef

This behavior was originally intentional. In general, if a function is
specified to modify pointed-to memory as output as a side effect of
success, that does not give it license to modify it on failure. And
since pthread_create can't commit to success until after the thread is<= br> created, it would have to hold back start of the new thread
unnecessarily to guarantee that the result is written before the new
thread starts. (Note that it can't simply write the value from both
the caller and the new thread; the latter could end up writing to the
pthread_t object after the end of its lifetime.)

Moreover, there is no expectation from the application that it should
be able to read the result object from the new thread without
additional synchronization. The wording of the spec is:

=C2=A0 =C2=A0 "Upon successful completion, pthread_create() shall stor= e the ID
=C2=A0 =C2=A0 =C2=A0^^^^^^^^^^^^^^^^^^^^^^^^^^
=C2=A0 =C2=A0 of the created thread in the location referenced by thread.&q= uot;

Until completion of (observation of & synchronization with return
from) pthread_create, nothing can be said about value of the object;
access to it is unsynchronized.

With that said, the specification for pthread_create does *allow*
implementations that store the value speculatively before success:

=C2=A0 =C2=A0 "If pthread_create() fails, no new thread is created and= the
=C2=A0 =C2=A0 contents of the location referenced by thread are undefined.&= quot;

I was not aware of this when writing it. So we could change it, but it
doesn't seem like a very good idea to do so; any code relying on it is<= br> non-portable/racy. If the new thread needs its own id, there's an easy<= br> and portable way to obtain it: pthread_self().

Are there reasons you still think the alternate behavior would be
preferable?

Rich
--000000000000fdaaab059d248b01--