From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <musl-return-15310-ml=inbox.vuxu.org@lists.openwall.com>
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
	DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL autolearn=ham
	autolearn_force=no version=3.4.2
Received: from mother.openwall.net (mother.openwall.net [195.42.179.200])
	by inbox.vuxu.org (OpenSMTPD) with SMTP id 29119889
	for <ml@inbox.vuxu.org>;
	Mon, 27 Jan 2020 19:59:50 +0000 (UTC)
Received: (qmail 22341 invoked by uid 550); 27 Jan 2020 19:59:48 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Reply-To: musl@lists.openwall.com
Received: (qmail 22323 invoked from network); 27 Jan 2020 19:59:48 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=KIl0g37EADs7MuInx3F869hR5ZVLpa4B4lA1ewzz+44=;
        b=HtJ/REhtfMQgpyeFmPYaa6rWAupXdIIgQi6kfMzB5YVlE6l0WusYb6dI/nmL7TgDk8
         3mMsDIqtCUhMO+WlixCRYK5RnOEagYnlKeivXWdd4R8AH9ilXquSg2JnFfFyFIXzBH55
         y98NHgChqxQlXbcWWktOKK2E5AMYh/JOUEYCJ4LlsazqiTZr6rGwqCdzQaz++ICUK+Aj
         ZOc+PuTCLFmTD/lMXj4jDnPo6BIW73cMIaD9IyjGJyHQ6ts2uh2iHbz6Gj8b+XMchJQY
         lX3gJF3l8FVeGuernJanvwdOlD53e6P3wrxq63nLwSMtIJ4dQ2wCFID4sBxk5WGgwdQi
         xUuQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=KIl0g37EADs7MuInx3F869hR5ZVLpa4B4lA1ewzz+44=;
        b=FaiLvPxn63uvksHtaVP3IXXSDpuozTCSa7OKL8HSNSLrpNS9dmz5rU2YHz1rYe3Vf7
         4UounmNZbRfhPw1li8DBNOWJQ2HY96DKY4MzEAFqSYuOn9xZNs6SQ9M1Q7t2oGY6GE5x
         EJx5t1cOaRCznFPsPNs5rXcZnu8p+7RUkPNuNUMmB5LUSBrlUmZzaLJZx9FhtXRRACjE
         m9lu498G5cVY9am0n3xmLD87lZbdi6OyWCqPWhhyT192crO5x1cFWaH9oVL09lNcyH47
         Q7xhIlJDDVnh9my+Dtdn2gAsJaZ6uH5WEFbwVC1kUoDm78ZFOckCd9FENxhF+GflW+Fk
         GyeQ==
X-Gm-Message-State: APjAAAUvdwhs+zV67vxrO8gpVM1ZmomEzdjcq1KdLwTo9E+9URctWXtL
	Ptv6Rbq196OFeuAfG3V9zyqvkzaErSZE2IckN2Y=
X-Google-Smtp-Source: APXvYqychQZYgoVkhKs39MxxBHMlcu9vpiYbjmOEgAxEGt6ctcHjDqiZdsX8iDPsoIkJtEDdBuQGheH9hIx2YB7il7c=
X-Received: by 2002:a05:6402:13c6:: with SMTP id a6mr362062edx.67.1580155176592;
 Mon, 27 Jan 2020 11:59:36 -0800 (PST)
MIME-Version: 1.0
References: <CABkUXbdOP8d=BzFTpYetmEEKyEwWqwaW7NmmB9vdJacu-wXABQ@mail.gmail.com>
 <20200127175154.GA30412@brightrain.aerifal.cx>
In-Reply-To: <20200127175154.GA30412@brightrain.aerifal.cx>
From: Simon <simonhf@gmail.com>
Date: Mon, 27 Jan 2020 11:59:24 -0800
Message-ID: <CABkUXbe6-vg+A5--2jPJCn4QMJ1HTtt4bdL0S7DQv9nvotwj9A@mail.gmail.com>
To: Rich Felker <dalias@libc.org>
Cc: musl@lists.openwall.com
Content-Type: multipart/alternative; boundary="000000000000fdaaab059d248b01"
Subject: Re: [musl] Bug report: Reproduction of seg fault caused by musl
 thread creation race condition

--000000000000fdaaab059d248b01
Content-Type: text/plain; charset="UTF-8"

>
> Upon successful completion, pthread_create() shall store the ID
>

"Upon successfully completion" is somewhat vague wording IMO which could
result in misinterpretation? You seem to have interpreted it as "Upon
function completion", but surely if clone() succeeds, and the new thread
starts running, then there is already "successful completion" even before
pthread_create() has returned to caller?

It's also interesting that the latest man pages for pthread_create() [2]
have changed the wording substantially:

Before returning, a successful call to pthread_create() stores the ID of
> the new thread in the buffer pointed to by thread; this identifier is used
> to refer to the thread in subsequent calls to other pthreads functions.
>

This seems more specific than "Upon successful completion" but still
doesn't exactly tie down whether the thread ID should be written before the
new thread starts execution, only saying (somewhen) "Before returning" the
thread ID should be written by pthread_create(). In a way, it's backing up
the musl thread ID store implementation because it says 'subsequent calls
to other pthreads functions' which implies calls after pthread_create()
returns, whether in the same thread or other threads including the newly
created thread.

any code relying on it is non-portable/racy.
>

I checked the glibc pthread_create() implementation and it appears to set
the thread ID before the new thread executes. So from that POV any code
using glibc relying on this is arguably non-portable, but not racy since
there is no race with the glibc implementation?

Are there reasons you still think the alternate behavior would be
> preferable?
>

I'm not advocating for C/C++ code to access the thread ID like this.
However, assuming (a big if...) that long existing code relying on this --
that hasn't been compiled with musl yet -- exists and is not racy and works
fine with glibc, but will intermittently fail with the musl implementation,
wouldn't it be 'defensive programming' [3] to move the thread ID store in
the musl pthread_create() implementation? Worse, a binary compiled with
musl and this issue might already exist and be "working", but like the code
I provided, only fails if run by gdb or strace or specific timing
situations... and maybe gdb or strace have never been run on it to date? So
it's a sort of 'sleeper' issue?

Personally I think the proper fix is to (a) write the thread ID before the
new thread starts executing because it's common sense that things should
work that way, and existing code written with locks will continue to work,
and existing code written without locks will also just work, and (b)
whether fixing or not, try to update the various pthread_create()
documentation to account for this edge case, so that it's obvious one way
or the other. This would (a) fix the cause of the confusion, (b) avoid
users of pthread_create() having to fall back to common sense, and (c)
optionally make all existing caller implementations more robust?

If the new thread needs its own id, there's an easy and portable way to
> obtain it: pthread_self().
>

Interestingly, my initial implementation with pthread_self() actually
failed via glibc, or at least one of the pthread functions using it seemed
to fail... I cannot remember the exact circumstances unfortunately. Which
is why I started using the thread ID returned by pthread_create() in the
first place, because that didn't fail and no lock was needed. My personal
philosophy with threads and locks is to try and minimize their use
where-ever possible, and that's the path which prompted this email thread
in the first place because I developed the working code with glibc and only
later tried compiling and running with musl and to my surprise it didn't
work...

Anyway, I hope this is food for thought and no worries if you decide not to
change anything. It's been fun bringing this to your attention and
interacting with you! And if there's anything more I can do to help further
then please let me know!

--
Simon

[1] https://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_create.html
[2] http://man7.org/linux/man-pages/man3/pthread_create.3.html
[3] https://en.wikipedia.org/wiki/Defensive_programming

On Mon, Jan 27, 2020 at 9:51 AM Rich Felker <dalias@libc.org> wrote:

> On Sun, Jan 26, 2020 at 04:33:57PM -0800, Simon wrote:
> > Hello! I recently had some C code which works normally with glibc but seg
> > faults sometimes with musl. I managed to reproduce the seg fault via
> > musl-gcc and Alpine Linux and document it here [1]. Seems to be some kind
> > of race condition, so hopefully you guys also get a seg fault when you
> > follow my reproduction steps. Hope this helps and looking forward to any
> > feedback or helping further if possible, Simon
> >
> > [1] https://gist.github.com/simonhf/6d8097f4d6caa572cc42354f494b20ef
>
> This behavior was originally intentional. In general, if a function is
> specified to modify pointed-to memory as output as a side effect of
> success, that does not give it license to modify it on failure. And
> since pthread_create can't commit to success until after the thread is
> created, it would have to hold back start of the new thread
> unnecessarily to guarantee that the result is written before the new
> thread starts. (Note that it can't simply write the value from both
> the caller and the new thread; the latter could end up writing to the
> pthread_t object after the end of its lifetime.)
>
> Moreover, there is no expectation from the application that it should
> be able to read the result object from the new thread without
> additional synchronization. The wording of the spec is:
>
>     "Upon successful completion, pthread_create() shall store the ID
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^
>     of the created thread in the location referenced by thread."
>
> Until completion of (observation of & synchronization with return
> from) pthread_create, nothing can be said about value of the object;
> access to it is unsynchronized.
>
> With that said, the specification for pthread_create does *allow*
> implementations that store the value speculatively before success:
>
>     "If pthread_create() fails, no new thread is created and the
>     contents of the location referenced by thread are undefined."
>
> I was not aware of this when writing it. So we could change it, but it
> doesn't seem like a very good idea to do so; any code relying on it is
> non-portable/racy. If the new thread needs its own id, there's an easy
> and portable way to obtain it: pthread_self().
>
> Are there reasons you still think the alternate behavior would be
> preferable?
>
> Rich
>

--000000000000fdaaab059d248b01
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px =
0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Upo=
n successful completion, pthread_create() shall store the ID</div></blockqu=
ote><div><br></div><div>&quot;Upon successfully completion&quot; is somewha=
t vague wording IMO which could result in misinterpretation? You seem to ha=
ve interpreted it as &quot;Upon function completion&quot;, but surely if cl=
one() succeeds, and the new thread starts running, then there is already &q=
uot;successful completion&quot; even before pthread_create() has returned t=
o caller?</div><div><br></div><div>It&#39;s also interesting that the lates=
t man pages for pthread_create() [2] have changed the wording substantially=
:</div><div><br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px=
 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><di=
v>Before returning, a successful call to pthread_create() stores the ID of =
the new thread in the buffer pointed to by thread; this identifier is used =
to refer to the thread in subsequent calls to other pthreads functions.</di=
v></blockquote><div><br></div><div>This seems more specific than &quot;Upon=
 successful completion&quot; but still doesn&#39;t exactly tie down whether=
 the thread ID should be written before the new thread starts execution, on=
ly saying (somewhen) &quot;Before returning&quot; the thread ID should be w=
ritten by pthread_create(). In a way, it&#39;s backing up the musl thread I=
D store implementation because it says &#39;subsequent calls to other pthre=
ads functions&#39; which implies calls after pthread_create() returns, whet=
her in the same thread or other threads including the newly created thread.=
<br></div><div><br></div><blockquote class=3D"gmail_quote" style=3D"margin:=
0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">=
<div>any code relying on it is non-portable/racy.</div></blockquote><div><b=
r></div><div>I checked the glibc pthread_create() implementation and it app=
ears to set the thread ID before the new thread executes. So from that POV =
any code using glibc relying on this is arguably non-portable, but not racy=
 since there is no race with the glibc implementation?<br></div><div><br></=
div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;bor=
der-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Are there reason=
s you still think the alternate behavior would be preferable?</div></blockq=
uote><div><br></div><div>I&#39;m not advocating for C/C++ code to access th=
e thread ID like this. However, assuming (a big if...) that long existing c=
ode relying on this -- that hasn&#39;t been compiled with musl yet -- exist=
s and is not racy and works fine with glibc, but will intermittently fail w=
ith the musl implementation, wouldn&#39;t it be &#39;defensive programming&=
#39; [3] to move the thread ID store in the musl pthread_create() implement=
ation? Worse, a binary compiled with musl and this issue might already exis=
t and be &quot;working&quot;, but like the code I provided, only fails if r=
un by gdb or strace or specific timing situations... and maybe gdb or strac=
e have never been run on it to date? So it&#39;s a sort of &#39;sleeper&#39=
; issue?</div><div><br></div><div>Personally I think the proper fix is to (=
a) write the thread ID before the new thread starts executing because it=
9;s common sense that things should work that way, and existing code writte=
n with locks will continue to work, and existing code written without locks=
 will also just work, and (b) whether fixing or not, try to update the vari=
ous pthread_create() documentation to account for this edge case, so that i=
t&#39;s obvious one way or the other. This would (a) fix the cause of the c=
onfusion, (b) avoid users of pthread_create() having to fall back to common=
 sense, and (c) optionally make all existing caller implementations more ro=
bust?<br></div><div><br></div><blockquote class=3D"gmail_quote" style=3D"ma=
rgin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:=
1ex"><div>If the new thread needs its own id, there&#39;s an easy and porta=
ble way to obtain it: pthread_self().</div></blockquote><div><br></div><div=
>Interestingly, my initial implementation with pthread_self() actually fail=
ed via glibc, or at least one of the pthread functions using it seemed to f=
ail... I cannot remember the exact circumstances unfortunately. Which is wh=
y I started using the thread ID returned by pthread_create() in the first p=
lace, because that didn&#39;t fail and no lock was needed. My personal phil=
osophy with threads and locks is to try and minimize their use where-ever p=
ossible, and that&#39;s the path which prompted this email thread in the fi=
rst place because I developed the working code with glibc and only later tr=
ied compiling and running with musl and to my surprise it didn&#39;t work..=
.<br></div><div><br></div><div>Anyway, I hope this is food for thought and =
no worries if you decide not to change anything. It&#39;s been fun bringing=
 this to your attention and interacting with you! And if there&#39;s anythi=
ng more I can do to help further then please let me know!<br></div><div><br=
></div><div>--</div><div>Simon<br></div><div><br></div><div>[1] <a href=3D"=
https://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_create.html">ht=
tps://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_create.html</a></=
div><div>[2] <a href=3D"http://man7.org/linux/man-pages/man3/pthread_create=
.3.html">http://man7.org/linux/man-pages/man3/pthread_create.3.html</a></di=
v><div>[3] <a href=3D"https://en.wikipedia.org/wiki/Defensive_programming">=
https://en.wikipedia.org/wiki/Defensive_programming</a></div></div><br><div=
 class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">On Mon, Jan 27=
, 2020 at 9:51 AM Rich Felker &lt;<a href=3D"mailto:dalias@libc.org">dalias=
@libc.org</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=
=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding=
-left:1ex">On Sun, Jan 26, 2020 at 04:33:57PM -0800, Simon wrote:<br>
&gt; Hello! I recently had some C code which works normally with glibc but =
seg<br>
&gt; faults sometimes with musl. I managed to reproduce the seg fault via<b=
r>
&gt; musl-gcc and Alpine Linux and document it here [1]. Seems to be some k=
ind<br>
&gt; of race condition, so hopefully you guys also get a seg fault when you=
<br>
&gt; follow my reproduction steps. Hope this helps and looking forward to a=
ny<br>
&gt; feedback or helping further if possible, Simon<br>
&gt; <br>
&gt; [1] <a href=3D"https://gist.github.com/simonhf/6d8097f4d6caa572cc42354=
f494b20ef" rel=3D"noreferrer" target=3D"_blank">https://gist.github.com/sim=
onhf/6d8097f4d6caa572cc42354f494b20ef</a><br>
<br>
This behavior was originally intentional. In general, if a function is<br>
specified to modify pointed-to memory as output as a side effect of<br>
success, that does not give it license to modify it on failure. And<br>
since pthread_create can&#39;t commit to success until after the thread is<=
br>
created, it would have to hold back start of the new thread<br>
unnecessarily to guarantee that the result is written before the new<br>
thread starts. (Note that it can&#39;t simply write the value from both<br>
the caller and the new thread; the latter could end up writing to the<br>
pthread_t object after the end of its lifetime.)<br>
<br>
Moreover, there is no expectation from the application that it should<br>
be able to read the result object from the new thread without<br>
additional synchronization. The wording of the spec is:<br>
<br>
=C2=A0 =C2=A0 &quot;Upon successful completion, pthread_create() shall stor=
e the ID<br>
=C2=A0 =C2=A0 =C2=A0^^^^^^^^^^^^^^^^^^^^^^^^^^<br>
=C2=A0 =C2=A0 of the created thread in the location referenced by thread.&q=
uot;<br>
<br>
Until completion of (observation of &amp; synchronization with return<br>
from) pthread_create, nothing can be said about value of the object;<br>
access to it is unsynchronized.<br>
<br>
With that said, the specification for pthread_create does *allow*<br>
implementations that store the value speculatively before success:<br>
<br>
=C2=A0 =C2=A0 &quot;If pthread_create() fails, no new thread is created and=
 the<br>
=C2=A0 =C2=A0 contents of the location referenced by thread are undefined.&=
quot;<br>
<br>
I was not aware of this when writing it. So we could change it, but it<br>
doesn&#39;t seem like a very good idea to do so; any code relying on it is<=
br>
non-portable/racy. If the new thread needs its own id, there&#39;s an easy<=
br>
and portable way to obtain it: pthread_self().<br>
<br>
Are there reasons you still think the alternate behavior would be<br>
preferable?<br>
<br>
Rich<br>
</blockquote></div>

--000000000000fdaaab059d248b01--