[musl] ptc in pthread

mailing list of musl libc

* [musl] ptc in pthread
From: Zibin Liu @ 2024-08-16  2:51 UTC
  To: musl

Hi,

I’m not sure if this is the appropriate mailing list for my question. If
it isn't, I’d appreciate it if someone could direct me to the correct
one.

I’m currently studying pthreads and related concepts, and I’ve come
across some code in pthread_create.c that I find a bit confusing.

In src/thread/pthread_create.c, I noticed the following:

int __pthread_create(pthread_t *restrict res, const pthread_attr_t
*restrict attrp, void *(*entry)(void *), void *restrict arg)
{
    ......

    __acquire_ptc();
    ......
    __release_ptc();
    ......
fail:
    __release_ptc();
    return EAGAIN;
}

It appears that when pthread_create is called, it acquires a lock
(using __acquire_ptc()) and releases it afterward. I’m wondering why
this locking mechanism is necessary.

Additionally, I observed that a related lock is acquired during dlopen
in ldso/dynlink.c:

void *dlopen(const char *file, int mode)
{
    ......
    __inhibit_ptc();
    ......
end:
    ......
    __release_ptc();
    ......
    return p;
}

From this, it seems that when dlopen is called, creating a new pthread
is not allowed during the process. Does this mean that it’s entirely
prohibited to create any threads (even if one were to use a custom thread
library specifically within dlopen) during the execution of dlopen?

I also traced the commit logs and found that the 'ptc' mechanism was
introduced in commit dcd6037, with the following message:

> I've re-appropriated the lock that was previously used for __synccall
> (synchronizing set*id() syscalls between threads) as a general
> pthread_create lock. it's a "backwards" rwlock where the "read"
> operation is safe atomic modification of the live thread count, which
> multiple threads can perform at the same time, and the "write"
> operation is making sure the count does not increase during an
> operation that depends on it remaining bounded (__synccall or dlopen).
> in static-linked programs that don't use __synccall, this lock is a
> no-op and has no cost.

Despite this, I’m still unclear on why dlopen needs to ensure that the
thread count does not increase. Could someone provide more details on
this?

* Re: [musl] ptc in pthread
From: Markus Wichmann @ 2024-08-16 14:38 UTC
  To: musl; +Cc: Zibin Liu

Am Fri, Aug 16, 2024 at 10:51:53AM +0800 schrieb Zibin Liu:
> Despite this, I’m still unclear on why dlopen needs to ensure that the
> thread count does not increase. Could someone provide more details on
> this?

This is in case a library is opened that contains TLS. In that case, the
thread calling dlopen() must allocate a new TLS block for the library
for every thread that currently exists, as well as a new DTV to contain
the pointers. If a thread could be created during this, obviously there
could be a thread created without that TLS block.

musl doesn't use the lazy TLS initialization scheme glibc uses, because
that one admits no failure. In that scheme, memory for the new TLS is
allocated in __tls_get_addr(), but if allocation fails, there is no
choice but to abort. In musl's implementation, the memory is allocated
in dlopen(), and if it cannot be allocated, the dlopen() fails.
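A rough sketch of the eager, fail-cleanly approach described above, assuming a greatly simplified thread structure: each hypothetical model_thread holds a fixed array of per-module TLS pointers standing in for the DTV. musl's real installation code differs (among other things, it allocates in bulk rather than per thread), so treat this only as an illustration of the all-or-nothing property: either every existing thread gets the new block, or none does and the error is reported to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified model of eager dynamic-TLS installation.
 * Names and layout are illustrative; musl's DTV and thread structures differ. */
#define MAX_MODULES 8

struct model_thread {
    void *dtv[MAX_MODULES];   /* one TLS block pointer per module */
};

/* Allocate module `mod`'s TLS block for every existing thread.
 * Assumes the caller holds the ptc lock exclusively, so `nthreads` cannot
 * grow during the loop. On failure, undo the allocations already made and
 * return an error, so dlopen can fail instead of aborting later. */
int model_install_tls(struct model_thread **threads, size_t nthreads,
                      size_t mod, size_t tls_size)
{
    size_t i;
    if (mod >= MAX_MODULES)
        return EINVAL;
    for (i = 0; i < nthreads; i++) {
        void *block = calloc(1, tls_size);   /* zero-filled, like .tbss */
        if (!block)
            goto fail;
        threads[i]->dtv[mod] = block;
    }
    return 0;
fail:
    while (i-- > 0) {                        /* roll back threads 0..i-1 */
        free(threads[i]->dtv[mod]);
        threads[i]->dtv[mod] = NULL;
    }
    return ENOMEM;
}
```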

The lock cannot be reduced in scope to the TLS installation, since each
library can pull in dependencies that can also have TLS.

Ciao,
Markus

* Re: [musl] ptc in pthread
From: Rich Felker @ 2024-08-16 15:40 UTC
  To: Zibin Liu; +Cc: musl

On Fri, Aug 16, 2024 at 10:51:53AM +0800, Zibin Liu wrote:
> Hi,
> 
> I’m not sure if this is the appropriate mailing list for my question. If
> it isn't, I’d appreciate it if someone could direct me to the correct
> one.
> 
> I’m currently studying pthreads and related concepts, and I’ve come
> across some code in pthread_create.c that I find a bit confusing.
> 
> In src/thread/pthread_create.c, I noticed the following:
> 
> int __pthread_create(pthread_t *restrict res, const pthread_attr_t
> *restrict attrp, void *(*entry)(void *), void *restrict arg)
> {
>     ......
> 
>     __acquire_ptc();
>     ......
>     __release_ptc();
>     ......
> fail:
>     __release_ptc();
>     return EAGAIN;
> }
> 
> It appears that when pthread_create is called, it acquires a lock
> (using __acquire_ptc()) and releases it afterward. I’m wondering why
> this locking mechanism is necessary.

See below.

> Additionally, I observed that a related lock is acquired during dlopen
> in ldso/dynlink.c:
> 
> void *dlopen(const char *file, int mode)
> {
>     ......
>     __inhibit_ptc();
>     ......
> end:
>     ......
>     __release_ptc();
>     ......
>     return p;
> }
> 
> From this, it seems that when dlopen is called, creating a new pthread
> is not allowed during the process. Does this mean that it’s entirely
> prohibited to create any threads (even if one were to use a custom thread
> library specifically within dlopen) during the execution of dlopen?

"Custom threads library" is completely outside the scope of musl, but
if you were doing that, you could not call any libc code from any
thread it created, since they would not be executing with the expected
state/context (of having a thread pointer pointing to a valid and
unique-to-the-thread libc thread structure that's a member of the libc
thread list).

It also would not be using the same lock, so nothing about this lock
would apply to it.

"During the execution of dlopen" is not a particularly interesting
limitation. I suspect you might be thinking that this includes
execution of ctors of loaded libraries, which might take arbitrarily
long in application-provided code, but it does not. All dlopen-held
locks are released once success of the dlopen is committed (i.e. once
dependency load and relocations have finished without errors).

> I also traced the commit logs and found that the 'ptc' mechanism was
> introduced in commit dcd6037, with the following message:
> 
> > I've re-appropriated the lock that was previously used for __synccall
> > (synchronizing set*id() syscalls between threads) as a general
> > pthread_create lock. it's a "backwards" rwlock where the "read"
> > operation is safe atomic modification of the live thread count, which
> > multiple threads can perform at the same time, and the "write"
> > operation is making sure the count does not increase during an
> > operation that depends on it remaining bounded (__synccall or dlopen).
> > in static-linked programs that don't use __synccall, this lock is a
> > no-op and has no cost.
> 
> Despite this, I’m still unclear on why dlopen needs to ensure that the
> thread count does not increase. Could someone provide more details on
> this?

The purpose of the lock has evolved considerably since it was
originally added, and I think it's not really well-named or
well-described in terms of what it's really for.

The easiest way to see what it's actually doing though is to look at
what shared (global) state is accessed under the lock by
pthread_create:

- default attributes values: __default_stacksize and
  __default_guardsize

- TLS storage space requirement: libc.tls_size

In addition, there's the current thread count/thread list, which
pthread_create accesses and modifies under the __acquire_ptc lock, but
which has also been protected by its own lock since 8f11e6127f.
Previously, pthread_create wrote it without any lock beyond the
(shared, read-mode) __acquire_ptc lock, but that was fine since it was
only a counter and the write was atomic.

So, in current usage, the ptc (pthread_create) lock is really behaving
as a simple rwlock for shared state that's read by pthread_create and
written to only rarely in a few other places.

Does this information help?

Rich

* Re: [musl] ptc in pthread
From: Rich Felker @ 2024-08-16 15:51 UTC
  To: Markus Wichmann; +Cc: musl, Zibin Liu

On Fri, Aug 16, 2024 at 04:38:39PM +0200, Markus Wichmann wrote:
> Am Fri, Aug 16, 2024 at 10:51:53AM +0800 schrieb Zibin Liu:
> > Despite this, I’m still unclear on why dlopen needs to ensure that the
> > thread count does not increase. Could someone provide more details on
> > this?
> 
> This is in case a library is opened that contains TLS. In that case, the
> thread calling dlopen() must allocate a new TLS block for the library
> for every thread that currently exists, as well as a new DTV to contain
> the pointers. If a thread could be created during this, obviously there
> could be a thread created without that TLS block.
> 
> musl doesn't use the lazy TLS initialization scheme glibc uses, because
> that one admits no failure. In that scheme, memory for the new TLS is
> allocated in __tls_get_addr(), but if allocation fails, there is no
> choice but to abort. In musl's implementation, the memory is allocated
> in dlopen(), and if it cannot be allocated, the dlopen() fails.
> 
> The lock cannot be reduced in scope to the TLS installation, since each
> library can pull in dependencies that can also have TLS.

I should probably go into a little bit more detail on this.

Since dynamic loading involves dynamic TLS, pthread_create and dlopen
need a contract between them for who is responsible for allocation of
memory for dynamic-loaded modules' TLS.

The way we do this is by making the operations ordered with respect to
each other, via a lock.

When pthread_create happens, it is responsible for allocation of TLS
storage for all modules that existed (as a result of initial program
load or dlopen) prior to the pthread_create call (prior to it taking
the __acquire_ptc lock).

When dlopen happens, it is responsible for allocation of TLS storage
for all threads that existed prior to the dlopen call (prior to it
taking the __inhibit_ptc lock).

One might think dlopen could release the ptc lock earlier, once it
finishes loading libraries and before it does the costlier relocation
part. However, at that point success of the dlopen isn't committed, so
pthread_create could see a wrong, speculative value of the needed TLS
size, which ends up getting reverted before dlopen returns.

Rich

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
