mailing list of musl libc
* [musl] pthread_mutex_t shared between processes with different pid namespaces
@ 2025-01-28 13:22 Daniele Personal
  2025-01-28 15:02 ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Daniele Personal @ 2025-01-28 13:22 UTC (permalink / raw)
  To: musl

Hello everyone,
I'm working on a library linked by some processes in order to exchange
information. Such library uses some pthread_mutex_t instances to safely
read/write the information to exchange: the mutexes are created with
the PTHREAD_PROCESS_SHARED and PTHREAD_MUTEX_ROBUST attributes and
shared through shared memory mmapped by the processes.
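
For reference, the initialization looks roughly like the sketch below
(names such as SHM_NAME and struct shared_area are illustrative, and
error handling is omitted):

#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/example-shared-lock"   /* illustrative name */

struct shared_area {
  pthread_mutex_t lock;
  /* ... data protected by the lock ... */
};

static struct shared_area *map_shared_area (void)
{
  /* The first process creates and sizes the object, every process maps it */
  int fd = shm_open (SHM_NAME, O_RDWR | O_CREAT, 0600);
  ftruncate (fd, sizeof (struct shared_area));
  struct shared_area *a = mmap (NULL, sizeof *a, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
  close (fd);

  /* Only the process that created the object should run this part */
  pthread_mutexattr_t attr;
  pthread_mutexattr_init (&attr);
  pthread_mutexattr_setpshared (&attr, PTHREAD_PROCESS_SHARED);
  pthread_mutexattr_setrobust (&attr, PTHREAD_MUTEX_ROBUST);
  pthread_mutex_init (&a->lock, &attr);
  pthread_mutexattr_destroy (&attr);

  return a;
}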

Now, for certain reasons, I have to run one of the processes in a
container and I found that, after a random interval of time, the
process in the container got stuck in a pthread_mutex_lock without any
reason.

After some investigation I figured out that if the container is started
without pid namespace isolation, everything works like a charm.

So the questions: is the pid namespace isolation a problem when working
with shared mutexes or should I investigate in other directions?
If the problem is pid namespace isolation, what could be done to make
it work, apart from sharing the same pid namespace?

The actual development is based on musl 1.2.4 built with Yocto
Scarthgap for aarch64 and arm.

Thanks in advance for any help,
Daniele.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-01-28 13:22 [musl] pthread_mutex_t shared between processes with different pid namespaces Daniele Personal
@ 2025-01-28 15:02 ` Rich Felker
  2025-01-28 16:13   ` Daniele Personal
  2025-01-28 18:24   ` Florian Weimer
  0 siblings, 2 replies; 28+ messages in thread
From: Rich Felker @ 2025-01-28 15:02 UTC (permalink / raw)
  To: Daniele Personal; +Cc: musl

On Tue, Jan 28, 2025 at 02:22:31PM +0100, Daniele Personal wrote:
> Hello everyone,
> I'm working on a library linked by some processes in order to exchange
> information. Such library uses some pthread_mutex_t instances to safely
> read/write the information to exchange: the mutexes are created with
> the PTHREAD_PROCESS_SHARED and PTHREAD_MUTEX_ROBUST attributes and
> shared through shared memory mmapped by the processes.
> 
> Now, for certain reasons, I have to run one of the processes in a
> container and I found that, after a random interval of time, the
> process in the container got stuck in a pthread_mutex_lock without any
> reason.
> 
> After some investigation I figured out that if the container is started
> without pid namespace isolation, everything works like a charm.
> 
> So the questions: is the pid namespace isolation a problem when working
> with shared mutexes or should I investigate in other directions?
> If the problem is pid namespace isolation, what could be done to make
> it work, apart from sharing the same pid namespace?
> 
> The actual development is based on musl 1.2.4 built with Yocto
> Scarthgap for aarch64 and arm.

Yes, the pid namespace boundary is your problem. Process-shared
mutexes only work on the same logical system with a unique set of
thread identifiers. If you're trying to share them across different
pid namespaces, the same pid/tid may refer to different
processes/threads in different ones, and it's not usable as a mutex
ownership identity.

If you want robust-mutex-like functionality that bridges pid
namespaces, sysv semaphores are probably your only option.
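
Roughly, what that looks like (a minimal sketch; the key handling and the
well-known initialization race between semget and SETVAL are glossed over).
The robust-like part is SEM_UNDO: the kernel rolls the operation back if
the holder dies.

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* POSIX leaves union semun to be defined by the caller */
union semun { int val; struct semid_ds *buf; unsigned short *array; };

static int sem_lock_get (key_t key)
{
  int id = semget (key, 1, IPC_CREAT | IPC_EXCL | 0600);
  if (id >= 0) {
    union semun arg = { .val = 1 };      /* one slot, starts unlocked */
    semctl (id, 0, SETVAL, arg);
  } else {
    id = semget (key, 1, 0600);          /* someone else created it */
  }
  return id;
}

static int sem_lock (int id)
{
  /* SEM_UNDO: if this process dies while holding the lock, the kernel
   * undoes the decrement, so other processes are not blocked forever */
  struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
  return semop (id, &op, 1);
}

static int sem_unlock (int id)
{
  struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = SEM_UNDO };
  return semop (id, &op, 1);
}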

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-01-28 15:02 ` Rich Felker
@ 2025-01-28 16:13   ` Daniele Personal
  2025-01-28 18:24   ` Florian Weimer
  1 sibling, 0 replies; 28+ messages in thread
From: Daniele Personal @ 2025-01-28 16:13 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Tue, 2025-01-28 at 10:02 -0500, Rich Felker wrote:
> On Tue, Jan 28, 2025 at 02:22:31PM +0100, Daniele Personal wrote:
> > Hello everyone,
> > I'm working on a library linked by some processes in order to
> > exchange
> > information. Such library uses some pthread_mutex_t instances to
> > safely
> > read/write the information to exchange: the mutexes are created
> > with
> > the PTHREAD_PROCESS_SHARED and PTHREAD_MUTEX_ROBUST attributes and
> > shared through shared memory mmapped by the processes.
> > 
> > Now, for certain reasons, I have to run one of the processes in a
> > container and I found that, after a random interval of time, the
> > process in the container got stuck in a pthread_mutex_lock without
> > any
> > reason.
> > 
> > After some investigation I figured out that if the container is
> > started
> > without pid namespace isolation, everything works like a charm.
> > 
> > So the questions: is the pid namespace isolation a problem when
> > working
> > with shared mutexes or should I investigate in other directions?
> > If the problem is pid namespace isolation, what could be done to
> > make
> > it work, apart from sharing the same pid namespace?
> > 
> > The actual development is based on musl 1.2.4 built with Yocto
> > Scarthgap for aarch64 and arm.
> 
> Yes, the pid namespace boundary is your problem. Process-shared
> mutexes only work on the same logical system with a unique set of
> thread identifiers. If you're trying to share them across different
> pid namespaces, the same pid/tid may refer to different
> processes/threads in different ones, and it's not usable as a mutex
> ownership identity.
> 
> If you want robust-mutex-like functionality that bridges pid
> namespaces, sysv semaphores are probably your only option.
> 
> Rich

Oh, thanks for clarifying it.

But at first glance at man 2 semop I see:

Each semaphore in a System V semaphore set has the following
       associated values:

           unsigned short  semval;   /* semaphore value */
           unsigned short  semzcnt;  /* # waiting for zero */
           unsigned short  semncnt;  /* # waiting for increase */
           pid_t           sempid;   /* PID of process that last
                                        modified the semaphore value */

Could it be that SysV semaphores also don't like different pid
namespaces?

If so, the only chance is to use the same pid namespace, isn't it?

Daniele.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-01-28 15:02 ` Rich Felker
  2025-01-28 16:13   ` Daniele Personal
@ 2025-01-28 18:24   ` Florian Weimer
  2025-01-31  9:31     ` Daniele Personal
  1 sibling, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2025-01-28 18:24 UTC (permalink / raw)
  To: Rich Felker; +Cc: Daniele Personal, musl

* Rich Felker:

> Yes, the pid namespace boundary is your problem. Process-shared
> mutexes only work on the same logical system with a unique set of
> thread identifiers. If you're trying to share them across different
> pid namespaces, the same pid/tid may refer to different
> processes/threads in different ones, and it's not usable as a mutex
> ownership identity.

Is this required for implementing the unlock-if-not-owner error code on
mutex unlock?

By the way, there is a proposal to teach the kernel to rewrite the
ownership list of task exit:

  [PATCH v2 0/4] futex: Drop ROBUST_LIST_LIMIT
  <https://lore.kernel.org/linux-kernel/20250127202608.223864-1-andrealmeid@igalia.com/>

I'm worried about the compatibility impact.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-01-28 18:24   ` Florian Weimer
@ 2025-01-31  9:31     ` Daniele Personal
  2025-01-31 20:30       ` Markus Wichmann
  2025-02-01 16:03       ` Florian Weimer
  0 siblings, 2 replies; 28+ messages in thread
From: Daniele Personal @ 2025-01-31  9:31 UTC (permalink / raw)
  To: Florian Weimer, Rich Felker; +Cc: musl

On Tue, 2025-01-28 at 19:24 +0100, Florian Weimer wrote:
> * Rich Felker:
> 
> > Yes, the pid namespace boundary is your problem. Process-shared
> > mutexes only work on the same logical system with a unique set of
> > thread identifiers. If you're trying to share them across different
> > pid namespaces, the same pid/tid may refer to different
> > processes/threads in different ones, and it's not usable as a mutex
> > ownership identity.

From what I see, the problem seems to happen only in case of contention
of the mutex.

int __pthread_mutex_timedlock(pthread_mutex_t *restrict m, const struct
timespec *restrict at)
{
	if ((m->_m_type&15) == PTHREAD_MUTEX_NORMAL
	    && !a_cas(&m->_m_lock, 0, EBUSY))
		return 0;

	int type = m->_m_type;
	int r, t, priv = (type & 128) ^ 128;

	r = __pthread_mutex_trylock(m);
	if (r != EBUSY) return r;

IIUC, if it is not locked, __pthread_mutex_timedlock will acquire it
and return 0 (I don't understand whether via the first check or via
__pthread_mutex_trylock) and everything works.

If instead it is locked the problem arises only inside the container.
If it was a pthread_mutex_lock it waits forever, if it was a timed lock
it exits after the timeout and you can retry.

Is this correct?

> 
> Is this required for implementing the unlock-if-not-owner error code
> on
> mutex unlock?

No, I don't see problems related to EOWNERDEAD.

> 
> By the way, there is a proposal to teach the kernel to rewrite the
> ownership list of task exit:
> 
>   [PATCH v2 0/4] futex: Drop ROBUST_LIST_LIMIT
>   <https://lore.kernel.org/linux-kernel/20250127202608.223864-1-andrealmeid@igalia.com/>
> 
> I'm worried about the compatibility impact.
> 
> Thanks,
> Florian
> 

Thanks,
Daniele.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-01-31  9:31     ` Daniele Personal
@ 2025-01-31 20:30       ` Markus Wichmann
  2025-02-03 13:54         ` Daniele Personal
  2025-02-01 16:03       ` Florian Weimer
  1 sibling, 1 reply; 28+ messages in thread
From: Markus Wichmann @ 2025-01-31 20:30 UTC (permalink / raw)
  To: musl; +Cc: Daniele Personal

On Fri, Jan 31, 2025 at 10:31:46AM +0100, Daniele Personal wrote:
> IIUC, if it is not locked, __pthread_mutex_timedlock will acquire it
> and return 0 (I don't understand whether via the first check or via
> __pthread_mutex_trylock) and everything works.
>
> If instead it is locked the problem arises only inside the container.
> If it was a pthread_mutex_lock it waits forever, if it was a timed lock
> it exits after the timeout and you can retry.
>
> Is this correct?
>

Essentially yes. If uncontended, kernel space never gets involved at all
and everything just works, but if contended, futex wait and futex wake
do not meet each other if issued from different PID namespaces. Thus
they end up waiting until the timeout expires. Unless there is no
timeout, in which case they wait until the user gets bored and kills the process.

Ciao,
Markus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-01-31  9:31     ` Daniele Personal
  2025-01-31 20:30       ` Markus Wichmann
@ 2025-02-01 16:03       ` Florian Weimer
  2025-02-03 12:58         ` Daniele Personal
  1 sibling, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2025-02-01 16:03 UTC (permalink / raw)
  To: Daniele Personal; +Cc: Rich Felker, d.dario76, musl

* Daniele Personal:

>> Is this required for implementing the unlock-if-not-owner error code
>> on mutex unlock?
>
> No, I don't see problems related to EOWNERDEAD.

Sorry, what I meant is that the TID is needed for efficient reporting of
usage errors.  It's not imposed by the robust list protocol as such.
There could be a PID-namespace-compatible robust mutex type that does
not have this problem (but with less error checking).

Thanks,
Florian


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-01 16:03       ` Florian Weimer
@ 2025-02-03 12:58         ` Daniele Personal
  2025-02-03 17:25           ` Florian Weimer
  0 siblings, 1 reply; 28+ messages in thread
From: Daniele Personal @ 2025-02-03 12:58 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Rich Felker, d.dario76, musl

On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote:
> * Daniele Personal:
> 
> > > Is this required for implementing the unlock-if-not-owner error
> > > code
> > > on mutex unlock?
> > 
> > No, I don't see problems related to EOWNERDEAD.
> 
> Sorry, what I meant is that the TID is needed for efficient reporting
> of
> usage errors.  It's not imposed by the robust list protocol as such.
> There could be a PID-namespace-compatible robust mutex type that does
> not have this problem (but with less error checking).
> 
> Thanks,
> Florian
> 

Are you saying that there are pthread_mutexes which can be shared
across processes running in different pid namespaces? If so, I'm definitely
interested in this. Can you tell me something more?

What do you mean by "less error checking"? I definitely need to be able
to detect the EOWNERDEAD condition in order to restore consistency, but
I'm not interested in recursive locks.

Thanks,
Daniele. 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-01-31 20:30       ` Markus Wichmann
@ 2025-02-03 13:54         ` Daniele Personal
  0 siblings, 0 replies; 28+ messages in thread
From: Daniele Personal @ 2025-02-03 13:54 UTC (permalink / raw)
  To: Markus Wichmann, musl

On Fri, 2025-01-31 at 21:30 +0100, Markus Wichmann wrote:
> On Fri, Jan 31, 2025 at 10:31:46AM +0100, Daniele Personal wrote:
> > IIUC, if it is not locked, the __pthread_mutex_timedlock will
> > acquire
> > it and return 0 (don't understand if with the first check or with
> > the
> > __pthread_mutex_trylock) and everything works.
> > 
> > If instead it is locked the problem arises only inside the
> > container.
> > If it was a pthread_mutex_lock it waits forever, if it was a timed
> > lock
> > it exits after the timeout and you can retry.
> > 
> > Is this correct?
> > 
> 
> Essentially yes. If uncontended, kernel space never gets involved at
> all
> and everything just works, but if contended, futex wait and futex
> wake
> do not meet each other if issued from different PID namespaces. Thus
> they end up waiting until the timeout expires. Unless there is no
> timeout, then they wait until the user gets bored and kills the
> process.
> 
> Ciao,
> Markus

Thanks for the explanation. So there's no way to have pthread_mutexes
mapped in shared memory shared between host and container work when the
container is created with its own pid namespace?

Daniele.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-03 12:58         ` Daniele Personal
@ 2025-02-03 17:25           ` Florian Weimer
  2025-02-04 16:48             ` Daniele Personal
  2025-02-04 18:53             ` Rich Felker
  0 siblings, 2 replies; 28+ messages in thread
From: Florian Weimer @ 2025-02-03 17:25 UTC (permalink / raw)
  To: Daniele Personal; +Cc: d.dario76, Rich Felker, musl

* Daniele Personal:

> On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote:
>> * Daniele Personal:
>> 
>> > > Is this required for implementing the unlock-if-not-owner error
>> > > code
>> > > on mutex unlock?
>> > 
>> > No, I don't see problems related to EOWNERDEAD.
>> 
>> Sorry, what I meant is that the TID is needed for efficient reporting
>> of
>> usage errors.  It's not imposed by the robust list protocol as such.
>> There could be a PID-namespace-compatible robust mutex type that does
>> not have this problem (but with less error checking).
>> 
>> Thanks,
>> Florian
>> 
>
> Are you saying that there are pthread_mutexes which can be shared
> across processes running in different pid namespaces? If so, I'm definitely
> interested in this. Can you tell me something more?

You would have to add a new mutex type that is a mix of
PTHREAD_MUTEX_NORMAL and PTHREAD_MUTEX_ROBUST.  Closer to the latter,
but without the ownership checks.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-03 17:25           ` Florian Weimer
@ 2025-02-04 16:48             ` Daniele Personal
  2025-02-04 18:53             ` Rich Felker
  1 sibling, 0 replies; 28+ messages in thread
From: Daniele Personal @ 2025-02-04 16:48 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Rich Felker, musl

[-- Attachment #1: Type: text/plain, Size: 2600 bytes --]

On Mon, 2025-02-03 at 18:25 +0100, Florian Weimer wrote:
> * Daniele Personal:
> 
> > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote:
> > > * Daniele Personal:
> > > 
> > > > > Is this required for implementing the unlock-if-not-owner
> > > > > error
> > > > > code
> > > > > on mutex unlock?
> > > > 
> > > > No, I don't see problems related to EOWNERDEAD.
> > > 
> > > Sorry, what I meant is that the TID is needed for efficient
> > > reporting
> > > of
> > > usage errors.  It's not imposed by the robust list protocol as
> > > such.
> > > There could be a PID-namespace-compatible robust mutex type that
> > > does
> > > not have this problem (but with less error checking).
> > > 
> > > Thanks,
> > > Florian
> > > 
> > 
> > Are you saying that there are pthread_mutexes which can be shared
> > across processes run on different pid namespaces? If yes I'm
> > definitely
> > interested on this. Can you tell me something more?
> 
> You would have to add a new mutex type that is a mix of
> PTHREAD_MUTEX_NORMAL and PTHREAD_MUTEX_ROBUST.  Closer to the latter,
> but without the ownership checks.
> 
> Thanks,
> Florian
> 

I wrote a stupid test to exercise things. It creates (if needed) a
shared object which will hold the pthread_mutex_t instance, mmaps it
and, if the shared object has just been created, initializes the mutex
(changing the mutex initialization requires deleting the shared object).

Then it locks the mutex for 5 seconds, unlocks it and locks it again
after another 2 seconds and exits with the mutex locked.

I compiled the code linking against musl 1.2.4 with Yocto, or on my
laptop against glibc 2.34, with

gcc -D_GNU_SOURCE -Wall -pthread -g -O2 -c -o main.o main.c
gcc -pthread -g -O2 -pthread -o mutex-test main.o -lrt

I have a container which is started with its own pid, mount and uts
namespaces (it shares IPC and network with the host).

Running the test separately on host and container, both can recognize
the case where the previous instance left the mutex locked and can
recover from it.

Running them in parallel gives two distinct results depending on whether
the mutex is initialized with the PTHREAD_PRIO_INHERIT protocol or
PTHREAD_PRIO_NONE.

If PTHREAD_PRIO_INHERIT is set, in case of contention, the waiter gets
stuck.

If PTHREAD_PRIO_NONE is used, everything seems to work fine: the
application which starts later waits for the mutex to be released by
the other one and gets woken properly.

I now need to understand if this behavior is expected and reliable or
not.

Thanks in advance,
Daniele.

[-- Attachment #2: main.c --]
[-- Type: text/x-csrc, Size: 11941 bytes --]

/*
 * main.c
 *
 *  Created on: Feb 4, 2025
 */

#include <time.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/**
 * Definition of shared object to be
 * mmapped() by the different processes
 */
typedef struct _SharedMutex
{
  /* A magic number just to check if object has been initialized */
  uint32_t magic;

  /* The attributes used to initialize the shared mutex */
  pthread_mutexattr_t attr;

  /* The shared mutex */
  pthread_mutex_t lock;
} SharedMutex;

#define MAGIC 0xdeadbeef

/** Default name to use when printing debug stuff */
static const char *app = "A";

#define DBG_PRINT(f,a...) \
  do {\
      struct timespec now; \
      clock_gettime (CLOCK_MONOTONIC, &now); \
      printf ("(%s) [%li.%09li] "f"\n", app, now.tv_sec, now.tv_nsec, ##a); \
  } while (0);

/** Protocol type: by default PTHREAD_PRIO_NONE */
static int prio = PTHREAD_PRIO_NONE;

/**
 * @brief Initialize a mutex instance setting the #PTHREAD_MUTEX_ROBUST and
 *        #PTHREAD_PROCESS_SHARED attributes. This makes it possible to have
 *        a mutex usable by different processes and recoverable in case the
 *        process which owns it crashes.
 *
 * @param lock A #pthread_mutex_t instance to initialize
 * @param attr A #pthread_mutexattr_t instance used to initialize the mutex
 *
 * @return 0 on success, -1 otherwise
 */
static int mutex_init (pthread_mutex_t *lock, pthread_mutexattr_t *attr)
{
  int res;

  pthread_mutexattr_init (attr);

  DBG_PRINT ("setting PTHREAD_PROCESS_SHARED attribute");
  res = pthread_mutexattr_setpshared (attr, PTHREAD_PROCESS_SHARED);
  if (res != 0)
    {
      /* Failed to set SHARED */
      printf ("failed to set PTHREAD_PROCESS_SHARED: %i %s\n", res,
          strerror (res));
      return -1;
    }

  if (prio != PTHREAD_PRIO_NONE)
    {
      DBG_PRINT ("setting PTHREAD_PRIO_INHERIT attribute");
      res = pthread_mutexattr_setprotocol (attr, PTHREAD_PRIO_INHERIT);
      if (res != 0)
        {
          /* Failed to set protocol */
          printf ("failed to set PTHREAD_PRIO_INHERIT: %i %s\n", res,
              strerror (res));
          return -1;
        }
    }

  DBG_PRINT ("setting PTHREAD_MUTEX_ROBUST attribute");
  res = pthread_mutexattr_setrobust (attr, PTHREAD_MUTEX_ROBUST);
  if (res != 0)
    {
      printf ("failed to set PTHREAD_MUTEX_ROBUST: %i %s\n", res,
          strerror (res));
      return -1;
    }

  /*
   * Initialize mutex instance.
   * This method always returns 0, so there is no need to check for success.
   */
  pthread_mutex_init (lock, attr);

  return 0;
}

/**
 * @brief Check that the given mutex attributes carry the
 *        #PTHREAD_MUTEX_ROBUST, #PTHREAD_PROCESS_SHARED and protocol settings
 *        expected by this instance, i.e. that the mutex found in shared
 *        memory was initialized the way we would have initialized it.
 *
 * @param attr A #pthread_mutexattr_t instance to check
 *
 * @return 0 on success, -1 otherwise
 */
static int mutexattr_check (pthread_mutexattr_t *attr)
{
  int res;
  int val;

  DBG_PRINT ("checking PTHREAD_PROCESS attribute");
  res = pthread_mutexattr_getpshared (attr, &val);
  if (res != 0)
    {
      /* Failed to get pshared attribute */
      printf ("failed to get PTHREAD_PROCESS attribute: %i %s\n",
          res, strerror (res));
      return -1;
    }
  if (val != PTHREAD_PROCESS_SHARED)
    {
      printf ("mutex was not initialized with PTHREAD_PROCESS_SHARED"
          " attribute\n");
      return -1;
    }

  DBG_PRINT ("checking protocol attribute");
  res = pthread_mutexattr_getprotocol (attr, &val);
  if (res != 0)
    {
      /* Failed to get protocol attribute */
      printf ("failed to get protocol attribute: %i %s\n", res,
          strerror (res));
      return -1;
    }
  if (val != prio)
    {
      printf ("mutex was initialized with protocol %s but we wanted %s\n",
          (val == PTHREAD_PRIO_NONE) ? "PTHREAD_PRIO_NONE" : "PTHREAD_PRIO_INHERIT",
          (prio == PTHREAD_PRIO_NONE) ? "PTHREAD_PRIO_NONE" : "PTHREAD_PRIO_INHERIT");
      return -1;
    }

  DBG_PRINT ("checking PTHREAD_MUTEX_ROBUST attribute");
  res = pthread_mutexattr_getrobust (attr, &val);
  if (res != 0)
    {
      printf ("failed to get robust attribute: %i %s\n", res,
          strerror (res));
      return -1;
    }
  if (val != PTHREAD_MUTEX_ROBUST)
    {
      printf ("mutex was not initialized with PTHREAD_MUTEX_ROBUST"
          " attribute\n");
      return -1;
    }

  return 0;
}

/**
 * @brief Lock the given mutex. If the mutex is currently unlocked, it
 *        becomes locked and owned by the calling thread, and the method
 *        returns immediately. If the mutex is already locked by another
 *        thread, the method suspends the calling thread until the mutex is
 *        unlocked. If the thread which was holding the mutex terminated, the
 *        method restores the consistency of the mutex and locks it, returning 0.
 *
 * @param lock A #pthread_mutex_t instance to lock
 *
 * @return 0 on success, a positive error code otherwise
 */
static int mutex_lock (pthread_mutex_t *lock)
{
  /* Try to lock mutex */
  int res = pthread_mutex_lock (lock);
  switch (res) {
    case 0:
      /* Common use case */
      break;
    case EOWNERDEAD:
      /*
       * Process which was holding the mutex terminated: now we're
       * holding it but, before going on, we have to make it consistent.
       */
      DBG_PRINT ("restoring mutex consistency");
      res = pthread_mutex_consistent (lock);
      if (res != 0)
        {
          /* Failed to restore consistency */
          printf ("failed to restore consistency of mutex: %i %s\n", res,
              strerror (res));
        }

      break;
    default:
      printf ("failed to lock mutex: %i %s\n", res, strerror (res));
      break;
  }

  return res;
}

/**
 * @brief Try to open the @name shared object and, if this is the first
 *        instance that opens it, initialize the shared memory.
 *
 * @param name Name of the shared memory object to be created or opened.
 *             For portable use, a shared memory object should be identified
 *             by a name of the form /some-name; that is, a null-terminated
 *             string of up to NAME_MAX (i.e., 255) characters consisting of
 *             an initial slash, followed by one or more characters, none of
 *             which are slashes.
 *
 * @return pointer to #SharedMutex instance on success, NULL otherwise
 */
static SharedMutex *shm_map (const char *name)
{
#define SHM_OPS_INIT  1
#define SHM_OPS_CHECK 2

  int fd;
  int res;
  int ops = SHM_OPS_INIT;
  size_t size = sizeof (SharedMutex);

  /* Sanity checks */
  if (name == NULL)
    {
      printf ("unable to open shared object: missing shared object name\n");
      return NULL;
    }

  /* Open handle to shared object */
  fd = shm_open (name, O_RDWR | O_CREAT | O_EXCL, 0600);
  if (fd < 0 && errno == EEXIST)
    {
      /*
       * If memory object @name already exists, another process has
       * initialized the memory area.
       */
      ops = SHM_OPS_CHECK;
      fd = shm_open (name, O_RDWR, 0600);
    }

  if (fd < 0)
    {
      /* Failed to open shared object */
      printf ("unable to open shared object %s: %i %s\n", name, errno,
          strerror (errno));
      return NULL;
    }

  if (ops == SHM_OPS_INIT)
    {
      /* Set desired size of shared object */
      res = ftruncate (fd, (off_t) size);
      if (res < 0)
        {
          printf ("unable to set size of shared object %s: %i %s\n", name,
              errno, strerror (errno));
          close (fd);
          shm_unlink (name);
          return NULL;
        }
    }

  /* Map shared memory region */
  void *p = mmap (NULL, size, (PROT_READ | PROT_WRITE), MAP_SHARED, fd, 0);
  if (p == MAP_FAILED)
    {
      printf ("unable to map shared object %s contents: %i %s\n", name,
          errno, strerror (errno));

      close (fd);
      if (ops == SHM_OPS_INIT)
        {
          /* Also unlink shared object */
          shm_unlink (name);
        }

      return NULL;
    }

  /* We can safely close file descriptor */
  close (fd);

  /* Helper to access shared info */
  SharedMutex *info = p;

  switch (ops) {
    case SHM_OPS_INIT:
      /*
       * Clear the contents of the shared memory area before starting to work
       * on it. This should not be needed because #ftruncate() should already
       * do it, but it doesn't hurt.
       */
      memset (p, 0, size);

      /* Initialize shared lock */
      if (mutex_init (&info->lock, &info->attr) != 0)
        {
          /* Cleanup stuff */
          munmap (p, size);

          /* Remove shared object */
          shm_unlink (name);

          return NULL;
        }

      /* Write magic number */
      info->magic = MAGIC;
      break;
    case SHM_OPS_CHECK:
      /*
       * Shared object has already been created. Last thing that is set is the
       * magic value: loop until it becomes valid before doing other things.
       */
      while (info->magic != MAGIC)
        {
          if (info->magic != 0 && info->magic != MAGIC)
            {
              printf ("shared object %s initialized with wrong magic:"
                  " aborting\n", name);
              return NULL;
            }

          sched_yield ();
        }

      /* Check if attributes are consistent with our choices */
      res = mutexattr_check (&info->attr);
      if (res != 0)
        {
          printf ("mutex attributes incompatible: aborting\n");

          /* Cleanup stuff */
          munmap (p, size);

          return NULL;
        }

      break;
    default:
      /* This should never happen */
      printf ("invalid initialization option\n");
      return NULL;
  }

  /* Return mapped shared object */
  return info;
}

/**
 * @brief Show application usage
 */
static void usage (const char *name)
{
  printf ("Usage: %s [options]\n", name);
  printf (" -h | --help : show this help\n");
  printf (" -n | --name : a string to distinguish between running instances"
      " (default \"A\"\n");
  printf (" -p | --prio_inherit : set PTHREAD_PRIO_INHERIT (default"
      " PTHREAD_PRIO_NONE)\n");
}

/**
 * Main method
 */
int main (int argc, char *argv [])
{
  int res;
  SharedMutex *obj;

  for (int c = 1; c < argc; c++)
    {
      if (strcmp (argv [c], "-h") == 0 ||
          strcmp (argv [c], "--help") == 0)
        {
          usage (argv [0]);
          return -1;
        }

      if (strcmp (argv [c], "-n") == 0 ||
          strcmp (argv [c], "--name") == 0)
        {
          if ((++c) >= argc)
            {
              /* Missing name argument */
              usage (argv [0]);
              return -1;
            }

          /* Replace application name */
          app = argv [c];
          continue;
        }

      if (strcmp (argv [c], "-p") == 0 ||
          strcmp (argv [c], "--prio-inherit") == 0)
        {
          /* Set priority inheritance */
          prio = PTHREAD_PRIO_INHERIT;
          continue;
        }
    }

  DBG_PRINT ("Mapping shared object");
  obj = shm_map ("/test");
  if (obj == NULL)
    {
      /* Failed to create/open shared object */
      return -1;
    }

  DBG_PRINT ("locking mutex");
  res = mutex_lock (&obj->lock);
  if (res != 0)
    {
      /* Failed to lock mutex */
      return -1;
    }

  DBG_PRINT ("waiting 5 [s] before to unlock mutex");
  sleep (5);

  DBG_PRINT ("releasing mutex");
  res = pthread_mutex_unlock (&obj->lock);
  if (res != 0)
    {
      /* Failed to unlock */
      DBG_PRINT ("failed to unlock mutex: %i %s", res, strerror (res));
      return -1;
    }

  DBG_PRINT ("waiting 2 [s] before to retry locking mutex");
  sleep (2);

  DBG_PRINT ("locking mutex");
  res = mutex_lock (&obj->lock);
  if (res != 0)
    {
      /* Failed to lock mutex */
      return -1;
    }

  DBG_PRINT ("terminating with mutex locked");

  return 0;
}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-03 17:25           ` Florian Weimer
  2025-02-04 16:48             ` Daniele Personal
@ 2025-02-04 18:53             ` Rich Felker
  2025-02-05 10:17               ` Daniele Personal
  1 sibling, 1 reply; 28+ messages in thread
From: Rich Felker @ 2025-02-04 18:53 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Daniele Personal, d.dario76, musl

On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote:
> * Daniele Personal:
> 
> > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote:
> >> * Daniele Personal:
> >> 
> >> > > Is this required for implementing the unlock-if-not-owner error
> >> > > code
> >> > > on mutex unlock?
> >> > 
> >> > No, I don't see problems related to EOWNERDEAD.
> >> 
> >> Sorry, what I meant is that the TID is needed for efficient reporting
> >> of
> >> usage errors.  It's not imposed by the robust list protocol as such.
> >> There could be a PID-namespace-compatible robust mutex type that does
> >> not have this problem (but with less error checking).
> >> 
> >> Thanks,
> >> Florian
> >> 
> >
> > Are you saying that there are pthread_mutexes which can be shared
> > across processes run on different pid namespaces? If yes I'm definitely
> > interested on this. Can you tell me something more?
> 
> You would have to add a new mutex type that is a mix of
> PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST.  Closer to the latter,
> but without the ownership checks.

This is inaccurate. Robust mutexes fundamentally depend on having the
owner's tid in the owner field, and on this value not matching the tid
of any other task that might hold the mutex. If these properties don't
hold, the mutex may fail to unlock when the owner dies, or incorrectly
unlock when another task mimicking the owner dies.

The Linux robust mutex protocol fundamentally does not work across pid
namespaces.
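
For reference, the owner field in question is the futex word itself; the
constants below come from <linux/futex.h> and the helpers are only an
illustration of how its bits are interpreted:

#include <linux/futex.h>   /* FUTEX_TID_MASK, FUTEX_WAITERS, FUTEX_OWNER_DIED */
#include <stdint.h>

/* The low 30 bits of the futex word hold the owner's kernel TID, which is
 * only meaningful inside one pid namespace. */
static inline uint32_t futex_owner_tid (uint32_t word)
{
  return word & FUTEX_TID_MASK;          /* 0x3fffffff */
}

/* Set by the kernel when the registered owner exits without unlocking. */
static inline int futex_owner_died (uint32_t word)
{
  return !!(word & FUTEX_OWNER_DIED);    /* 0x40000000 */
}

/* Set by waiters so the unlocking task knows a futex wake is needed. */
static inline int futex_has_waiters (uint32_t word)
{
  return !!(word & FUTEX_WAITERS);       /* 0x80000000 */
}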

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-04 18:53             ` Rich Felker
@ 2025-02-05 10:17               ` Daniele Personal
  2025-02-05 10:32                 ` Florian Weimer
  0 siblings, 1 reply; 28+ messages in thread
From: Daniele Personal @ 2025-02-05 10:17 UTC (permalink / raw)
  To: Rich Felker, Florian Weimer; +Cc: musl

On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote:
> On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote:
> > * Daniele Personal:
> > 
> > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote:
> > > > * Daniele Personal:
> > > > 
> > > > > > Is this required for implementing the unlock-if-not-owner
> > > > > > error
> > > > > > code
> > > > > > on mutex unlock?
> > > > > 
> > > > > No, I don't see problems related to EOWNERDEAD.
> > > > 
> > > > Sorry, what I meant is that the TID is needed for efficient
> > > > reporting
> > > > of
> > > > usage errors.  It's not imposed by the robust list protocol as
> > > > such..
> > > > There could be a PID-namespace-compatible robust mutex type
> > > > that does
> > > > not have this problem (but with less error checking).
> > > > 
> > > > Thanks,
> > > > Florian
> > > > 
> > > 
> > > Are you saying that there are pthread_mutexes which can be shared
> > > across processes run on different pid namespaces? If yes I'm
> > > definitely
> > > interested on this. Can you tell me something more?
> > 
> > You would have to add a new mutex type that is a mix of
> > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST.  Closer to the
> > latter,
> > but without the ownership checks.
> 
> This is inaccurate. Robust mutexes fundamentally depend on having the
> owner's tid in the owner field, and on this value not matching the
> tid
> of any other task that might hold the mutex. If these properties
> don't
> hold, the mutex may fail to unlock when the owner dies, or
> incorrectly
> unlock when another task mimicking the owner dies.
> 
> The Linux robust mutex protocol fundamentally does not work across
> pid
> namespaces.
> 
> Rich

Looking at the code for musl 1.2.4, a pthread_mutex_t which has been
initialized as shared and robust but not PI capable leaves uncovered
only the case of pthread_mutex_unlock().

int __pthread_mutex_unlock(pthread_mutex_t *m)
{
	pthread_t self;
	int waiters = m->_m_waiters;
	int cont;
	int type = m->_m_type & 15;  <== type = 4 (robust)
	int priv = (m->_m_type & 128) ^ 128; <== priv = 0 (shared)
	int new = 0;
	int old;

	if (type != PTHREAD_MUTEX_NORMAL) {
		/* this is executed because type != 0 */
		self = __pthread_self();
		old = m->_m_lock;
		int own = old & 0x3fffffff;
		if (own != self->tid) <== TIDs could collide!!!
			return EPERM;
		if ((type&3) == PTHREAD_MUTEX_RECURSIVE && m-
>_m_count)
			return m->_m_count--, 0;
		^^^ not executed: type&3 = 0

		if ((type&4) && (old&0x40000000))
			new = 0x7fffffff;
		if (!priv) {
			/* this is executed */
			self->robust_list.pending = &m->_m_next;
			__vm_lock();
		}
		volatile void *prev = m->_m_prev;
		volatile void *next = m->_m_next;
		*(volatile void *volatile *)prev = next;
		if (next != &self->robust_list.head) *(volatile void *volatile *)
			((char *)next - sizeof(void *)) = prev;
	}
	if (type&8) {
		/* this is NOT executed: not PI capable */
		if (old<0 || a_cas(&m->_m_lock, old, new)!=old) {
			if (new) a_store(&m->_m_waiters, -1);
			__syscall(SYS_futex, &m->_m_lock, FUTEX_UNLOCK_PI|priv);
		}
		cont = 0;
		waiters = 0;
	} else {
		/* this is executed */
		cont = a_swap(&m->_m_lock, new);
	}
	if (type != PTHREAD_MUTEX_NORMAL && !priv) {
		/* this is executed */
		self->robust_list.pending = 0;
		__vm_unlock();
	}
	if (waiters || cont<0) 
		__wake(&m->_m_lock, 1, priv);
	return 0;
}

As mentioned by Rich, since TIDs are not unique across different
namespaces, a task might unlock a mutex held by another one if they
have the same TID.

I don't see other possible errors, am I missing something?

Anyway, apart from implementation details which could improve or cover
corner cases, I have not found any document which gives a clear statement
of how things should behave. Moreover, it might be good to mention that
the robust mutex protocol fundamentally does not work across pid
namespaces, or with which limitations it does.

Daniele.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-05 10:17               ` Daniele Personal
@ 2025-02-05 10:32                 ` Florian Weimer
  2025-02-06  7:45                   ` Daniele Personal
  0 siblings, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2025-02-05 10:32 UTC (permalink / raw)
  To: Daniele Personal; +Cc: Rich Felker, musl

* Daniele Personal:

> On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote:
>> On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote:
>> > * Daniele Personal:
>> > 
>> > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote:
>> > > > * Daniele Personal:
>> > > > 
>> > > > > > Is this required for implementing the unlock-if-not-owner
>> > > > > > error
>> > > > > > code
>> > > > > > on mutex unlock?
>> > > > > 
>> > > > > No, I don't see problems related to EOWNERDEAD.
>> > > > 
>> > > > Sorry, what I meant is that the TID is needed for efficient
>> > > > reporting
>> > > > of
>> > > > usage errors.  It's not imposed by the robust list protocol as
>> > > > such..
>> > > > There could be a PID-namespace-compatible robust mutex type
>> > > > that does
>> > > > not have this problem (but with less error checking).
>> > > > 
>> > > > Thanks,
>> > > > Florian
>> > > > 
>> > > 
>> > > Are you saying that there are pthread_mutexes which can be shared
>> > > across processes run on different pid namespaces? If yes I'm
>> > > definitely
>> > > interested on this. Can you tell me something more?
>> > 
>> > You would have to add a new mutex type that is a mix of
>> > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST.  Closer to the
>> > latter,
>> > but without the ownership checks.
>> 
>> This is inaccurate. Robust mutexes fundamentally depend on having the
>> owner's tid in the owner field, and on this value not matching the
>> tid of any other task that might hold the mutex. If these properties
>> don't hold, the mutex may fail to unlock when the owner dies, or
>> incorrectly unlock when another task mimicking the owner dies.
>> 
>> The Linux robust mutex protocol fundamentally does not work across
>> pid namespaces.

Thank you, Rich, for the correction.

> Looking at the code for musl 1.2.4, a pthread_mutex_t which has been
> initialized as shared and robust but not PI capable leaves uncovered
> only the case of pthread_mutex_unlock().

> As mentioned by Rich, since TIDs are not unique across different
> namespaces, a task might unlock a mutex held by another one if they
> have the same TID.
>
> I don't see other possible errors, am I missing something?

The kernel code uses the owner TID to handle some special cases:

	/*
	 * Special case for regular (non PI) futexes. The unlock path in
	 * user space has two race scenarios:
	 *
	 * 1. The unlock path releases the user space futex value and
	 *    before it can execute the futex() syscall to wake up
	 *    waiters it is killed.
	 *
	 * 2. A woken up waiter is killed before it can acquire the
	 *    futex in user space.
	 *
	 * In the second case, the wake up notification could be generated
	 * by the unlock path in user space after setting the futex value
	 * to zero or by the kernel after setting the OWNER_DIED bit below.
	 *
	 * In both cases the TID validation below prevents a wakeup of
	 * potential waiters which can cause these waiters to block
	 * forever.
	 *
	 * In both cases the following conditions are met:
	 *
	 *	1) task->robust_list->list_op_pending != NULL
	 *	   @pending_op == true
	 *	2) The owner part of user space futex value == 0
	 *	3) Regular futex: @pi == false
	 *
	 * If these conditions are met, it is safe to attempt waking up a
	 * potential waiter without touching the user space futex value and
	 * trying to set the OWNER_DIED bit. If the futex value is zero,
	 * the rest of the user space mutex state is consistent, so a woken
	 * waiter will just take over the uncontended futex. Setting the
	 * OWNER_DIED bit would create inconsistent state and malfunction
	 * of the user space owner died handling. Otherwise, the OWNER_DIED
	 * bit is already set, and the woken waiter is expected to deal with
	 * this.
	 */
	owner = uval & FUTEX_TID_MASK;

	if (pending_op && !pi && !owner) {
		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
			   FUTEX_BITSET_MATCH_ANY);
		return 0;
	}

As a result, it's definitely just a userspace-only change if you need to
use the robust mutex list across PID namespaces.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-05 10:32                 ` Florian Weimer
@ 2025-02-06  7:45                   ` Daniele Personal
  2025-02-07 16:19                     ` Rich Felker
  2025-02-07 16:34                     ` Florian Weimer
  0 siblings, 2 replies; 28+ messages in thread
From: Daniele Personal @ 2025-02-06  7:45 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Rich Felker, musl

On Wed, 2025-02-05 at 11:32 +0100, Florian Weimer wrote:
> * Daniele Personal:
> 
> > On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote:
> > > On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote:
> > > > * Daniele Personal:
> > > > 
> > > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote:
> > > > > > * Daniele Personal:
> > > > > > 
> > > > > > > > Is this required for implementing the unlock-if-not-
> > > > > > > > owner
> > > > > > > > error
> > > > > > > > code
> > > > > > > > on mutex unlock?
> > > > > > > 
> > > > > > > No, I don't see problems related to EOWNERDEAD.
> > > > > > 
> > > > > > Sorry, what I meant is that the TID is needed for efficient
> > > > > > reporting
> > > > > > of
> > > > > > usage errors.  It's not imposed by the robust list protocol
> > > > > > as
> > > > > > such..
> > > > > > There could be a PID-namespace-compatible robust mutex type
> > > > > > that does
> > > > > > not have this problem (but with less error checking).
> > > > > > 
> > > > > > Thanks,
> > > > > > Florian
> > > > > > 
> > > > > 
> > > > > Are you saying that there are pthread_mutexes which can be
> > > > > shared
> > > > > across processes run on different pid namespaces? If yes I'm
> > > > > definitely
> > > > > interested on this. Can you tell me something more?
> > > > 
> > > > You would have to add a new mutex type that is a mix of
> > > > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST.  Closer to the
> > > > latter,
> > > > but without the ownership checks.
> > > 
> > > This is inaccurate. Robust mutexes fundamentally depend on having
> > > the
> > > owner's tid in the owner field, and on this value not matching
> > > the
> > > tid of any other task that might hold the mutex. If these
> > > properties
> > > don't hold, the mutex may fail to unlock when the owner dies, or
> > > incorrectly unlock when another task mimicking the owner dies.
> > > 
> > > The Linux robust mutex protocol fundamentally does not work
> > > across
> > > pid namespaces.
> 
> Thank you, Rich, for the correction.
> 
> > Looking at the code for musl 1.2.4, a pthread_mutex_t which has
> > been
> > initialized as shared and robust but not PI capable leaves
> > uncovered
> > only the case of pthread_mutex_unlock().
> 
> > As mentioned by Rich, since TIDs are not unique across different
> > namespaces, a task might unlock a mutex hold by another one if they
> > have the same TID.
> > 
> > I don't see other possible errors, am I missing something?
> 
> The kernel code uses the owner TID to handle some special cases:
> 
> 	/*
> 	 * Special case for regular (non PI) futexes. The unlock
> path in
> 	 * user space has two race scenarios:
> 	 *
> 	 * 1. The unlock path releases the user space futex value
> and
> 	 *    before it can execute the futex() syscall to wake up
> 	 *    waiters it is killed.
> 	 *
> 	 * 2. A woken up waiter is killed before it can acquire the
> 	 *    futex in user space.
> 	 *
> 	 * In the second case, the wake up notification could be
> generated
> 	 * by the unlock path in user space after setting the futex
> value
> 	 * to zero or by the kernel after setting the OWNER_DIED bit
> below.
> 	 *
> 	 * In both cases the TID validation below prevents a wakeup
> of
> 	 * potential waiters which can cause these waiters to block
> 	 * forever.
> 	 *
> 	 * In both cases the following conditions are met:
> 	 *
> 	 *	1) task->robust_list->list_op_pending != NULL
> 	 *	   @pending_op == true
> 	 *	2) The owner part of user space futex value == 0
> 	 *	3) Regular futex: @pi == false
> 	 *
> 	 * If these conditions are met, it is safe to attempt waking
> up a
> 	 * potential waiter without touching the user space futex
> value and
> 	 * trying to set the OWNER_DIED bit. If the futex value is
> zero,
> 	 * the rest of the user space mutex state is consistent, so
> a woken
> 	 * waiter will just take over the uncontended futex. Setting
> the
> 	 * OWNER_DIED bit would create inconsistent state and
> malfunction
> 	 * of the user space owner died handling. Otherwise, the
> OWNER_DIED
> 	 * bit is already set, and the woken waiter is expected to
> deal with
> 	 * this.
> 	 */
> 	owner = uval & FUTEX_TID_MASK;
> 
> 	if (pending_op && !pi && !owner) {
> 		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
> 			   FUTEX_BITSET_MATCH_ANY);
> 		return 0;
> 	}
> 
> As a result, it's definitely just a userspace-only change if you need
> to
> use the robust mutex list across PID namespaces.
> 

I tried to understand what you mean here but can't: can you please
explain to me which userspace-only change is needed?

> Thanks,
> Florian
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-06  7:45                   ` Daniele Personal
@ 2025-02-07 16:19                     ` Rich Felker
  2025-02-08  9:20                       ` Daniele Dario
  2025-02-07 16:34                     ` Florian Weimer
  1 sibling, 1 reply; 28+ messages in thread
From: Rich Felker @ 2025-02-07 16:19 UTC (permalink / raw)
  To: Daniele Personal; +Cc: Florian Weimer, musl

On Thu, Feb 06, 2025 at 08:45:14AM +0100, Daniele Personal wrote:
> On Wed, 2025-02-05 at 11:32 +0100, Florian Weimer wrote:
> > * Daniele Personal:
> > 
> > > On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote:
> > > > On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote:
> > > > > * Daniele Personal:
> > > > > 
> > > > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote:
> > > > > > > * Daniele Personal:
> > > > > > > 
> > > > > > > > > Is this required for implementing the unlock-if-not-
> > > > > > > > > owner
> > > > > > > > > error
> > > > > > > > > code
> > > > > > > > > on mutex unlock?
> > > > > > > > 
> > > > > > > > No, I don't see problems related to EOWNERDEAD.
> > > > > > > 
> > > > > > > Sorry, what I meant is that the TID is needed for efficient
> > > > > > > reporting
> > > > > > > of
> > > > > > > usage errors.  It's not imposed by the robust list protocol
> > > > > > > as
> > > > > > > such..
> > > > > > > There could be a PID-namespace-compatible robust mutex type
> > > > > > > that does
> > > > > > > not have this problem (but with less error checking).
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Florian
> > > > > > > 
> > > > > > 
> > > > > > Are you saying that there are pthread_mutexes which can be
> > > > > > shared
> > > > > > across processes run on different pid namespaces? If yes I'm
> > > > > > definitely
> > > > > > interested on this. Can you tell me something more?
> > > > > 
> > > > > You would have to add a new mutex type that is a mix of
> > > > > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST.  Closer to the
> > > > > latter,
> > > > > but without the ownership checks.
> > > > 
> > > > This is inaccurate. Robust mutexes fundamentally depend on having
> > > > the
> > > > owner's tid in the owner field, and on this value not matching
> > > > the
> > > > tid of any other task that might hold the mutex. If these
> > > > properties
> > > > don't hold, the mutex may fail to unlock when the owner dies, or
> > > > incorrectly unlock when another task mimicking the owner dies.
> > > > 
> > > > The Linux robust mutex protocol fundamentally does not work
> > > > across
> > > > pid namespaces.
> > 
> > Thank you, Rich, for the correction.
> > 
> > > Looking at the code for musl 1.2.4, a pthread_mutex_t which has
> > > been
> > > initialized as shared and robust but not PI capable leaves
> > > uncovered
> > > only the case of pthread_mutex_unlock().
> > 
> > > As mentioned by Rich, since TIDs are not unique across different
> > > namespaces, a task might unlock a mutex hold by another one if they
> > > have the same TID.
> > > 
> > > I don't see other possible errors, am I missing something?
> > 
> > The kernel code uses the owner TID to handle some special cases:
> > 
> > 	/*
> > 	 * Special case for regular (non PI) futexes. The unlock
> > path in
> > 	 * user space has two race scenarios:
> > 	 *
> > 	 * 1. The unlock path releases the user space futex value
> > and
> > 	 *    before it can execute the futex() syscall to wake up
> > 	 *    waiters it is killed.
> > 	 *
> > 	 * 2. A woken up waiter is killed before it can acquire the
> > 	 *    futex in user space.
> > 	 *
> > 	 * In the second case, the wake up notification could be
> > generated
> > 	 * by the unlock path in user space after setting the futex
> > value
> > 	 * to zero or by the kernel after setting the OWNER_DIED bit
> > below.
> > 	 *
> > 	 * In both cases the TID validation below prevents a wakeup
> > of
> > 	 * potential waiters which can cause these waiters to block
> > 	 * forever.
> > 	 *
> > 	 * In both cases the following conditions are met:
> > 	 *
> > 	 *	1) task->robust_list->list_op_pending != NULL
> > 	 *	   @pending_op == true
> > 	 *	2) The owner part of user space futex value == 0
> > 	 *	3) Regular futex: @pi == false
> > 	 *
> > 	 * If these conditions are met, it is safe to attempt waking
> > up a
> > 	 * potential waiter without touching the user space futex
> > value and
> > 	 * trying to set the OWNER_DIED bit. If the futex value is
> > zero,
> > 	 * the rest of the user space mutex state is consistent, so
> > a woken
> > 	 * waiter will just take over the uncontended futex. Setting
> > the
> > 	 * OWNER_DIED bit would create inconsistent state and
> > malfunction
> > 	 * of the user space owner died handling. Otherwise, the
> > OWNER_DIED
> > 	 * bit is already set, and the woken waiter is expected to
> > deal with
> > 	 * this.
> > 	 */
> > 	owner = uval & FUTEX_TID_MASK;
> > 
> > 	if (pending_op && !pi && !owner) {
> > 		futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
> > 			   FUTEX_BITSET_MATCH_ANY);
> > 		return 0;
> > 	}
> > 
> > As a result, it's definitely just a userspace-only change if you need
> > to
> > use the robust mutex list across PID namespaces.
> > 
> 
> I tried to understand what you mean here but can't: can you please
> explain to me which userspace-only change is needed?

No such change is possible. Robust futexes inherently rely on the
kernel being able to evaluate, on async process death, whether the
dying task was the owner of a mutex in the robust list. This depends
on the tid stored in memory being an accurate and unique identifier
for the task. If you violate this, you can hack things to make the
userspace side work, but the whole robust functionality you want will
fail to work.
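
For reference, the mechanism being described is the per-thread robust list
registration; the sketch below is illustrative, not musl's actual code:

#include <linux/futex.h>     /* struct robust_list_head, FUTEX_OWNER_DIED */
#include <sys/syscall.h>
#include <unistd.h>

/* Each thread registers one of these with the kernel.  Locked robust
 * mutexes are linked into .list; on thread exit the kernel walks the
 * list and, for each futex word whose TID bits equal the dying thread's
 * TID, sets FUTEX_OWNER_DIED and wakes one waiter.  That TID comparison
 * is exactly what stops working across pid namespaces. */
static struct robust_list_head robust_head = {
  .list = { &robust_head.list },         /* empty list points to itself */
  .futex_offset = 0,                     /* offset from list entry to futex word */
};

static int register_robust_list (void)
{
  /* Illustrative only: libc normally does this for every thread it creates */
  return syscall (SYS_set_robust_list, &robust_head, sizeof robust_head);
}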

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-06  7:45                   ` Daniele Personal
  2025-02-07 16:19                     ` Rich Felker
@ 2025-02-07 16:34                     ` Florian Weimer
  1 sibling, 0 replies; 28+ messages in thread
From: Florian Weimer @ 2025-02-07 16:34 UTC (permalink / raw)
  To: Daniele Personal; +Cc: Rich Felker, musl

* Daniele Personal:

>> As a result, it's definitely [not] just a userspace-only change if
>> you need to use the robust mutex list across PID namespaces.
>> 
>
> I tried to understand what you mean here but can't: can you please
> explain me which userspace-only change is needed?

Sorry, there was a "not" missing in the quoted conclusion.

Florian


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-07 16:19                     ` Rich Felker
@ 2025-02-08  9:20                       ` Daniele Dario
  2025-02-08 12:39                         ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Daniele Dario @ 2025-02-08  9:20 UTC (permalink / raw)
  To: Rich Felker; +Cc: Florian Weimer, musl

[-- Attachment #1: Type: text/plain, Size: 6136 bytes --]

But wouldn't this mean that robust mutex functionality is totally
incompatible with pid namespaces?
If the kernel relies on the tid stored in memory by the process, it
always lacks the information about which pid namespace the tid belongs to.

Daniele.


On Fri, Feb 7, 2025 at 17:19, Rich Felker <dalias@libc.org> wrote:

> On Thu, Feb 06, 2025 at 08:45:14AM +0100, Daniele Personal wrote:
> > On Wed, 2025-02-05 at 11:32 +0100, Florian Weimer wrote:
> > > * Daniele Personal:
> > >
> > > > On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote:
> > > > > On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote:
> > > > > > * Daniele Personal:
> > > > > >
> > > > > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote:
> > > > > > > > * Daniele Personal:
> > > > > > > >
> > > > > > > > > > Is this required for implementing the unlock-if-not-
> > > > > > > > > > owner
> > > > > > > > > > error
> > > > > > > > > > code
> > > > > > > > > > on mutex unlock?
> > > > > > > > >
> > > > > > > > > No, I don't see problems related to EOWNERDEAD.
> > > > > > > >
> > > > > > > > Sorry, what I meant is that the TID is needed for efficient
> > > > > > > > reporting
> > > > > > > > of
> > > > > > > > usage errors.  It's not imposed by the robust list protocol
> > > > > > > > as
> > > > > > > > such..
> > > > > > > > There could be a PID-namespace-compatible robust mutex type
> > > > > > > > that does
> > > > > > > > not have this problem (but with less error checking).
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Florian
> > > > > > > >
> > > > > > >
> > > > > > > Are you saying that there are pthread_mutexes which can be
> > > > > > > shared
> > > > > > > across processes run on different pid namespaces? If yes I'm
> > > > > > > definitely
> > > > > > > interested on this. Can you tell me something more?
> > > > > >
> > > > > > You would have to add a new mutex type that is a mix of
> > > > > > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST.  Closer to the
> > > > > > latter,
> > > > > > but without the ownership checks.
> > > > >
> > > > > This is inaccurate. Robust mutexes fundamentally depend on having
> > > > > the
> > > > > owner's tid in the owner field, and on this value not matching
> > > > > the
> > > > > tid of any other task that might hold the mutex. If these
> > > > > properties
> > > > > don't hold, the mutex may fail to unlock when the owner dies, or
> > > > > incorrectly unlock when another task mimicking the owner dies.
> > > > >
> > > > > The Linux robust mutex protocol fundamentally does not work
> > > > > across
> > > > > pid namespaces.
> > >
> > > Thank you, Rich, for the correction.
> > >
> > > > Looking at the code for musl 1.2.4, a pthread_mutex_t which has
> > > > been
> > > > initialized as shared and robust but not PI capable leaves
> > > > uncovered
> > > > only the case of pthread_mutex_unlock().
> > >
> > > > As mentioned by Rich, since TIDs are not unique across different
> > > > namespaces, a task might unlock a mutex hold by another one if they
> > > > have the same TID.
> > > >
> > > > I don't see other possible errors, am I missing something?
> > >
> > > The kernel code uses the owner TID to handle some special cases:
> > >
> > >     /*
> > >      * Special case for regular (non PI) futexes. The unlock
> > > path in
> > >      * user space has two race scenarios:
> > >      *
> > >      * 1. The unlock path releases the user space futex value
> > > and
> > >      *    before it can execute the futex() syscall to wake up
> > >      *    waiters it is killed.
> > >      *
> > >      * 2. A woken up waiter is killed before it can acquire the
> > >      *    futex in user space.
> > >      *
> > >      * In the second case, the wake up notification could be
> > > generated
> > >      * by the unlock path in user space after setting the futex
> > > value
> > >      * to zero or by the kernel after setting the OWNER_DIED bit
> > > below.
> > >      *
> > >      * In both cases the TID validation below prevents a wakeup
> > > of
> > >      * potential waiters which can cause these waiters to block
> > >      * forever.
> > >      *
> > >      * In both cases the following conditions are met:
> > >      *
> > >      *      1) task->robust_list->list_op_pending != NULL
> > >      *         @pending_op == true
> > >      *      2) The owner part of user space futex value == 0
> > >      *      3) Regular futex: @pi == false
> > >      *
> > >      * If these conditions are met, it is safe to attempt waking
> > > up a
> > >      * potential waiter without touching the user space futex
> > > value and
> > >      * trying to set the OWNER_DIED bit. If the futex value is
> > > zero,
> > >      * the rest of the user space mutex state is consistent, so
> > > a woken
> > >      * waiter will just take over the uncontended futex. Setting
> > > the
> > >      * OWNER_DIED bit would create inconsistent state and
> > > malfunction
> > >      * of the user space owner died handling. Otherwise, the
> > > OWNER_DIED
> > >      * bit is already set, and the woken waiter is expected to
> > > deal with
> > >      * this.
> > >      */
> > >     owner = uval & FUTEX_TID_MASK;
> > >
> > >     if (pending_op && !pi && !owner) {
> > >             futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
> > >                        FUTEX_BITSET_MATCH_ANY);
> > >             return 0;
> > >     }
> > >
> > > As a result, it's definitely just a userspace-only change if you need
> > > to
> > > use the robust mutex list across PID namespaces.
> > >
> >
> > I tried to understand what you mean here but can't: can you please
> > explain me which userspace-only change is needed?
>
> No such change is possible. Robust futexes inherently rely on the
> kernel being able to evaluate, on async process death, whether the
> dying task was the owner of a mutex in the robust list. This depends
> on the tid stored in memory being an accurate and unique identifier
> for the task. If you violate this, you can hack things make the
> userspace side work, but the whole robust functionality you want will
> fail to work.
>
> Rich
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-08  9:20                       ` Daniele Dario
@ 2025-02-08 12:39                         ` Rich Felker
  2025-02-08 14:40                           ` Daniele Dario
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2025-02-08 12:39 UTC (permalink / raw)
  To: Daniele Dario; +Cc: Florian Weimer, musl

On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote:
> But wouldn't this mean that robust mutexes functionality is totally
> incompatible with pid namespaces?

No, only with trying to synchronize *across* different pid namespaces.

> If the kernel relies on tid stored in memory by the process this always
> lacks the information about the pid namespace the tid belongs to.

It's necessarily within the same pid namespace as the process itself.

Functionally, you should consider different pid namespaces as
different systems that happen to be capable of sharing some resources.

Rich



> Il giorno ven 7 feb 2025 alle ore 17:19 Rich Felker <dalias@libc.org> ha
> scritto:
> 
> > On Thu, Feb 06, 2025 at 08:45:14AM +0100, Daniele Personal wrote:
> > > On Wed, 2025-02-05 at 11:32 +0100, Florian Weimer wrote:
> > > > * Daniele Personal:
> > > >
> > > > > On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote:
> > > > > > On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote:
> > > > > > > * Daniele Personal:
> > > > > > >
> > > > > > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote:
> > > > > > > > > * Daniele Personal:
> > > > > > > > >
> > > > > > > > > > > Is this required for implementing the unlock-if-not-
> > > > > > > > > > > owner
> > > > > > > > > > > error
> > > > > > > > > > > code
> > > > > > > > > > > on mutex unlock?
> > > > > > > > > >
> > > > > > > > > > No, I don't see problems related to EOWNERDEAD.
> > > > > > > > >
> > > > > > > > > Sorry, what I meant is that the TID is needed for efficient
> > > > > > > > > reporting
> > > > > > > > > of
> > > > > > > > > usage errors.  It's not imposed by the robust list protocol
> > > > > > > > > as
> > > > > > > > > such..
> > > > > > > > > There could be a PID-namespace-compatible robust mutex type
> > > > > > > > > that does
> > > > > > > > > not have this problem (but with less error checking).
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Florian
> > > > > > > > >
> > > > > > > >
> > > > > > > > Are you saying that there are pthread_mutexes which can be
> > > > > > > > shared
> > > > > > > > across processes run on different pid namespaces? If yes I'm
> > > > > > > > definitely
> > > > > > > > interested on this. Can you tell me something more?
> > > > > > >
> > > > > > > You would have to add a new mutex type that is a mix of
> > > > > > > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST.  Closer to the
> > > > > > > latter,
> > > > > > > but without the ownership checks.
> > > > > >
> > > > > > This is inaccurate. Robust mutexes fundamentally depend on having
> > > > > > the
> > > > > > owner's tid in the owner field, and on this value not matching
> > > > > > the
> > > > > > tid of any other task that might hold the mutex. If these
> > > > > > properties
> > > > > > don't hold, the mutex may fail to unlock when the owner dies, or
> > > > > > incorrectly unlock when another task mimicking the owner dies.
> > > > > >
> > > > > > The Linux robust mutex protocol fundamentally does not work
> > > > > > across
> > > > > > pid namespaces.
> > > >
> > > > Thank you, Rich, for the correction.
> > > >
> > > > > Looking at the code for musl 1.2.4, a pthread_mutex_t which has
> > > > > been
> > > > > initialized as shared and robust but not PI capable leaves
> > > > > uncovered
> > > > > only the case of pthread_mutex_unlock().
> > > >
> > > > > As mentioned by Rich, since TIDs are not unique across different
> > > > > namespaces, a task might unlock a mutex hold by another one if they
> > > > > have the same TID.
> > > > >
> > > > > I don't see other possible errors, am I missing something?
> > > >
> > > > The kernel code uses the owner TID to handle some special cases:
> > > >
> > > >     /*
> > > >      * Special case for regular (non PI) futexes. The unlock
> > > > path in
> > > >      * user space has two race scenarios:
> > > >      *
> > > >      * 1. The unlock path releases the user space futex value
> > > > and
> > > >      *    before it can execute the futex() syscall to wake up
> > > >      *    waiters it is killed.
> > > >      *
> > > >      * 2. A woken up waiter is killed before it can acquire the
> > > >      *    futex in user space.
> > > >      *
> > > >      * In the second case, the wake up notification could be
> > > > generated
> > > >      * by the unlock path in user space after setting the futex
> > > > value
> > > >      * to zero or by the kernel after setting the OWNER_DIED bit
> > > > below.
> > > >      *
> > > >      * In both cases the TID validation below prevents a wakeup
> > > > of
> > > >      * potential waiters which can cause these waiters to block
> > > >      * forever.
> > > >      *
> > > >      * In both cases the following conditions are met:
> > > >      *
> > > >      *      1) task->robust_list->list_op_pending != NULL
> > > >      *         @pending_op == true
> > > >      *      2) The owner part of user space futex value == 0
> > > >      *      3) Regular futex: @pi == false
> > > >      *
> > > >      * If these conditions are met, it is safe to attempt waking
> > > > up a
> > > >      * potential waiter without touching the user space futex
> > > > value and
> > > >      * trying to set the OWNER_DIED bit. If the futex value is
> > > > zero,
> > > >      * the rest of the user space mutex state is consistent, so
> > > > a woken
> > > >      * waiter will just take over the uncontended futex. Setting
> > > > the
> > > >      * OWNER_DIED bit would create inconsistent state and
> > > > malfunction
> > > >      * of the user space owner died handling. Otherwise, the
> > > > OWNER_DIED
> > > >      * bit is already set, and the woken waiter is expected to
> > > > deal with
> > > >      * this.
> > > >      */
> > > >     owner = uval & FUTEX_TID_MASK;
> > > >
> > > >     if (pending_op && !pi && !owner) {
> > > >             futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
> > > >                        FUTEX_BITSET_MATCH_ANY);
> > > >             return 0;
> > > >     }
> > > >
> > > > As a result, it's definitely just a userspace-only change if you need
> > > > to
> > > > use the robust mutex list across PID namespaces.
> > > >
> > >
> > > I tried to understand what you mean here but can't: can you please
> > > explain me which userspace-only change is needed?
> >
> > No such change is possible. Robust futexes inherently rely on the
> > kernel being able to evaluate, on async process death, whether the
> > dying task was the owner of a mutex in the robust list. This depends
> > on the tid stored in memory being an accurate and unique identifier
> > for the task. If you violate this, you can hack things make the
> > userspace side work, but the whole robust functionality you want will
> > fail to work.
> >
> > Rich
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-08 12:39                         ` Rich Felker
@ 2025-02-08 14:40                           ` Daniele Dario
  2025-02-08 14:52                             ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Daniele Dario @ 2025-02-08 14:40 UTC (permalink / raw)
  To: Rich Felker; +Cc: Florian Weimer, musl

On Sat, Feb 8, 2025, 13:39 Rich Felker <dalias@libc.org> wrote:

> On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote:
> > But wouldn't this mean that robust mutexes functionality is totally
> > incompatible with pid namespaces?
>
> No, only with trying to synchronize *across* different pid namespaces.
>
> > If the kernel relies on tid stored in memory by the process this always
> > lacks the information about the pid namespace the tid belongs to.
>
> It's necessarily within the same pid namespace as the process itself.
>
> Functionally, you should consider different pid namespaces as
> different systems that happen to be capable of sharing some resources.
>
> Rich
>

Yes, I'm just saying that sharing pthread_mutex_t instances across
processes within the same pid namespace, but on a system with more than
one pid namespace, could still lead to issues if the kernel uses the
stored tid value to decide whom to contact without knowing which pid
namespace that tid belongs to.

I'm not saying this is true; I'm trying to understand and, if possible,
improve things.

Daniele


>
>
> > Il giorno ven 7 feb 2025 alle ore 17:19 Rich Felker <dalias@libc.org> ha
> > scritto:
> >
> > > On Thu, Feb 06, 2025 at 08:45:14AM +0100, Daniele Personal wrote:
> > > > On Wed, 2025-02-05 at 11:32 +0100, Florian Weimer wrote:
> > > > > * Daniele Personal:
> > > > >
> > > > > > On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote:
> > > > > > > On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote:
> > > > > > > > * Daniele Personal:
> > > > > > > >
> > > > > > > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote:
> > > > > > > > > > * Daniele Personal:
> > > > > > > > > >
> > > > > > > > > > > > Is this required for implementing the unlock-if-not-
> > > > > > > > > > > > owner
> > > > > > > > > > > > error
> > > > > > > > > > > > code
> > > > > > > > > > > > on mutex unlock?
> > > > > > > > > > >
> > > > > > > > > > > No, I don't see problems related to EOWNERDEAD.
> > > > > > > > > >
> > > > > > > > > > Sorry, what I meant is that the TID is needed for
> efficient
> > > > > > > > > > reporting
> > > > > > > > > > of
> > > > > > > > > > usage errors.  It's not imposed by the robust list
> protocol
> > > > > > > > > > as
> > > > > > > > > > such..
> > > > > > > > > > There could be a PID-namespace-compatible robust mutex
> type
> > > > > > > > > > that does
> > > > > > > > > > not have this problem (but with less error checking).
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Florian
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Are you saying that there are pthread_mutexes which can be
> > > > > > > > > shared
> > > > > > > > > across processes run on different pid namespaces? If yes
> I'm
> > > > > > > > > definitely
> > > > > > > > > interested on this. Can you tell me something more?
> > > > > > > >
> > > > > > > > You would have to add a new mutex type that is a mix of
> > > > > > > > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST.  Closer to the
> > > > > > > > latter,
> > > > > > > > but without the ownership checks.
> > > > > > >
> > > > > > > This is inaccurate. Robust mutexes fundamentally depend on
> having
> > > > > > > the
> > > > > > > owner's tid in the owner field, and on this value not matching
> > > > > > > the
> > > > > > > tid of any other task that might hold the mutex. If these
> > > > > > > properties
> > > > > > > don't hold, the mutex may fail to unlock when the owner dies,
> or
> > > > > > > incorrectly unlock when another task mimicking the owner dies.
> > > > > > >
> > > > > > > The Linux robust mutex protocol fundamentally does not work
> > > > > > > across
> > > > > > > pid namespaces.
> > > > >
> > > > > Thank you, Rich, for the correction.
> > > > >
> > > > > > Looking at the code for musl 1.2.4, a pthread_mutex_t which has
> > > > > > been
> > > > > > initialized as shared and robust but not PI capable leaves
> > > > > > uncovered
> > > > > > only the case of pthread_mutex_unlock().
> > > > >
> > > > > > As mentioned by Rich, since TIDs are not unique across different
> > > > > > namespaces, a task might unlock a mutex hold by another one if
> they
> > > > > > have the same TID.
> > > > > >
> > > > > > I don't see other possible errors, am I missing something?
> > > > >
> > > > > The kernel code uses the owner TID to handle some special cases:
> > > > >
> > > > >     /*
> > > > >      * Special case for regular (non PI) futexes. The unlock
> > > > > path in
> > > > >      * user space has two race scenarios:
> > > > >      *
> > > > >      * 1. The unlock path releases the user space futex value
> > > > > and
> > > > >      *    before it can execute the futex() syscall to wake up
> > > > >      *    waiters it is killed.
> > > > >      *
> > > > >      * 2. A woken up waiter is killed before it can acquire the
> > > > >      *    futex in user space.
> > > > >      *
> > > > >      * In the second case, the wake up notification could be
> > > > > generated
> > > > >      * by the unlock path in user space after setting the futex
> > > > > value
> > > > >      * to zero or by the kernel after setting the OWNER_DIED bit
> > > > > below.
> > > > >      *
> > > > >      * In both cases the TID validation below prevents a wakeup
> > > > > of
> > > > >      * potential waiters which can cause these waiters to block
> > > > >      * forever.
> > > > >      *
> > > > >      * In both cases the following conditions are met:
> > > > >      *
> > > > >      *      1) task->robust_list->list_op_pending != NULL
> > > > >      *         @pending_op == true
> > > > >      *      2) The owner part of user space futex value == 0
> > > > >      *      3) Regular futex: @pi == false
> > > > >      *
> > > > >      * If these conditions are met, it is safe to attempt waking
> > > > > up a
> > > > >      * potential waiter without touching the user space futex
> > > > > value and
> > > > >      * trying to set the OWNER_DIED bit. If the futex value is
> > > > > zero,
> > > > >      * the rest of the user space mutex state is consistent, so
> > > > > a woken
> > > > >      * waiter will just take over the uncontended futex. Setting
> > > > > the
> > > > >      * OWNER_DIED bit would create inconsistent state and
> > > > > malfunction
> > > > >      * of the user space owner died handling. Otherwise, the
> > > > > OWNER_DIED
> > > > >      * bit is already set, and the woken waiter is expected to
> > > > > deal with
> > > > >      * this.
> > > > >      */
> > > > >     owner = uval & FUTEX_TID_MASK;
> > > > >
> > > > >     if (pending_op && !pi && !owner) {
> > > > >             futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
> > > > >                        FUTEX_BITSET_MATCH_ANY);
> > > > >             return 0;
> > > > >     }
> > > > >
> > > > > As a result, it's definitely just a userspace-only change if you
> need
> > > > > to
> > > > > use the robust mutex list across PID namespaces.
> > > > >
> > > >
> > > > I tried to understand what you mean here but can't: can you please
> > > > explain me which userspace-only change is needed?
> > >
> > > No such change is possible. Robust futexes inherently rely on the
> > > kernel being able to evaluate, on async process death, whether the
> > > dying task was the owner of a mutex in the robust list. This depends
> > > on the tid stored in memory being an accurate and unique identifier
> > > for the task. If you violate this, you can hack things make the
> > > userspace side work, but the whole robust functionality you want will
> > > fail to work.
> > >
> > > Rich
> > >
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-08 14:40                           ` Daniele Dario
@ 2025-02-08 14:52                             ` Rich Felker
  2025-02-10 16:12                               ` Daniele Personal
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2025-02-08 14:52 UTC (permalink / raw)
  To: Daniele Dario; +Cc: Florian Weimer, musl

On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote:
> Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha scritto:
> 
> > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote:
> > > But wouldn't this mean that robust mutexes functionality is totally
> > > incompatible with pid namespaces?
> >
> > No, only with trying to synchronize *across* different pid namespaces.
> >
> > > If the kernel relies on tid stored in memory by the process this always
> > > lacks the information about the pid namespace the tid belongs to.
> >
> > It's necessarily within the same pid namespace as the process itself.
> >
> > Functionally, you should consider different pid namespaces as
> > different systems that happen to be capable of sharing some resources.
> >
> > Rich
> >
> 
> Yes, I'm just saying that sharing pthread_mutex_t instances across
> processes within the same pid namespace but on a system with more than a
> pid namespace could lead to issues anyway if the stored tid value is used
> by the kernel as who to contact without the knowledge of on which pid
> namespace.
> 
> I not saying this is true, I'm trying to understand and if possible,
> improve things.

That's not a problem. The stored tid is used only in the context of a
process exiting, where the kernel code knows the relevant pid
namespace (the one the exiting process is in) and uses the tid
relative to that. If it didn't work this way, it would be a fatal bug
in the pid namespace implementation, which is supposed to allow
essentially transparent containerization (which includes processes in
the ns being able to use their tids as they could if they were outside
of any container/in global ns).
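
As a concrete illustration (just a throwaway sketch, nothing from musl;
it needs CAP_SYS_ADMIN to run), the program below shows that a tid is
only meaningful relative to a pid namespace: the first child forked
after unshare(CLONE_NEWPID) is tid 1 inside the new namespace, while
the parent's namespace refers to the very same task by a different
number.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <sys/wait.h>

    int main(void)
    {
        printf("parent tid in the initial ns: %ld\n",
               (long)syscall(SYS_gettid));

        /* Needs CAP_SYS_ADMIN; children created after this call land
         * in a fresh pid namespace. */
        if (unshare(CLONE_NEWPID) != 0) {
            perror("unshare");
            return 1;
        }

        pid_t pid = fork();
        if (pid == 0) {
            /* Inside the new namespace this task is tid 1. */
            printf("child tid seen from inside its ns:  %ld\n",
                   (long)syscall(SYS_gettid));
            return 0;
        }
        /* The same task, as numbered by the parent's namespace. */
        printf("child tid seen from the parent's ns: %ld\n", (long)pid);
        waitpid(pid, NULL, 0);
        return 0;
    }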

Rich



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-08 14:52                             ` Rich Felker
@ 2025-02-10 16:12                               ` Daniele Personal
  2025-02-10 18:14                                 ` Rich Felker
  2025-02-10 18:44                                 ` Jeffrey Walton
  0 siblings, 2 replies; 28+ messages in thread
From: Daniele Personal @ 2025-02-10 16:12 UTC (permalink / raw)
  To: Rich Felker; +Cc: Florian Weimer, musl

On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote:
> On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote:
> > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha scritto:
> > 
> > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote:
> > > > But wouldn't this mean that robust mutexes functionality is
> > > > totally
> > > > incompatible with pid namespaces?
> > > 
> > > No, only with trying to synchronize *across* different pid
> > > namespaces.
> > > 
> > > > If the kernel relies on tid stored in memory by the process
> > > > this always
> > > > lacks the information about the pid namespace the tid belongs
> > > > to.
> > > 
> > > It's necessarily within the same pid namespace as the process
> > > itself.
> > > 
> > > Functionally, you should consider different pid namespaces as
> > > different systems that happen to be capable of sharing some
> > > resources.
> > > 
> > > Rich
> > > 
> > 
> > Yes, I'm just saying that sharing pthread_mutex_t instances across
> > processes within the same pid namespace but on a system with more
> > than a
> > pid namespace could lead to issues anyway if the stored tid value
> > is used
> > by the kernel as who to contact without the knowledge of on which
> > pid
> > namespace.
> > 
> > I not saying this is true, I'm trying to understand and if
> > possible,
> > improve things.
> 
> That's not a problem. The stored tid is used only in the context of a
> process exiting, where the kernel code knows the relevant pid
> namespace (the one the exiting process is in) and uses the tid
> relative to that. If it didn't work this way, it would be a fatal bug
> in the pid namespace implementation, which is supposed to allow
> essentially transparent containerization (which includes processes in
> the ns being able to use their tids as they could if they were
> outside
> of any container/in global ns).
> 
> Rich
> 

So, IIUC, the problem of sharing robust pthread_mutex_t instances
across different pid namespaces is on the user space side, which is not
able to distinguish clashes between TIDs. In particular, problems could
arise when:
 * an application tries to unlock a mutex owned by another application
   with the same TID but in a different pid namespace (this is an
   application design problem, and libc can't help because TIDs are not
   unique across different pid namespaces)
 * an application tries to lock a mutex owned by another application
   with the same TID but in a different pid namespace: this is a real
   issue because it could happen

I know that pid namespace isolation usually comes together with ipc
namespace isolation, but it is not a violation to have one without the
other. Wouldn't it be a good idea to figure out a safe way to use
robust mutexes shared across different pid namespaces?

Daniele.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-10 16:12                               ` Daniele Personal
@ 2025-02-10 18:14                                 ` Rich Felker
  2025-02-11  9:34                                   ` Daniele Personal
  2025-02-10 18:44                                 ` Jeffrey Walton
  1 sibling, 1 reply; 28+ messages in thread
From: Rich Felker @ 2025-02-10 18:14 UTC (permalink / raw)
  To: Daniele Personal; +Cc: Florian Weimer, musl

On Mon, Feb 10, 2025 at 05:12:52PM +0100, Daniele Personal wrote:
> On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote:
> > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote:
> > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha scritto:
> > > 
> > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote:
> > > > > But wouldn't this mean that robust mutexes functionality is
> > > > > totally
> > > > > incompatible with pid namespaces?
> > > > 
> > > > No, only with trying to synchronize *across* different pid
> > > > namespaces.
> > > > 
> > > > > If the kernel relies on tid stored in memory by the process
> > > > > this always
> > > > > lacks the information about the pid namespace the tid belongs
> > > > > to.
> > > > 
> > > > It's necessarily within the same pid namespace as the process
> > > > itself.
> > > > 
> > > > Functionally, you should consider different pid namespaces as
> > > > different systems that happen to be capable of sharing some
> > > > resources.
> > > > 
> > > > Rich
> > > > 
> > > 
> > > Yes, I'm just saying that sharing pthread_mutex_t instances across
> > > processes within the same pid namespace but on a system with more
> > > than a
> > > pid namespace could lead to issues anyway if the stored tid value
> > > is used
> > > by the kernel as who to contact without the knowledge of on which
> > > pid
> > > namespace.
> > > 
> > > I not saying this is true, I'm trying to understand and if
> > > possible,
> > > improve things.
> > 
> > That's not a problem. The stored tid is used only in the context of a
> > process exiting, where the kernel code knows the relevant pid
> > namespace (the one the exiting process is in) and uses the tid
> > relative to that. If it didn't work this way, it would be a fatal bug
> > in the pid namespace implementation, which is supposed to allow
> > essentially transparent containerization (which includes processes in
> > the ns being able to use their tids as they could if they were
> > outside
> > of any container/in global ns).
> > 
> > Rich
> > 
> 
> So, IIUC, the problem of sharing robust pthread_mutex_t instances
> across different pid namespaces is on the user space side which is not
> able to distinguish clashes on TIDs. In particular, problems could
> arise when:

No, it is not "on the user side". The user side can be modified
arbitrarily, and, modulo some cost, could surely be made to work for
non-robust process-shared mutexes. The problem is that the kernel --
the part which makes them robust -- has to honor the protocol, and the
protocol does not admit distinguishing "pid N in ns X" from "pid N in
ns Y".

>  * an application tries to unlock a mutex owned by another one with its
>    same TID but on a different pid namespace (but this is an
>    application design problem and libc can't help because TIDs are not
>    unique across different pid namespaces)
>  * an application tries to lock a mutex owned by another one with its
>    same TID but on a different pid namespace: this is a real issue
>    because it could happen
> 
> I know that pid namespace isolation usually comes also with ipc
> namespace isolation but it is not a violation to have one without the
> other. Wouldn't it be a good idea to figure out a way to have a safe
> way to use robust mutexes shared across different pid namespaces?

I do not consider this a reasonable expenditure of complexity
whatsoever. It would require at least having a new robust list
protocol, with userspace having to support both the old and new ones
adapting at runtime, and may even require larger-than-wordsize
atomics, which are not something you can assume exists. All of this
for the explicit purpose of *violating* the whole intended purpose of
namespaces: the isolation.

For cases where you really need cross-ns locking, you could use sysv
semaphores if the sysvipc namespace is shared. If it's not, you could
use fcntl OFD locks on a shared file descriptor, which should have
your needed robustness properties.
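
For instance, here is a minimal sketch of the OFD-lock route (assuming
every participating container can open the same file, e.g. via a bind
mount). The lock belongs to the open file description, so the kernel
drops it automatically when the holder's descriptors go away, which is
what gives the robustness:

    #define _GNU_SOURCE        /* exposes F_OFD_SETLK/F_OFD_SETLKW */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Whole-file exclusive lock via an open-file-description lock.
     * The caller is assumed to have open()ed a file visible to all
     * participants. */
    static int ofd_lock(int fd)
    {
        struct flock fl;
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;          /* 0 = lock the whole file */
        fl.l_pid = 0;          /* must be 0 for OFD locks */
        return fcntl(fd, F_OFD_SETLKW, &fl);   /* blocks until held */
    }

    static int ofd_unlock(int fd)
    {
        struct flock fl;
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_UNLCK;
        fl.l_whence = SEEK_SET;
        return fcntl(fd, F_OFD_SETLK, &fl);
    }

One caveat: OFD locks are per open file description, so each lock user
should open() the file itself rather than reuse a descriptor inherited
from a common parent.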

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-10 16:12                               ` Daniele Personal
  2025-02-10 18:14                                 ` Rich Felker
@ 2025-02-10 18:44                                 ` Jeffrey Walton
  2025-02-10 18:58                                   ` Rich Felker
  1 sibling, 1 reply; 28+ messages in thread
From: Jeffrey Walton @ 2025-02-10 18:44 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker, Florian Weimer

On Mon, Feb 10, 2025 at 11:13 AM Daniele Personal <d.dario76@gmail.com> wrote:
>
> On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote:
> > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote:
> > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha scritto:
> > >
> > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote:
> > > > > But wouldn't this mean that robust mutexes functionality is
> > > > > totally
> > > > > incompatible with pid namespaces?
> > > >
> > > > No, only with trying to synchronize *across* different pid
> > > > namespaces.
> > > >
> > > > > If the kernel relies on tid stored in memory by the process
> > > > > this always
> > > > > lacks the information about the pid namespace the tid belongs
> > > > > to.
> > > >
> > > > It's necessarily within the same pid namespace as the process
> > > > itself.
> > > >
> > > > Functionally, you should consider different pid namespaces as
> > > > different systems that happen to be capable of sharing some
> > > > resources.
> > >
> > > Yes, I'm just saying that sharing pthread_mutex_t instances across
> > > processes within the same pid namespace but on a system with more
> > > than a
> > > pid namespace could lead to issues anyway if the stored tid value
> > > is used
> > > by the kernel as who to contact without the knowledge of on which
> > > pid
> > > namespace.
> > >
> > > I not saying this is true, I'm trying to understand and if
> > > possible,
> > > improve things.
> >
> > That's not a problem. The stored tid is used only in the context of a
> > process exiting, where the kernel code knows the relevant pid
> > namespace (the one the exiting process is in) and uses the tid
> > relative to that. If it didn't work this way, it would be a fatal bug
> > in the pid namespace implementation, which is supposed to allow
> > essentially transparent containerization (which includes processes in
> > the ns being able to use their tids as they could if they were
> > outside
> > of any container/in global ns).
>
> So, IIUC, the problem of sharing robust pthread_mutex_t instances
> across different pid namespaces is on the user space side which is not
> able to distinguish clashes on TIDs. In particular, problems could
> arise when:
>  * an application tries to unlock a mutex owned by another one with its
>    same TID but on a different pid namespace (but this is an
>    application design problem and libc can't help because TIDs are not
>    unique across different pid namespaces)
>  * an application tries to lock a mutex owned by another one with its
>    same TID but on a different pid namespace: this is a real issue
>    because it could happen
>
> I know that pid namespace isolation usually comes also with ipc
> namespace isolation but it is not a violation to have one without the
> other. Wouldn't it be a good idea to figure out a way to have a safe
> way to use robust mutexes shared across different pid namespaces?

It's been a while since I took my computer science classes, but...

It sounds like (to me) the wrong tool is being used for the job. If
you want a synchronization object that works across processes, then
you use a semaphore, and not a pthread mutex.
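
For completeness, a minimal SysV-semaphore sketch of that idea (the
ftok() path and the one-semaphore layout are made up, and it assumes
the sysvipc namespace is shared). SEM_UNDO is what approximates the
robustness: the kernel undoes the decrement if the holder dies while
holding the lock.

    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <sys/types.h>

    union semun { int val; struct semid_ds *buf; unsigned short *array; };

    /* Create or attach to one SysV semaphore used as a lock.
     * "/some/shared/path" is only an example key source; real code
     * must also make sure SETVAL runs exactly once
     * (e.g. IPC_CREAT|IPC_EXCL plus a fallback). */
    static int lock_get(void)
    {
        key_t key = ftok("/some/shared/path", 'L');
        int id = semget(key, 1, IPC_CREAT | 0600);
        if (id >= 0) {
            union semun arg = { .val = 1 };   /* start unlocked */
            semctl(id, 0, SETVAL, arg);
        }
        return id;
    }

    static int lock_acquire(int id)
    {
        struct sembuf op = { .sem_num = 0, .sem_op = -1,
                             .sem_flg = SEM_UNDO };
        return semop(id, &op, 1);  /* undone by the kernel if we die */
    }

    static int lock_release(int id)
    {
        struct sembuf op = { .sem_num = 0, .sem_op = 1,
                             .sem_flg = SEM_UNDO };
        return semop(id, &op, 1);
    }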

Jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-10 18:44                                 ` Jeffrey Walton
@ 2025-02-10 18:58                                   ` Rich Felker
  0 siblings, 0 replies; 28+ messages in thread
From: Rich Felker @ 2025-02-10 18:58 UTC (permalink / raw)
  To: Jeffrey Walton; +Cc: musl, Florian Weimer

On Mon, Feb 10, 2025 at 01:44:12PM -0500, Jeffrey Walton wrote:
> On Mon, Feb 10, 2025 at 11:13 AM Daniele Personal <d.dario76@gmail.com> wrote:
> >
> > On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote:
> > > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote:
> > > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha scritto:
> > > >
> > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote:
> > > > > > But wouldn't this mean that robust mutexes functionality is
> > > > > > totally
> > > > > > incompatible with pid namespaces?
> > > > >
> > > > > No, only with trying to synchronize *across* different pid
> > > > > namespaces.
> > > > >
> > > > > > If the kernel relies on tid stored in memory by the process
> > > > > > this always
> > > > > > lacks the information about the pid namespace the tid belongs
> > > > > > to.
> > > > >
> > > > > It's necessarily within the same pid namespace as the process
> > > > > itself.
> > > > >
> > > > > Functionally, you should consider different pid namespaces as
> > > > > different systems that happen to be capable of sharing some
> > > > > resources.
> > > >
> > > > Yes, I'm just saying that sharing pthread_mutex_t instances across
> > > > processes within the same pid namespace but on a system with more
> > > > than a
> > > > pid namespace could lead to issues anyway if the stored tid value
> > > > is used
> > > > by the kernel as who to contact without the knowledge of on which
> > > > pid
> > > > namespace.
> > > >
> > > > I not saying this is true, I'm trying to understand and if
> > > > possible,
> > > > improve things.
> > >
> > > That's not a problem. The stored tid is used only in the context of a
> > > process exiting, where the kernel code knows the relevant pid
> > > namespace (the one the exiting process is in) and uses the tid
> > > relative to that. If it didn't work this way, it would be a fatal bug
> > > in the pid namespace implementation, which is supposed to allow
> > > essentially transparent containerization (which includes processes in
> > > the ns being able to use their tids as they could if they were
> > > outside
> > > of any container/in global ns).
> >
> > So, IIUC, the problem of sharing robust pthread_mutex_t instances
> > across different pid namespaces is on the user space side which is not
> > able to distinguish clashes on TIDs. In particular, problems could
> > arise when:
> >  * an application tries to unlock a mutex owned by another one with its
> >    same TID but on a different pid namespace (but this is an
> >    application design problem and libc can't help because TIDs are not
> >    unique across different pid namespaces)
> >  * an application tries to lock a mutex owned by another one with its
> >    same TID but on a different pid namespace: this is a real issue
> >    because it could happen
> >
> > I know that pid namespace isolation usually comes also with ipc
> > namespace isolation but it is not a violation to have one without the
> > other. Wouldn't it be a good idea to figure out a way to have a safe
> > way to use robust mutexes shared across different pid namespaces?
> 
> It's been a while since I took my computer science classes, but...
> 
> It sounds like (to me) the wrong tool is being used for the job. If
> you want a synchronization object that works across processes, then
> you use a semaphore, and not a pthread mutex.

There are process-shared mutexes and robust process-shared mutexes
that automatically unlock and notify the new owner on next acquire if
a process died while owning the mutex, possibly leaving the protected
data inconsistent. These are very reasonable tools to use, but they
don't work across the boundaries between different physical or logical
systems.
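
For reference, a minimal sketch of that usage pattern (error handling
mostly omitted; the mutex is assumed to live inside a shared-memory
mapping visible to every participating process):

    #include <pthread.h>
    #include <errno.h>

    /* One-time initialization of a robust, process-shared mutex that
     * lives inside a shared-memory mapping (m points into it). */
    static void shared_mutex_init(pthread_mutex_t *m)
    {
        pthread_mutexattr_t a;
        pthread_mutexattr_init(&a);
        pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&a, PTHREAD_MUTEX_ROBUST);
        pthread_mutex_init(m, &a);
        pthread_mutexattr_destroy(&a);
    }

    /* Locking with owner-death recovery: EOWNERDEAD means the previous
     * owner died while holding the lock; we repair the protected data
     * and mark the mutex consistent before continuing. */
    static int shared_mutex_lock(pthread_mutex_t *m)
    {
        int r = pthread_mutex_lock(m);
        if (r == EOWNERDEAD) {
            /* ...recover the data this mutex protects here... */
            pthread_mutex_consistent(m);
            r = 0;
        }
        return r;
    }

If the new owner unlocks without calling pthread_mutex_consistent(),
the mutex becomes permanently unusable and later lock attempts fail
with ENOTRECOVERABLE.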

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-10 18:14                                 ` Rich Felker
@ 2025-02-11  9:34                                   ` Daniele Personal
  2025-02-11 11:38                                     ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Daniele Personal @ 2025-02-11  9:34 UTC (permalink / raw)
  To: Rich Felker; +Cc: Florian Weimer, musl

On Mon, 2025-02-10 at 13:14 -0500, Rich Felker wrote:
> On Mon, Feb 10, 2025 at 05:12:52PM +0100, Daniele Personal wrote:
> > On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote:
> > > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote:
> > > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha
> > > > scritto:
> > > > 
> > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario
> > > > > wrote:
> > > > > > But wouldn't this mean that robust mutexes functionality is
> > > > > > totally
> > > > > > incompatible with pid namespaces?
> > > > > 
> > > > > No, only with trying to synchronize *across* different pid
> > > > > namespaces.
> > > > > 
> > > > > > If the kernel relies on tid stored in memory by the process
> > > > > > this always
> > > > > > lacks the information about the pid namespace the tid
> > > > > > belongs
> > > > > > to.
> > > > > 
> > > > > It's necessarily within the same pid namespace as the process
> > > > > itself.
> > > > > 
> > > > > Functionally, you should consider different pid namespaces as
> > > > > different systems that happen to be capable of sharing some
> > > > > resources.
> > > > > 
> > > > > Rich
> > > > > 
> > > > 
> > > > Yes, I'm just saying that sharing pthread_mutex_t instances
> > > > across
> > > > processes within the same pid namespace but on a system with
> > > > more
> > > > than a
> > > > pid namespace could lead to issues anyway if the stored tid
> > > > value
> > > > is used
> > > > by the kernel as who to contact without the knowledge of on
> > > > which
> > > > pid
> > > > namespace.
> > > > 
> > > > I not saying this is true, I'm trying to understand and if
> > > > possible,
> > > > improve things.
> > > 
> > > That's not a problem. The stored tid is used only in the context
> > > of a
> > > process exiting, where the kernel code knows the relevant pid
> > > namespace (the one the exiting process is in) and uses the tid
> > > relative to that. If it didn't work this way, it would be a fatal
> > > bug
> > > in the pid namespace implementation, which is supposed to allow
> > > essentially transparent containerization (which includes
> > > processes in
> > > the ns being able to use their tids as they could if they were
> > > outside
> > > of any container/in global ns).
> > > 
> > > Rich
> > > 
> > 
> > So, IIUC, the problem of sharing robust pthread_mutex_t instances
> > across different pid namespaces is on the user space side which is
> > not
> > able to distinguish clashes on TIDs. In particular, problems could
> > arise when:
> 
> No, it is not "on the user side". The user side can be modified
> arbitrarily, and, modulo some cost, could surely be made to work for
> non-robust process-shared mutexes. The problem is that the kernel --
> the part which makes them robust -- has to honor the protocol, and
> the
> protocol does not admit distinguishing "pid N in ns X" from "pid N in
> ns Y".

Ah, I thought your previous sentence was saying that the kernel is able
to make this distinction.

> 
> >  * an application tries to unlock a mutex owned by another one with
> > its
> >    same TID but on a different pid namespace (but this is an
> >    application design problem and libc can't help because TIDs are
> > not
> >    unique across different pid namespaces)
> >  * an application tries to lock a mutex owned by another one with
> > its
> >    same TID but on a different pid namespace: this is a real issue
> >    because it could happen
> > 
> > I know that pid namespace isolation usually comes also with ipc
> > namespace isolation but it is not a violation to have one without
> > the
> > other. Wouldn't it be a good idea to figure out a way to have a
> > safe
> > way to use robust mutexes shared across different pid namespaces?
> 
> I do not consider this a reasonable expenditure of complexity
> whatsoever. It would require at least having a new robust list
> protocol, with userspace having to support both the old and new ones
> adapting at runtime, and may even require larger-than-wordsize
> atomics, which are not something you can assume exists. All of this
> for the explicit purpose of *violating* the whole intended purpose of
> namespaces: the isolation.
> 
> For cases where you really need cross-ns locking, you could use sysv
> semaphores if the sysvipc namespace is shared. If it's not, you could
> use fcntl OFD locks on a shared file descriptor, which should have
> your needed robustness properties.
> 
> Rich

Unfortunately it is not possible to say which variables need cross-ns
locking and which do not. This means we would have to treat them all
the same way and replace all the mutexes with sysv semaphores, but this
has some costs: locking sysv semaphores always requires syscalls and a
context switch between user and kernel space even when there is no
contention, and moreover they imply the presence of accessible files.

We basically use a chunk of shared memory as storage where variables
can be added/read/written by the various applications. Since the
mutexes used to protect the variables are embedded in the same chunk of
shared memory, applications only need an mmap in order to access the
storage.
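
Just to give an idea of the layout (a simplified sketch, not our real
code; the path and the struct are invented): the mutexes sit right next
to the data they guard in one mmapped region, so an application only
has to map it once.

    #include <fcntl.h>
    #include <pthread.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical layout of one slot of the shared storage: a robust,
     * process-shared mutex embedded next to the variable it guards. */
    struct shared_var {
        pthread_mutex_t lock;
        long            value;
    };

    struct shared_storage {
        struct shared_var vars[128];
    };

    /* Each application just maps the same segment; no per-variable
     * file descriptors or SysV keys are needed. */
    static struct shared_storage *storage_map(void)
    {
        int fd = open("/dev/shm/storage", O_RDWR);   /* example path */
        if (fd < 0)
            return NULL;
        void *p = mmap(NULL, sizeof(struct shared_storage),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);
        return p == MAP_FAILED ? NULL : p;
    }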

Up to now, the applications were running in the same pid namespace,
but for some products we now need to integrate a 3rd party application,
and this requires a certain degree of isolation, so we opted to
containerize that application; that is why I asked for clarifications.

I get your point when you say that sharing robust pthread_mutex_t
instances violates pid namespace isolation, but you choose the degree
of isolation by balancing the risks and the benefits. Even if you have
a new mount namespace, you can decide to bind mount some parts of the
filesystem to allow access to parts of the host flash, for instance;
the same could happen with the network.

Long story short, I'm pulling water to my mill, but I think it would
not be bad to have POSIX robust shared mutexes working across different
pid namespaces. It would allow users to use a really powerful tool also
with containerized applications (again pulling water to my mill) which
need it.

If there's any idea on how to achieve this, I'd really work on it:
limiting the max number of pids which could run in a pid namespace so
that some bits of the tid stored in the robust list could encode the
namespace, for instance?

On the other hand I'll surely try what you suggested.

Thanks,
Daniele.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-11  9:34                                   ` Daniele Personal
@ 2025-02-11 11:38                                     ` Rich Felker
  2025-02-11 13:53                                       ` Daniele Personal
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2025-02-11 11:38 UTC (permalink / raw)
  To: Daniele Personal; +Cc: Florian Weimer, musl

On Tue, Feb 11, 2025 at 10:34:30AM +0100, Daniele Personal wrote:
> On Mon, 2025-02-10 at 13:14 -0500, Rich Felker wrote:
> > On Mon, Feb 10, 2025 at 05:12:52PM +0100, Daniele Personal wrote:
> > > On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote:
> > > > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote:
> > > > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha
> > > > > scritto:
> > > > > 
> > > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario
> > > > > > wrote:
> > > > > > > But wouldn't this mean that robust mutexes functionality is
> > > > > > > totally
> > > > > > > incompatible with pid namespaces?
> > > > > > 
> > > > > > No, only with trying to synchronize *across* different pid
> > > > > > namespaces.
> > > > > > 
> > > > > > > If the kernel relies on tid stored in memory by the process
> > > > > > > this always
> > > > > > > lacks the information about the pid namespace the tid
> > > > > > > belongs
> > > > > > > to.
> > > > > > 
> > > > > > It's necessarily within the same pid namespace as the process
> > > > > > itself.
> > > > > > 
> > > > > > Functionally, you should consider different pid namespaces as
> > > > > > different systems that happen to be capable of sharing some
> > > > > > resources.
> > > > > > 
> > > > > > Rich
> > > > > > 
> > > > > 
> > > > > Yes, I'm just saying that sharing pthread_mutex_t instances
> > > > > across
> > > > > processes within the same pid namespace but on a system with
> > > > > more
> > > > > than a
> > > > > pid namespace could lead to issues anyway if the stored tid
> > > > > value
> > > > > is used
> > > > > by the kernel as who to contact without the knowledge of on
> > > > > which
> > > > > pid
> > > > > namespace.
> > > > > 
> > > > > I not saying this is true, I'm trying to understand and if
> > > > > possible,
> > > > > improve things.
> > > > 
> > > > That's not a problem. The stored tid is used only in the context
> > > > of a
> > > > process exiting, where the kernel code knows the relevant pid
> > > > namespace (the one the exiting process is in) and uses the tid
> > > > relative to that. If it didn't work this way, it would be a fatal
> > > > bug
> > > > in the pid namespace implementation, which is supposed to allow
> > > > essentially transparent containerization (which includes
> > > > processes in
> > > > the ns being able to use their tids as they could if they were
> > > > outside
> > > > of any container/in global ns).
> > > > 
> > > > Rich
> > > > 
> > > 
> > > So, IIUC, the problem of sharing robust pthread_mutex_t instances
> > > across different pid namespaces is on the user space side which is
> > > not
> > > able to distinguish clashes on TIDs. In particular, problems could
> > > arise when:
> > 
> > No, it is not "on the user side". The user side can be modified
> > arbitrarily, and, modulo some cost, could surely be made to work for
> > non-robust process-shared mutexes. The problem is that the kernel --
> > the part which makes them robust -- has to honor the protocol, and
> > the
> > protocol does not admit distinguishing "pid N in ns X" from "pid N in
> > ns Y".
> 
> Ah, I thought your previous sentence was saying that the kernel is able
> to make this distinction.

No, it's able to make the *assumption* that the namespace the tid is
relative to is that of the dying process. That's what lets it work
(and a large part of why namespaces were practical to add to Linux to
begin with -- all of the existing interfaces that use pids/tids need
to know which namespace you're talking about, but they work because
the kernel can assume "same namespace as the executing task").

> Unfortunately it is not possible to say which variables need cross-ns
> locking and which not. This means that we should treat all in the same
> way and so replace all the mutexes with sysv semaphores but this has
> some costs: locking sysv semaphores always require syscalls and context
> switch between user/kernel spaces even if there's no contention and
> moreover, they imply the presence of accessible files.
> 
> We basically use a chunk of shared memory as a storage where variables
> could be added/read/written by the various applications. Since mutexes
> used to protect the variables are embedded in the same chunk of shared
> memory, there is only an mmap needed in order to access the storage by
> applications.
> 
> Up to now, applications were running in the same pid namespace but now,
> for some products, we needed to integrate a 3rd party application and
> this requires a certain degree of isolation so we opted to containerize
> this application and here we come to why I asked for clarifications.
> 
> I get your point when you say that sharing robust pthread_mutex_t
> instances violates the pid namespace isolation but you choose the
> degree of isolation balancing the risks and the benefits. Even if you
> have a new mount namespace you can decide to bind mount some parts of
> the filesystem to allow access to pars of the host flash for instance,
> same could happen with network.
> 
> Long story short, I'm pulling water to my mill, but I think that it's
> not bad to have posix robust shared mutexes working across different
> pid namespaces. It will allow users to use a really powerful tool also
> with containerized applications (again pulling water to my mill) which
> need it.

Generally we implement nonstandard functionality only on the basis of
strong historical precedent, need by multiple major real-world
applications, lack of cost imposed onto everyone else who doesn't
want/need the functionality, and other similar conditions. On all of
these axes, the thing you're asking for is completely in the opposite
direction.

> If there's any idea on how to gain this I'd really work on it: limiting
> the max number of pids which could run on a pid namespace to allow the
> use of some bits for the ns in the tid stored in the robust list for
> instance?

This is something where you're on your own either writing it or hiring
someone to do so and maintaining your forks of musl and the kernel.
There is just no way this kind of hack ever belongs upstream.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces
  2025-02-11 11:38                                     ` Rich Felker
@ 2025-02-11 13:53                                       ` Daniele Personal
  0 siblings, 0 replies; 28+ messages in thread
From: Daniele Personal @ 2025-02-11 13:53 UTC (permalink / raw)
  To: Rich Felker; +Cc: Florian Weimer, musl

On Tue, 2025-02-11 at 06:38 -0500, Rich Felker wrote:
> On Tue, Feb 11, 2025 at 10:34:30AM +0100, Daniele Personal wrote:
> > On Mon, 2025-02-10 at 13:14 -0500, Rich Felker wrote:
> > > On Mon, Feb 10, 2025 at 05:12:52PM +0100, Daniele Personal wrote:
> > > > On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote:
> > > > > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario
> > > > > wrote:
> > > > > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha
> > > > > > scritto:
> > > > > > 
> > > > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario
> > > > > > > wrote:
> > > > > > > > But wouldn't this mean that robust mutexes
> > > > > > > > functionality is
> > > > > > > > totally
> > > > > > > > incompatible with pid namespaces?
> > > > > > > 
> > > > > > > No, only with trying to synchronize *across* different
> > > > > > > pid
> > > > > > > namespaces.
> > > > > > > 
> > > > > > > > If the kernel relies on tid stored in memory by the
> > > > > > > > process
> > > > > > > > this always
> > > > > > > > lacks the information about the pid namespace the tid
> > > > > > > > belongs
> > > > > > > > to.
> > > > > > > 
> > > > > > > It's necessarily within the same pid namespace as the
> > > > > > > process
> > > > > > > itself.
> > > > > > > 
> > > > > > > Functionally, you should consider different pid
> > > > > > > namespaces as
> > > > > > > different systems that happen to be capable of sharing
> > > > > > > some
> > > > > > > resources.
> > > > > > > 
> > > > > > > Rich
> > > > > > > 
> > > > > > 
> > > > > > Yes, I'm just saying that sharing pthread_mutex_t instances
> > > > > > across
> > > > > > processes within the same pid namespace but on a system
> > > > > > with
> > > > > > more
> > > > > > than a
> > > > > > pid namespace could lead to issues anyway if the stored tid
> > > > > > value
> > > > > > is used
> > > > > > by the kernel as who to contact without the knowledge of on
> > > > > > which
> > > > > > pid
> > > > > > namespace.
> > > > > > 
> > > > > > I not saying this is true, I'm trying to understand and if
> > > > > > possible,
> > > > > > improve things.
> > > > > 
> > > > > That's not a problem. The stored tid is used only in the
> > > > > context
> > > > > of a
> > > > > process exiting, where the kernel code knows the relevant pid
> > > > > namespace (the one the exiting process is in) and uses the
> > > > > tid
> > > > > relative to that. If it didn't work this way, it would be a
> > > > > fatal
> > > > > bug
> > > > > in the pid namespace implementation, which is supposed to
> > > > > allow
> > > > > essentially transparent containerization (which includes
> > > > > processes in
> > > > > the ns being able to use their tids as they could if they
> > > > > were
> > > > > outside
> > > > > of any container/in global ns).
> > > > > 
> > > > > Rich
> > > > > 
> > > > 
> > > > So, IIUC, the problem of sharing robust pthread_mutex_t
> > > > instances
> > > > across different pid namespaces is on the user space side which
> > > > is
> > > > not
> > > > able to distinguish clashes on TIDs. In particular, problems
> > > > could
> > > > arise when:
> > > 
> > > No, it is not "on the user side". The user side can be modified
> > > arbitrarily, and, modulo some cost, could surely be made to work
> > > for
> > > non-robust process-shared mutexes. The problem is that the kernel
> > > --
> > > the part which makes them robust -- has to honor the protocol,
> > > and
> > > the
> > > protocol does not admit distinguishing "pid N in ns X" from "pid
> > > N in
> > > ns Y".
> > 
> > Ah, I thought your previous sentence was saying that the kernel is
> > able
> > to make this distinction.
> 
> No, it's able to make the *assumption* that the namespace the tid is
> relative to is that of the dying process. That's what lets it work
> (and a large part of why namespaces were practical to add to Linux to
> begin with -- all of the existing interfaces that use pids/tids need
> to know which namespace you're talking about, but they work because
> the kernel can assume "same namespace as the executing task").
> 
> > Unfortunately it is not possible to say which variables need cross-
> > ns
> > locking and which not. This means that we should treat all in the
> > same
> > way and so replace all the mutexes with sysv semaphores but this
> > has
> > some costs: locking sysv semaphores always require syscalls and
> > context
> > switch between user/kernel spaces even if there's no contention and
> > moreover, they imply the presence of accessible files.
> > 
> > We basically use a chunk of shared memory as a storage where
> > variables
> > could be added/read/written by the various applications. Since
> > mutexes
> > used to protect the variables are embedded in the same chunk of
> > shared
> > memory, there is only an mmap needed in order to access the storage
> > by
> > applications.
> > 
> > Up to now, applications were running in the same pid namespace but
> > now,
> > for some products, we needed to integrate a 3rd party application
> > and
> > this requires a certain degree of isolation so we opted to
> > containerize
> > this application and here we come to why I asked for
> > clarifications.
> > 
> > I get your point when you say that sharing robust pthread_mutex_t
> > instances violates pid namespace isolation, but you choose the
> > degree of isolation by balancing the risks and the benefits. Even if
> > you have a new mount namespace you can decide to bind mount some
> > parts of the filesystem to allow access to parts of the host flash,
> > for instance, and the same could happen with the network.
> > 
> > Long story short, I'm pulling water to my mill, but I think that
> > it's not bad to have posix robust shared mutexes working across
> > different pid namespaces. It would allow users to use a really
> > powerful tool also with containerized applications (again pulling
> > water to my mill) which need it.
> 
> Generally we implement nonstandard functionality only on the basis of
> strong historical precedent, need by multiple major real-world
> applications, lack of cost imposed onto everyone else who doesn't
> want/need the functionality, and other similar conditions. On all of
> these axes, the thing you're asking for is completely in the opposite
> direction.
> 
> > If there's any idea on how to achieve this I'd really work on it:
> > limiting the max number of pids which could run in a pid namespace
> > so that some bits of the tid stored in the robust list could be used
> > for the ns, for instance?
> 
> This is something where you're on your own, either writing it or
> hiring someone to do so, and maintaining your forks of musl and the
> kernel. There is just no way this kind of hack ever belongs upstream.
> 
> Rich

Thanks for the time you spent on this, I really appreciated it.

Daniele.


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-02-11 13:53 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-28 13:22 [musl] pthread_mutex_t shared between processes with different pid namespaces Daniele Personal
2025-01-28 15:02 ` Rich Felker
2025-01-28 16:13   ` Daniele Personal
2025-01-28 18:24   ` Florian Weimer
2025-01-31  9:31     ` Daniele Personal
2025-01-31 20:30       ` Markus Wichmann
2025-02-03 13:54         ` Daniele Personal
2025-02-01 16:03       ` Florian Weimer
2025-02-03 12:58         ` Daniele Personal
2025-02-03 17:25           ` Florian Weimer
2025-02-04 16:48             ` Daniele Personal
2025-02-04 18:53             ` Rich Felker
2025-02-05 10:17               ` Daniele Personal
2025-02-05 10:32                 ` Florian Weimer
2025-02-06  7:45                   ` Daniele Personal
2025-02-07 16:19                     ` Rich Felker
2025-02-08  9:20                       ` Daniele Dario
2025-02-08 12:39                         ` Rich Felker
2025-02-08 14:40                           ` Daniele Dario
2025-02-08 14:52                             ` Rich Felker
2025-02-10 16:12                               ` Daniele Personal
2025-02-10 18:14                                 ` Rich Felker
2025-02-11  9:34                                   ` Daniele Personal
2025-02-11 11:38                                     ` Rich Felker
2025-02-11 13:53                                       ` Daniele Personal
2025-02-10 18:44                                 ` Jeffrey Walton
2025-02-10 18:58                                   ` Rich Felker
2025-02-07 16:34                     ` Florian Weimer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
