* [musl] pthread_mutex_t shared between processes with different pid namespaces @ 2025-01-28 13:22 Daniele Personal 2025-01-28 15:02 ` Rich Felker 0 siblings, 1 reply; 28+ messages in thread From: Daniele Personal @ 2025-01-28 13:22 UTC (permalink / raw) To: musl Hello everyone, I'm working on a library linked by some processes in order to exchange information. The library uses some pthread_mutex_t instances to safely read/write the information to exchange: the mutexes are created with the PTHREAD_PROCESS_SHARED and PTHREAD_MUTEX_ROBUST attributes and shared through shared memory mmapped by the processes. Now, for certain reasons, I have to run one of the processes in a container, and I found that, after a random interval of time, the process in the container gets stuck in a pthread_mutex_lock for no apparent reason. After some investigation I figured out that if the container is started without pid namespace isolation everything works like a charm. So the questions: is pid namespace isolation a problem when working with shared mutexes, or should I investigate in other directions? If the problem is pid namespace isolation, what can be done to make it work, apart from sharing the same pid namespace? The current development is based on musl 1.2.4 built with Yocto Scarthgap for aarch64 and arm. Thanks in advance for any help, Daniele. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-01-28 13:22 [musl] pthread_mutex_t shared between processes with different pid namespaces Daniele Personal @ 2025-01-28 15:02 ` Rich Felker 2025-01-28 16:13 ` Daniele Personal 2025-01-28 18:24 ` Florian Weimer 0 siblings, 2 replies; 28+ messages in thread From: Rich Felker @ 2025-01-28 15:02 UTC (permalink / raw) To: Daniele Personal; +Cc: musl On Tue, Jan 28, 2025 at 02:22:31PM +0100, Daniele Personal wrote: > Hello everyone, > I'm working on a library linked by some processes in order to exchange > information. Such library uses some pthread_mutex_t instances to safely > read/write the information to exchange: the mutexes are created with > the PTHREAD_PROCESS_SHARED and PTHREAD_MUTEX_ROBUST attributes and > shared through shared memory mmapped by the processes. > > Now, for certain reasons, I have to run one of the processes in a > container and I found that, after a random interval of time, the > process in the container got stuck in a pthread_mutex_lock without any > reason. > > After some investigation I figured out that if the container is started > without pid namespace isolation everithing works like a charm. > > So the questions: is the pid namespace isolation a problem when working > with shared mutexes or should I investigate in other directions? > If the problem is pid namespace isolation, what could be done to make > it working apart from sharing the same pid namespace? > > The actual development is based on musl 1.2.4 built with Yocto > Scarthgap for aarch64 and arm. Yes, the pid namespace boundary is your problem. Process-shared mutexes only work on the same logical system with a unique set of thread identifiers. If you're trying to share them across different pid namespaces, the same pid/tid may refer to different processes/threads in different ones, and it's not usable as a mutex ownership identity. If you want robust-mutex-like functionality that bridges pid namespaces, sysv semaphores are probably your only option. Rich ^ permalink raw reply [flat|nested] 28+ messages in thread
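For readers who want to follow up on the SysV suggestion: what makes System V semaphores attractive here is SEM_UNDO, where the kernel itself rolls back a dead process's pending operations, so no namespace-relative TID is ever stored in shared memory. The following is only a minimal sketch of that approach, not code from this thread; the key handling and the create/initialize race are deliberately simplified, and unlike a robust mutex it gives the next locker no EOWNERDEAD-style notification that recovery may be needed.

	#include <stdio.h>
	#include <sys/ipc.h>
	#include <sys/sem.h>

	/* On Linux the caller has to define union semun itself. */
	union semun { int val; struct semid_ds *buf; unsigned short *array; };

	/* Create-or-attach a one-semaphore set used as a cross-process lock.
	 * (Simplified: a real version must handle the create/initialize race.) */
	static int sem_lock_open(key_t key)
	{
	    int id = semget(key, 1, IPC_CREAT | IPC_EXCL | 0600);
	    if (id >= 0) {
	        union semun arg = { .val = 1 };          /* 1 == unlocked */
	        if (semctl(id, 0, SETVAL, arg) < 0) { perror("semctl"); return -1; }
	        return id;
	    }
	    return semget(key, 1, 0600);                 /* already exists */
	}

	static int sem_lock(int id)
	{
	    /* Blocks until the value is > 0, then decrements it to 0.
	     * SEM_UNDO: the kernel re-increments it if this process dies. */
	    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
	    return semop(id, &op, 1);
	}

	static int sem_unlock(int id)
	{
	    struct sembuf op = { .sem_num = 0, .sem_op = +1, .sem_flg = SEM_UNDO };
	    return semop(id, &op, 1);
	}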
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-01-28 15:02 ` Rich Felker @ 2025-01-28 16:13 ` Daniele Personal 2025-01-28 18:24 ` Florian Weimer 1 sibling, 0 replies; 28+ messages in thread From: Daniele Personal @ 2025-01-28 16:13 UTC (permalink / raw) To: Rich Felker; +Cc: musl On Tue, 2025-01-28 at 10:02 -0500, Rich Felker wrote: > On Tue, Jan 28, 2025 at 02:22:31PM +0100, Daniele Personal wrote: > > Hello everyone, > > I'm working on a library linked by some processes in order to > > exchange > > information. Such library uses some pthread_mutex_t instances to > > safely > > read/write the information to exchange: the mutexes are created > > with > > the PTHREAD_PROCESS_SHARED and PTHREAD_MUTEX_ROBUST attributes and > > shared through shared memory mmapped by the processes. > > > > Now, for certain reasons, I have to run one of the processes in a > > container and I found that, after a random interval of time, the > > process in the container got stuck in a pthread_mutex_lock without > > any > > reason. > > > > After some investigation I figured out that if the container is > > started > > without pid namespace isolation everithing works like a charm. > > > > So the questions: is the pid namespace isolation a problem when > > working > > with shared mutexes or should I investigate in other directions? > > If the problem is pid namespace isolation, what could be done to > > make > > it working apart from sharing the same pid namespace? > > > > The actual development is based on musl 1.2.4 built with Yocto > > Scarthgap for aarch64 and arm. > > Yes, the pid namespace boundary is your problem. Process-shared > mutexes only work on the same logical system with a unique set of > thread identifiers. If you're trying to share them across different > pid namespaces, the same pid/tid may refer to different > processes/threads in different ones, and it's not usable as a mutex > ownership identity. > > If you want robust-mutex-like functionality that bridges pid > namespaces, sysv semaphores are probably your only option. > > Rich Oh, thanks for clarifying it. But at a first glance at man 2 semop I see: Each semaphore in a System V semaphore set has the following associated values: unsigned short semval; /* semaphore value */ unsigned short semzcnt; /* # waiting for zero */ unsigned short semncnt; /* # waiting for increase */ pid_t sempid; /* PID of process that last modified the semaphore value */ Could it be that also sysv semaphores don't like different pid namespaces? If so, the only chance is to use the same pid namespace, isn't it? Daniele. ^ permalink raw reply [flat|nested] 28+ messages in thread
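A note on the sempid worry above: as far as I can tell (this is my reading of the kernel's semaphore code, so treat it as an assumption to verify), sempid is pure bookkeeping that semctl(GETPID) reports back; the blocking, waking and SEM_UNDO recovery are tied to the kernel task itself rather than to a stored TID, so SysV semaphores keep working across pid namespaces as long as the processes share an IPC namespace. A tiny illustration of how that field is read:

	#include <stdio.h>
	#include <sys/sem.h>

	/* Purely informational: prints the PID of the last process that operated
	 * on semaphore 0 of 'semid', as translated into the caller's pid
	 * namespace by the kernel. The locking logic never consumes this value. */
	static void show_last_modifier(int semid)
	{
	    int pid = semctl(semid, 0, GETPID);
	    if (pid < 0)
	        perror("semctl(GETPID)");
	    else
	        printf("last modifier: pid %d\n", pid);
	}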
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-01-28 15:02 ` Rich Felker 2025-01-28 16:13 ` Daniele Personal @ 2025-01-28 18:24 ` Florian Weimer 2025-01-31 9:31 ` Daniele Personal 1 sibling, 1 reply; 28+ messages in thread From: Florian Weimer @ 2025-01-28 18:24 UTC (permalink / raw) To: Rich Felker; +Cc: Daniele Personal, musl * Rich Felker: > Yes, the pid namespace boundary is your problem. Process-shared > mutexes only work on the same logical system with a unique set of > thread identifiers. If you're trying to share them across different > pid namespaces, the same pid/tid may refer to different > processes/threads in different ones, and it's not usable as a mutex > ownership identity. Is this required for implementing the unlock-if-not-owner error code on mutex unlock? By the way, there is a proposal to teach the kernel to rewrite the ownership list of task exit: [PATCH v2 0/4] futex: Drop ROBUST_LIST_LIMIT <https://lore.kernel.org/linux-kernel/20250127202608.223864-1-andrealmeid@igalia.com/> I'm worried about the compatibility impact. Thanks, Florian ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-01-28 18:24 ` Florian Weimer @ 2025-01-31 9:31 ` Daniele Personal 2025-01-31 20:30 ` Markus Wichmann 2025-02-01 16:03 ` Florian Weimer 0 siblings, 2 replies; 28+ messages in thread From: Daniele Personal @ 2025-01-31 9:31 UTC (permalink / raw) To: Florian Weimer, Rich Felker; +Cc: musl On Tue, 2025-01-28 at 19:24 +0100, Florian Weimer wrote: > * Rich Felker: > > > Yes, the pid namespace boundary is your problem. Process-shared > > mutexes only work on the same logical system with a unique set of > > thread identifiers. If you're trying to share them across different > > pid namespaces, the same pid/tid may refer to different > > processes/threads in different ones, and it's not usable as a mutex > > ownership identity. From what I see, the problem seems to happen only in case of contention on the mutex.

int __pthread_mutex_timedlock(pthread_mutex_t *restrict m, const struct timespec *restrict at)
{
	if ((m->_m_type&15) == PTHREAD_MUTEX_NORMAL
	    && !a_cas(&m->_m_lock, 0, EBUSY))
		return 0;

	int type = m->_m_type;
	int r, t, priv = (type & 128) ^ 128;

	r = __pthread_mutex_trylock(m);
	if (r != EBUSY) return r;

IIUC, if it is not locked, __pthread_mutex_timedlock will acquire it and return 0 (I don't understand whether via the first check or via __pthread_mutex_trylock) and everything works. If instead it is locked, the problem arises only inside the container. If it was a pthread_mutex_lock it waits forever; if it was a timed lock it exits after the timeout and you can retry. Is this correct? > > Is this required for implementing the unlock-if-not-owner error code > on > mutex unlock? No, I don't see problems related to EOWNERDEAD. > > By the way, there is a proposal to teach the kernel to rewrite the > ownership list on task exit: > > [PATCH v2 0/4] futex: Drop ROBUST_LIST_LIMIT > > <https://lore.kernel.org/linux-kernel/20250127202608.223864-1-andrealmeid@igalia.com/> > > I'm worried about the compatibility impact. > > Thanks, > Florian > Thanks, Daniele. ^ permalink raw reply [flat|nested] 28+ messages in thread
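As a side note on the excerpt above: the first check is the uncontended fast path, a single atomic compare-and-swap on a word in the shared mapping with no system call at all, which is why everything behaves the same inside and outside the container until there is contention. A stand-alone illustration of that principle using C11 atomics (this is not musl's a_cas, just a sketch of the idea):

	#include <errno.h>
	#include <stdatomic.h>

	/* 0 = unlocked, non-zero = locked; lives in the shared mapping. */
	typedef struct { _Atomic int word; } toy_lock;

	static int toy_trylock(toy_lock *l)
	{
	    int expected = 0;
	    /* Uncontended case: one CAS in user space, the kernel never sees it. */
	    if (atomic_compare_exchange_strong(&l->word, &expected, EBUSY))
	        return 0;
	    /* Contended case: a real implementation would now record the owner's
	     * TID and eventually call futex() to sleep -- the part where the pid
	     * namespace starts to matter. */
	    return EBUSY;
	}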
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-01-31 9:31 ` Daniele Personal @ 2025-01-31 20:30 ` Markus Wichmann 2025-02-03 13:54 ` Daniele Personal 2025-02-01 16:03 ` Florian Weimer 1 sibling, 1 reply; 28+ messages in thread From: Markus Wichmann @ 2025-01-31 20:30 UTC (permalink / raw) To: musl; +Cc: Daniele Personal Am Fri, Jan 31, 2025 at 10:31:46AM +0100 schrieb Daniele Personal: > IIUC, if it is not locked, the __pthread_mutex_timedlock will acquire > it and return 0 (don't understand if with the first check or with the > __pthread_mutex_trylock) and everything works. > > If instead it is locked the problem arises only inside the container. > If it was a pthread_mutex_lock it waits forever, if it was a timed lock > it exits after the timeout and you can retry. > > Is this correct? > Essentially yes. If uncontended, kernel space never gets involved at all and everything just works, but if contended, futex wait and futex wake do not meet each other if issued from different PID namespaces. Thus they end up waiting until the timeout expires. Unless there is no timeout, then they wait until the user gets bored and kills the process. Ciao, Markus ^ permalink raw reply [flat|nested] 28+ messages in thread
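To make the "futex wait and futex wake" wording concrete, here is roughly what the contended slow path boils down to: both sides issue the futex() system call on the same 32-bit word of the shared mapping. This sketch is only meant to name the two halves; timeouts, requeueing and the robust/PI machinery are omitted:

	#include <linux/futex.h>
	#include <stdint.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	/* Waiter: sleep in the kernel while *addr still holds the value we saw
	 * when we found the lock taken. FUTEX_PRIVATE_FLAG must NOT be used for
	 * a process-shared lock. */
	static long futex_wait(volatile uint32_t *addr, uint32_t val_seen)
	{
	    return syscall(SYS_futex, addr, FUTEX_WAIT, val_seen, NULL, NULL, 0);
	}

	/* Unlocker: wake at most one task blocked on the same word. */
	static long futex_wake_one(volatile uint32_t *addr)
	{
	    return syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
	}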
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-01-31 20:30 ` Markus Wichmann @ 2025-02-03 13:54 ` Daniele Personal 0 siblings, 0 replies; 28+ messages in thread From: Daniele Personal @ 2025-02-03 13:54 UTC (permalink / raw) To: Markus Wichmann, musl On Fri, 2025-01-31 at 21:30 +0100, Markus Wichmann wrote: > Am Fri, Jan 31, 2025 at 10:31:46AM +0100 schrieb Daniele Personal: > > IIUC, if it is not locked, the __pthread_mutex_timedlock will > > acquire > > it and return 0 (don't understand if with the first check or with > > the > > __pthread_mutex_trylock) and everything works. > > > > If instead it is locked the problem arises only inside the > > container. > > If it was a pthread_mutex_lock it waits forever, if it was a timed > > lock > > it exits after the timeout and you can retry. > > > > Is this correct? > > > > Essentially yes. If uncontended, kernel space never gets involved at > all > and everything just works, but if contended, futex wait and futex > wake > do not meet each other if issued from different PID namespaces. Thus > they end up waiting until the timeout expires. Unless there is no > timeout, then they wait until the user gets bored and kills the > process. > > Ciao, > Markus Thanks for the explanation. So there's no way to have pthread_mutexes mapped in shared memory, shared between host and container, work when the container is created with its own pid namespace? Daniele. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-01-31 9:31 ` Daniele Personal 2025-01-31 20:30 ` Markus Wichmann @ 2025-02-01 16:03 ` Florian Weimer 2025-02-03 12:58 ` Daniele Personal 1 sibling, 1 reply; 28+ messages in thread From: Florian Weimer @ 2025-02-01 16:03 UTC (permalink / raw) To: Daniele Personal; +Cc: Rich Felker, d.dario76, musl * Daniele Personal: >> Is this required for implementing the unlock-if-not-owner error code >> on mutex unlock? > > No, I don't see problems related to EOWNERDEAD. Sorry, what I meant is that the TID is needed for efficient reporting of usage errors. It's not imposed by the robust list protocol as such. There could be a PID-namespace-compatible robust mutex type that does not have this problem (but with less error checking). Thanks, Florian ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-01 16:03 ` Florian Weimer @ 2025-02-03 12:58 ` Daniele Personal 2025-02-03 17:25 ` Florian Weimer 0 siblings, 1 reply; 28+ messages in thread From: Daniele Personal @ 2025-02-03 12:58 UTC (permalink / raw) To: Florian Weimer; +Cc: Rich Felker, d.dario76, musl On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote: > * Daniele Personal: > > > > Is this required for implementing the unlock-if-not-owner error > > > code > > > on mutex unlock? > > > > No, I don't see problems related to EOWNERDEAD. > > Sorry, what I meant is that the TID is needed for efficient reporting > of > usage errors. It's not imposed by the robust list protocol as such. > There could be a PID-namespace-compatible robust mutex type that does > not have this problem (but with less error checking). > > Thanks, > Florian > Are you saying that there are pthread_mutexes which can be shared across processes run on different pid namespaces? If yes I'm definitely interested on this. Can you tell me something more? What do you mean with "less error checking"? I need for sure to be able to detect the EOWNERDEAD condition in order to restore consistency but I'm not interested in recursive locks. Thanks, Daniele. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-03 12:58 ` Daniele Personal @ 2025-02-03 17:25 ` Florian Weimer 2025-02-04 16:48 ` Daniele Personal 2025-02-04 18:53 ` Rich Felker 0 siblings, 2 replies; 28+ messages in thread From: Florian Weimer @ 2025-02-03 17:25 UTC (permalink / raw) To: Daniele Personal; +Cc: d.dario76, Rich Felker, musl * Daniele Personal: > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote: >> * Daniele Personal: >> >> > > Is this required for implementing the unlock-if-not-owner error >> > > code >> > > on mutex unlock? >> > >> > No, I don't see problems related to EOWNERDEAD. >> >> Sorry, what I meant is that the TID is needed for efficient reporting >> of >> usage errors. It's not imposed by the robust list protocol as such. >> There could be a PID-namespace-compatible robust mutex type that does >> not have this problem (but with less error checking). >> >> Thanks, >> Florian >> > > Are you saying that there are pthread_mutexes which can be shared > across processes run on different pid namespaces? If yes I'm definitely > interested on this. Can you tell me something more? You would have to add a new mutex type that is a mix of PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST. Closer to the latter, but without the ownership checks. Thanks, Florian ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-03 17:25 ` Florian Weimer @ 2025-02-04 16:48 ` Daniele Personal 2025-02-04 18:53 ` Rich Felker 1 sibling, 0 replies; 28+ messages in thread From: Daniele Personal @ 2025-02-04 16:48 UTC (permalink / raw) To: Florian Weimer; +Cc: Rich Felker, musl [-- Attachment #1: Type: text/plain, Size: 2600 bytes --] On Mon, 2025-02-03 at 18:25 +0100, Florian Weimer wrote: > * Daniele Personal: > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote: > > > * Daniele Personal: > > > > > > > > Is this required for implementing the unlock-if-not-owner > > > > > error > > > > > code > > > > > on mutex unlock? > > > > > > > > No, I don't see problems related to EOWNERDEAD. > > > > > > Sorry, what I meant is that the TID is needed for efficient > > > reporting > > > of > > > usage errors. It's not imposed by the robust list protocol as > > > such. > > > There could be a PID-namespace-compatible robust mutex type that > > > does > > > not have this problem (but with less error checking). > > > > > > Thanks, > > > Florian > > > > > > > Are you saying that there are pthread_mutexes which can be shared > > across processes run on different pid namespaces? If yes I'm > > definitely > > interested on this. Can you tell me something more? > > You would have to add a new mutex type that is a mix of > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST. Closer to the latter, > but without the ownership checks. > > Thanks, > Florian > I wrote a small test to exercise things. It creates (if needed) a shared memory object which holds the pthread_mutex_t instance, mmaps it and, if the shared object was just created, initializes the mutex (changing the mutex initialization requires deleting the shared object). Then it locks the mutex for 5 seconds, unlocks it, locks it again after another 2 seconds and exits with the mutex locked. I compiled the code against musl 1.2.4 with Yocto or, for my laptop, against glibc 2.34, with

gcc -D_GNU_SOURCE -Wall -pthread -g -O2 -c -o main.o main.c
gcc -pthread -g -O2 -pthread -o mutex-test main.o -lrt

I have a container which is started with its own pid, mount and uts namespaces (it shares IPC and network with the host). Running the test separately on the host and in the container, both can recognize the case where the previous instance left the mutex locked and can recover from it. Running them in parallel gives two distinct results depending on whether the mutex is initialized with the PTHREAD_PRIO_INHERIT protocol or with PTHREAD_PRIO_NONE. If PTHREAD_PRIO_INHERIT is set, in case of contention, the waiter gets stuck. If PTHREAD_PRIO_NONE is used, everything seems to work fine: the application which starts later waits for the mutex to be released by the other one and gets woken properly. I now need to understand whether this behavior is expected and reliable or not. Thanks in advance, Daniele.
[-- Attachment #2: main.c --] [-- Type: text/x-csrc, Size: 11941 bytes --] /* * main.c * * Created on: Feb 4, 2025 */ #include <time.h> #include <errno.h> #include <fcntl.h> #include <sched.h> #include <stdio.h> #include <stdint.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <pthread.h> #include <sys/mman.h> #include <sys/stat.h> #include <sys/types.h> /** * Definition of shared object to be * mmapped() by the different processes */ typedef struct _SharedMutex { /* A magic number just to check if object has been initialized */ uint32_t magic; /* The attributes used to initialize the shared mutex */ pthread_mutexattr_t attr; /* The shared mutex */ pthread_mutex_t lock; } SharedMutex; #define MAGIC 0xdeadbeef /** Default name to use when printing debug stuff */ static const char *app = "A"; #define DBG_PRINT(f,a...) \ do {\ struct timespec now; \ clock_gettime (CLOCK_MONOTONIC, &now); \ printf ("(%s) [%li.%09li] "f"\n", app, now.tv_sec, now.tv_nsec, ##a); \ } while (0); /** Protocol type: by default PTHREAD_PRIO_NONE */ static int prio = PTHREAD_PRIO_NONE; /** * @brief Initialize a mutex instance setting #PTHREAD_MUTEX_ROBUST and * #PTHREAD_PROCESS_SHARED attributes. This allows to have a mutex * usable within different processes and recoverable in case the * process which owns it crashes. * * @param lock A #pthread_mutex_t instance to initialize * * @return 0 on success, -1 otherwise */ static int mutex_init (pthread_mutex_t *lock, pthread_mutexattr_t *attr) { int res; pthread_mutexattr_init (attr); DBG_PRINT ("setting PTHREAD_PROCESS_SHARED attribute"); res = pthread_mutexattr_setpshared (attr, PTHREAD_PROCESS_SHARED); if (res != 0) { /* Failed to set SHARED */ printf ("failed to set PTHREAD_PROCESS_SHARED: %i %s\n", res, strerror (res)); return -1; } if (prio != PTHREAD_PRIO_NONE) { DBG_PRINT ("setting PTHREAD_PRIO_INHERIT attribute"); res = pthread_mutexattr_setprotocol (attr, PTHREAD_PRIO_INHERIT); if (res != 0) { /* Failed to set protocol */ printf ("failed to set PTHREAD_PRIO_INHERIT: %i %s\n", res, strerror (res)); return -1; } } DBG_PRINT ("setting PTHREAD_MUTEX_ROBUST attribute"); res = pthread_mutexattr_setrobust (attr, PTHREAD_MUTEX_ROBUST); if (res != 0) { printf ("failed to set PTHREAD_MUTEX_ROBUST: %i %s\n", res, strerror (res)); return -1; } /* * Initialize mutex instance. * These method always return 0 so no need to check for success. */ pthread_mutex_init (lock, attr); return 0; } /** * @brief Initialize a mutex instance setting #PTHREAD_MUTEX_ROBUST and * #PTHREAD_PROCESS_SHARED attributes. This allows to have a mutex * usable within different processes and recoverable in case the * process which owns it crashes. 
* * @param lock A #pthread_mutex_t instance to initialize * * @return 0 on success, -1 otherwise */ static int mutexattr_check (pthread_mutexattr_t *attr) { int res; int val; DBG_PRINT ("checking PTHREAD_PROCESS attribute"); res = pthread_mutexattr_getpshared (attr, &val); if (res != 0) { /* Failed to set SHARED */ printf ("failed to get PTHREAD_PROCESS attribute: %i %s\n", res, strerror (res)); return -1; } if (val != PTHREAD_PROCESS_SHARED) { printf ("mutex was not initialized with PTHREAD_PROCESS_SHARED" " attribute\n"); return -1; } DBG_PRINT ("checking protocol attribute"); res = pthread_mutexattr_getprotocol (attr, &val); if (res != 0) { /* Failed to set protocol */ printf ("failed to get protocol attribute: %i %s\n", res, strerror (res)); return -1; } if (val != prio) { printf ("mutex was initialized with protocol %s but we wanted %s\n", (val == PTHREAD_PRIO_NONE) ? "PTHREAD_PRIO_NONE" : "PTHREAD_PRIO_INHERIT", (prio == PTHREAD_PRIO_NONE) ? "PTHREAD_PRIO_NONE" : "PTHREAD_PRIO_INHERIT"); return -1; } DBG_PRINT ("checking PTHREAD_MUTEX_ROBUST attribute"); res = pthread_mutexattr_getrobust (attr, &val); if (res != 0) { printf ("failed to get robust attribute: %i %s\n", res, strerror (res)); return -1; } if (val != PTHREAD_MUTEX_ROBUST) { printf ("mutex was not initialized with PTHREAD_MUTEX_ROBUST" " attribute\n"); return -1; } return 0; } /** * @brief Lock the given mutex. If the mutex is currently unlocked, it * becomes locked and owned by the calling thread, and method * returns immediately. If the mutex is already locked by another * thread, method suspends the calling thread until the mutex is * unlocked. If the thread which was owning the mutex terminates, * method restores consistency of the mutex and locks it returning 0. * * @param lock A #pthread_mutex_t instance to lock * * @return 0 on success, a positive error code otherwise */ static int mutex_lock (pthread_mutex_t *lock) { /* Try to lock mutex */ int res = pthread_mutex_lock (lock); switch (res) { case 0: /* Common use case */ break; case EOWNERDEAD: /* * Process which was holding the mutex terminated: now we're * holding it but before to go on we have to make it consistent. */ DBG_PRINT ("restoring mutex consistency"); res = pthread_mutex_consistent (lock); if (res != 0) { /* Failed to restore consistency */ printf ("failed to restore consistency of mutex: %i %s\n", res, strerror (res)); } break; default: printf ("failed to lock mutex: %i %s\n", res, strerror (res)); break; } return res; } /** * @brief Try to open @shmfile shared object and if this is the first instance * that opens it, initialize shared memory. * * @param name Name of the shared memory object to be created or opened. * For portable use, a shared memory object should be identified * by a name of the form /some-name; that is, a null-terminated * string of up to NAME_MAX (i.e., 255) characters consisting of * an initial slash, followed by one or more characters, none of * which are slashes. 
* * @return pointer to #SharedMutex instance on success, NULL otherwise */ static SharedMutex *shm_map (const char *name) { #define SHM_OPS_INIT 1 #define SHM_OPS_CHECK 2 int fd; int res; int ops = SHM_OPS_INIT; size_t size = sizeof (SharedMutex); /* Sanity checks */ if (name == NULL) { printf ("unable to open shared object: missing shared object name\n"); return NULL; } /* Open handle to shared object */ fd = shm_open (name, O_RDWR | O_CREAT | O_EXCL, 0600); if (fd < 0 && errno == EEXIST) { /* * If memory object @name already exists, another process has * initialized the memory area. */ ops = SHM_OPS_CHECK; fd = shm_open (name, O_RDWR, 0600); } if (fd < 0) { /* Failed to open shared object */ printf ("unable to open shared object %s: %i %s\n", name, errno, strerror (errno)); return NULL; } if (ops == SHM_OPS_INIT) { /* Set desired size of shared object */ res = ftruncate (fd, (off_t) size); if (res < 0) { printf ("unable to set size of shared object %s: %i %s\n", name, errno, strerror (errno)); close (fd); shm_unlink (name); return NULL; } } /* Map shared memory region */ void *p = mmap (NULL, size, (PROT_READ | PROT_WRITE), MAP_SHARED, fd, 0); if (p == MAP_FAILED) { printf ("unable to map shared object %s contents: %i %s\n", name, errno, strerror (errno)); close (fd); if (ops == SHM_OPS_INIT) { /* Also unlink shared object */ shm_unlink (name); } return NULL; } /* We can safely close file descriptor */ close (fd); /* Helper to access shared info */ SharedMutex *info = p; switch (ops) { case SHM_OPS_INIT: /* * Clear contents of shared memory area before to start working on it. * This should not be needed because #ftruncate() should do it but * it doesn't harm. */ memset (p, 0, size); /* Initialize shared lock */ if (mutex_init (&info->lock, &info->attr) != 0) { /* Cleanup stuff */ munmap (p, size); /* Remove shared object */ shm_unlink (name); return NULL; } /* Write magic number */ info->magic = MAGIC; break; case SHM_OPS_CHECK: /* * Shared object has already been created. Last thing that is set is the * magic value: loop until it becomes valid before to do other things. 
*/ while (info->magic != MAGIC) { if (info->magic != 0 && info->magic != MAGIC) { printf ("shared object %s initialized with wrong magic:" " aborting\n", name); return NULL; } sched_yield (); } /* Check if attributes are consistent with our choices */ res = mutexattr_check (&info->attr); if (res != 0) { printf ("mutex attributes incompatible: aborting\n"); /* Cleanup stuff */ munmap (p, size); return NULL; } break; default: /* This should never happen */ printf ("invalid initialization option\n"); return NULL; } /* Return mapped shared object */ return info; } /** * @brief Show application usage */ static void usage (const char *name) { printf ("Usage: %s [options]\n", name); printf (" -h | --help : show this help\n"); printf (" -n | --name : a string to distinguish between running instances" " (default \"A\"\n"); printf (" -p | --prio_inherit : set PTHREAD_PRIO_INHERIT (default" " PTHREAD_PRIO_NONE)\n"); } /** * Main method */ int main (int argc, char *argv []) { int res; SharedMutex *obj; for (int c = 1; c < argc; c++) { if (strcmp (argv [c], "-h") == 0 || strcmp (argv [c], "--help") == 0) { usage (argv [0]); return -1; } if (strcmp (argv [c], "-n") == 0 || strcmp (argv [c], "--name") == 0) { if ((++c) >= argc) { /* Missing name argument */ usage (argv [0]); return -1; } /* Replace application name */ app = argv [c]; continue; } if (strcmp (argv [c], "-p") == 0 || strcmp (argv [c], "--prio-inherit") == 0) { /* Set priority inheritance */ prio = PTHREAD_PRIO_INHERIT; continue; } } DBG_PRINT ("Mapping shared object"); obj = shm_map ("/test"); if (obj == NULL) { /* Failed to create/open shared object */ return -1; } DBG_PRINT ("locking mutex"); res = mutex_lock (&obj->lock); if (res != 0) { /* Failed to lock mutex */ return -1; } DBG_PRINT ("waiting 5 [s] before to unlock mutex"); sleep (5); DBG_PRINT ("releasing mutex"); res = pthread_mutex_unlock (&obj->lock); if (res != 0) { /* Failed to unlock */ DBG_PRINT ("failed to unlock mutex: %i %s", res, strerror (res)); return -1; } DBG_PRINT ("waiting 2 [s] before to retry locking mutex"); sleep (2); DBG_PRINT ("locking mutex"); res = mutex_lock (&obj->lock); if (res != 0) { /* Failed to lock mutex */ return -1; } DBG_PRINT ("terminating with mutex locked"); return 0; } ^ permalink raw reply [flat|nested] 28+ messages in thread
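One defensive addition the test could make (my suggestion, not part of the posted program): have the creator of the shared object record which pid namespace it lives in, and have later openers refuse to use the robust mutex if they are in a different one. A pid namespace can be identified by the inode of /proc/self/ns/pid:

	#include <stdio.h>
	#include <sys/stat.h>

	/* Returns an identifier for the caller's pid namespace (0 on error).
	 * Two processes on the same machine are in the same pid namespace iff
	 * these values (together with st_dev, strictly speaking) match. */
	static unsigned long pidns_id(void)
	{
	    struct stat st;
	    if (stat("/proc/self/ns/pid", &st) != 0) {
	        perror("stat(/proc/self/ns/pid)");
	        return 0;
	    }
	    return (unsigned long) st.st_ino;
	}

In the posted test, the creator would store this value next to the magic number in SharedMutex, and the checking branch of shm_map() would compare it against its own pidns_id() and bail out instead of silently sharing a robust mutex across namespaces.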
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-03 17:25 ` Florian Weimer 2025-02-04 16:48 ` Daniele Personal @ 2025-02-04 18:53 ` Rich Felker 2025-02-05 10:17 ` Daniele Personal 1 sibling, 1 reply; 28+ messages in thread From: Rich Felker @ 2025-02-04 18:53 UTC (permalink / raw) To: Florian Weimer; +Cc: Daniele Personal, d.dario76, musl On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote: > * Daniele Personal: > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote: > >> * Daniele Personal: > >> > >> > > Is this required for implementing the unlock-if-not-owner error > >> > > code > >> > > on mutex unlock? > >> > > >> > No, I don't see problems related to EOWNERDEAD. > >> > >> Sorry, what I meant is that the TID is needed for efficient reporting > >> of > >> usage errors. It's not imposed by the robust list protocol as such.. > >> There could be a PID-namespace-compatible robust mutex type that does > >> not have this problem (but with less error checking). > >> > >> Thanks, > >> Florian > >> > > > > Are you saying that there are pthread_mutexes which can be shared > > across processes run on different pid namespaces? If yes I'm definitely > > interested on this. Can you tell me something more? > > You would have to add a new mutex type that is a mix of > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST. Closer to the latter, > but without the ownership checks. This is inaccurate. Robust mutexes fundamentally depend on having the owner's tid in the owner field, and on this value not matching the tid of any other task that might hold the mutex. If these properties don't hold, the mutex may fail to unlock when the owner dies, or incorrectly unlock when another task mimicking the owner dies. The Linux robust mutex protocol fundamentally does not work across pid namespaces. Rich ^ permalink raw reply [flat|nested] 28+ messages in thread
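Some background on why the TID is so load-bearing here: every thread registers a robust list head with the kernel via set_robust_list(2), and the lock word of a robust mutex carries the owner's TID; when a task dies, the kernel walks that task's list and flags the entries whose owner field matches the dying task. The sketch below compresses those two ingredients into a few lines; it is a simplification for illustration, not what musl or glibc literally do:

	#include <linux/futex.h>     /* struct robust_list_head, FUTEX_TID_MASK */
	#include <stdint.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static struct robust_list_head robust_head = {
	    .list = { &robust_head.list },  /* empty circular list */
	    .futex_offset = 0,              /* offset from list entry to lock word */
	    .list_op_pending = NULL,
	};

	/* Done once per thread: tell the kernel where to find our robust list
	 * so it can be walked when this task exits. */
	static void register_robust_list(void)
	{
	    syscall(SYS_set_robust_list, &robust_head, sizeof robust_head);
	}

	/* On acquisition, the owner's TID ends up in the lock word. The TID is
	 * namespace-relative, so a task in another pid namespace can carry the
	 * very same number -- the collision Rich describes. */
	static uint32_t owner_value(void)
	{
	    return (uint32_t) syscall(SYS_gettid) & FUTEX_TID_MASK;
	}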
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-04 18:53 ` Rich Felker @ 2025-02-05 10:17 ` Daniele Personal 2025-02-05 10:32 ` Florian Weimer 0 siblings, 1 reply; 28+ messages in thread From: Daniele Personal @ 2025-02-05 10:17 UTC (permalink / raw) To: Rich Felker, Florian Weimer; +Cc: musl On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote: > On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote: > > * Daniele Personal: > > > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote: > > > > * Daniele Personal: > > > > > > > > > > Is this required for implementing the unlock-if-not-owner > > > > > > error > > > > > > code > > > > > > on mutex unlock? > > > > > > > > > > No, I don't see problems related to EOWNERDEAD. > > > > > > > > Sorry, what I meant is that the TID is needed for efficient > > > > reporting > > > > of > > > > usage errors. It's not imposed by the robust list protocol as > > > > such.. > > > > There could be a PID-namespace-compatible robust mutex type > > > > that does > > > > not have this problem (but with less error checking). > > > > > > > > Thanks, > > > > Florian > > > > > > > > > > Are you saying that there are pthread_mutexes which can be shared > > > across processes run on different pid namespaces? If yes I'm > > > definitely > > > interested on this. Can you tell me something more? > > > > You would have to add a new mutex type that is a mix of > > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST. Closer to the > > latter, > > but without the ownership checks. > > This is inaccurate. Robust mutexes fundamentally depend on having the > owner's tid in the owner field, and on this value not matching the > tid > of any other task that might hold the mutex. If these properties > don't > hold, the mutex may fail to unlock when the owner dies, or > incorrectly > unlock when another task mimicking the owner dies. > > The Linux robust mutex protocol fundamentally does not work across > pid > namespaces. > > Rich Looking at the code of musl 1.2.4, for a pthread_mutex_t which has been initialized as shared and robust but not PI capable, the only case left exposed seems to be pthread_mutex_unlock():

int __pthread_mutex_unlock(pthread_mutex_t *m)
{
	pthread_t self;
	int waiters = m->_m_waiters;
	int cont;
	int type = m->_m_type & 15;             /* <== type = 4 (robust) */
	int priv = (m->_m_type & 128) ^ 128;    /* <== priv = 0 (shared) */
	int new = 0;
	int old;

	if (type != PTHREAD_MUTEX_NORMAL) {     /* this is executed because type != 0 */
		self = __pthread_self();
		old = m->_m_lock;
		int own = old & 0x3fffffff;
		if (own != self->tid)           /* <== TIDs could collide!!! */
			return EPERM;
		if ((type&3) == PTHREAD_MUTEX_RECURSIVE && m->_m_count)
			return m->_m_count--, 0;    /* not executed: type&3 = 0 */
		if ((type&4) && (old&0x40000000))
			new = 0x7fffffff;
		if (!priv) {                    /* this is executed */
			self->robust_list.pending = &m->_m_next;
			__vm_lock();
		}
		volatile void *prev = m->_m_prev;
		volatile void *next = m->_m_next;
		*(volatile void *volatile *)prev = next;
		if (next != &self->robust_list.head)
			*(volatile void *volatile *)((char *)next - sizeof(void *)) = prev;
	}
	if (type&8) {                           /* this is NOT executed: not PI capable */
		if (old<0 || a_cas(&m->_m_lock, old, new)!=old) {
			if (new) a_store(&m->_m_waiters, -1);
			__syscall(SYS_futex, &m->_m_lock, FUTEX_UNLOCK_PI|priv);
		}
		cont = 0;
		waiters = 0;
	} else {                                /* this is executed */
		cont = a_swap(&m->_m_lock, new);
	}
	if (type != PTHREAD_MUTEX_NORMAL && !priv) {  /* this is executed */
		self->robust_list.pending = 0;
		__vm_unlock();
	}
	if (waiters || cont<0)
		__wake(&m->_m_lock, 1, priv);
	return 0;
}

As mentioned by Rich, since TIDs are not unique across different namespaces, a task might unlock a mutex held by another one if they have the same TID. I don't see other possible errors, am I missing something? Anyway, apart from the implementation, which could be improved or cover more corner cases, I have not found any document which gives a clear statement of how things should behave. Moreover, it might be good to document that the robust mutex protocol fundamentally does not work across pid namespaces, or with which limitations it does. Daniele. ^ permalink raw reply [flat|nested] 28+ messages in thread
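The collision Daniele points out can be made visible directly. gettid() returns the namespace-local thread ID, and that is the kind of number a robust/PI lock implementation stores in the owner field, while the host sees the same task under a different number. A small sketch (checking the host-side view with /proc is left to the manual step described in the comments):

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void)
	{
	    /* This is the kind of value that ends up in a robust mutex's owner
	     * field: the TID as seen from this process's own pid namespace. */
	    printf("tid (namespace-local): %ld\n", (long) syscall(SYS_gettid));
	    printf("pid (namespace-local): %ld\n", (long) getpid());
	    /* From the host, the same task appears under a different number:
	     * "grep NSpid /proc/<host-pid>/status" lists the host-view ID first
	     * and this namespace-local ID after it. Two unrelated tasks in two
	     * containers can therefore both report, say, tid 15 here, which is
	     * exactly the ambiguity that breaks ownership checks. */
	    return 0;
	}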
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-05 10:17 ` Daniele Personal @ 2025-02-05 10:32 ` Florian Weimer 2025-02-06 7:45 ` Daniele Personal 0 siblings, 1 reply; 28+ messages in thread From: Florian Weimer @ 2025-02-05 10:32 UTC (permalink / raw) To: Daniele Personal; +Cc: Rich Felker, musl * Daniele Personal: > On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote: >> On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote: >> > * Daniele Personal: >> > >> > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote: >> > > > * Daniele Personal: >> > > > >> > > > > > Is this required for implementing the unlock-if-not-owner >> > > > > > error >> > > > > > code >> > > > > > on mutex unlock? >> > > > > >> > > > > No, I don't see problems related to EOWNERDEAD. >> > > > >> > > > Sorry, what I meant is that the TID is needed for efficient >> > > > reporting >> > > > of >> > > > usage errors. It's not imposed by the robust list protocol as >> > > > such.. >> > > > There could be a PID-namespace-compatible robust mutex type >> > > > that does >> > > > not have this problem (but with less error checking). >> > > > >> > > > Thanks, >> > > > Florian >> > > > >> > > >> > > Are you saying that there are pthread_mutexes which can be shared >> > > across processes run on different pid namespaces? If yes I'm >> > > definitely >> > > interested on this. Can you tell me something more? >> > >> > You would have to add a new mutex type that is a mix of >> > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST. Closer to the >> > latter, >> > but without the ownership checks. >> >> This is inaccurate. Robust mutexes fundamentally depend on having the >> owner's tid in the owner field, and on this value not matching the >> tid of any other task that might hold the mutex. If these properties >> don't hold, the mutex may fail to unlock when the owner dies, or >> incorrectly unlock when another task mimicking the owner dies. >> >> The Linux robust mutex protocol fundamentally does not work across >> pid namespaces. Thank you, Rich, for the correction. > Looking at the code for musl 1.2.4, a pthread_mutex_t which has been > initialized as shared and robust but not PI capable leaves uncovered > only the case of pthread_mutex_unlock(). > As mentioned by Rich, since TIDs are not unique across different > namespaces, a task might unlock a mutex hold by another one if they > have the same TID. > > I don't see other possible errors, am I missing something? The kernel code uses the owner TID to handle some special cases: /* * Special case for regular (non PI) futexes. The unlock path in * user space has two race scenarios: * * 1. The unlock path releases the user space futex value and * before it can execute the futex() syscall to wake up * waiters it is killed. * * 2. A woken up waiter is killed before it can acquire the * futex in user space. * * In the second case, the wake up notification could be generated * by the unlock path in user space after setting the futex value * to zero or by the kernel after setting the OWNER_DIED bit below. * * In both cases the TID validation below prevents a wakeup of * potential waiters which can cause these waiters to block * forever. 
* * In both cases the following conditions are met: * * 1) task->robust_list->list_op_pending != NULL * @pending_op == true * 2) The owner part of user space futex value == 0 * 3) Regular futex: @pi == false * * If these conditions are met, it is safe to attempt waking up a * potential waiter without touching the user space futex value and * trying to set the OWNER_DIED bit. If the futex value is zero, * the rest of the user space mutex state is consistent, so a woken * waiter will just take over the uncontended futex. Setting the * OWNER_DIED bit would create inconsistent state and malfunction * of the user space owner died handling. Otherwise, the OWNER_DIED * bit is already set, and the woken waiter is expected to deal with * this. */ owner = uval & FUTEX_TID_MASK; if (pending_op && !pi && !owner) { futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1, FUTEX_BITSET_MATCH_ANY); return 0; } As a result, it's definitely just a userspace-only change if you need to use the robust mutex list across PID namespaces. Thanks, Florian ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-05 10:32 ` Florian Weimer @ 2025-02-06 7:45 ` Daniele Personal 2025-02-07 16:19 ` Rich Felker 2025-02-07 16:34 ` Florian Weimer 0 siblings, 2 replies; 28+ messages in thread From: Daniele Personal @ 2025-02-06 7:45 UTC (permalink / raw) To: Florian Weimer; +Cc: Rich Felker, musl On Wed, 2025-02-05 at 11:32 +0100, Florian Weimer wrote: > * Daniele Personal: > > > On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote: > > > On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote: > > > > * Daniele Personal: > > > > > > > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote: > > > > > > * Daniele Personal: > > > > > > > > > > > > > > Is this required for implementing the unlock-if-not- > > > > > > > > owner > > > > > > > > error > > > > > > > > code > > > > > > > > on mutex unlock? > > > > > > > > > > > > > > No, I don't see problems related to EOWNERDEAD. > > > > > > > > > > > > Sorry, what I meant is that the TID is needed for efficient > > > > > > reporting > > > > > > of > > > > > > usage errors. It's not imposed by the robust list protocol > > > > > > as > > > > > > such.. > > > > > > There could be a PID-namespace-compatible robust mutex type > > > > > > that does > > > > > > not have this problem (but with less error checking). > > > > > > > > > > > > Thanks, > > > > > > Florian > > > > > > > > > > > > > > > > Are you saying that there are pthread_mutexes which can be > > > > > shared > > > > > across processes run on different pid namespaces? If yes I'm > > > > > definitely > > > > > interested on this. Can you tell me something more? > > > > > > > > You would have to add a new mutex type that is a mix of > > > > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST. Closer to the > > > > latter, > > > > but without the ownership checks. > > > > > > This is inaccurate. Robust mutexes fundamentally depend on having > > > the > > > owner's tid in the owner field, and on this value not matching > > > the > > > tid of any other task that might hold the mutex. If these > > > properties > > > don't hold, the mutex may fail to unlock when the owner dies, or > > > incorrectly unlock when another task mimicking the owner dies. > > > > > > The Linux robust mutex protocol fundamentally does not work > > > across > > > pid namespaces. > > Thank you, Rich, for the correction. > > > Looking at the code for musl 1.2.4, a pthread_mutex_t which has > > been > > initialized as shared and robust but not PI capable leaves > > uncovered > > only the case of pthread_mutex_unlock(). > > > As mentioned by Rich, since TIDs are not unique across different > > namespaces, a task might unlock a mutex hold by another one if they > > have the same TID. > > > > I don't see other possible errors, am I missing something? > > The kernel code uses the owner TID to handle some special cases: > > /* > * Special case for regular (non PI) futexes. The unlock > path in > * user space has two race scenarios: > * > * 1. The unlock path releases the user space futex value > and > * before it can execute the futex() syscall to wake up > * waiters it is killed. > * > * 2. A woken up waiter is killed before it can acquire the > * futex in user space. > * > * In the second case, the wake up notification could be > generated > * by the unlock path in user space after setting the futex > value > * to zero or by the kernel after setting the OWNER_DIED bit > below. 
> * > * In both cases the TID validation below prevents a wakeup > of > * potential waiters which can cause these waiters to block > * forever. > * > * In both cases the following conditions are met: > * > * 1) task->robust_list->list_op_pending != NULL > * @pending_op == true > * 2) The owner part of user space futex value == 0 > * 3) Regular futex: @pi == false > * > * If these conditions are met, it is safe to attempt waking > up a > * potential waiter without touching the user space futex > value and > * trying to set the OWNER_DIED bit. If the futex value is > zero, > * the rest of the user space mutex state is consistent, so > a woken > * waiter will just take over the uncontended futex. Setting > the > * OWNER_DIED bit would create inconsistent state and > malfunction > * of the user space owner died handling. Otherwise, the > OWNER_DIED > * bit is already set, and the woken waiter is expected to > deal with > * this. > */ > owner = uval & FUTEX_TID_MASK; > > if (pending_op && !pi && !owner) { > futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1, > FUTEX_BITSET_MATCH_ANY); > return 0; > } > > As a result, it's definitely just a userspace-only change if you need > to > use the robust mutex list across PID namespaces. > I tried to understand what you mean here but can't: can you please explain me which userspace-only change is needed? > Thanks, > Florian > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-06 7:45 ` Daniele Personal @ 2025-02-07 16:19 ` Rich Felker 2025-02-08 9:20 ` Daniele Dario 2025-02-07 16:34 ` Florian Weimer 1 sibling, 1 reply; 28+ messages in thread From: Rich Felker @ 2025-02-07 16:19 UTC (permalink / raw) To: Daniele Personal; +Cc: Florian Weimer, musl On Thu, Feb 06, 2025 at 08:45:14AM +0100, Daniele Personal wrote: > On Wed, 2025-02-05 at 11:32 +0100, Florian Weimer wrote: > > * Daniele Personal: > > > > > On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote: > > > > On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote: > > > > > * Daniele Personal: > > > > > > > > > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote: > > > > > > > * Daniele Personal: > > > > > > > > > > > > > > > > Is this required for implementing the unlock-if-not- > > > > > > > > > owner > > > > > > > > > error > > > > > > > > > code > > > > > > > > > on mutex unlock? > > > > > > > > > > > > > > > > No, I don't see problems related to EOWNERDEAD. > > > > > > > > > > > > > > Sorry, what I meant is that the TID is needed for efficient > > > > > > > reporting > > > > > > > of > > > > > > > usage errors. It's not imposed by the robust list protocol > > > > > > > as > > > > > > > such.. > > > > > > > There could be a PID-namespace-compatible robust mutex type > > > > > > > that does > > > > > > > not have this problem (but with less error checking). > > > > > > > > > > > > > > Thanks, > > > > > > > Florian > > > > > > > > > > > > > > > > > > > Are you saying that there are pthread_mutexes which can be > > > > > > shared > > > > > > across processes run on different pid namespaces? If yes I'm > > > > > > definitely > > > > > > interested on this. Can you tell me something more? > > > > > > > > > > You would have to add a new mutex type that is a mix of > > > > > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST. Closer to the > > > > > latter, > > > > > but without the ownership checks. > > > > > > > > This is inaccurate. Robust mutexes fundamentally depend on having > > > > the > > > > owner's tid in the owner field, and on this value not matching > > > > the > > > > tid of any other task that might hold the mutex. If these > > > > properties > > > > don't hold, the mutex may fail to unlock when the owner dies, or > > > > incorrectly unlock when another task mimicking the owner dies. > > > > > > > > The Linux robust mutex protocol fundamentally does not work > > > > across > > > > pid namespaces. > > > > Thank you, Rich, for the correction. > > > > > Looking at the code for musl 1.2.4, a pthread_mutex_t which has > > > been > > > initialized as shared and robust but not PI capable leaves > > > uncovered > > > only the case of pthread_mutex_unlock(). > > > > > As mentioned by Rich, since TIDs are not unique across different > > > namespaces, a task might unlock a mutex hold by another one if they > > > have the same TID. > > > > > > I don't see other possible errors, am I missing something? > > > > The kernel code uses the owner TID to handle some special cases: > > > > /* > > * Special case for regular (non PI) futexes. The unlock > > path in > > * user space has two race scenarios: > > * > > * 1. The unlock path releases the user space futex value > > and > > * before it can execute the futex() syscall to wake up > > * waiters it is killed. > > * > > * 2. A woken up waiter is killed before it can acquire the > > * futex in user space. 
> > * > > * In the second case, the wake up notification could be > > generated > > * by the unlock path in user space after setting the futex > > value > > * to zero or by the kernel after setting the OWNER_DIED bit > > below. > > * > > * In both cases the TID validation below prevents a wakeup > > of > > * potential waiters which can cause these waiters to block > > * forever. > > * > > * In both cases the following conditions are met: > > * > > * 1) task->robust_list->list_op_pending != NULL > > * @pending_op == true > > * 2) The owner part of user space futex value == 0 > > * 3) Regular futex: @pi == false > > * > > * If these conditions are met, it is safe to attempt waking > > up a > > * potential waiter without touching the user space futex > > value and > > * trying to set the OWNER_DIED bit. If the futex value is > > zero, > > * the rest of the user space mutex state is consistent, so > > a woken > > * waiter will just take over the uncontended futex. Setting > > the > > * OWNER_DIED bit would create inconsistent state and > > malfunction > > * of the user space owner died handling. Otherwise, the > > OWNER_DIED > > * bit is already set, and the woken waiter is expected to > > deal with > > * this. > > */ > > owner = uval & FUTEX_TID_MASK; > > > > if (pending_op && !pi && !owner) { > > futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1, > > FUTEX_BITSET_MATCH_ANY); > > return 0; > > } > > > > As a result, it's definitely just a userspace-only change if you need > > to > > use the robust mutex list across PID namespaces. > > > > I tried to understand what you mean here but can't: can you please > explain me which userspace-only change is needed? No such change is possible. Robust futexes inherently rely on the kernel being able to evaluate, on async process death, whether the dying task was the owner of a mutex in the robust list. This depends on the tid stored in memory being an accurate and unique identifier for the task. If you violate this, you can hack things make the userspace side work, but the whole robust functionality you want will fail to work. Rich ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-07 16:19 ` Rich Felker @ 2025-02-08 9:20 ` Daniele Dario 2025-02-08 12:39 ` Rich Felker 0 siblings, 1 reply; 28+ messages in thread From: Daniele Dario @ 2025-02-08 9:20 UTC (permalink / raw) To: Rich Felker; +Cc: Florian Weimer, musl [-- Attachment #1: Type: text/plain, Size: 6136 bytes --] But wouldn't this mean that robust mutexes functionality is totally incompatible with pid namespaces? If the kernel relies on tid stored in memory by the process this always lacks the information about the pid namespace the tid belongs to. Daniele. Il giorno ven 7 feb 2025 alle ore 17:19 Rich Felker <dalias@libc.org> ha scritto: > On Thu, Feb 06, 2025 at 08:45:14AM +0100, Daniele Personal wrote: > > On Wed, 2025-02-05 at 11:32 +0100, Florian Weimer wrote: > > > * Daniele Personal: > > > > > > > On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote: > > > > > On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote: > > > > > > * Daniele Personal: > > > > > > > > > > > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote: > > > > > > > > * Daniele Personal: > > > > > > > > > > > > > > > > > > Is this required for implementing the unlock-if-not- > > > > > > > > > > owner > > > > > > > > > > error > > > > > > > > > > code > > > > > > > > > > on mutex unlock? > > > > > > > > > > > > > > > > > > No, I don't see problems related to EOWNERDEAD. > > > > > > > > > > > > > > > > Sorry, what I meant is that the TID is needed for efficient > > > > > > > > reporting > > > > > > > > of > > > > > > > > usage errors. It's not imposed by the robust list protocol > > > > > > > > as > > > > > > > > such.. > > > > > > > > There could be a PID-namespace-compatible robust mutex type > > > > > > > > that does > > > > > > > > not have this problem (but with less error checking). > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Florian > > > > > > > > > > > > > > > > > > > > > > Are you saying that there are pthread_mutexes which can be > > > > > > > shared > > > > > > > across processes run on different pid namespaces? If yes I'm > > > > > > > definitely > > > > > > > interested on this. Can you tell me something more? > > > > > > > > > > > > You would have to add a new mutex type that is a mix of > > > > > > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST. Closer to the > > > > > > latter, > > > > > > but without the ownership checks. > > > > > > > > > > This is inaccurate. Robust mutexes fundamentally depend on having > > > > > the > > > > > owner's tid in the owner field, and on this value not matching > > > > > the > > > > > tid of any other task that might hold the mutex. If these > > > > > properties > > > > > don't hold, the mutex may fail to unlock when the owner dies, or > > > > > incorrectly unlock when another task mimicking the owner dies. > > > > > > > > > > The Linux robust mutex protocol fundamentally does not work > > > > > across > > > > > pid namespaces. > > > > > > Thank you, Rich, for the correction. > > > > > > > Looking at the code for musl 1.2.4, a pthread_mutex_t which has > > > > been > > > > initialized as shared and robust but not PI capable leaves > > > > uncovered > > > > only the case of pthread_mutex_unlock(). > > > > > > > As mentioned by Rich, since TIDs are not unique across different > > > > namespaces, a task might unlock a mutex hold by another one if they > > > > have the same TID. > > > > > > > > I don't see other possible errors, am I missing something? 
> > > > > > The kernel code uses the owner TID to handle some special cases: > > > > > > /* > > > * Special case for regular (non PI) futexes. The unlock > > > path in > > > * user space has two race scenarios: > > > * > > > * 1. The unlock path releases the user space futex value > > > and > > > * before it can execute the futex() syscall to wake up > > > * waiters it is killed. > > > * > > > * 2. A woken up waiter is killed before it can acquire the > > > * futex in user space. > > > * > > > * In the second case, the wake up notification could be > > > generated > > > * by the unlock path in user space after setting the futex > > > value > > > * to zero or by the kernel after setting the OWNER_DIED bit > > > below. > > > * > > > * In both cases the TID validation below prevents a wakeup > > > of > > > * potential waiters which can cause these waiters to block > > > * forever. > > > * > > > * In both cases the following conditions are met: > > > * > > > * 1) task->robust_list->list_op_pending != NULL > > > * @pending_op == true > > > * 2) The owner part of user space futex value == 0 > > > * 3) Regular futex: @pi == false > > > * > > > * If these conditions are met, it is safe to attempt waking > > > up a > > > * potential waiter without touching the user space futex > > > value and > > > * trying to set the OWNER_DIED bit. If the futex value is > > > zero, > > > * the rest of the user space mutex state is consistent, so > > > a woken > > > * waiter will just take over the uncontended futex. Setting > > > the > > > * OWNER_DIED bit would create inconsistent state and > > > malfunction > > > * of the user space owner died handling. Otherwise, the > > > OWNER_DIED > > > * bit is already set, and the woken waiter is expected to > > > deal with > > > * this. > > > */ > > > owner = uval & FUTEX_TID_MASK; > > > > > > if (pending_op && !pi && !owner) { > > > futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1, > > > FUTEX_BITSET_MATCH_ANY); > > > return 0; > > > } > > > > > > As a result, it's definitely just a userspace-only change if you need > > > to > > > use the robust mutex list across PID namespaces. > > > > > > > I tried to understand what you mean here but can't: can you please > > explain me which userspace-only change is needed? > > No such change is possible. Robust futexes inherently rely on the > kernel being able to evaluate, on async process death, whether the > dying task was the owner of a mutex in the robust list. This depends > on the tid stored in memory being an accurate and unique identifier > for the task. If you violate this, you can hack things make the > userspace side work, but the whole robust functionality you want will > fail to work. > > Rich > [-- Attachment #2: Type: text/html, Size: 8645 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-08 9:20 ` Daniele Dario @ 2025-02-08 12:39 ` Rich Felker 2025-02-08 14:40 ` Daniele Dario 0 siblings, 1 reply; 28+ messages in thread From: Rich Felker @ 2025-02-08 12:39 UTC (permalink / raw) To: Daniele Dario; +Cc: Florian Weimer, musl On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote: > But wouldn't this mean that robust mutexes functionality is totally > incompatible with pid namespaces? No, only with trying to synchronize *across* different pid namespaces. > If the kernel relies on tid stored in memory by the process this always > lacks the information about the pid namespace the tid belongs to. It's necessarily within the same pid namespace as the process itself. Functionally, you should consider different pid namespaces as different systems that happen to be capable of sharing some resources. Rich > Il giorno ven 7 feb 2025 alle ore 17:19 Rich Felker <dalias@libc.org> ha > scritto: > > > On Thu, Feb 06, 2025 at 08:45:14AM +0100, Daniele Personal wrote: > > > On Wed, 2025-02-05 at 11:32 +0100, Florian Weimer wrote: > > > > * Daniele Personal: > > > > > > > > > On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote: > > > > > > On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote: > > > > > > > * Daniele Personal: > > > > > > > > > > > > > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote: > > > > > > > > > * Daniele Personal: > > > > > > > > > > > > > > > > > > > > Is this required for implementing the unlock-if-not- > > > > > > > > > > > owner > > > > > > > > > > > error > > > > > > > > > > > code > > > > > > > > > > > on mutex unlock? > > > > > > > > > > > > > > > > > > > > No, I don't see problems related to EOWNERDEAD. > > > > > > > > > > > > > > > > > > Sorry, what I meant is that the TID is needed for efficient > > > > > > > > > reporting > > > > > > > > > of > > > > > > > > > usage errors. It's not imposed by the robust list protocol > > > > > > > > > as > > > > > > > > > such.. > > > > > > > > > There could be a PID-namespace-compatible robust mutex type > > > > > > > > > that does > > > > > > > > > not have this problem (but with less error checking). > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Florian > > > > > > > > > > > > > > > > > > > > > > > > > Are you saying that there are pthread_mutexes which can be > > > > > > > > shared > > > > > > > > across processes run on different pid namespaces? If yes I'm > > > > > > > > definitely > > > > > > > > interested on this. Can you tell me something more? > > > > > > > > > > > > > > You would have to add a new mutex type that is a mix of > > > > > > > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST. Closer to the > > > > > > > latter, > > > > > > > but without the ownership checks. > > > > > > > > > > > > This is inaccurate. Robust mutexes fundamentally depend on having > > > > > > the > > > > > > owner's tid in the owner field, and on this value not matching > > > > > > the > > > > > > tid of any other task that might hold the mutex. If these > > > > > > properties > > > > > > don't hold, the mutex may fail to unlock when the owner dies, or > > > > > > incorrectly unlock when another task mimicking the owner dies. > > > > > > > > > > > > The Linux robust mutex protocol fundamentally does not work > > > > > > across > > > > > > pid namespaces. > > > > > > > > Thank you, Rich, for the correction. 
> > > > > > > > > Looking at the code for musl 1.2.4, a pthread_mutex_t which has > > > > > been > > > > > initialized as shared and robust but not PI capable leaves > > > > > uncovered > > > > > only the case of pthread_mutex_unlock(). > > > > > > > > > As mentioned by Rich, since TIDs are not unique across different > > > > > namespaces, a task might unlock a mutex hold by another one if they > > > > > have the same TID. > > > > > > > > > > I don't see other possible errors, am I missing something? > > > > > > > > The kernel code uses the owner TID to handle some special cases: > > > > > > > > /* > > > > * Special case for regular (non PI) futexes. The unlock > > > > path in > > > > * user space has two race scenarios: > > > > * > > > > * 1. The unlock path releases the user space futex value > > > > and > > > > * before it can execute the futex() syscall to wake up > > > > * waiters it is killed. > > > > * > > > > * 2. A woken up waiter is killed before it can acquire the > > > > * futex in user space. > > > > * > > > > * In the second case, the wake up notification could be > > > > generated > > > > * by the unlock path in user space after setting the futex > > > > value > > > > * to zero or by the kernel after setting the OWNER_DIED bit > > > > below. > > > > * > > > > * In both cases the TID validation below prevents a wakeup > > > > of > > > > * potential waiters which can cause these waiters to block > > > > * forever. > > > > * > > > > * In both cases the following conditions are met: > > > > * > > > > * 1) task->robust_list->list_op_pending != NULL > > > > * @pending_op == true > > > > * 2) The owner part of user space futex value == 0 > > > > * 3) Regular futex: @pi == false > > > > * > > > > * If these conditions are met, it is safe to attempt waking > > > > up a > > > > * potential waiter without touching the user space futex > > > > value and > > > > * trying to set the OWNER_DIED bit. If the futex value is > > > > zero, > > > > * the rest of the user space mutex state is consistent, so > > > > a woken > > > > * waiter will just take over the uncontended futex. Setting > > > > the > > > > * OWNER_DIED bit would create inconsistent state and > > > > malfunction > > > > * of the user space owner died handling. Otherwise, the > > > > OWNER_DIED > > > > * bit is already set, and the woken waiter is expected to > > > > deal with > > > > * this. > > > > */ > > > > owner = uval & FUTEX_TID_MASK; > > > > > > > > if (pending_op && !pi && !owner) { > > > > futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1, > > > > FUTEX_BITSET_MATCH_ANY); > > > > return 0; > > > > } > > > > > > > > As a result, it's definitely just a userspace-only change if you need > > > > to > > > > use the robust mutex list across PID namespaces. > > > > > > > > > > I tried to understand what you mean here but can't: can you please > > > explain me which userspace-only change is needed? > > > > No such change is possible. Robust futexes inherently rely on the > > kernel being able to evaluate, on async process death, whether the > > dying task was the owner of a mutex in the robust list. This depends > > on the tid stored in memory being an accurate and unique identifier > > for the task. If you violate this, you can hack things make the > > userspace side work, but the whole robust functionality you want will > > fail to work. > > > > Rich > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-08 12:39 ` Rich Felker @ 2025-02-08 14:40 ` Daniele Dario 2025-02-08 14:52 ` Rich Felker 0 siblings, 1 reply; 28+ messages in thread From: Daniele Dario @ 2025-02-08 14:40 UTC (permalink / raw) To: Rich Felker; +Cc: Florian Weimer, musl [-- Attachment #1: Type: text/plain, Size: 7596 bytes --] Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha scritto: > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote: > > But wouldn't this mean that robust mutexes functionality is totally > > incompatible with pid namespaces? > > No, only with trying to synchronize *across* different pid namespaces. > > > If the kernel relies on tid stored in memory by the process this always > > lacks the information about the pid namespace the tid belongs to. > > It's necessarily within the same pid namespace as the process itself. > > Functionally, you should consider different pid namespaces as > different systems that happen to be capable of sharing some resources. > > Rich > Yes, I'm just saying that sharing pthread_mutex_t instances across processes within the same pid namespace but on a system with more than a pid namespace could lead to issues anyway if the stored tid value is used by the kernel as who to contact without the knowledge of on which pid namespace. I not saying this is true, I'm trying to understand and if possible, improve things. Daniele > > > > Il giorno ven 7 feb 2025 alle ore 17:19 Rich Felker <dalias@libc.org> ha > > scritto: > > > > > On Thu, Feb 06, 2025 at 08:45:14AM +0100, Daniele Personal wrote: > > > > On Wed, 2025-02-05 at 11:32 +0100, Florian Weimer wrote: > > > > > * Daniele Personal: > > > > > > > > > > > On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote: > > > > > > > On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote: > > > > > > > > * Daniele Personal: > > > > > > > > > > > > > > > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote: > > > > > > > > > > * Daniele Personal: > > > > > > > > > > > > > > > > > > > > > > Is this required for implementing the unlock-if-not- > > > > > > > > > > > > owner > > > > > > > > > > > > error > > > > > > > > > > > > code > > > > > > > > > > > > on mutex unlock? > > > > > > > > > > > > > > > > > > > > > > No, I don't see problems related to EOWNERDEAD. > > > > > > > > > > > > > > > > > > > > Sorry, what I meant is that the TID is needed for > efficient > > > > > > > > > > reporting > > > > > > > > > > of > > > > > > > > > > usage errors. It's not imposed by the robust list > protocol > > > > > > > > > > as > > > > > > > > > > such.. > > > > > > > > > > There could be a PID-namespace-compatible robust mutex > type > > > > > > > > > > that does > > > > > > > > > > not have this problem (but with less error checking). > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Florian > > > > > > > > > > > > > > > > > > > > > > > > > > > > Are you saying that there are pthread_mutexes which can be > > > > > > > > > shared > > > > > > > > > across processes run on different pid namespaces? If yes > I'm > > > > > > > > > definitely > > > > > > > > > interested on this. Can you tell me something more? > > > > > > > > > > > > > > > > You would have to add a new mutex type that is a mix of > > > > > > > > PTHREAD_MUTEX_NORMAL amd PTHREAD_MUTEX_ROBUST. Closer to the > > > > > > > > latter, > > > > > > > > but without the ownership checks. > > > > > > > > > > > > > > This is inaccurate. 
Robust mutexes fundamentally depend on > having > > > > > > > the > > > > > > > owner's tid in the owner field, and on this value not matching > > > > > > > the > > > > > > > tid of any other task that might hold the mutex. If these > > > > > > > properties > > > > > > > don't hold, the mutex may fail to unlock when the owner dies, > or > > > > > > > incorrectly unlock when another task mimicking the owner dies. > > > > > > > > > > > > > > The Linux robust mutex protocol fundamentally does not work > > > > > > > across > > > > > > > pid namespaces. > > > > > > > > > > Thank you, Rich, for the correction. > > > > > > > > > > > Looking at the code for musl 1.2.4, a pthread_mutex_t which has > > > > > > been > > > > > > initialized as shared and robust but not PI capable leaves > > > > > > uncovered > > > > > > only the case of pthread_mutex_unlock(). > > > > > > > > > > > As mentioned by Rich, since TIDs are not unique across different > > > > > > namespaces, a task might unlock a mutex hold by another one if > they > > > > > > have the same TID. > > > > > > > > > > > > I don't see other possible errors, am I missing something? > > > > > > > > > > The kernel code uses the owner TID to handle some special cases: > > > > > > > > > > /* > > > > > * Special case for regular (non PI) futexes. The unlock > > > > > path in > > > > > * user space has two race scenarios: > > > > > * > > > > > * 1. The unlock path releases the user space futex value > > > > > and > > > > > * before it can execute the futex() syscall to wake up > > > > > * waiters it is killed. > > > > > * > > > > > * 2. A woken up waiter is killed before it can acquire the > > > > > * futex in user space. > > > > > * > > > > > * In the second case, the wake up notification could be > > > > > generated > > > > > * by the unlock path in user space after setting the futex > > > > > value > > > > > * to zero or by the kernel after setting the OWNER_DIED bit > > > > > below. > > > > > * > > > > > * In both cases the TID validation below prevents a wakeup > > > > > of > > > > > * potential waiters which can cause these waiters to block > > > > > * forever. > > > > > * > > > > > * In both cases the following conditions are met: > > > > > * > > > > > * 1) task->robust_list->list_op_pending != NULL > > > > > * @pending_op == true > > > > > * 2) The owner part of user space futex value == 0 > > > > > * 3) Regular futex: @pi == false > > > > > * > > > > > * If these conditions are met, it is safe to attempt waking > > > > > up a > > > > > * potential waiter without touching the user space futex > > > > > value and > > > > > * trying to set the OWNER_DIED bit. If the futex value is > > > > > zero, > > > > > * the rest of the user space mutex state is consistent, so > > > > > a woken > > > > > * waiter will just take over the uncontended futex. Setting > > > > > the > > > > > * OWNER_DIED bit would create inconsistent state and > > > > > malfunction > > > > > * of the user space owner died handling. Otherwise, the > > > > > OWNER_DIED > > > > > * bit is already set, and the woken waiter is expected to > > > > > deal with > > > > > * this. > > > > > */ > > > > > owner = uval & FUTEX_TID_MASK; > > > > > > > > > > if (pending_op && !pi && !owner) { > > > > > futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1, > > > > > FUTEX_BITSET_MATCH_ANY); > > > > > return 0; > > > > > } > > > > > > > > > > As a result, it's definitely just a userspace-only change if you > need > > > > > to > > > > > use the robust mutex list across PID namespaces. 
> > > > > > > > > > > > > I tried to understand what you mean here but can't: can you please > > > > explain me which userspace-only change is needed? > > > > > > No such change is possible. Robust futexes inherently rely on the > > > kernel being able to evaluate, on async process death, whether the > > > dying task was the owner of a mutex in the robust list. This depends > > > on the tid stored in memory being an accurate and unique identifier > > > for the task. If you violate this, you can hack things make the > > > userspace side work, but the whole robust functionality you want will > > > fail to work. > > > > > > Rich > > > > [-- Attachment #2: Type: text/html, Size: 11432 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-08 14:40 ` Daniele Dario @ 2025-02-08 14:52 ` Rich Felker 2025-02-10 16:12 ` Daniele Personal 0 siblings, 1 reply; 28+ messages in thread From: Rich Felker @ 2025-02-08 14:52 UTC (permalink / raw) To: Daniele Dario; +Cc: Florian Weimer, musl On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote: > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha scritto: > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote: > > > But wouldn't this mean that robust mutexes functionality is totally > > > incompatible with pid namespaces? > > > > No, only with trying to synchronize *across* different pid namespaces. > > > > > If the kernel relies on tid stored in memory by the process this always > > > lacks the information about the pid namespace the tid belongs to. > > > > It's necessarily within the same pid namespace as the process itself. > > > > Functionally, you should consider different pid namespaces as > > different systems that happen to be capable of sharing some resources. > > > > Rich > > > > Yes, I'm just saying that sharing pthread_mutex_t instances across > processes within the same pid namespace but on a system with more than a > pid namespace could lead to issues anyway if the stored tid value is used > by the kernel as who to contact without the knowledge of on which pid > namespace. > > I not saying this is true, I'm trying to understand and if possible, > improve things. That's not a problem. The stored tid is used only in the context of a process exiting, where the kernel code knows the relevant pid namespace (the one the exiting process is in) and uses the tid relative to that. If it didn't work this way, it would be a fatal bug in the pid namespace implementation, which is supposed to allow essentially transparent containerization (which includes processes in the ns being able to use their tids as they could if they were outside of any container/in global ns). Rich ^ permalink raw reply [flat|nested] 28+ messages in thread
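The "stored tid is used only in the context of a process exiting" refers to the robust-list protocol: each thread registers a struct robust_list_head with set_robust_list(2), and at exit the kernel walks that list in the dying task's own pid namespace. A minimal sketch using the uapi definitions follows; in a real program libc performs this registration itself, so the helper is purely illustrative.

/* Sketch of robust-list registration as defined by the kernel uapi headers.
 * libc registers one of these per thread automatically; the helper below is
 * only to show the structure the kernel walks when the task exits. */
#define _GNU_SOURCE
#include <linux/futex.h>     /* struct robust_list, struct robust_list_head */
#include <sys/syscall.h>
#include <unistd.h>

static struct robust_list_head head = {
    .list            = { &head.list },  /* empty circular list */
    .futex_offset    = 0,               /* offset from list entry to lock word */
    .list_op_pending = 0,
};

static long register_robust_list(void)
{
    /* When this thread exits, the kernel walks `head` and, for every lock
     * word whose TID field matches the dying thread, sets FUTEX_OWNER_DIED
     * and wakes one waiter.  The TID comparison happens in the exiting
     * task's own pid namespace, which is the assumption described above. */
    return syscall(SYS_set_robust_list, &head, sizeof head);
}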
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-08 14:52 ` Rich Felker @ 2025-02-10 16:12 ` Daniele Personal 2025-02-10 18:14 ` Rich Felker 2025-02-10 18:44 ` Jeffrey Walton 0 siblings, 2 replies; 28+ messages in thread From: Daniele Personal @ 2025-02-10 16:12 UTC (permalink / raw) To: Rich Felker; +Cc: Florian Weimer, musl On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote: > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote: > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha scritto: > > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote: > > > > But wouldn't this mean that robust mutexes functionality is > > > > totally > > > > incompatible with pid namespaces? > > > > > > No, only with trying to synchronize *across* different pid > > > namespaces. > > > > > > > If the kernel relies on tid stored in memory by the process > > > > this always > > > > lacks the information about the pid namespace the tid belongs > > > > to. > > > > > > It's necessarily within the same pid namespace as the process > > > itself. > > > > > > Functionally, you should consider different pid namespaces as > > > different systems that happen to be capable of sharing some > > > resources. > > > > > > Rich > > > > > > > Yes, I'm just saying that sharing pthread_mutex_t instances across > > processes within the same pid namespace but on a system with more > > than a > > pid namespace could lead to issues anyway if the stored tid value > > is used > > by the kernel as who to contact without the knowledge of on which > > pid > > namespace. > > > > I not saying this is true, I'm trying to understand and if > > possible, > > improve things. > > That's not a problem. The stored tid is used only in the context of a > process exiting, where the kernel code knows the relevant pid > namespace (the one the exiting process is in) and uses the tid > relative to that. If it didn't work this way, it would be a fatal bug > in the pid namespace implementation, which is supposed to allow > essentially transparent containerization (which includes processes in > the ns being able to use their tids as they could if they were > outside > of any container/in global ns). > > Rich > So, IIUC, the problem of sharing robust pthread_mutex_t instances across different pid namespaces is on the user space side which is not able to distinguish clashes on TIDs. In particular, problems could arise when: * an application tries to unlock a mutex owned by another one with its same TID but on a different pid namespace (but this is an application design problem and libc can't help because TIDs are not unique across different pid namespaces) * an application tries to lock a mutex owned by another one with its same TID but on a different pid namespace: this is a real issue because it could happen I know that pid namespace isolation usually comes also with ipc namespace isolation but it is not a violation to have one without the other. Wouldn't it be a good idea to figure out a way to have a safe way to use robust mutexes shared across different pid namespaces? Daniele. ^ permalink raw reply [flat|nested] 28+ messages in thread
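Both scenarios above come down to TIDs repeating across pid namespaces. A small standalone demo (not from the thread) shows how the collision arises: the first child created after unshare(CLONE_NEWPID) is pid/tid 1 in the new namespace, while the initial namespace already has its own pid 1. It assumes the caller has CAP_SYS_ADMIN.

/* Standalone demo of the TID collision: the first child created after
 * unshare(CLONE_NEWPID) is pid/tid 1 in the new namespace, while the initial
 * namespace already has its own pid 1.  Needs CAP_SYS_ADMIN. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    if (unshare(CLONE_NEWPID) != 0) {   /* children go into a new pid ns */
        perror("unshare");
        return 1;
    }
    pid_t child = fork();               /* first task in the new namespace */
    if (child == 0) {
        /* Prints 1: a robust-mutex owner field written with this TID would
         * collide with any other "tid 1" on the same machine. */
        printf("in new pid ns: getpid() = %d\n", (int)getpid());
        _exit(0);
    }
    /* Seen from the parent's namespace the same child has an ordinary pid. */
    printf("in parent ns:  child pid   = %d\n", (int)child);
    waitpid(child, NULL, 0);
    return 0;
}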
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-10 16:12 ` Daniele Personal @ 2025-02-10 18:14 ` Rich Felker 2025-02-11 9:34 ` Daniele Personal 2025-02-10 18:44 ` Jeffrey Walton 1 sibling, 1 reply; 28+ messages in thread From: Rich Felker @ 2025-02-10 18:14 UTC (permalink / raw) To: Daniele Personal; +Cc: Florian Weimer, musl On Mon, Feb 10, 2025 at 05:12:52PM +0100, Daniele Personal wrote: > On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote: > > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote: > > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha scritto: > > > > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote: > > > > > But wouldn't this mean that robust mutexes functionality is > > > > > totally > > > > > incompatible with pid namespaces? > > > > > > > > No, only with trying to synchronize *across* different pid > > > > namespaces. > > > > > > > > > If the kernel relies on tid stored in memory by the process > > > > > this always > > > > > lacks the information about the pid namespace the tid belongs > > > > > to. > > > > > > > > It's necessarily within the same pid namespace as the process > > > > itself. > > > > > > > > Functionally, you should consider different pid namespaces as > > > > different systems that happen to be capable of sharing some > > > > resources. > > > > > > > > Rich > > > > > > > > > > Yes, I'm just saying that sharing pthread_mutex_t instances across > > > processes within the same pid namespace but on a system with more > > > than a > > > pid namespace could lead to issues anyway if the stored tid value > > > is used > > > by the kernel as who to contact without the knowledge of on which > > > pid > > > namespace. > > > > > > I not saying this is true, I'm trying to understand and if > > > possible, > > > improve things. > > > > That's not a problem. The stored tid is used only in the context of a > > process exiting, where the kernel code knows the relevant pid > > namespace (the one the exiting process is in) and uses the tid > > relative to that. If it didn't work this way, it would be a fatal bug > > in the pid namespace implementation, which is supposed to allow > > essentially transparent containerization (which includes processes in > > the ns being able to use their tids as they could if they were > > outside > > of any container/in global ns). > > > > Rich > > > > So, IIUC, the problem of sharing robust pthread_mutex_t instances > across different pid namespaces is on the user space side which is not > able to distinguish clashes on TIDs. In particular, problems could > arise when: No, it is not "on the user side". The user side can be modified arbitrarily, and, modulo some cost, could surely be made to work for non-robust process-shared mutexes. The problem is that the kernel -- the part which makes them robust -- has to honor the protocol, and the protocol does not admit distinguishing "pid N in ns X" from "pid N in ns Y". 
> * an application tries to unlock a mutex owned by another one with its > same TID but on a different pid namespace (but this is an > application design problem and libc can't help because TIDs are not > unique across different pid namespaces) > * an application tries to lock a mutex owned by another one with its > same TID but on a different pid namespace: this is a real issue > because it could happen > > I know that pid namespace isolation usually comes also with ipc > namespace isolation but it is not a violation to have one without the > other. Wouldn't it be a good idea to figure out a way to have a safe > way to use robust mutexes shared across different pid namespaces? I do not consider this a reasonable expenditure of complexity whatsoever. It would require at least having a new robust list protocol, with userspace having to support both the old and new ones adapting at runtime, and may even require larger-than-wordsize atomics, which are not something you can assume exists. All of this for the explicit purpose of *violating* the whole intended purpose of namespaces: the isolation. For cases where you really need cross-ns locking, you could use sysv semaphores if the sysvipc namespace is shared. If it's not, you could use fcntl OFD locks on a shared file descriptor, which should have your needed robustness properties. Rich ^ permalink raw reply [flat|nested] 28+ messages in thread
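A minimal sketch of the OFD-lock alternative suggested above, assuming the processes can share (or bind-mount) a lock file; the path and helper name are illustrative. The lock belongs to the open file description rather than to a pid/tid, so the kernel drops it automatically when the holder dies, which gives the robustness property without depending on thread identifiers. Requires Linux >= 3.15 and _GNU_SOURCE for F_OFD_SETLKW.

/* Minimal sketch of the OFD-lock approach: the lock is owned by the open
 * file description, not by a pid/tid, and the kernel releases it when the
 * last reference to that description goes away (e.g. the holder dies).
 * The path is an example; keep the returned fd open while holding the lock. */
#define _GNU_SOURCE                     /* F_OFD_SETLKW */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int storage_lock(const char *path)      /* e.g. a lock file next to the shm */
{
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type   = F_WRLCK;              /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;                    /* 0 = lock the whole file */
    fl.l_pid    = 0;                    /* must be 0 for OFD locks */

    if (fcntl(fd, F_OFD_SETLKW, &fl) != 0) {  /* blocks until acquired */
        close(fd);
        return -1;
    }
    return fd;                          /* close(fd) releases the lock */
}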
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-10 18:14 ` Rich Felker @ 2025-02-11 9:34 ` Daniele Personal 2025-02-11 11:38 ` Rich Felker 0 siblings, 1 reply; 28+ messages in thread From: Daniele Personal @ 2025-02-11 9:34 UTC (permalink / raw) To: Rich Felker; +Cc: Florian Weimer, musl On Mon, 2025-02-10 at 13:14 -0500, Rich Felker wrote: > On Mon, Feb 10, 2025 at 05:12:52PM +0100, Daniele Personal wrote: > > On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote: > > > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote: > > > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha > > > > scritto: > > > > > > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario > > > > > wrote: > > > > > > But wouldn't this mean that robust mutexes functionality is > > > > > > totally > > > > > > incompatible with pid namespaces? > > > > > > > > > > No, only with trying to synchronize *across* different pid > > > > > namespaces. > > > > > > > > > > > If the kernel relies on tid stored in memory by the process > > > > > > this always > > > > > > lacks the information about the pid namespace the tid > > > > > > belongs > > > > > > to. > > > > > > > > > > It's necessarily within the same pid namespace as the process > > > > > itself. > > > > > > > > > > Functionally, you should consider different pid namespaces as > > > > > different systems that happen to be capable of sharing some > > > > > resources. > > > > > > > > > > Rich > > > > > > > > > > > > > Yes, I'm just saying that sharing pthread_mutex_t instances > > > > across > > > > processes within the same pid namespace but on a system with > > > > more > > > > than a > > > > pid namespace could lead to issues anyway if the stored tid > > > > value > > > > is used > > > > by the kernel as who to contact without the knowledge of on > > > > which > > > > pid > > > > namespace. > > > > > > > > I not saying this is true, I'm trying to understand and if > > > > possible, > > > > improve things. > > > > > > That's not a problem. The stored tid is used only in the context > > > of a > > > process exiting, where the kernel code knows the relevant pid > > > namespace (the one the exiting process is in) and uses the tid > > > relative to that. If it didn't work this way, it would be a fatal > > > bug > > > in the pid namespace implementation, which is supposed to allow > > > essentially transparent containerization (which includes > > > processes in > > > the ns being able to use their tids as they could if they were > > > outside > > > of any container/in global ns). > > > > > > Rich > > > > > > > So, IIUC, the problem of sharing robust pthread_mutex_t instances > > across different pid namespaces is on the user space side which is > > not > > able to distinguish clashes on TIDs. In particular, problems could > > arise when: > > No, it is not "on the user side". The user side can be modified > arbitrarily, and, modulo some cost, could surely be made to work for > non-robust process-shared mutexes. The problem is that the kernel -- > the part which makes them robust -- has to honor the protocol, and > the > protocol does not admit distinguishing "pid N in ns X" from "pid N in > ns Y". Ah, I thought your previous sentence was saying that the kernel is able to make this distinction. 
> > * an application tries to unlock a mutex owned by another one with > > its > > same TID but on a different pid namespace (but this is an > > application design problem and libc can't help because TIDs are > > not > > unique across different pid namespaces) > > * an application tries to lock a mutex owned by another one with > > its > > same TID but on a different pid namespace: this is a real issue > > because it could happen > > > > I know that pid namespace isolation usually comes also with ipc > > namespace isolation but it is not a violation to have one without > > the > > other. Wouldn't it be a good idea to figure out a way to have a > > safe > > way to use robust mutexes shared across different pid namespaces? > > I do not consider this a reasonable expenditure of complexity > whatsoever. It would require at least having a new robust list > protocol, with userspace having to support both the old and new ones > adapting at runtime, and may even require larger-than-wordsize > atomics, which are not something you can assume exists. All of this > for the explicit purpose of *violating* the whole intended purpose of > namespaces: the isolation. > > For cases where you really need cross-ns locking, you could use sysv > semaphores if the sysvipc namespace is shared. If it's not, you could > use fcntl OFD locks on a shared file descriptor, which should have > your needed robustness properties. > > Rich Unfortunately it is not possible to say which variables need cross-ns locking and which not. This means that we should treat all in the same way and so replace all the mutexes with sysv semaphores, but this has some costs: locking sysv semaphores always requires syscalls and context switches between user and kernel space even if there's no contention, and moreover they imply the presence of accessible files. We basically use a chunk of shared memory as storage where variables can be added/read/written by the various applications. Since the mutexes used to protect the variables are embedded in the same chunk of shared memory, applications only need an mmap in order to access the storage. Up to now, applications were running in the same pid namespace but now, for some products, we needed to integrate a 3rd party application which requires a certain degree of isolation, so we opted to containerize it; this is why I asked for clarifications. I get your point when you say that sharing robust pthread_mutex_t instances violates the pid namespace isolation, but you choose the degree of isolation balancing the risks and the benefits. Even if you have a new mount namespace you can decide to bind mount some parts of the filesystem to allow access to parts of the host flash, for instance; the same could happen with the network. Long story short, I'm pulling water to my mill, but I think that it's not bad to have POSIX robust shared mutexes working across different pid namespaces. It would allow users to use a really powerful tool also with containerized applications (again pulling water to my mill) which need it. If there's any idea on how to achieve this I'd really work on it: limiting the max number of pids which could run in a pid namespace to free up some bits for the ns in the tid stored in the robust list, for instance? On the other hand I'll surely try what you suggested. Thanks, Daniele. ^ permalink raw reply [flat|nested] 28+ messages in thread
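A sketch of the kind of storage layout described above: the protected data and its process-shared, robust mutex embedded in one mmapped region. The shm name, the struct contents and the "creator initializes once" convention are assumptions for illustration, not the actual project layout.

/* Sketch of the storage layout described above: protected data plus its
 * robust, process-shared mutex in one mmapped region.  The creating process
 * initializes once; the others just shm_open() + mmap() the same name. */
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

struct storage {
    pthread_mutex_t lock;   /* PTHREAD_PROCESS_SHARED + PTHREAD_MUTEX_ROBUST */
    int             value;  /* stand-in for the exchanged variables */
};

struct storage *storage_create(void)
{
    int fd = shm_open("/storage-example", O_RDWR | O_CREAT, 0666);
    if (fd < 0 || ftruncate(fd, sizeof(struct storage)) != 0)
        return NULL;

    struct storage *st = mmap(NULL, sizeof *st, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    close(fd);
    if (st == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t a;
    pthread_mutexattr_init(&a);
    pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&a, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&st->lock, &a);  /* done once, by the creating process */
    pthread_mutexattr_destroy(&a);
    return st;
}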
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-11 9:34 ` Daniele Personal @ 2025-02-11 11:38 ` Rich Felker 2025-02-11 13:53 ` Daniele Personal 0 siblings, 1 reply; 28+ messages in thread From: Rich Felker @ 2025-02-11 11:38 UTC (permalink / raw) To: Daniele Personal; +Cc: Florian Weimer, musl On Tue, Feb 11, 2025 at 10:34:30AM +0100, Daniele Personal wrote: > On Mon, 2025-02-10 at 13:14 -0500, Rich Felker wrote: > > On Mon, Feb 10, 2025 at 05:12:52PM +0100, Daniele Personal wrote: > > > On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote: > > > > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote: > > > > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha > > > > > scritto: > > > > > > > > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario > > > > > > wrote: > > > > > > > But wouldn't this mean that robust mutexes functionality is > > > > > > > totally > > > > > > > incompatible with pid namespaces? > > > > > > > > > > > > No, only with trying to synchronize *across* different pid > > > > > > namespaces. > > > > > > > > > > > > > If the kernel relies on tid stored in memory by the process > > > > > > > this always > > > > > > > lacks the information about the pid namespace the tid > > > > > > > belongs > > > > > > > to. > > > > > > > > > > > > It's necessarily within the same pid namespace as the process > > > > > > itself. > > > > > > > > > > > > Functionally, you should consider different pid namespaces as > > > > > > different systems that happen to be capable of sharing some > > > > > > resources. > > > > > > > > > > > > Rich > > > > > > > > > > > > > > > > Yes, I'm just saying that sharing pthread_mutex_t instances > > > > > across > > > > > processes within the same pid namespace but on a system with > > > > > more > > > > > than a > > > > > pid namespace could lead to issues anyway if the stored tid > > > > > value > > > > > is used > > > > > by the kernel as who to contact without the knowledge of on > > > > > which > > > > > pid > > > > > namespace. > > > > > > > > > > I not saying this is true, I'm trying to understand and if > > > > > possible, > > > > > improve things. > > > > > > > > That's not a problem. The stored tid is used only in the context > > > > of a > > > > process exiting, where the kernel code knows the relevant pid > > > > namespace (the one the exiting process is in) and uses the tid > > > > relative to that. If it didn't work this way, it would be a fatal > > > > bug > > > > in the pid namespace implementation, which is supposed to allow > > > > essentially transparent containerization (which includes > > > > processes in > > > > the ns being able to use their tids as they could if they were > > > > outside > > > > of any container/in global ns). > > > > > > > > Rich > > > > > > > > > > So, IIUC, the problem of sharing robust pthread_mutex_t instances > > > across different pid namespaces is on the user space side which is > > > not > > > able to distinguish clashes on TIDs. In particular, problems could > > > arise when: > > > > No, it is not "on the user side". The user side can be modified > > arbitrarily, and, modulo some cost, could surely be made to work for > > non-robust process-shared mutexes. The problem is that the kernel -- > > the part which makes them robust -- has to honor the protocol, and > > the > > protocol does not admit distinguishing "pid N in ns X" from "pid N in > > ns Y". 
> > Ah, I thought your previous sentence was saying that the kernel is able > to make this distinction. No, it's able to make the *assumption* that the namespace the tid is relative to is that of the dying process. That's what lets it work (and a large part of why namespaces were practical to add to Linux to begin with -- all of the existing interfaces that use pids/tids need to know which namespace you're talking about, but they work because the kernel can assume "same namespace as the executing task"). > Unfortunately it is not possible to say which variables need cross-ns > locking and which not. This means that we should treat all in the same > way and so replace all the mutexes with sysv semaphores but this has > some costs: locking sysv semaphores always require syscalls and context > switch between user/kernel spaces even if there's no contention and > moreover, they imply the presence of accessible files. > > We basically use a chunk of shared memory as a storage where variables > could be added/read/written by the various applications. Since mutexes > used to protect the variables are embedded in the same chunk of shared > memory, there is only an mmap needed in order to access the storage by > applications. > > Up to now, applications were running in the same pid namespace but now, > for some products, we needed to integrate a 3rd party application and > this requires a certain degree of isolation so we opted to containerize > this application and here we come to why I asked for clarifications. > > I get your point when you say that sharing robust pthread_mutex_t > instances violates the pid namespace isolation but you choose the > degree of isolation balancing the risks and the benefits. Even if you > have a new mount namespace you can decide to bind mount some parts of > the filesystem to allow access to pars of the host flash for instance, > same could happen with network. > > Long story short, I'm pulling water to my mill, but I think that it's > not bad to have posix robust shared mutexes working across different > pid namespaces. It will allow users to use a really powerful tool also > with containerized applications (again pulling water to my mill) which > need it. Generally we implement nonstandard functionality only on the basis of strong historical precedent, need by multiple major real-world applications, lack of cost imposed onto everyone else who doesn't want/need the functionality, and other similar conditions. On all of these axes, the thing you're asking for is completely in the opposite direction. > If there's any idea on how to gain this I'd really work on it: limiting > the max number of pids which could run on a pid namespace to allow the > use of some bits for the ns in the tid stored in the robust list for > instance? This is something where you're on your own either writing it or hiring someone to do so and maintianing your forks of musl and the kernel. There is just no way this kind of hack ever belongs upstream. Rich ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-11 11:38 ` Rich Felker @ 2025-02-11 13:53 ` Daniele Personal 0 siblings, 0 replies; 28+ messages in thread From: Daniele Personal @ 2025-02-11 13:53 UTC (permalink / raw) To: Rich Felker; +Cc: Florian Weimer, musl On Tue, 2025-02-11 at 06:38 -0500, Rich Felker wrote: > On Tue, Feb 11, 2025 at 10:34:30AM +0100, Daniele Personal wrote: > > On Mon, 2025-02-10 at 13:14 -0500, Rich Felker wrote: > > > On Mon, Feb 10, 2025 at 05:12:52PM +0100, Daniele Personal wrote: > > > > On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote: > > > > > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario > > > > > wrote: > > > > > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha > > > > > > scritto: > > > > > > > > > > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario > > > > > > > wrote: > > > > > > > > But wouldn't this mean that robust mutexes > > > > > > > > functionality is > > > > > > > > totally > > > > > > > > incompatible with pid namespaces? > > > > > > > > > > > > > > No, only with trying to synchronize *across* different > > > > > > > pid > > > > > > > namespaces. > > > > > > > > > > > > > > > If the kernel relies on tid stored in memory by the > > > > > > > > process > > > > > > > > this always > > > > > > > > lacks the information about the pid namespace the tid > > > > > > > > belongs > > > > > > > > to. > > > > > > > > > > > > > > It's necessarily within the same pid namespace as the > > > > > > > process > > > > > > > itself. > > > > > > > > > > > > > > Functionally, you should consider different pid > > > > > > > namespaces as > > > > > > > different systems that happen to be capable of sharing > > > > > > > some > > > > > > > resources. > > > > > > > > > > > > > > Rich > > > > > > > > > > > > > > > > > > > Yes, I'm just saying that sharing pthread_mutex_t instances > > > > > > across > > > > > > processes within the same pid namespace but on a system > > > > > > with > > > > > > more > > > > > > than a > > > > > > pid namespace could lead to issues anyway if the stored tid > > > > > > value > > > > > > is used > > > > > > by the kernel as who to contact without the knowledge of on > > > > > > which > > > > > > pid > > > > > > namespace. > > > > > > > > > > > > I not saying this is true, I'm trying to understand and if > > > > > > possible, > > > > > > improve things. > > > > > > > > > > That's not a problem. The stored tid is used only in the > > > > > context > > > > > of a > > > > > process exiting, where the kernel code knows the relevant pid > > > > > namespace (the one the exiting process is in) and uses the > > > > > tid > > > > > relative to that. If it didn't work this way, it would be a > > > > > fatal > > > > > bug > > > > > in the pid namespace implementation, which is supposed to > > > > > allow > > > > > essentially transparent containerization (which includes > > > > > processes in > > > > > the ns being able to use their tids as they could if they > > > > > were > > > > > outside > > > > > of any container/in global ns). > > > > > > > > > > Rich > > > > > > > > > > > > > So, IIUC, the problem of sharing robust pthread_mutex_t > > > > instances > > > > across different pid namespaces is on the user space side which > > > > is > > > > not > > > > able to distinguish clashes on TIDs. In particular, problems > > > > could > > > > arise when: > > > > > > No, it is not "on the user side". 
The user side can be modified > > > arbitrarily, and, modulo some cost, could surely be made to work > > > for > > > non-robust process-shared mutexes. The problem is that the kernel > > > -- > > > the part which makes them robust -- has to honor the protocol, > > > and > > > the > > > protocol does not admit distinguishing "pid N in ns X" from "pid > > > N in > > > ns Y". > > > > Ah, I thought your previous sentence was saying that the kernel is > > able > > to make this distinction. > > No, it's able to make the *assumption* that the namespace the tid is > relative to is that of the dying process. That's what lets it work > (and a large part of why namespaces were practical to add to Linux to > begin with -- all of the existing interfaces that use pids/tids need > to know which namespace you're talking about, but they work because > the kernel can assume "same namespace as the executing task"). > > > Unfortunately it is not possible to say which variables need cross- > > ns > > locking and which not. This means that we should treat all in the > > same > > way and so replace all the mutexes with sysv semaphores but this > > has > > some costs: locking sysv semaphores always require syscalls and > > context > > switch between user/kernel spaces even if there's no contention and > > moreover, they imply the presence of accessible files. > > > > We basically use a chunk of shared memory as a storage where > > variables > > could be added/read/written by the various applications. Since > > mutexes > > used to protect the variables are embedded in the same chunk of > > shared > > memory, there is only an mmap needed in order to access the storage > > by > > applications. > > > > Up to now, applications were running in the same pid namespace but > > now, > > for some products, we needed to integrate a 3rd party application > > and > > this requires a certain degree of isolation so we opted to > > containerize > > this application and here we come to why I asked for > > clarifications. > > > > I get your point when you say that sharing robust pthread_mutex_t > > instances violates the pid namespace isolation but you choose the > > degree of isolation balancing the risks and the benefits. Even if > > you > > have a new mount namespace you can decide to bind mount some parts > > of > > the filesystem to allow access to pars of the host flash for > > instance, > > same could happen with network. > > > > Long story short, I'm pulling water to my mill, but I think that > > it's > > not bad to have posix robust shared mutexes working across > > different > > pid namespaces. It will allow users to use a really powerful tool > > also > > with containerized applications (again pulling water to my mill) > > which > > need it. > > Generally we implement nonstandard functionality only on the basis of > strong historical precedent, need by multiple major real-world > applications, lack of cost imposed onto everyone else who doesn't > want/need the functionality, and other similar conditions. On all of > these axes, the thing you're asking for is completely in the opposite > direction. > > > If there's any idea on how to gain this I'd really work on it: > > limiting > > the max number of pids which could run on a pid namespace to allow > > the > > use of some bits for the ns in the tid stored in the robust list > > for > > instance? > > This is something where you're on your own either writing it or > hiring > someone to do so and maintianing your forks of musl and the kernel. 
> There is just no way this kind of hack ever belongs upstream. > > Rich Thanks for the time you spent on this, I really appreciated it. Daniele. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-10 16:12 ` Daniele Personal 2025-02-10 18:14 ` Rich Felker @ 2025-02-10 18:44 ` Jeffrey Walton 2025-02-10 18:58 ` Rich Felker 1 sibling, 1 reply; 28+ messages in thread From: Jeffrey Walton @ 2025-02-10 18:44 UTC (permalink / raw) To: musl; +Cc: Rich Felker, Florian Weimer On Mon, Feb 10, 2025 at 11:13 AM Daniele Personal <d.dario76@gmail.com> wrote: > > On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote: > > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote: > > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha scritto: > > > > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote: > > > > > But wouldn't this mean that robust mutexes functionality is > > > > > totally > > > > > incompatible with pid namespaces? > > > > > > > > No, only with trying to synchronize *across* different pid > > > > namespaces. > > > > > > > > > If the kernel relies on tid stored in memory by the process > > > > > this always > > > > > lacks the information about the pid namespace the tid belongs > > > > > to. > > > > > > > > It's necessarily within the same pid namespace as the process > > > > itself. > > > > > > > > Functionally, you should consider different pid namespaces as > > > > different systems that happen to be capable of sharing some > > > > resources. > > > > > > Yes, I'm just saying that sharing pthread_mutex_t instances across > > > processes within the same pid namespace but on a system with more > > > than a > > > pid namespace could lead to issues anyway if the stored tid value > > > is used > > > by the kernel as who to contact without the knowledge of on which > > > pid > > > namespace. > > > > > > I not saying this is true, I'm trying to understand and if > > > possible, > > > improve things. > > > > That's not a problem. The stored tid is used only in the context of a > > process exiting, where the kernel code knows the relevant pid > > namespace (the one the exiting process is in) and uses the tid > > relative to that. If it didn't work this way, it would be a fatal bug > > in the pid namespace implementation, which is supposed to allow > > essentially transparent containerization (which includes processes in > > the ns being able to use their tids as they could if they were > > outside > > of any container/in global ns). > > So, IIUC, the problem of sharing robust pthread_mutex_t instances > across different pid namespaces is on the user space side which is not > able to distinguish clashes on TIDs. In particular, problems could > arise when: > * an application tries to unlock a mutex owned by another one with its > same TID but on a different pid namespace (but this is an > application design problem and libc can't help because TIDs are not > unique across different pid namespaces) > * an application tries to lock a mutex owned by another one with its > same TID but on a different pid namespace: this is a real issue > because it could happen > > I know that pid namespace isolation usually comes also with ipc > namespace isolation but it is not a violation to have one without the > other. Wouldn't it be a good idea to figure out a way to have a safe > way to use robust mutexes shared across different pid namespaces? It's been a while since I took my computer science classes, but... It sounds like (to me) the wrong tool is being used for the job. If you want a synchronization object that works across processes, then you use a semaphore, and not a pthread mutex. 
Jeff ^ permalink raw reply [flat|nested] 28+ messages in thread
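The semaphore suggestion here (and the earlier sysv-semaphore suggestion) usually means a System V semaphore taken with SEM_UNDO, so the kernel reverses the operation if the holder exits. A minimal sketch follows; the key, the binary-semaphore convention and the helper names are illustrative, and the classic initialization race of sysv semaphores is ignored for brevity. This works across pid namespaces as long as the ipc namespace is shared.

/* Minimal sketch of a System V semaphore used as a robust cross-process
 * lock: SEM_UNDO makes the kernel undo the "down" if the holder exits. */
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

union semun { int val; struct semid_ds *buf; unsigned short *array; };

static int svsem_create(key_t key)
{
    int id = semget(key, 1, IPC_CREAT | 0666);
    if (id >= 0) {
        union semun arg = { .val = 1 };          /* binary semaphore, free */
        semctl(id, 0, SETVAL, arg);
    }
    return id;
}

static int svsem_lock(int id)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
    return semop(id, &op, 1);   /* blocks; undone by the kernel on exit */
}

static int svsem_unlock(int id)
{
    struct sembuf op = { .sem_num = 0, .sem_op = +1, .sem_flg = SEM_UNDO };
    return semop(id, &op, 1);
}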
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-10 18:44 ` Jeffrey Walton @ 2025-02-10 18:58 ` Rich Felker 0 siblings, 0 replies; 28+ messages in thread From: Rich Felker @ 2025-02-10 18:58 UTC (permalink / raw) To: Jeffrey Walton; +Cc: musl, Florian Weimer On Mon, Feb 10, 2025 at 01:44:12PM -0500, Jeffrey Walton wrote: > On Mon, Feb 10, 2025 at 11:13 AM Daniele Personal <d.dario76@gmail.com> wrote: > > > > On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote: > > > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote: > > > > Il sab 8 feb 2025, 13:39 Rich Felker <dalias@libc.org> ha scritto: > > > > > > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote: > > > > > > But wouldn't this mean that robust mutexes functionality is > > > > > > totally > > > > > > incompatible with pid namespaces? > > > > > > > > > > No, only with trying to synchronize *across* different pid > > > > > namespaces. > > > > > > > > > > > If the kernel relies on tid stored in memory by the process > > > > > > this always > > > > > > lacks the information about the pid namespace the tid belongs > > > > > > to. > > > > > > > > > > It's necessarily within the same pid namespace as the process > > > > > itself. > > > > > > > > > > Functionally, you should consider different pid namespaces as > > > > > different systems that happen to be capable of sharing some > > > > > resources. > > > > > > > > Yes, I'm just saying that sharing pthread_mutex_t instances across > > > > processes within the same pid namespace but on a system with more > > > > than a > > > > pid namespace could lead to issues anyway if the stored tid value > > > > is used > > > > by the kernel as who to contact without the knowledge of on which > > > > pid > > > > namespace. > > > > > > > > I not saying this is true, I'm trying to understand and if > > > > possible, > > > > improve things. > > > > > > That's not a problem. The stored tid is used only in the context of a > > > process exiting, where the kernel code knows the relevant pid > > > namespace (the one the exiting process is in) and uses the tid > > > relative to that. If it didn't work this way, it would be a fatal bug > > > in the pid namespace implementation, which is supposed to allow > > > essentially transparent containerization (which includes processes in > > > the ns being able to use their tids as they could if they were > > > outside > > > of any container/in global ns). > > > > So, IIUC, the problem of sharing robust pthread_mutex_t instances > > across different pid namespaces is on the user space side which is not > > able to distinguish clashes on TIDs. In particular, problems could > > arise when: > > * an application tries to unlock a mutex owned by another one with its > > same TID but on a different pid namespace (but this is an > > application design problem and libc can't help because TIDs are not > > unique across different pid namespaces) > > * an application tries to lock a mutex owned by another one with its > > same TID but on a different pid namespace: this is a real issue > > because it could happen > > > > I know that pid namespace isolation usually comes also with ipc > > namespace isolation but it is not a violation to have one without the > > other. Wouldn't it be a good idea to figure out a way to have a safe > > way to use robust mutexes shared across different pid namespaces? > > It's been a while since I took my computer science classes, but... 
> > It sounds like (to me) the wrong tool is being used for the job. If > you want a synchronization object that works across processes, then > you use a semaphore, and not a pthread mutex. There are process-shared mutexes and robust process-shared mutexes that automatically unlock and notify the new owner on next acquire if a process died while owning the mutex, possibly leaving the protected data inconsistent. These are very reasonable tools to use, but they don't work across the boundaries between different physical or logical systems. Rich ^ permalink raw reply [flat|nested] 28+ messages in thread
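In code, the "automatically unlock and notify the new owner on next acquire" behaviour is the EOWNERDEAD / pthread_mutex_consistent() protocol. A minimal sketch of the acquiring side; the repair() callback is a placeholder for whatever fixes up the protected data.

/* What "notify the new owner on next acquire" looks like in code: lock()
 * returns EOWNERDEAD when the previous owner died, the new owner repairs
 * the protected data and marks the mutex consistent before using it
 * normally.  repair() is a placeholder for the application's recovery. */
#include <errno.h>
#include <pthread.h>

int lock_and_recover(pthread_mutex_t *m, void (*repair)(void))
{
    int r = pthread_mutex_lock(m);
    if (r == EOWNERDEAD) {
        repair();                       /* put the shared state back in order */
        pthread_mutex_consistent(m);    /* without this, unlocking would make
                                           the mutex ENOTRECOVERABLE for all */
        r = 0;
    }
    return r;                           /* 0 on success, else an errno code */
}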
* Re: [musl] pthread_mutex_t shared between processes with different pid namespaces 2025-02-06 7:45 ` Daniele Personal 2025-02-07 16:19 ` Rich Felker @ 2025-02-07 16:34 ` Florian Weimer 1 sibling, 0 replies; 28+ messages in thread From: Florian Weimer @ 2025-02-07 16:34 UTC (permalink / raw) To: Daniele Personal; +Cc: Rich Felker, musl * Daniele Personal: >> As a result, it's definitely [not] just a userspace-only change if >> you need to use the robust mutex list across PID namespaces. >> > > I tried to understand what you mean here but can't: can you please > explain me which userspace-only change is needed? Sorry, there was a "not" missing in the quoted conclusion. Florian ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2025-02-11 13:53 UTC | newest] Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-01-28 13:22 [musl] pthread_mutex_t shared between processes with different pid namespaces Daniele Personal 2025-01-28 15:02 ` Rich Felker 2025-01-28 16:13 ` Daniele Personal 2025-01-28 18:24 ` Florian Weimer 2025-01-31 9:31 ` Daniele Personal 2025-01-31 20:30 ` Markus Wichmann 2025-02-03 13:54 ` Daniele Personal 2025-02-01 16:03 ` Florian Weimer 2025-02-03 12:58 ` Daniele Personal 2025-02-03 17:25 ` Florian Weimer 2025-02-04 16:48 ` Daniele Personal 2025-02-04 18:53 ` Rich Felker 2025-02-05 10:17 ` Daniele Personal 2025-02-05 10:32 ` Florian Weimer 2025-02-06 7:45 ` Daniele Personal 2025-02-07 16:19 ` Rich Felker 2025-02-08 9:20 ` Daniele Dario 2025-02-08 12:39 ` Rich Felker 2025-02-08 14:40 ` Daniele Dario 2025-02-08 14:52 ` Rich Felker 2025-02-10 16:12 ` Daniele Personal 2025-02-10 18:14 ` Rich Felker 2025-02-11 9:34 ` Daniele Personal 2025-02-11 11:38 ` Rich Felker 2025-02-11 13:53 ` Daniele Personal 2025-02-10 18:44 ` Jeffrey Walton 2025-02-10 18:58 ` Rich Felker 2025-02-07 16:34 ` Florian Weimer
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).