mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Alexey Izbyshev <izbyshev@ispras.ru>
To: musl@lists.openwall.com
Subject: Re: [musl] [PATCH] [RFC] make clone() usable
Date: Fri, 02 Jun 2023 10:28:09 +0300	[thread overview]
Message-ID: <67a2fd2f637d0800ff48d27588c00384@ispras.ru> (raw)
In-Reply-To: <20230601200806.GM4163@brightrain.aerifal.cx>

On 2023-06-01 23:08, Rich Felker wrote:
> On Thu, Jun 01, 2023 at 01:12:57PM +0300, Alexey Izbyshev wrote:
>> On 2023-05-31 02:35, Rich Felker wrote:
>> >As discussed before (see the 2021 thread "Incorrect thread TID
>> >caching") clone() has been effectively unusable because it produces a
>> >child process in invalid state, with wrong tid in its thread
>> >structure, among other problems.
>> >
>> >The attached proposed patch attempts to make clone() usable by having
>> >it share the _Fork logic for establishing a consistent process state
>> >after forking, and also blocks use of flags which produce invalid
>> >state.
>> >
>> Tangentially, while thinking about this, I realized that because the
>> kernel clears the address of the exit futex on clone and _Fork
>> doesn't re-establish it, the following blocks forever:
>> 
>> void *thr(void *arg) {
>>         // Blocks in __tl_sync because __thread_list_lock is never
>> unlocked
>>         pthread_join(*(pthread_t *)arg, NULL));
>>         return NULL;
>> }
>> 
>> int main() {
>>         if (!fork()) {
>>                 static pthread_t pt;
>>                 pt = pthread_self();
>>                 pthread_create(&(pthread_t){0}, NULL, thr, &pt);
>>                 pthread_exit(NULL);
>>         }
>>         wait(NULL);
>> }
>> 
>> This could be fixed by the following change in _Fork.c:
>> 
>> -               self->tid = __syscall(SYS_gettid);
>> +               self->tid = __syscall(SYS_set_tid_address,
>> &__thread_list_lock);
> 
> Wow, I had no idea Linux's fork cleared the exit futex address. That's
> a big bug we've had around for nobody to have noticed...
> 
>> >With CLONE_VM, the old behavior is kept, with a caveat that the child
>> >context is extremely restrictive, ala vfork.
>> >
>> >It was raised previously (pardon the pun) that raise() should perhaps
>> >be modified to make a SYS_gettid syscall rather than using the thread
>> >structure tid, so that it's possible to call raise() in a vfork
>> >context without signaling the wrong process. It might make sense to do
>> >that now too.
>> >
>> Yes, it would be nice to have raise and abort (which also reads the
>> cached tid separately) working in a vfork child.
>> 
>> >diff --git a/src/linux/clone.c b/src/linux/clone.c
>> >index 8c1af7d3..e8f76bbc 100644
>> >--- a/src/linux/clone.c
>> >+++ b/src/linux/clone.c
>> >@@ -4,18 +4,55 @@
>> > #include <sched.h>
>> > #include "pthread_impl.h"
>> > #include "syscall.h"
>> >+#include "lock.h"
>> >+#include "fork_impl.h"
>> >+
>> >+struct clone_start_args {
>> >+	int (*func)(void *);
>> >+	void *arg;
>> >+	sigset_t sigmask;
>> >+};
>> >+
>> >+static int clone_start(void *arg)
>> >+{
>> >+	struct clone_start_args *csa = arg;
>> >+	__post_Fork(0);
>> >+	__restore_sigs(&csa->sigmask);
>> >+	return csa->func(csa->arg);
>> >+}
>> >
>> > int clone(int (*func)(void *), void *stack, int flags, void *arg, ...)
>> > {
>> >+	struct clone_start_args csa;
>> > 	va_list ap;
>> >-	pid_t *ptid, *ctid;
>> >-	void  *tls;
>> >+	pid_t *ptid = 0, *ctid = 0;
>> >+	void  *tls = 0;
>> >+
>> >+	/* Flags that produce an invalid thread/TLS state are disallowed. */
>> >+	int badflags = CLONE_THREAD | CLONE_SETTLS | CLONE_CHILD_CLEARTID;
>> >+	if (flags & badflags)
>> >+		return __syscall_ret(-EINVAL);
>> >
>> Maybe also check for NULL stack, as proposed in
>> https://www.openwall.com/lists/musl/2022/08/02/1 ?
> 
> I guess that can be done. Normally we don't check for null pointers,
> but I think the syscall would silently keep the existing stack pointer
> if passed null rather than blowing up, which we probably don't want --
> even without CLONE_VM, where it'd blow up, we'd be giving
> appears-to-work behavior without a contract for it to work.
> 
Yes, and this is not only about CLONE_VM. On some arches (e.g. i386, 
x86_64, aarch64) musl's __clone stores things on "stack" and hence 
segfaults if "stack" is NULL, but on other arches it doesn't (e.g. arm). 
Exposing this is inconsistent and confusing. If support for passing NULL 
"stack" is ever desired, __clone should be changed first.

>> > 	va_start(ap, arg);
>> >-	ptid = va_arg(ap, pid_t *);
>> >-	tls  = va_arg(ap, void *);
>> >-	ctid = va_arg(ap, pid_t *);
>> >+	if (flags & (CLONE_PIDFD | CLONE_PARENT_SETTID | CLONE_CHILD_SETTID))
>> >+	 	ptid = va_arg(ap, pid_t *);
>> >+	if (flags & CLONE_CHILD_SETTID) {
>> >+		tls = va_arg(ap, void *);
>> >+		ctid = va_arg(ap, pid_t *);
>> >+	}
>> > 	va_end(ap);
>> >
>> >-	return __syscall_ret(__clone(func, stack, flags, arg, ptid, tls,
>> >ctid));
>> >+	if (flags & CLONE_VM)
>> >+		return __syscall_ret(__clone(func, stack, flags, arg, ptid,
>> >tls, ctid));
>> >+
>> >+	__block_all_sigs(&csa.sigmask);
>> >+	LOCK(__abort_lock);
>> >+
>> >+	csa.func = func;
>> >+	csa.arg = arg;
>> >+	int ret = __clone(clone_start, stack, flags, &csa, ptid, tls, ctid);
>> >+
>> >+	__post_Fork(ret);
>> >+	__restore_sigs(&csa.sigmask);
>> >+	return __syscall_ret(ret);
>> > }
>> 
>> Another thing that might be somewhat relevant if we expect to see
>> wider usage of clone is that syscall (the public function) currently
>> blindly extracts six arguments via va_arg, which on some
>> architectures requires sufficient stack space not to crash. So on
>> i386 the following silly function will crash if passed to clone,
>> provided that "stack" points to the beginning of unmapped space past
>> the stack page end.
> 
> That's changed in the patch. It only calls va_arg for arguments whose
> existence the flags implies.
> 
I was talking about va_arg usage in syscall() (in misc/syscall.c). 
__clone only aligns stack on 16 bytes (at least on arches that I 
checked), but doesn't reserve space. So a sufficiently unlucky 
user-provided function will crash when called from __clone in the child, 
as in my example.

Alexey

  parent reply	other threads:[~2023-06-02  7:28 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-30 23:35 Rich Felker
2023-06-01 10:12 ` Alexey Izbyshev
2023-06-01 20:08   ` Rich Felker
2023-06-01 20:17     ` [musl] [PATCH] [v2] " Rich Felker
2023-06-02  7:28     ` Alexey Izbyshev [this message]
2023-06-02 14:16       ` [musl] [PATCH] [RFC] " Rich Felker
2023-06-02 14:19     ` Markus Wichmann
2023-06-02 15:30       ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=67a2fd2f637d0800ff48d27588c00384@ispras.ru \
    --to=izbyshev@ispras.ru \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).