From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: [musl] [PATCH v3] MT-fork (series)
Date: Thu, 12 Nov 2020 22:37:28 -0500	[thread overview]
Message-ID: <20201113033728.GU534@brightrain.aerifal.cx> (raw)
In-Reply-To: <20201112232655.GJ1370092@port70.net>

Thanks for the detailed review! Replies inline below:

On Fri, Nov 13, 2020 at 12:26:55AM +0100, Szabolcs Nagy wrote:
> > +static ssize_t read_loop(int fd, void *p, size_t n)
> > +{
> > +	for (size_t i=0; i<n; ) {
> > +		ssize_t l = read(fd, (char *)p+i, n-i);
> > +		if (l<0) {
> > +			if (errno==EINTR) continue;
> > +			else return -1;
> > +		}
> > +		if (l==0) return i;
> > +		i += l;
> > +	}
> > +	return n;
> > +}
> > +
> >  static void *mmap_fixed(void *p, size_t n, int prot, int flags, int fd, off_t off)
> >  {
> >  	static int no_map_fixed;
> > @@ -1060,13 +1073,17 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
> >  				snprintf(etc_ldso_path, sizeof etc_ldso_path,
> >  					"%.*s/etc/ld-musl-" LDSO_ARCH ".path",
> >  					(int)prefix_len, prefix);
> > -				FILE *f = fopen(etc_ldso_path, "rbe");
> > -				if (f) {
> > -					if (getdelim(&sys_path, (size_t[1]){0}, 0, f) <= 0) {
> > +				fd = open(etc_ldso_path, O_RDONLY|O_CLOEXEC);
> > +				if (fd>=0) {
> > +					size_t n = 0;
> > +					if (!fstat(fd, &st)) n = st.st_size;
> > +					if ((sys_path = malloc(n+1)))
> > +						sys_path[n] = 0;
> > +					if (!sys_path || read_loop(fd, sys_path, n)<0) {
> 
> should this handle the short read case?
> 
> i assume we only want to support atomic updates to
> the path file so there should not be a short read,
> but i think rejecting read_loop(,,n)!=n is safer.

We could reject != n, but it's possible to have races where you read a
partial file even when the return value is n (because fstat only
returned the size of the partial file). In fact the only way you'd get
a return value <n is if the file were truncated after the fstat;
normally you'd see the opposite, file growing after fstat. I'm not
opposed to changing it but I don't think it gives any practical
improvement to safety. The only actually-safe way to change this file
is by atomic replacement.
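
For illustration, "atomic replacement" here means roughly the following
(a minimal sketch, not part of the patch; the filename is just an
example for one arch):

	#include <stdio.h>
	#include <unistd.h>

	/* Write the new contents to a temp file on the same filesystem,
	 * then rename() it over the old file. rename() replaces the
	 * target atomically, so the dynamic linker sees either the old
	 * file or the complete new one, never a partial write. */
	static int replace_ldso_path(const char *contents)
	{
		const char *tmp = "/etc/ld-musl-x86_64.path.tmp";
		FILE *f = fopen(tmp, "w");
		if (!f) return -1;
		int err = fputs(contents, f) < 0;
		err |= fflush(f) != 0;
		err |= fsync(fileno(f)) != 0;
		err |= fclose(f) != 0;
		if (err) {
			unlink(tmp);
			return -1;
		}
		return rename(tmp, "/etc/ld-musl-x86_64.path");
	}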

> > diff --git a/src/malloc/lite_malloc.c b/src/malloc/lite_malloc.c
> > index f8931ba5..0f461617 100644
> > --- a/src/malloc/lite_malloc.c
> > +++ b/src/malloc/lite_malloc.c
> > @@ -100,4 +100,16 @@ static void *__simple_malloc(size_t n)
> >  	return p;
> >  }
> >  
> > -weak_alias(__simple_malloc, malloc);
> > +weak_alias(__simple_malloc, __libc_malloc_impl);
> > +
> > +void *__libc_malloc(size_t n)
> > +{
> > +	return __libc_malloc_impl(n);
> > +}
> > +
> > +static void *default_malloc(size_t n)
> > +{
> > +	return __libc_malloc_impl(n);
> > +}
> > +
> > +weak_alias(default_malloc, malloc);
> 
> 
> maybe i'm missing something but i thought it would be enough to do
> 
> weak_alias(__simple_malloc, __libc_malloc);
> static void *default_malloc(size_t n)
> {
> 	return __libc_malloc(n);
> }
> weak_alias(default_malloc, malloc);
> 
> here and have strong __libc_malloc symbol in the malloc implementation.

That's what I did at first, and it works, but it keeps the property of
depending on order of files in the archive, which was undesirable.
Previously I left it alone because it avoided jumping thru an
additional entry point, but now it makes no difference for the public
malloc, and the libc-internal one is not at all performance-relevant.
So we're just wasting a few bytes of size here to avoid depending on
the link order hack, and I think that's a reasonable trade.

> > diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> > index 61714f40..6b868c84 100644
> > --- a/ldso/dynlink.c
> > +++ b/ldso/dynlink.c
> > @@ -20,6 +20,7 @@
> >  #include <semaphore.h>
> >  #include <sys/membarrier.h>
> >  #include "pthread_impl.h"
> > +#include "fork_impl.h"
> >  #include "libc.h"
> >  #include "dynlink.h"
> >  
> > @@ -1426,6 +1427,17 @@ void __libc_exit_fini()
> >  	}
> >  }
> >  
> > +void __ldso_atfork(int who)
> > +{
> > +	if (who<0) {
> > +		pthread_rwlock_wrlock(&lock);
> > +		pthread_mutex_lock(&init_fini_lock);
> > +	} else {
> > +		pthread_mutex_unlock(&init_fini_lock);
> > +		pthread_rwlock_unlock(&lock);
> > +	}
> > +}
> > +
> >  static struct dso **queue_ctors(struct dso *dso)
> >  {
> >  	size_t cnt, qpos, spos, i;
> > @@ -1484,6 +1496,13 @@ static struct dso **queue_ctors(struct dso *dso)
> >  	}
> >  	queue[qpos] = 0;
> >  	for (i=0; i<qpos; i++) queue[i]->mark = 0;
> > +	for (i=0; i<qpos; i++)
> > +		if (queue[i]->ctor_visitor && queue[i]->ctor_visitor->tid < 0) {
> > +			error("State of %s is inconsistent due to multithreaded fork\n",
> > +				queue[i]->name);
> > +			free(queue);
> > +			if (runtime) longjmp(*rtld_fail, 1);
> > +		}
> 
> 
> hm since fork takes the init_fini_lock i guess
> the ctors could be finished in the child if
> necessary. or is there some problem with that?
> 
> otherwise the patch looks good to me.

The init_fini_lock is not held across running of ctors/dtors, only to
protect access to the queue. Otherwise it would break concurrent
dlopen, recursive dlopen, etc. (which can be delayed arbitrarily long
by ctor execution). In fact if the lock worked like that and fork took
the lock, fork would also be delayed arbitrarily long and you could
not fork from ctors.
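
To illustrate the pattern (a rough sketch of the shape of the logic,
not the actual dynlink.c code; next_pending and run_ctors are made-up
helpers):

	/* The lock guards only the queue state; the ctor itself runs
	 * with the lock dropped, so a slow ctor can't block other
	 * threads doing dlopen -- or fork, if fork took this lock. */
	pthread_mutex_lock(&init_fini_lock);
	struct dso *p = next_pending(queue);   /* made-up helper */
	if (p) p->ctor_visitor = self;         /* record the running thread */
	pthread_mutex_unlock(&init_fini_lock);

	if (p) run_ctors(p);                   /* executes without the lock */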

The underlying issue this code is addressing is that, if a ctor was
running in another thread at the time of fork, it will never finish,
but it also can't be restarted because then parts of it would be
executed twice. We discussed this on IRC before and the only
conclusion I could come to was that the DSO is permanently unusable.

Maybe it would make sense to strip the affected DSO and all that
depend on it from the loaded list, so they'd be reloaded from the
filesystem next time they're dlopened (with the partly loaded old
mapping just abandoned). I'm not sure how bad that would be in terms
of unwanted side effects.

Note however that, with the way I implemented it here, you can avoid
the "perma-dead DSO" problem just by taking a lock around your own
calls to dlopen, and installing an atfork handler that takes that
lock. This is basically "opting into" the above "fork delayed
arbitrarily long" behavior I mentioned.
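
Something along these lines in the application, for example (a minimal
sketch; the names here are the application's own, nothing from libc):

	#include <dlfcn.h>
	#include <pthread.h>

	static pthread_mutex_t dlopen_lock = PTHREAD_MUTEX_INITIALIZER;

	static void take(void) { pthread_mutex_lock(&dlopen_lock); }
	static void drop(void) { pthread_mutex_unlock(&dlopen_lock); }

	/* Call once at startup: fork's prepare handler takes the lock,
	 * so fork can't land in the middle of one of our dlopen calls,
	 * and both parent and child release it afterwards. */
	static void init_dlopen_atfork(void)
	{
		pthread_atfork(take, drop, drop);
	}

	void *locked_dlopen(const char *name, int mode)
	{
		pthread_mutex_lock(&dlopen_lock);
		void *h = dlopen(name, mode);
		pthread_mutex_unlock(&dlopen_lock);
		return h;
	}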

Rich

