Date: Fri, 13 Nov 2020 10:22:34 +0100
From: Szabolcs Nagy
To: Rich Felker
Cc: musl@lists.openwall.com
Message-ID: <20201113092234.GK1370092@port70.net>
References: <20201111205728.GA19755@brightrain.aerifal.cx>
 <20201112232655.GJ1370092@port70.net>
 <20201113033728.GU534@brightrain.aerifal.cx>
In-Reply-To: <20201113033728.GU534@brightrain.aerifal.cx>
Subject: Re: [musl] [PATCH v3] MT-fork (series)

* Rich Felker [2020-11-12 22:37:28 -0500]:
> On Fri, Nov 13, 2020 at 12:26:55AM +0100, Szabolcs Nagy wrote:
> > > @@ -1060,13 +1073,17 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
> > >  			snprintf(etc_ldso_path, sizeof etc_ldso_path,
> > >  				"%.*s/etc/ld-musl-" LDSO_ARCH ".path",
> > >  				(int)prefix_len, prefix);
> > > -			FILE *f = fopen(etc_ldso_path, "rbe");
> > > -			if (f) {
> > > -				if (getdelim(&sys_path, (size_t[1]){0}, 0, f) <= 0) {
> > > +			fd = open(etc_ldso_path, O_RDONLY|O_CLOEXEC);
> > > +			if (fd>=0) {
> > > +				size_t n = 0;
> > > +				if (!fstat(fd, &st)) n = st.st_size;
> > > +				if ((sys_path = malloc(n+1)))
> > > +					sys_path[n] = 0;
> > > +				if (!sys_path || read_loop(fd, sys_path, n)<0) {
> >
> > should this handle the short read case?
> >
> > i assume we only want to support atomic updates to
> > the path file so there should not be a short read,
> > but i think rejecting read_loop(,,n)!=n is safer.
>
> We could reject != n, but it's possible to have races where you read a
> partial file even when the return value is n (because fstat only
> returned the size of the partial file). In fact, normally you'd see
> the opposite: the file growing after fstat. I'm not opposed to
> changing it but I don't think it gives any practical improvement to
> safety. The only actually-safe way to change this file is by atomic
> replacement.

i think !=n does not solve any problems, but can fail earlier:
if the file is concurrently truncated and we notice it here then
we can fail immediately instead of continuing with a broken path.
(it also makes it clear that we expect ==n)

> > maybe i'm missing something but i thought it would be enough to do
> >
> > weak_alias(__simple_malloc, __libc_malloc);
> > static void *default_malloc(size_t n)
> > {
> > 	return __libc_malloc(n);
> > }
> > weak_alias(default_malloc, malloc);
> >
> > here and have strong __libc_malloc symbol in the malloc implementation.
>
> That's what I did at first, and it works, but it keeps the property of
> depending on order of files in the archive, which was undesirable.
> Previously I left it alone because it avoided jumping thru an
> additional entry point, but now it makes no difference for the public
> malloc, and the libc-internal one is not at all performance-relevant.
> So we're just wasting a few bytes of size here to avoid depending on
> the link order hack, and I think that's a reasonable trade.

ok. makes sense.
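[editor's note] The short-read question above is easier to see with a concrete loop. musl's read_loop is libc-internal; the sketch below is a hypothetical user-space equivalent (the name read_full and its exact semantics are assumptions, not the actual libc helper):

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for musl's internal read_loop: keep calling
 * read() until n bytes arrive, EOF, or a hard error. Returns the
 * number of bytes read, or -1 on error. A return short of n means
 * the file was shorter than fstat reported, e.g. concurrently
 * truncated, which is the case the != n check would catch early. */
static ssize_t read_full(int fd, void *buf, size_t n)
{
	size_t got = 0;
	while (got < n) {
		ssize_t r = read(fd, (char *)buf + got, n - got);
		if (r < 0) {
			if (errno == EINTR) continue;
			return -1;
		}
		if (r == 0) break; /* EOF before n bytes */
		got += r;
	}
	return got;
}
```

As the reply notes, rejecting a short return only catches truncation after the fstat, not a file that grew after it; only atomic replacement (write a temp file, then rename over the old path) makes concurrent updates safe.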
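[editor's note] The weak_alias arrangement discussed above can be demonstrated outside libc. This is a toy sketch with made-up names (demo_libc_malloc, demo_malloc); it only illustrates the two-entry-point idea, not musl's actual build:

```c
#include <stddef.h>
#include <stdlib.h>

/* musl-style weak_alias macro (relies on the GCC/clang alias
 * attribute, as musl's own headers do). */
#define weak_alias(old, new) \
	extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

static int internal_calls;

/* Stand-in for the strong __libc_malloc entry point that
 * libc-internal callers would reach directly. */
static void *demo_libc_malloc(size_t n)
{
	internal_calls++;
	return malloc(n);
}

/* The default public-facing allocator just forwards. Because the
 * demo_malloc alias is weak, a malloc interposer defining a strong
 * demo_malloc would replace it at link time, while internal callers
 * still go through demo_libc_malloc. */
static void *default_malloc(size_t n)
{
	return demo_libc_malloc(n);
}
weak_alias(default_malloc, demo_malloc);
```

The extra forwarding call is the "few bytes of size" Rich refers to: it buys independence from the order of object files in the static archive.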
> > > +	for (i=0; i<qpos; i++)
> > > +		if (queue[i]->ctor_visitor && queue[i]->ctor_visitor->tid < 0) {
> > > +			error("State of %s is inconsistent due to multithreaded fork\n",
> > > +				queue[i]->name);
> > > +			free(queue);
> > > +			if (runtime) longjmp(*rtld_fail, 1);
> > > +		}
> >
> > hm since fork takes the init_fini_lock i guess
> > the ctors could be finished in the child if
> > necessary. or is there some problem with that?
> >
> > otherwise the patch looks good to me.
>
> The init_fini_lock is not held across running of ctors/dtors, only to
> protect access to the queue. Otherwise it would break concurrent
> dlopen, recursive dlopen, etc. (which can be delayed arbitrarily long
> by ctor execution). In fact if the lock worked like that and fork took
> the lock, fork would also be delayed arbitrarily long and you could
> not fork from ctors.
>
> The underlying issue this code is addressing is that, if a ctor was
> running in another thread at the time of fork, it will never finish,
> but it also can't be restarted, because then parts of it would be
> executed twice. We discussed this on IRC before and the only
> conclusion I could come to was that the DSO is permanently unusable.

ah right.

> Maybe it would make sense to strip the affected DSO and all that
> depend on it from the loaded list, so they'd be reloaded from the
> filesystem next time they're dlopened (with the partly loaded old
> mapping just abandoned). I'm not sure how bad that would be in terms
> of unwanted side effects.
>
> Note however that, with the way I implemented it here, you can avoid
> the "perma-dead DSO" problem just by taking a lock around your own
> calls to dlopen, and installing an atfork handler that takes that
> lock. This is basically "opting into" the above "fork delayed
> arbitrarily long" behavior I mentioned.

yes this is ok. i don't think reloading helps much
(e.g. if ctors have visible side effects) and undoing
tls may not be trivial.
(and dl_iterate_phdr can find loaded but not yet constructed dsos so this can be visible in many ways)
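[editor's note] The "lock around dlopen plus atfork handler" opt-in Rich describes could look roughly like the following in application code (the names here are illustrative, not any libc interface):

```c
#include <dlfcn.h>
#include <pthread.h>

/* Serialize this application's dlopen calls, and make fork wait for
 * any in-progress dlopen (including the ctors it runs) to finish.
 * This opts into the "fork delayed arbitrarily long" behavior, in
 * exchange for never forking while one of our ctors is mid-flight. */
static pthread_mutex_t dlopen_lock = PTHREAD_MUTEX_INITIALIZER;

static void dlopen_lock_acquire(void) { pthread_mutex_lock(&dlopen_lock); }
static void dlopen_lock_release(void) { pthread_mutex_unlock(&dlopen_lock); }

/* Call once at startup, before any thread forks or dlopens. */
static void setup_fork_vs_dlopen(void)
{
	pthread_atfork(dlopen_lock_acquire,   /* prepare: wait for any dlopen */
	               dlopen_lock_release,   /* parent: release again */
	               dlopen_lock_release);  /* child: release again */
}

static void *locked_dlopen(const char *name, int mode)
{
	pthread_mutex_lock(&dlopen_lock);
	void *h = dlopen(name, mode);
	pthread_mutex_unlock(&dlopen_lock);
	return h;
}
```

With this in place, no ctor launched by this application's own dlopen calls can be in progress in another thread at fork time, so the "State of %s is inconsistent" path in the patch is never taken for those DSOs.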