Date: Thu, 5 Nov 2020 22:36:17 -0500
From: Rich Felker
To: musl@lists.openwall.com
Message-ID: <20201106033616.GX534@brightrain.aerifal.cx>
In-Reply-To: <20201031033117.GH534@brightrain.aerifal.cx>
Subject: Re: [musl] [PATCH v2] MT fork

On Fri, Oct 30, 2020 at 11:31:17PM -0400, Rich Felker wrote:
> On Fri, Oct 30, 2020 at 03:31:54PM -0600, Ariadne Conill wrote:
> > Hello,
> > 
> > On Friday, October 30, 2020 12:57:17 PM MDT Rich Felker wrote:
> > > There was a regression in musl too, I think. With
> > > 27b2fc9d6db956359727a66c262f1e69995660aa you should be able to
> > > re-enable parallel mark. If you get a chance to test, let us know if
> > > it works for you.
> > 
> > I have pushed current musl git plus the MT fork patch to Alpine edge as Alpine
> > musl 1.2.2_pre0, and reenabling parallel mark has worked fine.
> > 
> > It would be nice to have a musl 1.2.2 release that I can use for the source
> > tarball instead of a git snapshot, but this will do for now.
> 
> Thanks for the feedback. I'll try to wrap up this release cycle pretty
> quickly now, since I know this has been a big stress for distros, but
> I want to make sure the MT-fork doesn't introduce other breakage.
> 
> One thing I know is potentially problematic is interaction with malloc
> replacement -- locking of any of the subsystems locked at fork time
> necessarily takes place after application atfork handlers, so if the
> malloc replacement registers atfork handlers (as many do), it could
> deadlock. I'm exploring whether malloc use in these systems can be
> eliminated. A few are almost-surely better just using direct mmap
> anyway, but for some it's borderline. I'll have a better idea sometime
> in the next few days.

OK, here's a summary of the affected locks (where there's a lock order
conflict between them and application-replaced malloc):

- atexit: uses calloc to allocate more handler slots if the builtin 32
  are exhausted. Could reasonably be changed to just mmap a whole page
  of slots in this case.

- dlerror: the lock is just for a queue of buffers to be freed on
  future calls, since they can't be freed at thread exit time because
  the calling context (a thread that's "already exited") is not valid
  to call application code from, and malloc might be replaced.
  One plausible solution here is getting rid of the free queue hack
  (and thus the lock) entirely, and instead calling libc's malloc/free
  via dlsym rather than using the potentially-replaced symbol. But
  this would not work for static linking (the same dlerror is used
  even though dlopen always fails; in the future it may work), so it's
  probably not a good approach. mmap is really not a good option here
  because of excessive memory usage. It's probably possible to just
  repeatedly unlock/relock around performing each free so that only
  one lock is held at once (rough sketch appended at the end of this
  mail).

- gettext: bindtextdomain calls calloc while holding the lock on the
  list of bindings. It could drop the lock, allocate, retake it,
  recheck for an existing binding, and free in that case, but this is
  undesirable because it introduces a dependency on free in
  static-linked programs. Otherwise, all memory gettext allocates is
  permanent. Because of this we could just mmap an area and bump
  allocate it, but that's wasteful because most programs will only use
  one tiny binding. We could also just leak on the rare possibility of
  concurrent binding allocations; the number of such leaks is bounded
  by nthreads*ndomains, and we could make it just nthreads by keeping
  and reusing abandoned ones.

- sem_open: a one-time calloc of the global semtab takes place with
  the lock held. On 32-bit archs this table is exactly 4k; on 64-bit
  it's 6k. So it seems very reasonable to just mmap instead of calloc
  (rough sketch appended at the end of this mail).

- timezone: the tz core allocates memory to remember the last-seen
  value of the TZ env var so it can react if it changes. Normally it's
  small, so perhaps we could just use a small (e.g. 32 byte) static
  buffer and replace it with a whole mmapped page if a value too large
  for that is seen.

Also, somehow I failed to find one of the important locks MT-fork
needs to be taking: locale_map.c has a lock for the records of mapped
locales. Allocation also takes place with it held, and for the same
reason as gettext it really shouldn't be changed to allocate
differently. It could possibly do the allocation without the lock
held, though, and leak it (or save it for reuse later if needed) when
another thread races to load the locale.

So anyway, it looks like there's some nontrivial work to be done here
in order to make the MT-fork not be a regression for replaced-malloc
usage... :(

Ideas for the above, and for keeping the solutions non-invasive, will
be very much welcome!

Rich
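
P.S. To make the dlerror idea concrete, here's a rough, untested
sketch of popping one queued buffer at a time so that the queue lock
is never held across the call to free(). The names and the use of a
pthread mutex are placeholders standing in for the real dlerror.c
internals, not the actual code:

#include <pthread.h>
#include <stdlib.h>

/* Buffers waiting to be freed, linked through their first bytes
 * (assumes each buffer is at least pointer-sized). */
static void *freebuf_queue;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Defer freeing a buffer that can't be freed in the current context. */
static void queue_freebuf(void *p)
{
	pthread_mutex_lock(&queue_lock);
	*(void **)p = freebuf_queue;
	freebuf_queue = p;
	pthread_mutex_unlock(&queue_lock);
}

/* Free deferred buffers one at a time; the lock is dropped before
 * each free() so it is never held while entering a possibly-replaced
 * malloc, avoiding the lock-order conflict. */
static void drain_freebuf_queue(void)
{
	for (;;) {
		pthread_mutex_lock(&queue_lock);
		void *p = freebuf_queue;
		if (p) freebuf_queue = *(void **)p;
		pthread_mutex_unlock(&queue_lock);
		if (!p) break;
		free(p);
	}
}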
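
Similarly, a rough sketch of the mmap'd semtab idea for sem_open; the
entry layout, table size, and names below are approximations rather
than the actual sem_open.c definitions:

#include <semaphore.h>
#include <sys/mman.h>
#include <sys/types.h>

#define NSEMS 256 /* stand-in for the real named-semaphore limit */

/* Approximate shape of the global table of named semaphores. */
static struct {
	ino_t ino;
	sem_t *sem;
	int refcnt;
} *semtab;

/* One-time allocation of semtab, called with the table lock held.
 * An anonymous mmap gives zero-filled memory (like calloc) without
 * ever entering a possibly-replaced malloc, so there is no lock-order
 * issue. */
static int semtab_init(void)
{
	if (semtab) return 0;
	void *p = mmap(0, NSEMS * sizeof *semtab, PROT_READ|PROT_WRITE,
	               MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) return -1;
	semtab = p;
	return 0;
}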