Date: Mon, 9 Nov 2020 13:44:56 -0500
From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: [musl] [PATCH v2] MT fork
Message-ID: <20201109184456.GB534@brightrain.aerifal.cx>

On Mon, Nov 09, 2020 at 03:01:24PM -0300, Érico Nogueira wrote:
> On Mon Nov 9, 2020 at 9:07 AM -03, Rich Felker wrote:
> > On Sun, Nov 08, 2020 at 05:12:15PM +0100, Szabolcs Nagy wrote:
> > > * Rich Felker [2020-11-05 22:36:17 -0500]:
> > > > On Fri, Oct 30, 2020 at 11:31:17PM -0400, Rich Felker wrote:
> > > > > One thing I know is potentially problematic is interaction
> > > > > with malloc replacement -- locking of any of the subsystems
> > > > > locked at fork time necessarily takes place after application
> > > > > atfork handlers, so if the malloc replacement registers
> > > > > atfork handlers (as many do), it could deadlock. I'm
> > > > > exploring whether malloc use in these systems can be
> > > > > eliminated. A few are almost surely better just using direct
> > > > > mmap anyway, but for some it's borderline. I'll have a better
> > > > > idea sometime in the next few days.
> > > >
> > > > OK, here's a summary of the affected locks (where there's a
> > > > lock order conflict between them and application-replaced
> > > > malloc):
> > >
> > > if malloc replacements take internal locks in atfork handlers
> > > then it's not just libc internal locks that can cause problems
> > > but locks taken by other atfork handlers that were registered
> > > before the malloc one.
> >
> > No other locks taken there could impede forward progress in libc.
> > The only reason malloc is special here is because we allowed the
> > application to redefine a family of functions used by libc. For
> > MT-fork with malloc inside libc, we run the malloc_atfork code
> > last so that the lock isn't held while other libc components'
> > locks are being taken. But we can't start taking libc-internal
> > locks until the application atfork handlers have run, or the
> > application could see a deadlocked libc state (e.g. if something
> > in the atfork handlers used time functions, maybe to log the time
> > of fork, or gettext functions, maybe to print a localized message,
> > etc.).
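[A sketch may make the lock-order hazard concrete. A malloc
replacement that wants to be fork-safe typically registers handlers
like the following; the lock and handler names are illustrative, not
taken from any particular allocator. The fork path then holds the
heap lock before libc takes its subsystem locks, while another thread
may hold a subsystem lock while calling the replaced calloc from
inside libc -- a classic lock-order inversion.]

	#include <pthread.h>

	/* Illustrative: how many malloc replacements serialize
	 * against fork. */
	static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

	static void prepare(void) { pthread_mutex_lock(&heap_lock); }
	static void parent(void)  { pthread_mutex_unlock(&heap_lock); }
	static void child(void)   { pthread_mutex_unlock(&heap_lock); }

	__attribute__((constructor))
	static void install_handlers(void)
	{
		/* prepare runs before fork(); parent and child run
		 * after, in the respective processes. */
		pthread_atfork(prepare, parent, child);
	}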
> > > i don't think there is a clean solution to this. i think using
> > > mmap is ugly (unless there is some reason to prefer that other
> > > than the atfork issue).
> >
> > When the size is exactly a 4k page, it's preferable in terms of
> > memory usage to use mmap instead of malloc, except on wacky archs
> > with large pages (which I don't think it makes sense to
> > memory-usage-optimize for; they're inherently and unfixably
> > memory-inefficient).
> >
> > But for the most part, yes, it's ugly and I don't want to make
> > such a change.
> >
> > > > - atexit: uses calloc to allocate more handler slots if the
> > > >   builtin 32 are exhausted. Could reasonably be changed to
> > > >   just mmap a whole page of slots in this case.
> >
> > Not sure on this. Since it's only used in the extremely rare case
> > where a huge number of atexit handlers are registered, it's
> > probably nicer to use mmap anyway -- it avoids linking a malloc
> > that will almost surely never be used by atexit.
> >
> > > > - dlerror: the lock is just for a queue of buffers to be freed
> > > >   on future calls, since they can't be freed at thread exit
> > > >   time because the calling context (thread that's "already
> > > >   exited") is not valid to call application code, and malloc
> > > >   might be replaced. one plausible solution here is getting
> > > >   rid of the free queue hack (and thus the lock) entirely and
> > > >   instead calling libc's malloc/free via dlsym rather than
> > > >   using the potentially-replaced symbol. but this would not
> > > >   work for static linking (same dlerror is used even though
> > > >   dlopen always fails; in future it may work) so it's probably
> > > >   not a good approach. mmap is really not a good option here
> > > >   because it's excessive mem usage. It's probably possible to
> > > >   just repeatedly unlock/relock around performing each free so
> > > >   that only one lock is held at once.
> >
> > I think the repeated unlock/relock works fine here, but it would
> > also work to just null out the list in the child (i.e. never free
> > any buffers that were queued to be freed in the parent before
> > fork). Fork of an MT process is inherently leaky anyway (you can
> > never free thread data from the other parent threads). I'm not
> > sure which approach is nicer. The repeated unlock/relock is less
> > logically invasive, I think.
> >
> > > > - gettext: bindtextdomain calls calloc while holding the lock
> > > >   on the list of bindings. It could drop the lock, allocate,
> > > >   retake it, recheck for an existing binding, and free in that
> > > >   case, but this is undesirable because it introduces a
> > > >   dependency on free in static-linked programs. Otherwise all
> > > >   memory gettext allocates is permanent. Because of this we
> > > >   could just mmap an area and bump allocate it, but that's
> > > >   wasteful because most programs will only use one tiny
> > > >   binding. We could also just leak on the rare possibility of
> > > >   concurrent binding allocations; the number of such leaks is
> > > >   bounded by nthreads*ndomains, and we could make it just
> > > >   nthreads by keeping and reusing abandoned ones.
> >
> > Thoughts here?
> >
> > > > - sem_open: a one-time calloc of the global semtab takes place
> > > >   with the lock held. On 32-bit archs this table is exactly
> > > >   4k; on 64-bit it's 6k. So it seems very reasonable to just
> > > >   mmap instead of calloc.
> >
> > This should almost surely just be changed to mmap. With mallocng,
> > an early malloc(4k) will take over 5k, and this table is permanent
> > so the excess will never be released. mmap avoids that entirely.
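[For concreteness, the mmap substitution in the sem_open case might
look roughly like the sketch below. The table's fields follow musl's
sem_open.c; MAP_ANONYMOUS memory is zero-filled, so the calloc
semantics are preserved, and a page-granular mapping avoids the
mallocng header overhead mentioned above.]

	#include <sys/mman.h>
	#include <sys/types.h>
	#include <semaphore.h>

	#define SEM_NSEMS_MAX 256  /* matches musl's limits.h value */

	static struct {
		ino_t ino;
		sem_t *sem;
		int refcnt;
	} *semtab;

	/* One-time setup, called with the sem_open lock held. */
	static int init_semtab(void)
	{
		void *p = mmap(0, SEM_NSEMS_MAX * sizeof *semtab,
			PROT_READ|PROT_WRITE,
			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) return -1;
		semtab = p;
		return 0;
	}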
> > > > - timezone: The tz core allocates memory to remember the
> > > >   last-seen value of the TZ env var so it can react if it
> > > >   changes. Normally it's small, so perhaps we could just use a
> > > >   small (e.g. 32 byte) static buffer and replace it with a
> > > >   whole mmapped page if a value too large for that is seen.
> > > >
> > > > Also, somehow I failed to find one of the important locks
> > > > MT-fork needs to be taking: locale_map.c has a lock for the
> > > > records of mapped locales. Allocation also takes place with it
> > > > held, and for the same reason as gettext it really shouldn't
> > > > be changed to allocate differently. It could possibly do the
> > > > allocation without the lock held, though, and leak it (or save
> > > > it for reuse later if needed) when another thread races to
> > > > load the locale.
> > >
> > > yeah this sounds problematic.
> > >
> > > if malloc interposers want to do something around fork then libc
> > > may need to expose some better api than atfork.
> >
> > Most of them use the existing pthread_atfork if they're intended
> > to be usable with MT-fork. I don't think inventing a new
> > musl-specific API is a solution here. Their pthread_atfork
> > approach already fully works with glibc because glibc *doesn't*
> > make anything but malloc work in the MT-forked child. All the
> > other subsystems mentioned here will deadlock or blow up in the
> > child with glibc, but only with 0.01% probability, since it's
> > rare for them to be under hammering at fork time.
> >
> > One solution you might actually like: getting rid of
> > application-provided-malloc use inside libc. This could be
> > achieved by making malloc a thin wrapper for __libc_malloc or
> > whatever, which could be called by everything in libc that
> > doesn't actually have a contract to return "as-if-by-malloc"
> > memory. Only a few functions like getdelim would be left still
> > calling malloc.
>
> This code block in glob() uses strdup(), which I'd assume would
> have to use the application-provided malloc. Wouldn't that have to
> be worked around somehow?
>
> 	if (*pat) {
> 		char *p = strdup(pat);
> 		if (!p) return GLOB_NOSPACE;
> 		buf[0] = 0;
> 		size_t pos = 0;
> 		char *s = p;
> 		if ((flags & (GLOB_TILDE | GLOB_TILDE_CHECK)) && *p == '~')
> 			error = expand_tilde(&s, buf, &pos);
> 		if (!error)
> 			error = do_glob(buf, pos, 0, s, flags, errfunc, &tail);
> 		free(p);
> 	}

It could either be left using the public malloc (imo fine, since this
is not an "internal component of libc" but generic library code with
no tie-in to libc) or the use of strdup could be replaced with a
trivial alternate version that uses __libc_malloc instead. My leaning
would be towards the former -- only using libc malloc in places where
calling the application-provided malloc could lead to recursive
locking of libc-internal locks (because the caller already holds a
libc-internal lock) or other "inconsistent state" issues (like
dlerror buffers at pthread_exit time).

Rich
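[A rough sketch of the thin-wrapper arrangement proposed above, using
the weak_alias convention from musl's internal headers. __libc_malloc
is the name floated in the message, not a symbol that existed at the
time. The point is that internal callers bypass any
application-provided malloc, so libc-internal locks are never held
across a call into replaced allocator code.]

	#include <stddef.h>

	#define weak_alias(old, new) \
		extern __typeof(old) new \
		__attribute__((__weak__, __alias__(#old)))

	/* The real allocator, always libc's own. */
	void *__libc_malloc(size_t n)
	{
		/* ... actual allocator implementation ... */
		return 0;
	}

	/* The public malloc is a weak alias: an application-provided
	 * definition overrides it, but internal callers without an
	 * "as-if-by-malloc" contract call __libc_malloc directly. */
	weak_alias(__libc_malloc, malloc);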