Date: Mon, 9 Mar 2020 13:12:27 -0400
From: Rich Felker
To: Pirmin Walthert
Cc: musl@lists.openwall.com
Message-ID: <20200309171227.GY11469@brightrain.aerifal.cx>
In-Reply-To: <41ea935d-39e4-1460-e502-5c82d7dd6a4d@wwcom.ch>
Subject: [musl] Re: FYI: some observations when testing next-gen malloc

On Mon, Mar 09, 2020 at 05:49:02PM +0100, Pirmin Walthert wrote:
> Dear Rich,
>
> First of all many thanks for your brilliant C library.
>
> As I do not know whether the musl mailing list is already the right
> place to discuss the next-gen malloc module, I decided to send you
> my observations directly.

It is, so I'm cc'ing the list now.

> I'd like to mention that I am not yet entirely sure whether the
> following is a problem with the new malloc code or with asterisk
> itself, but maybe you can already keep the following in the back of
> your head if someone else is reporting similar behavior with a
> different application:
>
> We use asterisk (16.7) in a musl libc based distribution and for
> some operations asterisk forks (in a thread) the main process to
> execute a system command. When using libmallocng.so (newest version
> with "fix race condition in lock-free path of free" applied, but
> already without that change) some of these forked child processes
> will hang during a call to pthread_mutex_unlock.
>
> Unfortunately the backtrace is not of much help I guess, but the
> child process always seems to hang on pthread_mutex_unlock. So
> something seems to happen with the mutex on fork:
>
> #0  0x00007f2152a20092 in pthread_mutex_unlock () from /lib/ld-musl-x86_64.so.1
> No symbol table info available.
> #1  0x0000000000000008 in ?? ()
> No symbol table info available.
> #2  0x0000000000000000 in ?? ()
> No symbol table info available.
>
> I will for sure try to dig into this further. For the moment the
> only thing I know is that I did not yet observe this on any of the
> several hundred systems with musl 1.1.23 (same asterisk version),
> nor on any of the around 5 with 1.2.0 (same asterisk version, old
> malloc) but quite frequently on the two systems with 1.1.24 and
> libmallocng.so.

This is completely expected and should happen with old or new malloc.
I'm surprised you haven't hit it before. After a multithreaded process
calls fork, the child inherits a state where locks may be permanently
held. See
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html

    - A process shall be created with a single thread.
      If a multi-threaded process calls fork(), the new process shall
      contain a replica of the calling thread and its entire address
      space, possibly including the states of mutexes and other
      resources. Consequently, to avoid errors, the child process may
      only execute async-signal-safe operations until such time as one
      of the exec functions is called.

It's not described very rigorously, but effectively it's in an async
signal context and can only call functions which are AS-safe.

A future version of the standard is expected to drop the requirement
that fork itself be async-signal-safe, and may thereby add
requirements to synchronize against some or all internal locks so that
the child can inherit a working context. But the right solution here
is always to stop using fork without exec.

Rich
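
For illustration only, here is a minimal sketch of the fork-then-exec
pattern described above, in which the child calls nothing but
async-signal-safe functions (execve and _exit) before the new program
image takes over. The helper name run_child and the choice of 127 as
the exec-failure status are assumptions made for this example; they do
not come from asterisk or musl.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run the program at path argv[0] with arguments
 * argv and return its exit status, or -1 if it could not be started or
 * did not exit normally. */
int run_child(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: async-signal-safe calls only from here on. No malloc,
         * no stdio, no mutexes: another thread in the parent may have
         * held those locks at the moment of fork. */
        execve(argv[0], argv, environ);
        _exit(127); /* reached only if execve failed */
    }

    /* Parent: wait for the child, retrying if a signal interrupts us. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argument vector whose first
element is the full path of the program, for example
{"/bin/sh", "-c", "exit 3", NULL}. Where it fits the use case,
posix_spawn is another way to start a command without running
arbitrary code in the forked child.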