Re: [musl] Re: FYI: some observations when testing next-gen malloc

mailing list of musl libc

From: Pirmin Walthert <pirmin.walthert@wwcom.ch>
To: musl@lists.openwall.com
Subject: Re: [musl] Re: FYI: some observations when testing next-gen malloc
Date: Mon, 9 Mar 2020 19:14:59 +0100
Message-ID: <82b69741-72e6-ab53-c523-ce4e1e7dc98e@wwcom.ch>
In-Reply-To: <20200309171227.GY11469@brightrain.aerifal.cx>

On 09.03.20 at 18:12, Rich Felker wrote:
> On Mon, Mar 09, 2020 at 05:49:02PM +0100, Pirmin Walthert wrote:
>> Dear Rich,
>>
>> First of all many thanks for your brilliant C library.
>>
>> As I do not know whether the musl mailing list is already the right
>> place to discuss the next-gen malloc module, I decided to send you
>> my observations directly.
> It is, so I'm cc'ing the list now.
>
>> I'd like to mention that I am not yet entirely sure whether the
>> following is a problem with the new malloc code or with asterisk
>> itself but maybe you can already keep the following in the back of
>> your head if someone else is reporting similar behavior with a
>> different application:
>>
>> We use asterisk (16.7) in a musl libc based distribution and for
>> some operations asterisk forks (in a thread) the main process to
>> execute a system command. When using libmallocng.so (newest version
>> with "fix race condition in lock-free path of free" applied, but
>> also without that change) some of these forked child processes
>> hang during a call to pthread_mutex_unlock.
>>
>> Unfortunately the backtrace is not of much help I guess, but the
>> child process always seems to hang on pthread_mutex_unlock. So
>> something seems to happen with the mutex on fork:
>>
>> #0  0x00007f2152a20092 in pthread_mutex_unlock () from
>> /lib/ld-musl-x86_64.so.1
>> No symbol table info available.
>> #1  0x0000000000000008 in ?? ()
>> No symbol table info available.
>> #2  0x0000000000000000 in ?? ()
>> No symbol table info available.
>>
>> I will for sure try to dig into this further. For the moment the
>> only thing I know is that I did not yet observe this on any of the
>> several hundred systems with musl 1.1.23 (same asterisk version),
>> not on any of the around 5 with 1.2.0 (same asterisk version, old
>> malloc) but quite frequently on the two systems with 1.1.24 and
>> libmallocng.so.
> This is completely expected and should happen with old or new malloc.
> I'm surprised you haven't hit it before. After a multithreaded process
> calls fork, the child inherits a state where locks may be permanently
> held. See https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
>
>      - A process shall be created with a single thread. If a
>        multi-threaded process calls fork(), the new process shall
>        contain a replica of the calling thread and its entire address
>        space, possibly including the states of mutexes and other
>        resources. Consequently, to avoid errors, the child process may
>        only execute async-signal-safe operations until such time as one
>        of the exec functions is called.
>
> It's not described very rigorously, but effectively it's in an async
> signal context and can only call functions which are AS-safe.
>
> A future version of the standard is expected to drop the requirement
> that fork itself be async-signal-safe, and may thereby add
> requirements to synchronize against some or all internal locks so that
> the child can inherit a working context. But the right solution here is
> always to stop using fork without exec.
>
> Rich

Well, I have now changed the code a bit to make sure that no
async-signal-unsafe function is called before execl. I have removed
the calls to cap_from_text, cap_set_proc and cap_free, as well as the
call to sched_setscheduler. Now the only thing executed before execl
in the child process is closefrom().
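
For illustration, a minimal sketch of a child that restricts itself to
async-signal-safe calls between fork and execl. The helper name
run_command, its max_fd parameter and the use of /bin/sh are
assumptions made for this example, not the asterisk code. Since
closefrom() is not specified by POSIX, whether it is async-signal-safe
depends on the implementation in use; the sketch uses a plain close()
loop instead.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "sh -c cmd" in a child process and return
 * its wait status, or -1 on error. max_fd is an upper bound for the
 * descriptors to close; the caller computes it before forking. */
static int run_command(const char *cmd, int max_fd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: only close(), execl() and _exit() are called here,
         * all of which POSIX lists as async-signal-safe. */
        for (int fd = 3; fd < max_fd; fd++)
            close(fd);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* reached only if execl failed */
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

A caller could obtain max_fd in the parent, for example from
sysconf(_SC_OPEN_MAX), so that nothing has to be computed in the child.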

However I got a hanging process again:

(gdb) bt full
#0  0x00007f42f649c6da in __syscall_cp_c () from /lib/ld-musl-x86_64.so.1
No symbol table info available.
#1  0x0000000000000000 in ?? ()
No symbol table info available.
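
One way to avoid reasoning about what is safe in the forked child at
all would be posix_spawn, which starts the new program without running
any application code in between. The following is only a sketch under
assumptions: spawn_command is a hypothetical helper, /bin/sh is assumed
to be the shell, and no descriptors are closed or attributes set.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: start "sh -c cmd" with posix_spawn and return
 * the wait status of the child, or -1 on error. */
static int spawn_command(const char *cmd)
{
    char *const argv[] = { "sh", "-c", (char *)cmd, NULL };
    pid_t pid;

    /* posix_spawn returns an error number directly instead of
     * setting errno. */
    int err = posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ);
    if (err != 0) {
        errno = err;
        return -1;
    }

    /* Wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

Descriptors that must not leak into the child could be handled with
posix_spawn_file_actions_addclose or by opening them with O_CLOEXEC.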

Best regards,

Pirmin
