mailing list of musl libc
 help / color / Atom feed
* [musl] Re: FYI: some observations when testing next-gen malloc
       [not found] <41ea935d-39e4-1460-e502-5c82d7dd6a4d@wwcom.ch>
@ 2020-03-09 17:12 ` Rich Felker
  2020-03-09 18:14   ` Pirmin Walthert
  0 siblings, 1 reply; 6+ messages in thread
From: Rich Felker @ 2020-03-09 17:12 UTC (permalink / raw)
  To: Pirmin Walthert; +Cc: musl

On Mon, Mar 09, 2020 at 05:49:02PM +0100, Pirmin Walthert wrote:
> Dear Rich,
> 
> First of all many thanks for your brilliant C library.
> 
> As I do not know whether the musl mailinglist is already the right
> place to discuss the next-gen malloc module, I decided to send you
> my observations directly.

It is, so I'm cc'ing the list now.

> I'd like to mention that I am not yet entirely sure whether the
> following is a problem with the new malloc code or with asterisk
> itself but maybe you can already keep the following in the back of
> your head if someone else is reporting similar behavior with a
> different application:
> 
> We use asterisk (16.7) in a musl libc based distribution and for
> some operations asterisk forks (in a thread) the main process to
> execute a system command. When using libmallocng.so (newest version
> with "fix race condition in lock-free path of free" applied, but
> already without that change) some of these forked child processes
> will hang during a call to pthread_mutex_unlock.
> 
> Unfortunatelly the backtrace is not of much help I guess, but the
> child process always seems to hang on pthread_mutex_unlock. So
> something seems to happen with the mutex on fork:
> 
> #0  0x00007f2152a20092 in pthread_mutex_unlock () from
> /lib/ld-musl-x86_64.so.1
> No symbol table info available.
> #1  0x0000000000000008 in ?? ()
> No symbol table info available.
> #2  0x0000000000000000 in ?? ()
> No symbol table info available.
> 
> I will for sure try to dig into this further. For the moment the
> only thing I know is that I did not yet observe this on any of the
> several hundred systems with musl 1.1.23 (same asterisk version),
> not on any of the around 5 with 1.2.0 (same asterisk version, old
> malloc) but quite frequently on the two systems with 1.1.24 and
> libmallocng.so.

This is completely expected and should happen with old or new malloc.
I'm surprised you haven't hit it before. After a multithreaded process
calls fork, the child inherits a state where locks may be permanently
held. See https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html

    - A process shall be created with a single thread. If a
      multi-threaded process calls fork(), the new process shall
      contain a replica of the calling thread and its entire address
      space, possibly including the states of mutexes and other
      resources. Consequently, to avoid errors, the child process may
      only execute async-signal-safe operations until such time as one
      of the exec functions is called.

It's not described very rigorously, but effectively it's in an async
signal context and can only call functions which are AS-safe.

A future version of the standard is expected to drop the requirement
that fork itself be async-signal-safe, and may thereby add
requirements to synchronize against some or all internal locks so that
the child can inherit a working context. But the right solution here is
always to stop using fork without exec.

Rich

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [musl] Re: FYI: some observations when testing next-gen malloc
  2020-03-09 17:12 ` [musl] Re: FYI: some observations when testing next-gen malloc Rich Felker
@ 2020-03-09 18:14   ` Pirmin Walthert
  2020-03-09 18:55     ` Szabolcs Nagy
  0 siblings, 1 reply; 6+ messages in thread
From: Pirmin Walthert @ 2020-03-09 18:14 UTC (permalink / raw)
  To: musl

Am 09.03.20 um 18:12 schrieb Rich Felker:
> On Mon, Mar 09, 2020 at 05:49:02PM +0100, Pirmin Walthert wrote:
>> Dear Rich,
>>
>> First of all many thanks for your brilliant C library.
>>
>> As I do not know whether the musl mailinglist is already the right
>> place to discuss the next-gen malloc module, I decided to send you
>> my observations directly.
> It is, so I'm cc'ing the list now.
>
>> I'd like to mention that I am not yet entirely sure whether the
>> following is a problem with the new malloc code or with asterisk
>> itself but maybe you can already keep the following in the back of
>> your head if someone else is reporting similar behavior with a
>> different application:
>>
>> We use asterisk (16.7) in a musl libc based distribution and for
>> some operations asterisk forks (in a thread) the main process to
>> execute a system command. When using libmallocng.so (newest version
>> with "fix race condition in lock-free path of free" applied, but
>> already without that change) some of these forked child processes
>> will hang during a call to pthread_mutex_unlock.
>>
>> Unfortunatelly the backtrace is not of much help I guess, but the
>> child process always seems to hang on pthread_mutex_unlock. So
>> something seems to happen with the mutex on fork:
>>
>> #0  0x00007f2152a20092 in pthread_mutex_unlock () from
>> /lib/ld-musl-x86_64.so.1
>> No symbol table info available.
>> #1  0x0000000000000008 in ?? ()
>> No symbol table info available.
>> #2  0x0000000000000000 in ?? ()
>> No symbol table info available.
>>
>> I will for sure try to dig into this further. For the moment the
>> only thing I know is that I did not yet observe this on any of the
>> several hundred systems with musl 1.1.23 (same asterisk version),
>> not on any of the around 5 with 1.2.0 (same asterisk version, old
>> malloc) but quite frequently on the two systems with 1.1.24 and
>> libmallocng.so.
> This is completely expected and should happen with old or new malloc.
> I'm surprised you haven't hit it before. After a multithreaded process
> calls fork, the child inherits a state where locks may be permanently
> held. See https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
>
>      - A process shall be created with a single thread. If a
>        multi-threaded process calls fork(), the new process shall
>        contain a replica of the calling thread and its entire address
>        space, possibly including the states of mutexes and other
>        resources. Consequently, to avoid errors, the child process may
>        only execute async-signal-safe operations until such time as one
>        of the exec functions is called.
>
> It's not described very rigorously, but effectively it's in an async
> signal context and can only call functions which are AS-safe.
>
> A future version of the standard is expected to drop the requirement
> that fork itself be async-signal-safe, and may thereby add
> requirements to synchronize against some or all internal locks so that
> the child can inherit a working context. But the right solution here is
> always to stop using fork without exec.
>
> Rich

Well, I have now changed the code a bit to make sure that no 
async-signal-unsafe command is being executed before execl. Things I've 
removed:

a call to cap_from_text, cap_set_proc and cap_free has been removed as 
well as sched_setscheduler. Now the only thing being executed before 
execl in the child process is closefrom()

However I got a hanging process again:

(gdb) bt full
#0  0x00007f42f649c6da in __syscall_cp_c () from /lib/ld-musl-x86_64.so.1
No symbol table info available.
#1  0x0000000000000000 in ?? ()
No symbol table info available.

Best regards,

Pirmin


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [musl] Re: FYI: some observations when testing next-gen malloc
  2020-03-09 18:14   ` Pirmin Walthert
@ 2020-03-09 18:55     ` Szabolcs Nagy
       [not found]       ` <5957e47c-50c6-0ae1-3e5c-32fd96c756eb@wwcom.ch>
  0 siblings, 1 reply; 6+ messages in thread
From: Szabolcs Nagy @ 2020-03-09 18:55 UTC (permalink / raw)
  To: Pirmin Walthert; +Cc: musl

* Pirmin Walthert <pirmin.walthert@wwcom.ch> [2020-03-09 19:14:59 +0100]:
> Am 09.03.20 um 18:12 schrieb Rich Felker:
> > On Mon, Mar 09, 2020 at 05:49:02PM +0100, Pirmin Walthert wrote:
> > > I'd like to mention that I am not yet entirely sure whether the
> > > following is a problem with the new malloc code or with asterisk
> > > itself but maybe you can already keep the following in the back of
> > > your head if someone else is reporting similar behavior with a
> > > different application:
> > > 
> > > We use asterisk (16.7) in a musl libc based distribution and for
> > > some operations asterisk forks (in a thread) the main process to
> > > execute a system command. When using libmallocng.so (newest version
> > > with "fix race condition in lock-free path of free" applied, but
> > > already without that change) some of these forked child processes
> > > will hang during a call to pthread_mutex_unlock.
> > > 
> > > Unfortunatelly the backtrace is not of much help I guess, but the
> > > child process always seems to hang on pthread_mutex_unlock. So
> > > something seems to happen with the mutex on fork:
> > > 
> > > #0  0x00007f2152a20092 in pthread_mutex_unlock () from
> > > /lib/ld-musl-x86_64.so.1
> > > No symbol table info available.
> > > #1  0x0000000000000008 in ?? ()
> > > No symbol table info available.
> > > #2  0x0000000000000000 in ?? ()
> > > No symbol table info available.
> > > 
> > > I will for sure try to dig into this further. For the moment the
> > > only thing I know is that I did not yet observe this on any of the
> > > several hundred systems with musl 1.1.23 (same asterisk version),
> > > not on any of the around 5 with 1.2.0 (same asterisk version, old
> > > malloc) but quite frequently on the two systems with 1.1.24 and
> > > libmallocng.so.
> > This is completely expected and should happen with old or new malloc.
> > I'm surprised you haven't hit it before. After a multithreaded process
> > calls fork, the child inherits a state where locks may be permanently
> > held. See https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
> > 
> >      - A process shall be created with a single thread. If a
> >        multi-threaded process calls fork(), the new process shall
> >        contain a replica of the calling thread and its entire address
> >        space, possibly including the states of mutexes and other
> >        resources. Consequently, to avoid errors, the child process may
> >        only execute async-signal-safe operations until such time as one
> >        of the exec functions is called.
> > 
> > It's not described very rigorously, but effectively it's in an async
> > signal context and can only call functions which are AS-safe.
> > 
> > A future version of the standard is expected to drop the requirement
> > that fork itself be async-signal-safe, and may thereby add
> > requirements to synchronize against some or all internal locks so that
> > the child can inherit a working context. But the right solution here is
> > always to stop using fork without exec.
> > 
> > Rich
> 
> Well, I have now changed the code a bit to make sure that no
> async-signal-unsafe command is being executed before execl. Things I've
> removed:
> 
> a call to cap_from_text, cap_set_proc and cap_free has been removed as well
> as sched_setscheduler. Now the only thing being executed before execl in the
> child process is closefrom()


closefrom is not as-safe.

i think it reads /proc/self/fd directory to close fds.
(haven't checked the specific asterisk version)
opendir calls malloc so it can deadlock.

> 
> However I got a hanging process again:
> 
> (gdb) bt full
> #0  0x00007f42f649c6da in __syscall_cp_c () from /lib/ld-musl-x86_64.so.1
> No symbol table info available.
> #1  0x0000000000000000 in ?? ()
> No symbol table info available.
> 
> Best regards,
> 
> Pirmin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [musl] Re: FYI: some observations when testing next-gen malloc
       [not found]       ` <5957e47c-50c6-0ae1-3e5c-32fd96c756eb@wwcom.ch>
@ 2020-03-10 10:06         ` Szabolcs Nagy
  2020-03-11  0:47           ` Rich Felker
  0 siblings, 1 reply; 6+ messages in thread
From: Szabolcs Nagy @ 2020-03-10 10:06 UTC (permalink / raw)
  To: Pirmin Walthert; +Cc: musl

* Pirmin Walthert <pirmin.walthert@wwcom.ch> [2020-03-10 10:44:46 +0100]:
> Am 09.03.20 um 19:55 schrieb Szabolcs Nagy:
> > * Pirmin Walthert <pirmin.walthert@wwcom.ch> [2020-03-09 19:14:59 +0100]:
> > > Am 09.03.20 um 18:12 schrieb Rich Felker:
> > > > It's not described very rigorously, but effectively it's in an async
> > > > signal context and can only call functions which are AS-safe.
> > > > 
> > > > A future version of the standard is expected to drop the requirement
> > > > that fork itself be async-signal-safe, and may thereby add
> > > > requirements to synchronize against some or all internal locks so that
> > > > the child can inherit a working context. But the right solution here is
> > > > always to stop using fork without exec.
> > > > 
> > > > Rich
> > > Well, I have now changed the code a bit to make sure that no
> > > async-signal-unsafe command is being executed before execl. Things I've
> > > removed:
> > > 
> > > a call to cap_from_text, cap_set_proc and cap_free has been removed as well
> > > as sched_setscheduler. Now the only thing being executed before execl in the
> > > child process is closefrom()
> > 
> > closefrom is not as-safe.
> > 
> > i think it reads /proc/self/fd directory to close fds.
> > (haven't checked the specific asterisk version)
> > opendir calls malloc so it can deadlock.
> > 
> Indeed I am not able to reproduce the problem any longer with a modified
> version of asterisk. What I've changed is:
> 
> Removed some code that sets the capabilities after fork() (with
> cap_from_text, cap_set_proc, cap_free) and closefrom replaced with a thumb
> loop over all possible fds up to sysconf(_SC_OPEN_MAX). With this
> modification the fd closing procedure with max open files set to 21471 now
> needs 7ms instead of 70ns (so a slowdown by times 100), however this is not
> critical in our environment...
> 
> Will discuss the findings with the asterisk developers.
> 
> Thanks for your hints!

good.

ideally they would use close-on-exec fds and then
you don't need such ugliness.

please don't drop the list from replies.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [musl] Re: FYI: some observations when testing next-gen malloc
  2020-03-10 10:06         ` Szabolcs Nagy
@ 2020-03-11  0:47           ` Rich Felker
  2020-03-11  8:54             ` Pirmin Walthert
  0 siblings, 1 reply; 6+ messages in thread
From: Rich Felker @ 2020-03-11  0:47 UTC (permalink / raw)
  To: Pirmin Walthert, musl

On Tue, Mar 10, 2020 at 11:06:57AM +0100, Szabolcs Nagy wrote:
> * Pirmin Walthert <pirmin.walthert@wwcom.ch> [2020-03-10 10:44:46 +0100]:
> > Am 09.03.20 um 19:55 schrieb Szabolcs Nagy:
> > > * Pirmin Walthert <pirmin.walthert@wwcom.ch> [2020-03-09 19:14:59 +0100]:
> > > > Am 09.03.20 um 18:12 schrieb Rich Felker:
> > > > > It's not described very rigorously, but effectively it's in an async
> > > > > signal context and can only call functions which are AS-safe.
> > > > > 
> > > > > A future version of the standard is expected to drop the requirement
> > > > > that fork itself be async-signal-safe, and may thereby add
> > > > > requirements to synchronize against some or all internal locks so that
> > > > > the child can inherit a working context. But the right solution here is
> > > > > always to stop using fork without exec.
> > > > > 
> > > > > Rich
> > > > Well, I have now changed the code a bit to make sure that no
> > > > async-signal-unsafe command is being executed before execl. Things I've
> > > > removed:
> > > > 
> > > > a call to cap_from_text, cap_set_proc and cap_free has been removed as well
> > > > as sched_setscheduler. Now the only thing being executed before execl in the
> > > > child process is closefrom()
> > > 
> > > closefrom is not as-safe.
> > > 
> > > i think it reads /proc/self/fd directory to close fds.
> > > (haven't checked the specific asterisk version)
> > > opendir calls malloc so it can deadlock.
> > > 
> > Indeed I am not able to reproduce the problem any longer with a modified
> > version of asterisk. What I've changed is:
> > 
> > Removed some code that sets the capabilities after fork() (with
> > cap_from_text, cap_set_proc, cap_free) and closefrom replaced with a thumb
> > loop over all possible fds up to sysconf(_SC_OPEN_MAX). With this
> > modification the fd closing procedure with max open files set to 21471 now
> > needs 7ms instead of 70ns (so a slowdown by times 100), however this is not
> > critical in our environment...
> > 
> > Will discuss the findings with the asterisk developers.
> > 
> > Thanks for your hints!
> 
> good.
> 
> ideally they would use close-on-exec fds and then
> you don't need such ugliness.
> 
> please don't drop the list from replies.

While indeed the right thing is not to do is a closefrom/closeall hack
at all, if you really can't fix the fd leaks and need to, there is a
fast but safe version that doesn't require listing /proc/self/fd.
Instead, call poll() with a large array of pollfd with .events = 0 for
each element, and zero timeout, and check the .revents of each for
POLLNVAL. This will tell you with very few syscalls (probably just 1)
which file descriptors are open.

Rich

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [musl] Re: FYI: some observations when testing next-gen malloc
  2020-03-11  0:47           ` Rich Felker
@ 2020-03-11  8:54             ` Pirmin Walthert
  0 siblings, 0 replies; 6+ messages in thread
From: Pirmin Walthert @ 2020-03-11  8:54 UTC (permalink / raw)
  To: musl


Am 11.03.20 um 01:47 schrieb Rich Felker:
> On Tue, Mar 10, 2020 at 11:06:57AM +0100, Szabolcs Nagy wrote:
>> * Pirmin Walthert<pirmin.walthert@wwcom.ch>  [2020-03-10 10:44:46 +0100]:
>>> Am 09.03.20 um 19:55 schrieb Szabolcs Nagy:
>>>> * Pirmin Walthert<pirmin.walthert@wwcom.ch>  [2020-03-09 19:14:59 +0100]:
>>>>> Am 09.03.20 um 18:12 schrieb Rich Felker:
>>>>>> It's not described very rigorously, but effectively it's in an async
>>>>>> signal context and can only call functions which are AS-safe.
>>>>>>
>>>>>> A future version of the standard is expected to drop the requirement
>>>>>> that fork itself be async-signal-safe, and may thereby add
>>>>>> requirements to synchronize against some or all internal locks so that
>>>>>> the child can inherit a working context. But the right solution here is
>>>>>> always to stop using fork without exec.
>>>>>>
>>>>>> Rich
>>>>> Well, I have now changed the code a bit to make sure that no
>>>>> async-signal-unsafe command is being executed before execl. Things I've
>>>>> removed:
>>>>>
>>>>> a call to cap_from_text, cap_set_proc and cap_free has been removed as well
>>>>> as sched_setscheduler. Now the only thing being executed before execl in the
>>>>> child process is closefrom()
>>>> closefrom is not as-safe.
>>>>
>>>> i think it reads /proc/self/fd directory to close fds.
>>>> (haven't checked the specific asterisk version)
>>>> opendir calls malloc so it can deadlock.
>>>>
>>> Indeed I am not able to reproduce the problem any longer with a modified
>>> version of asterisk. What I've changed is:
>>>
>>> Removed some code that sets the capabilities after fork() (with
>>> cap_from_text, cap_set_proc, cap_free) and closefrom replaced with a thumb
>>> loop over all possible fds up to sysconf(_SC_OPEN_MAX). With this
>>> modification the fd closing procedure with max open files set to 21471 now
>>> needs 7ms instead of 70ns (so a slowdown by times 100), however this is not
>>> critical in our environment...
>>>
>>> Will discuss the findings with the asterisk developers.
>>>
>>> Thanks for your hints!
>> good.
>>
>> ideally they would use close-on-exec fds and then
>> you don't need such ugliness.
>>
>> please don't drop the list from replies.
> While indeed the right thing is not to do is a closefrom/closeall hack
> at all, if you really can't fix the fd leaks and need to, there is a
> fast but safe version that doesn't require listing /proc/self/fd.
> Instead, call poll() with a large array of pollfd with .events = 0 for
> each element, and zero timeout, and check the .revents of each for
> POLLNVAL. This will tell you with very few syscalls (probably just 1)
> which file descriptors are open.
>
> Rich

First of all: the quick&dirty solution was never meant to be a real 
solution at all, it was more like a proof of concept to show that the 
deadlocks do not happen when removing the non-async-signal-safe 
syscalls. So when I opened the asterisk bug report yesterday I'd already 
mentioned that I would not consider this to be the final solution/right 
approach ...

But: thanks a lot, this is a nice solution, it improved the performance 
by times ten (~800ns instead of 8000ns, but still slower than the 
original approach which needed 70ns (Measured with defining the array 
after the fork, this seems to be the most time consuming part)).

Funnily the quick&dirty loop approach is the one you'll find in most 
programs and forums... In my own software I usually keep track of the 
biggest fd and simply close up to this one what should work well enough 
as long as the process is not forked from several threads.

While I am not an asterisk core dev (only submitted a few patches in the 
past), I assume that the fds are simply closed to ensure that the 
command executed with exec will not have access to any fds it shouldn't 
have, so simply for security reasons.

However, here is what I've done now based on your input (asterisk can 
have lots of fds, especially with SIP over TCP. So for big systems they 
usually recommend to set the max open files to a value > 1024. However: 
with 1024 for POLL_SIZE I got the best performance, even though I needed 
to do a few more syscalls this way):

                 #define POLL_SIZE 1024

                 int maxfd = sysconf (_SC_OPEN_MAX);
                 if (maxfd == -1)
                         maxfd = 1024;
                 if (maxfd > 65536)
                         maxfd = 65536;
                 struct pollfd fds[POLL_SIZE];
                 int fd=STDERR_FILENO+1, loopmax, i;
                 while (fd<maxfd) {
                         loopmax = maxfd - fd;
                         if (loopmax>POLL_SIZE) {
                                 loopmax = POLL_SIZE;
                         }
                         for (i=0;i<loopmax;i++) {
                                 fds[i].fd = fd+i;
                                 fds[i].events = 0;
                         }
                         poll(fds, loopmax, 0);
                         for (i=0;i<loopmax;i++) {
                                 if (fds[i].revents == POLLNVAL) {
                                         continue;
                                 }
                                 close(fds[i].fd);
                         }
                         fd+=loopmax;
                 }



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, back to index

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <41ea935d-39e4-1460-e502-5c82d7dd6a4d@wwcom.ch>
2020-03-09 17:12 ` [musl] Re: FYI: some observations when testing next-gen malloc Rich Felker
2020-03-09 18:14   ` Pirmin Walthert
2020-03-09 18:55     ` Szabolcs Nagy
     [not found]       ` <5957e47c-50c6-0ae1-3e5c-32fd96c756eb@wwcom.ch>
2020-03-10 10:06         ` Szabolcs Nagy
2020-03-11  0:47           ` Rich Felker
2020-03-11  8:54             ` Pirmin Walthert

mailing list of musl libc

Archives are clonable: git clone --mirror http://inbox.vuxu.org/musl

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://inbox.vuxu.org/vuxu.archive.musl


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git