Date: Mon, 9 Mar 2020 19:55:37 +0100
From: Szabolcs Nagy
To: Pirmin Walthert
Cc: musl@lists.openwall.com
Message-ID: <20200309185536.GI14278@port70.net>
References: <41ea935d-39e4-1460-e502-5c82d7dd6a4d@wwcom.ch> <20200309171227.GY11469@brightrain.aerifal.cx> <82b69741-72e6-ab53-c523-ce4e1e7dc98e@wwcom.ch>
In-Reply-To: <82b69741-72e6-ab53-c523-ce4e1e7dc98e@wwcom.ch>
User-Agent: Mutt/1.10.1 (2018-07-13)
Subject: Re: [musl] Re: FYI: some observations when testing next-gen malloc

* Pirmin Walthert [2020-03-09 19:14:59 +0100]:
> On 09.03.20 at 18:12, Rich Felker wrote:
> > On Mon, Mar 09, 2020 at 05:49:02PM +0100, Pirmin Walthert wrote:
> > > I'd like to mention that I am not yet entirely sure whether the
> > > following is a problem with the new malloc code or with asterisk
> > > itself, but maybe you can already keep the following in the back of
> > > your head if someone else reports similar behavior with a
> > > different application:
> > >
> > > We use asterisk (16.7) in a musl libc based distribution, and for
> > > some operations asterisk forks (in a thread) the main process to
> > > execute a system command. When using libmallocng.so (newest version
> > > with "fix race condition in lock-free path of free" applied, but
> > > already without that change), some of these forked child processes
> > > will hang during a call to pthread_mutex_unlock.
> > >
> > > Unfortunately the backtrace is not of much help, I guess, but the
> > > child process always seems to hang on pthread_mutex_unlock. So
> > > something seems to happen with the mutex on fork:
> > >
> > > #0  0x00007f2152a20092 in pthread_mutex_unlock () from
> > > /lib/ld-musl-x86_64.so.1
> > > No symbol table info available.
> > > #1  0x0000000000000008 in ?? ()
> > > No symbol table info available.
> > > #2  0x0000000000000000 in ?? ()
> > > No symbol table info available.
> > >
> > > I will for sure try to dig into this further. For the moment the
> > > only thing I know is that I have not yet observed this on any of the
> > > several hundred systems with musl 1.1.23 (same asterisk version),
> > > nor on any of the around 5 with 1.2.0 (same asterisk version, old
> > > malloc), but quite frequently on the two systems with 1.1.24 and
> > > libmallocng.so.
> >
> > This is completely expected and should happen with old or new malloc.
> > I'm surprised you haven't hit it before. After a multithreaded process
> > calls fork, the child inherits a state where locks may be permanently
> > held. See https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
> >
> > - A process shall be created with a single thread. If a
> >   multi-threaded process calls fork(), the new process shall
> >   contain a replica of the calling thread and its entire address
> >   space, possibly including the states of mutexes and other
> >   resources. Consequently, to avoid errors, the child process may
> >   only execute async-signal-safe operations until such time as one
> >   of the exec functions is called.
> >
> > It's not described very rigorously, but effectively the child is in an
> > async-signal context and can only call functions which are AS-safe.
> >
> > A future version of the standard is expected to drop the requirement
> > that fork itself be async-signal-safe, and may thereby add
> > requirements to synchronize against some or all internal locks so that
> > the child can inherit a working context. But the right solution here is
> > always to stop using fork without exec.
> >
> > Rich
>
> Well, I have now changed the code a bit to make sure that nothing
> async-signal-unsafe is executed before execl. Things I've removed: the
> calls to cap_from_text, cap_set_proc and cap_free, as well as
> sched_setscheduler. Now the only thing being executed before execl in
> the child process is closefrom().

closefrom is not as-safe.

i think it reads the /proc/self/fd directory to close fds.
(haven't checked the specific asterisk version)

opendir calls malloc so it can deadlock.

>
> However I got a hanging process again:
>
> (gdb) bt full
> #0  0x00007f42f649c6da in __syscall_cp_c () from /lib/ld-musl-x86_64.so.1
> No symbol table info available.
> #1  0x0000000000000000 in ?? ()
> No symbol table info available.
>
> Best regards,
>
> Pirmin
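
[Editorial illustration, not code from the thread.] A minimal sketch of what an async-signal-safe replacement for closefrom() in the forked child could look like: the fd limit is read with getrlimit() before fork(), and only close(), execl() and _exit() run in the child, so no libc lock (malloc, opendir, etc.) inherited in a held state can be hit. The helper name and the command are assumptions for the example.

/* sketch: avoid closefrom()'s opendir()/malloc() in the fork child */
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

int spawn_true_closing_fds(void)
{
    /* query the fd limit *before* forking, while all of libc is usable */
    struct rlimit rl;
    long maxfd = 1024;                  /* conservative fallback */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
        maxfd = (long)rl.rlim_cur;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* child: close everything above stderr with plain close(), then exec */
        for (long fd = 3; fd < maxfd; fd++)
            close(fd);
        execl("/bin/true", "true", (char *)0);   /* placeholder command */
        _exit(127);                              /* exec failed */
    }
    return 0;                                    /* parent; wait elsewhere */
}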
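
[Editorial illustration, not code from the thread.] Rich's broader suggestion, never forking without exec, can also be followed by dropping fork() entirely in favor of posix_spawn(), which runs no application code between the clone and the exec. A minimal sketch (the helper name and the /dev/null redirection are assumptions for the example):

/* sketch: replace fork()+work+execl() with posix_spawn() */
#include <fcntl.h>
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

int run_and_wait(const char *path)
{
    char *const argv[] = { (char *)path, 0 };
    posix_spawn_file_actions_t fa;
    pid_t pid;
    int status;

    posix_spawn_file_actions_init(&fa);
    /* example file action: redirect the child's stdin to /dev/null */
    posix_spawn_file_actions_addopen(&fa, 0, "/dev/null", O_RDONLY, 0);

    int err = posix_spawn(&pid, path, &fa, 0, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    if (err)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;                      /* wait status of the child */
}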