Date: Thu, 5 Oct 2023 08:39:03 -0400
From: Rich Felker
To: Markus Wichmann
Cc: musl@lists.openwall.com
Message-ID: <20231005123858.GH4163@brightrain.aerifal.cx>
Subject: Re: [musl] Hung processes with althttpd web server

On Thu, Oct 05, 2023 at 05:37:41AM +0200, Markus Wichmann wrote:
> On Wed, Oct 04, 2023 at 09:41:41PM -0400, Carl Chave wrote:
> > Hello, I'm running the althttpd web server on Alpine Linux using a Ramnode VPS.
> >
> > I've been having issues for quite a while with "hung" processes. There
> > is a long lived parent process and then a short lived forked process
> > for each http request. What I've been seeing is that the forked
> > processes will sometimes get stuck:
> >
> > sod01:/srv/www/log$ sudo strace -p 11329
> > strace: Process 11329 attached
> > futex(0x7f5bdcd77900, FUTEX_WAIT_PRIVATE, 4294967295, NULL
> >
> I often see this system call hung when signal handlers are doing
> signal-unsafe things.
> Looking at the source code, that is exactly what
> happens if the process catches a signal at the wrong time. Try removing
> all calls to signal(); that should do what the designers intended
> better (namely quit the process). If you want to log when a process dies
> of unnatural causes, that's something the parent process can do.
>
> The signal handler will call MakeLogEntry(), and that will do
> signal-unsafe things such as call free(), localtime(), or fopen(). If
> the main process is currently using malloc() when that happens, you will
> get precisely this hang.
>
> > Please see this forum thread for additional information:
> > https://sqlite.org/althttpd/forumpost/4dc31619341ce947
> >
>
> Seems like they haven't yet found the trail of the signal handler.

OK, this is almost surely the source of the problem. It would still be
interesting to know which lock is being hit here, since for the most
part, locks are skipped in single-threaded processes. But even if the
lock were skipped, the invalid calls to async-signal-unsafe functions
from async-signal context would be corrupting the state those locks
were meant to protect. That's probably what's happening on glibc
(meaning this code only appears to work there, but is likely behaving
dangerously).

Rich
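
A minimal C sketch of the two approaches discussed in this thread, under
stated assumptions rather than as a patch for althttpd. The first leaves
the signal at its default disposition and lets the parent log the death
after waitpid(), as suggested above. The second keeps a handler but
restricts it to async-signal-safe calls (write() and _exit()). All
function names here (on_fatal_signal, run_child, child_term_signal,
child_exit_code_with_handler) and the exit code 111 are hypothetical and
do not come from the althttpd source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler: only async-signal-safe calls are made here.
 * No malloc/free, no stdio, no localtime. */
static void on_fatal_signal(int sig)
{
    static const char msg[] = "child: caught fatal signal, exiting\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    (void)sig;
    _exit(111); /* arbitrary exit code chosen for this sketch */
}

/* Fork a child that sends itself `sig`. With use_handler == 0 the signal
 * keeps its default disposition; otherwise on_fatal_signal is installed.
 * Stores the wait status in *status; returns 0 on success, -1 on error. */
static int run_child(int sig, int use_handler, int *status)
{
    assert(sig > 0 && status != NULL);
    fflush(NULL); /* do not duplicate buffered stdio output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        sigset_t set;
        memset(&sa, 0, sizeof sa);
        sigemptyset(&sa.sa_mask);
        sa.sa_handler = use_handler ? on_fatal_signal : SIG_DFL;
        sigaction(sig, &sa, NULL);
        /* Make sure an inherited signal mask does not block delivery. */
        sigemptyset(&set);
        sigaddset(&set, sig);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        raise(sig);
        _exit(0); /* reached only if the signal did not end the process */
    }
    while (waitpid(pid, status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Default disposition: the parent does the logging, in normal context,
 * where stdio is fine. Returns the terminating signal, or -1. */
static int child_term_signal(int sig)
{
    int status;
    if (run_child(sig, 0, &status) != 0 || !WIFSIGNALED(status))
        return -1;
    fprintf(stderr, "parent: child killed by signal %d\n", WTERMSIG(status));
    return WTERMSIG(status);
}

/* Handler variant: returns the child's exit code, or -1. */
static int child_exit_code_with_handler(int sig)
{
    int status;
    if (run_child(sig, 1, &status) != 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With these helpers, child_term_signal(SIGTERM) reports the death from the
parent without the child running any handler, while
child_exit_code_with_handler(SIGTERM) shows the child exiting through the
restricted handler. Any richer logging (timestamps, formatted strings)
would belong in the parent, outside async-signal context.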