Date: Tue, 24 Mar 2020 09:53:12 -0400
From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: [musl] [Bug] Do not ignore membarrier return code
Message-ID: <20200324135312.GV11469@brightrain.aerifal.cx>
References: <20200323163829.GR11469@brightrain.aerifal.cx>

On Tue, Mar 24, 2020 at 02:20:08PM +0100, Julio Guerra wrote:
> Hello Rich,
>
> Here are more details on what we did to reproduce the issue.
> You can clone this gist
> https://gist.github.com/vdeturckheim/d420310e272f525824d7e92e7e875024
> and have a look at the run.sh file example in order to get started
> with it. The test.js file does a require of the js bindings of grpc,
> which involves the dlopen.
>
> What we observed yesterday with this example was:
> - It crashed approximately 9 times out of 10 on aws codebuild with the
> machine BUILD_GENERAL1_SMALL (3 GB memory, 2 vCPUs).
> - It worked all the time by only adding membarrier to the seccomp
> profile of the docker run.
>
> But I wanted to give you more details with stack traces of the
> segfault by retrying today with gdb but I cannot reproduce it
> anymore...!
> I'll retry later to see if I see the error again...
>
> If what you say about membarrier is true, I think there may be some
> synchronization side-effect of the syscall since, afaik, node uses
> threads in order to load the shared libraries in the libuv.

My best guess, especially since the crash was unpredictable, is that
the stack size on at least one thread is barely sufficient for what
it's doing. The fallback path when the membarrier syscall fails
requires the ability to deliver signals, and if any thread has
insufficient space left on its stack to accept a signal, it will
crash.

Rich
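
To tell which of the two paths a given container will take, it can help
to probe whether the membarrier syscall is permitted at all. The
following is only a minimal sketch and is not code from musl: the names
mb_status, mb_classify and mb_probe are hypothetical, and treating
EPERM as "blocked by a seccomp filter" is an assumption about how the
filter is configured. It issues MEMBARRIER_CMD_QUERY (command value 0),
which only reports the supported commands and has no other effect.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical names for this sketch; not part of musl or the kernel API. */
enum mb_status { MB_AVAILABLE, MB_BLOCKED, MB_UNSUPPORTED, MB_OTHER_ERROR };

/* Map a raw syscall return value and errno to a status. */
static enum mb_status mb_classify(long ret, int err)
{
    if (ret >= 0)
        return MB_AVAILABLE;      /* query succeeded; ret is a command bitmask */
    if (err == EPERM)
        return MB_BLOCKED;        /* assumption: denied by a seccomp filter */
    if (err == ENOSYS)
        return MB_UNSUPPORTED;    /* kernel does not provide the syscall */
    return MB_OTHER_ERROR;
}

/* Ask the kernel which membarrier commands it supports. */
static enum mb_status mb_probe(void)
{
#ifdef SYS_membarrier
    errno = 0;
    long ret = syscall(SYS_membarrier, 0 /* MEMBARRIER_CMD_QUERY */, 0);
    return mb_classify(ret, errno);
#else
    return MB_UNSUPPORTED;        /* headers too old to know the syscall number */
#endif
}
```

If mb_probe() returns anything other than MB_AVAILABLE inside the
container, the signal-based fallback described above is the path that
will be exercised, so every thread needs enough stack left to accept a
signal.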