From: Eric Hassold
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Fix pthread_create on some devices failing to initialize guard area
Date: Mon, 30 Jan 2017 18:52:46 -0800
Message-ID: <5be67b99-2853-85ee-6ea3-ddf519bb6031@gmail.com>
In-Reply-To: <20170130231321.GO1533@brightrain.aerifal.cx>
References: <20170120195649.GS1533@brightrain.aerifal.cx>
 <30588c41-eb3d-627e-c5eb-91e19ef56790@gmail.com>
 <20170120212933.GT1533@brightrain.aerifal.cx>
 <80ab9b97-dc0d-e642-cbf1-1b20e1cddf64@gmail.com>
 <81f188cf-3fb8-2899-5c24-ec72d38ad300@gmail.com>
 <20170130231321.GO1533@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On 1/30/17 3:13 PM, Rich Felker wrote:
> On Mon, Jan 30, 2017 at 01:30:00PM -0800, Eric Hassold wrote:
>>>>>>> This occurs because of call to mprotect() in pthread_create fails.
>>>>>>> In current implementation, if guard size is non null, memory for
>>>>>>> (guard + stack + ...) is first allocated (mmap'ed) with no
>>>>>>> accessibility (PROT_NONE), then mprotect() is called to re-enable
>>>>>>> read/write access to (memory + guardsize). Since call to mprotect()
>>>>>>> systematically fails in this scenario (returning error code EINVAL),
>>>>>>> it is impossible to create thread.
>>>>>> Failure is ignored and the memory is assumed to be writable in this
>>>>>> case, since EINVAL is assumed to imply no MMU. Is this assumption
>>>>>> wrong in your case, and if so, can you explain why?
>>>>> In my case, devices exhibiting issue are not MMU-less, they are
>>>>> Cortex-A9 devices with valid mmu / page protection working as
>>>>> expected otherwise. Note that current Musl code assumes ENOSYS means
>>>>> no MMU and handles it by assuming the system has no page protection
>>>>> at all. For the case I observe, it is EINVAL which is returned, this
>>>>> is not ignored, so memory is unmap'ed and pthread_create() fails.
>>>> In that case I think this is a kernel bug. Do you know why EINVAL is
>>>> happening? If there's an MMU, Linux should be able to replace the
>>>> anon PROT_NONE pages with anon RW pages.
>>> Agree. Unfortunately, those are devices we don't built the kernel
>>> for, so have been hardly able to track issue deeper. The point is
>>> however that such devices with this issue in kernel might not be
>>> that uncommon, and it concretely means impossibility at that
>>> moment to deploy on them a functional static executable built with
>>> musl.
>> [...]
>> Pinging... any comment, feedback or concern about latest version of
>> the patch, attached above, keeping current behavior but falling back
>> to (mmap(PROT_READ|PROT_WRITE) && mprotect(guard, none)) if and only
>> if current approach detected to fail) ?
> I still want to know what's going on on the kernel side, because it
> looks like this a rogue/nonsensical patch to the kernel that breaks
> mmap functionality in a general way that has nothing to do with the
> specific cpu/board.
>
> Rich

Yes, I understand this would be ideal, and I too wish I could find out
more about what's happening, beyond the symptomatic approach.
But as mentioned, we have very little leverage to investigate further.
The issue isn't reproducible on any device we build the kernel for,
only on a few devices whose kernels we don't build and on which we have
no easy way to do lower-level debugging (no JTAG access). Telemetry was
reporting pthread creation failures on some devices; one that we could
reproduce in-house was a Marvell 375 board, used e.g. in Western Digital
MyCloud devices. But it seems hard to even get a reference to the actual
GPL source of the exact kernel used there, so I don't see a way to find
out whether this is a broken patch introduced at some point by a vendor,
or a transient regression that has since been fixed but still ships on
some devices. (The kernel version seems to be on the 3.10 branch, so one
very rough guess is that it may be related to the multiple patches made
to support huge TLB on ARM at that time, but that's more of a gut
feeling than something supported by facts.)

Though I understand it is not the role of a libc implementation to work
around every kind of flaw in the kernels vendors have ever shipped, I
would find it valuable, in this specific case, to prevent musl from
consistently surfacing an issue that is not (easily) reproducible
otherwise (e.g. with other libc implementations). One of the great
values of musl is that it allows easy deployment of the same statically
linked executable across a variety of devices (not as open as we would
ideally hope). The suggested patch aims at supporting this use case,
since fixing the kernel isn't an option there.

Of course, this pragmatic motivation conflicts somewhat with the
legitimate goal of keeping the musl implementation lean and clean (from
a functional standpoint, it doesn't change behavior on any non-broken
platform). So only you can weigh the pros (regarding deployment) and
decide whether they are worth the cons of slightly "bloating" the source
code with an alternative recovery path.
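
To make the fallback concrete, here is a minimal standalone sketch of
the logic discussed in this thread. It is an illustration only, not the
attached patch: the helper name map_stack_with_guard is hypothetical, it
uses the public mmap()/mprotect()/munmap() calls rather than musl's
internal wrappers, and it assumes guard and size are both multiples of
the page size.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not musl code): map `guard` inaccessible bytes
 * followed by `size` read/write bytes. Returns the base of the whole
 * mapping, or MAP_FAILED. Both sizes are assumed page-aligned. */
static void *map_stack_with_guard(size_t guard, size_t size)
{
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    const size_t total = guard + size;
    unsigned char *map;

    /* Usual path: reserve everything as PROT_NONE, then enable
     * read/write access on the part above the guard. */
    map = mmap(NULL, total, PROT_NONE, flags, -1, 0);
    if (map == MAP_FAILED)
        return MAP_FAILED;
    if (mprotect(map + guard, size, PROT_READ | PROT_WRITE) == 0)
        return map;
    /* ENOSYS is treated as "no MMU, no page protection at all". */
    if (errno == ENOSYS)
        return map;

    /* Fallback, taken only if the usual path failed (e.g. EINVAL):
     * map everything read/write, then revoke access on the guard. */
    munmap(map, total);
    map = mmap(NULL, total, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (map == MAP_FAILED)
        return MAP_FAILED;
    if (guard && mprotect(map, guard, PROT_NONE) != 0 && errno != ENOSYS) {
        munmap(map, total);
        return MAP_FAILED;
    }
    return map;
}
```

On a kernel that behaves correctly, only the first mmap()/mprotect()
pair runs, so the second half is reached only on the affected devices.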

Thanks,
Eric