From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/10959 Path: news.gmane.org!.POSTED!not-for-mail From: Eric Hassold Newsgroups: gmane.linux.lib.musl.general Subject: Re: Fix pthread_create on some devices failing to initialize guard area Date: Fri, 20 Jan 2017 14:42:01 -0800 Message-ID: <80ab9b97-dc0d-e642-cbf1-1b20e1cddf64@gmail.com> References: <20170120195649.GS1533@brightrain.aerifal.cx> <30588c41-eb3d-627e-c5eb-91e19ef56790@gmail.com> <20170120212933.GT1533@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Trace: blaine.gmane.org 1484952140 926 195.159.176.226 (20 Jan 2017 22:42:20 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Fri, 20 Jan 2017 22:42:20 +0000 (UTC) User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0) Gecko/20100101 Thunderbird/45.6.0 To: musl@lists.openwall.com Original-X-From: musl-return-10974-gllmg-musl=m.gmane.org@lists.openwall.com Fri Jan 20 23:42:15 2017 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1cUhsr-00082N-3I for gllmg-musl@m.gmane.org; Fri, 20 Jan 2017 23:42:13 +0100 Original-Received: (qmail 28568 invoked by uid 550); 20 Jan 2017 22:42:16 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 28543 invoked from network); 20 Jan 2017 22:42:15 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-transfer-encoding; bh=rPJuhCVUOi7GiPddN6mlrkLrIsThtj8N6j6OoWGtdrI=; b=h1r2+Qqzpm0fJvs2nbritUx7PVanp8ym8EvqeZy+oiRSIuJJ9tZsC1XFCV42i7CYN9 Xcz8Y3OFUnONLWxz4sAriNwpoEKyBRBT/CYHLSJc2MkZezeh0bIXShd3ZjmgCw0c4VvY ioJmFGR7AatjB8pMQ2Cu++7WQ+Wm5ny65tp/TxHCDfnP0u0QvIIrj6h/mOZ4IZECCMB0 veTYdPVcI9BpdKf6XfS77Bo9Zk+RVm0HuR2ROzaegUCGQK6sWWSRDLs9WzPBJ/Wwaq02 5hYXN5B0OFhX6D1RQL8JMTzfLYOBC+GtJz/i3Tul3vB4fqVCLugoMlY6mR9LFe1mNypH UtvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=rPJuhCVUOi7GiPddN6mlrkLrIsThtj8N6j6OoWGtdrI=; b=eXbiP3ucr57ZlLZSq46dLDdYjCXVTWPLedgAkfkwggBLk1RUGvz5gTRMylgDTUgI10 +h6/t8A5lcVRANQv4Ba7vYOrXa6/5Ly7JZQR3RhSO4asmaD0spBWjNrURPB+XIGfAc92 I5gcBRP/njgNoCbjHvJDxGWsxgvohzFOPjnnkbmjIU5ciuegzbmr5ZVCLXN5n5OzAwkM aj6gE2t597lo94NdbXcH/TaCEzdLgMpFvsmBLrOjw+SMpDKNjV7yfKD3LkjTbZWKsvtE orTaU1i0zdOsPBV2fmzCsseikHKJbfXV2+WoAwgi1pY4y4wGdgUCdUptypTgje9Kz6VF c5gg== X-Gm-Message-State: AIkVDXL9hxhxUF1Jgh+YE5uC1NDHG45xJW3op3lqzHhAdxVVmU999ElJTEJgFJY3q7Do1g== X-Received: by 10.99.135.195 with SMTP id i186mr19559935pge.105.1484952123283; Fri, 20 Jan 2017 14:42:03 -0800 (PST) In-Reply-To: <20170120212933.GT1533@brightrain.aerifal.cx> Xref: news.gmane.org gmane.linux.lib.musl.general:10959 Archived-At: On 1/20/17 1:29 PM, Rich Felker wrote: > On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote: >> On 1/20/17 11:56 AM, Rich Felker wrote: >>> On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote: >>>> Hi All, >>>> >>>> While deploying test static executable across farm of different >>>> embedded systems, found out that pthread_create() is failing >>>> systematically on some (very few) arm-linux devices whenever non >>>> null stack guard is enabled (that is, also when calling >>>> pthread_create with default - i.e. null - attributes since default >>>> is a one page of guard). One of those device is for example a >>>> Marvell Armada 375 running Linux 3.10.39. Same test code, built with >>>> alternative libc implementations (glibc, uClibc) works as expected >>>> on those devices. >>>> >>>> >>>> Issue >>>> >>>> This occurs because of call to mprotect() in pthread_create fails. >>>> In current implementation, if guard size is non null, memory for >>>> (guard + stack + ...) is first allocated (mmap'ed) with no >>>> accessibility (PROT_NONE), then mprotect() is called to re-enable >>>> read/write access to (memory + guardsize). Since call to mprotect() >>>> systematically fails in this scenario (returning error code EINVAL), >>>> it is impossible to create thread. >>> Failure is ignored and the memory is assumed to be writable in this >>> case, since EINVAL is assumed to imply no MMU. Is this assumption >>> wrong in your case, and if so, can you explain why? >> In my case, devices exhibiting issue are not MMU-less, they are >> Cortex-A9 devices with valid mmu / page protection working as >> expected otherwise. Note that current Musl code assumes ENOSYS means >> no MMU and handles it by assuming the system has no page protection >> at all. For the case I observe, it is EINVAL which is returned, this >> is not ignored, so memory is unmap'ed and pthread_create() fails. > In that case I think this is a kernel bug. Do you know why EINVAL is > happening? If there's an MMU, Linux should be able to replace the > anon PROT_NONE pages with anon RW pages. Agree. Unfortunately, those are devices we don't built the kernel for, so have been hardly able to track issue deeper. The point is however that such devices with this issue in kernel might not be that uncommon, and it concretely means impossibility at that moment to deploy on them a functional static executable built with musl. > >>>> In proposed patch (attached below), memory for (guard + stack + ...) >>>> is first mmap'ed with read/write accessibility, then guard area is >>>> protected by calling mprotect() with PROT_NONE on guardsize first >>>> bytes of returned memory. This call to mprotect() to remove all >>>> accessibility on guard area, with guard area being at beginning of >>>> previously mmap'ed memory, works correctly on those platforms having >>>> issue with current implementation. Incidentally, this makes the >>>> logic more concise to handle both cases (with or without guard) is a >>>> more consistent way, and handle systems with partial/invalid page >>>> protection implementation (e.g. mprotect() returning ENOSYS) more >>>> gracefully since the stack is explicitly created with read/write >>>> access. >>> This doesn't work correctly on normal systems with mmu, because the >>> size of the guard pages is accounted against commit charge. Linux >>> should, but AFAIK doesn't, subtract it from commit charge once it's >>> changed to PROT_NONE without having been dirtied, but even if this bug >>> is fixed on the kernel side, there would still be a moment where >>> excess commit charge is consumed and thus where pthread_create might >>> spuriously fail or cause allocations in other processes/threads to >>> fail. >>> >>> If the kernel is not allocating actually-usable address ranges for >>> PROT_NONE on all nommu systems, I think the only solution is to handle >>> EINVAL from mprotect by going back and re-doing the mmap with >>> PROT_READ|PROT_WRITE. Do you have any better ideas? >>> >>> Rich >> Had this "deja vu" feeling... reminds me conversation you had in >> this thread some time ago elsewhere... >> https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html >> >> Your proposition seems reasonable on nommu system, but again, the >> issue observed here is on legit systems with mmu, with mprotect >> failing with EINVAL (and not ENOSYS), for some other reason than >> system not supporting page protection. Catching EINVAL error >> returned by mprotect and falling back to re-doing the mmap would >> mean actually silently running without stack guard on system >> supporting it, so I believe it is actually legitimate to fail and >> return error in that case. But that's difference use case than the >> issue I'm observing. >> >> I took note of Balazs's suggestion (in the thread referenced above) >> to switch to a pattern similar to Musl's current one >> (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid >> those guard pages to actually occupy resources. But I can indeed >> observe that this approach fails on some devices (which have valid >> mmu), while I'm not sure I'm seeing the issue with first mapping >> PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter >> approach (as implemented by patch) is, at least, consistent with all >> the other implementations out there (I checked glibc's >> allocatestack.c, but also e.g. bionic), and couldn't find report of >> those failures you are envisaging. > Consider the case of guard_size=128M stack_size=128k with > Commit_Limit=128M. This will fail with your approach but works > perfectly well now. Right, makes sense. Though I would point out that such uncommon scenario for an application would fail when linked with anything but musl, since "my approach" is the widespread one across all other libc implementations (glibc, bionic, ...). But I understand both approaches aim at working around issue in some corner case while creating issue in other rare one, and so at the end it's all about deciding which edge case is more critical than the other, and whether consistency/compatibility with existing 3rd parties solutions matters or not. But in order to get closer to "having the cake and eat it too", please find attached another patch implementing the "unmaping and re-doing" strategy you initially suggested, i.e. starting with current approach, then giving it a second chance using "my" approach if and only if issue with mprotect() is detected. Should be consistent with current behavior on any system working correctly today, and just provide a plan B falling back to more common approach only when needed. Does that sound more acceptable? Thanks, Eric ---- [PATCH] set up guard protection after stack is mapped calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on some devices with buggy kernel, making it impossible to create thread. if such condition is met, fall back to allocating memory for stack and guard first, with read and write permission, then protect guard area. --- src/thread/pthread_create.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c index 49f2f72..d3c030b 100644 --- a/src/thread/pthread_create.c +++ b/src/thread/pthread_create.c @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE) && errno != ENOSYS) { __munmap(map, size); - goto fail; + if (errno == EINVAL) { + map = __mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0); + if (map == MAP_FAILED) goto fail; + if (__mprotect(map, guard, PROT_NONE) + && errno != ENOSYS) { + __munmap(map, size); + goto fail; + } + } else { + goto fail; + } } } else { map = __mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0); -- 2.7.4