From: Eric Hassold <hassold@gmail.com>
To: musl@lists.openwall.com
Subject: Re: Fix pthread_create on some devices failing to initialize guard area
Date: Mon, 30 Jan 2017 13:30:00 -0800 [thread overview]
Message-ID: <81f188cf-3fb8-2899-5c24-ec72d38ad300@gmail.com> (raw)
In-Reply-To: <80ab9b97-dc0d-e642-cbf1-1b20e1cddf64@gmail.com>
On 1/20/17 2:42 PM, Eric Hassold wrote:
>
>
> On 1/20/17 1:29 PM, Rich Felker wrote:
>> On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote:
>>> On 1/20/17 11:56 AM, Rich Felker wrote:
>>>> On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
>>>>> Hi All,
>>>>>
>>>>> While deploying test static executable across farm of different
>>>>> embedded systems, found out that pthread_create() is failing
>>>>> systematically on some (very few) arm-linux devices whenever non
>>>>> null stack guard is enabled (that is, also when calling
>>>>> pthread_create with default - i.e. null - attributes since default
>>>>> is a one page of guard). One of those device is for example a
>>>>> Marvell Armada 375 running Linux 3.10.39. Same test code, built with
>>>>> alternative libc implementations (glibc, uClibc) works as expected
>>>>> on those devices.
>>>>>
>>>>>
>>>>> Issue
>>>>>
>>>>> This occurs because of call to mprotect() in pthread_create fails.
>>>>> In current implementation, if guard size is non null, memory for
>>>>> (guard + stack + ...) is first allocated (mmap'ed) with no
>>>>> accessibility (PROT_NONE), then mprotect() is called to re-enable
>>>>> read/write access to (memory + guardsize). Since call to mprotect()
>>>>> systematically fails in this scenario (returning error code EINVAL),
>>>>> it is impossible to create thread.
>>>> Failure is ignored and the memory is assumed to be writable in this
>>>> case, since EINVAL is assumed to imply no MMU. Is this assumption
>>>> wrong in your case, and if so, can you explain why?
>>> In my case, devices exhibiting issue are not MMU-less, they are
>>> Cortex-A9 devices with valid mmu / page protection working as
>>> expected otherwise. Note that current Musl code assumes ENOSYS means
>>> no MMU and handles it by assuming the system has no page protection
>>> at all. For the case I observe, it is EINVAL which is returned, this
>>> is not ignored, so memory is unmap'ed and pthread_create() fails.
>> In that case I think this is a kernel bug. Do you know why EINVAL is
>> happening? If there's an MMU, Linux should be able to replace the
>> anon PROT_NONE pages with anon RW pages.
>
> Agree. Unfortunately, those are devices we don't built the kernel for,
> so have been hardly able to track issue deeper. The point is however
> that such devices with this issue in kernel might not be that
> uncommon, and it concretely means impossibility at that moment to
> deploy on them a functional static executable built with musl.
>>
>>>>> In proposed patch (attached below), memory for (guard + stack + ...)
>>>>> is first mmap'ed with read/write accessibility, then guard area is
>>>>> protected by calling mprotect() with PROT_NONE on guardsize first
>>>>> bytes of returned memory. This call to mprotect() to remove all
>>>>> accessibility on guard area, with guard area being at beginning of
>>>>> previously mmap'ed memory, works correctly on those platforms having
>>>>> issue with current implementation. Incidentally, this makes the
>>>>> logic more concise to handle both cases (with or without guard) is a
>>>>> more consistent way, and handle systems with partial/invalid page
>>>>> protection implementation (e.g. mprotect() returning ENOSYS) more
>>>>> gracefully since the stack is explicitly created with read/write
>>>>> access.
>>>> This doesn't work correctly on normal systems with mmu, because the
>>>> size of the guard pages is accounted against commit charge. Linux
>>>> should, but AFAIK doesn't, subtract it from commit charge once it's
>>>> changed to PROT_NONE without having been dirtied, but even if this bug
>>>> is fixed on the kernel side, there would still be a moment where
>>>> excess commit charge is consumed and thus where pthread_create might
>>>> spuriously fail or cause allocations in other processes/threads to
>>>> fail.
>>>>
>>>> If the kernel is not allocating actually-usable address ranges for
>>>> PROT_NONE on all nommu systems, I think the only solution is to handle
>>>> EINVAL from mprotect by going back and re-doing the mmap with
>>>> PROT_READ|PROT_WRITE. Do you have any better ideas?
>>>>
>>>> Rich
>>> Had this "deja vu" feeling... reminds me conversation you had in
>>> this thread some time ago elsewhere...
>>> https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html
>>>
>>> Your proposition seems reasonable on nommu system, but again, the
>>> issue observed here is on legit systems with mmu, with mprotect
>>> failing with EINVAL (and not ENOSYS), for some other reason than
>>> system not supporting page protection. Catching EINVAL error
>>> returned by mprotect and falling back to re-doing the mmap would
>>> mean actually silently running without stack guard on system
>>> supporting it, so I believe it is actually legitimate to fail and
>>> return error in that case. But that's difference use case than the
>>> issue I'm observing.
>>>
>>> I took note of Balazs's suggestion (in the thread referenced above)
>>> to switch to a pattern similar to Musl's current one
>>> (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid
>>> those guard pages to actually occupy resources. But I can indeed
>>> observe that this approach fails on some devices (which have valid
>>> mmu), while I'm not sure I'm seeing the issue with first mapping
>>> PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter
>>> approach (as implemented by patch) is, at least, consistent with all
>>> the other implementations out there (I checked glibc's
>>> allocatestack.c, but also e.g. bionic), and couldn't find report of
>>> those failures you are envisaging.
>> Consider the case of guard_size=128M stack_size=128k with
>> Commit_Limit=128M. This will fail with your approach but works
>> perfectly well now.
> Right, makes sense. Though I would point out that such uncommon
> scenario for an application would fail when linked with anything but
> musl, since "my approach" is the widespread one across all other libc
> implementations (glibc, bionic, ...). But I understand both approaches
> aim at working around issue in some corner case while creating issue
> in other rare one, and so at the end it's all about deciding which
> edge case is more critical than the other, and whether
> consistency/compatibility with existing 3rd parties solutions matters
> or not.
>
> But in order to get closer to "having the cake and eat it too", please
> find attached another patch implementing the "unmaping and re-doing"
> strategy you initially suggested, i.e. starting with current approach,
> then giving it a second chance using "my" approach if and only if
> issue with mprotect() is detected. Should be consistent with current
> behavior on any system working correctly today, and just provide a
> plan B falling back to more common approach only when needed. Does
> that sound more acceptable?
>
> Thanks,
>
> Eric
>
> ----
> [PATCH] set up guard protection after stack is mapped
>
> calling mprotect beyond the guard page of PROT_NONE mmap-ed stack
> fails on
> some devices with buggy kernel, making it impossible to create
> thread. if such condition is met, fall back to allocating memory
> for stack and guard first, with read and write permission, then
> protect guard
> area.
> ---
> src/thread/pthread_create.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> index 49f2f72..d3c030b 100644
> --- a/src/thread/pthread_create.c
> +++ b/src/thread/pthread_create.c
> @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res,
> const pthread_attr_t *restrict att
> if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)
> && errno != ENOSYS) {
> __munmap(map, size);
> - goto fail;
> + if (errno == EINVAL) {
> + map = __mmap(0, size, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANON, -1, 0);
> + if (map == MAP_FAILED) goto fail;
> + if (__mprotect(map, guard, PROT_NONE)
> + && errno != ENOSYS) {
> + __munmap(map, size);
> + goto fail;
> + }
> + } else {
> + goto fail;
> + }
> }
> } else {
> map = __mmap(0, size, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANON, -1, 0);
Hi Rich,
Pinging... any comment, feedback or concern about latest version of the
patch, attached above, keeping current behavior but falling back to
(mmap(PROT_READ|PROT_WRITE) && mprotect(guard, none)) if and only if
current approach detected to fail) ?
Eric
next prev parent reply other threads:[~2017-01-30 21:30 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-20 19:45 Eric Hassold
2017-01-20 19:56 ` Rich Felker
2017-01-20 21:04 ` Eric Hassold
2017-01-20 21:29 ` Rich Felker
2017-01-20 22:42 ` Eric Hassold
2017-01-30 21:30 ` Eric Hassold [this message]
2017-01-30 23:13 ` Rich Felker
2017-01-31 2:52 ` Eric Hassold
2017-01-31 3:58 ` Rich Felker
2017-01-31 21:18 ` Eric Hassold
2017-01-31 22:44 ` Eric Hassold
2017-02-01 9:52 ` Szabolcs Nagy
2017-02-01 18:21 ` Eric Hassold
2017-02-01 18:35 ` Rich Felker
2017-02-01 18:52 ` Eric Hassold
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=81f188cf-3fb8-2899-5c24-ec72d38ad300@gmail.com \
--to=hassold@gmail.com \
--cc=musl@lists.openwall.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/musl/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).