From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/10958
Path: news.gmane.org!.POSTED!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Fix pthread_create on some devices failing to initialize
 guard area
Date: Fri, 20 Jan 2017 16:29:33 -0500
Message-ID: <20170120212933.GT1533@brightrain.aerifal.cx>
References: <dc9ff861-1893-10fe-07d9-333310fc56bc@gmail.com>
 <20170120195649.GS1533@brightrain.aerifal.cx>
 <30588c41-eb3d-627e-c5eb-91e19ef56790@gmail.com>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: blaine.gmane.org 1484947793 18597 195.159.176.226 (20 Jan 2017 21:29:53 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Fri, 20 Jan 2017 21:29:53 +0000 (UTC)
User-Agent: Mutt/1.5.21 (2010-09-15)
To: musl@lists.openwall.com
Original-X-From: musl-return-10973-gllmg-musl=m.gmane.org@lists.openwall.com Fri Jan 20 22:29:49 2017
Return-path: <musl-return-10973-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.84_2)
	(envelope-from <musl-return-10973-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1cUgkg-0003uI-Ay
	for gllmg-musl@m.gmane.org; Fri, 20 Jan 2017 22:29:42 +0100
Original-Received: (qmail 28246 invoked by uid 550); 20 Jan 2017 21:29:46 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 28225 invoked from network); 20 Jan 2017 21:29:45 -0000
Content-Disposition: inline
In-Reply-To: <30588c41-eb3d-627e-c5eb-91e19ef56790@gmail.com>
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:10958
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/10958>

On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote:
> 
> On 1/20/17 11:56 AM, Rich Felker wrote:
> >On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
> >>Hi All,
> >>
> >>While deploying test static executable across farm of different
> >>embedded systems, found out that pthread_create() is failing
> >>systematically on some (very few) arm-linux devices whenever non
> >>null stack guard is enabled (that is, also when calling
> >>pthread_create with default - i.e. null - attributes since default
> >>is a one page of guard). One of those device is for example a
> >>Marvell Armada 375 running Linux 3.10.39. Same test code, built with
> >>alternative libc implementations (glibc, uClibc) works as expected
> >>on those devices.
> >>
> >>
> >>Issue
> >>
> >>This occurs because of call to mprotect() in pthread_create fails.
> >>In current implementation, if guard size is non null, memory for
> >>(guard + stack + ...) is first allocated (mmap'ed) with no
> >>accessibility (PROT_NONE), then mprotect() is called to re-enable
> >>read/write access to (memory + guardsize). Since call to mprotect()
> >>systematically fails in this scenario (returning error code EINVAL),
> >>it is impossible to create thread.
> >Failure is ignored and the memory is assumed to be writable in this
> >case, since EINVAL is assumed to imply no MMU. Is this assumption
> >wrong in your case, and if so, can you explain why?
> 
> In my case, devices exhibiting issue are not MMU-less, they are
> Cortex-A9 devices with valid mmu / page protection working as
> expected otherwise. Note that current Musl code assumes ENOSYS means
> no MMU and handles it by assuming the system has no page protection
> at all. For the case I observe, it is EINVAL which is returned, this
> is not ignored, so memory is unmap'ed and pthread_create() fails.

In that case I think this is a kernel bug. Do you know why EINVAL is
happening? If there's an MMU, Linux should be able to replace the
anon PROT_NONE pages with anon RW pages.

> >>In proposed patch (attached below), memory for (guard + stack + ...)
> >>is first mmap'ed with read/write accessibility, then guard area is
> >>protected by calling mprotect() with PROT_NONE on guardsize first
> >>bytes of returned memory. This call to mprotect() to remove all
> >>accessibility on guard area, with guard area being at beginning of
> >>previously mmap'ed memory, works correctly on those platforms having
> >>issue with current implementation. Incidentally, this makes the
> >>logic more concise to handle both cases (with or without guard) is a
> >>more consistent way, and handle systems with partial/invalid page
> >>protection implementation (e.g. mprotect() returning ENOSYS) more
> >>gracefully since the stack is explicitly created with read/write
> >>access.
> >This doesn't work correctly on normal systems with mmu, because the
> >size of the guard pages is accounted against commit charge. Linux
> >should, but AFAIK doesn't, subtract it from commit charge once it's
> >changed to PROT_NONE without having been dirtied, but even if this bug
> >is fixed on the kernel side, there would still be a moment where
> >excess commit charge is consumed and thus where pthread_create might
> >spuriously fail or cause allocations in other processes/threads to
> >fail.
> >
> >If the kernel is not allocating actually-usable address ranges for
> >PROT_NONE on all nommu systems, I think the only solution is to handle
> >EINVAL from mprotect by going back and re-doing the mmap with
> >PROT_READ|PROT_WRITE. Do you have any better ideas?
> >
> >Rich
> Had this "deja vu" feeling... reminds me conversation you had in
> this thread some time ago elsewhere...
> https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html
> 
> Your proposition seems reasonable on nommu system, but again, the
> issue observed here is on legit systems with mmu, with mprotect
> failing with EINVAL (and not ENOSYS), for some other reason than
> system not supporting page protection. Catching EINVAL error
> returned by mprotect and falling back to re-doing the mmap would
> mean actually silently running without stack guard on system
> supporting it, so I believe it is actually legitimate to fail and
> return error in that case. But that's difference use case than the
> issue I'm observing.
> 
> I took note of Balazs's suggestion (in the thread referenced above)
> to switch to a pattern similar to Musl's current one
> (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid
> those guard pages to actually occupy resources. But I can indeed
> observe that this approach fails on some devices (which have valid
> mmu), while I'm not sure I'm seeing the issue with first mapping
> PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter
> approach (as implemented by patch) is, at least, consistent with all
> the other implementations out there (I checked glibc's
> allocatestack.c, but also e.g. bionic), and couldn't find report of
> those failures you are envisaging.

Consider the case of guard_size=128M stack_size=128k with
Commit_Limit=128M. This will fail with your approach but works
perfectly well now.

Rich