From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/10958 Path: news.gmane.org!.POSTED!not-for-mail From: Rich Felker Newsgroups: gmane.linux.lib.musl.general Subject: Re: Fix pthread_create on some devices failing to initialize guard area Date: Fri, 20 Jan 2017 16:29:33 -0500 Message-ID: <20170120212933.GT1533@brightrain.aerifal.cx> References: <20170120195649.GS1533@brightrain.aerifal.cx> <30588c41-eb3d-627e-c5eb-91e19ef56790@gmail.com> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: blaine.gmane.org 1484947793 18597 195.159.176.226 (20 Jan 2017 21:29:53 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Fri, 20 Jan 2017 21:29:53 +0000 (UTC) User-Agent: Mutt/1.5.21 (2010-09-15) To: musl@lists.openwall.com Original-X-From: musl-return-10973-gllmg-musl=m.gmane.org@lists.openwall.com Fri Jan 20 22:29:49 2017 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1cUgkg-0003uI-Ay for gllmg-musl@m.gmane.org; Fri, 20 Jan 2017 22:29:42 +0100 Original-Received: (qmail 28246 invoked by uid 550); 20 Jan 2017 21:29:46 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 28225 invoked from network); 20 Jan 2017 21:29:45 -0000 Content-Disposition: inline In-Reply-To: <30588c41-eb3d-627e-c5eb-91e19ef56790@gmail.com> Original-Sender: Rich Felker Xref: news.gmane.org gmane.linux.lib.musl.general:10958 Archived-At: On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote: > > On 1/20/17 11:56 AM, Rich Felker wrote: > >On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote: > >>Hi All, > >> > >>While deploying test static executable across farm of different > >>embedded systems, found out that pthread_create() is failing > >>systematically on some (very few) arm-linux devices whenever non > >>null stack guard is enabled (that is, also when calling > >>pthread_create with default - i.e. null - attributes since default > >>is a one page of guard). One of those device is for example a > >>Marvell Armada 375 running Linux 3.10.39. Same test code, built with > >>alternative libc implementations (glibc, uClibc) works as expected > >>on those devices. > >> > >> > >>Issue > >> > >>This occurs because of call to mprotect() in pthread_create fails. > >>In current implementation, if guard size is non null, memory for > >>(guard + stack + ...) is first allocated (mmap'ed) with no > >>accessibility (PROT_NONE), then mprotect() is called to re-enable > >>read/write access to (memory + guardsize). Since call to mprotect() > >>systematically fails in this scenario (returning error code EINVAL), > >>it is impossible to create thread. > >Failure is ignored and the memory is assumed to be writable in this > >case, since EINVAL is assumed to imply no MMU. Is this assumption > >wrong in your case, and if so, can you explain why? > > In my case, devices exhibiting issue are not MMU-less, they are > Cortex-A9 devices with valid mmu / page protection working as > expected otherwise. Note that current Musl code assumes ENOSYS means > no MMU and handles it by assuming the system has no page protection > at all. For the case I observe, it is EINVAL which is returned, this > is not ignored, so memory is unmap'ed and pthread_create() fails. In that case I think this is a kernel bug. Do you know why EINVAL is happening? If there's an MMU, Linux should be able to replace the anon PROT_NONE pages with anon RW pages. > >>In proposed patch (attached below), memory for (guard + stack + ...) > >>is first mmap'ed with read/write accessibility, then guard area is > >>protected by calling mprotect() with PROT_NONE on guardsize first > >>bytes of returned memory. This call to mprotect() to remove all > >>accessibility on guard area, with guard area being at beginning of > >>previously mmap'ed memory, works correctly on those platforms having > >>issue with current implementation. Incidentally, this makes the > >>logic more concise to handle both cases (with or without guard) is a > >>more consistent way, and handle systems with partial/invalid page > >>protection implementation (e.g. mprotect() returning ENOSYS) more > >>gracefully since the stack is explicitly created with read/write > >>access. > >This doesn't work correctly on normal systems with mmu, because the > >size of the guard pages is accounted against commit charge. Linux > >should, but AFAIK doesn't, subtract it from commit charge once it's > >changed to PROT_NONE without having been dirtied, but even if this bug > >is fixed on the kernel side, there would still be a moment where > >excess commit charge is consumed and thus where pthread_create might > >spuriously fail or cause allocations in other processes/threads to > >fail. > > > >If the kernel is not allocating actually-usable address ranges for > >PROT_NONE on all nommu systems, I think the only solution is to handle > >EINVAL from mprotect by going back and re-doing the mmap with > >PROT_READ|PROT_WRITE. Do you have any better ideas? > > > >Rich > Had this "deja vu" feeling... reminds me conversation you had in > this thread some time ago elsewhere... > https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html > > Your proposition seems reasonable on nommu system, but again, the > issue observed here is on legit systems with mmu, with mprotect > failing with EINVAL (and not ENOSYS), for some other reason than > system not supporting page protection. Catching EINVAL error > returned by mprotect and falling back to re-doing the mmap would > mean actually silently running without stack guard on system > supporting it, so I believe it is actually legitimate to fail and > return error in that case. But that's difference use case than the > issue I'm observing. > > I took note of Balazs's suggestion (in the thread referenced above) > to switch to a pattern similar to Musl's current one > (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid > those guard pages to actually occupy resources. But I can indeed > observe that this approach fails on some devices (which have valid > mmu), while I'm not sure I'm seeing the issue with first mapping > PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter > approach (as implemented by patch) is, at least, consistent with all > the other implementations out there (I checked glibc's > allocatestack.c, but also e.g. bionic), and couldn't find report of > those failures you are envisaging. Consider the case of guard_size=128M stack_size=128k with Commit_Limit=128M. This will fail with your approach but works perfectly well now. Rich