* Fix pthread_create on some devices failing to initialize guard area @ 2017-01-20 19:45 Eric Hassold 2017-01-20 19:56 ` Rich Felker 0 siblings, 1 reply; 15+ messages in thread From: Eric Hassold @ 2017-01-20 19:45 UTC (permalink / raw) To: musl Hi All, While deploying test static executable across farm of different embedded systems, found out that pthread_create() is failing systematically on some (very few) arm-linux devices whenever non null stack guard is enabled (that is, also when calling pthread_create with default - i.e. null - attributes since default is a one page of guard). One of those device is for example a Marvell Armada 375 running Linux 3.10.39. Same test code, built with alternative libc implementations (glibc, uClibc) works as expected on those devices. Issue This occurs because of call to mprotect() in pthread_create fails. In current implementation, if guard size is non null, memory for (guard + stack + ...) is first allocated (mmap'ed) with no accessibility (PROT_NONE), then mprotect() is called to re-enable read/write access to (memory + guardsize). Since call to mprotect() systematically fails in this scenario (returning error code EINVAL), it is impossible to create thread. Patch In proposed patch (attached below), memory for (guard + stack + ...) is first mmap'ed with read/write accessibility, then guard area is protected by calling mprotect() with PROT_NONE on guardsize first bytes of returned memory. This call to mprotect() to remove all accessibility on guard area, with guard area being at beginning of previously mmap'ed memory, works correctly on those platforms having issue with current implementation. Incidentally, this makes the logic more concise to handle both cases (with or without guard) is a more consistent way, and handle systems with partial/invalid page protection implementation (e.g. mprotect() returning ENOSYS) more gracefully since the stack is explicitly created with read/write access. -Eric. --- Subject: [PATCH] set up guard protection after stack is mapped calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on some devices, making it impossible to create thread. instead, allocate memory for stack and guard first, with read and write permission, then protect guard area. --- src/thread/pthread_create.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c index 49f2f72..f846bec 100644 --- a/src/thread/pthread_create.c +++ b/src/thread/pthread_create.c @@ -232,21 +232,18 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att } else { guard = ROUND(attr._a_guardsize); size = guard + ROUND(attr._a_stacksize - + libc.tls_size + __pthread_tsd_size); + + libc.tls_size + __pthread_tsd_size); } if (!tsd) { + map = __mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0); + if (map == MAP_FAILED) goto fail; if (guard) { - map = __mmap(0, size, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0); - if (map == MAP_FAILED) goto fail; - if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE) + if (__mprotect(map, guard, PROT_NONE) && errno != ENOSYS) { __munmap(map, size); goto fail; } - } else { - map = __mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0); - if (map == MAP_FAILED) goto fail; } tsd = map + size - __pthread_tsd_size; if (!stack) { -- 2.7.4 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-01-20 19:45 Fix pthread_create on some devices failing to initialize guard area Eric Hassold @ 2017-01-20 19:56 ` Rich Felker 2017-01-20 21:04 ` Eric Hassold 0 siblings, 1 reply; 15+ messages in thread From: Rich Felker @ 2017-01-20 19:56 UTC (permalink / raw) To: musl On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote: > Hi All, > > While deploying test static executable across farm of different > embedded systems, found out that pthread_create() is failing > systematically on some (very few) arm-linux devices whenever non > null stack guard is enabled (that is, also when calling > pthread_create with default - i.e. null - attributes since default > is a one page of guard). One of those device is for example a > Marvell Armada 375 running Linux 3.10.39. Same test code, built with > alternative libc implementations (glibc, uClibc) works as expected > on those devices. > > > Issue > > This occurs because of call to mprotect() in pthread_create fails. > In current implementation, if guard size is non null, memory for > (guard + stack + ...) is first allocated (mmap'ed) with no > accessibility (PROT_NONE), then mprotect() is called to re-enable > read/write access to (memory + guardsize). Since call to mprotect() > systematically fails in this scenario (returning error code EINVAL), > it is impossible to create thread. Failure is ignored and the memory is assumed to be writable in this case, since EINVAL is assumed to imply no MMU. Is this assumption wrong in your case, and if so, can you explain why? > Patch > > In proposed patch (attached below), memory for (guard + stack + ...) > is first mmap'ed with read/write accessibility, then guard area is > protected by calling mprotect() with PROT_NONE on guardsize first > bytes of returned memory. This call to mprotect() to remove all > accessibility on guard area, with guard area being at beginning of > previously mmap'ed memory, works correctly on those platforms having > issue with current implementation. Incidentally, this makes the > logic more concise to handle both cases (with or without guard) is a > more consistent way, and handle systems with partial/invalid page > protection implementation (e.g. mprotect() returning ENOSYS) more > gracefully since the stack is explicitly created with read/write > access. This doesn't work correctly on normal systems with mmu, because the size of the guard pages is accounted against commit charge. Linux should, but AFAIK doesn't, subtract it from commit charge once it's changed to PROT_NONE without having been dirtied, but even if this bug is fixed on the kernel side, there would still be a moment where excess commit charge is consumed and thus where pthread_create might spuriously fail or cause allocations in other processes/threads to fail. If the kernel is not allocating actually-usable address ranges for PROT_NONE on all nommu systems, I think the only solution is to handle EINVAL from mprotect by going back and re-doing the mmap with PROT_READ|PROT_WRITE. Do you have any better ideas? Rich ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-01-20 19:56 ` Rich Felker @ 2017-01-20 21:04 ` Eric Hassold 2017-01-20 21:29 ` Rich Felker 0 siblings, 1 reply; 15+ messages in thread From: Eric Hassold @ 2017-01-20 21:04 UTC (permalink / raw) To: musl On 1/20/17 11:56 AM, Rich Felker wrote: > On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote: >> Hi All, >> >> While deploying test static executable across farm of different >> embedded systems, found out that pthread_create() is failing >> systematically on some (very few) arm-linux devices whenever non >> null stack guard is enabled (that is, also when calling >> pthread_create with default - i.e. null - attributes since default >> is a one page of guard). One of those device is for example a >> Marvell Armada 375 running Linux 3.10.39. Same test code, built with >> alternative libc implementations (glibc, uClibc) works as expected >> on those devices. >> >> >> Issue >> >> This occurs because of call to mprotect() in pthread_create fails. >> In current implementation, if guard size is non null, memory for >> (guard + stack + ...) is first allocated (mmap'ed) with no >> accessibility (PROT_NONE), then mprotect() is called to re-enable >> read/write access to (memory + guardsize). Since call to mprotect() >> systematically fails in this scenario (returning error code EINVAL), >> it is impossible to create thread. > Failure is ignored and the memory is assumed to be writable in this > case, since EINVAL is assumed to imply no MMU. Is this assumption > wrong in your case, and if so, can you explain why? In my case, devices exhibiting issue are not MMU-less, they are Cortex-A9 devices with valid mmu / page protection working as expected otherwise. Note that current Musl code assumes ENOSYS means no MMU and handles it by assuming the system has no page protection at all. For the case I observe, it is EINVAL which is returned, this is not ignored, so memory is unmap'ed and pthread_create() fails. > >> Patch >> >> In proposed patch (attached below), memory for (guard + stack + ...) >> is first mmap'ed with read/write accessibility, then guard area is >> protected by calling mprotect() with PROT_NONE on guardsize first >> bytes of returned memory. This call to mprotect() to remove all >> accessibility on guard area, with guard area being at beginning of >> previously mmap'ed memory, works correctly on those platforms having >> issue with current implementation. Incidentally, this makes the >> logic more concise to handle both cases (with or without guard) is a >> more consistent way, and handle systems with partial/invalid page >> protection implementation (e.g. mprotect() returning ENOSYS) more >> gracefully since the stack is explicitly created with read/write >> access. > This doesn't work correctly on normal systems with mmu, because the > size of the guard pages is accounted against commit charge. Linux > should, but AFAIK doesn't, subtract it from commit charge once it's > changed to PROT_NONE without having been dirtied, but even if this bug > is fixed on the kernel side, there would still be a moment where > excess commit charge is consumed and thus where pthread_create might > spuriously fail or cause allocations in other processes/threads to > fail. > > If the kernel is not allocating actually-usable address ranges for > PROT_NONE on all nommu systems, I think the only solution is to handle > EINVAL from mprotect by going back and re-doing the mmap with > PROT_READ|PROT_WRITE. Do you have any better ideas? > > Rich Had this "deja vu" feeling... reminds me conversation you had in this thread some time ago elsewhere... https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html Your proposition seems reasonable on nommu system, but again, the issue observed here is on legit systems with mmu, with mprotect failing with EINVAL (and not ENOSYS), for some other reason than system not supporting page protection. Catching EINVAL error returned by mprotect and falling back to re-doing the mmap would mean actually silently running without stack guard on system supporting it, so I believe it is actually legitimate to fail and return error in that case. But that's difference use case than the issue I'm observing. I took note of Balazs's suggestion (in the thread referenced above) to switch to a pattern similar to Musl's current one (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid those guard pages to actually occupy resources. But I can indeed observe that this approach fails on some devices (which have valid mmu), while I'm not sure I'm seeing the issue with first mapping PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter approach (as implemented by patch) is, at least, consistent with all the other implementations out there (I checked glibc's allocatestack.c, but also e.g. bionic), and couldn't find report of those failures you are envisaging. Eric ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-01-20 21:04 ` Eric Hassold @ 2017-01-20 21:29 ` Rich Felker 2017-01-20 22:42 ` Eric Hassold 0 siblings, 1 reply; 15+ messages in thread From: Rich Felker @ 2017-01-20 21:29 UTC (permalink / raw) To: musl On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote: > > On 1/20/17 11:56 AM, Rich Felker wrote: > >On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote: > >>Hi All, > >> > >>While deploying test static executable across farm of different > >>embedded systems, found out that pthread_create() is failing > >>systematically on some (very few) arm-linux devices whenever non > >>null stack guard is enabled (that is, also when calling > >>pthread_create with default - i.e. null - attributes since default > >>is a one page of guard). One of those device is for example a > >>Marvell Armada 375 running Linux 3.10.39. Same test code, built with > >>alternative libc implementations (glibc, uClibc) works as expected > >>on those devices. > >> > >> > >>Issue > >> > >>This occurs because of call to mprotect() in pthread_create fails. > >>In current implementation, if guard size is non null, memory for > >>(guard + stack + ...) is first allocated (mmap'ed) with no > >>accessibility (PROT_NONE), then mprotect() is called to re-enable > >>read/write access to (memory + guardsize). Since call to mprotect() > >>systematically fails in this scenario (returning error code EINVAL), > >>it is impossible to create thread. > >Failure is ignored and the memory is assumed to be writable in this > >case, since EINVAL is assumed to imply no MMU. Is this assumption > >wrong in your case, and if so, can you explain why? > > In my case, devices exhibiting issue are not MMU-less, they are > Cortex-A9 devices with valid mmu / page protection working as > expected otherwise. Note that current Musl code assumes ENOSYS means > no MMU and handles it by assuming the system has no page protection > at all. For the case I observe, it is EINVAL which is returned, this > is not ignored, so memory is unmap'ed and pthread_create() fails. In that case I think this is a kernel bug. Do you know why EINVAL is happening? If there's an MMU, Linux should be able to replace the anon PROT_NONE pages with anon RW pages. > >>In proposed patch (attached below), memory for (guard + stack + ...) > >>is first mmap'ed with read/write accessibility, then guard area is > >>protected by calling mprotect() with PROT_NONE on guardsize first > >>bytes of returned memory. This call to mprotect() to remove all > >>accessibility on guard area, with guard area being at beginning of > >>previously mmap'ed memory, works correctly on those platforms having > >>issue with current implementation. Incidentally, this makes the > >>logic more concise to handle both cases (with or without guard) is a > >>more consistent way, and handle systems with partial/invalid page > >>protection implementation (e.g. mprotect() returning ENOSYS) more > >>gracefully since the stack is explicitly created with read/write > >>access. > >This doesn't work correctly on normal systems with mmu, because the > >size of the guard pages is accounted against commit charge. Linux > >should, but AFAIK doesn't, subtract it from commit charge once it's > >changed to PROT_NONE without having been dirtied, but even if this bug > >is fixed on the kernel side, there would still be a moment where > >excess commit charge is consumed and thus where pthread_create might > >spuriously fail or cause allocations in other processes/threads to > >fail. > > > >If the kernel is not allocating actually-usable address ranges for > >PROT_NONE on all nommu systems, I think the only solution is to handle > >EINVAL from mprotect by going back and re-doing the mmap with > >PROT_READ|PROT_WRITE. Do you have any better ideas? > > > >Rich > Had this "deja vu" feeling... reminds me conversation you had in > this thread some time ago elsewhere... > https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html > > Your proposition seems reasonable on nommu system, but again, the > issue observed here is on legit systems with mmu, with mprotect > failing with EINVAL (and not ENOSYS), for some other reason than > system not supporting page protection. Catching EINVAL error > returned by mprotect and falling back to re-doing the mmap would > mean actually silently running without stack guard on system > supporting it, so I believe it is actually legitimate to fail and > return error in that case. But that's difference use case than the > issue I'm observing. > > I took note of Balazs's suggestion (in the thread referenced above) > to switch to a pattern similar to Musl's current one > (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid > those guard pages to actually occupy resources. But I can indeed > observe that this approach fails on some devices (which have valid > mmu), while I'm not sure I'm seeing the issue with first mapping > PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter > approach (as implemented by patch) is, at least, consistent with all > the other implementations out there (I checked glibc's > allocatestack.c, but also e.g. bionic), and couldn't find report of > those failures you are envisaging. Consider the case of guard_size=128M stack_size=128k with Commit_Limit=128M. This will fail with your approach but works perfectly well now. Rich ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-01-20 21:29 ` Rich Felker @ 2017-01-20 22:42 ` Eric Hassold 2017-01-30 21:30 ` Eric Hassold 2017-01-31 3:58 ` Rich Felker 0 siblings, 2 replies; 15+ messages in thread From: Eric Hassold @ 2017-01-20 22:42 UTC (permalink / raw) To: musl On 1/20/17 1:29 PM, Rich Felker wrote: > On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote: >> On 1/20/17 11:56 AM, Rich Felker wrote: >>> On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote: >>>> Hi All, >>>> >>>> While deploying test static executable across farm of different >>>> embedded systems, found out that pthread_create() is failing >>>> systematically on some (very few) arm-linux devices whenever non >>>> null stack guard is enabled (that is, also when calling >>>> pthread_create with default - i.e. null - attributes since default >>>> is a one page of guard). One of those device is for example a >>>> Marvell Armada 375 running Linux 3.10.39. Same test code, built with >>>> alternative libc implementations (glibc, uClibc) works as expected >>>> on those devices. >>>> >>>> >>>> Issue >>>> >>>> This occurs because of call to mprotect() in pthread_create fails. >>>> In current implementation, if guard size is non null, memory for >>>> (guard + stack + ...) is first allocated (mmap'ed) with no >>>> accessibility (PROT_NONE), then mprotect() is called to re-enable >>>> read/write access to (memory + guardsize). Since call to mprotect() >>>> systematically fails in this scenario (returning error code EINVAL), >>>> it is impossible to create thread. >>> Failure is ignored and the memory is assumed to be writable in this >>> case, since EINVAL is assumed to imply no MMU. Is this assumption >>> wrong in your case, and if so, can you explain why? >> In my case, devices exhibiting issue are not MMU-less, they are >> Cortex-A9 devices with valid mmu / page protection working as >> expected otherwise. Note that current Musl code assumes ENOSYS means >> no MMU and handles it by assuming the system has no page protection >> at all. For the case I observe, it is EINVAL which is returned, this >> is not ignored, so memory is unmap'ed and pthread_create() fails. > In that case I think this is a kernel bug. Do you know why EINVAL is > happening? If there's an MMU, Linux should be able to replace the > anon PROT_NONE pages with anon RW pages. Agree. Unfortunately, those are devices we don't built the kernel for, so have been hardly able to track issue deeper. The point is however that such devices with this issue in kernel might not be that uncommon, and it concretely means impossibility at that moment to deploy on them a functional static executable built with musl. > >>>> In proposed patch (attached below), memory for (guard + stack + ...) >>>> is first mmap'ed with read/write accessibility, then guard area is >>>> protected by calling mprotect() with PROT_NONE on guardsize first >>>> bytes of returned memory. This call to mprotect() to remove all >>>> accessibility on guard area, with guard area being at beginning of >>>> previously mmap'ed memory, works correctly on those platforms having >>>> issue with current implementation. Incidentally, this makes the >>>> logic more concise to handle both cases (with or without guard) is a >>>> more consistent way, and handle systems with partial/invalid page >>>> protection implementation (e.g. mprotect() returning ENOSYS) more >>>> gracefully since the stack is explicitly created with read/write >>>> access. >>> This doesn't work correctly on normal systems with mmu, because the >>> size of the guard pages is accounted against commit charge. Linux >>> should, but AFAIK doesn't, subtract it from commit charge once it's >>> changed to PROT_NONE without having been dirtied, but even if this bug >>> is fixed on the kernel side, there would still be a moment where >>> excess commit charge is consumed and thus where pthread_create might >>> spuriously fail or cause allocations in other processes/threads to >>> fail. >>> >>> If the kernel is not allocating actually-usable address ranges for >>> PROT_NONE on all nommu systems, I think the only solution is to handle >>> EINVAL from mprotect by going back and re-doing the mmap with >>> PROT_READ|PROT_WRITE. Do you have any better ideas? >>> >>> Rich >> Had this "deja vu" feeling... reminds me conversation you had in >> this thread some time ago elsewhere... >> https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html >> >> Your proposition seems reasonable on nommu system, but again, the >> issue observed here is on legit systems with mmu, with mprotect >> failing with EINVAL (and not ENOSYS), for some other reason than >> system not supporting page protection. Catching EINVAL error >> returned by mprotect and falling back to re-doing the mmap would >> mean actually silently running without stack guard on system >> supporting it, so I believe it is actually legitimate to fail and >> return error in that case. But that's difference use case than the >> issue I'm observing. >> >> I took note of Balazs's suggestion (in the thread referenced above) >> to switch to a pattern similar to Musl's current one >> (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid >> those guard pages to actually occupy resources. But I can indeed >> observe that this approach fails on some devices (which have valid >> mmu), while I'm not sure I'm seeing the issue with first mapping >> PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter >> approach (as implemented by patch) is, at least, consistent with all >> the other implementations out there (I checked glibc's >> allocatestack.c, but also e.g. bionic), and couldn't find report of >> those failures you are envisaging. > Consider the case of guard_size=128M stack_size=128k with > Commit_Limit=128M. This will fail with your approach but works > perfectly well now. Right, makes sense. Though I would point out that such uncommon scenario for an application would fail when linked with anything but musl, since "my approach" is the widespread one across all other libc implementations (glibc, bionic, ...). But I understand both approaches aim at working around issue in some corner case while creating issue in other rare one, and so at the end it's all about deciding which edge case is more critical than the other, and whether consistency/compatibility with existing 3rd parties solutions matters or not. But in order to get closer to "having the cake and eat it too", please find attached another patch implementing the "unmaping and re-doing" strategy you initially suggested, i.e. starting with current approach, then giving it a second chance using "my" approach if and only if issue with mprotect() is detected. Should be consistent with current behavior on any system working correctly today, and just provide a plan B falling back to more common approach only when needed. Does that sound more acceptable? Thanks, Eric ---- [PATCH] set up guard protection after stack is mapped calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on some devices with buggy kernel, making it impossible to create thread. if such condition is met, fall back to allocating memory for stack and guard first, with read and write permission, then protect guard area. --- src/thread/pthread_create.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c index 49f2f72..d3c030b 100644 --- a/src/thread/pthread_create.c +++ b/src/thread/pthread_create.c @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE) && errno != ENOSYS) { __munmap(map, size); - goto fail; + if (errno == EINVAL) { + map = __mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0); + if (map == MAP_FAILED) goto fail; + if (__mprotect(map, guard, PROT_NONE) + && errno != ENOSYS) { + __munmap(map, size); + goto fail; + } + } else { + goto fail; + } } } else { map = __mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0); -- 2.7.4 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-01-20 22:42 ` Eric Hassold @ 2017-01-30 21:30 ` Eric Hassold 2017-01-30 23:13 ` Rich Felker 2017-01-31 3:58 ` Rich Felker 1 sibling, 1 reply; 15+ messages in thread From: Eric Hassold @ 2017-01-30 21:30 UTC (permalink / raw) To: musl On 1/20/17 2:42 PM, Eric Hassold wrote: > > > On 1/20/17 1:29 PM, Rich Felker wrote: >> On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote: >>> On 1/20/17 11:56 AM, Rich Felker wrote: >>>> On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote: >>>>> Hi All, >>>>> >>>>> While deploying test static executable across farm of different >>>>> embedded systems, found out that pthread_create() is failing >>>>> systematically on some (very few) arm-linux devices whenever non >>>>> null stack guard is enabled (that is, also when calling >>>>> pthread_create with default - i.e. null - attributes since default >>>>> is a one page of guard). One of those device is for example a >>>>> Marvell Armada 375 running Linux 3.10.39. Same test code, built with >>>>> alternative libc implementations (glibc, uClibc) works as expected >>>>> on those devices. >>>>> >>>>> >>>>> Issue >>>>> >>>>> This occurs because of call to mprotect() in pthread_create fails. >>>>> In current implementation, if guard size is non null, memory for >>>>> (guard + stack + ...) is first allocated (mmap'ed) with no >>>>> accessibility (PROT_NONE), then mprotect() is called to re-enable >>>>> read/write access to (memory + guardsize). Since call to mprotect() >>>>> systematically fails in this scenario (returning error code EINVAL), >>>>> it is impossible to create thread. >>>> Failure is ignored and the memory is assumed to be writable in this >>>> case, since EINVAL is assumed to imply no MMU. Is this assumption >>>> wrong in your case, and if so, can you explain why? >>> In my case, devices exhibiting issue are not MMU-less, they are >>> Cortex-A9 devices with valid mmu / page protection working as >>> expected otherwise. Note that current Musl code assumes ENOSYS means >>> no MMU and handles it by assuming the system has no page protection >>> at all. For the case I observe, it is EINVAL which is returned, this >>> is not ignored, so memory is unmap'ed and pthread_create() fails. >> In that case I think this is a kernel bug. Do you know why EINVAL is >> happening? If there's an MMU, Linux should be able to replace the >> anon PROT_NONE pages with anon RW pages. > > Agree. Unfortunately, those are devices we don't built the kernel for, > so have been hardly able to track issue deeper. The point is however > that such devices with this issue in kernel might not be that > uncommon, and it concretely means impossibility at that moment to > deploy on them a functional static executable built with musl. >> >>>>> In proposed patch (attached below), memory for (guard + stack + ...) >>>>> is first mmap'ed with read/write accessibility, then guard area is >>>>> protected by calling mprotect() with PROT_NONE on guardsize first >>>>> bytes of returned memory. This call to mprotect() to remove all >>>>> accessibility on guard area, with guard area being at beginning of >>>>> previously mmap'ed memory, works correctly on those platforms having >>>>> issue with current implementation. Incidentally, this makes the >>>>> logic more concise to handle both cases (with or without guard) is a >>>>> more consistent way, and handle systems with partial/invalid page >>>>> protection implementation (e.g. mprotect() returning ENOSYS) more >>>>> gracefully since the stack is explicitly created with read/write >>>>> access. >>>> This doesn't work correctly on normal systems with mmu, because the >>>> size of the guard pages is accounted against commit charge. Linux >>>> should, but AFAIK doesn't, subtract it from commit charge once it's >>>> changed to PROT_NONE without having been dirtied, but even if this bug >>>> is fixed on the kernel side, there would still be a moment where >>>> excess commit charge is consumed and thus where pthread_create might >>>> spuriously fail or cause allocations in other processes/threads to >>>> fail. >>>> >>>> If the kernel is not allocating actually-usable address ranges for >>>> PROT_NONE on all nommu systems, I think the only solution is to handle >>>> EINVAL from mprotect by going back and re-doing the mmap with >>>> PROT_READ|PROT_WRITE. Do you have any better ideas? >>>> >>>> Rich >>> Had this "deja vu" feeling... reminds me conversation you had in >>> this thread some time ago elsewhere... >>> https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html >>> >>> Your proposition seems reasonable on nommu system, but again, the >>> issue observed here is on legit systems with mmu, with mprotect >>> failing with EINVAL (and not ENOSYS), for some other reason than >>> system not supporting page protection. Catching EINVAL error >>> returned by mprotect and falling back to re-doing the mmap would >>> mean actually silently running without stack guard on system >>> supporting it, so I believe it is actually legitimate to fail and >>> return error in that case. But that's difference use case than the >>> issue I'm observing. >>> >>> I took note of Balazs's suggestion (in the thread referenced above) >>> to switch to a pattern similar to Musl's current one >>> (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid >>> those guard pages to actually occupy resources. But I can indeed >>> observe that this approach fails on some devices (which have valid >>> mmu), while I'm not sure I'm seeing the issue with first mapping >>> PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter >>> approach (as implemented by patch) is, at least, consistent with all >>> the other implementations out there (I checked glibc's >>> allocatestack.c, but also e.g. bionic), and couldn't find report of >>> those failures you are envisaging. >> Consider the case of guard_size=128M stack_size=128k with >> Commit_Limit=128M. This will fail with your approach but works >> perfectly well now. > Right, makes sense. Though I would point out that such uncommon > scenario for an application would fail when linked with anything but > musl, since "my approach" is the widespread one across all other libc > implementations (glibc, bionic, ...). But I understand both approaches > aim at working around issue in some corner case while creating issue > in other rare one, and so at the end it's all about deciding which > edge case is more critical than the other, and whether > consistency/compatibility with existing 3rd parties solutions matters > or not. > > But in order to get closer to "having the cake and eat it too", please > find attached another patch implementing the "unmaping and re-doing" > strategy you initially suggested, i.e. starting with current approach, > then giving it a second chance using "my" approach if and only if > issue with mprotect() is detected. Should be consistent with current > behavior on any system working correctly today, and just provide a > plan B falling back to more common approach only when needed. Does > that sound more acceptable? > > Thanks, > > Eric > > ---- > [PATCH] set up guard protection after stack is mapped > > calling mprotect beyond the guard page of PROT_NONE mmap-ed stack > fails on > some devices with buggy kernel, making it impossible to create > thread. if such condition is met, fall back to allocating memory > for stack and guard first, with read and write permission, then > protect guard > area. > --- > src/thread/pthread_create.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c > index 49f2f72..d3c030b 100644 > --- a/src/thread/pthread_create.c > +++ b/src/thread/pthread_create.c > @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res, > const pthread_attr_t *restrict att > if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE) > && errno != ENOSYS) { > __munmap(map, size); > - goto fail; > + if (errno == EINVAL) { > + map = __mmap(0, size, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_ANON, -1, 0); > + if (map == MAP_FAILED) goto fail; > + if (__mprotect(map, guard, PROT_NONE) > + && errno != ENOSYS) { > + __munmap(map, size); > + goto fail; > + } > + } else { > + goto fail; > + } > } > } else { > map = __mmap(0, size, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_ANON, -1, 0); Hi Rich, Pinging... any comment, feedback or concern about latest version of the patch, attached above, keeping current behavior but falling back to (mmap(PROT_READ|PROT_WRITE) && mprotect(guard, none)) if and only if current approach detected to fail) ? Eric ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-01-30 21:30 ` Eric Hassold @ 2017-01-30 23:13 ` Rich Felker 2017-01-31 2:52 ` Eric Hassold 0 siblings, 1 reply; 15+ messages in thread From: Rich Felker @ 2017-01-30 23:13 UTC (permalink / raw) To: musl On Mon, Jan 30, 2017 at 01:30:00PM -0800, Eric Hassold wrote: > >>>>>This occurs because of call to mprotect() in pthread_create fails. > >>>>>In current implementation, if guard size is non null, memory for > >>>>>(guard + stack + ...) is first allocated (mmap'ed) with no > >>>>>accessibility (PROT_NONE), then mprotect() is called to re-enable > >>>>>read/write access to (memory + guardsize). Since call to mprotect() > >>>>>systematically fails in this scenario (returning error code EINVAL), > >>>>>it is impossible to create thread. > >>>>Failure is ignored and the memory is assumed to be writable in this > >>>>case, since EINVAL is assumed to imply no MMU. Is this assumption > >>>>wrong in your case, and if so, can you explain why? > >>>In my case, devices exhibiting issue are not MMU-less, they are > >>>Cortex-A9 devices with valid mmu / page protection working as > >>>expected otherwise. Note that current Musl code assumes ENOSYS means > >>>no MMU and handles it by assuming the system has no page protection > >>>at all. For the case I observe, it is EINVAL which is returned, this > >>>is not ignored, so memory is unmap'ed and pthread_create() fails. > >>In that case I think this is a kernel bug. Do you know why EINVAL is > >>happening? If there's an MMU, Linux should be able to replace the > >>anon PROT_NONE pages with anon RW pages. > > > >Agree. Unfortunately, those are devices we don't built the kernel > >for, so have been hardly able to track issue deeper. The point is > >however that such devices with this issue in kernel might not be > >that uncommon, and it concretely means impossibility at that > >moment to deploy on them a functional static executable built with > >musl. > [...] > Pinging... any comment, feedback or concern about latest version of > the patch, attached above, keeping current behavior but falling back > to (mmap(PROT_READ|PROT_WRITE) && mprotect(guard, none)) if and only > if current approach detected to fail) ? I still want to know what's going on on the kernel side, because it looks like this a rogue/nonsensical patch to the kernel that breaks mmap functionality in a general way that has nothing to do with the specific cpu/board. Rich ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-01-30 23:13 ` Rich Felker @ 2017-01-31 2:52 ` Eric Hassold 0 siblings, 0 replies; 15+ messages in thread From: Eric Hassold @ 2017-01-31 2:52 UTC (permalink / raw) To: musl On 1/30/17 3:13 PM, Rich Felker wrote: > On Mon, Jan 30, 2017 at 01:30:00PM -0800, Eric Hassold wrote: >>>>>>> This occurs because of call to mprotect() in pthread_create fails. >>>>>>> In current implementation, if guard size is non null, memory for >>>>>>> (guard + stack + ...) is first allocated (mmap'ed) with no >>>>>>> accessibility (PROT_NONE), then mprotect() is called to re-enable >>>>>>> read/write access to (memory + guardsize). Since call to mprotect() >>>>>>> systematically fails in this scenario (returning error code EINVAL), >>>>>>> it is impossible to create thread. >>>>>> Failure is ignored and the memory is assumed to be writable in this >>>>>> case, since EINVAL is assumed to imply no MMU. Is this assumption >>>>>> wrong in your case, and if so, can you explain why? >>>>> In my case, devices exhibiting issue are not MMU-less, they are >>>>> Cortex-A9 devices with valid mmu / page protection working as >>>>> expected otherwise. Note that current Musl code assumes ENOSYS means >>>>> no MMU and handles it by assuming the system has no page protection >>>>> at all. For the case I observe, it is EINVAL which is returned, this >>>>> is not ignored, so memory is unmap'ed and pthread_create() fails. >>>> In that case I think this is a kernel bug. Do you know why EINVAL is >>>> happening? If there's an MMU, Linux should be able to replace the >>>> anon PROT_NONE pages with anon RW pages. >>> Agree. Unfortunately, those are devices we don't built the kernel >>> for, so have been hardly able to track issue deeper. The point is >>> however that such devices with this issue in kernel might not be >>> that uncommon, and it concretely means impossibility at that >>> moment to deploy on them a functional static executable built with >>> musl. >> [...] >> Pinging... any comment, feedback or concern about latest version of >> the patch, attached above, keeping current behavior but falling back >> to (mmap(PROT_READ|PROT_WRITE) && mprotect(guard, none)) if and only >> if current approach detected to fail) ? > I still want to know what's going on on the kernel side, because it > looks like this a rogue/nonsensical patch to the kernel that breaks > mmap functionality in a general way that has nothing to do with the > specific cpu/board. > > Rich Yes, I understand this would be ideal, and wish too I would be able to find out more about what's happening, beyond the symptomatic approach. But as mentioned, we have very little leverage to investigate much further what's going on, since issue wasn't reproducible on any device we built kernel for, but only on a few devices we don't build those kernels nor have easy way to perform some lower level debugging on those platforms (no jtag access). Telemetry was reporting failure on pthread creation on some devices, one of them we could reproduce in-house was a Marvell 375 board used e.g. in Western Digital MyCloud devices. But it seems hard to even just get reference to the actual GPL source of the exact kernel used there, so I don't see a way to find out if it's a broken patch introduced at one point by a vendor, or some transient regression fixed at one point but still having devices shipped with this oddity (kernel versions seems to be 3.10 branch, so one very rough guess is that it may be related to multiple patches done to support huge TLB on ARM at that time, but that's more of a guts feeling than supported by facts). Though I understand it is not the role of a libc implementation to work around all kind of flaws in kernel implementations that vendors ever shipped, I would find it valuable, in this specific case, to prevent musl from consistently bubbling up an issue not (easily) reproducible otherwise (e.g. with other libc implementations). One of the great value of musl is to allow easy deployment across a variety of devices (not as opened as we would ideally hope) of the same statically linked executable. This suggested patch aims at supporting this use case, since fixing kernel isn't an option there. Of course, this pragmatic motivation conflicts somehow with the legitimate goal to keep musl implementation lean and clean (from a functional standpoint, this doesn't introduce any change on behavior on all non broken platforms). So of course only you can value the pros (regarding deployment) and decide whether it is worth the cons of slightly "bloating" source code with alternative recovery path. Thanks, Eric ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-01-20 22:42 ` Eric Hassold 2017-01-30 21:30 ` Eric Hassold @ 2017-01-31 3:58 ` Rich Felker 2017-01-31 21:18 ` Eric Hassold 1 sibling, 1 reply; 15+ messages in thread From: Rich Felker @ 2017-01-31 3:58 UTC (permalink / raw) To: musl On Fri, Jan 20, 2017 at 02:42:01PM -0800, Eric Hassold wrote: > ---- > [PATCH] set up guard protection after stack is mapped > > calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on > some devices with buggy kernel, making it impossible to create > thread. if such condition is met, fall back to allocating memory > for stack and guard first, with read and write permission, then > protect guard > area. > --- > src/thread/pthread_create.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c > index 49f2f72..d3c030b 100644 > --- a/src/thread/pthread_create.c > +++ b/src/thread/pthread_create.c > @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res, > const pthread_attr_t *restrict att > if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE) > && errno != ENOSYS) { > __munmap(map, size); > - goto fail; > + if (errno == EINVAL) { In principle __munmap could have clobbered errno at this point. I don't think it will but to be safe errno should probably be saved. > + map = __mmap(0, size, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_ANON, -1, 0); > + if (map == MAP_FAILED) goto fail; > + if (__mprotect(map, guard, PROT_NONE) > + && errno != ENOSYS) { > + __munmap(map, size); > + goto fail; > + } > + } else { > + goto fail; > + } Can you try instead falling back to mmap with MAP_FIXED over top of the part that should be RW rather than unmapping and redoing the whole thing? If that works it would preserve the property of avoiding excess commit charge. If not then these kernels are very very broken. Either way it might shed some more light on what's going on. I understand you just want things to work, but I do also want to have some understanding of _why_ we're going to be working around something awful, if we really have to. Rich ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-01-31 3:58 ` Rich Felker @ 2017-01-31 21:18 ` Eric Hassold 2017-01-31 22:44 ` Eric Hassold 0 siblings, 1 reply; 15+ messages in thread From: Eric Hassold @ 2017-01-31 21:18 UTC (permalink / raw) To: musl On 1/30/17 7:58 PM, Rich Felker wrote: > On Fri, Jan 20, 2017 at 02:42:01PM -0800, Eric Hassold wrote: >> ---- >> [PATCH] set up guard protection after stack is mapped >> >> calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on >> some devices with buggy kernel, making it impossible to create >> thread. if such condition is met, fall back to allocating memory >> for stack and guard first, with read and write permission, then >> protect guard >> area. >> --- >> src/thread/pthread_create.c | 12 +++++++++++- >> 1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c >> index 49f2f72..d3c030b 100644 >> --- a/src/thread/pthread_create.c >> +++ b/src/thread/pthread_create.c >> @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res, >> const pthread_attr_t *restrict att >> if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE) >> && errno != ENOSYS) { >> __munmap(map, size); >> - goto fail; >> + if (errno == EINVAL) { > In principle __munmap could have clobbered errno at this point. I > don't think it will but to be safe errno should probably be saved. > >> + map = __mmap(0, size, PROT_READ|PROT_WRITE, >> MAP_PRIVATE|MAP_ANON, -1, 0); >> + if (map == MAP_FAILED) goto fail; >> + if (__mprotect(map, guard, PROT_NONE) >> + && errno != ENOSYS) { >> + __munmap(map, size); >> + goto fail; >> + } >> + } else { >> + goto fail; >> + } > Can you try instead falling back to mmap with MAP_FIXED over top of > the part that should be RW rather than unmapping and redoing the whole > thing? If that works it would preserve the property of avoiding excess > commit charge. If not then these kernels are very very broken. Either > way it might shed some more light on what's going on. I understand you > just want things to work, but I do also want to have some > understanding of _why_ we're going to be working around something > awful, if we really have to. > > Rich mmap(MAP_FIXED) at the address returned by first mmap(PROT_NONE) and then mprotect() indeed works. BUT... actually kept exploring the idea that this might just be an issue with page alignment. And found out that forcing the guard size to be 32k-aligned make current approach work well. So checked and found out that on systems where issue happens, the actual page size determined dynamically at runtime (using "getinfo PAGESIZE", and confirmed using ratio of Mapped memory / nr_mapped pages from /proc/vmstat) is indeed 32k, whereas Musl has an hardcoded at compile time 4k page size for arm architecture (with same eventually incorrect value returned by getpagesize() and sysconf(__SC_PAGESIZE)). Long story short, no odd behavior of the mmap in kernel, but Musl not supporting correctly non-4k page systems, impacting pthread_create but probably also malloc and mprotect. Given non-4k page systems are getting more and more common (on arm and arm64 architectures at least), this might be an important issue to fix. Should I start new "support for non-4k page systems" discussion here ? Eric ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-01-31 21:18 ` Eric Hassold @ 2017-01-31 22:44 ` Eric Hassold 2017-02-01 9:52 ` Szabolcs Nagy 0 siblings, 1 reply; 15+ messages in thread From: Eric Hassold @ 2017-01-31 22:44 UTC (permalink / raw) To: musl On 1/31/17 1:18 PM, Eric Hassold wrote: > > On 1/30/17 7:58 PM, Rich Felker wrote: >> On Fri, Jan 20, 2017 at 02:42:01PM -0800, Eric Hassold wrote: >>> ---- >>> [PATCH] set up guard protection after stack is mapped >>> >>> calling mprotect beyond the guard page of PROT_NONE mmap-ed stack >>> fails on >>> some devices with buggy kernel, making it impossible to create >>> thread. if such condition is met, fall back to allocating memory >>> for stack and guard first, with read and write permission, then >>> protect guard >>> area. >>> --- >>> src/thread/pthread_create.c | 12 +++++++++++- >>> 1 file changed, 11 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c >>> index 49f2f72..d3c030b 100644 >>> --- a/src/thread/pthread_create.c >>> +++ b/src/thread/pthread_create.c >>> @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res, >>> const pthread_attr_t *restrict att >>> if (__mprotect(map+guard, size-guard, >>> PROT_READ|PROT_WRITE) >>> && errno != ENOSYS) { >>> __munmap(map, size); >>> - goto fail; >>> + if (errno == EINVAL) { >> In principle __munmap could have clobbered errno at this point. I >> don't think it will but to be safe errno should probably be saved. >> >>> + map = __mmap(0, size, PROT_READ|PROT_WRITE, >>> MAP_PRIVATE|MAP_ANON, -1, 0); >>> + if (map == MAP_FAILED) goto fail; >>> + if (__mprotect(map, guard, PROT_NONE) >>> + && errno != ENOSYS) { >>> + __munmap(map, size); >>> + goto fail; >>> + } >>> + } else { >>> + goto fail; >>> + } >> Can you try instead falling back to mmap with MAP_FIXED over top of >> the part that should be RW rather than unmapping and redoing the whole >> thing? If that works it would preserve the property of avoiding excess >> commit charge. If not then these kernels are very very broken. Either >> way it might shed some more light on what's going on. I understand you >> just want things to work, but I do also want to have some >> understanding of _why_ we're going to be working around something >> awful, if we really have to. >> >> Rich > > mmap(MAP_FIXED) at the address returned by first mmap(PROT_NONE) and > then mprotect() indeed works. > > BUT... actually kept exploring the idea that this might just be an > issue with page alignment. And found out that forcing the guard size > to be 32k-aligned make current approach work well. So checked and > found out that on systems where issue happens, the actual page size > determined dynamically at runtime (using "getinfo PAGESIZE", and > confirmed using ratio of Mapped memory / nr_mapped pages from > /proc/vmstat) is indeed 32k, whereas Musl has an hardcoded at compile > time 4k page size for arm architecture (with same eventually incorrect > value returned by getpagesize() and sysconf(__SC_PAGESIZE)). > > Long story short, no odd behavior of the mmap in kernel, but Musl not > supporting correctly non-4k page systems, impacting pthread_create but > probably also malloc and mprotect. Given non-4k page systems are > getting more and more common (on arm and arm64 architectures at > least), this might be an important issue to fix. Should I start new > "support for non-4k page systems" discussion here ? > > Eric > Just removing the "#define PAGE_SIZE 4096" from arch/arm/bits/limits.h makes PAGE_SIZE to be defined by internal/libc.h as libc.page_size which is initialized correctly at runtime, making the correct page size be used all around musl. Since a page size of 4k is no more a valid assumption on arm-linux target, is removing this compile time define a right fix? Eric ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-01-31 22:44 ` Eric Hassold @ 2017-02-01 9:52 ` Szabolcs Nagy 2017-02-01 18:21 ` Eric Hassold 0 siblings, 1 reply; 15+ messages in thread From: Szabolcs Nagy @ 2017-02-01 9:52 UTC (permalink / raw) To: musl * Eric Hassold <hassold@gmail.com> [2017-01-31 14:44:52 -0800]: > On 1/31/17 1:18 PM, Eric Hassold wrote: > > Long story short, no odd behavior of the mmap in kernel, but Musl not > > supporting correctly non-4k page systems, impacting pthread_create but > > probably also malloc and mprotect. Given non-4k page systems are getting > > more and more common (on arm and arm64 architectures at least), this > > might be an important issue to fix. Should I start new "support for > > non-4k page systems" discussion here ? > > > > Eric > > > Just removing the "#define PAGE_SIZE 4096" from arch/arm/bits/limits.h makes > PAGE_SIZE to be defined by internal/libc.h as libc.page_size which is > initialized correctly at runtime, making the correct page size be used all > around musl. Since a page size of 4k is no more a valid assumption on > arm-linux target, is removing this compile time define a right fix? it is ok to remove PAGE_SIZE but i thought arm elf binaries would depend on the 4k page size.. making it less useful to support other page sizes, but if 32k works in practice then i think musl should support it. i guess the boundary of unaligned sections will be mapped twice which is not a big issue except may be you don't get relro support if the readonly relocs are 4k aligned. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-02-01 9:52 ` Szabolcs Nagy @ 2017-02-01 18:21 ` Eric Hassold 2017-02-01 18:35 ` Rich Felker 0 siblings, 1 reply; 15+ messages in thread From: Eric Hassold @ 2017-02-01 18:21 UTC (permalink / raw) To: musl On 2/1/17 1:52 AM, Szabolcs Nagy wrote: > * Eric Hassold <hassold@gmail.com> [2017-01-31 14:44:52 -0800]: >> On 1/31/17 1:18 PM, Eric Hassold wrote: >>> Long story short, no odd behavior of the mmap in kernel, but Musl not >>> supporting correctly non-4k page systems, impacting pthread_create but >>> probably also malloc and mprotect. Given non-4k page systems are getting >>> more and more common (on arm and arm64 architectures at least), this >>> might be an important issue to fix. Should I start new "support for >>> non-4k page systems" discussion here ? >>> >>> Eric >>> >> Just removing the "#define PAGE_SIZE 4096" from arch/arm/bits/limits.h makes >> PAGE_SIZE to be defined by internal/libc.h as libc.page_size which is >> initialized correctly at runtime, making the correct page size be used all >> around musl. Since a page size of 4k is no more a valid assumption on >> arm-linux target, is removing this compile time define a right fix? > it is ok to remove PAGE_SIZE but i thought arm elf > binaries would depend on the 4k page size.. making it > less useful to support other page sizes, but if 32k > works in practice then i think musl should support it. Binutils was updated couple years ago to support up to 64KB page size and default text alignment to 64kB boundary for armel and match changes in kernel: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=7572ca8989ead4c3425a1500bc241eaaeffa2c89 So yes, executables built with older toolchains can't be deployed as is on such 32kB pagesize systems, but no issue with with executables built with recent (binutils 2.27 && gcc 5.4) toolchain we are using. Any further action on my side to proceed? I guess no much value in submitting a patch though mail here for such one liner change... Eric > > i guess the boundary of unaligned sections will be > mapped twice which is not a big issue except may be > you don't get relro support if the readonly relocs > are 4k aligned. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-02-01 18:21 ` Eric Hassold @ 2017-02-01 18:35 ` Rich Felker 2017-02-01 18:52 ` Eric Hassold 0 siblings, 1 reply; 15+ messages in thread From: Rich Felker @ 2017-02-01 18:35 UTC (permalink / raw) To: musl On Wed, Feb 01, 2017 at 10:21:40AM -0800, Eric Hassold wrote: > > On 2/1/17 1:52 AM, Szabolcs Nagy wrote: > >* Eric Hassold <hassold@gmail.com> [2017-01-31 14:44:52 -0800]: > >>On 1/31/17 1:18 PM, Eric Hassold wrote: > >>>Long story short, no odd behavior of the mmap in kernel, but Musl not > >>>supporting correctly non-4k page systems, impacting pthread_create but > >>>probably also malloc and mprotect. Given non-4k page systems are getting > >>>more and more common (on arm and arm64 architectures at least), this > >>>might be an important issue to fix. Should I start new "support for > >>>non-4k page systems" discussion here ? > >>> > >>>Eric > >>> > >>Just removing the "#define PAGE_SIZE 4096" from arch/arm/bits/limits.h makes > >>PAGE_SIZE to be defined by internal/libc.h as libc.page_size which is > >>initialized correctly at runtime, making the correct page size be used all > >>around musl. Since a page size of 4k is no more a valid assumption on > >>arm-linux target, is removing this compile time define a right fix? > >it is ok to remove PAGE_SIZE but i thought arm elf > >binaries would depend on the 4k page size.. making it > >less useful to support other page sizes, but if 32k > >works in practice then i think musl should support it. > > Binutils was updated couple years ago to support up to 64KB page > size and default text alignment to 64kB boundary for armel and match > changes in kernel: > https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=7572ca8989ead4c3425a1500bc241eaaeffa2c89 > > So yes, executables built with older toolchains can't be deployed as > is on such 32kB pagesize systems, but no issue with with executables > built with recent (binutils 2.27 && gcc 5.4) toolchain we are > using. OK, so this was a generalization of the ABI that we weren't aware about. Bleh. But if the rest of the world is no longer considering 4k page size part of the ABI, we should update musl to reflect this. > Any further action on my side to proceed? I guess no much value in > submitting a patch though mail here for such one liner change... No further action needed. Thanks for being patient with my hesitance to accept the initial patch and for working through this to find the real problem. Rich ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area 2017-02-01 18:35 ` Rich Felker @ 2017-02-01 18:52 ` Eric Hassold 0 siblings, 0 replies; 15+ messages in thread From: Eric Hassold @ 2017-02-01 18:52 UTC (permalink / raw) To: musl On 2/1/17 10:35 AM, Rich Felker wrote: > OK, so this was a generalization of the ABI that we weren't aware > about. Bleh. But if the rest of the world is no longer considering 4k > page size part of the ABI, we should update musl to reflect this. > >> Any further action on my side to proceed? I guess no much value in >> submitting a patch though mail here for such one liner change... > No further action needed. Thanks for being patient with my hesitance > to accept the initial patch and for working through this to find the > real problem. > > Rich Your "hesitance" reflects your consistent rigor in asking the right questions, which allows Musl to be the highly robust and clean codebase we all enjoy, and created incentive to nail the actual root cause of the issue down. So your hesitance is much appreciated. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2017-02-01 18:52 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-01-20 19:45 Fix pthread_create on some devices failing to initialize guard area Eric Hassold 2017-01-20 19:56 ` Rich Felker 2017-01-20 21:04 ` Eric Hassold 2017-01-20 21:29 ` Rich Felker 2017-01-20 22:42 ` Eric Hassold 2017-01-30 21:30 ` Eric Hassold 2017-01-30 23:13 ` Rich Felker 2017-01-31 2:52 ` Eric Hassold 2017-01-31 3:58 ` Rich Felker 2017-01-31 21:18 ` Eric Hassold 2017-01-31 22:44 ` Eric Hassold 2017-02-01 9:52 ` Szabolcs Nagy 2017-02-01 18:21 ` Eric Hassold 2017-02-01 18:35 ` Rich Felker 2017-02-01 18:52 ` Eric Hassold
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).