Fix pthread_create on some devices failing to initialize guard area

mailing list of musl libc
 help / color / mirror / code / Atom feed

* Fix pthread_create on some devices failing to initialize guard area
@ 2017-01-20 19:45 Eric Hassold
  2017-01-20 19:56 ` Rich Felker
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Hassold @ 2017-01-20 19:45 UTC (permalink / raw)
  To: musl

Hi All,

While deploying test static executable across farm of different embedded systems, found out that pthread_create() is failing systematically on some (very few) arm-linux devices whenever non null stack guard is enabled (that is, also when calling pthread_create with default - i.e. null - attributes since default is a one page of guard). One of those device is for example a Marvell Armada 375 running Linux 3.10.39. Same test code, built with alternative libc implementations (glibc, uClibc) works as expected on those devices.

Issue

This occurs because of call to mprotect() in pthread_create fails. In current implementation, if guard size is non null, memory for (guard + stack + ...) is first allocated (mmap'ed) with no accessibility (PROT_NONE), then mprotect() is called to re-enable read/write access to (memory + guardsize). Since call to mprotect() systematically fails in this scenario (returning error code EINVAL), it is impossible to create thread.

Patch

In proposed patch (attached below), memory for (guard + stack + ...) is first mmap'ed with read/write accessibility, then guard area is protected by calling mprotect() with PROT_NONE on guardsize first bytes of returned memory. This call to mprotect() to remove all accessibility on guard area, with guard area being at beginning of previously mmap'ed memory, works correctly on those platforms having issue with current implementation. Incidentally, this makes the logic more concise to handle both cases (with or without guard) is a more consistent way, and handle systems with partial/invalid page protection implementation (e.g. mprotect() returning ENOSYS) more gracefully since the stack is explicitly created with read/write access.

-Eric.

---

Subject: [PATCH] set up guard protection after stack is mapped

calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on
some devices, making it impossible to create thread. instead, allocate memory
for stack and guard first, with read and write permission, then protect guard
area.
---
  src/thread/pthread_create.c | 11 ++++-------
  1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index 49f2f72..f846bec 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -232,21 +232,18 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
  	} else {
  		guard = ROUND(attr._a_guardsize);
  		size = guard + ROUND(attr._a_stacksize
-			+ libc.tls_size +  __pthread_tsd_size);
+			+ libc.tls_size + __pthread_tsd_size);
  	}

  	if (!tsd) {
+		map = __mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
+		if (map == MAP_FAILED) goto fail;
  		if (guard) {
-			map = __mmap(0, size, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0);
-			if (map == MAP_FAILED) goto fail;
-			if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)
+			if (__mprotect(map, guard, PROT_NONE)
  			    && errno != ENOSYS) {
  				__munmap(map, size);
  				goto fail;
  			}
-		} else {
-			map = __mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
-			if (map == MAP_FAILED) goto fail;
  		}
  		tsd = map + size - __pthread_tsd_size;
  		if (!stack) {
-- 
2.7.4

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-01-20 19:45 Fix pthread_create on some devices failing to initialize guard area Eric Hassold
@ 2017-01-20 19:56 ` Rich Felker
  2017-01-20 21:04   ` Eric Hassold
  0 siblings, 1 reply; 15+ messages in thread
From: Rich Felker @ 2017-01-20 19:56 UTC (permalink / raw)
  To: musl

On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
> Hi All,
> 
> While deploying test static executable across farm of different
> embedded systems, found out that pthread_create() is failing
> systematically on some (very few) arm-linux devices whenever non
> null stack guard is enabled (that is, also when calling
> pthread_create with default - i.e. null - attributes since default
> is a one page of guard). One of those device is for example a
> Marvell Armada 375 running Linux 3.10.39. Same test code, built with
> alternative libc implementations (glibc, uClibc) works as expected
> on those devices.
> 
> 
> Issue
> 
> This occurs because of call to mprotect() in pthread_create fails.
> In current implementation, if guard size is non null, memory for
> (guard + stack + ...) is first allocated (mmap'ed) with no
> accessibility (PROT_NONE), then mprotect() is called to re-enable
> read/write access to (memory + guardsize). Since call to mprotect()
> systematically fails in this scenario (returning error code EINVAL),
> it is impossible to create thread.

Failure is ignored and the memory is assumed to be writable in this
case, since EINVAL is assumed to imply no MMU. Is this assumption
wrong in your case, and if so, can you explain why?

> Patch
> 
> In proposed patch (attached below), memory for (guard + stack + ...)
> is first mmap'ed with read/write accessibility, then guard area is
> protected by calling mprotect() with PROT_NONE on guardsize first
> bytes of returned memory. This call to mprotect() to remove all
> accessibility on guard area, with guard area being at beginning of
> previously mmap'ed memory, works correctly on those platforms having
> issue with current implementation. Incidentally, this makes the
> logic more concise to handle both cases (with or without guard) is a
> more consistent way, and handle systems with partial/invalid page
> protection implementation (e.g. mprotect() returning ENOSYS) more
> gracefully since the stack is explicitly created with read/write
> access.

This doesn't work correctly on normal systems with mmu, because the
size of the guard pages is accounted against commit charge. Linux
should, but AFAIK doesn't, subtract it from commit charge once it's
changed to PROT_NONE without having been dirtied, but even if this bug
is fixed on the kernel side, there would still be a moment where
excess commit charge is consumed and thus where pthread_create might
spuriously fail or cause allocations in other processes/threads to
fail.

If the kernel is not allocating actually-usable address ranges for
PROT_NONE on all nommu systems, I think the only solution is to handle
EINVAL from mprotect by going back and re-doing the mmap with
PROT_READ|PROT_WRITE. Do you have any better ideas?

Rich


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-01-20 19:56 ` Rich Felker
@ 2017-01-20 21:04   ` Eric Hassold
  2017-01-20 21:29     ` Rich Felker
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Hassold @ 2017-01-20 21:04 UTC (permalink / raw)
  To: musl

On 1/20/17 11:56 AM, Rich Felker wrote:
> On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
>> Hi All,
>>
>> While deploying test static executable across farm of different
>> embedded systems, found out that pthread_create() is failing
>> systematically on some (very few) arm-linux devices whenever non
>> null stack guard is enabled (that is, also when calling
>> pthread_create with default - i.e. null - attributes since default
>> is a one page of guard). One of those device is for example a
>> Marvell Armada 375 running Linux 3.10.39. Same test code, built with
>> alternative libc implementations (glibc, uClibc) works as expected
>> on those devices.
>>
>>
>> Issue
>>
>> This occurs because of call to mprotect() in pthread_create fails.
>> In current implementation, if guard size is non null, memory for
>> (guard + stack + ...) is first allocated (mmap'ed) with no
>> accessibility (PROT_NONE), then mprotect() is called to re-enable
>> read/write access to (memory + guardsize). Since call to mprotect()
>> systematically fails in this scenario (returning error code EINVAL),
>> it is impossible to create thread.
> Failure is ignored and the memory is assumed to be writable in this
> case, since EINVAL is assumed to imply no MMU. Is this assumption
> wrong in your case, and if so, can you explain why?

In my case, devices exhibiting issue are not MMU-less, they are 
Cortex-A9 devices with valid mmu / page protection working as expected 
otherwise. Note that current Musl code assumes ENOSYS means no MMU and 
handles it by assuming the system has no page protection at all. For the 
case I observe, it is EINVAL which is returned, this is not ignored, so 
memory is unmap'ed and pthread_create() fails.

>
>> Patch
>>
>> In proposed patch (attached below), memory for (guard + stack + ...)
>> is first mmap'ed with read/write accessibility, then guard area is
>> protected by calling mprotect() with PROT_NONE on guardsize first
>> bytes of returned memory. This call to mprotect() to remove all
>> accessibility on guard area, with guard area being at beginning of
>> previously mmap'ed memory, works correctly on those platforms having
>> issue with current implementation. Incidentally, this makes the
>> logic more concise to handle both cases (with or without guard) is a
>> more consistent way, and handle systems with partial/invalid page
>> protection implementation (e.g. mprotect() returning ENOSYS) more
>> gracefully since the stack is explicitly created with read/write
>> access.
> This doesn't work correctly on normal systems with mmu, because the
> size of the guard pages is accounted against commit charge. Linux
> should, but AFAIK doesn't, subtract it from commit charge once it's
> changed to PROT_NONE without having been dirtied, but even if this bug
> is fixed on the kernel side, there would still be a moment where
> excess commit charge is consumed and thus where pthread_create might
> spuriously fail or cause allocations in other processes/threads to
> fail.
>
> If the kernel is not allocating actually-usable address ranges for
> PROT_NONE on all nommu systems, I think the only solution is to handle
> EINVAL from mprotect by going back and re-doing the mmap with
> PROT_READ|PROT_WRITE. Do you have any better ideas?
>
> Rich
Had this "deja vu" feeling... reminds me conversation you had in this 
thread some time ago elsewhere...
https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html

Your proposition seems reasonable on nommu system, but again, the issue 
observed here is on legit systems with mmu, with mprotect failing with 
EINVAL (and not ENOSYS), for some other reason than system not 
supporting page protection. Catching EINVAL error returned by mprotect 
and falling back to re-doing the mmap would mean actually silently 
running without stack guard on system supporting it, so I believe it is 
actually legitimate to fail and return error in that case. But that's 
difference use case than the issue I'm observing.

I took note of Balazs's suggestion (in the thread referenced above) to 
switch to a pattern similar to Musl's current one (mmap(PROT_NONE) + 
mprotect(stack, READ|WRITE)) in order to avoid those guard pages to 
actually occupy resources. But I can indeed observe that this approach 
fails on some devices (which have valid mmu), while I'm not sure I'm 
seeing the issue with first mapping PROT_READ|PROT_WRITE then 
mprotect(PROT_NONE) guard area. Latter approach (as implemented by 
patch) is, at least, consistent with all the other implementations out 
there (I checked glibc's allocatestack.c, but also e.g. bionic), and 
couldn't find report of those failures you are envisaging.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-01-20 21:04   ` Eric Hassold
@ 2017-01-20 21:29     ` Rich Felker
  2017-01-20 22:42       ` Eric Hassold
  0 siblings, 1 reply; 15+ messages in thread
From: Rich Felker @ 2017-01-20 21:29 UTC (permalink / raw)
  To: musl

On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote:
> 
> On 1/20/17 11:56 AM, Rich Felker wrote:
> >On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
> >>Hi All,
> >>
> >>While deploying test static executable across farm of different
> >>embedded systems, found out that pthread_create() is failing
> >>systematically on some (very few) arm-linux devices whenever non
> >>null stack guard is enabled (that is, also when calling
> >>pthread_create with default - i.e. null - attributes since default
> >>is a one page of guard). One of those device is for example a
> >>Marvell Armada 375 running Linux 3.10.39. Same test code, built with
> >>alternative libc implementations (glibc, uClibc) works as expected
> >>on those devices.
> >>
> >>
> >>Issue
> >>
> >>This occurs because of call to mprotect() in pthread_create fails.
> >>In current implementation, if guard size is non null, memory for
> >>(guard + stack + ...) is first allocated (mmap'ed) with no
> >>accessibility (PROT_NONE), then mprotect() is called to re-enable
> >>read/write access to (memory + guardsize). Since call to mprotect()
> >>systematically fails in this scenario (returning error code EINVAL),
> >>it is impossible to create thread.
> >Failure is ignored and the memory is assumed to be writable in this
> >case, since EINVAL is assumed to imply no MMU. Is this assumption
> >wrong in your case, and if so, can you explain why?
> 
> In my case, devices exhibiting issue are not MMU-less, they are
> Cortex-A9 devices with valid mmu / page protection working as
> expected otherwise. Note that current Musl code assumes ENOSYS means
> no MMU and handles it by assuming the system has no page protection
> at all. For the case I observe, it is EINVAL which is returned, this
> is not ignored, so memory is unmap'ed and pthread_create() fails.

In that case I think this is a kernel bug. Do you know why EINVAL is
happening? If there's an MMU, Linux should be able to replace the
anon PROT_NONE pages with anon RW pages.

> >>In proposed patch (attached below), memory for (guard + stack + ...)
> >>is first mmap'ed with read/write accessibility, then guard area is
> >>protected by calling mprotect() with PROT_NONE on guardsize first
> >>bytes of returned memory. This call to mprotect() to remove all
> >>accessibility on guard area, with guard area being at beginning of
> >>previously mmap'ed memory, works correctly on those platforms having
> >>issue with current implementation. Incidentally, this makes the
> >>logic more concise to handle both cases (with or without guard) is a
> >>more consistent way, and handle systems with partial/invalid page
> >>protection implementation (e.g. mprotect() returning ENOSYS) more
> >>gracefully since the stack is explicitly created with read/write
> >>access.
> >This doesn't work correctly on normal systems with mmu, because the
> >size of the guard pages is accounted against commit charge. Linux
> >should, but AFAIK doesn't, subtract it from commit charge once it's
> >changed to PROT_NONE without having been dirtied, but even if this bug
> >is fixed on the kernel side, there would still be a moment where
> >excess commit charge is consumed and thus where pthread_create might
> >spuriously fail or cause allocations in other processes/threads to
> >fail.
> >
> >If the kernel is not allocating actually-usable address ranges for
> >PROT_NONE on all nommu systems, I think the only solution is to handle
> >EINVAL from mprotect by going back and re-doing the mmap with
> >PROT_READ|PROT_WRITE. Do you have any better ideas?
> >
> >Rich
> Had this "deja vu" feeling... reminds me conversation you had in
> this thread some time ago elsewhere...
> https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html
> 
> Your proposition seems reasonable on nommu system, but again, the
> issue observed here is on legit systems with mmu, with mprotect
> failing with EINVAL (and not ENOSYS), for some other reason than
> system not supporting page protection. Catching EINVAL error
> returned by mprotect and falling back to re-doing the mmap would
> mean actually silently running without stack guard on system
> supporting it, so I believe it is actually legitimate to fail and
> return error in that case. But that's difference use case than the
> issue I'm observing.
> 
> I took note of Balazs's suggestion (in the thread referenced above)
> to switch to a pattern similar to Musl's current one
> (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid
> those guard pages to actually occupy resources. But I can indeed
> observe that this approach fails on some devices (which have valid
> mmu), while I'm not sure I'm seeing the issue with first mapping
> PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter
> approach (as implemented by patch) is, at least, consistent with all
> the other implementations out there (I checked glibc's
> allocatestack.c, but also e.g. bionic), and couldn't find report of
> those failures you are envisaging.

Consider the case of guard_size=128M stack_size=128k with
Commit_Limit=128M. This will fail with your approach but works
perfectly well now.

Rich


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-01-20 21:29     ` Rich Felker
@ 2017-01-20 22:42       ` Eric Hassold
  2017-01-30 21:30         ` Eric Hassold
  2017-01-31  3:58         ` Rich Felker
  0 siblings, 2 replies; 15+ messages in thread
From: Eric Hassold @ 2017-01-20 22:42 UTC (permalink / raw)
  To: musl



On 1/20/17 1:29 PM, Rich Felker wrote:
> On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote:
>> On 1/20/17 11:56 AM, Rich Felker wrote:
>>> On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
>>>> Hi All,
>>>>
>>>> While deploying test static executable across farm of different
>>>> embedded systems, found out that pthread_create() is failing
>>>> systematically on some (very few) arm-linux devices whenever non
>>>> null stack guard is enabled (that is, also when calling
>>>> pthread_create with default - i.e. null - attributes since default
>>>> is a one page of guard). One of those device is for example a
>>>> Marvell Armada 375 running Linux 3.10.39. Same test code, built with
>>>> alternative libc implementations (glibc, uClibc) works as expected
>>>> on those devices.
>>>>
>>>>
>>>> Issue
>>>>
>>>> This occurs because of call to mprotect() in pthread_create fails.
>>>> In current implementation, if guard size is non null, memory for
>>>> (guard + stack + ...) is first allocated (mmap'ed) with no
>>>> accessibility (PROT_NONE), then mprotect() is called to re-enable
>>>> read/write access to (memory + guardsize). Since call to mprotect()
>>>> systematically fails in this scenario (returning error code EINVAL),
>>>> it is impossible to create thread.
>>> Failure is ignored and the memory is assumed to be writable in this
>>> case, since EINVAL is assumed to imply no MMU. Is this assumption
>>> wrong in your case, and if so, can you explain why?
>> In my case, devices exhibiting issue are not MMU-less, they are
>> Cortex-A9 devices with valid mmu / page protection working as
>> expected otherwise. Note that current Musl code assumes ENOSYS means
>> no MMU and handles it by assuming the system has no page protection
>> at all. For the case I observe, it is EINVAL which is returned, this
>> is not ignored, so memory is unmap'ed and pthread_create() fails.
> In that case I think this is a kernel bug. Do you know why EINVAL is
> happening? If there's an MMU, Linux should be able to replace the
> anon PROT_NONE pages with anon RW pages.

Agree. Unfortunately, those are devices we don't built the kernel for, 
so have been hardly able to track issue deeper. The point is however 
that such devices with this issue in kernel might not be that uncommon, 
and it concretely means impossibility at that moment to deploy on them a 
functional static executable built with musl.
>
>>>> In proposed patch (attached below), memory for (guard + stack + ...)
>>>> is first mmap'ed with read/write accessibility, then guard area is
>>>> protected by calling mprotect() with PROT_NONE on guardsize first
>>>> bytes of returned memory. This call to mprotect() to remove all
>>>> accessibility on guard area, with guard area being at beginning of
>>>> previously mmap'ed memory, works correctly on those platforms having
>>>> issue with current implementation. Incidentally, this makes the
>>>> logic more concise to handle both cases (with or without guard) is a
>>>> more consistent way, and handle systems with partial/invalid page
>>>> protection implementation (e.g. mprotect() returning ENOSYS) more
>>>> gracefully since the stack is explicitly created with read/write
>>>> access.
>>> This doesn't work correctly on normal systems with mmu, because the
>>> size of the guard pages is accounted against commit charge. Linux
>>> should, but AFAIK doesn't, subtract it from commit charge once it's
>>> changed to PROT_NONE without having been dirtied, but even if this bug
>>> is fixed on the kernel side, there would still be a moment where
>>> excess commit charge is consumed and thus where pthread_create might
>>> spuriously fail or cause allocations in other processes/threads to
>>> fail.
>>>
>>> If the kernel is not allocating actually-usable address ranges for
>>> PROT_NONE on all nommu systems, I think the only solution is to handle
>>> EINVAL from mprotect by going back and re-doing the mmap with
>>> PROT_READ|PROT_WRITE. Do you have any better ideas?
>>>
>>> Rich
>> Had this "deja vu" feeling... reminds me conversation you had in
>> this thread some time ago elsewhere...
>> https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html
>>
>> Your proposition seems reasonable on nommu system, but again, the
>> issue observed here is on legit systems with mmu, with mprotect
>> failing with EINVAL (and not ENOSYS), for some other reason than
>> system not supporting page protection. Catching EINVAL error
>> returned by mprotect and falling back to re-doing the mmap would
>> mean actually silently running without stack guard on system
>> supporting it, so I believe it is actually legitimate to fail and
>> return error in that case. But that's difference use case than the
>> issue I'm observing.
>>
>> I took note of Balazs's suggestion (in the thread referenced above)
>> to switch to a pattern similar to Musl's current one
>> (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid
>> those guard pages to actually occupy resources. But I can indeed
>> observe that this approach fails on some devices (which have valid
>> mmu), while I'm not sure I'm seeing the issue with first mapping
>> PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter
>> approach (as implemented by patch) is, at least, consistent with all
>> the other implementations out there (I checked glibc's
>> allocatestack.c, but also e.g. bionic), and couldn't find report of
>> those failures you are envisaging.
> Consider the case of guard_size=128M stack_size=128k with
> Commit_Limit=128M. This will fail with your approach but works
> perfectly well now.
Right, makes sense. Though I would point out that such uncommon scenario 
for an application would fail when linked with anything but musl, since 
"my approach" is the widespread one across all other libc 
implementations (glibc, bionic, ...). But I understand both approaches 
aim at working around issue in some corner case while creating issue in 
other rare one, and so at the end it's all about deciding which edge 
case is more critical than the other, and whether 
consistency/compatibility with existing 3rd parties solutions matters or 
not.

But in order to get closer to "having the cake and eat it too", please 
find attached another patch implementing the "unmaping and re-doing" 
strategy you initially suggested, i.e. starting with current approach, 
then giving it a second chance using "my" approach if and only if issue 
with mprotect() is detected. Should be consistent with current behavior 
on any system working correctly today, and just provide a plan B falling 
back to more common approach only when needed. Does that sound more 
acceptable?

Thanks,

Eric

----
[PATCH] set up guard protection after stack is mapped

calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on
some devices with buggy kernel, making it impossible to create
thread. if such condition is met, fall back to allocating memory
for stack and guard first, with read and write permission, then protect 
guard
area.
---
  src/thread/pthread_create.c | 12 +++++++++++-
  1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index 49f2f72..d3c030b 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res, const 
pthread_attr_t *restrict att
              if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)
                  && errno != ENOSYS) {
                  __munmap(map, size);
-                goto fail;
+                if (errno == EINVAL) {
+                    map = __mmap(0, size, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANON, -1, 0);
+                    if (map == MAP_FAILED) goto fail;
+                    if (__mprotect(map, guard, PROT_NONE)
+                        && errno != ENOSYS) {
+                        __munmap(map, size);
+                        goto fail;
+                                        }
+                } else {
+                    goto fail;
+                                }
              }
          } else {
              map = __mmap(0, size, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANON, -1, 0);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-01-20 22:42       ` Eric Hassold
@ 2017-01-30 21:30         ` Eric Hassold
  2017-01-30 23:13           ` Rich Felker
  2017-01-31  3:58         ` Rich Felker
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Hassold @ 2017-01-30 21:30 UTC (permalink / raw)
  To: musl



On 1/20/17 2:42 PM, Eric Hassold wrote:
>
>
> On 1/20/17 1:29 PM, Rich Felker wrote:
>> On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote:
>>> On 1/20/17 11:56 AM, Rich Felker wrote:
>>>> On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
>>>>> Hi All,
>>>>>
>>>>> While deploying test static executable across farm of different
>>>>> embedded systems, found out that pthread_create() is failing
>>>>> systematically on some (very few) arm-linux devices whenever non
>>>>> null stack guard is enabled (that is, also when calling
>>>>> pthread_create with default - i.e. null - attributes since default
>>>>> is a one page of guard). One of those device is for example a
>>>>> Marvell Armada 375 running Linux 3.10.39. Same test code, built with
>>>>> alternative libc implementations (glibc, uClibc) works as expected
>>>>> on those devices.
>>>>>
>>>>>
>>>>> Issue
>>>>>
>>>>> This occurs because of call to mprotect() in pthread_create fails.
>>>>> In current implementation, if guard size is non null, memory for
>>>>> (guard + stack + ...) is first allocated (mmap'ed) with no
>>>>> accessibility (PROT_NONE), then mprotect() is called to re-enable
>>>>> read/write access to (memory + guardsize). Since call to mprotect()
>>>>> systematically fails in this scenario (returning error code EINVAL),
>>>>> it is impossible to create thread.
>>>> Failure is ignored and the memory is assumed to be writable in this
>>>> case, since EINVAL is assumed to imply no MMU. Is this assumption
>>>> wrong in your case, and if so, can you explain why?
>>> In my case, devices exhibiting issue are not MMU-less, they are
>>> Cortex-A9 devices with valid mmu / page protection working as
>>> expected otherwise. Note that current Musl code assumes ENOSYS means
>>> no MMU and handles it by assuming the system has no page protection
>>> at all. For the case I observe, it is EINVAL which is returned, this
>>> is not ignored, so memory is unmap'ed and pthread_create() fails.
>> In that case I think this is a kernel bug. Do you know why EINVAL is
>> happening? If there's an MMU, Linux should be able to replace the
>> anon PROT_NONE pages with anon RW pages.
>
> Agree. Unfortunately, those are devices we don't built the kernel for, 
> so have been hardly able to track issue deeper. The point is however 
> that such devices with this issue in kernel might not be that 
> uncommon, and it concretely means impossibility at that moment to 
> deploy on them a functional static executable built with musl.
>>
>>>>> In proposed patch (attached below), memory for (guard + stack + ...)
>>>>> is first mmap'ed with read/write accessibility, then guard area is
>>>>> protected by calling mprotect() with PROT_NONE on guardsize first
>>>>> bytes of returned memory. This call to mprotect() to remove all
>>>>> accessibility on guard area, with guard area being at beginning of
>>>>> previously mmap'ed memory, works correctly on those platforms having
>>>>> issue with current implementation. Incidentally, this makes the
>>>>> logic more concise to handle both cases (with or without guard) is a
>>>>> more consistent way, and handle systems with partial/invalid page
>>>>> protection implementation (e.g. mprotect() returning ENOSYS) more
>>>>> gracefully since the stack is explicitly created with read/write
>>>>> access.
>>>> This doesn't work correctly on normal systems with mmu, because the
>>>> size of the guard pages is accounted against commit charge. Linux
>>>> should, but AFAIK doesn't, subtract it from commit charge once it's
>>>> changed to PROT_NONE without having been dirtied, but even if this bug
>>>> is fixed on the kernel side, there would still be a moment where
>>>> excess commit charge is consumed and thus where pthread_create might
>>>> spuriously fail or cause allocations in other processes/threads to
>>>> fail.
>>>>
>>>> If the kernel is not allocating actually-usable address ranges for
>>>> PROT_NONE on all nommu systems, I think the only solution is to handle
>>>> EINVAL from mprotect by going back and re-doing the mmap with
>>>> PROT_READ|PROT_WRITE. Do you have any better ideas?
>>>>
>>>> Rich
>>> Had this "deja vu" feeling... reminds me conversation you had in
>>> this thread some time ago elsewhere...
>>> https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html
>>>
>>> Your proposition seems reasonable on nommu system, but again, the
>>> issue observed here is on legit systems with mmu, with mprotect
>>> failing with EINVAL (and not ENOSYS), for some other reason than
>>> system not supporting page protection. Catching EINVAL error
>>> returned by mprotect and falling back to re-doing the mmap would
>>> mean actually silently running without stack guard on system
>>> supporting it, so I believe it is actually legitimate to fail and
>>> return error in that case. But that's difference use case than the
>>> issue I'm observing.
>>>
>>> I took note of Balazs's suggestion (in the thread referenced above)
>>> to switch to a pattern similar to Musl's current one
>>> (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid
>>> those guard pages to actually occupy resources. But I can indeed
>>> observe that this approach fails on some devices (which have valid
>>> mmu), while I'm not sure I'm seeing the issue with first mapping
>>> PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter
>>> approach (as implemented by patch) is, at least, consistent with all
>>> the other implementations out there (I checked glibc's
>>> allocatestack.c, but also e.g. bionic), and couldn't find report of
>>> those failures you are envisaging.
>> Consider the case of guard_size=128M stack_size=128k with
>> Commit_Limit=128M. This will fail with your approach but works
>> perfectly well now.
> Right, makes sense. Though I would point out that such uncommon 
> scenario for an application would fail when linked with anything but 
> musl, since "my approach" is the widespread one across all other libc 
> implementations (glibc, bionic, ...). But I understand both approaches 
> aim at working around issue in some corner case while creating issue 
> in other rare one, and so at the end it's all about deciding which 
> edge case is more critical than the other, and whether 
> consistency/compatibility with existing 3rd parties solutions matters 
> or not.
>
> But in order to get closer to "having the cake and eat it too", please 
> find attached another patch implementing the "unmaping and re-doing" 
> strategy you initially suggested, i.e. starting with current approach, 
> then giving it a second chance using "my" approach if and only if 
> issue with mprotect() is detected. Should be consistent with current 
> behavior on any system working correctly today, and just provide a 
> plan B falling back to more common approach only when needed. Does 
> that sound more acceptable?
>
> Thanks,
>
> Eric
>
> ----
> [PATCH] set up guard protection after stack is mapped
>
> calling mprotect beyond the guard page of PROT_NONE mmap-ed stack 
> fails on
> some devices with buggy kernel, making it impossible to create
> thread. if such condition is met, fall back to allocating memory
> for stack and guard first, with read and write permission, then 
> protect guard
> area.
> ---
>  src/thread/pthread_create.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> index 49f2f72..d3c030b 100644
> --- a/src/thread/pthread_create.c
> +++ b/src/thread/pthread_create.c
> @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res, 
> const pthread_attr_t *restrict att
>              if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)
>                  && errno != ENOSYS) {
>                  __munmap(map, size);
> -                goto fail;
> +                if (errno == EINVAL) {
> +                    map = __mmap(0, size, PROT_READ|PROT_WRITE, 
> MAP_PRIVATE|MAP_ANON, -1, 0);
> +                    if (map == MAP_FAILED) goto fail;
> +                    if (__mprotect(map, guard, PROT_NONE)
> +                        && errno != ENOSYS) {
> +                        __munmap(map, size);
> +                        goto fail;
> +                                        }
> +                } else {
> +                    goto fail;
> +                                }
>              }
>          } else {
>              map = __mmap(0, size, PROT_READ|PROT_WRITE, 
> MAP_PRIVATE|MAP_ANON, -1, 0);

Hi Rich,

Pinging... any comment, feedback or concern about latest version of the 
patch, attached above, keeping current behavior but falling back to 
(mmap(PROT_READ|PROT_WRITE) && mprotect(guard, none)) if and only if 
current approach detected to fail) ?

Eric



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-01-30 21:30         ` Eric Hassold
@ 2017-01-30 23:13           ` Rich Felker
  2017-01-31  2:52             ` Eric Hassold
  0 siblings, 1 reply; 15+ messages in thread
From: Rich Felker @ 2017-01-30 23:13 UTC (permalink / raw)
  To: musl

On Mon, Jan 30, 2017 at 01:30:00PM -0800, Eric Hassold wrote:
> >>>>>This occurs because of call to mprotect() in pthread_create fails.
> >>>>>In current implementation, if guard size is non null, memory for
> >>>>>(guard + stack + ...) is first allocated (mmap'ed) with no
> >>>>>accessibility (PROT_NONE), then mprotect() is called to re-enable
> >>>>>read/write access to (memory + guardsize). Since call to mprotect()
> >>>>>systematically fails in this scenario (returning error code EINVAL),
> >>>>>it is impossible to create thread.
> >>>>Failure is ignored and the memory is assumed to be writable in this
> >>>>case, since EINVAL is assumed to imply no MMU. Is this assumption
> >>>>wrong in your case, and if so, can you explain why?
> >>>In my case, devices exhibiting issue are not MMU-less, they are
> >>>Cortex-A9 devices with valid mmu / page protection working as
> >>>expected otherwise. Note that current Musl code assumes ENOSYS means
> >>>no MMU and handles it by assuming the system has no page protection
> >>>at all. For the case I observe, it is EINVAL which is returned, this
> >>>is not ignored, so memory is unmap'ed and pthread_create() fails.
> >>In that case I think this is a kernel bug. Do you know why EINVAL is
> >>happening? If there's an MMU, Linux should be able to replace the
> >>anon PROT_NONE pages with anon RW pages.
> >
> >Agree. Unfortunately, those are devices we don't built the kernel
> >for, so have been hardly able to track issue deeper. The point is
> >however that such devices with this issue in kernel might not be
> >that uncommon, and it concretely means impossibility at that
> >moment to deploy on them a functional static executable built with
> >musl.
> [...]
> Pinging... any comment, feedback or concern about latest version of
> the patch, attached above, keeping current behavior but falling back
> to (mmap(PROT_READ|PROT_WRITE) && mprotect(guard, none)) if and only
> if current approach detected to fail) ?

I still want to know what's going on on the kernel side, because it
looks like this a rogue/nonsensical patch to the kernel that breaks
mmap functionality in a general way that has nothing to do with the
specific cpu/board.

Rich


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-01-30 23:13           ` Rich Felker
@ 2017-01-31  2:52             ` Eric Hassold
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Hassold @ 2017-01-31  2:52 UTC (permalink / raw)
  To: musl



On 1/30/17 3:13 PM, Rich Felker wrote:
> On Mon, Jan 30, 2017 at 01:30:00PM -0800, Eric Hassold wrote:
>>>>>>> This occurs because of call to mprotect() in pthread_create fails.
>>>>>>> In current implementation, if guard size is non null, memory for
>>>>>>> (guard + stack + ...) is first allocated (mmap'ed) with no
>>>>>>> accessibility (PROT_NONE), then mprotect() is called to re-enable
>>>>>>> read/write access to (memory + guardsize). Since call to mprotect()
>>>>>>> systematically fails in this scenario (returning error code EINVAL),
>>>>>>> it is impossible to create thread.
>>>>>> Failure is ignored and the memory is assumed to be writable in this
>>>>>> case, since EINVAL is assumed to imply no MMU. Is this assumption
>>>>>> wrong in your case, and if so, can you explain why?
>>>>> In my case, devices exhibiting issue are not MMU-less, they are
>>>>> Cortex-A9 devices with valid mmu / page protection working as
>>>>> expected otherwise. Note that current Musl code assumes ENOSYS means
>>>>> no MMU and handles it by assuming the system has no page protection
>>>>> at all. For the case I observe, it is EINVAL which is returned, this
>>>>> is not ignored, so memory is unmap'ed and pthread_create() fails.
>>>> In that case I think this is a kernel bug. Do you know why EINVAL is
>>>> happening? If there's an MMU, Linux should be able to replace the
>>>> anon PROT_NONE pages with anon RW pages.
>>> Agree. Unfortunately, those are devices we don't built the kernel
>>> for, so have been hardly able to track issue deeper. The point is
>>> however that such devices with this issue in kernel might not be
>>> that uncommon, and it concretely means impossibility at that
>>> moment to deploy on them a functional static executable built with
>>> musl.
>> [...]
>> Pinging... any comment, feedback or concern about latest version of
>> the patch, attached above, keeping current behavior but falling back
>> to (mmap(PROT_READ|PROT_WRITE) && mprotect(guard, none)) if and only
>> if current approach detected to fail) ?
> I still want to know what's going on on the kernel side, because it
> looks like this a rogue/nonsensical patch to the kernel that breaks
> mmap functionality in a general way that has nothing to do with the
> specific cpu/board.
>
> Rich
Yes, I understand this would be ideal, and wish too I would be able to 
find out more about what's happening, beyond the symptomatic approach. 
But as mentioned, we have very little leverage to investigate much 
further what's going on, since issue wasn't reproducible on any device 
we built kernel for, but only on a few devices we don't build those 
kernels nor have easy way to perform some lower level debugging on those 
platforms (no jtag access). Telemetry was reporting failure on pthread 
creation on some devices, one of them we could reproduce in-house was a 
Marvell 375 board used e.g. in Western Digital MyCloud devices. But it 
seems hard to even just get reference to the actual GPL source of the 
exact kernel used there, so I don't see a way to find out if it's a 
broken patch introduced at one point by a vendor, or some transient 
regression fixed at one point but still having devices shipped with this 
oddity (kernel versions seems to be 3.10 branch, so one very rough guess 
is that it may be related to multiple patches done to support huge TLB 
on ARM at that time, but that's more of a guts feeling than supported by 
facts).

Though I understand it is not the role of a libc implementation to work 
around all kind of flaws in kernel implementations that vendors ever 
shipped, I would find it valuable, in this specific case, to prevent 
musl from consistently bubbling up an issue not (easily) reproducible 
otherwise (e.g. with other libc implementations). One of the great value 
of musl is to allow easy deployment across a variety of devices (not as 
opened as we would ideally hope) of the same statically linked 
executable. This suggested patch aims at supporting this use case, since 
fixing kernel isn't an option there. Of course, this pragmatic 
motivation conflicts somehow with the legitimate goal to keep musl 
implementation lean and clean (from a functional standpoint, this 
doesn't introduce any change on behavior on all non broken platforms). 
So of course only you can value the pros (regarding deployment) and 
decide whether it is worth the cons of slightly "bloating" source code 
with alternative recovery path.

Thanks,

Eric



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-01-20 22:42       ` Eric Hassold
  2017-01-30 21:30         ` Eric Hassold
@ 2017-01-31  3:58         ` Rich Felker
  2017-01-31 21:18           ` Eric Hassold
  1 sibling, 1 reply; 15+ messages in thread
From: Rich Felker @ 2017-01-31  3:58 UTC (permalink / raw)
  To: musl

On Fri, Jan 20, 2017 at 02:42:01PM -0800, Eric Hassold wrote:
> ----
> [PATCH] set up guard protection after stack is mapped
> 
> calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on
> some devices with buggy kernel, making it impossible to create
> thread. if such condition is met, fall back to allocating memory
> for stack and guard first, with read and write permission, then
> protect guard
> area.
> ---
>  src/thread/pthread_create.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> index 49f2f72..d3c030b 100644
> --- a/src/thread/pthread_create.c
> +++ b/src/thread/pthread_create.c
> @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res,
> const pthread_attr_t *restrict att
>              if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)
>                  && errno != ENOSYS) {
>                  __munmap(map, size);
> -                goto fail;
> +                if (errno == EINVAL) {

In principle __munmap could have clobbered errno at this point. I
don't think it will but to be safe errno should probably be saved.

> +                    map = __mmap(0, size, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANON, -1, 0);
> +                    if (map == MAP_FAILED) goto fail;
> +                    if (__mprotect(map, guard, PROT_NONE)
> +                        && errno != ENOSYS) {
> +                        __munmap(map, size);
> +                        goto fail;
> +                                        }
> +                } else {
> +                    goto fail;
> +                                }

Can you try instead falling back to mmap with MAP_FIXED over top of
the part that should be RW rather than unmapping and redoing the whole
thing? If that works it would preserve the property of avoiding excess
commit charge. If not then these kernels are very very broken. Either
way it might shed some more light on what's going on. I understand you
just want things to work, but I do also want to have some
understanding of _why_ we're going to be working around something
awful, if we really have to.

Rich


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-01-31  3:58         ` Rich Felker
@ 2017-01-31 21:18           ` Eric Hassold
  2017-01-31 22:44             ` Eric Hassold
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Hassold @ 2017-01-31 21:18 UTC (permalink / raw)
  To: musl


On 1/30/17 7:58 PM, Rich Felker wrote:
> On Fri, Jan 20, 2017 at 02:42:01PM -0800, Eric Hassold wrote:
>> ----
>> [PATCH] set up guard protection after stack is mapped
>>
>> calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on
>> some devices with buggy kernel, making it impossible to create
>> thread. if such condition is met, fall back to allocating memory
>> for stack and guard first, with read and write permission, then
>> protect guard
>> area.
>> ---
>>   src/thread/pthread_create.c | 12 +++++++++++-
>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
>> index 49f2f72..d3c030b 100644
>> --- a/src/thread/pthread_create.c
>> +++ b/src/thread/pthread_create.c
>> @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res,
>> const pthread_attr_t *restrict att
>>               if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)
>>                   && errno != ENOSYS) {
>>                   __munmap(map, size);
>> -                goto fail;
>> +                if (errno == EINVAL) {
> In principle __munmap could have clobbered errno at this point. I
> don't think it will but to be safe errno should probably be saved.
>
>> +                    map = __mmap(0, size, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_ANON, -1, 0);
>> +                    if (map == MAP_FAILED) goto fail;
>> +                    if (__mprotect(map, guard, PROT_NONE)
>> +                        && errno != ENOSYS) {
>> +                        __munmap(map, size);
>> +                        goto fail;
>> +                                        }
>> +                } else {
>> +                    goto fail;
>> +                                }
> Can you try instead falling back to mmap with MAP_FIXED over top of
> the part that should be RW rather than unmapping and redoing the whole
> thing? If that works it would preserve the property of avoiding excess
> commit charge. If not then these kernels are very very broken. Either
> way it might shed some more light on what's going on. I understand you
> just want things to work, but I do also want to have some
> understanding of _why_ we're going to be working around something
> awful, if we really have to.
>
> Rich

mmap(MAP_FIXED) at the address returned by first mmap(PROT_NONE) and 
then mprotect() indeed works.

BUT... actually kept exploring the idea that this might just be an issue 
with page alignment. And found out that forcing the guard size to be 
32k-aligned make current approach work well. So checked and found out 
that on systems where issue happens, the actual page size determined 
dynamically at runtime (using "getinfo PAGESIZE", and confirmed using 
ratio of Mapped memory / nr_mapped pages from /proc/vmstat) is indeed 
32k, whereas Musl has an hardcoded at compile time 4k page size for arm 
architecture (with same eventually incorrect value returned by 
getpagesize() and sysconf(__SC_PAGESIZE)).

Long story short, no odd behavior of the mmap in kernel, but Musl not 
supporting correctly non-4k page systems, impacting pthread_create but 
probably also malloc and mprotect. Given non-4k page systems are getting 
more and more common (on arm and arm64 architectures at least), this 
might be an important issue to fix. Should I start new "support for 
non-4k page systems" discussion here ?

Eric



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-01-31 21:18           ` Eric Hassold
@ 2017-01-31 22:44             ` Eric Hassold
  2017-02-01  9:52               ` Szabolcs Nagy
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Hassold @ 2017-01-31 22:44 UTC (permalink / raw)
  To: musl

On 1/31/17 1:18 PM, Eric Hassold wrote:
>
> On 1/30/17 7:58 PM, Rich Felker wrote:
>> On Fri, Jan 20, 2017 at 02:42:01PM -0800, Eric Hassold wrote:
>>> ----
>>> [PATCH] set up guard protection after stack is mapped
>>>
>>> calling mprotect beyond the guard page of PROT_NONE mmap-ed stack 
>>> fails on
>>> some devices with buggy kernel, making it impossible to create
>>> thread. if such condition is met, fall back to allocating memory
>>> for stack and guard first, with read and write permission, then
>>> protect guard
>>> area.
>>> ---
>>>   src/thread/pthread_create.c | 12 +++++++++++-
>>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
>>> index 49f2f72..d3c030b 100644
>>> --- a/src/thread/pthread_create.c
>>> +++ b/src/thread/pthread_create.c
>>> @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res,
>>> const pthread_attr_t *restrict att
>>>               if (__mprotect(map+guard, size-guard, 
>>> PROT_READ|PROT_WRITE)
>>>                   && errno != ENOSYS) {
>>>                   __munmap(map, size);
>>> -                goto fail;
>>> +                if (errno == EINVAL) {
>> In principle __munmap could have clobbered errno at this point. I
>> don't think it will but to be safe errno should probably be saved.
>>
>>> +                    map = __mmap(0, size, PROT_READ|PROT_WRITE,
>>> MAP_PRIVATE|MAP_ANON, -1, 0);
>>> +                    if (map == MAP_FAILED) goto fail;
>>> +                    if (__mprotect(map, guard, PROT_NONE)
>>> +                        && errno != ENOSYS) {
>>> +                        __munmap(map, size);
>>> +                        goto fail;
>>> +                                        }
>>> +                } else {
>>> +                    goto fail;
>>> +                                }
>> Can you try instead falling back to mmap with MAP_FIXED over top of
>> the part that should be RW rather than unmapping and redoing the whole
>> thing? If that works it would preserve the property of avoiding excess
>> commit charge. If not then these kernels are very very broken. Either
>> way it might shed some more light on what's going on. I understand you
>> just want things to work, but I do also want to have some
>> understanding of _why_ we're going to be working around something
>> awful, if we really have to.
>>
>> Rich
>
> mmap(MAP_FIXED) at the address returned by first mmap(PROT_NONE) and 
> then mprotect() indeed works.
>
> BUT... actually kept exploring the idea that this might just be an 
> issue with page alignment. And found out that forcing the guard size 
> to be 32k-aligned make current approach work well. So checked and 
> found out that on systems where issue happens, the actual page size 
> determined dynamically at runtime (using "getinfo PAGESIZE", and 
> confirmed using ratio of Mapped memory / nr_mapped pages from 
> /proc/vmstat) is indeed 32k, whereas Musl has an hardcoded at compile 
> time 4k page size for arm architecture (with same eventually incorrect 
> value returned by getpagesize() and sysconf(__SC_PAGESIZE)).
>
> Long story short, no odd behavior of the mmap in kernel, but Musl not 
> supporting correctly non-4k page systems, impacting pthread_create but 
> probably also malloc and mprotect. Given non-4k page systems are 
> getting more and more common (on arm and arm64 architectures at 
> least), this might be an important issue to fix. Should I start new 
> "support for non-4k page systems" discussion here ?
>
> Eric
>
Just removing the "#define PAGE_SIZE 4096" from arch/arm/bits/limits.h 
makes PAGE_SIZE to be defined by internal/libc.h as libc.page_size which 
is initialized correctly at runtime, making the correct page size be 
used all around musl. Since a page size of 4k is no more a valid 
assumption on arm-linux target, is removing this compile time define a 
right fix?

Eric


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-01-31 22:44             ` Eric Hassold
@ 2017-02-01  9:52               ` Szabolcs Nagy
  2017-02-01 18:21                 ` Eric Hassold
  0 siblings, 1 reply; 15+ messages in thread
From: Szabolcs Nagy @ 2017-02-01  9:52 UTC (permalink / raw)
  To: musl

* Eric Hassold <hassold@gmail.com> [2017-01-31 14:44:52 -0800]:
> On 1/31/17 1:18 PM, Eric Hassold wrote:
> > Long story short, no odd behavior of the mmap in kernel, but Musl not
> > supporting correctly non-4k page systems, impacting pthread_create but
> > probably also malloc and mprotect. Given non-4k page systems are getting
> > more and more common (on arm and arm64 architectures at least), this
> > might be an important issue to fix. Should I start new "support for
> > non-4k page systems" discussion here ?
> > 
> > Eric
> > 
> Just removing the "#define PAGE_SIZE 4096" from arch/arm/bits/limits.h makes
> PAGE_SIZE to be defined by internal/libc.h as libc.page_size which is
> initialized correctly at runtime, making the correct page size be used all
> around musl. Since a page size of 4k is no more a valid assumption on
> arm-linux target, is removing this compile time define a right fix?

it is ok to remove PAGE_SIZE but i thought arm elf
binaries would depend on the 4k page size.. making it
less useful to support other page sizes, but if 32k
works in practice then i think musl should support it.

i guess the boundary of unaligned sections will be
mapped twice which is not a big issue except may be
you don't get relro support if the readonly relocs
are 4k aligned.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-02-01  9:52               ` Szabolcs Nagy
@ 2017-02-01 18:21                 ` Eric Hassold
  2017-02-01 18:35                   ` Rich Felker
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Hassold @ 2017-02-01 18:21 UTC (permalink / raw)
  To: musl


On 2/1/17 1:52 AM, Szabolcs Nagy wrote:
> * Eric Hassold <hassold@gmail.com> [2017-01-31 14:44:52 -0800]:
>> On 1/31/17 1:18 PM, Eric Hassold wrote:
>>> Long story short, no odd behavior of the mmap in kernel, but Musl not
>>> supporting correctly non-4k page systems, impacting pthread_create but
>>> probably also malloc and mprotect. Given non-4k page systems are getting
>>> more and more common (on arm and arm64 architectures at least), this
>>> might be an important issue to fix. Should I start new "support for
>>> non-4k page systems" discussion here ?
>>>
>>> Eric
>>>
>> Just removing the "#define PAGE_SIZE 4096" from arch/arm/bits/limits.h makes
>> PAGE_SIZE to be defined by internal/libc.h as libc.page_size which is
>> initialized correctly at runtime, making the correct page size be used all
>> around musl. Since a page size of 4k is no more a valid assumption on
>> arm-linux target, is removing this compile time define a right fix?
> it is ok to remove PAGE_SIZE but i thought arm elf
> binaries would depend on the 4k page size.. making it
> less useful to support other page sizes, but if 32k
> works in practice then i think musl should support it.

Binutils was updated couple years ago to support up to 64KB page size 
and default text alignment to 64kB boundary for armel and match changes 
in kernel:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=7572ca8989ead4c3425a1500bc241eaaeffa2c89

So yes, executables built with older toolchains can't be deployed as is 
on such 32kB pagesize systems, but no issue with with executables built 
with recent  (binutils 2.27 && gcc 5.4) toolchain we are using.

Any further action on my side to proceed? I guess no much value in 
submitting a patch though mail here for such one liner change...

Eric

>
> i guess the boundary of unaligned sections will be
> mapped twice which is not a big issue except may be
> you don't get relro support if the readonly relocs
> are 4k aligned.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-02-01 18:21                 ` Eric Hassold
@ 2017-02-01 18:35                   ` Rich Felker
  2017-02-01 18:52                     ` Eric Hassold
  0 siblings, 1 reply; 15+ messages in thread
From: Rich Felker @ 2017-02-01 18:35 UTC (permalink / raw)
  To: musl

On Wed, Feb 01, 2017 at 10:21:40AM -0800, Eric Hassold wrote:
> 
> On 2/1/17 1:52 AM, Szabolcs Nagy wrote:
> >* Eric Hassold <hassold@gmail.com> [2017-01-31 14:44:52 -0800]:
> >>On 1/31/17 1:18 PM, Eric Hassold wrote:
> >>>Long story short, no odd behavior of the mmap in kernel, but Musl not
> >>>supporting correctly non-4k page systems, impacting pthread_create but
> >>>probably also malloc and mprotect. Given non-4k page systems are getting
> >>>more and more common (on arm and arm64 architectures at least), this
> >>>might be an important issue to fix. Should I start new "support for
> >>>non-4k page systems" discussion here ?
> >>>
> >>>Eric
> >>>
> >>Just removing the "#define PAGE_SIZE 4096" from arch/arm/bits/limits.h makes
> >>PAGE_SIZE to be defined by internal/libc.h as libc.page_size which is
> >>initialized correctly at runtime, making the correct page size be used all
> >>around musl. Since a page size of 4k is no more a valid assumption on
> >>arm-linux target, is removing this compile time define a right fix?
> >it is ok to remove PAGE_SIZE but i thought arm elf
> >binaries would depend on the 4k page size.. making it
> >less useful to support other page sizes, but if 32k
> >works in practice then i think musl should support it.
> 
> Binutils was updated couple years ago to support up to 64KB page
> size and default text alignment to 64kB boundary for armel and match
> changes in kernel:
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=7572ca8989ead4c3425a1500bc241eaaeffa2c89
> 
> So yes, executables built with older toolchains can't be deployed as
> is on such 32kB pagesize systems, but no issue with with executables
> built with recent  (binutils 2.27 && gcc 5.4) toolchain we are
> using.

OK, so this was a generalization of the ABI that we weren't aware
about. Bleh. But if the rest of the world is no longer considering 4k
page size part of the ABI, we should update musl to reflect this.

> Any further action on my side to proceed? I guess no much value in
> submitting a patch though mail here for such one liner change...

No further action needed. Thanks for being patient with my hesitance
to accept the initial patch and for working through this to find the
real problem.

Rich


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fix pthread_create on some devices failing to initialize guard area
  2017-02-01 18:35                   ` Rich Felker
@ 2017-02-01 18:52                     ` Eric Hassold
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Hassold @ 2017-02-01 18:52 UTC (permalink / raw)
  To: musl

On 2/1/17 10:35 AM, Rich Felker wrote:

> OK, so this was a generalization of the ABI that we weren't aware
> about. Bleh. But if the rest of the world is no longer considering 4k
> page size part of the ABI, we should update musl to reflect this.
>
>> Any further action on my side to proceed? I guess no much value in
>> submitting a patch though mail here for such one liner change...
> No further action needed. Thanks for being patient with my hesitance
> to accept the initial patch and for working through this to find the
> real problem.
>
> Rich
Your "hesitance" reflects your consistent rigor in asking the right 
questions, which allows Musl to be the highly robust and clean codebase 
we all enjoy, and created incentive to nail the actual root cause of the 
issue down. So your hesitance is much appreciated.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-02-01 18:52 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-20 19:45 Fix pthread_create on some devices failing to initialize guard area Eric Hassold
2017-01-20 19:56 ` Rich Felker
2017-01-20 21:04   ` Eric Hassold
2017-01-20 21:29     ` Rich Felker
2017-01-20 22:42       ` Eric Hassold
2017-01-30 21:30         ` Eric Hassold
2017-01-30 23:13           ` Rich Felker
2017-01-31  2:52             ` Eric Hassold
2017-01-31  3:58         ` Rich Felker
2017-01-31 21:18           ` Eric Hassold
2017-01-31 22:44             ` Eric Hassold
2017-02-01  9:52               ` Szabolcs Nagy
2017-02-01 18:21                 ` Eric Hassold
2017-02-01 18:35                   ` Rich Felker
2017-02-01 18:52                     ` Eric Hassold

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).