* Fix pthread_create on some devices failing to initialize guard area
@ 2017-01-20 19:45 Eric Hassold
2017-01-20 19:56 ` Rich Felker
0 siblings, 1 reply; 15+ messages in thread
From: Eric Hassold @ 2017-01-20 19:45 UTC (permalink / raw)
To: musl
Hi All,
While deploying test static executable across farm of different embedded systems, found out that pthread_create() is failing systematically on some (very few) arm-linux devices whenever non null stack guard is enabled (that is, also when calling pthread_create with default - i.e. null - attributes since default is a one page of guard). One of those device is for example a Marvell Armada 375 running Linux 3.10.39. Same test code, built with alternative libc implementations (glibc, uClibc) works as expected on those devices.
Issue
This occurs because of call to mprotect() in pthread_create fails. In current implementation, if guard size is non null, memory for (guard + stack + ...) is first allocated (mmap'ed) with no accessibility (PROT_NONE), then mprotect() is called to re-enable read/write access to (memory + guardsize). Since call to mprotect() systematically fails in this scenario (returning error code EINVAL), it is impossible to create thread.
Patch
In proposed patch (attached below), memory for (guard + stack + ...) is first mmap'ed with read/write accessibility, then guard area is protected by calling mprotect() with PROT_NONE on guardsize first bytes of returned memory. This call to mprotect() to remove all accessibility on guard area, with guard area being at beginning of previously mmap'ed memory, works correctly on those platforms having issue with current implementation. Incidentally, this makes the logic more concise to handle both cases (with or without guard) is a more consistent way, and handle systems with partial/invalid page protection implementation (e.g. mprotect() returning ENOSYS) more gracefully since the stack is explicitly created with read/write access.
-Eric.
---
Subject: [PATCH] set up guard protection after stack is mapped
calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on
some devices, making it impossible to create thread. instead, allocate memory
for stack and guard first, with read and write permission, then protect guard
area.
---
src/thread/pthread_create.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index 49f2f72..f846bec 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -232,21 +232,18 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
} else {
guard = ROUND(attr._a_guardsize);
size = guard + ROUND(attr._a_stacksize
- + libc.tls_size + __pthread_tsd_size);
+ + libc.tls_size + __pthread_tsd_size);
}
if (!tsd) {
+ map = __mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
+ if (map == MAP_FAILED) goto fail;
if (guard) {
- map = __mmap(0, size, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0);
- if (map == MAP_FAILED) goto fail;
- if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)
+ if (__mprotect(map, guard, PROT_NONE)
&& errno != ENOSYS) {
__munmap(map, size);
goto fail;
}
- } else {
- map = __mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
- if (map == MAP_FAILED) goto fail;
}
tsd = map + size - __pthread_tsd_size;
if (!stack) {
--
2.7.4
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-01-20 19:45 Fix pthread_create on some devices failing to initialize guard area Eric Hassold
@ 2017-01-20 19:56 ` Rich Felker
2017-01-20 21:04 ` Eric Hassold
0 siblings, 1 reply; 15+ messages in thread
From: Rich Felker @ 2017-01-20 19:56 UTC (permalink / raw)
To: musl
On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
> Hi All,
>
> While deploying test static executable across farm of different
> embedded systems, found out that pthread_create() is failing
> systematically on some (very few) arm-linux devices whenever non
> null stack guard is enabled (that is, also when calling
> pthread_create with default - i.e. null - attributes since default
> is a one page of guard). One of those device is for example a
> Marvell Armada 375 running Linux 3.10.39. Same test code, built with
> alternative libc implementations (glibc, uClibc) works as expected
> on those devices.
>
>
> Issue
>
> This occurs because of call to mprotect() in pthread_create fails.
> In current implementation, if guard size is non null, memory for
> (guard + stack + ...) is first allocated (mmap'ed) with no
> accessibility (PROT_NONE), then mprotect() is called to re-enable
> read/write access to (memory + guardsize). Since call to mprotect()
> systematically fails in this scenario (returning error code EINVAL),
> it is impossible to create thread.
Failure is ignored and the memory is assumed to be writable in this
case, since EINVAL is assumed to imply no MMU. Is this assumption
wrong in your case, and if so, can you explain why?
> Patch
>
> In proposed patch (attached below), memory for (guard + stack + ...)
> is first mmap'ed with read/write accessibility, then guard area is
> protected by calling mprotect() with PROT_NONE on guardsize first
> bytes of returned memory. This call to mprotect() to remove all
> accessibility on guard area, with guard area being at beginning of
> previously mmap'ed memory, works correctly on those platforms having
> issue with current implementation. Incidentally, this makes the
> logic more concise to handle both cases (with or without guard) is a
> more consistent way, and handle systems with partial/invalid page
> protection implementation (e.g. mprotect() returning ENOSYS) more
> gracefully since the stack is explicitly created with read/write
> access.
This doesn't work correctly on normal systems with mmu, because the
size of the guard pages is accounted against commit charge. Linux
should, but AFAIK doesn't, subtract it from commit charge once it's
changed to PROT_NONE without having been dirtied, but even if this bug
is fixed on the kernel side, there would still be a moment where
excess commit charge is consumed and thus where pthread_create might
spuriously fail or cause allocations in other processes/threads to
fail.
If the kernel is not allocating actually-usable address ranges for
PROT_NONE on all nommu systems, I think the only solution is to handle
EINVAL from mprotect by going back and re-doing the mmap with
PROT_READ|PROT_WRITE. Do you have any better ideas?
Rich
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-01-20 19:56 ` Rich Felker
@ 2017-01-20 21:04 ` Eric Hassold
2017-01-20 21:29 ` Rich Felker
0 siblings, 1 reply; 15+ messages in thread
From: Eric Hassold @ 2017-01-20 21:04 UTC (permalink / raw)
To: musl
On 1/20/17 11:56 AM, Rich Felker wrote:
> On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
>> Hi All,
>>
>> While deploying test static executable across farm of different
>> embedded systems, found out that pthread_create() is failing
>> systematically on some (very few) arm-linux devices whenever non
>> null stack guard is enabled (that is, also when calling
>> pthread_create with default - i.e. null - attributes since default
>> is a one page of guard). One of those device is for example a
>> Marvell Armada 375 running Linux 3.10.39. Same test code, built with
>> alternative libc implementations (glibc, uClibc) works as expected
>> on those devices.
>>
>>
>> Issue
>>
>> This occurs because of call to mprotect() in pthread_create fails.
>> In current implementation, if guard size is non null, memory for
>> (guard + stack + ...) is first allocated (mmap'ed) with no
>> accessibility (PROT_NONE), then mprotect() is called to re-enable
>> read/write access to (memory + guardsize). Since call to mprotect()
>> systematically fails in this scenario (returning error code EINVAL),
>> it is impossible to create thread.
> Failure is ignored and the memory is assumed to be writable in this
> case, since EINVAL is assumed to imply no MMU. Is this assumption
> wrong in your case, and if so, can you explain why?
In my case, devices exhibiting issue are not MMU-less, they are
Cortex-A9 devices with valid mmu / page protection working as expected
otherwise. Note that current Musl code assumes ENOSYS means no MMU and
handles it by assuming the system has no page protection at all. For the
case I observe, it is EINVAL which is returned, this is not ignored, so
memory is unmap'ed and pthread_create() fails.
>
>> Patch
>>
>> In proposed patch (attached below), memory for (guard + stack + ...)
>> is first mmap'ed with read/write accessibility, then guard area is
>> protected by calling mprotect() with PROT_NONE on guardsize first
>> bytes of returned memory. This call to mprotect() to remove all
>> accessibility on guard area, with guard area being at beginning of
>> previously mmap'ed memory, works correctly on those platforms having
>> issue with current implementation. Incidentally, this makes the
>> logic more concise to handle both cases (with or without guard) is a
>> more consistent way, and handle systems with partial/invalid page
>> protection implementation (e.g. mprotect() returning ENOSYS) more
>> gracefully since the stack is explicitly created with read/write
>> access.
> This doesn't work correctly on normal systems with mmu, because the
> size of the guard pages is accounted against commit charge. Linux
> should, but AFAIK doesn't, subtract it from commit charge once it's
> changed to PROT_NONE without having been dirtied, but even if this bug
> is fixed on the kernel side, there would still be a moment where
> excess commit charge is consumed and thus where pthread_create might
> spuriously fail or cause allocations in other processes/threads to
> fail.
>
> If the kernel is not allocating actually-usable address ranges for
> PROT_NONE on all nommu systems, I think the only solution is to handle
> EINVAL from mprotect by going back and re-doing the mmap with
> PROT_READ|PROT_WRITE. Do you have any better ideas?
>
> Rich
Had this "deja vu" feeling... reminds me conversation you had in this
thread some time ago elsewhere...
https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html
Your proposition seems reasonable on nommu system, but again, the issue
observed here is on legit systems with mmu, with mprotect failing with
EINVAL (and not ENOSYS), for some other reason than system not
supporting page protection. Catching EINVAL error returned by mprotect
and falling back to re-doing the mmap would mean actually silently
running without stack guard on system supporting it, so I believe it is
actually legitimate to fail and return error in that case. But that's
difference use case than the issue I'm observing.
I took note of Balazs's suggestion (in the thread referenced above) to
switch to a pattern similar to Musl's current one (mmap(PROT_NONE) +
mprotect(stack, READ|WRITE)) in order to avoid those guard pages to
actually occupy resources. But I can indeed observe that this approach
fails on some devices (which have valid mmu), while I'm not sure I'm
seeing the issue with first mapping PROT_READ|PROT_WRITE then
mprotect(PROT_NONE) guard area. Latter approach (as implemented by
patch) is, at least, consistent with all the other implementations out
there (I checked glibc's allocatestack.c, but also e.g. bionic), and
couldn't find report of those failures you are envisaging.
Eric
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-01-20 21:04 ` Eric Hassold
@ 2017-01-20 21:29 ` Rich Felker
2017-01-20 22:42 ` Eric Hassold
0 siblings, 1 reply; 15+ messages in thread
From: Rich Felker @ 2017-01-20 21:29 UTC (permalink / raw)
To: musl
On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote:
>
> On 1/20/17 11:56 AM, Rich Felker wrote:
> >On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
> >>Hi All,
> >>
> >>While deploying test static executable across farm of different
> >>embedded systems, found out that pthread_create() is failing
> >>systematically on some (very few) arm-linux devices whenever non
> >>null stack guard is enabled (that is, also when calling
> >>pthread_create with default - i.e. null - attributes since default
> >>is a one page of guard). One of those device is for example a
> >>Marvell Armada 375 running Linux 3.10.39. Same test code, built with
> >>alternative libc implementations (glibc, uClibc) works as expected
> >>on those devices.
> >>
> >>
> >>Issue
> >>
> >>This occurs because of call to mprotect() in pthread_create fails.
> >>In current implementation, if guard size is non null, memory for
> >>(guard + stack + ...) is first allocated (mmap'ed) with no
> >>accessibility (PROT_NONE), then mprotect() is called to re-enable
> >>read/write access to (memory + guardsize). Since call to mprotect()
> >>systematically fails in this scenario (returning error code EINVAL),
> >>it is impossible to create thread.
> >Failure is ignored and the memory is assumed to be writable in this
> >case, since EINVAL is assumed to imply no MMU. Is this assumption
> >wrong in your case, and if so, can you explain why?
>
> In my case, devices exhibiting issue are not MMU-less, they are
> Cortex-A9 devices with valid mmu / page protection working as
> expected otherwise. Note that current Musl code assumes ENOSYS means
> no MMU and handles it by assuming the system has no page protection
> at all. For the case I observe, it is EINVAL which is returned, this
> is not ignored, so memory is unmap'ed and pthread_create() fails.
In that case I think this is a kernel bug. Do you know why EINVAL is
happening? If there's an MMU, Linux should be able to replace the
anon PROT_NONE pages with anon RW pages.
> >>In proposed patch (attached below), memory for (guard + stack + ...)
> >>is first mmap'ed with read/write accessibility, then guard area is
> >>protected by calling mprotect() with PROT_NONE on guardsize first
> >>bytes of returned memory. This call to mprotect() to remove all
> >>accessibility on guard area, with guard area being at beginning of
> >>previously mmap'ed memory, works correctly on those platforms having
> >>issue with current implementation. Incidentally, this makes the
> >>logic more concise to handle both cases (with or without guard) is a
> >>more consistent way, and handle systems with partial/invalid page
> >>protection implementation (e.g. mprotect() returning ENOSYS) more
> >>gracefully since the stack is explicitly created with read/write
> >>access.
> >This doesn't work correctly on normal systems with mmu, because the
> >size of the guard pages is accounted against commit charge. Linux
> >should, but AFAIK doesn't, subtract it from commit charge once it's
> >changed to PROT_NONE without having been dirtied, but even if this bug
> >is fixed on the kernel side, there would still be a moment where
> >excess commit charge is consumed and thus where pthread_create might
> >spuriously fail or cause allocations in other processes/threads to
> >fail.
> >
> >If the kernel is not allocating actually-usable address ranges for
> >PROT_NONE on all nommu systems, I think the only solution is to handle
> >EINVAL from mprotect by going back and re-doing the mmap with
> >PROT_READ|PROT_WRITE. Do you have any better ideas?
> >
> >Rich
> Had this "deja vu" feeling... reminds me conversation you had in
> this thread some time ago elsewhere...
> https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html
>
> Your proposition seems reasonable on nommu system, but again, the
> issue observed here is on legit systems with mmu, with mprotect
> failing with EINVAL (and not ENOSYS), for some other reason than
> system not supporting page protection. Catching EINVAL error
> returned by mprotect and falling back to re-doing the mmap would
> mean actually silently running without stack guard on system
> supporting it, so I believe it is actually legitimate to fail and
> return error in that case. But that's difference use case than the
> issue I'm observing.
>
> I took note of Balazs's suggestion (in the thread referenced above)
> to switch to a pattern similar to Musl's current one
> (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid
> those guard pages to actually occupy resources. But I can indeed
> observe that this approach fails on some devices (which have valid
> mmu), while I'm not sure I'm seeing the issue with first mapping
> PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter
> approach (as implemented by patch) is, at least, consistent with all
> the other implementations out there (I checked glibc's
> allocatestack.c, but also e.g. bionic), and couldn't find report of
> those failures you are envisaging.
Consider the case of guard_size=128M stack_size=128k with
Commit_Limit=128M. This will fail with your approach but works
perfectly well now.
Rich
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-01-20 21:29 ` Rich Felker
@ 2017-01-20 22:42 ` Eric Hassold
2017-01-30 21:30 ` Eric Hassold
2017-01-31 3:58 ` Rich Felker
0 siblings, 2 replies; 15+ messages in thread
From: Eric Hassold @ 2017-01-20 22:42 UTC (permalink / raw)
To: musl
On 1/20/17 1:29 PM, Rich Felker wrote:
> On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote:
>> On 1/20/17 11:56 AM, Rich Felker wrote:
>>> On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
>>>> Hi All,
>>>>
>>>> While deploying test static executable across farm of different
>>>> embedded systems, found out that pthread_create() is failing
>>>> systematically on some (very few) arm-linux devices whenever non
>>>> null stack guard is enabled (that is, also when calling
>>>> pthread_create with default - i.e. null - attributes since default
>>>> is a one page of guard). One of those device is for example a
>>>> Marvell Armada 375 running Linux 3.10.39. Same test code, built with
>>>> alternative libc implementations (glibc, uClibc) works as expected
>>>> on those devices.
>>>>
>>>>
>>>> Issue
>>>>
>>>> This occurs because of call to mprotect() in pthread_create fails.
>>>> In current implementation, if guard size is non null, memory for
>>>> (guard + stack + ...) is first allocated (mmap'ed) with no
>>>> accessibility (PROT_NONE), then mprotect() is called to re-enable
>>>> read/write access to (memory + guardsize). Since call to mprotect()
>>>> systematically fails in this scenario (returning error code EINVAL),
>>>> it is impossible to create thread.
>>> Failure is ignored and the memory is assumed to be writable in this
>>> case, since EINVAL is assumed to imply no MMU. Is this assumption
>>> wrong in your case, and if so, can you explain why?
>> In my case, devices exhibiting issue are not MMU-less, they are
>> Cortex-A9 devices with valid mmu / page protection working as
>> expected otherwise. Note that current Musl code assumes ENOSYS means
>> no MMU and handles it by assuming the system has no page protection
>> at all. For the case I observe, it is EINVAL which is returned, this
>> is not ignored, so memory is unmap'ed and pthread_create() fails.
> In that case I think this is a kernel bug. Do you know why EINVAL is
> happening? If there's an MMU, Linux should be able to replace the
> anon PROT_NONE pages with anon RW pages.
Agree. Unfortunately, those are devices we don't built the kernel for,
so have been hardly able to track issue deeper. The point is however
that such devices with this issue in kernel might not be that uncommon,
and it concretely means impossibility at that moment to deploy on them a
functional static executable built with musl.
>
>>>> In proposed patch (attached below), memory for (guard + stack + ...)
>>>> is first mmap'ed with read/write accessibility, then guard area is
>>>> protected by calling mprotect() with PROT_NONE on guardsize first
>>>> bytes of returned memory. This call to mprotect() to remove all
>>>> accessibility on guard area, with guard area being at beginning of
>>>> previously mmap'ed memory, works correctly on those platforms having
>>>> issue with current implementation. Incidentally, this makes the
>>>> logic more concise to handle both cases (with or without guard) is a
>>>> more consistent way, and handle systems with partial/invalid page
>>>> protection implementation (e.g. mprotect() returning ENOSYS) more
>>>> gracefully since the stack is explicitly created with read/write
>>>> access.
>>> This doesn't work correctly on normal systems with mmu, because the
>>> size of the guard pages is accounted against commit charge. Linux
>>> should, but AFAIK doesn't, subtract it from commit charge once it's
>>> changed to PROT_NONE without having been dirtied, but even if this bug
>>> is fixed on the kernel side, there would still be a moment where
>>> excess commit charge is consumed and thus where pthread_create might
>>> spuriously fail or cause allocations in other processes/threads to
>>> fail.
>>>
>>> If the kernel is not allocating actually-usable address ranges for
>>> PROT_NONE on all nommu systems, I think the only solution is to handle
>>> EINVAL from mprotect by going back and re-doing the mmap with
>>> PROT_READ|PROT_WRITE. Do you have any better ideas?
>>>
>>> Rich
>> Had this "deja vu" feeling... reminds me conversation you had in
>> this thread some time ago elsewhere...
>> https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html
>>
>> Your proposition seems reasonable on nommu system, but again, the
>> issue observed here is on legit systems with mmu, with mprotect
>> failing with EINVAL (and not ENOSYS), for some other reason than
>> system not supporting page protection. Catching EINVAL error
>> returned by mprotect and falling back to re-doing the mmap would
>> mean actually silently running without stack guard on system
>> supporting it, so I believe it is actually legitimate to fail and
>> return error in that case. But that's difference use case than the
>> issue I'm observing.
>>
>> I took note of Balazs's suggestion (in the thread referenced above)
>> to switch to a pattern similar to Musl's current one
>> (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid
>> those guard pages to actually occupy resources. But I can indeed
>> observe that this approach fails on some devices (which have valid
>> mmu), while I'm not sure I'm seeing the issue with first mapping
>> PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter
>> approach (as implemented by patch) is, at least, consistent with all
>> the other implementations out there (I checked glibc's
>> allocatestack.c, but also e.g. bionic), and couldn't find report of
>> those failures you are envisaging.
> Consider the case of guard_size=128M stack_size=128k with
> Commit_Limit=128M. This will fail with your approach but works
> perfectly well now.
Right, makes sense. Though I would point out that such uncommon scenario
for an application would fail when linked with anything but musl, since
"my approach" is the widespread one across all other libc
implementations (glibc, bionic, ...). But I understand both approaches
aim at working around issue in some corner case while creating issue in
other rare one, and so at the end it's all about deciding which edge
case is more critical than the other, and whether
consistency/compatibility with existing 3rd parties solutions matters or
not.
But in order to get closer to "having the cake and eat it too", please
find attached another patch implementing the "unmaping and re-doing"
strategy you initially suggested, i.e. starting with current approach,
then giving it a second chance using "my" approach if and only if issue
with mprotect() is detected. Should be consistent with current behavior
on any system working correctly today, and just provide a plan B falling
back to more common approach only when needed. Does that sound more
acceptable?
Thanks,
Eric
----
[PATCH] set up guard protection after stack is mapped
calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on
some devices with buggy kernel, making it impossible to create
thread. if such condition is met, fall back to allocating memory
for stack and guard first, with read and write permission, then protect
guard
area.
---
src/thread/pthread_create.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index 49f2f72..d3c030b 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res, const
pthread_attr_t *restrict att
if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)
&& errno != ENOSYS) {
__munmap(map, size);
- goto fail;
+ if (errno == EINVAL) {
+ map = __mmap(0, size, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANON, -1, 0);
+ if (map == MAP_FAILED) goto fail;
+ if (__mprotect(map, guard, PROT_NONE)
+ && errno != ENOSYS) {
+ __munmap(map, size);
+ goto fail;
+ }
+ } else {
+ goto fail;
+ }
}
} else {
map = __mmap(0, size, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANON, -1, 0);
--
2.7.4
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-01-20 22:42 ` Eric Hassold
@ 2017-01-30 21:30 ` Eric Hassold
2017-01-30 23:13 ` Rich Felker
2017-01-31 3:58 ` Rich Felker
1 sibling, 1 reply; 15+ messages in thread
From: Eric Hassold @ 2017-01-30 21:30 UTC (permalink / raw)
To: musl
On 1/20/17 2:42 PM, Eric Hassold wrote:
>
>
> On 1/20/17 1:29 PM, Rich Felker wrote:
>> On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote:
>>> On 1/20/17 11:56 AM, Rich Felker wrote:
>>>> On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
>>>>> Hi All,
>>>>>
>>>>> While deploying test static executable across farm of different
>>>>> embedded systems, found out that pthread_create() is failing
>>>>> systematically on some (very few) arm-linux devices whenever non
>>>>> null stack guard is enabled (that is, also when calling
>>>>> pthread_create with default - i.e. null - attributes since default
>>>>> is a one page of guard). One of those device is for example a
>>>>> Marvell Armada 375 running Linux 3.10.39. Same test code, built with
>>>>> alternative libc implementations (glibc, uClibc) works as expected
>>>>> on those devices.
>>>>>
>>>>>
>>>>> Issue
>>>>>
>>>>> This occurs because of call to mprotect() in pthread_create fails.
>>>>> In current implementation, if guard size is non null, memory for
>>>>> (guard + stack + ...) is first allocated (mmap'ed) with no
>>>>> accessibility (PROT_NONE), then mprotect() is called to re-enable
>>>>> read/write access to (memory + guardsize). Since call to mprotect()
>>>>> systematically fails in this scenario (returning error code EINVAL),
>>>>> it is impossible to create thread.
>>>> Failure is ignored and the memory is assumed to be writable in this
>>>> case, since EINVAL is assumed to imply no MMU. Is this assumption
>>>> wrong in your case, and if so, can you explain why?
>>> In my case, devices exhibiting issue are not MMU-less, they are
>>> Cortex-A9 devices with valid mmu / page protection working as
>>> expected otherwise. Note that current Musl code assumes ENOSYS means
>>> no MMU and handles it by assuming the system has no page protection
>>> at all. For the case I observe, it is EINVAL which is returned, this
>>> is not ignored, so memory is unmap'ed and pthread_create() fails.
>> In that case I think this is a kernel bug. Do you know why EINVAL is
>> happening? If there's an MMU, Linux should be able to replace the
>> anon PROT_NONE pages with anon RW pages.
>
> Agree. Unfortunately, those are devices we don't built the kernel for,
> so have been hardly able to track issue deeper. The point is however
> that such devices with this issue in kernel might not be that
> uncommon, and it concretely means impossibility at that moment to
> deploy on them a functional static executable built with musl.
>>
>>>>> In proposed patch (attached below), memory for (guard + stack + ...)
>>>>> is first mmap'ed with read/write accessibility, then guard area is
>>>>> protected by calling mprotect() with PROT_NONE on guardsize first
>>>>> bytes of returned memory. This call to mprotect() to remove all
>>>>> accessibility on guard area, with guard area being at beginning of
>>>>> previously mmap'ed memory, works correctly on those platforms having
>>>>> issue with current implementation. Incidentally, this makes the
>>>>> logic more concise to handle both cases (with or without guard) is a
>>>>> more consistent way, and handle systems with partial/invalid page
>>>>> protection implementation (e.g. mprotect() returning ENOSYS) more
>>>>> gracefully since the stack is explicitly created with read/write
>>>>> access.
>>>> This doesn't work correctly on normal systems with mmu, because the
>>>> size of the guard pages is accounted against commit charge. Linux
>>>> should, but AFAIK doesn't, subtract it from commit charge once it's
>>>> changed to PROT_NONE without having been dirtied, but even if this bug
>>>> is fixed on the kernel side, there would still be a moment where
>>>> excess commit charge is consumed and thus where pthread_create might
>>>> spuriously fail or cause allocations in other processes/threads to
>>>> fail.
>>>>
>>>> If the kernel is not allocating actually-usable address ranges for
>>>> PROT_NONE on all nommu systems, I think the only solution is to handle
>>>> EINVAL from mprotect by going back and re-doing the mmap with
>>>> PROT_READ|PROT_WRITE. Do you have any better ideas?
>>>>
>>>> Rich
>>> Had this "deja vu" feeling... reminds me conversation you had in
>>> this thread some time ago elsewhere...
>>> https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html
>>>
>>> Your proposition seems reasonable on nommu system, but again, the
>>> issue observed here is on legit systems with mmu, with mprotect
>>> failing with EINVAL (and not ENOSYS), for some other reason than
>>> system not supporting page protection. Catching EINVAL error
>>> returned by mprotect and falling back to re-doing the mmap would
>>> mean actually silently running without stack guard on system
>>> supporting it, so I believe it is actually legitimate to fail and
>>> return error in that case. But that's difference use case than the
>>> issue I'm observing.
>>>
>>> I took note of Balazs's suggestion (in the thread referenced above)
>>> to switch to a pattern similar to Musl's current one
>>> (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid
>>> those guard pages to actually occupy resources. But I can indeed
>>> observe that this approach fails on some devices (which have valid
>>> mmu), while I'm not sure I'm seeing the issue with first mapping
>>> PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter
>>> approach (as implemented by patch) is, at least, consistent with all
>>> the other implementations out there (I checked glibc's
>>> allocatestack.c, but also e.g. bionic), and couldn't find report of
>>> those failures you are envisaging.
>> Consider the case of guard_size=128M stack_size=128k with
>> Commit_Limit=128M. This will fail with your approach but works
>> perfectly well now.
> Right, makes sense. Though I would point out that such uncommon
> scenario for an application would fail when linked with anything but
> musl, since "my approach" is the widespread one across all other libc
> implementations (glibc, bionic, ...). But I understand both approaches
> aim at working around issue in some corner case while creating issue
> in other rare one, and so at the end it's all about deciding which
> edge case is more critical than the other, and whether
> consistency/compatibility with existing 3rd parties solutions matters
> or not.
>
> But in order to get closer to "having the cake and eat it too", please
> find attached another patch implementing the "unmaping and re-doing"
> strategy you initially suggested, i.e. starting with current approach,
> then giving it a second chance using "my" approach if and only if
> issue with mprotect() is detected. Should be consistent with current
> behavior on any system working correctly today, and just provide a
> plan B falling back to more common approach only when needed. Does
> that sound more acceptable?
>
> Thanks,
>
> Eric
>
> ----
> [PATCH] set up guard protection after stack is mapped
>
> calling mprotect beyond the guard page of PROT_NONE mmap-ed stack
> fails on
> some devices with buggy kernel, making it impossible to create
> thread. if such condition is met, fall back to allocating memory
> for stack and guard first, with read and write permission, then
> protect guard
> area.
> ---
> src/thread/pthread_create.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> index 49f2f72..d3c030b 100644
> --- a/src/thread/pthread_create.c
> +++ b/src/thread/pthread_create.c
> @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res,
> const pthread_attr_t *restrict att
> if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)
> && errno != ENOSYS) {
> __munmap(map, size);
> - goto fail;
> + if (errno == EINVAL) {
> + map = __mmap(0, size, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANON, -1, 0);
> + if (map == MAP_FAILED) goto fail;
> + if (__mprotect(map, guard, PROT_NONE)
> + && errno != ENOSYS) {
> + __munmap(map, size);
> + goto fail;
> + }
> + } else {
> + goto fail;
> + }
> }
> } else {
> map = __mmap(0, size, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANON, -1, 0);
Hi Rich,
Pinging... any comment, feedback or concern about latest version of the
patch, attached above, keeping current behavior but falling back to
(mmap(PROT_READ|PROT_WRITE) && mprotect(guard, none)) if and only if
current approach detected to fail) ?
Eric
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-01-30 21:30 ` Eric Hassold
@ 2017-01-30 23:13 ` Rich Felker
2017-01-31 2:52 ` Eric Hassold
0 siblings, 1 reply; 15+ messages in thread
From: Rich Felker @ 2017-01-30 23:13 UTC (permalink / raw)
To: musl
On Mon, Jan 30, 2017 at 01:30:00PM -0800, Eric Hassold wrote:
> >>>>>This occurs because of call to mprotect() in pthread_create fails.
> >>>>>In current implementation, if guard size is non null, memory for
> >>>>>(guard + stack + ...) is first allocated (mmap'ed) with no
> >>>>>accessibility (PROT_NONE), then mprotect() is called to re-enable
> >>>>>read/write access to (memory + guardsize). Since call to mprotect()
> >>>>>systematically fails in this scenario (returning error code EINVAL),
> >>>>>it is impossible to create thread.
> >>>>Failure is ignored and the memory is assumed to be writable in this
> >>>>case, since EINVAL is assumed to imply no MMU. Is this assumption
> >>>>wrong in your case, and if so, can you explain why?
> >>>In my case, devices exhibiting issue are not MMU-less, they are
> >>>Cortex-A9 devices with valid mmu / page protection working as
> >>>expected otherwise. Note that current Musl code assumes ENOSYS means
> >>>no MMU and handles it by assuming the system has no page protection
> >>>at all. For the case I observe, it is EINVAL which is returned, this
> >>>is not ignored, so memory is unmap'ed and pthread_create() fails.
> >>In that case I think this is a kernel bug. Do you know why EINVAL is
> >>happening? If there's an MMU, Linux should be able to replace the
> >>anon PROT_NONE pages with anon RW pages.
> >
> >Agree. Unfortunately, those are devices we don't built the kernel
> >for, so have been hardly able to track issue deeper. The point is
> >however that such devices with this issue in kernel might not be
> >that uncommon, and it concretely means impossibility at that
> >moment to deploy on them a functional static executable built with
> >musl.
> [...]
> Pinging... any comment, feedback or concern about latest version of
> the patch, attached above, keeping current behavior but falling back
> to (mmap(PROT_READ|PROT_WRITE) && mprotect(guard, none)) if and only
> if current approach detected to fail) ?
I still want to know what's going on on the kernel side, because it
looks like this a rogue/nonsensical patch to the kernel that breaks
mmap functionality in a general way that has nothing to do with the
specific cpu/board.
Rich
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-01-30 23:13 ` Rich Felker
@ 2017-01-31 2:52 ` Eric Hassold
0 siblings, 0 replies; 15+ messages in thread
From: Eric Hassold @ 2017-01-31 2:52 UTC (permalink / raw)
To: musl
On 1/30/17 3:13 PM, Rich Felker wrote:
> On Mon, Jan 30, 2017 at 01:30:00PM -0800, Eric Hassold wrote:
>>>>>>> This occurs because of call to mprotect() in pthread_create fails.
>>>>>>> In current implementation, if guard size is non null, memory for
>>>>>>> (guard + stack + ...) is first allocated (mmap'ed) with no
>>>>>>> accessibility (PROT_NONE), then mprotect() is called to re-enable
>>>>>>> read/write access to (memory + guardsize). Since call to mprotect()
>>>>>>> systematically fails in this scenario (returning error code EINVAL),
>>>>>>> it is impossible to create thread.
>>>>>> Failure is ignored and the memory is assumed to be writable in this
>>>>>> case, since EINVAL is assumed to imply no MMU. Is this assumption
>>>>>> wrong in your case, and if so, can you explain why?
>>>>> In my case, devices exhibiting issue are not MMU-less, they are
>>>>> Cortex-A9 devices with valid mmu / page protection working as
>>>>> expected otherwise. Note that current Musl code assumes ENOSYS means
>>>>> no MMU and handles it by assuming the system has no page protection
>>>>> at all. For the case I observe, it is EINVAL which is returned, this
>>>>> is not ignored, so memory is unmap'ed and pthread_create() fails.
>>>> In that case I think this is a kernel bug. Do you know why EINVAL is
>>>> happening? If there's an MMU, Linux should be able to replace the
>>>> anon PROT_NONE pages with anon RW pages.
>>> Agree. Unfortunately, those are devices we don't built the kernel
>>> for, so have been hardly able to track issue deeper. The point is
>>> however that such devices with this issue in kernel might not be
>>> that uncommon, and it concretely means impossibility at that
>>> moment to deploy on them a functional static executable built with
>>> musl.
>> [...]
>> Pinging... any comment, feedback or concern about latest version of
>> the patch, attached above, keeping current behavior but falling back
>> to (mmap(PROT_READ|PROT_WRITE) && mprotect(guard, none)) if and only
>> if current approach detected to fail) ?
> I still want to know what's going on on the kernel side, because it
> looks like this a rogue/nonsensical patch to the kernel that breaks
> mmap functionality in a general way that has nothing to do with the
> specific cpu/board.
>
> Rich
Yes, I understand this would be ideal, and wish too I would be able to
find out more about what's happening, beyond the symptomatic approach.
But as mentioned, we have very little leverage to investigate much
further what's going on, since issue wasn't reproducible on any device
we built kernel for, but only on a few devices we don't build those
kernels nor have easy way to perform some lower level debugging on those
platforms (no jtag access). Telemetry was reporting failure on pthread
creation on some devices, one of them we could reproduce in-house was a
Marvell 375 board used e.g. in Western Digital MyCloud devices. But it
seems hard to even just get reference to the actual GPL source of the
exact kernel used there, so I don't see a way to find out if it's a
broken patch introduced at one point by a vendor, or some transient
regression fixed at one point but still having devices shipped with this
oddity (kernel versions seems to be 3.10 branch, so one very rough guess
is that it may be related to multiple patches done to support huge TLB
on ARM at that time, but that's more of a guts feeling than supported by
facts).
Though I understand it is not the role of a libc implementation to work
around all kind of flaws in kernel implementations that vendors ever
shipped, I would find it valuable, in this specific case, to prevent
musl from consistently bubbling up an issue not (easily) reproducible
otherwise (e.g. with other libc implementations). One of the great value
of musl is to allow easy deployment across a variety of devices (not as
opened as we would ideally hope) of the same statically linked
executable. This suggested patch aims at supporting this use case, since
fixing kernel isn't an option there. Of course, this pragmatic
motivation conflicts somehow with the legitimate goal to keep musl
implementation lean and clean (from a functional standpoint, this
doesn't introduce any change on behavior on all non broken platforms).
So of course only you can value the pros (regarding deployment) and
decide whether it is worth the cons of slightly "bloating" source code
with alternative recovery path.
Thanks,
Eric
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-01-20 22:42 ` Eric Hassold
2017-01-30 21:30 ` Eric Hassold
@ 2017-01-31 3:58 ` Rich Felker
2017-01-31 21:18 ` Eric Hassold
1 sibling, 1 reply; 15+ messages in thread
From: Rich Felker @ 2017-01-31 3:58 UTC (permalink / raw)
To: musl
On Fri, Jan 20, 2017 at 02:42:01PM -0800, Eric Hassold wrote:
> ----
> [PATCH] set up guard protection after stack is mapped
>
> calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on
> some devices with buggy kernel, making it impossible to create
> thread. if such condition is met, fall back to allocating memory
> for stack and guard first, with read and write permission, then
> protect guard
> area.
> ---
> src/thread/pthread_create.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> index 49f2f72..d3c030b 100644
> --- a/src/thread/pthread_create.c
> +++ b/src/thread/pthread_create.c
> @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res,
> const pthread_attr_t *restrict att
> if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)
> && errno != ENOSYS) {
> __munmap(map, size);
> - goto fail;
> + if (errno == EINVAL) {
In principle __munmap could have clobbered errno at this point. I
don't think it will but to be safe errno should probably be saved.
> + map = __mmap(0, size, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANON, -1, 0);
> + if (map == MAP_FAILED) goto fail;
> + if (__mprotect(map, guard, PROT_NONE)
> + && errno != ENOSYS) {
> + __munmap(map, size);
> + goto fail;
> + }
> + } else {
> + goto fail;
> + }
Can you try instead falling back to mmap with MAP_FIXED over top of
the part that should be RW rather than unmapping and redoing the whole
thing? If that works it would preserve the property of avoiding excess
commit charge. If not then these kernels are very very broken. Either
way it might shed some more light on what's going on. I understand you
just want things to work, but I do also want to have some
understanding of _why_ we're going to be working around something
awful, if we really have to.
Rich
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-01-31 3:58 ` Rich Felker
@ 2017-01-31 21:18 ` Eric Hassold
2017-01-31 22:44 ` Eric Hassold
0 siblings, 1 reply; 15+ messages in thread
From: Eric Hassold @ 2017-01-31 21:18 UTC (permalink / raw)
To: musl
On 1/30/17 7:58 PM, Rich Felker wrote:
> On Fri, Jan 20, 2017 at 02:42:01PM -0800, Eric Hassold wrote:
>> ----
>> [PATCH] set up guard protection after stack is mapped
>>
>> calling mprotect beyond the guard page of PROT_NONE mmap-ed stack fails on
>> some devices with buggy kernel, making it impossible to create
>> thread. if such condition is met, fall back to allocating memory
>> for stack and guard first, with read and write permission, then
>> protect guard
>> area.
>> ---
>> src/thread/pthread_create.c | 12 +++++++++++-
>> 1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
>> index 49f2f72..d3c030b 100644
>> --- a/src/thread/pthread_create.c
>> +++ b/src/thread/pthread_create.c
>> @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res,
>> const pthread_attr_t *restrict att
>> if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)
>> && errno != ENOSYS) {
>> __munmap(map, size);
>> - goto fail;
>> + if (errno == EINVAL) {
> In principle __munmap could have clobbered errno at this point. I
> don't think it will but to be safe errno should probably be saved.
>
>> + map = __mmap(0, size, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_ANON, -1, 0);
>> + if (map == MAP_FAILED) goto fail;
>> + if (__mprotect(map, guard, PROT_NONE)
>> + && errno != ENOSYS) {
>> + __munmap(map, size);
>> + goto fail;
>> + }
>> + } else {
>> + goto fail;
>> + }
> Can you try instead falling back to mmap with MAP_FIXED over top of
> the part that should be RW rather than unmapping and redoing the whole
> thing? If that works it would preserve the property of avoiding excess
> commit charge. If not then these kernels are very very broken. Either
> way it might shed some more light on what's going on. I understand you
> just want things to work, but I do also want to have some
> understanding of _why_ we're going to be working around something
> awful, if we really have to.
>
> Rich
mmap(MAP_FIXED) at the address returned by first mmap(PROT_NONE) and
then mprotect() indeed works.
BUT... actually kept exploring the idea that this might just be an issue
with page alignment. And found out that forcing the guard size to be
32k-aligned make current approach work well. So checked and found out
that on systems where issue happens, the actual page size determined
dynamically at runtime (using "getinfo PAGESIZE", and confirmed using
ratio of Mapped memory / nr_mapped pages from /proc/vmstat) is indeed
32k, whereas Musl has an hardcoded at compile time 4k page size for arm
architecture (with same eventually incorrect value returned by
getpagesize() and sysconf(__SC_PAGESIZE)).
Long story short, no odd behavior of the mmap in kernel, but Musl not
supporting correctly non-4k page systems, impacting pthread_create but
probably also malloc and mprotect. Given non-4k page systems are getting
more and more common (on arm and arm64 architectures at least), this
might be an important issue to fix. Should I start new "support for
non-4k page systems" discussion here ?
Eric
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-01-31 21:18 ` Eric Hassold
@ 2017-01-31 22:44 ` Eric Hassold
2017-02-01 9:52 ` Szabolcs Nagy
0 siblings, 1 reply; 15+ messages in thread
From: Eric Hassold @ 2017-01-31 22:44 UTC (permalink / raw)
To: musl
On 1/31/17 1:18 PM, Eric Hassold wrote:
>
> On 1/30/17 7:58 PM, Rich Felker wrote:
>> On Fri, Jan 20, 2017 at 02:42:01PM -0800, Eric Hassold wrote:
>>> ----
>>> [PATCH] set up guard protection after stack is mapped
>>>
>>> calling mprotect beyond the guard page of PROT_NONE mmap-ed stack
>>> fails on
>>> some devices with buggy kernel, making it impossible to create
>>> thread. if such condition is met, fall back to allocating memory
>>> for stack and guard first, with read and write permission, then
>>> protect guard
>>> area.
>>> ---
>>> src/thread/pthread_create.c | 12 +++++++++++-
>>> 1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
>>> index 49f2f72..d3c030b 100644
>>> --- a/src/thread/pthread_create.c
>>> +++ b/src/thread/pthread_create.c
>>> @@ -242,7 +242,17 @@ int __pthread_create(pthread_t *restrict res,
>>> const pthread_attr_t *restrict att
>>> if (__mprotect(map+guard, size-guard,
>>> PROT_READ|PROT_WRITE)
>>> && errno != ENOSYS) {
>>> __munmap(map, size);
>>> - goto fail;
>>> + if (errno == EINVAL) {
>> In principle __munmap could have clobbered errno at this point. I
>> don't think it will but to be safe errno should probably be saved.
>>
>>> + map = __mmap(0, size, PROT_READ|PROT_WRITE,
>>> MAP_PRIVATE|MAP_ANON, -1, 0);
>>> + if (map == MAP_FAILED) goto fail;
>>> + if (__mprotect(map, guard, PROT_NONE)
>>> + && errno != ENOSYS) {
>>> + __munmap(map, size);
>>> + goto fail;
>>> + }
>>> + } else {
>>> + goto fail;
>>> + }
>> Can you try instead falling back to mmap with MAP_FIXED over top of
>> the part that should be RW rather than unmapping and redoing the whole
>> thing? If that works it would preserve the property of avoiding excess
>> commit charge. If not then these kernels are very very broken. Either
>> way it might shed some more light on what's going on. I understand you
>> just want things to work, but I do also want to have some
>> understanding of _why_ we're going to be working around something
>> awful, if we really have to.
>>
>> Rich
>
> mmap(MAP_FIXED) at the address returned by first mmap(PROT_NONE) and
> then mprotect() indeed works.
>
> BUT... actually kept exploring the idea that this might just be an
> issue with page alignment. And found out that forcing the guard size
> to be 32k-aligned make current approach work well. So checked and
> found out that on systems where issue happens, the actual page size
> determined dynamically at runtime (using "getinfo PAGESIZE", and
> confirmed using ratio of Mapped memory / nr_mapped pages from
> /proc/vmstat) is indeed 32k, whereas Musl has an hardcoded at compile
> time 4k page size for arm architecture (with same eventually incorrect
> value returned by getpagesize() and sysconf(__SC_PAGESIZE)).
>
> Long story short, no odd behavior of the mmap in kernel, but Musl not
> supporting correctly non-4k page systems, impacting pthread_create but
> probably also malloc and mprotect. Given non-4k page systems are
> getting more and more common (on arm and arm64 architectures at
> least), this might be an important issue to fix. Should I start new
> "support for non-4k page systems" discussion here ?
>
> Eric
>
Just removing the "#define PAGE_SIZE 4096" from arch/arm/bits/limits.h
makes PAGE_SIZE to be defined by internal/libc.h as libc.page_size which
is initialized correctly at runtime, making the correct page size be
used all around musl. Since a page size of 4k is no more a valid
assumption on arm-linux target, is removing this compile time define a
right fix?
Eric
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-01-31 22:44 ` Eric Hassold
@ 2017-02-01 9:52 ` Szabolcs Nagy
2017-02-01 18:21 ` Eric Hassold
0 siblings, 1 reply; 15+ messages in thread
From: Szabolcs Nagy @ 2017-02-01 9:52 UTC (permalink / raw)
To: musl
* Eric Hassold <hassold@gmail.com> [2017-01-31 14:44:52 -0800]:
> On 1/31/17 1:18 PM, Eric Hassold wrote:
> > Long story short, no odd behavior of the mmap in kernel, but Musl not
> > supporting correctly non-4k page systems, impacting pthread_create but
> > probably also malloc and mprotect. Given non-4k page systems are getting
> > more and more common (on arm and arm64 architectures at least), this
> > might be an important issue to fix. Should I start new "support for
> > non-4k page systems" discussion here ?
> >
> > Eric
> >
> Just removing the "#define PAGE_SIZE 4096" from arch/arm/bits/limits.h makes
> PAGE_SIZE to be defined by internal/libc.h as libc.page_size which is
> initialized correctly at runtime, making the correct page size be used all
> around musl. Since a page size of 4k is no more a valid assumption on
> arm-linux target, is removing this compile time define a right fix?
it is ok to remove PAGE_SIZE but i thought arm elf
binaries would depend on the 4k page size.. making it
less useful to support other page sizes, but if 32k
works in practice then i think musl should support it.
i guess the boundary of unaligned sections will be
mapped twice which is not a big issue except may be
you don't get relro support if the readonly relocs
are 4k aligned.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-02-01 9:52 ` Szabolcs Nagy
@ 2017-02-01 18:21 ` Eric Hassold
2017-02-01 18:35 ` Rich Felker
0 siblings, 1 reply; 15+ messages in thread
From: Eric Hassold @ 2017-02-01 18:21 UTC (permalink / raw)
To: musl
On 2/1/17 1:52 AM, Szabolcs Nagy wrote:
> * Eric Hassold <hassold@gmail.com> [2017-01-31 14:44:52 -0800]:
>> On 1/31/17 1:18 PM, Eric Hassold wrote:
>>> Long story short, no odd behavior of the mmap in kernel, but Musl not
>>> supporting correctly non-4k page systems, impacting pthread_create but
>>> probably also malloc and mprotect. Given non-4k page systems are getting
>>> more and more common (on arm and arm64 architectures at least), this
>>> might be an important issue to fix. Should I start new "support for
>>> non-4k page systems" discussion here ?
>>>
>>> Eric
>>>
>> Just removing the "#define PAGE_SIZE 4096" from arch/arm/bits/limits.h makes
>> PAGE_SIZE to be defined by internal/libc.h as libc.page_size which is
>> initialized correctly at runtime, making the correct page size be used all
>> around musl. Since a page size of 4k is no more a valid assumption on
>> arm-linux target, is removing this compile time define a right fix?
> it is ok to remove PAGE_SIZE but i thought arm elf
> binaries would depend on the 4k page size.. making it
> less useful to support other page sizes, but if 32k
> works in practice then i think musl should support it.
Binutils was updated couple years ago to support up to 64KB page size
and default text alignment to 64kB boundary for armel and match changes
in kernel:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=7572ca8989ead4c3425a1500bc241eaaeffa2c89
So yes, executables built with older toolchains can't be deployed as is
on such 32kB pagesize systems, but no issue with with executables built
with recent (binutils 2.27 && gcc 5.4) toolchain we are using.
Any further action on my side to proceed? I guess no much value in
submitting a patch though mail here for such one liner change...
Eric
>
> i guess the boundary of unaligned sections will be
> mapped twice which is not a big issue except may be
> you don't get relro support if the readonly relocs
> are 4k aligned.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-02-01 18:21 ` Eric Hassold
@ 2017-02-01 18:35 ` Rich Felker
2017-02-01 18:52 ` Eric Hassold
0 siblings, 1 reply; 15+ messages in thread
From: Rich Felker @ 2017-02-01 18:35 UTC (permalink / raw)
To: musl
On Wed, Feb 01, 2017 at 10:21:40AM -0800, Eric Hassold wrote:
>
> On 2/1/17 1:52 AM, Szabolcs Nagy wrote:
> >* Eric Hassold <hassold@gmail.com> [2017-01-31 14:44:52 -0800]:
> >>On 1/31/17 1:18 PM, Eric Hassold wrote:
> >>>Long story short, no odd behavior of the mmap in kernel, but Musl not
> >>>supporting correctly non-4k page systems, impacting pthread_create but
> >>>probably also malloc and mprotect. Given non-4k page systems are getting
> >>>more and more common (on arm and arm64 architectures at least), this
> >>>might be an important issue to fix. Should I start new "support for
> >>>non-4k page systems" discussion here ?
> >>>
> >>>Eric
> >>>
> >>Just removing the "#define PAGE_SIZE 4096" from arch/arm/bits/limits.h makes
> >>PAGE_SIZE to be defined by internal/libc.h as libc.page_size which is
> >>initialized correctly at runtime, making the correct page size be used all
> >>around musl. Since a page size of 4k is no more a valid assumption on
> >>arm-linux target, is removing this compile time define a right fix?
> >it is ok to remove PAGE_SIZE but i thought arm elf
> >binaries would depend on the 4k page size.. making it
> >less useful to support other page sizes, but if 32k
> >works in practice then i think musl should support it.
>
> Binutils was updated couple years ago to support up to 64KB page
> size and default text alignment to 64kB boundary for armel and match
> changes in kernel:
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=7572ca8989ead4c3425a1500bc241eaaeffa2c89
>
> So yes, executables built with older toolchains can't be deployed as
> is on such 32kB pagesize systems, but no issue with with executables
> built with recent (binutils 2.27 && gcc 5.4) toolchain we are
> using.
OK, so this was a generalization of the ABI that we weren't aware
about. Bleh. But if the rest of the world is no longer considering 4k
page size part of the ABI, we should update musl to reflect this.
> Any further action on my side to proceed? I guess no much value in
> submitting a patch though mail here for such one liner change...
No further action needed. Thanks for being patient with my hesitance
to accept the initial patch and for working through this to find the
real problem.
Rich
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix pthread_create on some devices failing to initialize guard area
2017-02-01 18:35 ` Rich Felker
@ 2017-02-01 18:52 ` Eric Hassold
0 siblings, 0 replies; 15+ messages in thread
From: Eric Hassold @ 2017-02-01 18:52 UTC (permalink / raw)
To: musl
On 2/1/17 10:35 AM, Rich Felker wrote:
> OK, so this was a generalization of the ABI that we weren't aware
> about. Bleh. But if the rest of the world is no longer considering 4k
> page size part of the ABI, we should update musl to reflect this.
>
>> Any further action on my side to proceed? I guess no much value in
>> submitting a patch though mail here for such one liner change...
> No further action needed. Thanks for being patient with my hesitance
> to accept the initial patch and for working through this to find the
> real problem.
>
> Rich
Your "hesitance" reflects your consistent rigor in asking the right
questions, which allows Musl to be the highly robust and clean codebase
we all enjoy, and created incentive to nail the actual root cause of the
issue down. So your hesitance is much appreciated.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2017-02-01 18:52 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-20 19:45 Fix pthread_create on some devices failing to initialize guard area Eric Hassold
2017-01-20 19:56 ` Rich Felker
2017-01-20 21:04 ` Eric Hassold
2017-01-20 21:29 ` Rich Felker
2017-01-20 22:42 ` Eric Hassold
2017-01-30 21:30 ` Eric Hassold
2017-01-30 23:13 ` Rich Felker
2017-01-31 2:52 ` Eric Hassold
2017-01-31 3:58 ` Rich Felker
2017-01-31 21:18 ` Eric Hassold
2017-01-31 22:44 ` Eric Hassold
2017-02-01 9:52 ` Szabolcs Nagy
2017-02-01 18:21 ` Eric Hassold
2017-02-01 18:35 ` Rich Felker
2017-02-01 18:52 ` Eric Hassold
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/musl/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).