* [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc @ 2024-08-29 20:54 Rich Felker 2024-08-31 9:29 ` Szabolcs Nagy 0 siblings, 1 reply; 7+ messages in thread From: Rich Felker @ 2024-08-29 20:54 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: linux-api, libc-alpha, musl As I understand it, the AT_MINSIGSTKSZ auxv value is supposed to be a suitable runtime value for MINSIGSTKSZ (sysconf(_SC_MINSIGSTKSZ)), such that it's safe to pass as a size to sigaltstack. However, this is not how the kernel actually implements it. At least on x86 and powerpc, the kernel fills it via get_sigframe_size, which computes the size of the sigcontext/siginfo/etc to be pushed and uses that directly, without allowing any space for actual execution, and without ensuring the value is at least as large as the legacy constant MINSIGSTKSZ. This leads to two problems: 1. If userspace uses the value without clamping it not-below MINSIGSTKSZ, sigaltstack will fail with ENOMEM. 2. If the kernel needs more space than MINSIGSTKSZ just for the signal frame structures, userspace that trusts AT_MINSIGSTKSZ will only allocate enough for the frame, and the program will immediately crash/stack-overflow once execution passes to userspace. Since existing kernels in the wild can't be fixed, and since it looks like the problem is just that the kernel chose a poor definition of AT_MINSIGSTKSZ, I think userspace (glibc, musl, etc.) need to work around the problem, adding a per-arch correction term to AT_MINSIGSTKSZ that's basically equal to: legacy_MINSIGSTKSZ - AT_MINSIGSTKSZ as returned on legacy hw such that adding the correction term would reproduce the expected value MINSIGSTKSZ. The only question is whether the kernel will commit to keeping this behavior, or whether it would be "fixed" to include all the needed working space when they eventually decide they want bigger stacks for some new register file bloat. I think keeping the current behavior, so we can just add a fixed offset, is probably the best thing to do. Rich ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc 2024-08-29 20:54 [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc Rich Felker @ 2024-08-31 9:29 ` Szabolcs Nagy 2024-08-31 15:02 ` Rich Felker 0 siblings, 1 reply; 7+ messages in thread From: Szabolcs Nagy @ 2024-08-31 9:29 UTC (permalink / raw) To: Rich Felker; +Cc: Linux Kernel Mailing List, linux-api, libc-alpha, musl * Rich Felker <dalias@libc.org> [2024-08-29 16:54:38 -0400]: > As I understand it, the AT_MINSIGSTKSZ auxv value is supposed to be a > suitable runtime value for MINSIGSTKSZ (sysconf(_SC_MINSIGSTKSZ)), > such that it's safe to pass as a size to sigaltstack. However, this is > not how the kernel actually implements it. At least on x86 and > powerpc, the kernel fills it via get_sigframe_size, which computes the > size of the sigcontext/siginfo/etc to be pushed and uses that > directly, without allowing any space for actual execution, and without > ensuring the value is at least as large as the legacy constant > MINSIGSTKSZ. This leads to two problems: > > 1. If userspace uses the value without clamping it not-below > MINSIGSTKSZ, sigaltstack will fail with ENOMEM. > > 2. If the kernel needs more space than MINSIGSTKSZ just for the signal > frame structures, userspace that trusts AT_MINSIGSTKSZ will only > allocate enough for the frame, and the program will immediately > crash/stack-overflow once execution passes to userspace. > > Since existing kernels in the wild can't be fixed, and since it looks > like the problem is just that the kernel chose a poor definition of > AT_MINSIGSTKSZ, I think userspace (glibc, musl, etc.) need to work > around the problem, adding a per-arch correction term to > AT_MINSIGSTKSZ that's basically equal to: > > legacy_MINSIGSTKSZ - AT_MINSIGSTKSZ as returned on legacy hw > > such that adding the correction term would reproduce the expected > value MINSIGSTKSZ. > > The only question is whether the kernel will commit to keeping this > behavior, or whether it would be "fixed" to include all the needed > working space when they eventually decide they want bigger stacks for > some new register file bloat. I think keeping the current behavior, so > we can just add a fixed offset, is probably the best thing to do. i think it makes sense that the kernel sets AT_MINSIGSTKSZ according to what the kernel needs (signal frame size) anything beyond that is up to userspace requirements (e.g. the kernel cannot know if the libc wraps signal handlers) it's up to the libc to adjust sysconf(_SC_MINSIGSTKSZ) according to posix or backward compat requirements. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc 2024-08-31 9:29 ` Szabolcs Nagy @ 2024-08-31 15:02 ` Rich Felker 2024-08-31 15:09 ` H.J. Lu 0 siblings, 1 reply; 7+ messages in thread From: Rich Felker @ 2024-08-31 15:02 UTC (permalink / raw) To: Linux Kernel Mailing List, linux-api, libc-alpha, musl On Sat, Aug 31, 2024 at 11:29:02AM +0200, Szabolcs Nagy wrote: > * Rich Felker <dalias@libc.org> [2024-08-29 16:54:38 -0400]: > > As I understand it, the AT_MINSIGSTKSZ auxv value is supposed to be a > > suitable runtime value for MINSIGSTKSZ (sysconf(_SC_MINSIGSTKSZ)), > > such that it's safe to pass as a size to sigaltstack. However, this is > > not how the kernel actually implements it. At least on x86 and > > powerpc, the kernel fills it via get_sigframe_size, which computes the > > size of the sigcontext/siginfo/etc to be pushed and uses that > > directly, without allowing any space for actual execution, and without > > ensuring the value is at least as large as the legacy constant > > MINSIGSTKSZ. This leads to two problems: > > > > 1. If userspace uses the value without clamping it not-below > > MINSIGSTKSZ, sigaltstack will fail with ENOMEM. > > > > 2. If the kernel needs more space than MINSIGSTKSZ just for the signal > > frame structures, userspace that trusts AT_MINSIGSTKSZ will only > > allocate enough for the frame, and the program will immediately > > crash/stack-overflow once execution passes to userspace. > > > > Since existing kernels in the wild can't be fixed, and since it looks > > like the problem is just that the kernel chose a poor definition of > > AT_MINSIGSTKSZ, I think userspace (glibc, musl, etc.) need to work > > around the problem, adding a per-arch correction term to > > AT_MINSIGSTKSZ that's basically equal to: > > > > legacy_MINSIGSTKSZ - AT_MINSIGSTKSZ as returned on legacy hw > > > > such that adding the correction term would reproduce the expected > > value MINSIGSTKSZ. > > > > The only question is whether the kernel will commit to keeping this > > behavior, or whether it would be "fixed" to include all the needed > > working space when they eventually decide they want bigger stacks for > > some new register file bloat. I think keeping the current behavior, so > > we can just add a fixed offset, is probably the best thing to do. > > i think it makes sense that the kernel sets AT_MINSIGSTKSZ > according to what the kernel needs (signal frame size) > anything beyond that is up to userspace requirements (e.g. > the kernel cannot know if the libc wraps signal handlers) > > it's up to the libc to adjust sysconf(_SC_MINSIGSTKSZ) > according to posix or backward compat requirements. I think this is a reasonable viea and means the aux key was just very poorly named. It should have been called something like AT_SIGFRAMESIZE to indicate to the userspace-side consumer that it's not a suitable value for MINSIGSTKSZ, only a contributing term for it. Rich ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc 2024-08-31 15:02 ` Rich Felker @ 2024-08-31 15:09 ` H.J. Lu 2024-08-31 15:41 ` Rich Felker 0 siblings, 1 reply; 7+ messages in thread From: H.J. Lu @ 2024-08-31 15:09 UTC (permalink / raw) To: Rich Felker; +Cc: Linux Kernel Mailing List, linux-api, libc-alpha, musl On Sat, Aug 31, 2024 at 8:03 AM Rich Felker <dalias@libc.org> wrote: > > On Sat, Aug 31, 2024 at 11:29:02AM +0200, Szabolcs Nagy wrote: > > * Rich Felker <dalias@libc.org> [2024-08-29 16:54:38 -0400]: > > > As I understand it, the AT_MINSIGSTKSZ auxv value is supposed to be a > > > suitable runtime value for MINSIGSTKSZ (sysconf(_SC_MINSIGSTKSZ)), > > > such that it's safe to pass as a size to sigaltstack. However, this is > > > not how the kernel actually implements it. At least on x86 and > > > powerpc, the kernel fills it via get_sigframe_size, which computes the > > > size of the sigcontext/siginfo/etc to be pushed and uses that > > > directly, without allowing any space for actual execution, and without > > > ensuring the value is at least as large as the legacy constant > > > MINSIGSTKSZ. This leads to two problems: > > > > > > 1. If userspace uses the value without clamping it not-below > > > MINSIGSTKSZ, sigaltstack will fail with ENOMEM. > > > > > > 2. If the kernel needs more space than MINSIGSTKSZ just for the signal > > > frame structures, userspace that trusts AT_MINSIGSTKSZ will only > > > allocate enough for the frame, and the program will immediately > > > crash/stack-overflow once execution passes to userspace. > > > > > > Since existing kernels in the wild can't be fixed, and since it looks > > > like the problem is just that the kernel chose a poor definition of > > > AT_MINSIGSTKSZ, I think userspace (glibc, musl, etc.) need to work > > > around the problem, adding a per-arch correction term to > > > AT_MINSIGSTKSZ that's basically equal to: > > > > > > legacy_MINSIGSTKSZ - AT_MINSIGSTKSZ as returned on legacy hw > > > > > > such that adding the correction term would reproduce the expected > > > value MINSIGSTKSZ. > > > > > > The only question is whether the kernel will commit to keeping this > > > behavior, or whether it would be "fixed" to include all the needed > > > working space when they eventually decide they want bigger stacks for > > > some new register file bloat. I think keeping the current behavior, so > > > we can just add a fixed offset, is probably the best thing to do. > > > > i think it makes sense that the kernel sets AT_MINSIGSTKSZ > > according to what the kernel needs (signal frame size) > > anything beyond that is up to userspace requirements (e.g. > > the kernel cannot know if the libc wraps signal handlers) > > > > it's up to the libc to adjust sysconf(_SC_MINSIGSTKSZ) > > according to posix or backward compat requirements. > > I think this is a reasonable viea and means the aux key was just very > poorly named. It should have been called something like > AT_SIGFRAMESIZE to indicate to the userspace-side consumer that it's > not a suitable value for MINSIGSTKSZ, only a contributing term for it. > > Rich glibc manual has ‘_SC_MINSIGSTKSZ’ Inquire about the minimum number of bytes of free stack space required in order to guarantee successful, non-nested handling of a single signal whose handler is an empty function. ‘MINSIGSTKSZ’ This is the amount of signal stack space the operating system needs just to implement signal delivery. The size of a signal stack *must* be greater than this. For most cases, just using ‘SIGSTKSZ’ for ‘ss_size’ is sufficient. But if you know how much stack space your program's signal handlers will need, you may want to use a different size. In this case, you should allocate ‘MINSIGSTKSZ’ additional bytes for the signal stack and increase ‘ss_size’ accordingly. -- H.J. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc 2024-08-31 15:09 ` H.J. Lu @ 2024-08-31 15:41 ` Rich Felker 2024-09-02 12:07 ` Florian Weimer 0 siblings, 1 reply; 7+ messages in thread From: Rich Felker @ 2024-08-31 15:41 UTC (permalink / raw) To: H.J. Lu; +Cc: Linux Kernel Mailing List, linux-api, libc-alpha, musl On Sat, Aug 31, 2024 at 08:09:49AM -0700, H.J. Lu wrote: > On Sat, Aug 31, 2024 at 8:03 AM Rich Felker <dalias@libc.org> wrote: > > > > On Sat, Aug 31, 2024 at 11:29:02AM +0200, Szabolcs Nagy wrote: > > > * Rich Felker <dalias@libc.org> [2024-08-29 16:54:38 -0400]: > > > > As I understand it, the AT_MINSIGSTKSZ auxv value is supposed to be a > > > > suitable runtime value for MINSIGSTKSZ (sysconf(_SC_MINSIGSTKSZ)), > > > > such that it's safe to pass as a size to sigaltstack. However, this is > > > > not how the kernel actually implements it. At least on x86 and > > > > powerpc, the kernel fills it via get_sigframe_size, which computes the > > > > size of the sigcontext/siginfo/etc to be pushed and uses that > > > > directly, without allowing any space for actual execution, and without > > > > ensuring the value is at least as large as the legacy constant > > > > MINSIGSTKSZ. This leads to two problems: > > > > > > > > 1. If userspace uses the value without clamping it not-below > > > > MINSIGSTKSZ, sigaltstack will fail with ENOMEM. > > > > > > > > 2. If the kernel needs more space than MINSIGSTKSZ just for the signal > > > > frame structures, userspace that trusts AT_MINSIGSTKSZ will only > > > > allocate enough for the frame, and the program will immediately > > > > crash/stack-overflow once execution passes to userspace. > > > > > > > > Since existing kernels in the wild can't be fixed, and since it looks > > > > like the problem is just that the kernel chose a poor definition of > > > > AT_MINSIGSTKSZ, I think userspace (glibc, musl, etc.) need to work > > > > around the problem, adding a per-arch correction term to > > > > AT_MINSIGSTKSZ that's basically equal to: > > > > > > > > legacy_MINSIGSTKSZ - AT_MINSIGSTKSZ as returned on legacy hw > > > > > > > > such that adding the correction term would reproduce the expected > > > > value MINSIGSTKSZ. > > > > > > > > The only question is whether the kernel will commit to keeping this > > > > behavior, or whether it would be "fixed" to include all the needed > > > > working space when they eventually decide they want bigger stacks for > > > > some new register file bloat. I think keeping the current behavior, so > > > > we can just add a fixed offset, is probably the best thing to do. > > > > > > i think it makes sense that the kernel sets AT_MINSIGSTKSZ > > > according to what the kernel needs (signal frame size) > > > anything beyond that is up to userspace requirements (e.g. > > > the kernel cannot know if the libc wraps signal handlers) > > > > > > it's up to the libc to adjust sysconf(_SC_MINSIGSTKSZ) > > > according to posix or backward compat requirements. > > > > I think this is a reasonable viea and means the aux key was just very > > poorly named. It should have been called something like > > AT_SIGFRAMESIZE to indicate to the userspace-side consumer that it's > > not a suitable value for MINSIGSTKSZ, only a contributing term for it. > > > > Rich > > glibc manual has > > ‘_SC_MINSIGSTKSZ’ > > Inquire about the minimum number of bytes of free stack space > required in order to guarantee successful, non-nested handling of a > single signal whose handler is an empty function. > > ‘MINSIGSTKSZ’ > This is the amount of signal stack space the operating > system needs just to implement signal delivery. The size > of a signal stack *must* be greater than this. > > For most cases, just using ‘SIGSTKSZ’ for ‘ss_size’ is > sufficient. But if you know how much stack space your > program's signal handlers will need, you may want to use > a different size. In this case, you should allocate > ‘MINSIGSTKSZ’ additional bytes for the signal stack and > increase ‘ss_size’ accordingly. This is ambiguously worded (does "operating system" mean kernel?) and does not agree with POSIX, which defines it as: Minimum stack size for a signal handler. And otherwise just specifies that sigaltstack shall fail if given a smaller size. The POSIX definition is also underspecified but it's clear that it should be possible to execute at least a do-nothing signal handler (like one which immediately returns and whose sole purpose is to induce EINTR when intalled without SA_RESTART), or even a minimal one that does something like storing to a global variable, with such a small stack. Allowing a size where even a do-nothing signal handler results in a memory-clobbering overflow or access fault seems non-conforming to me. The historical sizes all allowed for 1k of execution space on top of what the historical signal frames consumed, and more for some archs. I don't think there's a POSIX contract to include that much, but I think there is a backwards-compatibility motivation to do so. Otherwise there will be application that were working when sysconf(_SC_MINSIGSTKSZ) historical_val as the result of max(value_from_aux,historial_val), but which break catastrophically as soon as value_from_aux is bigger than historical_val-sigframe_size. Rich ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc 2024-08-31 15:41 ` Rich Felker @ 2024-09-02 12:07 ` Florian Weimer 2024-09-02 12:51 ` Rich Felker 0 siblings, 1 reply; 7+ messages in thread From: Florian Weimer @ 2024-09-02 12:07 UTC (permalink / raw) To: Rich Felker Cc: H.J. Lu, Linux Kernel Mailing List, linux-api, libc-alpha, musl * Rich Felker: > This is ambiguously worded (does "operating system" mean kernel?) and > does not agree with POSIX, which defines it as: > > Minimum stack size for a signal handler. > > And otherwise just specifies that sigaltstack shall fail if given a > smaller size. > > The POSIX definition is also underspecified but it's clear that it > should be possible to execute at least a do-nothing signal handler > (like one which immediately returns and whose sole purpose is to > induce EINTR when intalled without SA_RESTART), or even a minimal one > that does something like storing to a global variable, with such a > small stack. Allowing a size where even a do-nothing signal handler > results in a memory-clobbering overflow or access fault seems > non-conforming to me. POSIX does not specify what happens on a stack overflow (or more generally, if most resource limits are exceeded), so I think the behavior is conforming on a technicality. Thanks, Florian ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc 2024-09-02 12:07 ` Florian Weimer @ 2024-09-02 12:51 ` Rich Felker 0 siblings, 0 replies; 7+ messages in thread From: Rich Felker @ 2024-09-02 12:51 UTC (permalink / raw) To: Florian Weimer Cc: H.J. Lu, Linux Kernel Mailing List, linux-api, libc-alpha, musl On Mon, Sep 02, 2024 at 02:07:36PM +0200, Florian Weimer wrote: > * Rich Felker: > > > This is ambiguously worded (does "operating system" mean kernel?) and > > does not agree with POSIX, which defines it as: > > > > Minimum stack size for a signal handler. > > > > And otherwise just specifies that sigaltstack shall fail if given a > > smaller size. > > > > The POSIX definition is also underspecified but it's clear that it > > should be possible to execute at least a do-nothing signal handler > > (like one which immediately returns and whose sole purpose is to > > induce EINTR when intalled without SA_RESTART), or even a minimal one > > that does something like storing to a global variable, with such a > > small stack. Allowing a size where even a do-nothing signal handler > > results in a memory-clobbering overflow or access fault seems > > non-conforming to me. > > POSIX does not specify what happens on a stack overflow (or more > generally, if most resource limits are exceeded), so I think the > behavior is conforming on a technicality. It doesn't specify what happens on overflow. It does specify what happens on non-overflow: the program executes correctly. Failure to do that is the problem here, not failure to trap on fault. Rich ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-09-02 12:51 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-08-29 20:54 [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc Rich Felker 2024-08-31 9:29 ` Szabolcs Nagy 2024-08-31 15:02 ` Rich Felker 2024-08-31 15:09 ` H.J. Lu 2024-08-31 15:41 ` Rich Felker 2024-09-02 12:07 ` Florian Weimer 2024-09-02 12:51 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).