mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc
@ 2024-08-29 20:54 Rich Felker
  2024-08-31  9:29 ` Szabolcs Nagy
  0 siblings, 1 reply; 7+ messages in thread
From: Rich Felker @ 2024-08-29 20:54 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: linux-api, libc-alpha, musl

As I understand it, the AT_MINSIGSTKSZ auxv value is supposed to be a
suitable runtime value for MINSIGSTKSZ (sysconf(_SC_MINSIGSTKSZ)),
such that it's safe to pass as a size to sigaltstack. However, this is
not how the kernel actually implements it. At least on x86 and
powerpc, the kernel fills it via get_sigframe_size, which computes the
size of the sigcontext/siginfo/etc to be pushed and uses that
directly, without allowing any space for actual execution, and without
ensuring the value is at least as large as the legacy constant
MINSIGSTKSZ. This leads to two problems:

1. If userspace uses the value without clamping it not-below
   MINSIGSTKSZ, sigaltstack will fail with ENOMEM.

2. If the kernel needs more space than MINSIGSTKSZ just for the signal
   frame structures, userspace that trusts AT_MINSIGSTKSZ will only
   allocate enough for the frame, and the program will immediately
   crash/stack-overflow once execution passes to userspace.

Since existing kernels in the wild can't be fixed, and since it looks
like the problem is just that the kernel chose a poor definition of
AT_MINSIGSTKSZ, I think userspace (glibc, musl, etc.) need to work
around the problem, adding a per-arch correction term to
AT_MINSIGSTKSZ that's basically equal to:

    legacy_MINSIGSTKSZ - AT_MINSIGSTKSZ as returned on legacy hw

such that adding the correction term would reproduce the expected
value MINSIGSTKSZ.

The only question is whether the kernel will commit to keeping this
behavior, or whether it would be "fixed" to include all the needed
working space when they eventually decide they want bigger stacks for
some new register file bloat. I think keeping the current behavior, so
we can just add a fixed offset, is probably the best thing to do.

Rich

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc
  2024-08-29 20:54 [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc Rich Felker
@ 2024-08-31  9:29 ` Szabolcs Nagy
  2024-08-31 15:02   ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: Szabolcs Nagy @ 2024-08-31  9:29 UTC (permalink / raw)
  To: Rich Felker; +Cc: Linux Kernel Mailing List, linux-api, libc-alpha, musl

* Rich Felker <dalias@libc.org> [2024-08-29 16:54:38 -0400]:
> As I understand it, the AT_MINSIGSTKSZ auxv value is supposed to be a
> suitable runtime value for MINSIGSTKSZ (sysconf(_SC_MINSIGSTKSZ)),
> such that it's safe to pass as a size to sigaltstack. However, this is
> not how the kernel actually implements it. At least on x86 and
> powerpc, the kernel fills it via get_sigframe_size, which computes the
> size of the sigcontext/siginfo/etc to be pushed and uses that
> directly, without allowing any space for actual execution, and without
> ensuring the value is at least as large as the legacy constant
> MINSIGSTKSZ. This leads to two problems:
> 
> 1. If userspace uses the value without clamping it not-below
>    MINSIGSTKSZ, sigaltstack will fail with ENOMEM.
> 
> 2. If the kernel needs more space than MINSIGSTKSZ just for the signal
>    frame structures, userspace that trusts AT_MINSIGSTKSZ will only
>    allocate enough for the frame, and the program will immediately
>    crash/stack-overflow once execution passes to userspace.
> 
> Since existing kernels in the wild can't be fixed, and since it looks
> like the problem is just that the kernel chose a poor definition of
> AT_MINSIGSTKSZ, I think userspace (glibc, musl, etc.) need to work
> around the problem, adding a per-arch correction term to
> AT_MINSIGSTKSZ that's basically equal to:
> 
>     legacy_MINSIGSTKSZ - AT_MINSIGSTKSZ as returned on legacy hw
> 
> such that adding the correction term would reproduce the expected
> value MINSIGSTKSZ.
> 
> The only question is whether the kernel will commit to keeping this
> behavior, or whether it would be "fixed" to include all the needed
> working space when they eventually decide they want bigger stacks for
> some new register file bloat. I think keeping the current behavior, so
> we can just add a fixed offset, is probably the best thing to do.

i think it makes sense that the kernel sets AT_MINSIGSTKSZ
according to what the kernel needs (signal frame size)
anything beyond that is up to userspace requirements (e.g.
the kernel cannot know if the libc wraps signal handlers)

it's up to the libc to adjust sysconf(_SC_MINSIGSTKSZ)
according to posix or backward compat requirements.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc
  2024-08-31  9:29 ` Szabolcs Nagy
@ 2024-08-31 15:02   ` Rich Felker
  2024-08-31 15:09     ` H.J. Lu
  0 siblings, 1 reply; 7+ messages in thread
From: Rich Felker @ 2024-08-31 15:02 UTC (permalink / raw)
  To: Linux Kernel Mailing List, linux-api, libc-alpha, musl

On Sat, Aug 31, 2024 at 11:29:02AM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2024-08-29 16:54:38 -0400]:
> > As I understand it, the AT_MINSIGSTKSZ auxv value is supposed to be a
> > suitable runtime value for MINSIGSTKSZ (sysconf(_SC_MINSIGSTKSZ)),
> > such that it's safe to pass as a size to sigaltstack. However, this is
> > not how the kernel actually implements it. At least on x86 and
> > powerpc, the kernel fills it via get_sigframe_size, which computes the
> > size of the sigcontext/siginfo/etc to be pushed and uses that
> > directly, without allowing any space for actual execution, and without
> > ensuring the value is at least as large as the legacy constant
> > MINSIGSTKSZ. This leads to two problems:
> > 
> > 1. If userspace uses the value without clamping it not-below
> >    MINSIGSTKSZ, sigaltstack will fail with ENOMEM.
> > 
> > 2. If the kernel needs more space than MINSIGSTKSZ just for the signal
> >    frame structures, userspace that trusts AT_MINSIGSTKSZ will only
> >    allocate enough for the frame, and the program will immediately
> >    crash/stack-overflow once execution passes to userspace.
> > 
> > Since existing kernels in the wild can't be fixed, and since it looks
> > like the problem is just that the kernel chose a poor definition of
> > AT_MINSIGSTKSZ, I think userspace (glibc, musl, etc.) need to work
> > around the problem, adding a per-arch correction term to
> > AT_MINSIGSTKSZ that's basically equal to:
> > 
> >     legacy_MINSIGSTKSZ - AT_MINSIGSTKSZ as returned on legacy hw
> > 
> > such that adding the correction term would reproduce the expected
> > value MINSIGSTKSZ.
> > 
> > The only question is whether the kernel will commit to keeping this
> > behavior, or whether it would be "fixed" to include all the needed
> > working space when they eventually decide they want bigger stacks for
> > some new register file bloat. I think keeping the current behavior, so
> > we can just add a fixed offset, is probably the best thing to do.
> 
> i think it makes sense that the kernel sets AT_MINSIGSTKSZ
> according to what the kernel needs (signal frame size)
> anything beyond that is up to userspace requirements (e.g.
> the kernel cannot know if the libc wraps signal handlers)
> 
> it's up to the libc to adjust sysconf(_SC_MINSIGSTKSZ)
> according to posix or backward compat requirements.

I think this is a reasonable viea and means the aux key was just very
poorly named. It should have been called something like
AT_SIGFRAMESIZE to indicate to the userspace-side consumer that it's
not a suitable value for MINSIGSTKSZ, only a contributing term for it.

Rich

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc
  2024-08-31 15:02   ` Rich Felker
@ 2024-08-31 15:09     ` H.J. Lu
  2024-08-31 15:41       ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: H.J. Lu @ 2024-08-31 15:09 UTC (permalink / raw)
  To: Rich Felker; +Cc: Linux Kernel Mailing List, linux-api, libc-alpha, musl

On Sat, Aug 31, 2024 at 8:03 AM Rich Felker <dalias@libc.org> wrote:
>
> On Sat, Aug 31, 2024 at 11:29:02AM +0200, Szabolcs Nagy wrote:
> > * Rich Felker <dalias@libc.org> [2024-08-29 16:54:38 -0400]:
> > > As I understand it, the AT_MINSIGSTKSZ auxv value is supposed to be a
> > > suitable runtime value for MINSIGSTKSZ (sysconf(_SC_MINSIGSTKSZ)),
> > > such that it's safe to pass as a size to sigaltstack. However, this is
> > > not how the kernel actually implements it. At least on x86 and
> > > powerpc, the kernel fills it via get_sigframe_size, which computes the
> > > size of the sigcontext/siginfo/etc to be pushed and uses that
> > > directly, without allowing any space for actual execution, and without
> > > ensuring the value is at least as large as the legacy constant
> > > MINSIGSTKSZ. This leads to two problems:
> > >
> > > 1. If userspace uses the value without clamping it not-below
> > >    MINSIGSTKSZ, sigaltstack will fail with ENOMEM.
> > >
> > > 2. If the kernel needs more space than MINSIGSTKSZ just for the signal
> > >    frame structures, userspace that trusts AT_MINSIGSTKSZ will only
> > >    allocate enough for the frame, and the program will immediately
> > >    crash/stack-overflow once execution passes to userspace.
> > >
> > > Since existing kernels in the wild can't be fixed, and since it looks
> > > like the problem is just that the kernel chose a poor definition of
> > > AT_MINSIGSTKSZ, I think userspace (glibc, musl, etc.) need to work
> > > around the problem, adding a per-arch correction term to
> > > AT_MINSIGSTKSZ that's basically equal to:
> > >
> > >     legacy_MINSIGSTKSZ - AT_MINSIGSTKSZ as returned on legacy hw
> > >
> > > such that adding the correction term would reproduce the expected
> > > value MINSIGSTKSZ.
> > >
> > > The only question is whether the kernel will commit to keeping this
> > > behavior, or whether it would be "fixed" to include all the needed
> > > working space when they eventually decide they want bigger stacks for
> > > some new register file bloat. I think keeping the current behavior, so
> > > we can just add a fixed offset, is probably the best thing to do.
> >
> > i think it makes sense that the kernel sets AT_MINSIGSTKSZ
> > according to what the kernel needs (signal frame size)
> > anything beyond that is up to userspace requirements (e.g.
> > the kernel cannot know if the libc wraps signal handlers)
> >
> > it's up to the libc to adjust sysconf(_SC_MINSIGSTKSZ)
> > according to posix or backward compat requirements.
>
> I think this is a reasonable viea and means the aux key was just very
> poorly named. It should have been called something like
> AT_SIGFRAMESIZE to indicate to the userspace-side consumer that it's
> not a suitable value for MINSIGSTKSZ, only a contributing term for it.
>
> Rich

glibc manual has

‘_SC_MINSIGSTKSZ’

     Inquire about the minimum number of bytes of free stack space
     required in order to guarantee successful, non-nested handling of a
     single signal whose handler is an empty function.

          ‘MINSIGSTKSZ’
               This is the amount of signal stack space the operating
               system needs just to implement signal delivery.  The size
               of a signal stack *must* be greater than this.

               For most cases, just using ‘SIGSTKSZ’ for ‘ss_size’ is
               sufficient.  But if you know how much stack space your
               program's signal handlers will need, you may want to use
               a different size.  In this case, you should allocate
               ‘MINSIGSTKSZ’ additional bytes for the signal stack and
               increase ‘ss_size’ accordingly.


-- 
H.J.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc
  2024-08-31 15:09     ` H.J. Lu
@ 2024-08-31 15:41       ` Rich Felker
  2024-09-02 12:07         ` Florian Weimer
  0 siblings, 1 reply; 7+ messages in thread
From: Rich Felker @ 2024-08-31 15:41 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Linux Kernel Mailing List, linux-api, libc-alpha, musl

On Sat, Aug 31, 2024 at 08:09:49AM -0700, H.J. Lu wrote:
> On Sat, Aug 31, 2024 at 8:03 AM Rich Felker <dalias@libc.org> wrote:
> >
> > On Sat, Aug 31, 2024 at 11:29:02AM +0200, Szabolcs Nagy wrote:
> > > * Rich Felker <dalias@libc.org> [2024-08-29 16:54:38 -0400]:
> > > > As I understand it, the AT_MINSIGSTKSZ auxv value is supposed to be a
> > > > suitable runtime value for MINSIGSTKSZ (sysconf(_SC_MINSIGSTKSZ)),
> > > > such that it's safe to pass as a size to sigaltstack. However, this is
> > > > not how the kernel actually implements it. At least on x86 and
> > > > powerpc, the kernel fills it via get_sigframe_size, which computes the
> > > > size of the sigcontext/siginfo/etc to be pushed and uses that
> > > > directly, without allowing any space for actual execution, and without
> > > > ensuring the value is at least as large as the legacy constant
> > > > MINSIGSTKSZ. This leads to two problems:
> > > >
> > > > 1. If userspace uses the value without clamping it not-below
> > > >    MINSIGSTKSZ, sigaltstack will fail with ENOMEM.
> > > >
> > > > 2. If the kernel needs more space than MINSIGSTKSZ just for the signal
> > > >    frame structures, userspace that trusts AT_MINSIGSTKSZ will only
> > > >    allocate enough for the frame, and the program will immediately
> > > >    crash/stack-overflow once execution passes to userspace.
> > > >
> > > > Since existing kernels in the wild can't be fixed, and since it looks
> > > > like the problem is just that the kernel chose a poor definition of
> > > > AT_MINSIGSTKSZ, I think userspace (glibc, musl, etc.) need to work
> > > > around the problem, adding a per-arch correction term to
> > > > AT_MINSIGSTKSZ that's basically equal to:
> > > >
> > > >     legacy_MINSIGSTKSZ - AT_MINSIGSTKSZ as returned on legacy hw
> > > >
> > > > such that adding the correction term would reproduce the expected
> > > > value MINSIGSTKSZ.
> > > >
> > > > The only question is whether the kernel will commit to keeping this
> > > > behavior, or whether it would be "fixed" to include all the needed
> > > > working space when they eventually decide they want bigger stacks for
> > > > some new register file bloat. I think keeping the current behavior, so
> > > > we can just add a fixed offset, is probably the best thing to do.
> > >
> > > i think it makes sense that the kernel sets AT_MINSIGSTKSZ
> > > according to what the kernel needs (signal frame size)
> > > anything beyond that is up to userspace requirements (e.g.
> > > the kernel cannot know if the libc wraps signal handlers)
> > >
> > > it's up to the libc to adjust sysconf(_SC_MINSIGSTKSZ)
> > > according to posix or backward compat requirements.
> >
> > I think this is a reasonable viea and means the aux key was just very
> > poorly named. It should have been called something like
> > AT_SIGFRAMESIZE to indicate to the userspace-side consumer that it's
> > not a suitable value for MINSIGSTKSZ, only a contributing term for it.
> >
> > Rich
> 
> glibc manual has
> 
> ‘_SC_MINSIGSTKSZ’
> 
>      Inquire about the minimum number of bytes of free stack space
>      required in order to guarantee successful, non-nested handling of a
>      single signal whose handler is an empty function.
> 
>           ‘MINSIGSTKSZ’
>                This is the amount of signal stack space the operating
>                system needs just to implement signal delivery.  The size
>                of a signal stack *must* be greater than this.
> 
>                For most cases, just using ‘SIGSTKSZ’ for ‘ss_size’ is
>                sufficient.  But if you know how much stack space your
>                program's signal handlers will need, you may want to use
>                a different size.  In this case, you should allocate
>                ‘MINSIGSTKSZ’ additional bytes for the signal stack and
>                increase ‘ss_size’ accordingly.

This is ambiguously worded (does "operating system" mean kernel?) and
does not agree with POSIX, which defines it as:

    Minimum stack size for a signal handler.

And otherwise just specifies that sigaltstack shall fail if given a
smaller size.

The POSIX definition is also underspecified but it's clear that it
should be possible to execute at least a do-nothing signal handler
(like one which immediately returns and whose sole purpose is to
induce EINTR when intalled without SA_RESTART), or even a minimal one
that does something like storing to a global variable, with such a
small stack. Allowing a size where even a do-nothing signal handler
results in a memory-clobbering overflow or access fault seems
non-conforming to me.

The historical sizes all allowed for 1k of execution space on top of
what the historical signal frames consumed, and more for some archs. I
don't think there's a POSIX contract to include that much, but I think
there is a backwards-compatibility motivation to do so. Otherwise
there will be application that were working when
sysconf(_SC_MINSIGSTKSZ) historical_val as the result of
max(value_from_aux,historial_val), but which break catastrophically as
soon as value_from_aux is bigger than historical_val-sigframe_size.

Rich

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc
  2024-08-31 15:41       ` Rich Felker
@ 2024-09-02 12:07         ` Florian Weimer
  2024-09-02 12:51           ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2024-09-02 12:07 UTC (permalink / raw)
  To: Rich Felker
  Cc: H.J. Lu, Linux Kernel Mailing List, linux-api, libc-alpha, musl

* Rich Felker:

> This is ambiguously worded (does "operating system" mean kernel?) and
> does not agree with POSIX, which defines it as:
>
>     Minimum stack size for a signal handler.
>
> And otherwise just specifies that sigaltstack shall fail if given a
> smaller size.
>
> The POSIX definition is also underspecified but it's clear that it
> should be possible to execute at least a do-nothing signal handler
> (like one which immediately returns and whose sole purpose is to
> induce EINTR when intalled without SA_RESTART), or even a minimal one
> that does something like storing to a global variable, with such a
> small stack. Allowing a size where even a do-nothing signal handler
> results in a memory-clobbering overflow or access fault seems
> non-conforming to me.

POSIX does not specify what happens on a stack overflow (or more
generally, if most resource limits are exceeded), so I think the
behavior is conforming on a technicality.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc
  2024-09-02 12:07         ` Florian Weimer
@ 2024-09-02 12:51           ` Rich Felker
  0 siblings, 0 replies; 7+ messages in thread
From: Rich Felker @ 2024-09-02 12:51 UTC (permalink / raw)
  To: Florian Weimer
  Cc: H.J. Lu, Linux Kernel Mailing List, linux-api, libc-alpha, musl

On Mon, Sep 02, 2024 at 02:07:36PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > This is ambiguously worded (does "operating system" mean kernel?) and
> > does not agree with POSIX, which defines it as:
> >
> >     Minimum stack size for a signal handler.
> >
> > And otherwise just specifies that sigaltstack shall fail if given a
> > smaller size.
> >
> > The POSIX definition is also underspecified but it's clear that it
> > should be possible to execute at least a do-nothing signal handler
> > (like one which immediately returns and whose sole purpose is to
> > induce EINTR when intalled without SA_RESTART), or even a minimal one
> > that does something like storing to a global variable, with such a
> > small stack. Allowing a size where even a do-nothing signal handler
> > results in a memory-clobbering overflow or access fault seems
> > non-conforming to me.
> 
> POSIX does not specify what happens on a stack overflow (or more
> generally, if most resource limits are exceeded), so I think the
> behavior is conforming on a technicality.

It doesn't specify what happens on overflow. It does specify what
happens on non-overflow: the program executes correctly. Failure to do
that is the problem here, not failure to trap on fault.

Rich

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-09-02 12:51 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-29 20:54 [musl] AT_MINSIGSTKSZ mismatched interpretation kernel vs libc Rich Felker
2024-08-31  9:29 ` Szabolcs Nagy
2024-08-31 15:02   ` Rich Felker
2024-08-31 15:09     ` H.J. Lu
2024-08-31 15:41       ` Rich Felker
2024-09-02 12:07         ` Florian Weimer
2024-09-02 12:51           ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).