> Can you clarify how the whole system is breaking as a result of the
> macro value?

I just tried again and the issue seems to come from initgroups (https://git.busybox.net/busybox/tree/libbb/change_identity.c?h=1_36_stable), so your commit might have fixed this particular issue.
Would it be possible to have a release of musl? Alternatively, is there a simple way to install musl master on Alpine to test whether it fixes the problem?

The way to reproduce this issue is simply:
# su - user
$ exit
# addgroup user kvm
# echo $?
0
# groups user | wc -w
33
# su - user
su: can't set groups

If you do that to root, you can't log in anymore and you've locked yourself out of your system. From what I remember, after a reboot you can't log in as other users either.

> getting to the root cause and eliminating the technical debt behind it rather than making new technical debt to work around it.

Sure, I agree that's the ideal thing to do. However, that only works if someone (or a group of people) is actively working on fixing it everywhere. I personally don't have time to do that outside of OCaml.

> but I suspect there are actually only like 3 or 4 programs that are having any problem

I disagree. Just glancing through https://codesearch.debian.net/, the issue can be seen in Pulseaudio, Sendmail, libcap2, OCaml, lynx, openafs, thunar, nemo, nautilus, opendoas, …

From: Rich Felker <dalias@libc.org>
Sent: 13 September 2024 17:12
To: Kate Deplaix <kit-ty-kate@outlook.com>
Cc: musl@lists.openwall.com <musl@lists.openwall.com>
Subject: Re: [musl] [PATCH] Increase NGROUPS_MAX from 32 to 1024
 
On Fri, Sep 13, 2024 at 04:00:52PM +0000, Kate Deplaix wrote:
> The problem is that NGROUPS_MAX is used in downstream projects, not
> sysconf(_SC_NGROUPS_MAX). Notoriously, one of the main users of musl
> (Alpine Linux) does not modify this value, which makes the whole
> system break completely if a user happens to be added to more than
> 32 groups.

Can you clarify how the whole system is breaking as a result of the
macro value?

> Changing every open-source project that uses NGROUPS_MAX to use
> sysconf(_SC_NGROUPS_MAX) instead doesn't seem like a reasonable
> answer to me, even if it might be the correct one in theory.
>
> I also really don't understand why you want to support 20+ year old
> kernel versions (pre 2.6.4) which aren't even POSIX conformant
> according to your own page:
> https://wiki.musl-libc.org/supported-platforms. I also don't think
> it would break anything on those platforms anyway if a higher
> value was used. Most uses I've seen use this value to allocate a
> static array, so aside from a couple more bytes of memory used there
> isn't much to lose.

What was established so far is that applications are using NGROUPS_MAX
as an array dimension for an automatic-storage array, which would
immediately blow up with a stack overflow if we increased the value to
the kernel value. I'm aware that you proposed just using an arbitrary
lower value like 1024, which might work in practice, but I suspect
there are actually only like 3 or 4 programs that are having any
problem, and they could just be fixed to fall back to allocated storage
if a small constant-size buffer doesn't work. This is normally the way
we work through this kind of problem -- getting to the root cause and
eliminating the technical debt behind it rather than making new
technical debt to work around it.

But in order to evaluate whether this is a good option, and what the
impact of different options would be, we actually need to know what
these programs are.

Rich