[musl] Calling setxid() in a vfork()-child

mailing list of musl libc
 help / color / mirror / code / Atom feed

* [musl] Calling setxid() in a vfork()-child
@ 2020-10-12  9:27 Alexey Izbyshev
  2020-10-12 14:55 ` Rich Felker
  0 siblings, 1 reply; 10+ messages in thread
From: Alexey Izbyshev @ 2020-10-12  9:27 UTC (permalink / raw)
  To: musl

Hello,

I'm investigating possibility of using vfork() instead of fork() in a 
Linux-only application. Before calling execve(), the app might need to 
call some functions to setup the child, including setxid() (let's assume 
that security concerns of [1] are not applicable). I'm aware that POSIX 
doesn't allow that for vfork()-children, but I'm also aware that it 
might be OK on Linux if the set of functions is sufficiently 
constrained, and that vfork() is used to efficiently implement 
posix_spawn() in C libraries. However, setuid()/setgid() seem 
particularly tricky because of the need to call the actual syscall in 
all threads, so if a C library is unaware that setxid() is called in a 
vfork()-child, it might attempt to interact with threads of the parent 
process, potentially causing trouble. I've checked musl and found a 
recent commit[2] that fixes this exact issue. I've also checked 
glibc[3], but haven't found any handling of this case (and vfork() 
doesn't appear to do anything special in this regard either[4]).

Do I understand correctly that, from an application developer 
perspective, it's currently better to avoid setxid/setrlimit libc 
functions in a vfork()-child, and that using syscall() or avoiding 
vfork() entirely is preferred in this case?

Thanks,
Alexey

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=14749
[2] https://git.musl-libc.org/cgit/musl/commit/?id=a5aff1972
[3] 
https://sourceware.org/git?p=glibc.git;a=blob;f=nptl/allocatestack.c;h=4b45f8c884b;hb=HEAD#l1082
[4] 
https://sourceware.org/git?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/vfork.S;h=776d2fc61;hb=HEAD#l44

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Calling setxid() in a vfork()-child
  2020-10-12  9:27 [musl] Calling setxid() in a vfork()-child Alexey Izbyshev
@ 2020-10-12 14:55 ` Rich Felker
  2020-10-12 20:30   ` Alexey Izbyshev
  0 siblings, 1 reply; 10+ messages in thread
From: Rich Felker @ 2020-10-12 14:55 UTC (permalink / raw)
  To: musl

On Mon, Oct 12, 2020 at 12:27:44PM +0300, Alexey Izbyshev wrote:
> Hello,
> 
> I'm investigating possibility of using vfork() instead of fork() in
> a Linux-only application. Before calling execve(), the app might
> need to call some functions to setup the child, including setxid()
> (let's assume that security concerns of [1] are not applicable). I'm
> aware that POSIX doesn't allow that for vfork()-children, but I'm
> also aware that it might be OK on Linux if the set of functions is
> sufficiently constrained, and that vfork() is used to efficiently
> implement posix_spawn() in C libraries. However, setuid()/setgid()
> seem particularly tricky because of the need to call the actual
> syscall in all threads, so if a C library is unaware that setxid()
> is called in a vfork()-child, it might attempt to interact with
> threads of the parent process, potentially causing trouble. I've
> checked musl and found a recent commit[2] that fixes this exact
> issue. I've also checked glibc[3], but haven't found any handling of
> this case (and vfork() doesn't appear to do anything special in this
> regard either[4]).
> 
> Do I understand correctly that, from an application developer
> perspective, it's currently better to avoid setxid/setrlimit libc
> functions in a vfork()-child, and that using syscall() or avoiding
> vfork() entirely is preferred in this case?

Really, avoiding vfork entirely is preferable. The traditional
specification of vfork (before it was deprecated and removed from
spec; POSIX has not had vfork for a *long* time) did not allow
*anything* after vfork except execve or _exit, so arguably it's UB,
although there's also some argument to be made that if we're
implementing the nonstandard and traditional vfork function it should
have most of the important traditional properties.

Indeed as you found this is fixed in musl, largely because the failure
mode was so egregiously bad.

Note that in addition to the issue you're asking about, it's
fundamentally a bad idea to be using set*id() in a vforked child (or
anywhere in a process that calls vfork) because it leaves moments
where there are tasks in different privilege domains executing from
the same VM space. If the task that's dropped privileges does anything
that could lead to an attacker seizing control of the flow of
execution, rather than just getting access to the set*id()-reduced
privilege domain, they have full access to the original privilege
domain. This is why musl's multithreaded set*id() (__synccall) takes
care not to admit forward progress of any application code during the
transition, and goes to the trouble of having a thread list lock that
unlocks atomically with kernel task exit so that there is no race
window where a still-live thread can be missed.

In any case, IMO unless you're programming for NOMMU compatibility,
you should just forget vfork ever existed. There's no good reason to
use it. If a process can't fork because it's too big or the fork would
impact performance too much, posix_spawn can do far more than
vfork+execve can do portably. It can't do everything you can do with
vfork+execve if you're willing to break portability rules (i.e. invoke
UB), but with a helper executable to run in the child you can get that
all back.

Rich

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Calling setxid() in a vfork()-child
  2020-10-12 14:55 ` Rich Felker
@ 2020-10-12 20:30   ` Alexey Izbyshev
  2020-10-13  2:47     ` Markus Wichmann
  2020-10-13 16:52     ` Alexey Izbyshev
  0 siblings, 2 replies; 10+ messages in thread
From: Alexey Izbyshev @ 2020-10-12 20:30 UTC (permalink / raw)
  To: musl

Thank you for the response, Rich!

On 2020-10-12 17:55, Rich Felker wrote:
> Note that in addition to the issue you're asking about, it's
> fundamentally a bad idea to be using set*id() in a vforked child (or
> anywhere in a process that calls vfork) because it leaves moments
> where there are tasks in different privilege domains executing from
> the same VM space. If the task that's dropped privileges does anything
> that could lead to an attacker seizing control of the flow of
> execution, rather than just getting access to the set*id()-reduced
> privilege domain, they have full access to the original privilege
> domain. This is why musl's multithreaded set*id() (__synccall) takes
> care not to admit forward progress of any application code during the
> transition, and goes to the trouble of having a thread list lock that
> unlocks atomically with kernel task exit so that there is no race
> window where a still-live thread can be missed.
> 
Yes, I initially learned about this security issue from your post[1]. I 
proposed to ignore it for the moment in my email because it was hard for 
me to imagine any security implications in a constrained case: if the 
only use of setuid() in a privileged app is to drop privileges in a 
vfork()-child before calling execve(). However, thinking about it more, 
I see that dropping privileges could open the child to new ways of 
interaction from outside of the app in a window before execve(), so, if, 
say, another unprivileged process can ptrace it at the right moment, bad 
things could happen. But now I don't see why wouldn't the same attack be 
applicable to __synccall(). While __synccall() seizes execution of 
application code, I don't see how it could prevent ptrace-like kind of 
an outside attack on a thread at the moment when another thread is in a 
different privilege domain.

> In any case, IMO unless you're programming for NOMMU compatibility,
> you should just forget vfork ever existed. There's no good reason to
> use it. If a process can't fork because it's too big or the fork would
> impact performance too much, posix_spawn can do far more than
> vfork+execve can do portably. It can't do everything you can do with
> vfork+execve if you're willing to break portability rules (i.e. invoke
> UB), but with a helper executable to run in the child you can get that
> all back.
> 
Unfortunately, while posix_spawn() + a helper executable would be fine 
in many cases (more so for applications than libraries), addition of a 
helper to an existing library exposing a process creation API that can't 
be implemented in terms of posix_spawn() may be not straightforward. If 
the helper is just an executable file, we'll need to locate it, which 
may be simply impossible when our library is called if, say, switching 
mount namespaces is involved (which may be out of control of our 
library). Solutions may exist (locate and open() at startup or 
memfd_create(), and then execveat()?), but vfork() (+ preventing 
execution of signal handlers in the child) seems so much simpler, and 
doesn't add the overhead of an extra execve() too.

Alexey

[1] https://ewontfix.com/7

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Calling setxid() in a vfork()-child
  2020-10-12 20:30   ` Alexey Izbyshev
@ 2020-10-13  2:47     ` Markus Wichmann
  2020-10-13  9:52       ` Laurent Bercot
  2020-10-13 15:24       ` Alexey Izbyshev
  2020-10-13 16:52     ` Alexey Izbyshev
  1 sibling, 2 replies; 10+ messages in thread
From: Markus Wichmann @ 2020-10-13  2:47 UTC (permalink / raw)
  To: musl

On Mon, Oct 12, 2020 at 11:30:58PM +0300, Alexey Izbyshev wrote:
> Unfortunately, while posix_spawn() + a helper executable would be fine in
> many cases (more so for applications than libraries), addition of a helper
> to an existing library exposing a process creation API that can't be
> implemented in terms of posix_spawn() may be not straightforward. If the
> helper is just an executable file, we'll need to locate it, which may be
> simply impossible when our library is called if, say, switching mount
> namespaces is involved (which may be out of control of our library).
> Solutions may exist (locate and open() at startup or memfd_create(), and
> then execveat()?), but vfork() (+ preventing execution of signal handlers in
> the child) seems so much simpler, and doesn't add the overhead of an extra
> execve() too.
>
> Alexey
>

If dropping privileges is all you want, then posix_spawn() has a flag
for that. And if you are foregoing portability anyway by doing anything
between vfork() and execve(), might as well use clone() and do it
properly.

Yes, locating a helper binary is a bit of a problem. I've been recently
struggling with that myself. While you can make the binary part of the
main application (or library, in your case), using posix_spawn() means
the binary must have a path name, so using tempfile() is out of the
question (unless you use the proc file name, which is, again,
non-portable).

Ciao,
Markus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Calling setxid() in a vfork()-child
  2020-10-13  2:47     ` Markus Wichmann
@ 2020-10-13  9:52       ` Laurent Bercot
  2020-10-13 15:48         ` Alexey Izbyshev
  2020-10-13 15:24       ` Alexey Izbyshev
  1 sibling, 1 reply; 10+ messages in thread
From: Laurent Bercot @ 2020-10-13  9:52 UTC (permalink / raw)
  To: musl

>If dropping privileges is all you want, then posix_spawn() has a flag
>for that.

  But it does not. All POSIX_SPAWN_RESETIDS does is make sure that a
s-bit program does not spawn a child with the same effective uid as
its caller; there is nothing in posix_spawn() about dropping root
privileges.

  This is one of process state change operations that are lacking in
posix_spawn(), along with being able to spawn the child as a session
leader (despite being able to spawn it as a process group leader).

  That's what makes exhaustive attribute listing a bad function design:
there is always an attribute that designers forget. I understand how
useful posix_spawn() is for portable correctness, I use it over fork()
whenever I can, but it is definitely not complete without helper
programs and it's an ugly wart that nobody benefits from ignoring.

--
  Laurent

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Calling setxid() in a vfork()-child
  2020-10-13  2:47     ` Markus Wichmann
  2020-10-13  9:52       ` Laurent Bercot
@ 2020-10-13 15:24       ` Alexey Izbyshev
  2020-10-13 16:07         ` Rich Felker
  1 sibling, 1 reply; 10+ messages in thread
From: Alexey Izbyshev @ 2020-10-13 15:24 UTC (permalink / raw)
  To: musl

On 2020-10-13 05:47, Markus Wichmann wrote:
> If dropping privileges is all you want, then posix_spawn() has a flag
> for that. And if you are foregoing portability anyway by doing anything
> between vfork() and execve(), might as well use clone() and do it
> properly.
> 
What do you mean by "do it properly"? Unless you mean doing syscalls, it 
seems that I'd have the same issues with clone() (with CLONE_VFORK, 
since I'm trying to avoid copying of page tables) as I do with vfork(). 
Namely, I'd still have to care about signals, and I wouldn't be able to 
safely call setxid() (and, frankly, anything else from a C library if we 
want a solution that's, while being Linux-specific, still portable 
across C libraries).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Calling setxid() in a vfork()-child
  2020-10-13  9:52       ` Laurent Bercot
@ 2020-10-13 15:48         ` Alexey Izbyshev
  0 siblings, 0 replies; 10+ messages in thread
From: Alexey Izbyshev @ 2020-10-13 15:48 UTC (permalink / raw)
  To: musl

On 2020-10-13 12:52, Laurent Bercot wrote:
>  This is one of process state change operations that are lacking in
> posix_spawn(), along with being able to spawn the child as a session
> leader (despite being able to spawn it as a process group leader).
> 
Thankfully, POSIX_SPAWN_SETSID was accepted into POSIX[1] and is 
available in recent enough musl and glibc (and I also heard some rumors 
about macOS).

But yes, even if we accept that simply setting some attributes for a new 
process is enough (as opposed to running more complex logic requiring an 
already created process, which is possible with fork()), having a fixed 
of them for an evolving entity is problematic, more so since on Linux 
posix_spawn() implementations are developed separately from the kernel.

[1] https://www.austingroupbugs.net/view.php?id=1044

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Calling setxid() in a vfork()-child
  2020-10-13 15:24       ` Alexey Izbyshev
@ 2020-10-13 16:07         ` Rich Felker
  0 siblings, 0 replies; 10+ messages in thread
From: Rich Felker @ 2020-10-13 16:07 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Tue, Oct 13, 2020 at 06:24:45PM +0300, Alexey Izbyshev wrote:
> On 2020-10-13 05:47, Markus Wichmann wrote:
> >If dropping privileges is all you want, then posix_spawn() has a flag
> >for that. And if you are foregoing portability anyway by doing anything
> >between vfork() and execve(), might as well use clone() and do it
> >properly.
> >
> What do you mean by "do it properly"? Unless you mean doing
> syscalls, it seems that I'd have the same issues with clone() (with
> CLONE_VFORK, since I'm trying to avoid copying of page tables) as I
> do with vfork(). Namely, I'd still have to care about signals, and I
> wouldn't be able to safely call setxid() (and, frankly, anything
> else from a C library if we want a solution that's, while being
> Linux-specific, still portable across C libraries).

Indeed, it's not safe to call libc functions from a CLONE_VM context.
We might want to make some sort of contract about a subset that are
safe to call, but right now there really isn't such a set. AS-safe
functions might be close, and indeed after the __synccall change
set*id should in theory work from a clone() context too.

Really, unless you're trying to support NOMMU, "do it properly" means
just forgetting about CLONE_VM if posix_spawn doesn't meet your needs
and using plain fork+...+exec.

Rich

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Calling setxid() in a vfork()-child
  2020-10-12 20:30   ` Alexey Izbyshev
  2020-10-13  2:47     ` Markus Wichmann
@ 2020-10-13 16:52     ` Alexey Izbyshev
  2020-10-13 17:05       ` Rich Felker
  1 sibling, 1 reply; 10+ messages in thread
From: Alexey Izbyshev @ 2020-10-13 16:52 UTC (permalink / raw)
  To: musl

On 2020-10-12 23:30, Alexey Izbyshev wrote:
> ...However, thinking about it
> more, I see that dropping privileges could open the child to new ways
> of interaction from outside of the app in a window before execve(),
> so, if, say, another unprivileged process can ptrace it at the right
> moment, bad things could happen.
> 
Alexander Monakov pointed out to me that this particular naive attack is 
not possible (unless "/proc/sys/fs/suid_dumpable" is changed or the 
application resets "dumpable" bit via prctl() after setxid()): 
https://elixir.bootlin.com/linux/v5.9/source/kernel/cred.c#L466

Alexey

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Calling setxid() in a vfork()-child
  2020-10-13 16:52     ` Alexey Izbyshev
@ 2020-10-13 17:05       ` Rich Felker
  0 siblings, 0 replies; 10+ messages in thread
From: Rich Felker @ 2020-10-13 17:05 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Tue, Oct 13, 2020 at 07:52:28PM +0300, Alexey Izbyshev wrote:
> On 2020-10-12 23:30, Alexey Izbyshev wrote:
> >...However, thinking about it
> >more, I see that dropping privileges could open the child to new ways
> >of interaction from outside of the app in a window before execve(),
> >so, if, say, another unprivileged process can ptrace it at the right
> >moment, bad things could happen.
> >
> Alexander Monakov pointed out to me that this particular naive
> attack is not possible (unless "/proc/sys/fs/suid_dumpable" is
> changed or the application resets "dumpable" bit via prctl() after
> setxid()):
> https://elixir.bootlin.com/linux/v5.9/source/kernel/cred.c#L466

Yes, the underlying idea here is that suid programs will often open a
privileged resource to perform limited operations on it, and while the
outcome of giving the user unrestricted access to that resource might
not be catastrophic, it's likely at least disruptive (think ping and
raw sockets). Also it may still have privileged data (like private
keys or remnants thereof) in its memory. So the user is not allowed to
debug/trace/seize control of such a process unless it explicitly opts
in.

Rich

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-10-13 17:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-12  9:27 [musl] Calling setxid() in a vfork()-child Alexey Izbyshev
2020-10-12 14:55 ` Rich Felker
2020-10-12 20:30   ` Alexey Izbyshev
2020-10-13  2:47     ` Markus Wichmann
2020-10-13  9:52       ` Laurent Bercot
2020-10-13 15:48         ` Alexey Izbyshev
2020-10-13 15:24       ` Alexey Izbyshev
2020-10-13 16:07         ` Rich Felker
2020-10-13 16:52     ` Alexey Izbyshev
2020-10-13 17:05       ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).