* [musl] Calling setxid() in a vfork()-child @ 2020-10-12 9:27 Alexey Izbyshev 2020-10-12 14:55 ` Rich Felker 0 siblings, 1 reply; 10+ messages in thread From: Alexey Izbyshev @ 2020-10-12 9:27 UTC (permalink / raw) To: musl Hello, I'm investigating possibility of using vfork() instead of fork() in a Linux-only application. Before calling execve(), the app might need to call some functions to setup the child, including setxid() (let's assume that security concerns of [1] are not applicable). I'm aware that POSIX doesn't allow that for vfork()-children, but I'm also aware that it might be OK on Linux if the set of functions is sufficiently constrained, and that vfork() is used to efficiently implement posix_spawn() in C libraries. However, setuid()/setgid() seem particularly tricky because of the need to call the actual syscall in all threads, so if a C library is unaware that setxid() is called in a vfork()-child, it might attempt to interact with threads of the parent process, potentially causing trouble. I've checked musl and found a recent commit[2] that fixes this exact issue. I've also checked glibc[3], but haven't found any handling of this case (and vfork() doesn't appear to do anything special in this regard either[4]). Do I understand correctly that, from an application developer perspective, it's currently better to avoid setxid/setrlimit libc functions in a vfork()-child, and that using syscall() or avoiding vfork() entirely is preferred in this case? Thanks, Alexey [1] https://sourceware.org/bugzilla/show_bug.cgi?id=14749 [2] https://git.musl-libc.org/cgit/musl/commit/?id=a5aff1972 [3] https://sourceware.org/git?p=glibc.git;a=blob;f=nptl/allocatestack.c;h=4b45f8c884b;hb=HEAD#l1082 [4] https://sourceware.org/git?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/vfork.S;h=776d2fc61;hb=HEAD#l44 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] Calling setxid() in a vfork()-child 2020-10-12 9:27 [musl] Calling setxid() in a vfork()-child Alexey Izbyshev @ 2020-10-12 14:55 ` Rich Felker 2020-10-12 20:30 ` Alexey Izbyshev 0 siblings, 1 reply; 10+ messages in thread From: Rich Felker @ 2020-10-12 14:55 UTC (permalink / raw) To: musl On Mon, Oct 12, 2020 at 12:27:44PM +0300, Alexey Izbyshev wrote: > Hello, > > I'm investigating possibility of using vfork() instead of fork() in > a Linux-only application. Before calling execve(), the app might > need to call some functions to setup the child, including setxid() > (let's assume that security concerns of [1] are not applicable). I'm > aware that POSIX doesn't allow that for vfork()-children, but I'm > also aware that it might be OK on Linux if the set of functions is > sufficiently constrained, and that vfork() is used to efficiently > implement posix_spawn() in C libraries. However, setuid()/setgid() > seem particularly tricky because of the need to call the actual > syscall in all threads, so if a C library is unaware that setxid() > is called in a vfork()-child, it might attempt to interact with > threads of the parent process, potentially causing trouble. I've > checked musl and found a recent commit[2] that fixes this exact > issue. I've also checked glibc[3], but haven't found any handling of > this case (and vfork() doesn't appear to do anything special in this > regard either[4]). > > Do I understand correctly that, from an application developer > perspective, it's currently better to avoid setxid/setrlimit libc > functions in a vfork()-child, and that using syscall() or avoiding > vfork() entirely is preferred in this case? Really, avoiding vfork entirely is preferable. The traditional specification of vfork (before it was deprecated and removed from spec; POSIX has not had vfork for a *long* time) did not allow *anything* after vfork except execve or _exit, so arguably it's UB, although there's also some argument to be made that if we're implementing the nonstandard and traditional vfork function it should have most of the important traditional properties. Indeed as you found this is fixed in musl, largely because the failure mode was so egregiously bad. Note that in addition to the issue you're asking about, it's fundamentally a bad idea to be using set*id() in a vforked child (or anywhere in a process that calls vfork) because it leaves moments where there are tasks in different privilege domains executing from the same VM space. If the task that's dropped privileges does anything that could lead to an attacker seizing control of the flow of execution, rather than just getting access to the set*id()-reduced privilege domain, they have full access to the original privilege domain. This is why musl's multithreaded set*id() (__synccall) takes care not to admit forward progress of any application code during the transition, and goes to the trouble of having a thread list lock that unlocks atomically with kernel task exit so that there is no race window where a still-live thread can be missed. In any case, IMO unless you're programming for NOMMU compatibility, you should just forget vfork ever existed. There's no good reason to use it. If a process can't fork because it's too big or the fork would impact performance too much, posix_spawn can do far more than vfork+execve can do portably. It can't do everything you can do with vfork+execve if you're willing to break portability rules (i.e. invoke UB), but with a helper executable to run in the child you can get that all back. Rich ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] Calling setxid() in a vfork()-child 2020-10-12 14:55 ` Rich Felker @ 2020-10-12 20:30 ` Alexey Izbyshev 2020-10-13 2:47 ` Markus Wichmann 2020-10-13 16:52 ` Alexey Izbyshev 0 siblings, 2 replies; 10+ messages in thread From: Alexey Izbyshev @ 2020-10-12 20:30 UTC (permalink / raw) To: musl Thank you for the response, Rich! On 2020-10-12 17:55, Rich Felker wrote: > Note that in addition to the issue you're asking about, it's > fundamentally a bad idea to be using set*id() in a vforked child (or > anywhere in a process that calls vfork) because it leaves moments > where there are tasks in different privilege domains executing from > the same VM space. If the task that's dropped privileges does anything > that could lead to an attacker seizing control of the flow of > execution, rather than just getting access to the set*id()-reduced > privilege domain, they have full access to the original privilege > domain. This is why musl's multithreaded set*id() (__synccall) takes > care not to admit forward progress of any application code during the > transition, and goes to the trouble of having a thread list lock that > unlocks atomically with kernel task exit so that there is no race > window where a still-live thread can be missed. > Yes, I initially learned about this security issue from your post[1]. I proposed to ignore it for the moment in my email because it was hard for me to imagine any security implications in a constrained case: if the only use of setuid() in a privileged app is to drop privileges in a vfork()-child before calling execve(). However, thinking about it more, I see that dropping privileges could open the child to new ways of interaction from outside of the app in a window before execve(), so, if, say, another unprivileged process can ptrace it at the right moment, bad things could happen. But now I don't see why wouldn't the same attack be applicable to __synccall(). While __synccall() seizes execution of application code, I don't see how it could prevent ptrace-like kind of an outside attack on a thread at the moment when another thread is in a different privilege domain. > In any case, IMO unless you're programming for NOMMU compatibility, > you should just forget vfork ever existed. There's no good reason to > use it. If a process can't fork because it's too big or the fork would > impact performance too much, posix_spawn can do far more than > vfork+execve can do portably. It can't do everything you can do with > vfork+execve if you're willing to break portability rules (i.e. invoke > UB), but with a helper executable to run in the child you can get that > all back. > Unfortunately, while posix_spawn() + a helper executable would be fine in many cases (more so for applications than libraries), addition of a helper to an existing library exposing a process creation API that can't be implemented in terms of posix_spawn() may be not straightforward. If the helper is just an executable file, we'll need to locate it, which may be simply impossible when our library is called if, say, switching mount namespaces is involved (which may be out of control of our library). Solutions may exist (locate and open() at startup or memfd_create(), and then execveat()?), but vfork() (+ preventing execution of signal handlers in the child) seems so much simpler, and doesn't add the overhead of an extra execve() too. Alexey [1] https://ewontfix.com/7 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] Calling setxid() in a vfork()-child 2020-10-12 20:30 ` Alexey Izbyshev @ 2020-10-13 2:47 ` Markus Wichmann 2020-10-13 9:52 ` Laurent Bercot 2020-10-13 15:24 ` Alexey Izbyshev 2020-10-13 16:52 ` Alexey Izbyshev 1 sibling, 2 replies; 10+ messages in thread From: Markus Wichmann @ 2020-10-13 2:47 UTC (permalink / raw) To: musl On Mon, Oct 12, 2020 at 11:30:58PM +0300, Alexey Izbyshev wrote: > Unfortunately, while posix_spawn() + a helper executable would be fine in > many cases (more so for applications than libraries), addition of a helper > to an existing library exposing a process creation API that can't be > implemented in terms of posix_spawn() may be not straightforward. If the > helper is just an executable file, we'll need to locate it, which may be > simply impossible when our library is called if, say, switching mount > namespaces is involved (which may be out of control of our library). > Solutions may exist (locate and open() at startup or memfd_create(), and > then execveat()?), but vfork() (+ preventing execution of signal handlers in > the child) seems so much simpler, and doesn't add the overhead of an extra > execve() too. > > Alexey > If dropping privileges is all you want, then posix_spawn() has a flag for that. And if you are foregoing portability anyway by doing anything between vfork() and execve(), might as well use clone() and do it properly. Yes, locating a helper binary is a bit of a problem. I've been recently struggling with that myself. While you can make the binary part of the main application (or library, in your case), using posix_spawn() means the binary must have a path name, so using tempfile() is out of the question (unless you use the proc file name, which is, again, non-portable). Ciao, Markus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] Calling setxid() in a vfork()-child 2020-10-13 2:47 ` Markus Wichmann @ 2020-10-13 9:52 ` Laurent Bercot 2020-10-13 15:48 ` Alexey Izbyshev 2020-10-13 15:24 ` Alexey Izbyshev 1 sibling, 1 reply; 10+ messages in thread From: Laurent Bercot @ 2020-10-13 9:52 UTC (permalink / raw) To: musl >If dropping privileges is all you want, then posix_spawn() has a flag >for that. But it does not. All POSIX_SPAWN_RESETIDS does is make sure that a s-bit program does not spawn a child with the same effective uid as its caller; there is nothing in posix_spawn() about dropping root privileges. This is one of process state change operations that are lacking in posix_spawn(), along with being able to spawn the child as a session leader (despite being able to spawn it as a process group leader). That's what makes exhaustive attribute listing a bad function design: there is always an attribute that designers forget. I understand how useful posix_spawn() is for portable correctness, I use it over fork() whenever I can, but it is definitely not complete without helper programs and it's an ugly wart that nobody benefits from ignoring. -- Laurent ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] Calling setxid() in a vfork()-child 2020-10-13 9:52 ` Laurent Bercot @ 2020-10-13 15:48 ` Alexey Izbyshev 0 siblings, 0 replies; 10+ messages in thread From: Alexey Izbyshev @ 2020-10-13 15:48 UTC (permalink / raw) To: musl On 2020-10-13 12:52, Laurent Bercot wrote: > This is one of process state change operations that are lacking in > posix_spawn(), along with being able to spawn the child as a session > leader (despite being able to spawn it as a process group leader). > Thankfully, POSIX_SPAWN_SETSID was accepted into POSIX[1] and is available in recent enough musl and glibc (and I also heard some rumors about macOS). But yes, even if we accept that simply setting some attributes for a new process is enough (as opposed to running more complex logic requiring an already created process, which is possible with fork()), having a fixed of them for an evolving entity is problematic, more so since on Linux posix_spawn() implementations are developed separately from the kernel. [1] https://www.austingroupbugs.net/view.php?id=1044 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] Calling setxid() in a vfork()-child 2020-10-13 2:47 ` Markus Wichmann 2020-10-13 9:52 ` Laurent Bercot @ 2020-10-13 15:24 ` Alexey Izbyshev 2020-10-13 16:07 ` Rich Felker 1 sibling, 1 reply; 10+ messages in thread From: Alexey Izbyshev @ 2020-10-13 15:24 UTC (permalink / raw) To: musl On 2020-10-13 05:47, Markus Wichmann wrote: > If dropping privileges is all you want, then posix_spawn() has a flag > for that. And if you are foregoing portability anyway by doing anything > between vfork() and execve(), might as well use clone() and do it > properly. > What do you mean by "do it properly"? Unless you mean doing syscalls, it seems that I'd have the same issues with clone() (with CLONE_VFORK, since I'm trying to avoid copying of page tables) as I do with vfork(). Namely, I'd still have to care about signals, and I wouldn't be able to safely call setxid() (and, frankly, anything else from a C library if we want a solution that's, while being Linux-specific, still portable across C libraries). ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] Calling setxid() in a vfork()-child 2020-10-13 15:24 ` Alexey Izbyshev @ 2020-10-13 16:07 ` Rich Felker 0 siblings, 0 replies; 10+ messages in thread From: Rich Felker @ 2020-10-13 16:07 UTC (permalink / raw) To: Alexey Izbyshev; +Cc: musl On Tue, Oct 13, 2020 at 06:24:45PM +0300, Alexey Izbyshev wrote: > On 2020-10-13 05:47, Markus Wichmann wrote: > >If dropping privileges is all you want, then posix_spawn() has a flag > >for that. And if you are foregoing portability anyway by doing anything > >between vfork() and execve(), might as well use clone() and do it > >properly. > > > What do you mean by "do it properly"? Unless you mean doing > syscalls, it seems that I'd have the same issues with clone() (with > CLONE_VFORK, since I'm trying to avoid copying of page tables) as I > do with vfork(). Namely, I'd still have to care about signals, and I > wouldn't be able to safely call setxid() (and, frankly, anything > else from a C library if we want a solution that's, while being > Linux-specific, still portable across C libraries). Indeed, it's not safe to call libc functions from a CLONE_VM context. We might want to make some sort of contract about a subset that are safe to call, but right now there really isn't such a set. AS-safe functions might be close, and indeed after the __synccall change set*id should in theory work from a clone() context too. Really, unless you're trying to support NOMMU, "do it properly" means just forgetting about CLONE_VM if posix_spawn doesn't meet your needs and using plain fork+...+exec. Rich ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] Calling setxid() in a vfork()-child 2020-10-12 20:30 ` Alexey Izbyshev 2020-10-13 2:47 ` Markus Wichmann @ 2020-10-13 16:52 ` Alexey Izbyshev 2020-10-13 17:05 ` Rich Felker 1 sibling, 1 reply; 10+ messages in thread From: Alexey Izbyshev @ 2020-10-13 16:52 UTC (permalink / raw) To: musl On 2020-10-12 23:30, Alexey Izbyshev wrote: > ...However, thinking about it > more, I see that dropping privileges could open the child to new ways > of interaction from outside of the app in a window before execve(), > so, if, say, another unprivileged process can ptrace it at the right > moment, bad things could happen. > Alexander Monakov pointed out to me that this particular naive attack is not possible (unless "/proc/sys/fs/suid_dumpable" is changed or the application resets "dumpable" bit via prctl() after setxid()): https://elixir.bootlin.com/linux/v5.9/source/kernel/cred.c#L466 Alexey ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] Calling setxid() in a vfork()-child 2020-10-13 16:52 ` Alexey Izbyshev @ 2020-10-13 17:05 ` Rich Felker 0 siblings, 0 replies; 10+ messages in thread From: Rich Felker @ 2020-10-13 17:05 UTC (permalink / raw) To: Alexey Izbyshev; +Cc: musl On Tue, Oct 13, 2020 at 07:52:28PM +0300, Alexey Izbyshev wrote: > On 2020-10-12 23:30, Alexey Izbyshev wrote: > >...However, thinking about it > >more, I see that dropping privileges could open the child to new ways > >of interaction from outside of the app in a window before execve(), > >so, if, say, another unprivileged process can ptrace it at the right > >moment, bad things could happen. > > > Alexander Monakov pointed out to me that this particular naive > attack is not possible (unless "/proc/sys/fs/suid_dumpable" is > changed or the application resets "dumpable" bit via prctl() after > setxid()): > https://elixir.bootlin.com/linux/v5.9/source/kernel/cred.c#L466 Yes, the underlying idea here is that suid programs will often open a privileged resource to perform limited operations on it, and while the outcome of giving the user unrestricted access to that resource might not be catastrophic, it's likely at least disruptive (think ping and raw sockets). Also it may still have privileged data (like private keys or remnants thereof) in its memory. So the user is not allowed to debug/trace/seize control of such a process unless it explicitly opts in. Rich ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-10-13 17:05 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-10-12 9:27 [musl] Calling setxid() in a vfork()-child Alexey Izbyshev 2020-10-12 14:55 ` Rich Felker 2020-10-12 20:30 ` Alexey Izbyshev 2020-10-13 2:47 ` Markus Wichmann 2020-10-13 9:52 ` Laurent Bercot 2020-10-13 15:48 ` Alexey Izbyshev 2020-10-13 15:24 ` Alexey Izbyshev 2020-10-13 16:07 ` Rich Felker 2020-10-13 16:52 ` Alexey Izbyshev 2020-10-13 17:05 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).