* vfork replacement proposal
From: Rich Felker @ 2012-12-31 20:34 UTC
To: musl

I've been looking for a viable replacement of the vfork usage in musl
for a while, since it has two serious problems:

1. strace is buggy and causes the parent and child to run
simultaneously on the same stack under vfork when the process is being
traced. Binaries which can crash or go crazy under strace are highly
undesirable, even if the fault is with strace.

2. While current compilers don't do this, the compiler is conceptually
free to generate code that clobbers parts of the stack that still need
to be used by the parent when it determines they are no longer needed
in the child.

The affected functions are posix_spawn[p], system, and popen.

My new proposed design for these functions is:

1. Open a close-on-exec pipe.

2. Use clone with CLONE_VM|SIGCHLD as the flags to make a normal child
process that shares VM but nothing else with the parent, and that runs
a new function (rather than returning) on a small stack embedded in
the caller's stack (e.g. a 1k automatic char array).

3. In the parent close the write end of the pipe and perform blocking
read on the read end.

4. In the child, close the read end of the pipe and then shuffle file
descriptors as needed (for setting up stdin/out for popen, or file
actions for posix_spawn[p]), but with the added stipulations A-C:

   A. Before closing or dup2'ing onto a file descriptor in file actions,
   check to see if it's occupied by the pipe fd, and if so, use fcntl
   F_DUPFD_CLOEXEC to move it to a new number first.

   B. Before calling open in file actions, always use fcntl with
   F_DUPFD_CLOEXEC and close the original pipe fd, to ensure that the
   pipe is never occupying the otherwise-lowest-available fd number.

   C. Any failure to renumber the pipe fd as required in A-B is fatal.

5. On any failure in the child, write the error code for the failure
to the pipe and _exit. This includes failure to renumber the pipe, or
failure in the final call to an exec-family function. Otherwise the
pipe closes on successful exec in the child.

6. If the parent reads 0 bytes (EOF) from the pipe, spawning the
external process was successful. Otherwise, the error code is
available indicating the cause of failure, and the cause can be
reported to the calling program via a failure return value, instead of
via immediate exit of the child process with result 127.

This final point 6 makes the proposed new design superior to all
existing implementations I know of: you get good data on the cause of
failure in the parent rather than a false success followed by
immediate exit with code 127 and no indication of the cause.

The key breakthrough that made this design proposal possible was
realizing that I can keep shuffling the pipe fd around in the child in
a simple way that avoids interference with the POSIX spawn file
actions. This is in contrast with the problem of determining in
advance a "safe" fd number to locate the pipe on, which is a
nontrivial problem when you can't know the existing set of open fds.

Before I go trying to implement this, anyone see problems with it?
Other comments?

Rich
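Steps 1-6 of the proposal can be sketched roughly as follows. This is a minimal illustration assuming Linux and a libc that exposes clone() and pipe2(); it is not the code that was committed to musl. The names spawn_sketch, child_fn and struct spawn_args are hypothetical, file actions and the fd renumbering from step 4 are left out, and the child stack is 64 KiB instead of the proposed 1k, as a conservative assumption about how much stack the libc exec path needs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical argument block handed to the child through shared VM. */
struct spawn_args {
    int pipefd[2];
    char *const *argv;
};

/* Runs in the child, on its own small stack (steps 4 and 5). */
static int child_fn(void *p)
{
    struct spawn_args *a = p;
    close(a->pipefd[0]);                 /* child keeps only the write end */
    execvp(a->argv[0], a->argv);         /* on success the CLOEXEC pipe closes */
    int err = errno;                     /* exec failed: report the cause */
    (void)!write(a->pipefd[1], &err, sizeof err);
    _exit(127);
}

/* Returns 0 and stores the child pid on success, else an errno value. */
int spawn_sketch(pid_t *pid_out, char *const argv[])
{
    assert(pid_out && argv && argv[0]);
    struct spawn_args a = { .argv = argv };
    _Alignas(16) char stack[65536];      /* child stack inside our frame */
    int err = 0;
    ssize_t n;

    if (pipe2(a.pipefd, O_CLOEXEC))      /* step 1 */
        return errno;

    /* Step 2: share VM but nothing else; the stack grows down, so pass its top. */
    pid_t pid = clone(child_fn, stack + sizeof stack, CLONE_VM | SIGCHLD, &a);
    if (pid < 0) {
        err = errno;
        close(a.pipefd[0]);
        close(a.pipefd[1]);
        return err;
    }

    close(a.pipefd[1]);                  /* step 3 */
    do n = read(a.pipefd[0], &err, sizeof err);
    while (n < 0 && errno == EINTR);
    close(a.pipefd[0]);

    if (n > 0) {                         /* step 6: child reported a failure */
        waitpid(pid, NULL, 0);
        return err;
    }
    *pid_out = pid;                      /* EOF: exec succeeded */
    return 0;
}
```

The blocking read is also what keeps the borrowed stack and the argument block valid: the parent does not leave spawn_sketch until the child has either exec'd or exited.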
* Re: vfork replacement proposal
From: Rich Felker @ 2013-02-03 18:49 UTC
To: musl

On Mon, Dec 31, 2012 at 03:34:17PM -0500, Rich Felker wrote:
> I've been looking for a viable replacement of the vfork usage in musl
> for a while, since it has two serious problems:
>
> 1. strace is buggy and causes the parent and child to run
> simultaneously on the same stack under vfork when the process is being
> traced. Binaries which can crash or go crazy under strace are highly
> undesirable, even if the fault is with strace.
>
> 2. While current compilers don't do this, the compiler is conceptually
> free to generate code that clobbers parts of the stack that still need
> to be used by the parent when it determines they are no longer needed
> in the child.
>
> The affected functions are posix_spawn[p], system, and popen.
>
> My new proposed design for these functions is:

I've implemented the new design and it seems to be working. After a
few more checks, I'll commit it and see if anybody can give it some
stress testing.

> 4. In the child, close the read end of the pipe and then shuffle file
> descriptors as needed (for setting up stdin/out for popen, or file
> actions for posix_spawn[p]), but with the added stipulations A-C:
>
> A. Before closing or dup2'ing onto a file descriptor in file actions,
> check to see if it's occupied by the pipe fd, and if so, use fcntl
> F_DUPFD_CLOEXEC to move it to a new number first.
>
> B. Before calling open in file actions, always use fcntl with
> F_DUPFD_CLOEXEC and close the original pipe fd, to ensure that the
> pipe is never occupying the otherwise-lowest-available fd number.

I was wrong about (B); the "open" file action does not assign the
lowest-available fd, but a caller-chosen fd. Thus, for our purposes,
it's just like close or dup2, targeting a known fd number. This means
the same logic can be used for all three operations, and it can be
based on dup() rather than F_DUPFD_CLOEXEC. Note that F_DUPFD_CLOEXEC
is actually not viable because it's missing on slightly-old kernels
(up through mid 2.6 series), but we don't need atomicity anyway since
this thread/process is fully under posix_spawn's control.

Also, I think it would be possible to abandon the "shuffling" logic
and compute in advance a safe fd number to put the pipe on.

Finally, it seems posix_spawn will be sufficient as a backend for
implementing popen, wordexp, and system, so I just put all the logic
in posix_spawn itself rather than trying to design a more abstract API
with callbacks for the specific caller case.

Rich
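The dup()-based renumbering described in this message might look like the sketch below. The helper name move_pipe_if_target is hypothetical and this is an illustration of the idea, not the committed code: before a close, dup2 or open file action touches fd number `target`, the child checks whether the pipe currently sits on that number and, if so, moves it. Close-on-exec is set in a separate, non-atomic step, which is acceptable here because nothing else runs in the child.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * If the error-reporting pipe occupies `target`, move it to a fresh fd
 * number so the file action can use `target` freely.
 * Returns 0 on success, -1 on failure (fatal for the spawn).
 */
int move_pipe_if_target(int *pipe_fd, int target)
{
    assert(pipe_fd && *pipe_fd >= 0);
    if (*pipe_fd != target)
        return 0;                        /* no conflict, nothing to do */

    int newfd = dup(*pipe_fd);           /* lowest free number, never == target */
    if (newfd < 0)
        return -1;

    /* dup() clears FD_CLOEXEC on the copy, so set it again. */
    if (fcntl(newfd, F_SETFD, FD_CLOEXEC) < 0) {
        close(newfd);
        return -1;
    }

    close(*pipe_fd);                     /* free `target` for the file action */
    *pipe_fd = newfd;
    return 0;
}
```

A caller would invoke this once per file action, passing the fd number the action is about to close, dup2 onto, or open onto.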
* Re: vfork replacement proposal
From: Szabolcs Nagy @ 2013-02-03 20:36 UTC
To: musl

* Rich Felker <dalias@aerifal.cx> [2013-02-03 13:49:23 -0500]:
> On Mon, Dec 31, 2012 at 03:34:17PM -0500, Rich Felker wrote:
> > 4. In the child, close the read end of the pipe and then shuffle file
> > descriptors as needed (for setting up stdin/out for popen, or file
> > actions for posix_spawn[p]), but with the added stipulations A-C:
> >
> > A. Before closing or dup2'ing onto a file descriptor in file actions,
> > check to see if it's occupied by the pipe fd, and if so, use fcntl
> > F_DUPFD_CLOEXEC to move it to a new number first.
> >
> > B. Before calling open in file actions, always use fcntl with
> > F_DUPFD_CLOEXEC and close the original pipe fd, to ensure that the
> > pipe is never occupying the otherwise-lowest-available fd number.
>
> I was wrong about (B); the "open" file action does not assign the
> lowest-available fd, but a caller-chosen fd. Thus, for our purposes,
> it's just like close or dup2, targeting a known fd number. This means
> the same logic can be used for all three operations, and it can be
> based on dup() rather than F_DUPFD_CLOEXEC. Note that F_DUPFD_CLOEXEC
> is actually not viable because it's missing on slightly-old kernels
> (up through mid 2.6 series), but we don't need atomicity anyway since
> this thread/process is fully under posix_spawn's control.
>
> Also, I think it would be possible to abandon the "shuffling" logic
> and compute in advance a safe fd number to put the pipe on.
>
> Finally, it seems posix_spawn will be sufficient as a backend for
> implementing popen, wordexp, and system, so I just put all the logic
> in posix_spawn itself rather than trying to design a more abstract API
> with callbacks for the specific caller case.
>

hm, is it possible to have a non-forking spawn that covers all the
fork+exec cases? (things one might want to do before exec, eg by
specifying extra attributes..)

as far as i can see posix_spawn handles these:

	setenv
	fds (file_actions, O_CLOEXEC)
	setpgid (POSIX_SPAWN_SETPGROUP)
	drop euid, egid (POSIX_SPAWN_RESETIDS)
	sigmask, default sighandlers (POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSIGDEF)
	sched param/policy (POSIX_SPAWN_SETSCHEDPARAM, POSIX_SPAWN_SETSCHEDULER)

but not these:

	setsid
	setuid, setgid, setgroups
	chdir
	chroot
	rlimits
	enable ptrace
	ioctl, setctty/noctty
	prctl, parent death signal
	(maybe others..)
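For reference, a few of the items in the "handles" list map onto spawn attributes as in the sketch below. It assumes a POSIX system with <spawn.h>; run_with_attrs is a hypothetical helper that spawns a command in its own process group, with an empty signal mask and default dispositions for SIGINT and SIGQUIT, then waits for it. Which signals to reset is an arbitrary choice made for the example.

```c
#include <assert.h>
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Returns the child's exit status, or -1 on any failure. */
int run_with_attrs(char *const argv[])
{
    assert(argv && argv[0]);
    posix_spawnattr_t attr;
    sigset_t mask, dfl;
    pid_t pid;
    int status;

    if (posix_spawnattr_init(&attr))
        return -1;

    sigemptyset(&mask);                  /* child starts with nothing blocked */
    sigemptyset(&dfl);
    sigaddset(&dfl, SIGINT);             /* reset these two to SIG_DFL */
    sigaddset(&dfl, SIGQUIT);

    int err = posix_spawnattr_setsigmask(&attr, &mask);
    if (!err) err = posix_spawnattr_setsigdefault(&attr, &dfl);
    if (!err) err = posix_spawnattr_setpgroup(&attr, 0);   /* new group, pgid = child pid */
    if (!err) err = posix_spawnattr_setflags(&attr,
            POSIX_SPAWN_SETSIGMASK | POSIX_SPAWN_SETSIGDEF | POSIX_SPAWN_SETPGROUP);
    if (!err) err = posix_spawnp(&pid, argv[0], NULL, &attr, argv, environ);

    posix_spawnattr_destroy(&attr);
    if (err)
        return -1;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Nothing comparable exists in the attribute API for the second list (setsid, chdir, chroot, and so on), which is the gap the question is about.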
* Re: vfork replacement proposal
From: Rich Felker @ 2013-02-03 20:47 UTC
To: musl

On Sun, Feb 03, 2013 at 09:36:37PM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@aerifal.cx> [2013-02-03 13:49:23 -0500]:
> > On Mon, Dec 31, 2012 at 03:34:17PM -0500, Rich Felker wrote:
> > > 4. In the child, close the read end of the pipe and then shuffle file
> > > descriptors as needed (for setting up stdin/out for popen, or file
> > > actions for posix_spawn[p]), but with the added stipulations A-C:
> > >
> > > A. Before closing or dup2'ing onto a file descriptor in file actions,
> > > check to see if it's occupied by the pipe fd, and if so, use fcntl
> > > F_DUPFD_CLOEXEC to move it to a new number first.
> > >
> > > B. Before calling open in file actions, always use fcntl with
> > > F_DUPFD_CLOEXEC and close the original pipe fd, to ensure that the
> > > pipe is never occupying the otherwise-lowest-available fd number.
> >
> > I was wrong about (B); the "open" file action does not assign the
> > lowest-available fd, but a caller-chosen fd. Thus, for our purposes,
> > it's just like close or dup2, targeting a known fd number. This means
> > the same logic can be used for all three operations, and it can be
> > based on dup() rather than F_DUPFD_CLOEXEC. Note that F_DUPFD_CLOEXEC
> > is actually not viable because it's missing on slightly-old kernels
> > (up through mid 2.6 series), but we don't need atomicity anyway since
> > this thread/process is fully under posix_spawn's control.
> >
> > Also, I think it would be possible to abandon the "shuffling" logic
> > and compute in advance a safe fd number to put the pipe on.
> >
> > Finally, it seems posix_spawn will be sufficient as a backend for
> > implementing popen, wordexp, and system, so I just put all the logic
> > in posix_spawn itself rather than trying to design a more abstract API
> > with callbacks for the specific caller case.
> >
>
> hm, is it possible to have a non-forking spawn that covers all the
> fork+exec cases? (things one might want to do before exec, eg by
> specifying extra attributes..)
>
> as far as i can see posix_spawn handles these:
>
> 	setenv
> 	fds (file_actions, O_CLOEXEC)
> 	setpgid (POSIX_SPAWN_SETPGROUP)
> 	drop euid, egid (POSIX_SPAWN_RESETIDS)
> 	sigmask, default sighandlers (POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSIGDEF)
> 	sched param/policy (POSIX_SPAWN_SETSCHEDPARAM, POSIX_SPAWN_SETSCHEDULER)
>
> but not these:
>
> 	setsid
> 	setuid, setgid, setgroups
> 	chdir
> 	chroot
> 	rlimits
> 	enable ptrace
> 	ioctl, setctty/noctty
> 	prctl, parent death signal
> 	(maybe others..)

See http://austingroupbugs.net/view.php?id=603

It's possible this would be added if the effort is put into it. I'm
unsure how desirable that is. Basically the number of things one might
want to set in the child will continue to grow over time, so the ideal
situation would be to have a callback, but that would be incompatible
with in-kernel implementations of posix_spawn (I think NetBSD has one
now) and would be generally unsafe anyway (it would be difficult to
specify what you could safely call from the callback). So if
additional attributes are added, there will be a maintenance burden.

Note that among the above list, several items (chroot, setgroups,
ptrace, parent death signal, ...) are outside the scope of POSIX so
they would not get added.

It would be nice if the shell just had a way to perform these actions
efficiently; then you could call posix_spawn with:

    "sh", "-c", "set_attrs_cmd && exec \"$@\"", "cmd", "arg1", ..., (void*)0

Rich
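As an illustration of the shell-wrapper idea, the sketch below uses it to get a chdir before exec, one of the items posix_spawn lacked in the list above. The helper run_in_dir and the MAX_ARGS limit are hypothetical. Note that with `sh -c`, the first operand after the script string is assigned to `$0` and is not part of "$@", so the sketch passes a placeholder "sh" in that slot, followed by the directory as `$1`.

```c
#include <assert.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Run cmd[] with `dir` as its working directory by letting the shell
 * do the chdir and then exec the real command.
 * Returns the exit status, or -1 on failure.
 */
int run_in_dir(const char *dir, char *const cmd[])
{
    assert(dir && cmd && cmd[0]);
    enum { MAX_ARGS = 32 };              /* arbitrary limit for the sketch */
    char *argv[MAX_ARGS + 6];
    size_t n = 0;
    pid_t pid;
    int status;

    argv[n++] = "sh";
    argv[n++] = "-c";
    argv[n++] = "cd \"$1\" && shift && exec \"$@\"";
    argv[n++] = "sh";                    /* becomes $0 inside the script */
    argv[n++] = (char *)dir;             /* becomes $1, removed by shift */
    for (size_t i = 0; cmd[i]; i++) {
        if (i >= MAX_ARGS)
            return -1;
        argv[n++] = cmd[i];              /* the command and its arguments */
    }
    argv[n] = NULL;

    if (posix_spawnp(&pid, "sh", NULL, NULL, argv, environ))
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

One cost of this approach is visible in the error path: if the `cd` fails, posix_spawnp still reports success and the failure only shows up as a nonzero exit status from the shell.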