mailing list of musl libc
* vfork replacement proposal
From: Rich Felker @ 2012-12-31 20:34 UTC
  To: musl

I've been looking for a viable replacement of the vfork usage in musl
for a while, since it has two serious problems:

1. strace is buggy and causes the parent and child to run
simultaneously on the same stack under vfork when the process is being
traced. Binaries which can crash or go crazy under strace are highly
undesirable, even if the fault is with strace.

2. While current compilers don't do this, the compiler is conceptually
free to generate code that clobbers parts of the stack that still need
to be used by the parent when it determines they are no longer needed
in the child.

The affected functions are posix_spawn[p], system, and popen.

My new proposed design for these functions is:

1. Open a close-on-exec pipe.

2. Use clone with CLONE_VM|SIGCHLD as the flags to make a normal child
process that shares VM but nothing else with the parent, and that runs
a new function (rather than returning) on a small stack embedded in
the caller's stack (e.g. a 1k automatic char array).

3. In the parent, close the write end of the pipe and perform a
blocking read on the read end.

4. In the child, close the read end of the pipe and then shuffle file
descriptors as needed (for setting up stdin/out for popen, or file
actions for posix_spawn[p]), but with the added stipulations A-C:

A. Before closing or dup2'ing onto a file descriptor in file actions,
check to see if it's occupied by the pipe fd, and if so, use fcntl
F_DUPFD_CLOEXEC to move it to a new number first.

B. Before calling open in file actions, always use fcntl with
F_DUPFD_CLOEXEC and close the original pipe fd, to ensure that the
pipe is never occupying the otherwise-lowest-available fd number.

C. Any failure to renumber the pipe fd as required in A-B is fatal.

5. On any failure in the child, write the error code for the failure
to the pipe and _exit. This includes failure to renumber the pipe, or
failure in the final call to an exec-family function. Otherwise the
pipe closes on successful exec in the child.

6. If the parent reads 0 bytes (EOF) from the pipe, spawning the
external process was successful. Otherwise, the error code is
available indicating the cause of failure, and the cause can be
reported to the calling program via a failure return value, instead of
via immediate exit of the child process with result 127.

This final point 6 makes the proposed new design superior to all
existing implementations I know of: you get good data on the cause of
failure in the parent rather than a false success followed by
immediate exit with code 127 and no indication of the cause.
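
A minimal sketch of steps 1-6 on Linux, for illustration only and not
taken from musl. The names spawn_args, child_fn and spawn_sketch are
hypothetical, and the 16k stack is an arbitrary margin rather than the
1k suggested above. File actions, signal handling and attributes are
omitted, and because the child shares the parent's TLS it also shares
errno, which a careful implementation would avoid by using raw syscall
return values.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* clone(), CLONE_VM, pipe2() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical argument block shared with the child through CLONE_VM. */
struct spawn_args {
    int p[2];                   /* p[0]: read end, p[1]: write end */
    const char *path;
    char *const *argv;
    char *const *envp;
};

/* Step 2: runs in the child on its own stack and never returns. */
static int child_fn(void *arg)
{
    struct spawn_args *a = arg;
    int err;
    ssize_t r;

    close(a->p[0]);             /* step 4; file actions would follow here */
    execve(a->path, a->argv, a->envp);
    err = errno;                /* step 5: report why exec failed */
    r = write(a->p[1], &err, sizeof err);
    (void)r;
    _exit(127);
}

/* Returns 0 and stores the pid on success, or an errno value on failure. */
int spawn_sketch(pid_t *pid, const char *path,
                 char *const argv[], char *const envp[])
{
    /* Child stack inside the caller's frame (16k is this sketch's choice). */
    _Alignas(16) char stack[16384];
    struct spawn_args a = { .path = path, .argv = argv, .envp = envp };
    int err = 0;
    ssize_t n;
    pid_t child;

    if (pipe2(a.p, O_CLOEXEC) < 0)                    /* step 1 */
        return errno;

    /* Stacks grow downward on common targets, so pass the top. */
    child = clone(child_fn, stack + sizeof stack, CLONE_VM | SIGCHLD, &a);
    if (child < 0) {
        err = errno;
        close(a.p[0]);
        close(a.p[1]);
        return err;
    }

    close(a.p[1]);                                    /* step 3 */
    do n = read(a.p[0], &err, sizeof err);
    while (n < 0 && errno == EINTR);
    close(a.p[0]);

    if (n != 0) {               /* step 6: anything but EOF is a failure */
        waitpid(child, 0, 0);   /* reap; the child is then off our stack */
        return n == (ssize_t)sizeof err ? err : EIO;
    }
    *pid = child;
    return 0;
}
```

With this shape, a call such as spawn_sketch(&pid, "/nonexistent", argv,
envp) would be expected to return ENOENT directly in the parent instead
of producing a child that exits with status 127.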

The key breakthrough that made this design proposal possible was
realizing that I can keep shuffling the pipe fd around in the child in
a simple way that avoids interference with the POSIX spawn file
actions. This is in contrast with the problem of determining in
advance a "safe" fd number to locate the pipe on, which is a
nontrivial problem when you can't know the existing set of open fds.

Before I go trying to implement this, anyone see problems with it?
Other comments?

Rich



* Re: vfork replacement proposal
From: Rich Felker @ 2013-02-03 18:49 UTC
  To: musl

On Mon, Dec 31, 2012 at 03:34:17PM -0500, Rich Felker wrote:
> I've been looking for a viable replacement of the vfork usage in musl
> for a while, since it has two serious problems:
> 
> 1. strace is buggy and causes the parent and child to run
> simultaneously on the same stack under vfork when the process is being
> traced. Binaries which can crash or go crazy under strace are highly
> undesirable, even if the fault is with strace.
> 
> 2. While current compilers don't do this, the compiler is conceptually
> free to generate code that clobbers parts of the stack that still need
> to be used by the parent when it determines they are no longer needed
> in the child.
> 
> The affected functions are posix_spawn[p], system, and popen.
> 
> My new proposed design for these functions is:

I've implemented the new design and it seems to be working. After a
few more checks, I'll commit it and see if anybody can give it some
stress testing.

> 4. In the child, close the read end of the pipe and then shuffle file
> descriptors as needed (for setting up stdin/out for popen, or file
> actions for posix_spawn[p]), but with the added stipulations A-C:
> 
> A. Before closing or dup2'ing onto a file descriptor in file actions,
> check to see if it's occupied by the pipe fd, and if so, use fcntl
> F_DUPFD_CLOEXEC to move it to a new number first.
> 
> B. Before calling open in file actions, always use fcntl with
> F_DUPFD_CLOEXEC and close the original pipe fd, to ensure that the
> pipe is never occupying the otherwise-lowest-available fd number.

I was wrong about (B); the "open" file action does not assign the
lowest-available fd, but a caller-chosen fd. Thus, for our purposes,
it's just like close or dup2, targeting a known fd number. This means
the same logic can be used for all three operations, and it can be
based on dup() rather than F_DUPFD_CLOEXEC. Note that F_DUPFD_CLOEXEC
is actually not viable because it's missing on slightly-old kernels
(those before Linux 2.6.24, where it was added), but we don't need
atomicity anyway since this thread/process is fully under
posix_spawn's control.
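
A rough sketch of that dup()-based check, assuming it runs in the child
before each close, dup2 or open file action. The helper name
move_pipe_off is hypothetical and this is not musl's code. Since dup()
returns the lowest free number, the pipe may land on a later action's
target, which is why the check has to be repeated per action.

```c
#include <fcntl.h>
#include <unistd.h>

/* If the pipe currently sits on `target`, move it to another number.
 * Returns the pipe's (possibly new) fd, or -1 on failure, which the
 * proposal treats as fatal (stipulation C). */
int move_pipe_off(int pipe_fd, int target)
{
    int moved;

    if (pipe_fd != target)
        return pipe_fd;         /* no conflict with this action */

    moved = dup(pipe_fd);       /* lowest free fd, never equal to target */
    if (moved < 0)
        return -1;

    /* dup() clears close-on-exec. Setting it in a second step is not
     * atomic, which is fine when nothing else runs in this process. */
    if (fcntl(moved, F_SETFD, FD_CLOEXEC) < 0) {
        close(moved);
        return -1;
    }
    close(pipe_fd);             /* free the number the action wants */
    return moved;
}
```

A caller would write something like p = move_pipe_off(p, action_fd)
ahead of each action and bail out through the error path on -1.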

Also, I think it would be possible to abandon the "shuffling" logic
and compute in advance a safe fd number to put the pipe on. 
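
One way such a precomputed placement might look, as a sketch under
assumptions: the file action targets are given as a plain int array
(hypothetical; the real file actions object is opaque), and the helper
name park_above_targets is made up here. It relies on fcntl F_DUPFD
returning the lowest free fd that is at least the requested minimum, so
the pipe ends up above every target without clobbering an open fd.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Move pipe_fd to a number greater than every fd in targets[0..n-1].
 * Returns the resulting fd, or -1 on failure (for example when the
 * minimum exceeds the RLIMIT_NOFILE limit). */
int park_above_targets(int pipe_fd, const int *targets, size_t n)
{
    int min_fd = 0, moved;
    size_t i;

    for (i = 0; i < n; i++)
        if (targets[i] >= min_fd)
            min_fd = targets[i] + 1;

    if (pipe_fd >= min_fd)
        return pipe_fd;         /* already clear of all targets */

    moved = fcntl(pipe_fd, F_DUPFD, min_fd);
    if (moved < 0)
        return -1;

    /* F_DUPFD clears close-on-exec, so restore it on the new number. */
    if (fcntl(moved, F_SETFD, FD_CLOEXEC) < 0) {
        close(moved);
        return -1;
    }
    close(pipe_fd);
    return moved;
}
```

Done once before any file action runs, this would make the per-action
check unnecessary, at the cost of one pass over the action list.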

Finally, it seems posix_spawn will be sufficient as a backend for
implementing popen, wordexp, and system, so I just put all the logic
in posix_spawn itself rather than trying to design a more abstract API
with callbacks for the specific caller case.
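
To illustrate posix_spawn as a backend, here is a stripped-down sketch
of a system()-like function. The name system_sketch is hypothetical,
and the signal handling POSIX requires of system() (ignoring SIGINT and
SIGQUIT, blocking SIGCHLD) as well as the null-command case are left
out, so this is not a conforming implementation.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run `cmd` through /bin/sh and return the wait status, or -1 with
 * errno set if the shell could not be spawned or waited for. */
int system_sketch(const char *cmd)
{
    char *argv[] = { "sh", "-c", (char *)cmd, 0 };
    pid_t pid;
    int status, err;

    /* No file actions or attributes are needed for the basic case. */
    err = posix_spawn(&pid, "/bin/sh", 0, 0, argv, environ);
    if (err) {
        errno = err;            /* posix_spawn returns the error number */
        return -1;
    }
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return status;
}
```

popen would follow the same pattern with a pipe plus dup2 and close
file actions in place of the null file actions pointer.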

Rich



* Re: vfork replacement proposal
From: Szabolcs Nagy @ 2013-02-03 20:36 UTC
  To: musl

* Rich Felker <dalias@aerifal.cx> [2013-02-03 13:49:23 -0500]:
> On Mon, Dec 31, 2012 at 03:34:17PM -0500, Rich Felker wrote:
> > 4. In the child, close the read end of the pipe and then shuffle file
> > descriptors as needed (for setting up stdin/out for popen, or file
> > actions for posix_spawn[p]), but with the added stipulations A-C:
> > 
> > A. Before closing or dup2'ing onto a file descriptor in file actions,
> > check to see if it's occupied by the pipe fd, and if so, use fcntl
> > F_DUPFD_CLOEXEC to move it to a new number first.
> > 
> > B. Before calling open in file actions, always use fcntl with
> > F_DUPFD_CLOEXEC and close the original pipe fd, to ensure that the
> > pipe is never occupying the otherwise-lowest-available fd number.
> 
> I was wrong about (B); the "open" file action does not assign the
> lowest-available fd, but a caller-chosen fd. Thus, for our purposes,
> it's just like close or dup2, targeting a known fd number. This means
> the same logic can be used for all three operations, and it can be
> based on dup() rather than F_DUPFD_CLOEXEC. Note that F_DUPFD_CLOEXEC
> is actually not viable because it's missing on slightly-old kernels
> (up through mid 2.6 series), but we don't need atomicity anyway since
> this thread/process is fully under posix_spawn's control.
> 
> Also, I think it would be possible to abandon the "shuffling" logic
> and compute in advance a safe fd number to put the pipe on. 
> 
> Finally, it seems posix_spawn will be sufficient as a backend for
> implementing popen, wordexp, and system, so I just put all the logic
> in posix_spawn itself rather than trying to design a more abstract API
> with callbacks for the specific caller case.
> 

hm, is it possible to have a non-forking spawn that covers all the
fork+exec cases? (things one might want to do before exec, eg by
specifying extra attributes..)

as far as i can see posix_spawn handles these:

 setenv
 fds (file_actions, O_CLOEXEC)
 setpgid (POSIX_SPAWN_SETPGROUP)
 drop euid, egid (POSIX_SPAWN_RESETIDS)
 sigmask, default sighandlers (POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSIGDEF)
 sched param/policy (POSIX_SPAWN_SETSCHEDPARAM, POSIX_SPAWN_SETSCHEDULER)

but not these:

 setsid
 setuid, setgid, setgroups
 chdir
 chroot
 rlimits
 enable ptrace
 ioctl, setctty/noctty
 prctl, parent death signal
 (maybe others..)



* Re: vfork replacement proposal
From: Rich Felker @ 2013-02-03 20:47 UTC
  To: musl

On Sun, Feb 03, 2013 at 09:36:37PM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@aerifal.cx> [2013-02-03 13:49:23 -0500]:
> > On Mon, Dec 31, 2012 at 03:34:17PM -0500, Rich Felker wrote:
> > > 4. In the child, close the read end of the pipe and then shuffle file
> > > descriptors as needed (for setting up stdin/out for popen, or file
> > > actions for posix_spawn[p]), but with the added stipulations A-C:
> > > 
> > > A. Before closing or dup2'ing onto a file descriptor in file actions,
> > > check to see if it's occupied by the pipe fd, and if so, use fcntl
> > > F_DUPFD_CLOEXEC to move it to a new number first.
> > > 
> > > B. Before calling open in file actions, always use fcntl with
> > > F_DUPFD_CLOEXEC and close the original pipe fd, to ensure that the
> > > pipe is never occupying the otherwise-lowest-available fd number.
> > 
> > I was wrong about (B); the "open" file action does not assign the
> > lowest-available fd, but a caller-chosen fd. Thus, for our purposes,
> > it's just like close or dup2, targeting a known fd number. This means
> > the same logic can be used for all three operations, and it can be
> > based on dup() rather than F_DUPFD_CLOEXEC. Note that F_DUPFD_CLOEXEC
> > is actually not viable because it's missing on slightly-old kernels
> > (up through mid 2.6 series), but we don't need atomicity anyway since
> > this thread/process is fully under posix_spawn's control.
> > 
> > Also, I think it would be possible to abandon the "shuffling" logic
> > and compute in advance a safe fd number to put the pipe on. 
> > 
> > Finally, it seems posix_spawn will be sufficient as a backend for
> > implementing popen, wordexp, and system, so I just put all the logic
> > in posix_spawn itself rather than trying to design a more abstract API
> > with callbacks for the specific caller case.
> > 
> 
> hm, is it possible to have a non-forking spawn that covers all the
> fork+exec cases? (things one might want to do before exec, eg by
> specifying extra attributes..)
> 
> as far as i can see posix_spawn handles these:
> 
>  setenv
>  fds (file_actions, O_CLOEXEC)
>  setpgid (POSIX_SPAWN_SETPGROUP)
>  drop euid, egid (POSIX_SPAWN_RESETIDS)
>  sigmask, default sighandlers (POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSIGDEF)
>  sched param/policy (POSIX_SPAWN_SETSCHEDPARAM, POSIX_SPAWN_SETSCHEDULER)
> 
> but not these:
> 
>  setsid
>  setuid, setgid, setgroups
>  chdir
>  chroot
>  rlimits
>  enable ptrace
>  ioctl, setctty/noctty
>  prctl, parent death signal
>  (maybe others..)

See http://austingroupbugs.net/view.php?id=603

It's possible this would be added if the effort is put into it. I'm
unsure how desirable that is. Basically the number of things one might
want to set in the child will continue to grow over time, so the ideal
situation would be to have a callback, but that would be incompatible
with in-kernel implementations of posix_spawn (I think NetBSD has one
now) and would be generally unsafe anyway (it would be difficult to
specify what you could safely call from the callback). So if
additional attributes are added, there will be a maintenance burden.

Note that among the above list, several items (chroot, setgroups,
ptrace, parent death signal, ...) are outside the scope of POSIX so
they would not get added.

It would be nice if the shell just had a way to perform these actions
efficiently; then you could call posix_spawn with:

"sh", "-c", "set_attrs_cmd && exec \"$@\"", "cmd", "arg1", ..., (void*)0

Rich

