From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: vfork replacement proposal
Date: Mon, 31 Dec 2012 15:34:17 -0500
Message-ID: <20121231203416.GA19960@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

I've been looking for a viable replacement for the vfork usage in musl
for a while, since it has two serious problems:

1. strace is buggy and causes the parent and child to run
   simultaneously on the same stack under vfork when the process is
   being traced. Binaries which can crash or go crazy under strace are
   highly undesirable, even if the fault is with strace.

2. While current compilers don't do this, the compiler is conceptually
   free to generate code that clobbers parts of the stack that still
   need to be used by the parent when it determines they are no longer
   needed in the child.

The affected functions are posix_spawn[p], system, and popen. My new
proposed design for these functions is:

1. Open a close-on-exec pipe.

2. Use clone with CLONE_VM|SIGCHLD as the flags to make a normal child
   process that shares VM but nothing else with the parent, and that
   runs a new function (rather than returning) on a small stack
   embedded in the caller's stack (e.g. a 1k automatic char array).

3. In the parent, close the write end of the pipe and perform a
   blocking read on the read end.

4. In the child, close the read end of the pipe and then shuffle file
   descriptors as needed (for setting up stdin/out for popen, or file
   actions for posix_spawn[p]), but with the added stipulations A-C:

   A. Before closing or dup2'ing onto a file descriptor in file
      actions, check to see if it's occupied by the pipe fd, and if
      so, use fcntl F_DUPFD_CLOEXEC to move it to a new number first.

   B. Before calling open in file actions, always duplicate the pipe
      fd using fcntl with F_DUPFD_CLOEXEC and close the original pipe
      fd, to ensure that the pipe is never occupying the
      otherwise-lowest-available fd number.

   C. Any failure to renumber the pipe fd as required in A-B is fatal.

5. On any failure in the child, write the error code for the failure
   to the pipe and _exit. This includes failure to renumber the pipe,
   or failure in the final call to an exec-family function. Otherwise
   the pipe closes on successful exec in the child.

6. If the parent reads 0 bytes (EOF) from the pipe, spawning the
   external process was successful. Otherwise, the error code is
   available indicating the cause of failure, and the cause can be
   reported to the calling program via a failure return value, instead
   of via immediate exit of the child process with result 127.
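To make steps 1-6 concrete, here is a minimal sketch in C for Linux. It is an illustration under stated assumptions, not musl's implementation: the names spawn_sketch, spawn_args, evade_fd, fail and the new_stdout parameter are all hypothetical, only a single popen-style stdout redirection stands in for the general file actions (so stipulation B is not exercised), signal masking is omitted, and the child stack is 64k rather than 1k as a precaution for libcs that may do lazy symbol binding in the child.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical argument block handed to the child. */
struct spawn_args {
    int pipefd[2];      /* [0] read end, [1] write end, both close-on-exec */
    int new_stdout;     /* fd to install as stdout in the child, or -1 */
    const char *path;
    char *const *argv;
    char *const *envp;
};

/* Stipulation A: if the pipe sits on fd `target`, which is about to be
 * overwritten, move it to another close-on-exec fd first.
 * Returns the (possibly new) pipe fd, or -1 on failure. */
static int evade_fd(int pipe_fd, int target)
{
    int moved;
    if (pipe_fd != target) return pipe_fd;
    moved = fcntl(pipe_fd, F_DUPFD_CLOEXEC, 0);
    if (moved >= 0) close(pipe_fd);
    return moved;
}

/* Step 5: report an error code to the parent through the pipe and exit. */
static void fail(int pipe_fd, int err)
{
    while (write(pipe_fd, &err, sizeof err) < 0 && errno == EINTR);
    _exit(127);
}

/* Step 4: runs in the child, on its own stack, sharing the parent's VM. */
static int child_fn(void *arg)
{
    struct spawn_args *a = arg;
    int p = a->pipefd[1], moved;

    close(a->pipefd[0]);
    if (a->new_stdout >= 0 && a->new_stdout != 1) {
        moved = evade_fd(p, 1);
        if (moved < 0) fail(p, errno);      /* stipulation C: fatal */
        p = moved;
        if (dup2(a->new_stdout, 1) < 0) fail(p, errno);
    }
    execve(a->path, a->argv, a->envp);
    fail(p, errno);                         /* exec returned, so it failed */
    return 127;                             /* not reached */
}

/* Steps 1-3 and 6. Returns 0 and stores the child's pid on success,
 * otherwise an error code from setup or reported by the child. */
static int spawn_sketch(pid_t *pid_out, const char *path,
                        char *const argv[], char *const envp[],
                        int new_stdout)
{
    struct spawn_args a = { .new_stdout = new_stdout, .path = path,
                            .argv = argv, .envp = envp };
    _Alignas(16) char stack[65536];         /* child stack in our frame */
    int err = 0;
    ssize_t n;
    pid_t pid;

    assert(pid_out != NULL);
    if (pipe2(a.pipefd, O_CLOEXEC) < 0) return errno;

    /* Stacks grow down on the usual targets, so pass the top address. */
    pid = clone(child_fn, stack + sizeof stack, CLONE_VM | SIGCHLD, &a);
    if (pid < 0) {
        err = errno;
        close(a.pipefd[0]);
        close(a.pipefd[1]);
        return err;
    }

    close(a.pipefd[1]);
    do n = read(a.pipefd[0], &err, sizeof err);
    while (n < 0 && errno == EINTR);
    close(a.pipefd[0]);

    if (n == 0) {                           /* EOF: exec succeeded */
        *pid_out = pid;
        return 0;
    }
    waitpid(pid, NULL, 0);                  /* reap the failed child */
    return n == (ssize_t)sizeof err ? err : EIO;
}
```

With this sketch, a caller passing a path that does not exist gets ENOENT back from spawn_sketch directly, rather than a pid whose only sign of trouble is a later exit status of 127. Because the parent stays blocked in read until the child has exec'd or exited, the two never run user code on the shared memory at the same time.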
This final point 6 makes the proposed new design superior to all
existing implementations I know of: you get good data on the cause of
failure in the parent rather than a false success followed by
immediate exit with code 127 and no indication of the cause.

The key breakthrough that made this design proposal possible was
realizing that I can keep shuffling the pipe fd around in the child in
a simple way that avoids interference with the POSIX spawn file
actions. This is in contrast with the problem of determining in
advance a "safe" fd number to locate the pipe on, which is a
nontrivial problem when you can't know the existing set of open fds.

Before I go trying to implement this, anyone see problems with it?
Other comments?

Rich