Date: Mon, 2 May 2022 17:18:56 -0400
From: Rich Felker
To: Alexey Izbyshev
Cc: musl@lists.openwall.com
Message-ID: <20220502211856.GR7074@brightrain.aerifal.cx>
Subject: Re: [musl] vfork()-based posix_spawn() has more failure modes than fork()-based one

On Mon, May 02, 2022 at 10:26:36PM +0300, Alexey Izbyshev wrote:
> Hi,
>
> I was recently made aware via [1] that vfork() can have more failure
> modes than fork() on Linux. The only case I know about is due to
> Linux not allowing processes in different time namespaces to share
> address space, but probably there are or will be more. An example is
> below (requires Linux >= 5.6).
>
> $ cat test.c
> #include <spawn.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[], char *envp[]) {
>     if (getenv("TEST_FORK")) {
>         pid_t pid = fork();
>         if (pid < 0) {
>             perror("fork");
>             return 127;
>         }
>         if (pid == 0) {
>             execve(argv[1], argv + 1, envp);
>             _exit(127);
>         }
>     } else {
>         int err = posix_spawn(0, argv[1], 0, 0, argv + 1, envp);
>         if (err) {
>             printf("posix_spawn: %s\n", strerror(err));
>             return 127;
>         }
>     }
>     wait(NULL);
>     return 0;
> }
>
> $ musl-gcc test.c
> $ unshare -UrT ./a.out /bin/echo OK
> posix_spawn: Invalid argument
> $ TEST_FORK=1 unshare -UrT ./a.out /bin/echo OK
> OK
>
> A common expectation from applications is that they can use
> posix_spawn() as a drop-in replacement for fork()/exec() (when its
> child-tweaking features are sufficient), but this case breaks the
> expectation. Do you think it would make sense for musl to fallback
> to fork() in case vfork() fails in posix_spawn()?
>
> I've also opened a bug about this in glibc[2]. Maybe libcs could
> coordinate in how they handle this case.
>
> Alexey
>
> [1] https://github.com/python/cpython/issues/91307
> [2] https://sourceware.org/bugzilla/show_bug.cgi?id=29115

I'm trying to understand how this comes to be. The child should
inherit the namespaces of the parent and thus should not be in a
different namespace that precludes spawn. I'm guessing this is some
oddity where unshare doesn't affect the process itself, only its
children? If so, it seems like a bug that it doesn't affect the
process itself after execve (after unshare(1) runs your test program),
but that probably can't be fixed now on the Linux side for stability
reasons. :(

For what it's worth, I feel like the answer here is really that you
can't expect everything (or anything) to work after you've created a
bad or inconsistent process state, which can be done in various ways
like using unshare(2) in certain ways in a multithreaded process,
certain manual uses of clone(2), etc.
Apparently unsharing time ns is one of those things too, and if it
behaves the way it seems to, I don't think you can use it at all
without an extra fork (adding -f to the unshare(1) command line).
Otherwise the top-level process in your "container" and its children
will be in different time namespaces, which is not at all what you
would want anyway.

We probably could make posix_spawn retry __clone without CLONE_VM if
it fails with certain errors, as long as those errors are
non-ambiguous about indicating a need for retry. I don't see EINVAL
documented as being possible for any cases that would need to be
treated as errors, but then again it doesn't seem to be documented for
this corner case you found either.

Rich
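
For illustration only, here is a minimal sketch of the retry idea from
the last paragraph, done at the application level rather than inside
musl. It is not musl's implementation and does not touch __clone. The
names spawn_with_fallback and fork_exec are hypothetical, and treating
EINVAL as the "retry without a shared address space" signal is an
assumption; as noted above, that error may be ambiguous, so the sketch
only passes null attributes and file actions, where an EINVAL caused
by bad spawn attributes should not arise.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: plain fork()+execve() that reports an exec
 * failure to the parent through a close-on-exec pipe. */
int fork_exec(pid_t *pid_out, const char *path,
              char *const argv[], char *const envp[])
{
    int p[2];
    if (pipe2(p, O_CLOEXEC) < 0)
        return errno;

    pid_t pid = fork();
    if (pid < 0) {
        int err = errno;
        close(p[0]);
        close(p[1]);
        return err;
    }
    if (pid == 0) {
        close(p[0]);
        execve(path, argv, envp);
        /* Only reached if execve failed: send errno to the parent. */
        int err = errno;
        ssize_t w = write(p[1], &err, sizeof err);
        (void)w;
        _exit(127);
    }

    close(p[1]);
    int child_err = 0;
    ssize_t n;
    do {
        n = read(p[0], &child_err, sizeof child_err);
    } while (n < 0 && errno == EINTR);
    close(p[0]);

    if (n == (ssize_t)sizeof child_err) {
        /* exec failed in the child: reap it and return its errno. */
        while (waitpid(pid, 0, 0) < 0 && errno == EINTR)
            ;
        return child_err;
    }
    /* EOF on the pipe: exec succeeded and closed the write end. */
    if (pid_out)
        *pid_out = pid;
    return 0;
}

/* Hypothetical wrapper: try posix_spawn() first and fall back to
 * fork()+execve() only on EINVAL (assumed to be the retry signal). */
int spawn_with_fallback(pid_t *pid_out, const char *path,
                        char *const argv[], char *const envp[])
{
    int err = posix_spawn(pid_out, path, 0, 0, argv, envp);
    if (err != EINVAL)
        return err;
    return fork_exec(pid_out, path, argv, envp);
}
```

In the test program quoted above, the posix_spawn() call would become
spawn_with_fallback(0, argv[1], argv + 1, envp). Like posix_spawn(),
both functions return 0 on success or an errno value on failure.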