From: Andy Lutomirski
Newsgroups: gmane.linux.kernel.api,gmane.comp.lib.glibc.alpha,gmane.linux.lib.musl.general
Subject: Re: [RFC] Possible new execveat(2) Linux syscall
Date: Sun, 16 Nov 2014 14:34:32 -0800
References: <20141116195246.GX22465@brightrain.aerifal.cx> <20141116220859.GY22465@brightrain.aerifal.cx>
In-Reply-To: <20141116220859.GY22465-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
To: Rich Felker
Cc: libc-alpha, musl-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8@public.gmane.org, Andrew Morton, David Drysdale, Linux API, Christoph Hellwig

On Sun, Nov 16, 2014 at 2:08 PM, Rich Felker wrote:
> On Sun, Nov 16, 2014 at 01:20:39PM -0800, Andy Lutomirski wrote:
>> On Nov 16, 2014 11:53 AM, "Rich Felker" wrote:
>> >
>> > On Fri, Nov 14, 2014 at 02:54:19PM +0000, David Drysdale wrote:
>> > > Hi,
>> > >
>> > > Over at the LKML[1] we've been discussing a possible new syscall, execveat(2),
>> > > and it would be good to hear a glibc perspective about it (and whether there
>> > > are any interface changes that would make it easier to use from userspace).
>> > >
>> > > The syscall prototype is:
>> > >   int execveat(int fd, const char *pathname,
>> > >                char *const argv[], char *const envp[],
>> > >                int flags); /* AT_EMPTY_PATH, AT_SYMLINK_NOFOLLOW */
>> > > and it works similarly to execve(2) except:
>> > >  - the executable to run is identified by the combination of fd+pathname, like
>> > >    other *at(2) syscalls
>> > >  - there's an extra flags field to control behaviour.
>> > > (I've attached a text version of the suggested man page below)
>> > >
>> > > One particular benefit of this is that it allows an fexecve(3) implementation
>> > > that doesn't rely on /proc being accessible, which is useful for sandboxed
>> > > applications. (However, that does only work for non-interpreted programs:
>> > > the name passed to a script interpreter is of the form "/dev/fd//"
>> > > or "/dev/fd/", so the executed interpreter will normally still need /proc
>> > > access to load the script file).
>> > >
>> > > How does this sound from a glibc perspective?
>> >
>> > I've been following the discussions so far and everything looks mostly
>> > okay. There are still issues to be resolved with the different
>> > semantics between Linux O_PATH and what POSIX requires for O_EXEC (and
>> > O_SEARCH) but as long as the intent is that, once O_EXEC is defined to
>> > save the permissions at the time of open and cause them to be used in
>> > place of the current file permissions at the time of execveat
>>
>> Is something missing here?
>>
>> FWIW, I don't understand O_PATH or O_EXEC very well, so from my POV,
>> help would be appreciated.
>
> Yes. POSIX requires that permission checks for execution (fexecve with
> O_EXEC file descriptors) and directory-search (*at functions with
> O_SEARCH file descriptors) succeed if the open operation succeeded --
> the permissions check is required to take place at open time rather
> than at exec/search time. There's a separate discussion about how to
> make this work on the kernel side.

It may be worth making this work as part of adding execveat to the
kernel.  Does the kernel even have O_EXEC right now?

>
>> > One major issue however is FD_CLOEXEC with scripts. Last I checked,
>> > this didn't work because the file is already closed by the time the
>> > interpreter runs. The intended usage of fexecve is almost certainly to
>> > call it with the file descriptor set close-on-exec; otherwise, there
>> > would be no clean way to close it, since the program being executed
>> > doesn't know that it's being executed via fexecve. So this is a
>> > serious problem that needs to be solved if it hasn't already. I have
>> > some ideas I could offer, but I'm not an expert on the kernel side
>> > things so I'm not sure they'd be correct.
>>
>> Bring on the ideas.
>
> My thought is that when the kernel opens the binary and sees that it's
> a script that needs an interpreter, the kernel should not pass
> /proc/self/fd/%d to the interpreter, but instead should pass the name
> of a new magic symlink in /proc/self that's connected to the inode for
> the script to be executed but that ceases to exist as soon as it's
> opened. In theory this could also be used for suid scripts to make
> them secure.

This doesn't help if /proc is not mounted, which is an important use case.

>
>> FWIW, I've often thought that interpreter binaries should mark
>> themselves as such to enable better interactions with the kernel.
>
> That's hard since users expect to be able to use arbitrary
> interpreters (and sometimes even pass through multiple ones, e.g.
> #!/usr/bin/env perl).
>

Hmm.  I'd be okay with old interpreters having a somewhat degraded
experience.  I guess that #!/some/interpreted/script isn't allowed,
but maybe #!/usr/bin/env some-interpreted-script should work.

It could be that all that's really needed is some convention to tell
an interpreter that it should use fd N as a script *and close it*.
Something like /dev/fd_and_close/N could work, but that has all kinds
of problems.

Alternatively, if we could have a way to mark an fd so that it's
close-on-exec after exec, that would solve the nesting problem, as
long as every interpreter in the chain does it.  And the kernel could
certainly implement execve on a close-on-exec fd by passing /dev/fd/N
where N is a close-on-exec fd, at least in the non-nested case.

--Andy

> Rich

-- 
Andy Lutomirski
AMA Capital Management, LLC
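
As a rough userspace sketch of the case discussed in this thread, an
fexecve-style wrapper over the proposed syscall could mark the fd
close-on-exec and then exec it with an empty pathname and AT_EMPTY_PATH.
This assumes a kernel and headers that provide SYS_execveat (the syscall is
still only a proposal here); set_cloexec and fexecve_sketch are hypothetical
names, not libc functions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000    /* value from the Linux uapi headers */
#endif

/* Mark fd close-on-exec.  Returns 0 on success, -1 on error. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical fexecve-style wrapper: execute the file open on fd without
 * going through /proc.  Only returns on failure, with -1 and errno set. */
static int fexecve_sketch(int fd, char *const argv[], char *const envp[])
{
#ifdef SYS_execveat
    if (set_cloexec(fd) < 0)
        return -1;
    /* Empty pathname plus AT_EMPTY_PATH means "the file fd refers to". */
    return (int)syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
#else
    (void)fd; (void)argv; (void)envp;
    errno = ENOSYS;             /* headers predate the syscall */
    return -1;
#endif
}
```

For a native binary this leaves no stray descriptor in the new program.  For
a script, the close-on-exec fd is already gone by the time the interpreter
runs, which is exactly the problem raised above.
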