From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andy Lutomirski
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: Using direct socket syscalls on x86_32 where available?
Date: Mon, 27 Jul 2015 18:04:11 -0700
References: <20150726165907.GM16376@brightrain.aerifal.cx> <55B6C543.1020108@amacapital.net> <20150728004528.GQ16376@brightrain.aerifal.cx>
In-Reply-To: <20150728004528.GQ16376@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: Rich Felker
Cc: "musl@lists.openwall.com", Alexander Larsson
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Xref: news.gmane.org gmane.linux.lib.musl.general:8217

On Mon, Jul 27, 2015 at 5:45 PM, Rich Felker wrote:
> On Mon, Jul 27, 2015 at 04:56:51PM -0700, Andy Lutomirski wrote:
>> On 07/26/2015 09:59 AM, Rich Felker wrote:
>> >On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
>> >>On x86_32, the only way to call socket(2), etc is using socketcall.
>> >>This is slated to change in Linux 4.3:
>> >>
>> >>https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
>> >>
>> >>If userspace adapts by preferring the direct syscalls when available,
>> >>it'll make it easier for seccomp to filter new userspace programs
>> >>(and, ideally, eventually disallow socketcall for sandbox-aware code).
>> >>
>> >>Would musl be willing to detect these syscalls and use them if available?
>> >>
>> >>(Code to do this probably shouldn't be committed until that change
>> >>lands in Linus' tree, just in case the syscall numbers change in the
>> >>mean time.)
>> >
>> >My preference would be not to do this, since it seems to be enlarging
>> >the code and pessimizing normal usage for the sake of a very special
>> >usage scenario. At the very least there would be at least one extra
>> >syscall to probe at first usage, and that probe could generate a
>> >termination on existing seccomp setups. :-p
>>
>> There will be some tiny performance benefit for newer kernels: it
>> avoids a silly indirection that has a switch statement along six
>> stores into memory, validation of the userspace address, and then
>> six loads to pull the syscall args back out of memory.
>> It's not a big deal, but the new syscalls really will be slightly faster.
>
> Unless you're going to try the new syscalls first and fallback on
> ENOSYS every time...
>
>> >So far we don't probe and
>> >store results for any fallbacks though; we just do the fallback on
>> >error every time. This is because all of the existing fallbacks are in
>> >places where we actually want new functionality a new syscall offers,
>> >and the old ones are not able to provide it precisely but require poor
>> >emulation, and in these cases it's expected that the user not be using
>> >old kernels that can't give correct semantics. But in the case of
>> >these socket calls there's no semantic difference or reason for us to
>> >be preferring the 'new' calls. It's just a duplicate API for the same
>> >thing.
>>
>> One way to implement it would be to favor the new syscalls but to
>> set some variable the first time one of them returns ENOSYS. Once
>> that happens, either all of them could fall back to socketcall or
>> just that one syscall could.
>
> ...right, a global. Which requires a barrier to access it. A barrier
> costs a lot more than a few loads or a switch.

Not on x86, and this is as x86-specific as it gets. In fact, I bet
the totally untested code below is actually safe on pretty much any
architecture that has free C11-style relaxed loads (and this code
could even be switched to use actual C11 relaxed loads):

volatile int socket_is_okay = true;

if (socket_is_okay) {
    ret = socket(...);  /* direct syscall; returns -errno on failure */
    if (ret == -ENOSYS) {
        socket_is_okay = false;  /* old kernel: fall through to socketcall */
    } else if (ret < 0) {
        errno = -ret;
        return -1;
    } else {
        return ret;
    }
}
usual socketcall code here;

>
>> Or you could just avoid implementing it and see if anyone complains.
>> It's plausible that xdg-app might start requiring the new syscalls
>> (although it would presumably not kill you if tried to use
>> socketcall).
>>
>> Alex, if glibc started using the new syscalls, would you want to
>> require them inside xdg-app?
>
> I don't see any reason to require them except forcing policy. And I
> don't see any reason for adding them to the kernel to begin with.
> While we would have been better off with proper syscalls for each one
> rather than this multiplexed mess if it had been done right from the
> beginning, having to support both is even worse than the existing
> multiplexed socketcall.

Worse for libc implementations, certainly.

On the other hand, the ability to cleanly limit address families and
such is genuinely useful, and deployed software does it on x86_64.
It's not really possible with current kernels on x86_32, but, with
these patches, it becomes possible on x86_32 as long as libc
implementations play along and sandbox implementations are willing to
force their payloads to use new enough libc implementations.

If I were porting something like Sandstorm to x86_32 and glibc
supported the new syscalls, this would be a no-brainer for me. I'd
simply block socketcall entirely (returning -ENOSYS) in the container,
and anyone providing an app that wants to use sockets has to link
against new glibc.

Keep in mind that socket(2) with unrestricted address family is a big
attack surface and is historically full of nasty vulnerabilities.

--Andy
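
For what it's worth, the "actual C11 relaxed loads" variant of the
fallback mentioned above could look roughly like the sketch below. It
is just as untested against a real libc as the original snippet. The
names socket_with_fallback, raw_socket_fn and the two stub_* functions
are made up for illustration: the function pointers stand in for the
raw direct syscall and the socketcall path, both assumed to return
-errno on failure, and nothing here is musl or glibc code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical raw entry point: returns an fd >= 0, or -errno on failure. */
typedef long (*raw_socket_fn)(int domain, int type, int protocol);

/* Set to false the first time the direct syscall reports ENOSYS. */
static atomic_bool direct_socket_ok = true;

int socket_with_fallback(raw_socket_fn direct, raw_socket_fn via_socketcall,
                         int domain, int type, int protocol)
{
    long ret = -ENOSYS;

    assert(direct && via_socketcall);

    /* Relaxed ordering suffices: the flag guards no other data, and a
     * thread that reads a stale "true" only repeats one failing probe. */
    if (atomic_load_explicit(&direct_socket_ok, memory_order_relaxed)) {
        ret = direct(domain, type, protocol);
        if (ret == -ENOSYS)
            atomic_store_explicit(&direct_socket_ok, false,
                                  memory_order_relaxed);
    }
    if (ret == -ENOSYS)
        ret = via_socketcall(domain, type, protocol);

    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return (int)ret;
}

/* Stubs for illustration only: an "old kernel" direct syscall that always
 * reports ENOSYS, and a socketcall path that always hands back fd 3. */
int direct_calls;

long stub_direct_enosys(int domain, int type, int protocol)
{
    (void)domain; (void)type; (void)protocol;
    direct_calls++;
    return -ENOSYS;
}

long stub_socketcall_ok(int domain, int type, int protocol)
{
    (void)domain; (void)type; (void)protocol;
    return 3;
}
```

With the stubs, the first call probes the direct path once and falls
back; later calls skip the probe and go straight to the socketcall
path.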
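
To make the address-family point above concrete, here is a minimal
sketch of the kind of seccomp filter that is possible once socket(2)
is a real syscall with its arguments in registers. It is written for
x86_64, where this already works. The policy (AF_UNIX only, everything
else fails with EAFNOSUPPORT) and the names af_unix_only and
install_af_unix_only_filter are assumptions chosen for illustration,
not what any particular sandbox ships, and a real filter would also
need to deal with the x32 ABI and with socketpair(2).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Illustrative policy: socket(2) may only create AF_UNIX sockets; every
 * other syscall is allowed. Not handled here: x32 syscall numbers. */
static struct sock_filter af_unix_only[] = {
    /* [0] Refuse to run under any syscall ABI other than x86_64. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),

    /* [3] Anything that is not socket(2) skips ahead to the final allow. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 0, 3),

    /* [5] Low 32 bits of the first argument (little-endian): the family. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_UNIX, 1, 0),
    BPF_STMT(BPF_RET | BPF_K,
             SECCOMP_RET_ERRNO | (EAFNOSUPPORT & SECCOMP_RET_DATA)),

    /* [8] Default action. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Installs the filter on the calling thread. Returns 0 on success, or -1
 * with errno set by prctl(2). */
int install_af_unix_only_filter(void)
{
    struct sock_fprog prog = {
        .len = sizeof af_unix_only / sizeof af_unix_only[0],
        .filter = af_unix_only,
    };

    assert(prog.len <= BPF_MAXINSNS);

    /* Needed so that an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

With socketcall, the address family lives in a userspace array that
the filter cannot read, so the argument check at [5] and [6] has
nothing to inspect. That is the reason a sandbox would want to block
socketcall outright and rely on the direct syscalls.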