* Using direct socket syscalls on x86_32 where available?
From: Andy Lutomirski @ 2015-07-25 17:54 UTC
To: musl

On x86_32, the only way to call socket(2), etc. is using socketcall.
This is slated to change in Linux 4.3:

https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb

If userspace adapts by preferring the direct syscalls when available,
it'll make it easier for seccomp to filter new userspace programs
(and, ideally, eventually disallow socketcall for sandbox-aware code).

Would musl be willing to detect these syscalls and use them if available?

(Code to do this probably shouldn't be committed until that change
lands in Linus' tree, just in case the syscall numbers change in the
meantime.)

--Andy

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Using direct socket syscalls on x86_32 where available?
From: Szabolcs Nagy @ 2015-07-25 18:35 UTC
To: musl

* Andy Lutomirski <luto@amacapital.net> [2015-07-25 10:54:28 -0700]:
> If userspace adapts by preferring the direct syscalls when available,
> it'll make it easier for seccomp to filter new userspace programs
> (and, ideally, eventually disallow socketcall for sandbox-aware code).

btw is there a nice cmdline tool for seccomp now?

or is the api still manual construction of bpf byte code in c?
* Re: Using direct socket syscalls on x86_32 where available?
From: Justin Cormack @ 2015-07-26 16:33 UTC
To: musl

On 25 July 2015 at 19:35, Szabolcs Nagy <nsz@port70.net> wrote:
> * Andy Lutomirski <luto@amacapital.net> [2015-07-25 10:54:28 -0700]:
>> If userspace adapts by preferring the direct syscalls when available,
>> it'll make it easier for seccomp to filter new userspace programs
>> (and, ideally, eventually disallow socketcall for sandbox-aware code).
>
> btw is there a nice cmdline tool for seccomp now?
>
> or is the api still manual construction of bpf byte code in c?

libseccomp (https://github.com/seccomp/libseccomp) is the standard
library to use (it's in Alpine now); there is no command-line tool
that I am aware of.

Justin
* Re: Using direct socket syscalls on x86_32 where available?
From: Rich Felker @ 2015-07-26 16:59 UTC
To: musl

On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
> On x86_32, the only way to call socket(2), etc. is using socketcall.
> This is slated to change in Linux 4.3:
>
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
>
> If userspace adapts by preferring the direct syscalls when available,
> it'll make it easier for seccomp to filter new userspace programs
> (and, ideally, eventually disallow socketcall for sandbox-aware code).
>
> Would musl be willing to detect these syscalls and use them if available?
>
> (Code to do this probably shouldn't be committed until that change
> lands in Linus' tree, just in case the syscall numbers change in the
> meantime.)

My preference would be not to do this, since it seems to be enlarging
the code and pessimizing normal usage for the sake of a very special
usage scenario. At the very least there would be at least one extra
syscall to probe at first usage, and that probe could generate a
termination on existing seccomp setups. :-p

So far we don't probe and store results for any fallbacks though; we
just do the fallback on error every time. This is because all of the
existing fallbacks are in places where we actually want new
functionality a new syscall offers, and the old ones are not able to
provide it precisely but require poor emulation, and in these cases
it's expected that the user not be using old kernels that can't give
correct semantics. But in the case of these socket calls there's no
semantic difference or reason for us to be preferring the 'new' calls.
It's just a duplicate API for the same thing.

Rich
* Re: Using direct socket syscalls on x86_32 where available?
From: Andy Lutomirski @ 2015-07-27 23:56 UTC
To: musl, Alexander Larsson

On 07/26/2015 09:59 AM, Rich Felker wrote:
> On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
>> On x86_32, the only way to call socket(2), etc. is using socketcall.
>> This is slated to change in Linux 4.3:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
>>
>> If userspace adapts by preferring the direct syscalls when available,
>> it'll make it easier for seccomp to filter new userspace programs
>> (and, ideally, eventually disallow socketcall for sandbox-aware code).
>>
>> Would musl be willing to detect these syscalls and use them if available?
>>
>> (Code to do this probably shouldn't be committed until that change
>> lands in Linus' tree, just in case the syscall numbers change in the
>> meantime.)
>
> My preference would be not to do this, since it seems to be enlarging
> the code and pessimizing normal usage for the sake of a very special
> usage scenario. At the very least there would be at least one extra
> syscall to probe at first usage, and that probe could generate a
> termination on existing seccomp setups. :-p

There will be some tiny performance benefit for newer kernels: it
avoids a silly indirection that has a switch statement along with six
stores into memory, validation of the userspace address, and then six
loads to pull the syscall args back out of memory. It's not a big
deal, but the new syscalls really will be slightly faster.

> So far we don't probe and store results for any fallbacks though; we
> just do the fallback on error every time. This is because all of the
> existing fallbacks are in places where we actually want new
> functionality a new syscall offers, and the old ones are not able to
> provide it precisely but require poor emulation, and in these cases
> it's expected that the user not be using old kernels that can't give
> correct semantics. But in the case of these socket calls there's no
> semantic difference or reason for us to be preferring the 'new'
> calls. It's just a duplicate API for the same thing.

One way to implement it would be to favor the new syscalls but to set
some variable the first time one of them returns ENOSYS. Once that
happens, either all of them could fall back to socketcall or just that
one syscall could.

Or you could just avoid implementing it and see if anyone complains.
It's plausible that xdg-app might start requiring the new syscalls
(although it would presumably not kill you if it tried to use
socketcall).

Alex, if glibc started using the new syscalls, would you want to
require them inside xdg-app?

--Andy
* Re: Re: Using direct socket syscalls on x86_32 where available?
From: Rich Felker @ 2015-07-28 0:45 UTC
To: Andy Lutomirski
Cc: musl, Alexander Larsson

On Mon, Jul 27, 2015 at 04:56:51PM -0700, Andy Lutomirski wrote:
> On 07/26/2015 09:59 AM, Rich Felker wrote:
> >On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
> >>On x86_32, the only way to call socket(2), etc. is using socketcall.
> >>This is slated to change in Linux 4.3:
> >>
> >>https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
> >>
> >>If userspace adapts by preferring the direct syscalls when available,
> >>it'll make it easier for seccomp to filter new userspace programs
> >>(and, ideally, eventually disallow socketcall for sandbox-aware code).
> >>
> >>Would musl be willing to detect these syscalls and use them if available?
> >>
> >>(Code to do this probably shouldn't be committed until that change
> >>lands in Linus' tree, just in case the syscall numbers change in the
> >>meantime.)
> >
> >My preference would be not to do this, since it seems to be enlarging
> >the code and pessimizing normal usage for the sake of a very special
> >usage scenario. At the very least there would be at least one extra
> >syscall to probe at first usage, and that probe could generate a
> >termination on existing seccomp setups. :-p
>
> There will be some tiny performance benefit for newer kernels: it
> avoids a silly indirection that has a switch statement along with six
> stores into memory, validation of the userspace address, and then six
> loads to pull the syscall args back out of memory. It's not a big
> deal, but the new syscalls really will be slightly faster.

Unless you're going to try the new syscalls first and fallback on
ENOSYS every time...

> >So far we don't probe and store results for any fallbacks though; we
> >just do the fallback on error every time. This is because all of the
> >existing fallbacks are in places where we actually want new
> >functionality a new syscall offers, and the old ones are not able to
> >provide it precisely but require poor emulation, and in these cases
> >it's expected that the user not be using old kernels that can't give
> >correct semantics. But in the case of these socket calls there's no
> >semantic difference or reason for us to be preferring the 'new'
> >calls. It's just a duplicate API for the same thing.
>
> One way to implement it would be to favor the new syscalls but to set
> some variable the first time one of them returns ENOSYS. Once that
> happens, either all of them could fall back to socketcall or just
> that one syscall could.

...right, a global. Which requires a barrier to access it. A barrier
costs a lot more than a few loads or a switch.

> Or you could just avoid implementing it and see if anyone complains.
> It's plausible that xdg-app might start requiring the new syscalls
> (although it would presumably not kill you if it tried to use
> socketcall).
>
> Alex, if glibc started using the new syscalls, would you want to
> require them inside xdg-app?

I don't see any reason to require them except forcing policy. And I
don't see any reason for adding them to the kernel to begin with.
While we would have been better off with proper syscalls for each one
rather than this multiplexed mess if it had been done right from the
beginning, having to support both is even worse than the existing
multiplexed socketcall.

Rich
* Re: Re: Using direct socket syscalls on x86_32 where available?
From: Andy Lutomirski @ 2015-07-28 1:04 UTC
To: Rich Felker
Cc: musl, Alexander Larsson

On Mon, Jul 27, 2015 at 5:45 PM, Rich Felker <dalias@libc.org> wrote:
> On Mon, Jul 27, 2015 at 04:56:51PM -0700, Andy Lutomirski wrote:
>> On 07/26/2015 09:59 AM, Rich Felker wrote:
>> >On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
>> >>On x86_32, the only way to call socket(2), etc. is using socketcall.
>> >>This is slated to change in Linux 4.3:
>> >>
>> >>https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
>> >>
>> >>If userspace adapts by preferring the direct syscalls when available,
>> >>it'll make it easier for seccomp to filter new userspace programs
>> >>(and, ideally, eventually disallow socketcall for sandbox-aware code).
>> >>
>> >>Would musl be willing to detect these syscalls and use them if available?
>> >>
>> >>(Code to do this probably shouldn't be committed until that change
>> >>lands in Linus' tree, just in case the syscall numbers change in the
>> >>meantime.)
>> >
>> >My preference would be not to do this, since it seems to be enlarging
>> >the code and pessimizing normal usage for the sake of a very special
>> >usage scenario. At the very least there would be at least one extra
>> >syscall to probe at first usage, and that probe could generate a
>> >termination on existing seccomp setups. :-p
>>
>> There will be some tiny performance benefit for newer kernels: it
>> avoids a silly indirection that has a switch statement along with six
>> stores into memory, validation of the userspace address, and then six
>> loads to pull the syscall args back out of memory. It's not a big
>> deal, but the new syscalls really will be slightly faster.
>
> Unless you're going to try the new syscalls first and fallback on
> ENOSYS every time...
>
>> >So far we don't probe and store results for any fallbacks though; we
>> >just do the fallback on error every time. This is because all of the
>> >existing fallbacks are in places where we actually want new
>> >functionality a new syscall offers, and the old ones are not able to
>> >provide it precisely but require poor emulation, and in these cases
>> >it's expected that the user not be using old kernels that can't give
>> >correct semantics. But in the case of these socket calls there's no
>> >semantic difference or reason for us to be preferring the 'new'
>> >calls. It's just a duplicate API for the same thing.
>>
>> One way to implement it would be to favor the new syscalls but to set
>> some variable the first time one of them returns ENOSYS. Once that
>> happens, either all of them could fall back to socketcall or just
>> that one syscall could.
>
> ...right, a global. Which requires a barrier to access it. A barrier
> costs a lot more than a few loads or a switch.

Not on x86, and this is as x86-specific as it gets. In fact, I bet the
totally untested code below is actually safe on pretty much any
architecture that has free C11-style relaxed loads (and this code
could even be switched to use actual C11 relaxed loads):

volatile int socket_is_okay = true;

if (socket_is_okay) {
    ret = socket(...);
    if (ret < 0) {
        if (ret == -ENOSYS) {
            socket_is_okay = false;
        } else {
            errno = -ret;
            return -1;
        }

    return ret;
} else {
    usual socketcall code here;
}

> Or you could just avoid implementing it and see if anyone complains.
> It's plausible that xdg-app might start requiring the new syscalls
> (although it would presumably not kill you if it tried to use
> socketcall).
>
> Alex, if glibc started using the new syscalls, would you want to
> require them inside xdg-app?

I don't see any reason to require them except forcing policy. And I
don't see any reason for adding them to the kernel to begin with.
While we would have been better off with proper syscalls for each one
rather than this multiplexed mess if it had been done right from the
beginning, having to support both is even worse than the existing
multiplexed socketcall.

Worse for libc implementations, certainly. On the other hand, the
ability to cleanly limit address families and such is genuinely
useful, and deployed software does it on x86_64. It's not really
possible with current kernels on x86_32, but, with these patches, it
becomes possible on x86_32 as long as libc implementations play along
and sandbox implementations are willing to force their payloads to use
new enough libc implementations.

If I were porting something like Sandstorm to x86_32 and glibc
supported the new syscalls, this would be a no-brainer for me. I'd
simply block socketcall entirely (returning -ENOSYS) in the container,
and anyone providing an app that wants to use sockets has to link
against new glibc.

Keep in mind that socket(2) with unrestricted address family is a big
attack surface and is historically full of nasty vulnerabilities.

--Andy
* Re: Re: Using direct socket syscalls on x86_32 where available?
From: Rich Felker @ 2015-07-28 1:21 UTC
To: Andy Lutomirski
Cc: musl, Alexander Larsson

On Mon, Jul 27, 2015 at 06:04:11PM -0700, Andy Lutomirski wrote:
> On Mon, Jul 27, 2015 at 5:45 PM, Rich Felker <dalias@libc.org> wrote:
> > On Mon, Jul 27, 2015 at 04:56:51PM -0700, Andy Lutomirski wrote:
> >> On 07/26/2015 09:59 AM, Rich Felker wrote:
> >> >On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
> >> >>On x86_32, the only way to call socket(2), etc. is using socketcall.
> >> >>This is slated to change in Linux 4.3:
> >> >>
> >> >>https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
> >> >>
> >> >>If userspace adapts by preferring the direct syscalls when available,
> >> >>it'll make it easier for seccomp to filter new userspace programs
> >> >>(and, ideally, eventually disallow socketcall for sandbox-aware code).
> >> >>
> >> >>Would musl be willing to detect these syscalls and use them if available?
> >> >>
> >> >>(Code to do this probably shouldn't be committed until that change
> >> >>lands in Linus' tree, just in case the syscall numbers change in the
> >> >>meantime.)
> >> >
> >> >My preference would be not to do this, since it seems to be enlarging
> >> >the code and pessimizing normal usage for the sake of a very special
> >> >usage scenario. At the very least there would be at least one extra
> >> >syscall to probe at first usage, and that probe could generate a
> >> >termination on existing seccomp setups. :-p
> >>
> >> There will be some tiny performance benefit for newer kernels: it
> >> avoids a silly indirection that has a switch statement along with six
> >> stores into memory, validation of the userspace address, and then six
> >> loads to pull the syscall args back out of memory. It's not a big
> >> deal, but the new syscalls really will be slightly faster.
> >
> > Unless you're going to try the new syscalls first and fallback on
> > ENOSYS every time...
> >
> >> >So far we don't probe and store results for any fallbacks though; we
> >> >just do the fallback on error every time. This is because all of the
> >> >existing fallbacks are in places where we actually want new
> >> >functionality a new syscall offers, and the old ones are not able to
> >> >provide it precisely but require poor emulation, and in these cases
> >> >it's expected that the user not be using old kernels that can't give
> >> >correct semantics. But in the case of these socket calls there's no
> >> >semantic difference or reason for us to be preferring the 'new'
> >> >calls. It's just a duplicate API for the same thing.
> >>
> >> One way to implement it would be to favor the new syscalls but to set
> >> some variable the first time one of them returns ENOSYS. Once that
> >> happens, either all of them could fall back to socketcall or just
> >> that one syscall could.
> >
> > ...right, a global. Which requires a barrier to access it. A barrier
> > costs a lot more than a few loads or a switch.
>
> Not on x86, and this is as x86-specific as it gets. In fact, I bet

Is x86 really the only arch that needs socketcall multiplexing? If so
that makes transitioning more attractive. I thought at least a few
others needed it too.

> the totally untested code below is actually safe on pretty much any
> architecture that has free C11-style relaxed loads (and this code
> could even be switched to use actual C11 relaxed loads):
>
> volatile int socket_is_okay = true;
>
> if (socket_is_okay) {
>     ret = socket(...);
>     if (ret < 0) {
>         if (ret == -ENOSYS) {
>             socket_is_okay = false;
>         } else {
>             errno = -ret;
>             return -1;
>         }
>
>     return ret;
> } else {
>     usual socketcall code here;
> }

This is probably workable with volatile there. Without volatile the
x86 memory model does not help you; the compiler can make
transformations that would make it unsafe even if the machine code you
expected the compiler to generate would be safe. But I still don't
like hacks like this. It's a big mess to keep it from getting used on
non-x86 where it would be invalid/unsafe.

> >> Or you could just avoid implementing it and see if anyone complains.
> >> It's plausible that xdg-app might start requiring the new syscalls
> >> (although it would presumably not kill you if it tried to use
> >> socketcall).
> >>
> >> Alex, if glibc started using the new syscalls, would you want to
> >> require them inside xdg-app?
> >
> > I don't see any reason to require them except forcing policy. And I
> > don't see any reason for adding them to the kernel to begin with.
> > While we would have been better off with proper syscalls for each one
> > rather than this multiplexed mess if it had been done right from the
> > beginning, having to support both is even worse than the existing
> > multiplexed socketcall.
>
> Worse for libc implementations, certainly. On the other hand, the
> ability to cleanly limit address families and such is genuinely
> useful, and deployed software does it on x86_64. It's not really
> possible with current kernels on x86_32, but, with these patches, it
> becomes possible on x86_32 as long as libc implementations play along
> and sandbox implementations are willing to force their payloads to use
> new enough libc implementations.
>
> If I were porting something like Sandstorm to x86_32 and glibc
> supported the new syscalls, this would be a no-brainer for me. I'd
> simply block socketcall entirely (returning -ENOSYS) in the container,
> and anyone providing an app that wants to use sockets has to link
> against new glibc.

Doing that would create a hard dependency on latest glibc and latest
kernel, which would be a show-stopper for use on Debian, etc. :-)

> Keep in mind that socket(2) with unrestricted address family is a big
> attack surface and is historically full of nasty vulnerabilities.

Yes, but this is largely the fault of distros for enabling all sorts
of ridiculous address families that nobody needs. If you just enable
inet4/6 and unix, it's not such a problem.

Anyway if x86 really is the only arch where this is needed, or if any
other stragglers are also going to be updated alongside x86, I'm open
to considering supporting the new syscalls. We just need to figure out
a reasonable way to do it.

Rich
* Re: Re: Using direct socket syscalls on x86_32 where available?
From: Andy Lutomirski @ 2015-07-28 1:38 UTC
To: Rich Felker
Cc: musl, Alexander Larsson

On Mon, Jul 27, 2015 at 6:21 PM, Rich Felker <dalias@libc.org> wrote:
> On Mon, Jul 27, 2015 at 06:04:11PM -0700, Andy Lutomirski wrote:
>> On Mon, Jul 27, 2015 at 5:45 PM, Rich Felker <dalias@libc.org> wrote:
>> > On Mon, Jul 27, 2015 at 04:56:51PM -0700, Andy Lutomirski wrote:
>> >> On 07/26/2015 09:59 AM, Rich Felker wrote:
>> >> >On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
>> >> >>On x86_32, the only way to call socket(2), etc. is using socketcall.
>> >> >>This is slated to change in Linux 4.3:
>> >> >>
>> >> >>https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
>> >> >>
>> >> >>If userspace adapts by preferring the direct syscalls when available,
>> >> >>it'll make it easier for seccomp to filter new userspace programs
>> >> >>(and, ideally, eventually disallow socketcall for sandbox-aware code).
>> >> >>
>> >> >>Would musl be willing to detect these syscalls and use them if available?
>> >> >>
>> >> >>(Code to do this probably shouldn't be committed until that change
>> >> >>lands in Linus' tree, just in case the syscall numbers change in the
>> >> >>meantime.)
>> >> >
>> >> >My preference would be not to do this, since it seems to be enlarging
>> >> >the code and pessimizing normal usage for the sake of a very special
>> >> >usage scenario. At the very least there would be at least one extra
>> >> >syscall to probe at first usage, and that probe could generate a
>> >> >termination on existing seccomp setups. :-p
>> >>
>> >> There will be some tiny performance benefit for newer kernels: it
>> >> avoids a silly indirection that has a switch statement along with six
>> >> stores into memory, validation of the userspace address, and then six
>> >> loads to pull the syscall args back out of memory. It's not a big
>> >> deal, but the new syscalls really will be slightly faster.
>> >
>> > Unless you're going to try the new syscalls first and fallback on
>> > ENOSYS every time...
>> >
>> >> >So far we don't probe and store results for any fallbacks though; we
>> >> >just do the fallback on error every time. This is because all of the
>> >> >existing fallbacks are in places where we actually want new
>> >> >functionality a new syscall offers, and the old ones are not able to
>> >> >provide it precisely but require poor emulation, and in these cases
>> >> >it's expected that the user not be using old kernels that can't give
>> >> >correct semantics. But in the case of these socket calls there's no
>> >> >semantic difference or reason for us to be preferring the 'new'
>> >> >calls. It's just a duplicate API for the same thing.
>> >>
>> >> One way to implement it would be to favor the new syscalls but to set
>> >> some variable the first time one of them returns ENOSYS. Once that
>> >> happens, either all of them could fall back to socketcall or just
>> >> that one syscall could.
>> >
>> > ...right, a global. Which requires a barrier to access it. A barrier
>> > costs a lot more than a few loads or a switch.
>>
>> Not on x86, and this is as x86-specific as it gets. In fact, I bet
>
> Is x86 really the only arch that needs socketcall multiplexing? If so
> that makes transitioning more attractive. I thought at least a few
> others needed it too.

I'll try to figure out whether there are others and submit patches.

>> the totally untested code below is actually safe on pretty much any
>> architecture that has free C11-style relaxed loads (and this code
>> could even be switched to use actual C11 relaxed loads):
>>
>> volatile int socket_is_okay = true;
>>
>> if (socket_is_okay) {
>>     ret = socket(...);
>>     if (ret < 0) {
>>         if (ret == -ENOSYS) {
>>             socket_is_okay = false;
>>         } else {
>>             errno = -ret;
>>             return -1;
>>         }
>>
>>     return ret;
>> } else {
>>     usual socketcall code here;
>> }
>
> This is probably workable with volatile there. Without volatile the
> x86 memory model does not help you; the compiler can make
> transformations that would make it unsafe even if the machine code you
> expected the compiler to generate would be safe. But I still don't
> like hacks like this. It's a big mess to keep it from getting used on
> non-x86 where it would be invalid/unsafe.

Why's it unsafe on non-x86? I think it's safe if all those volatile
accesses are replaced with standard C11 relaxed accesses. The only
thing that code requires for correctness is that a relaxed read never
returns a result that never was nor will be written.

>> >> Or you could just avoid implementing it and see if anyone complains.
>> >> It's plausible that xdg-app might start requiring the new syscalls
>> >> (although it would presumably not kill you if it tried to use
>> >> socketcall).
>> >>
>> >> Alex, if glibc started using the new syscalls, would you want to
>> >> require them inside xdg-app?
>> >
>> > I don't see any reason to require them except forcing policy. And I
>> > don't see any reason for adding them to the kernel to begin with.
>> > While we would have been better off with proper syscalls for each one
>> > rather than this multiplexed mess if it had been done right from the
>> > beginning, having to support both is even worse than the existing
>> > multiplexed socketcall.
>>
>> Worse for libc implementations, certainly. On the other hand, the
>> ability to cleanly limit address families and such is genuinely
>> useful, and deployed software does it on x86_64. It's not really
>> possible with current kernels on x86_32, but, with these patches, it
>> becomes possible on x86_32 as long as libc implementations play along
>> and sandbox implementations are willing to force their payloads to use
>> new enough libc implementations.
>>
>> If I were porting something like Sandstorm to x86_32 and glibc
>> supported the new syscalls, this would be a no-brainer for me. I'd
>> simply block socketcall entirely (returning -ENOSYS) in the container,
>> and anyone providing an app that wants to use sockets has to link
>> against new glibc.
>
> Doing that would create a hard dependency on latest glibc and latest
> kernel, which would be a show-stopper for use on Debian, etc. :-)

It only requires the payload to depend on the latest glibc, though,
and the payload might be a binary from elsewhere.

--Andy
* Re: Re: Using direct socket syscalls on x86_32 where available?
From: Szabolcs Nagy @ 2015-07-28 12:05 UTC
To: Andy Lutomirski
Cc: Rich Felker, musl, Alexander Larsson

* Andy Lutomirski <luto@amacapital.net> [2015-07-27 18:38:08 -0700]:
> On Mon, Jul 27, 2015 at 6:21 PM, Rich Felker <dalias@libc.org> wrote:
> > On Mon, Jul 27, 2015 at 06:04:11PM -0700, Andy Lutomirski wrote:
> >> On Mon, Jul 27, 2015 at 5:45 PM, Rich Felker <dalias@libc.org> wrote:
> >> > On Mon, Jul 27, 2015 at 04:56:51PM -0700, Andy Lutomirski wrote:
> >> >> On 07/26/2015 09:59 AM, Rich Felker wrote:
> >> >> >On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
> >> >> >>On x86_32, the only way to call socket(2), etc. is using socketcall.
> >> >> >>This is slated to change in Linux 4.3:
> >> >> >>
> >> >> >>https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
> >> >> >>
> >> >> >>If userspace adapts by preferring the direct syscalls when available,
> >> >> >>it'll make it easier for seccomp to filter new userspace programs
> >> >> >>(and, ideally, eventually disallow socketcall for sandbox-aware code).
> >> >> >>
> >> >> >>Would musl be willing to detect these syscalls and use them if available?

> >> the totally untested code below is actually safe on pretty much any
> >> architecture that has free C11-style relaxed loads (and this code
> >> could even be switched to use actual C11 relaxed loads):
> >>
> >> volatile int socket_is_okay = true;
> >>
> >> if (socket_is_okay) {
> >>     ret = socket(...);
> >>     if (ret < 0) {
> >>         if (ret == -ENOSYS) {
> >>             socket_is_okay = false;
> >>         } else {
> >>             errno = -ret;
> >>             return -1;
> >>         }
> >>
> >>     return ret;
> >> } else {
> >>     usual socketcall code here;
> >> }
> >
> > This is probably workable with volatile there. Without volatile the
> > x86 memory model does not help you; the compiler can make
> > transformations that would make it unsafe even if the machine code you
> > expected the compiler to generate would be safe. But I still don't
> > like hacks like this. It's a big mess to keep it from getting used on
> > non-x86 where it would be invalid/unsafe.
>
> Why's it unsafe on non-x86? I think it's safe if all those volatile
> accesses are replaced with standard C11 relaxed accesses. The only
> thing that code requires for correctness is that a relaxed read never
> returns a result that never was nor will be written.

for posix conformance you would actually need volatile sig_atomic_t
which may be smaller than int, but musl doesn't support any such arch.
i agree that relaxed memory order access is enough here.

it matters if all new socket calls are added together to the kernel
(if not, then you need one flag per syscall).

one ugliness is that there are archs with SYS_socketcall but musl
doesn't use that if SYS_socket exists.. however on i386 the fallback
is necessary.
* Re: Using direct socket syscalls on x86_32 where available? 2015-07-27 23:56 ` Andy Lutomirski 2015-07-28 0:45 ` Rich Felker @ 2015-07-28 7:44 ` Alexander Larsson 2015-07-29 12:51 ` Justin Cormack 1 sibling, 1 reply; 19+ messages in thread From: Alexander Larsson @ 2015-07-28 7:44 UTC (permalink / raw) To: Andy Lutomirski; +Cc: musl On Tue, Jul 28, 2015 at 1:56 AM, Andy Lutomirski <luto@amacapital.net> wrote: > > One way to implement it would be to favor the new syscalls but to set some > variable the first time one of them returns ENOSYS. Once that happens, > either all of them could fall back to socketcall or just that one syscall > could. > > Or you could just avoid implementing it and see if anyone complains. It's > plausible that xdg-app might start requiring the new syscalls (although it > would presumably not kill you if tried to use socketcall). > > Alex, if glibc started using the new syscalls, would you want to require > them inside xdg-app? Probably not. At this point 32bit x86 just is not interesting enough for such extra pain. We'll just not filter on address types on 32bit. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Re: Using direct socket syscalls on x86_32 where available? 2015-07-28 7:44 ` Alexander Larsson @ 2015-07-29 12:51 ` Justin Cormack 2015-07-29 18:32 ` Andy Lutomirski 0 siblings, 1 reply; 19+ messages in thread From: Justin Cormack @ 2015-07-29 12:51 UTC (permalink / raw) To: musl; +Cc: Andy Lutomirski On 28 July 2015 at 08:44, Alexander Larsson <alexander.larsson@gmail.com> wrote: > On Tue, Jul 28, 2015 at 1:56 AM, Andy Lutomirski <luto@amacapital.net> wrote: >> >> One way to implement it would be to favor the new syscalls but to set some >> variable the first time one of them returns ENOSYS. Once that happens, >> either all of them could fall back to socketcall or just that one syscall >> could. >> >> Or you could just avoid implementing it and see if anyone complains. It's >> plausible that xdg-app might start requiring the new syscalls (although it >> would presumably not kill you if tried to use socketcall). >> >> Alex, if glibc started using the new syscalls, would you want to require >> them inside xdg-app? > > Probably not. At this point 32bit x86 just is not interesting enough > for such extra pain. We'll just not filter on address types on 32bit. Why cant you write seccomp rules for socketcall too? It is just an extra register to match on (and libseccomp could perhaps be taught to make it easier). If the answer is because nobody cares about 32 bit x86 then I understand. Justin ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Re: Using direct socket syscalls on x86_32 where available? 2015-07-29 12:51 ` Justin Cormack @ 2015-07-29 18:32 ` Andy Lutomirski 2015-07-29 23:14 ` Justin Cormack 0 siblings, 1 reply; 19+ messages in thread From: Andy Lutomirski @ 2015-07-29 18:32 UTC (permalink / raw) To: Justin Cormack; +Cc: musl On Wed, Jul 29, 2015 at 5:51 AM, Justin Cormack <justin@specialbusservice.com> wrote: > On 28 July 2015 at 08:44, Alexander Larsson <alexander.larsson@gmail.com> wrote: >> On Tue, Jul 28, 2015 at 1:56 AM, Andy Lutomirski <luto@amacapital.net> wrote: >>> >>> One way to implement it would be to favor the new syscalls but to set some >>> variable the first time one of them returns ENOSYS. Once that happens, >>> either all of them could fall back to socketcall or just that one syscall >>> could. >>> >>> Or you could just avoid implementing it and see if anyone complains. It's >>> plausible that xdg-app might start requiring the new syscalls (although it >>> would presumably not kill you if tried to use socketcall). >>> >>> Alex, if glibc started using the new syscalls, would you want to require >>> them inside xdg-app? >> >> Probably not. At this point 32bit x86 just is not interesting enough >> for such extra pain. We'll just not filter on address types on 32bit. > > Why cant you write seccomp rules for socketcall too? It is just an > extra register to match on (and libseccomp could perhaps be taught to > make it easier). If the answer is because nobody cares about 32 bit > x86 then I understand. With socketcall, you can filter on the call number, but you can't filter on the arguments since they're in memory. So you can block socket(2) entirely, but you can't block all but AF_INET, for example. --Andy ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Re: Using direct socket syscalls on x86_32 where available? 2015-07-29 18:32 ` Andy Lutomirski @ 2015-07-29 23:14 ` Justin Cormack 2015-07-31 23:13 ` Brad Conroy 0 siblings, 1 reply; 19+ messages in thread From: Justin Cormack @ 2015-07-29 23:14 UTC (permalink / raw) To: Andy Lutomirski; +Cc: musl On 29 July 2015 at 19:32, Andy Lutomirski <luto@amacapital.net> wrote: > On Wed, Jul 29, 2015 at 5:51 AM, Justin Cormack > <justin@specialbusservice.com> wrote: >> On 28 July 2015 at 08:44, Alexander Larsson <alexander.larsson@gmail.com> wrote: >>> On Tue, Jul 28, 2015 at 1:56 AM, Andy Lutomirski <luto@amacapital.net> wrote: >>>> >>>> One way to implement it would be to favor the new syscalls but to set some >>>> variable the first time one of them returns ENOSYS. Once that happens, >>>> either all of them could fall back to socketcall or just that one syscall >>>> could. >>>> >>>> Or you could just avoid implementing it and see if anyone complains. It's >>>> plausible that xdg-app might start requiring the new syscalls (although it >>>> would presumably not kill you if tried to use socketcall). >>>> >>>> Alex, if glibc started using the new syscalls, would you want to require >>>> them inside xdg-app? >>> >>> Probably not. At this point 32bit x86 just is not interesting enough >>> for such extra pain. We'll just not filter on address types on 32bit. >> >> Why cant you write seccomp rules for socketcall too? It is just an >> extra register to match on (and libseccomp could perhaps be taught to >> make it easier). If the answer is because nobody cares about 32 bit >> x86 then I understand. > > With socketcall, you can filter on the call number, but you can't > filter on the arguments since they're in memory. So you can block > socket(2) entirely, but you can't block all but AF_INET, for example. Oh yes I forgot the socketcall args were passed as a pointer to the real args. Yeah, not worth special casing. 
Justin ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Re: Using direct socket syscalls on x86_32 where available? 2015-07-29 23:14 ` Justin Cormack @ 2015-07-31 23:13 ` Brad Conroy 2015-08-01 0:02 ` Rich Felker 0 siblings, 1 reply; 19+ messages in thread From: Brad Conroy @ 2015-07-31 23:13 UTC (permalink / raw) To: musl On 29 July 2015 at 19:32, Andy Lutomirski <luto@amacapital.net> wrote: > On Wed, Jul 29, 2015 at 5:51 AM, Justin Cormack > <justin@specialbusservice.com> wrote: >> On 28 July 2015 at 08:44, Alexander Larsson <alexander.larsson@gmail.com> wrote: >>> On Tue, Jul 28, 2015 at 1:56 AM, Andy Lutomirski <luto@amacapital.net> wrote: >>>> >>>> One way to implement it would be to favor the new syscalls but to set some >>>> variable the first time one of them returns ENOSYS. Once that happens, >>>> either all of them could fall back to socketcall or just that one syscall >>>> could. I've had (DRY) concerns over including a copy of unistd.h for each arch. If musl used system linux include headers, this could be an ifdef. #include <linux/unistd.h> #ifdef __NR_something //use syscall #else //use socketcall #endif ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Re: Using direct socket syscalls on x86_32 where available? 2015-07-31 23:13 ` Brad Conroy @ 2015-08-01 0:02 ` Rich Felker 2015-08-01 3:32 ` Brad Conroy 0 siblings, 1 reply; 19+ messages in thread From: Rich Felker @ 2015-08-01 0:02 UTC (permalink / raw) To: musl On Fri, Jul 31, 2015 at 11:13:54PM +0000, Brad Conroy wrote: > On 29 July 2015 at 19:32, Andy Lutomirski <luto@amacapital.net> wrote: > > On Wed, Jul 29, 2015 at 5:51 AM, Justin Cormack > > <justin@specialbusservice.com> wrote: > >> On 28 July 2015 at 08:44, Alexander Larsson <alexander.larsson@gmail.com> wrote: > >>> On Tue, Jul 28, 2015 at 1:56 AM, Andy Lutomirski <luto@amacapital.net> wrote: > >>>> > >>>> One way to implement it would be to favor the new syscalls but to set some > >>>> variable the first time one of them returns ENOSYS. Once that happens, > >>>> either all of them could fall back to socketcall or just that one syscall > >>>> could. > > > I've had (DRY) concerns over including a copy of unistd.h for each arch. > If musl used system linux include headers, this could be an ifdef. > > #include <linux/unistd.h> > #ifdef __NR_something > //use syscall > #else > //use socketcall > #endif I don't follow. This is roughly what we do and it does not solve the problem because it assumes that the choice is constant for a given arch, whereas the proposal is adding alternative syscalls that are conditionally available dependent on kernel version. Supporting this would require more complex logic for which to use and runtime fallback code. Rich ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Re: Using direct socket syscalls on x86_32 where available? 2015-08-01 0:02 ` Rich Felker @ 2015-08-01 3:32 ` Brad Conroy 2015-08-01 3:47 ` Rich Felker 0 siblings, 1 reply; 19+ messages in thread From: Brad Conroy @ 2015-08-01 3:32 UTC (permalink / raw) To: musl > On Fri, Jul 31, 2015 at 11:13:54PM +0000, Brad Conroy wrote: > > On 29 July 2015 at 19:32, Andy Lutomirski <luto@amacapital.net> wrote: > > > On Wed, Jul 29, 2015 at 5:51 AM, Justin Cormack > > > <justin@specialbusservice.com> wrote: > > >> On 28 July 2015 at 08:44, Alexander Larsson <alexander.larsson@gmail.com> wrote: > > >>> On Tue, Jul 28, 2015 at 1:56 AM, Andy Lutomirski <luto@amacapital.net> wrote: > > >>>> > > >>>> One way to implement it would be to favor the new syscalls but to set some > > >>>> variable the first time one of them returns ENOSYS. Once that happens, > > >>>> either all of them could fall back to socketcall or just that one syscall > > >>>> could. > > > > > > I've had (DRY) concerns over including a copy of unistd.h for each arch. > > If musl used system linux include headers, this could be an ifdef. > > > > #include <linux/unistd.h> > > #ifdef __NR_something > > //use syscall > > #else > > //use socketcall > > #endif > > I don't follow. This is roughly what we do and it does not solve the > problem because it assumes that the choice is constant for a given > arch, whereas the proposal is adding alternative syscalls that are > conditionally available dependent on kernel version. Supporting this > would require more complex logic for which to use and runtime fallback > code. > > Rich AFAICT musl uses its own arch specific __NR_* definitions http://git.musl-libc.org/cgit/musl/tree/arch/i386/bits/syscall.h http://git.musl-libc.org/cgit/musl/tree/arch/arm/bits/syscall.h etc... 
If you replace those with <linux/unistd.h>, you will get the same defs
except that:
  - if the system kernel is newer, you will have additional syscalls
  - if the kernel is older, you won't have some that are needed for functions

Using system definitions will ensure the system supports the defined syscalls.
This will provide an automatic path for future architectures to do the same.

It looks like the current method allows musl to be built with syscalls that
may not even be supported by the (older) system kernel, and then to try to
make non-existent syscalls during runtime (rfkill comes to mind)

- R, Brad Conroy

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: Re: Using direct socket syscalls on x86_32 where available? 2015-08-01 3:32 ` Brad Conroy @ 2015-08-01 3:47 ` Rich Felker 2015-08-01 11:24 ` u-wsnj 0 siblings, 1 reply; 19+ messages in thread From: Rich Felker @ 2015-08-01 3:47 UTC (permalink / raw) To: musl On Sat, Aug 01, 2015 at 03:32:08AM +0000, Brad Conroy wrote: > > On Fri, Jul 31, 2015 at 11:13:54PM +0000, Brad Conroy wrote: > > > On 29 July 2015 at 19:32, Andy Lutomirski <luto@amacapital.net> wrote: > > > > On Wed, Jul 29, 2015 at 5:51 AM, Justin Cormack > > > > <justin@specialbusservice.com> wrote: > > > >> On 28 July 2015 at 08:44, Alexander Larsson <alexander.larsson@gmail.com> wrote: > > > >>> On Tue, Jul 28, 2015 at 1:56 AM, Andy Lutomirski <luto@amacapital.net> wrote: > > > >>>> > > > >>>> One way to implement it would be to favor the new syscalls but to set some > > > >>>> variable the first time one of them returns ENOSYS. Once that happens, > > > >>>> either all of them could fall back to socketcall or just that one syscall > > > >>>> could. > > > > > > > > > I've had (DRY) concerns over including a copy of unistd.h for each arch. > > > If musl used system linux include headers, this could be an ifdef. > > > > > > #include <linux/unistd.h> > > > #ifdef __NR_something > > > //use syscall > > > #else > > > //use socketcall > > > #endif > > > > I don't follow. This is roughly what we do and it does not solve the > > problem because it assumes that the choice is constant for a given > > arch, whereas the proposal is adding alternative syscalls that are > > conditionally available dependent on kernel version. Supporting this > > would require more complex logic for which to use and runtime fallback > > code. > > AFAICT musl uses its own arch specific __NR_* definitions > http://git.musl-libc.org/cgit/musl/tree/arch/i386/bits/syscall.h > http://git.musl-libc.org/cgit/musl/tree/arch/arm/bits/syscall.h > etc... 
> > If you replace those with <linux/unistd.h>, you will get the same defs > except that: > if the sytem kernel is newer you will have additional syscalls > if the kernel is older you won't have some that are needed for functions > > Using system definitions will ensure the system supports the defined syscalls. > This will provide an automatic path for future architectures to do the same. > > It looks like the current method allows musl to be built with syscalls that > may not even be supported by the (older) system kernel and then try to make > non-existent syscalls during runtime (rfkill comes to mind) Having behavior that depends on the kernel which was present at the time libc was built is utterly broken. In this specific case, you would end up with a libc that cannot run on older kernels. In other cases you would end up with a broken libc that's missing important functionality just because it happened to be built on a system with old kernel (headers). Neither of these is at all desirable. Generally musl supports any kernel >= 2.6.twenty-something (I'd have to check the exact version) fully, except for Linux-specific features added later which fail gracefully at runtime with ENOSYS or similar. Earlier kernels also work somewhat, theoretically back to 2.4.0 on some archs, but with degraded functionality and conformance (in particular, threads won't/can't work on pre-2.6 kernels). I can't see any reason why one would want to exchange this intentional broad compatibility for binaries that only work on particular kernel versions matching some particular build-time configuration. Rich ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Re: Using direct socket syscalls on x86_32 where available? 2015-08-01 3:47 ` Rich Felker @ 2015-08-01 11:24 ` u-wsnj 0 siblings, 0 replies; 19+ messages in thread From: u-wsnj @ 2015-08-01 11:24 UTC (permalink / raw) To: musl On Fri, Jul 31, 2015 at 11:47:46PM -0400, Rich Felker wrote: > On Sat, Aug 01, 2015 at 03:32:08AM +0000, Brad Conroy wrote: > > Using system definitions will ensure the system supports the defined syscalls. > > This will provide an automatic path for future architectures to do the same. > Having behavior that depends on the kernel which was present at the > time libc was built is utterly broken. +1 (as well as generally making any assumption that the compilation and runtime environments are and remain related - through the life time of the binary, in all of its lives on multiple computers) A different thing would be compilation of the C library on demand (say with something like tcc) and discarding at reboot. In comparison, compilation in advance and reuse seem to be much more efficient and manageable. That's why we have got a pretty stable kernel ABI, after all. (The same goes about a stable libc ABI versus the applications which musl also does right, thanks Rich) Rune ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2015-08-01 11:24 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-25 17:54 Using direct socket syscalls on x86_32 where available? Andy Lutomirski
2015-07-25 18:35 ` Szabolcs Nagy
2015-07-26 16:33   ` Justin Cormack
2015-07-26 16:59 ` Rich Felker
2015-07-27 23:56   ` Andy Lutomirski
2015-07-28  0:45     ` Rich Felker
2015-07-28  1:04       ` Andy Lutomirski
2015-07-28  1:21         ` Rich Felker
2015-07-28  1:38           ` Andy Lutomirski
2015-07-28 12:05             ` Szabolcs Nagy
2015-07-28  7:44   ` Alexander Larsson
2015-07-29 12:51     ` Justin Cormack
2015-07-29 18:32       ` Andy Lutomirski
2015-07-29 23:14         ` Justin Cormack
2015-07-31 23:13           ` Brad Conroy
2015-08-01  0:02             ` Rich Felker
2015-08-01  3:32               ` Brad Conroy
2015-08-01  3:47                 ` Rich Felker
2015-08-01 11:24                   ` u-wsnj
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).