* Difficulty emulating F_DUPFD_CLOEXEC
@ 2013-03-24  1:59 Rich Felker
  2013-03-24  2:09 ` Rich Felker
  2013-03-24  2:10 ` Zvi Gilboa
  0 siblings, 2 replies; 11+ messages in thread
From: Rich Felker @ 2013-03-24 1:59 UTC (permalink / raw)
To: musl

Hi again,

Old kernels lack fcntl F_DUPFD_CLOEXEC, which musl needs internally
and wants to provide to applications. Thus, I'd like to emulate it
like we do for pipe2, dup3, socket SOCK_CLOEXEC, etc.; the emulation
has a race condition that leaks fds, but it's better than nothing.

The problem I'm having is how to detect the case where the kernel
lacks F_DUPFD_CLOEXEC. For other atomic close-on-exec operations, we
either have ENOSYS (newly added syscall) or an unambiguous EINVAL. But
for fcntl, we could get EINVAL because F_DUPFD_CLOEXEC is not
recognized by the kernel, or because the argument is larger than
OPEN_MAX. So we need a test for the cause behind EINVAL.

False positives are not an option, because if we wrongly detect that
F_DUPFD_CLOEXEC was not supported, we would emulate it with the code
that has race conditions and fd leaks, even on new kernels which
should not have these problems.

The best idea I have so far is:

1. Try F_DUPFD_CLOEXEC. If it succeeds or fails with any error other
   than EINVAL, we're done.

2. If it fails with EINVAL, retry with an argument of 0. This will
   eliminate the other cause of EINVAL, so now we should get EINVAL
   only on old kernels that lack F_DUPFD_CLOEXEC. If this test
   succeeds, we need to close the new duplicate fd we made (on the
   wrong fd number) and report EINVAL back to the caller.

3. If the test in step 2 failed, F_DUPFD_CLOEXEC is unsupported, and
   we have to use the fallback code with race conditions.

This uglifies fcntl.c a bit more, but I think it works. Does the above
reasoning make sense? Any other ideas?

Rich

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  1:59 Difficulty emulating F_DUPFD_CLOEXEC Rich Felker
@ 2013-03-24  2:09 ` Rich Felker
  2013-03-24  2:12   ` Rich Felker
  2013-03-24  2:10 ` Zvi Gilboa
  1 sibling, 1 reply; 11+ messages in thread
From: Rich Felker @ 2013-03-24 2:09 UTC (permalink / raw)
To: musl

[-- Attachment #1: Type: text/plain, Size: 217 bytes --]

On Sat, Mar 23, 2013 at 09:59:23PM -0400, Rich Felker wrote:
> This uglifies fcntl.c a bit more, but I think it works. Does the above
> reasoning make sense? Any other ideas?

Proposed patch attached. Untested.

Rich

[-- Attachment #2: fcntl.diff --]
[-- Type: text/plain, Size: 779 bytes --]

diff --git a/src/fcntl/fcntl.c b/src/fcntl/fcntl.c
index fb7806a..3e7da2a 100644
--- a/src/fcntl/fcntl.c
+++ b/src/fcntl/fcntl.c
@@ -22,5 +22,17 @@ int fcntl(int fd, int cmd, ...)
 		if (ret) return __syscall_ret(ret);
 		return ex.type == F_OWNER_PGRP ? -ex.pid : ex.pid;
 	}
+	if (cmd == F_DUPFD_CLOEXEC) {
+		int ret = __syscall(SYS_fcntl, fd, F_DUPFD_CLOEXEC, arg);
+		if (ret != -EINVAL) return __syscall_ret(ret);
+		ret = __syscall(SYS_fcntl, fd, F_DUPFD_CLOEXEC, 0);
+		if (ret != -EINVAL) {
+			if (ret >= 0) __syscall(SYS_close, ret);
+			return __syscall_ret(-EINVAL);
+		}
+		ret = __syscall(SYS_fcntl, fd, F_DUPFD, arg);
+		if (ret >= 0) __syscall(SYS_fcntl, ret, F_SETFD, FD_CLOEXEC);
+		return __syscall_ret(ret);
+	}
 	return syscall(SYS_fcntl, fd, cmd, arg);
 }
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:09 ` Rich Felker
@ 2013-03-24  2:12   ` Rich Felker
  0 siblings, 0 replies; 11+ messages in thread
From: Rich Felker @ 2013-03-24 2:12 UTC (permalink / raw)
To: musl

On Sat, Mar 23, 2013 at 10:09:10PM -0400, Rich Felker wrote:
> +	if (cmd == F_DUPFD_CLOEXEC) {
> +		int ret = __syscall(SYS_fcntl, fd, F_DUPFD_CLOEXEC, arg);
> +		if (ret != -EINVAL) return __syscall_ret(ret);

It might be desirable, even if F_DUPFD_CLOEXEC succeeded, to also call
fcntl(ret, F_SETFD, FD_CLOEXEC)... Reportedly somebody broke
F_DUPFD_CLOEXEC somewhere around 3.7.0 so that it silently failed to
set the close-on-exec flag. Anybody know how many kernels are
affected?

Rich
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  1:59 Difficulty emulating F_DUPFD_CLOEXEC Rich Felker
  2013-03-24  2:09 ` Rich Felker
@ 2013-03-24  2:10 ` Zvi Gilboa
  2013-03-24  2:14   ` Szabolcs Nagy
  ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Zvi Gilboa @ 2013-03-24 2:10 UTC (permalink / raw)
To: musl

On 03/23/2013 09:59 PM, Rich Felker wrote:
> [...]
>
> This uglifies fcntl.c a bit more, but I think it works. Does the above
> reasoning make sense? Any other ideas?

In the hope that this matches the project's spirit... how about
running these tests during the build, and have a script (or a simple
test program) #define whether the target architecture supports
F_DUPFD_CLOEXEC or not? Potentially, this test could be added at the
very end of alltypes.h.sh

> Rich
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:10 ` Zvi Gilboa
@ 2013-03-24  2:14   ` Szabolcs Nagy
  0 siblings, 0 replies; 11+ messages in thread
From: Szabolcs Nagy @ 2013-03-24 2:14 UTC (permalink / raw)
To: musl

* Zvi Gilboa <zg7s@eservices.virginia.edu> [2013-03-23 22:10:10 -0400]:
> > This uglifies fcntl.c a bit more, but I think it works. Does the above
> > reasoning make sense? Any other ideas?
>
> In the hope that this matches the project's spirit... how about
> running these tests during the build, and have a script (or a simple
> test program) #define whether the target architecture supports
> F_DUPFD_CLOEXEC or not? Potentially, this test could be added at
> the very end of alltypes.h.sh

the kernel can be changed under libc, and statically linked binaries
should work on different machines, so you cannot detect this at build
time
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:10 ` Zvi Gilboa
  2013-03-24  2:14   ` Szabolcs Nagy
@ 2013-03-24  2:17   ` Rich Felker
  2013-03-24  2:27     ` Zvi Gilboa
  2013-03-24 23:51   ` Rob Landley
  2 siblings, 1 reply; 11+ messages in thread
From: Rich Felker @ 2013-03-24 2:17 UTC (permalink / raw)
To: musl

On Sat, Mar 23, 2013 at 10:10:10PM -0400, Zvi Gilboa wrote:
> In the hope that this matches the project's spirit... how about
> running these tests during the build, and have a script (or a simple
> test program) #define whether the target architecture supports
> F_DUPFD_CLOEXEC or not? Potentially, this test could be added at
> the very end of alltypes.h.sh

It's not a matter of the architecture. It's a matter of the kernel
version. F_DUPFD_CLOEXEC was not available until late in the 2.6
series, and musl aims to "mostly work" even with early 2.6, and to
"partly work" (at least for single-threaded programs that don't use
modern features) even on 2.4. For dynamic linking, it could make sense
to have a slimmer version of libc.so that only supports up-to-date
kernels, but for static linking, it's really frustrating to have
binaries that break on older kernels, even if it is the broken
kernel's fault.

If the lack of these features were just breaking _apps_ that use them,
it would be one thing, but several of the very-new atomic
close-on-exec interfaces are needed internally in musl for some core
functionality -- things like dns lookups, popen, system, etc. Thus
failure to emulate them when the kernel doesn't have working versions
could badly break even "simple" apps that would otherwise be expected
to work even on old kernels.

Rich
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:17 ` Rich Felker
@ 2013-03-24  2:27   ` Zvi Gilboa
  2013-03-24  2:33     ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Zvi Gilboa @ 2013-03-24 2:27 UTC (permalink / raw)
To: musl

On 03/23/2013 10:17 PM, Rich Felker wrote:
> [...]
>
> If the lack of these features were just breaking _apps_ that use them,
> it would be one thing, but several of the very-new atomic
> close-on-exec interfaces are needed internally in musl for some core
> functionality -- things like dns lookups, popen, system, etc. Thus
> failure to emulate them when the kernel doesn't have working versions
> could badly break even "simple" apps that would otherwise be expected
> to work even on old kernels.

... thanks, that makes perfect sense. As a second-best try: would it
make sense to run the long feature test just once, during startup (and
save its result to some global variable), instead of inside fcntl.c?

> Rich
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:27 ` Zvi Gilboa
@ 2013-03-24  2:33   ` Rich Felker
  2013-03-24  2:57     ` Zvi Gilboa
  0 siblings, 1 reply; 11+ messages in thread
From: Rich Felker @ 2013-03-24 2:33 UTC (permalink / raw)
To: musl

On Sat, Mar 23, 2013 at 10:27:14PM -0400, Zvi Gilboa wrote:
> [...]
>
> .... thanks, that makes perfect sense. As a second-best try: would it
> make sense to run the long feature test just once, during startup (and
> save its result to some global variable), instead of inside fcntl.c?

The long test only runs when the fcntl syscall returns -EINVAL. So it
should never happen at all on a modern kernel with correct usage. The
tradeoff for saving the result of the test is that, by doing so, you
would get slightly improved performance on old kernels that lack
F_DUPFD_CLOEXEC, or when making a stupid fcntl call that would return
EINVAL even on a new kernel, but you would add more global data.

One byte of data should not really matter, but in a sense it does,
because you have to worry that the linking order might put it in the
middle of an otherwise-unused page and thus increase the memory usage
of the whole process by a page. This is actually an issue we need to
address -- libc.so's memory usage is a lot higher than it should be
right now, because the global vars that get touched by EVERY process
at startup time are spread out across several pages. Ordering things
so that they all end up in the same page would fix it, but I'm still
looking for the cleanest way to do that..

Anyway, I think it's best not to save the results of fallback tests
like this. There's also an issue of making sure the saved result is
properly synchronized, which I haven't even touched on yet..

Rich
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:33 ` Rich Felker
@ 2013-03-24  2:57   ` Zvi Gilboa
  2013-03-24  3:08     ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Zvi Gilboa @ 2013-03-24 2:57 UTC (permalink / raw)
To: musl

On 03/23/2013 10:33 PM, Rich Felker wrote:
> [...]
>
> One byte of data should not really matter, but in a sense it does
> because you have to worry that the linking order might put it in the
> middle of an otherwise-unused page and thus increase the memory usage
> of the whole process by a page. This is actually an issue we need to
> address -- libc.so's memory usage is a lot higher than it should be
> right now, because the global vars that get touched by EVERY process
> at startup time are spread out across several pages. Ordering things
> so that they all end up in the same page would fix it, but I'm still
> looking for the cleanest way to do that..

... could a structure holding all global variables be considered a
clean way to achieve this? That would obviously mandate some code
changes (adding the extern declaration to the relevant modules, and
accordingly referring to global_vars.var_one instead of just to
var_one), but at least "guarantee" that they are all kept together on
the same page (or two or three)...

> Anyway, I think it's best not to save the results of fallback tests
> like this. There's also an issue of making sure the saved result is
> properly synchronized, which I haven't even touched on yet..
>
> Rich
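[Editor's illustration: the struct-of-globals suggestion might be sketched as follows. The field names are invented for illustration; as the reply below notes, musl's existing analogue is the "libc" struct in src/internal/libc.h.]

```c
#include <stddef.h>

/* Sketch of the suggestion above: gather the always-touched globals
 * into one struct so the linker cannot scatter them across pages.
 * Field names are invented; musl's real analogue is the "libc"
 * struct in src/internal/libc.h. */
struct global_vars {
    int canceldisable;
    int fallback_state;
    char *progname;
};

/* One definition in one translation unit; every other module uses an
 * extern declaration and writes global_vars.canceldisable instead of
 * a bare canceldisable. */
struct global_vars global_vars;
```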
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:57 ` Zvi Gilboa
@ 2013-03-24  3:08   ` Rich Felker
  0 siblings, 0 replies; 11+ messages in thread
From: Rich Felker @ 2013-03-24 3:08 UTC (permalink / raw)
To: musl

On Sat, Mar 23, 2013 at 10:57:51PM -0400, Zvi Gilboa wrote:
> .... could a structure holding all global variables be considered a
> clean way to achieve this? That would obviously mandate some code
> changes (adding the extern declaration to the relevant modules, and
> accordingly referring to global_vars.var_one instead of just to
> var_one), but at least "guarantee" that they are all kept together
> on the same page (or two or three)...

That's what we do for some data that's known to be needed in _all_
programs (see the "libc" structure in internal/libc.[ch]), but it's
problematic for static linking if the data might or might not be
needed, and you want to let the linker omit it when it's not needed.

So far I've seen two half-decent approaches:

1. Put all the almost-surely-touched data in .data rather than .bss.
   The amount of .data is small enough that it all fits in one page
   anyway. The trade-off here is that .data takes space on disk (even
   if it's zero-filled), so you increase the size of libc.so and
   static-linked binaries slightly on disk.

2. Reorder the linking/archiving of .o files so that the ones with
   almost-surely-touched data appear first in the link order. The
   trade-off here is that the source tree and/or Makefile becomes a
   bit uglier.

Of these, I tend to prefer the first. However, it still might not be
enough for fully optimal layout. For example, we'd like the
likely-to-be-used parts of .bss to immediately follow .data (in the
same page): things like the buffers for stdin and stdout, which are
not _necessarily_ used, but fairly likely to be used. Other big .bss
buffers like pthread_key_create tables, which are unlikely to be used,
should be at the end in hopes that the last few pages of .bss won't be
dirtied at all.

Rich
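[Editor's illustration: approach 1 can be demonstrated with the classic `edata`/`end` layout symbols documented in end(3). The section attribute is a GCC/Clang extension and the variable names are invented; this is a sketch of the section-placement idea, not musl code.]

```c
/* ELF layout symbols provided by the linker; see end(3).
 * edata marks the end of initialized data (.data), end marks the
 * end of uninitialized data (.bss). */
extern char edata, end;

/* A zero-initialized global normally lands in .bss... */
static int cold_flag;

/* ...but forcing it into .data (approach 1 above) keeps it adjacent
 * to the other initialized globals, at the cost of a few bytes on
 * disk in the executable image. */
static int hot_flag __attribute__((section(".data")));

int hot_in_data(void)
{
    return (char *)&hot_flag < &edata;
}

int cold_in_bss(void)
{
    return (char *)&cold_flag >= &edata && (char *)&cold_flag < &end;
}
```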
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:10 ` Zvi Gilboa
@ 2013-03-24 23:51   ` Rob Landley
  2 siblings, 0 replies; 11+ messages in thread
From: Rob Landley @ 2013-03-24 23:51 UTC (permalink / raw)
To: musl; +Cc: musl

On 03/23/2013 09:10:10 PM, Zvi Gilboa wrote:
> In the hope that this matches the project's spirit... how about
> running these tests during the build, and have a script (or a simple
> test program) #define whether the target architecture supports
> F_DUPFD_CLOEXEC or not? Potentially, this test could be added at the
> very end of alltypes.h.sh

The system you build on and the system you run on aren't always
identical. (Especially when you try to build portable static multicall
binaries.)

Rob
end of thread, other threads:[~2013-03-24 23:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-24  1:59 Difficulty emulating F_DUPFD_CLOEXEC Rich Felker
2013-03-24  2:09 ` Rich Felker
2013-03-24  2:12   ` Rich Felker
2013-03-24  2:10 ` Zvi Gilboa
2013-03-24  2:14   ` Szabolcs Nagy
2013-03-24  2:17   ` Rich Felker
2013-03-24  2:27     ` Zvi Gilboa
2013-03-24  2:33       ` Rich Felker
2013-03-24  2:57         ` Zvi Gilboa
2013-03-24  3:08           ` Rich Felker
2013-03-24 23:51   ` Rob Landley
Code repositories for project(s) associated with this public inbox:
https://git.vuxu.org/mirror/musl/