* Difficulty emulating F_DUPFD_CLOEXEC
@ 2013-03-24  1:59 Rich Felker
  2013-03-24  2:09 ` Rich Felker
  2013-03-24  2:10 ` Zvi Gilboa
  0 siblings, 2 replies; 11+ messages in thread
From: Rich Felker @ 2013-03-24 1:59 UTC (permalink / raw)
To: musl

Hi again,

Old kernels lack fcntl F_DUPFD_CLOEXEC, which musl needs internally
and wants to provide to applications. Thus, I'd like to emulate it
like we do for pipe2, dup3, socket SOCK_CLOEXEC, etc.; the emulation
has a race condition that leaks fds, but it's better than nothing.

The problem I'm having is how to detect the case where the kernel
lacks F_DUPFD_CLOEXEC. For other atomic close-on-exec operations, we
either have ENOSYS (newly added syscall) or an unambiguous EINVAL. But
for fcntl, we could get EINVAL because F_DUPFD_CLOEXEC is not
recognized by the kernel, or because the argument is larger than
OPEN_MAX. So we need a test for the cause behind EINVAL.

False positives are not an option, because if we wrongly detect that
F_DUPFD_CLOEXEC was not supported, we would emulate it with the code
that has race conditions and fd leaks, even on new kernels which
should not have these problems.

The best idea I have so far is:

1. Try F_DUPFD_CLOEXEC. If it succeeds or fails with any error other
   than EINVAL, we're done.

2. If it fails with EINVAL, retry with an argument of 0. This will
   eliminate the other cause of EINVAL, so now we should get EINVAL
   only on old kernels that lack F_DUPFD_CLOEXEC. If this test
   succeeds, we need to close the new duplicate fd we made (on the
   wrong fd number) and report EINVAL back to the caller.

3. If the test in step 2 failed, F_DUPFD_CLOEXEC is unsupported, and
   we have to use the fallback code with race conditions.

This uglifies fcntl.c a bit more, but I think it works. Does the above
reasoning make sense? Any other ideas?

Rich

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  1:59 Difficulty emulating F_DUPFD_CLOEXEC Rich Felker
@ 2013-03-24  2:09 ` Rich Felker
  2013-03-24  2:12   ` Rich Felker
  2013-03-24  2:10 ` Zvi Gilboa
  1 sibling, 1 reply; 11+ messages in thread
From: Rich Felker @ 2013-03-24 2:09 UTC (permalink / raw)
To: musl

[-- Attachment #1: Type: text/plain, Size: 217 bytes --]

On Sat, Mar 23, 2013 at 09:59:23PM -0400, Rich Felker wrote:
> This uglifies fcntl.c a bit more, but I think it works. Does the above
> reasoning make sense? Any other ideas?

Proposed patch attached. Untested.

Rich

[-- Attachment #2: fcntl.diff --]
[-- Type: text/plain, Size: 779 bytes --]

diff --git a/src/fcntl/fcntl.c b/src/fcntl/fcntl.c
index fb7806a..3e7da2a 100644
--- a/src/fcntl/fcntl.c
+++ b/src/fcntl/fcntl.c
@@ -22,5 +22,17 @@ int fcntl(int fd, int cmd, ...)
 		if (ret) return __syscall_ret(ret);
 		return ex.type == F_OWNER_PGRP ? -ex.pid : ex.pid;
 	}
+	if (cmd == F_DUPFD_CLOEXEC) {
+		int ret = __syscall(SYS_fcntl, fd, F_DUPFD_CLOEXEC, arg);
+		if (ret != -EINVAL) return __syscall_ret(ret);
+		ret = __syscall(SYS_fcntl, fd, F_DUPFD_CLOEXEC, 0);
+		if (ret != -EINVAL) {
+			if (ret >= 0) __syscall(SYS_close, ret);
+			return __syscall_ret(-EINVAL);
+		}
+		ret = __syscall(SYS_fcntl, fd, F_DUPFD, arg);
+		if (ret >= 0) __syscall(SYS_fcntl, ret, F_SETFD, FD_CLOEXEC);
+		return __syscall_ret(ret);
+	}
 	return syscall(SYS_fcntl, fd, cmd, arg);
 }
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:09 ` Rich Felker
@ 2013-03-24  2:12   ` Rich Felker
  0 siblings, 0 replies; 11+ messages in thread
From: Rich Felker @ 2013-03-24 2:12 UTC (permalink / raw)
To: musl

On Sat, Mar 23, 2013 at 10:09:10PM -0400, Rich Felker wrote:
> +	if (cmd == F_DUPFD_CLOEXEC) {
> +		int ret = __syscall(SYS_fcntl, fd, F_DUPFD_CLOEXEC, arg);
> +		if (ret != -EINVAL) return __syscall_ret(ret);

It might be desirable, even if F_DUPFD_CLOEXEC succeeded, to also call
fcntl(ret, F_SETFD, FD_CLOEXEC)... Reportedly somebody broke
F_DUPFD_CLOEXEC somewhere around 3.7.0 so that it silently failed to
set the close-on-exec flag. Anybody know how many kernels are
affected?

Rich
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  1:59 Difficulty emulating F_DUPFD_CLOEXEC Rich Felker
  2013-03-24  2:09 ` Rich Felker
@ 2013-03-24  2:10 ` Zvi Gilboa
  2013-03-24  2:14   ` Szabolcs Nagy
  ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Zvi Gilboa @ 2013-03-24 2:10 UTC (permalink / raw)
To: musl

On 03/23/2013 09:59 PM, Rich Felker wrote:
> [...]
>
> This uglifies fcntl.c a bit more, but I think it works. Does the above
> reasoning make sense? Any other ideas?

In the hope that this matches the project's spirit... how about
running these tests during the build, and have a script (or a simple
test program) #define whether the target architecture supports
F_DUPFD_CLOEXEC or not? Potentially, this test could be added at the
very end of alltypes.h.sh

> Rich
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:10 ` Zvi Gilboa
@ 2013-03-24  2:14   ` Szabolcs Nagy
  0 siblings, 0 replies; 11+ messages in thread
From: Szabolcs Nagy @ 2013-03-24 2:14 UTC (permalink / raw)
To: musl

* Zvi Gilboa <zg7s@eservices.virginia.edu> [2013-03-23 22:10:10 -0400]:
> > This uglifies fcntl.c a bit more, but I think it works. Does the above
> > reasoning make sense? Any other ideas?
>
> In the hope that this matches the project's spirit... how about
> running these tests during the build, and have a script (or a simple
> test program) #define whether the target architecture supports
> F_DUPFD_CLOEXEC or not? Potentially, this test could be added at
> the very end of alltypes.h.sh

the kernel can be changed under libc, and statically linked binaries
should work on different machines, so you cannot detect this at build
time
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:10 ` Zvi Gilboa
  2013-03-24  2:14   ` Szabolcs Nagy
@ 2013-03-24  2:17   ` Rich Felker
  2013-03-24  2:27     ` Zvi Gilboa
  2013-03-24 23:51   ` Rob Landley
  2 siblings, 1 reply; 11+ messages in thread
From: Rich Felker @ 2013-03-24 2:17 UTC (permalink / raw)
To: musl

On Sat, Mar 23, 2013 at 10:10:10PM -0400, Zvi Gilboa wrote:
> In the hope that this matches the project's spirit... how about
> running these tests during the build, and have a script (or a simple
> test program) #define whether the target architecture supports
> F_DUPFD_CLOEXEC or not? Potentially, this test could be added at
> the very end of alltypes.h.sh

It's not a matter of the architecture. It's a matter of the kernel
version. F_DUPFD_CLOEXEC was not available until late in the 2.6
series, and musl aims to "mostly work" even with early 2.6, and to
"partly work" (at least for single-threaded programs that don't use
modern features) even on 2.4. For dynamic linking, it could make sense
to have a slimmer version of libc.so that only supports up-to-date
kernels, but for static linking, it's really frustrating to have
binaries that break on older kernels, even if it is the broken
kernel's fault.

If the lack of these features were just breaking _apps_ that use them,
it would be one thing, but several of the very-new atomic
close-on-exec interfaces are needed internally in musl for some core
functionality -- things like dns lookups, popen, system, etc. Thus
failure to emulate them when the kernel doesn't have working versions
could badly break even "simple" apps that would otherwise be expected
to work even on old kernels.

Rich
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:17 ` Rich Felker
@ 2013-03-24  2:27   ` Zvi Gilboa
  2013-03-24  2:33     ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Zvi Gilboa @ 2013-03-24 2:27 UTC (permalink / raw)
To: musl

On 03/23/2013 10:17 PM, Rich Felker wrote:
> [...]
>
> If the lack of these features were just breaking _apps_ that use them,
> it would be one thing, but several of the very-new atomic
> close-on-exec interfaces are needed internally in musl for some core
> functionality -- things like dns lookups, popen, system, etc. Thus
> failure to emulate them when the kernel doesn't have working versions
> could badly break even "simple" apps that would otherwise be expected
> to work even on old kernels.

... thanks, that makes perfect sense. As a second-best try: would it
make sense to run the long feature test just once, during startup (and
save its result to some global variable), instead of inside fcntl.c?

> Rich
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:27 ` Zvi Gilboa
@ 2013-03-24  2:33   ` Rich Felker
  2013-03-24  2:57     ` Zvi Gilboa
  0 siblings, 1 reply; 11+ messages in thread
From: Rich Felker @ 2013-03-24 2:33 UTC (permalink / raw)
To: musl

On Sat, Mar 23, 2013 at 10:27:14PM -0400, Zvi Gilboa wrote:
> [...]
>
> .... thanks, that makes perfect sense. As a second-best try: would it
> make sense to run the long feature test just once, during startup (and
> save its result to some global variable), instead of inside fcntl.c?

The long test only runs when the fcntl syscall returns -EINVAL. So it
should never happen at all on a modern kernel with correct usage. The
tradeoff for saving the result of the test is that, by doing so, you
would get slightly improved performance on old kernels that lack
F_DUPFD_CLOEXEC, or when making a stupid fcntl call that would return
EINVAL even on a new kernel, but you would add more global data.

One byte of data should not really matter, but in a sense it does,
because you have to worry that the linking order might put it in the
middle of an otherwise-unused page and thus increase the memory usage
of the whole process by a page. This is actually an issue we need to
address -- libc.so's memory usage is a lot higher than it should be
right now, because the global vars that get touched by EVERY process
at startup time are spread out across several pages. Ordering things
so that they all end up in the same page would fix it, but I'm still
looking for the cleanest way to do that..

Anyway, I think it's best not to save the results of fallback tests
like this. There's also an issue of making sure the saved result is
properly synchronized, which I haven't even touched on yet..

Rich
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:33 ` Rich Felker
@ 2013-03-24  2:57   ` Zvi Gilboa
  2013-03-24  3:08     ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Zvi Gilboa @ 2013-03-24 2:57 UTC (permalink / raw)
To: musl

On 03/23/2013 10:33 PM, Rich Felker wrote:
> [...]
>
> One byte of data should not really matter, but in a sense it does
> because you have to worry that the linking order might put it in the
> middle of an otherwise-unused page and thus increase the memory usage
> of the whole process by a page. This is actually an issue we need to
> address -- libc.so's memory usage is a lot higher than it should be
> right now, because the global vars that get touched by EVERY process
> at startup time are spread out across several pages. Ordering things
> so that they all end up in the same page would fix it, but I'm still
> looking for the cleanest way to do that..

... could a structure holding all global variables be considered a
clean way to achieve this? That would obviously mandate some code
changes (adding the extern declaration to the relevant modules, and
accordingly referring to global_vars.var_one instead of just to
var_one), but at least "guarantee" that they are all kept together on
the same page (or two or three)...

> Anyway, I think it's best not to save the results of fallback tests
> like this. There's also an issue of making sure the saved result is
> properly synchronized, which I haven't even touched on yet..
>
> Rich
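[Editor's illustration: the struct-of-globals suggestion might be sketched as follows. The field names are invented for illustration; as the reply below notes, musl's existing analogue is the "libc" struct in src/internal/libc.h.]

```c
#include <stddef.h>

/* Sketch of the suggestion above: gather the always-touched globals
 * into one struct so the linker cannot scatter them across pages.
 * Field names are invented; musl's real analogue is the "libc"
 * struct in src/internal/libc.h. */
struct global_vars {
    int canceldisable;
    int fallback_state;
    char *progname;
};

/* One definition in one translation unit; every other module uses an
 * extern declaration and writes global_vars.canceldisable instead of
 * a bare canceldisable. */
struct global_vars global_vars;
```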
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:57 ` Zvi Gilboa
@ 2013-03-24  3:08   ` Rich Felker
  0 siblings, 0 replies; 11+ messages in thread
From: Rich Felker @ 2013-03-24 3:08 UTC (permalink / raw)
To: musl

On Sat, Mar 23, 2013 at 10:57:51PM -0400, Zvi Gilboa wrote:
> .... could a structure holding all global variables be considered a
> clean way to achieve this? That would obviously mandate some code
> changes (adding the extern declaration to the relevant modules, and
> accordingly referring to global_vars.var_one instead of just to
> var_one), but at least "guarantee" that they are all kept together
> on the same page (or two or three)...

That's what we do for some data that's known to be needed in _all_
programs (see the "libc" structure in internal/libc.[ch]), but it's
problematic for static linking if the data might or might not be
needed, and you want to let the linker omit it when it's not needed.

So far I've seen two half-decent approaches:

1. Put all the almost-surely-touched data in .data rather than .bss.
   The amount of .data is small enough that it all fits in one page
   anyway. The trade-off here is that .data takes space on disk (even
   if it's zero-filled), so you increase the size of libc.so and
   static-linked binaries slightly on disk.

2. Reorder the linking/archiving of .o files so that the ones with
   almost-surely-touched data appear first in the link order. The
   trade-off here is that the source tree and/or Makefile becomes a
   bit uglier.

Of these, I tend to prefer the first. However, it still might not be
enough for fully optimal layout. For example, we'd like the
likely-to-be-used parts of .bss to immediately follow .data (in the
same page): things like the buffers for stdin and stdout, which are
not _necessarily_ used, but fairly likely to be used. Other big .bss
buffers like pthread_key_create tables, which are unlikely to be used,
should be at the end in hopes that the last few pages of .bss won't be
dirtied at all.

Rich
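[Editor's illustration: approach 1 can be demonstrated with the classic `edata`/`end` layout symbols documented in end(3). The section attribute is a GCC/Clang extension and the variable names are invented; this is a sketch of the section-placement idea, not musl code.]

```c
/* ELF layout symbols provided by the linker; see end(3).
 * edata marks the end of initialized data (.data), end marks the
 * end of uninitialized data (.bss). */
extern char edata, end;

/* A zero-initialized global normally lands in .bss... */
static int cold_flag;

/* ...but forcing it into .data (approach 1 above) keeps it adjacent
 * to the other initialized globals, at the cost of a few bytes on
 * disk in the executable image. */
static int hot_flag __attribute__((section(".data")));

int hot_in_data(void)
{
    return (char *)&hot_flag < &edata;
}

int cold_in_bss(void)
{
    return (char *)&cold_flag >= &edata && (char *)&cold_flag < &end;
}
```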
* Re: Difficulty emulating F_DUPFD_CLOEXEC
  2013-03-24  2:10 ` Zvi Gilboa
@ 2013-03-24 23:51   ` Rob Landley
  2 siblings, 0 replies; 11+ messages in thread
From: Rob Landley @ 2013-03-24 23:51 UTC (permalink / raw)
To: musl; +Cc: musl

On 03/23/2013 09:10:10 PM, Zvi Gilboa wrote:
> In the hope that this matches the project's spirit... how about
> running these tests during the build, and have a script (or a simple
> test program) #define whether the target architecture supports
> F_DUPFD_CLOEXEC or not? Potentially, this test could be added at the
> very end of alltypes.h.sh

The system you build on and the system you run on aren't always
identical. (Especially when you try to build portable static multicall
binaries.)

Rob
end of thread, other threads:[~2013-03-24 23:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-24  1:59 Difficulty emulating F_DUPFD_CLOEXEC Rich Felker
2013-03-24  2:09 ` Rich Felker
2013-03-24  2:12   ` Rich Felker
2013-03-24  2:10 ` Zvi Gilboa
2013-03-24  2:14   ` Szabolcs Nagy
2013-03-24  2:17   ` Rich Felker
2013-03-24  2:27     ` Zvi Gilboa
2013-03-24  2:33       ` Rich Felker
2013-03-24  2:57         ` Zvi Gilboa
2013-03-24  3:08           ` Rich Felker
2013-03-24 23:51   ` Rob Landley
Code repositories for project(s) associated with this public inbox:
https://git.vuxu.org/mirror/musl/