[musl] fcntl: Purpose of second DUPFD

mailing list of musl libc
 help / color / mirror / code / Atom feed

* [musl] fcntl: Purpose of second DUPFD_CLOEXEC?
@ 2024-08-22 13:52 Markus Wichmann
  2024-08-22 14:48 ` Rich Felker
  0 siblings, 1 reply; 3+ messages in thread
From: Markus Wichmann @ 2024-08-22 13:52 UTC (permalink / raw)
  To: musl

Hi all,

I just stumbled upon the code to emulate F_DUPFD_CLOEXEC on old kernels.
At the moment, it looks like this:

|int ret = __syscall(SYS_fcntl, fd, F_DUPFD_CLOEXEC, arg);
|if (ret != -EINVAL) {
|	if (ret >= 0)
|		__syscall(SYS_fcntl, ret, F_SETFD, FD_CLOEXEC);
|	return __syscall_ret(ret);
|}
|ret = __syscall(SYS_fcntl, fd, F_DUPFD_CLOEXEC, 0);
|if (ret != -EINVAL) {
|	if (ret >= 0) __syscall(SYS_close, ret);
|	return __syscall_ret(-EINVAL);
|}
|ret = __syscall(SYS_fcntl, fd, F_DUPFD, arg);
|if (ret >= 0) __syscall(SYS_fcntl, ret, F_SETFD, FD_CLOEXEC);
|return __syscall_ret(ret);

And I'm wondering what the point of the second call to F_DUPFD_CLOEXEC
is. If it is just to test that the argument is valid, then why not just
get rid of the block? F_DUPFD has the same constraints on the argument.

Getting rid of that block also has the advantage of being able to factor
out the F_SETFD call. Then it would simply be

int ret = __syscall(SYS_fcntl, fd, F_DUPFD_CLOEXEC, arg);
if (ret == -EINVAL) ret = __syscall(SYS_fcntl, fd, F_DUPFD, arg);
if (ret >= 0) __syscall(SYS_fcntl, fd, F_SETFL, FD_CLOEXEC);
return __syscall_ret(ret);

Much nicer, isn't it? And it doesn't even allocate a file descriptor
uselessly.

Ciao,
Markus

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [musl] fcntl: Purpose of second DUPFD_CLOEXEC?
  2024-08-22 13:52 [musl] fcntl: Purpose of second DUPFD_CLOEXEC? Markus Wichmann
@ 2024-08-22 14:48 ` Rich Felker
  2024-08-24  9:17   ` Markus Wichmann
  0 siblings, 1 reply; 3+ messages in thread
From: Rich Felker @ 2024-08-22 14:48 UTC (permalink / raw)
  To: Markus Wichmann; +Cc: musl

On Thu, Aug 22, 2024 at 03:52:58PM +0200, Markus Wichmann wrote:
> Hi all,
> 
> I just stumbled upon the code to emulate F_DUPFD_CLOEXEC on old kernels.
> At the moment, it looks like this:
> 
> |int ret = __syscall(SYS_fcntl, fd, F_DUPFD_CLOEXEC, arg);
> |if (ret != -EINVAL) {
> |	if (ret >= 0)
> |		__syscall(SYS_fcntl, ret, F_SETFD, FD_CLOEXEC);
> |	return __syscall_ret(ret);
> |}
> |ret = __syscall(SYS_fcntl, fd, F_DUPFD_CLOEXEC, 0);
> |if (ret != -EINVAL) {
> |	if (ret >= 0) __syscall(SYS_close, ret);
> |	return __syscall_ret(-EINVAL);
> |}
> |ret = __syscall(SYS_fcntl, fd, F_DUPFD, arg);
> |if (ret >= 0) __syscall(SYS_fcntl, ret, F_SETFD, FD_CLOEXEC);
> |return __syscall_ret(ret);
> 
> And I'm wondering what the point of the second call to F_DUPFD_CLOEXEC
> is. If it is just to test that the argument is valid, then why not just
> get rid of the block?

It's to test if F_DUPFD_CLOEXEC failed with EINVAL because it's an
unknown fcntl command or because arg exceeds the fd limit.
Unfortunately, the error is ambiguous.

> F_DUPFD has the same constraints on the argument.

That's a good point -- F_DUPFD can be expected to fail (thus not
introduce any fd leak race condition) as long as the limit does not
change concurrently with the fcntl call. But if the limit does change
concurrently, the test result may be outdated by the time it's used.
So I think it may make sense to try the fallback directly.

> Getting rid of that block also has the advantage of being able to factor
> out the F_SETFD call. Then it would simply be
> 
> int ret = __syscall(SYS_fcntl, fd, F_DUPFD_CLOEXEC, arg);
> if (ret == -EINVAL) ret = __syscall(SYS_fcntl, fd, F_DUPFD, arg);
> if (ret >= 0) __syscall(SYS_fcntl, fd, F_SETFL, FD_CLOEXEC);
> return __syscall_ret(ret);
> 
> Much nicer, isn't it? And it doesn't even allocate a file descriptor
> uselessly.

Seems like it. I'll wait for further feedback but this might be the
right thing to do.

Note that this factoring-out by inverting the conditions can be done
even without dropping the arg=0 second check.

Rich

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [musl] fcntl: Purpose of second DUPFD_CLOEXEC?
  2024-08-22 14:48 ` Rich Felker
@ 2024-08-24  9:17   ` Markus Wichmann
  0 siblings, 0 replies; 3+ messages in thread
From: Markus Wichmann @ 2024-08-24  9:17 UTC (permalink / raw)
  To: musl

Am Thu, Aug 22, 2024 at 10:48:53AM -0400 schrieb Rich Felker:
> That's a good point -- F_DUPFD can be expected to fail (thus not
> introduce any fd leak race condition) as long as the limit does not
> change concurrently with the fcntl call. But if the limit does change
> concurrently, the test result may be outdated by the time it's used.
> So I think it may make sense to try the fallback directly.
>

If the limit is lowered concurrently, then the DUPFD either comes in
before that and everything works out, or all calls return -EINVAL. If
the limit is raised concurrently, then in both versions of this, there
are timings where musl spuriously allocates a non-CLOEXEC file
descriptor for a short time (as in, the kernel could have given us a
CLOEXEC FD from the start). I don't think you can win in this. But in
the original version, there are also timings where you allocate an FD,
close it, and return EINVAL.

From the point of view of the caller, calling fcntl(F_DUPFD_CLOEXEC)
while simultaneously changing the FD limit results in either an EINVAL
return or a file descriptor with CLOEXEC flag. And that happens also in
both versions.

Ciao,
Markus

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-08-24  9:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-22 13:52 [musl] fcntl: Purpose of second DUPFD_CLOEXEC? Markus Wichmann
2024-08-22 14:48 ` Rich Felker
2024-08-24  9:17   ` Markus Wichmann

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).