mailing list of musl libc
 help / color / mirror / Atom feed
* [musl] Why is setrlimit() considered to have per-thread effect?
@ 2020-10-15  5:01 Alexey Izbyshev
  2020-10-15  8:50 ` Szabolcs Nagy
  2020-10-15 15:43 ` Rich Felker
  0 siblings, 2 replies; 10+ messages in thread
From: Alexey Izbyshev @ 2020-10-15  5:01 UTC (permalink / raw)
  To: musl

Hello,

Commit 544ee752cd[1] claims that setrlimit() is per-thread on Linux, 
similarly to setxid() calls, so it should be called via __synccall(). 
But this appears to be wrong: the kernel code operates on 
tsk->signal[2], which is a per-thread-group structure. Glibc doesn't 
call setrlimit() for each thread either. Am I missing something?

Tangentially, setgroups() is not called via __synccall(), though it does 
have per-thread effect. Is this intentional?

Alexey

[1] 
https://git.musl-libc.org/cgit/musl/commit/?id=544ee752cd38febfa3aa3798b4dfb6fabd13846b
[2] https://elixir.bootlin.com/linux/v5.9/source/kernel/sys.c#L1566

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Why is setrlimit() considered to have per-thread effect?
  2020-10-15  5:01 [musl] Why is setrlimit() considered to have per-thread effect? Alexey Izbyshev
@ 2020-10-15  8:50 ` Szabolcs Nagy
  2020-10-15 15:49   ` Rich Felker
  2020-10-15 15:50   ` Alexey Izbyshev
  2020-10-15 15:43 ` Rich Felker
  1 sibling, 2 replies; 10+ messages in thread
From: Szabolcs Nagy @ 2020-10-15  8:50 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

* Alexey Izbyshev <izbyshev@ispras.ru> [2020-10-15 08:01:00 +0300]:
> Commit 544ee752cd[1] claims that setrlimit() is per-thread on Linux,
> similarly to setxid() calls, so it should be called via __synccall(). But
> this appears to be wrong: the kernel code operates on tsk->signal[2], which
> is a per-thread-group structure. Glibc doesn't call setrlimit() for each
> thread either. Am I missing something?

note that prlimit does not have synccall in
musl: the kernel implemented the per process
rlimit setting when prlimit was added.
(i think this is linux commit
 1c1e618ddd15f69fd87ccea596769f78c8065504 )

but older kernels don't have that.

> 
> Tangentially, setgroups() is not called via __synccall(), though it does
> have per-thread effect. Is this intentional?

that may be a bug, but it's not a posix api
so not a conformance issue, but a linux issue:
if other linux libcs don't do synccall then
that's the defacto interface contract.

> 
> Alexey
> 
> [1] https://git.musl-libc.org/cgit/musl/commit/?id=544ee752cd38febfa3aa3798b4dfb6fabd13846b
> [2] https://elixir.bootlin.com/linux/v5.9/source/kernel/sys.c#L1566

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Why is setrlimit() considered to have per-thread effect?
  2020-10-15  5:01 [musl] Why is setrlimit() considered to have per-thread effect? Alexey Izbyshev
  2020-10-15  8:50 ` Szabolcs Nagy
@ 2020-10-15 15:43 ` Rich Felker
  1 sibling, 0 replies; 10+ messages in thread
From: Rich Felker @ 2020-10-15 15:43 UTC (permalink / raw)
  To: musl

On Thu, Oct 15, 2020 at 08:01:00AM +0300, Alexey Izbyshev wrote:
> Hello,
> 
> Commit 544ee752cd[1] claims that setrlimit() is per-thread on Linux,
> similarly to setxid() calls, so it should be called via
> __synccall(). But this appears to be wrong: the kernel code operates
> on tsk->signal[2], which is a per-thread-group structure. Glibc
> doesn't call setrlimit() for each thread either. Am I missing
> something?

POSIX specifies that it sets the limits for the process. If the kernel
doesn't do that, we have to implement in userspace.

> Tangentially, setgroups() is not called via __synccall(), though it
> does have per-thread effect. Is this intentional?

POSIX doesn't define setgroups, so it's up to the implementation.
Conceptually since POSIX has supplemental groups they probably
*should* be forced process-global, so maybe we should change this.

For an example that's more clear-cut, setfs[ug]id is explicitly not
process-global because it's not a security boundary and the whole
purpose is to be able to do local id changes then revert them for the
sake of performing access as a different user/group.

Rich

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Why is setrlimit() considered to have per-thread effect?
  2020-10-15  8:50 ` Szabolcs Nagy
@ 2020-10-15 15:49   ` Rich Felker
  2020-10-15 16:13     ` Alexey Izbyshev
  2020-10-15 15:50   ` Alexey Izbyshev
  1 sibling, 1 reply; 10+ messages in thread
From: Rich Felker @ 2020-10-15 15:49 UTC (permalink / raw)
  To: musl; +Cc: Alexey Izbyshev

On Thu, Oct 15, 2020 at 10:50:24AM +0200, Szabolcs Nagy wrote:
> * Alexey Izbyshev <izbyshev@ispras.ru> [2020-10-15 08:01:00 +0300]:
> > Commit 544ee752cd[1] claims that setrlimit() is per-thread on Linux,
> > similarly to setxid() calls, so it should be called via __synccall(). But
> > this appears to be wrong: the kernel code operates on tsk->signal[2], which
> > is a per-thread-group structure. Glibc doesn't call setrlimit() for each
> > thread either. Am I missing something?
> 
> note that prlimit does not have synccall in
> musl: the kernel implemented the per process
> rlimit setting when prlimit was added.
> (i think this is linux commit
>  1c1e618ddd15f69fd87ccea596769f78c8065504 )
> 
> but older kernels don't have that.

setrlimit implemented in terms of prlimit does; as far as I can tell
prlimit does not perform any process-global action itself but just
lets you target different tasks. This means we *could* "optimize"
setrlimit to skip __synccall and instead just iterate over the thread
list and SYS_prlimit each one from the calling thread context.

The prlimit function on the other hand behaves as the Linux syscall
and lets you set thread-specific limits.

> > Tangentially, setgroups() is not called via __synccall(), though it does
> > have per-thread effect. Is this intentional?
> 
> that may be a bug, but it's not a posix api
> so not a conformance issue, but a linux issue:
> if other linux libcs don't do synccall then
> that's the defacto interface contract.

I'm not sure about that. We should probably examine whether there's
any intent to change just one thread in real world usage, as well as
whether other systems (e.g. BSDs) do the POSIX-like behavior and make
it process-global. If so I'm inclined to change it to be more
consistent.

Rich

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Why is setrlimit() considered to have per-thread effect?
  2020-10-15  8:50 ` Szabolcs Nagy
  2020-10-15 15:49   ` Rich Felker
@ 2020-10-15 15:50   ` Alexey Izbyshev
  2020-10-15 20:05     ` Rich Felker
  1 sibling, 1 reply; 10+ messages in thread
From: Alexey Izbyshev @ 2020-10-15 15:50 UTC (permalink / raw)
  To: musl

On 2020-10-15 11:50, Szabolcs Nagy wrote:
> note that prlimit does not have synccall in
> musl: the kernel implemented the per process
> rlimit setting when prlimit was added.
> (i think this is linux commit
>  1c1e618ddd15f69fd87ccea596769f78c8065504 )
> 
> but older kernels don't have that.
> 
Ah, thank you for checking that, though the transition appear to have 
happened much earlier than the commit you referenced (which is not 
relevant), in pre-git epoch between 2.6.9 and 2.6.10[1, 2]. I was 
confused because Linux man pages never mention that and explicitly say 
"Resource limits are per-process attributes that are shared by all of 
the threads in a process."[3], but I should have checked old sources.

>> Tangentially, setgroups() is not called via __synccall(), though it 
>> does
>> have per-thread effect. Is this intentional?
> 
> that may be a bug, but it's not a posix api
> so not a conformance issue, but a linux issue:
> if other linux libcs don't do synccall then
> that's the defacto interface contract.
> 
FWIW, glibc does synccall since 2011: 
https://sourceware.org/git/?p=glibc.git;a=commit;h=70181fddf14

[1] https://elixir.bootlin.com/linux/v2.6.9/source/kernel/sys.c#L1537
[2] https://elixir.bootlin.com/linux/v2.6.10/source/kernel/sys.c#L1487
[3] https://man7.org/linux/man-pages/man2/setrlimit.2.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Why is setrlimit() considered to have per-thread effect?
  2020-10-15 15:49   ` Rich Felker
@ 2020-10-15 16:13     ` Alexey Izbyshev
  2020-10-15 17:13       ` Rich Felker
  0 siblings, 1 reply; 10+ messages in thread
From: Alexey Izbyshev @ 2020-10-15 16:13 UTC (permalink / raw)
  To: musl

On 2020-10-15 18:49, Rich Felker wrote:
> setrlimit implemented in terms of prlimit does; as far as I can tell
> prlimit does not perform any process-global action itself but just
> lets you target different tasks. This means we *could* "optimize"
> setrlimit to skip __synccall and instead just iterate over the thread
> list and SYS_prlimit each one from the calling thread context.
> 
> The prlimit function on the other hand behaves as the Linux syscall
> and lets you set thread-specific limits.
> 
But in my understanding, prlimit() sets process- (not thread-) specific 
limits, and have done so since its introduction[1]. The code operates on 
"signal" structure which is shared between threads of a thread group. 
Further, an earlier commit[2] explicitly says that "...rlimit are
per process and not per-thread.". It's true that in pre-2.6.10 kernels 
setrlimit() operated in per-thread limits (see my reply to Szabolcs), 
but it's not related to prlimit() syscall, which was added much later.

To be clear, I did not propose to optimize setrlimit() in my initial 
email, I was just surprised that synccall() is needed at all. But if we 
want optimization, it seems that trying prlimit() first and falling back 
to synccall() in case of ENOSYS would be what we want.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c022a0acad534fd5f5d5f17280f6d4d135e74e81
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7855c35da7ba16b389d17710401c4a55a3ea2102


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Why is setrlimit() considered to have per-thread effect?
  2020-10-15 16:13     ` Alexey Izbyshev
@ 2020-10-15 17:13       ` Rich Felker
  2020-10-15 18:26         ` Alexey Izbyshev
  0 siblings, 1 reply; 10+ messages in thread
From: Rich Felker @ 2020-10-15 17:13 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Thu, Oct 15, 2020 at 07:13:30PM +0300, Alexey Izbyshev wrote:
> On 2020-10-15 18:49, Rich Felker wrote:
> >setrlimit implemented in terms of prlimit does; as far as I can tell
> >prlimit does not perform any process-global action itself but just
> >lets you target different tasks. This means we *could* "optimize"
> >setrlimit to skip __synccall and instead just iterate over the thread
> >list and SYS_prlimit each one from the calling thread context.
> >
> >The prlimit function on the other hand behaves as the Linux syscall
> >and lets you set thread-specific limits.
> >
> But in my understanding, prlimit() sets process- (not thread-)
> specific limits, and have done so since its introduction[1]. The

That was not my understanding, but it may be true. I would not assume
it's true just because of the word "process" in a commit message or
comment since kernel folks, especially in that era, regularly used
"process" and "thread" interchangibly/inconsistently.

> code operates on "signal" structure which is shared between threads
> of a thread group. Further, an earlier commit[2] explicitly says
> that "...rlimit are
> per process and not per-thread.". It's true that in pre-2.6.10
> kernels setrlimit() operated in per-thread limits (see my reply to
> Szabolcs), but it's not related to prlimit() syscall, which was
> added much later.
> 
> To be clear, I did not propose to optimize setrlimit() in my initial
> email, I was just surprised that synccall() is needed at all. But if
> we want optimization, it seems that trying prlimit() first and
> falling back to synccall() in case of ENOSYS would be what we want.

If correct, I agree -- we can avoid the need for __synccall when
prlimit works. I'd like to find commits or source lines supporting
that in their actual (code) content though rather than just as a
mention in commit messages, since it's contrary to what my (probably
outdated) understanding of how rlimits worked was.

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c022a0acad534fd5f5d5f17280f6d4d135e74e81
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7855c35da7ba16b389d17710401c4a55a3ea2102

Somewhat off-topic, but for some reason that second link is bringing
my browser to a crawl swapping, despite the commit being tiny when I
view it locally in my kernel tree. Weird.

Rich

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Why is setrlimit() considered to have per-thread effect?
  2020-10-15 17:13       ` Rich Felker
@ 2020-10-15 18:26         ` Alexey Izbyshev
  2020-10-15 20:03           ` Rich Felker
  0 siblings, 1 reply; 10+ messages in thread
From: Alexey Izbyshev @ 2020-10-15 18:26 UTC (permalink / raw)
  To: musl

On 2020-10-15 20:13, Rich Felker wrote:
> On Thu, Oct 15, 2020 at 07:13:30PM +0300, Alexey Izbyshev wrote:
> If correct, I agree -- we can avoid the need for __synccall when
> prlimit works. I'd like to find commits or source lines supporting
> that in their actual (code) content though rather than just as a
> mention in commit messages, since it's contrary to what my (probably
> outdated) understanding of how rlimits worked was.
> 
Here they are (the first two were referenced in my reply to Szabolcs).

* Change of setrlimit() to operate on signal_struct in 2.6.10: 
https://elixir.bootlin.com/linux/v2.6.10/source/kernel/sys.c#L1487 
(compare with 
https://elixir.bootlin.com/linux/v2.6.9/source/kernel/sys.c#L1537)

* Definition of signal_struct in 2.6.10, which is per-thread-group 
(apart from "rlim", it contains many other thread-group-related fields): 
https://elixir.bootlin.com/linux/v2.6.10/source/include/linux/sched.h#L268

* Usage if signal_struct in 2.6.36 (the first kernel with prlimit()) in 
do_prlimit(), which is a common function implementing setrlimit(), 
getrlimit() and prlimit(): 
https://elixir.bootlin.com/linux/v2.6.36/source/kernel/sys.c#L1333

Finally, I performed a simple experiment: on 2.6.30 kernel (with glibc 
2.5), created a thread and changed RLIMIT_FSIZE via setrlimit(). After 
that, "/proc/pid/limits" reported the new limit, so it was applied to 
the whole process. Strace confirmed that only a single setrlimit() 
system call was performed.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Why is setrlimit() considered to have per-thread effect?
  2020-10-15 18:26         ` Alexey Izbyshev
@ 2020-10-15 20:03           ` Rich Felker
  0 siblings, 0 replies; 10+ messages in thread
From: Rich Felker @ 2020-10-15 20:03 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Thu, Oct 15, 2020 at 09:26:33PM +0300, Alexey Izbyshev wrote:
> On 2020-10-15 20:13, Rich Felker wrote:
> >On Thu, Oct 15, 2020 at 07:13:30PM +0300, Alexey Izbyshev wrote:
> >If correct, I agree -- we can avoid the need for __synccall when
> >prlimit works. I'd like to find commits or source lines supporting
> >that in their actual (code) content though rather than just as a
> >mention in commit messages, since it's contrary to what my (probably
> >outdated) understanding of how rlimits worked was.
> >
> Here they are (the first two were referenced in my reply to Szabolcs).
> 
> * Change of setrlimit() to operate on signal_struct in 2.6.10:
> https://elixir.bootlin.com/linux/v2.6.10/source/kernel/sys.c#L1487
> (compare with
> https://elixir.bootlin.com/linux/v2.6.9/source/kernel/sys.c#L1537)
> 
> * Definition of signal_struct in 2.6.10, which is per-thread-group
> (apart from "rlim", it contains many other thread-group-related
> fields): https://elixir.bootlin.com/linux/v2.6.10/source/include/linux/sched.h#L268
> 
> * Usage if signal_struct in 2.6.36 (the first kernel with prlimit())
> in do_prlimit(), which is a common function implementing
> setrlimit(), getrlimit() and prlimit():
> https://elixir.bootlin.com/linux/v2.6.36/source/kernel/sys.c#L1333
> 
> Finally, I performed a simple experiment: on 2.6.30 kernel (with
> glibc 2.5), created a thread and changed RLIMIT_FSIZE via
> setrlimit(). After that, "/proc/pid/limits" reported the new limit,
> so it was applied to the whole process. Strace confirmed that only a
> single setrlimit() system call was performed.

Excellent, thanks for doing this research! I'll adjust setrlimit to
use __synccall only in the fallback where SYS_prlimit fails.

Rich

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] Why is setrlimit() considered to have per-thread effect?
  2020-10-15 15:50   ` Alexey Izbyshev
@ 2020-10-15 20:05     ` Rich Felker
  0 siblings, 0 replies; 10+ messages in thread
From: Rich Felker @ 2020-10-15 20:05 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Thu, Oct 15, 2020 at 06:50:41PM +0300, Alexey Izbyshev wrote:
> >>Tangentially, setgroups() is not called via __synccall(), though
> >>it does
> >>have per-thread effect. Is this intentional?
> >
> >that may be a bug, but it's not a posix api
> >so not a conformance issue, but a linux issue:
> >if other linux libcs don't do synccall then
> >that's the defacto interface contract.
> >
> FWIW, glibc does synccall since 2011:
> https://sourceware.org/git/?p=glibc.git;a=commit;h=70181fddf14

Thanks, looks like we should do the same here.

Rich

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-10-15 20:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-15  5:01 [musl] Why is setrlimit() considered to have per-thread effect? Alexey Izbyshev
2020-10-15  8:50 ` Szabolcs Nagy
2020-10-15 15:49   ` Rich Felker
2020-10-15 16:13     ` Alexey Izbyshev
2020-10-15 17:13       ` Rich Felker
2020-10-15 18:26         ` Alexey Izbyshev
2020-10-15 20:03           ` Rich Felker
2020-10-15 15:50   ` Alexey Izbyshev
2020-10-15 20:05     ` Rich Felker
2020-10-15 15:43 ` Rich Felker

mailing list of musl libc

This inbox may be cloned and mirrored by anyone:

	git clone --mirror http://inbox.vuxu.org/musl

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 musl musl/ http://inbox.vuxu.org/musl \
		musl@inbox.vuxu.org
	public-inbox-index musl

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.vuxu.org/vuxu.archive.musl


code repositories for the project(s) associated with this inbox:

	https://git.vuxu.org/mirror/musl/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git