* [9front] Re: [9fans] monotonic time and randomness on Plan 9
[not found] ` <CADSkJJXo2s-ttzB+3S0Nb_GQH35Fq9kjNrodUOEO6a8qngTfSA@mail.gmail.com>
@ 2025-03-12 10:50 ` hiro
2025-03-12 11:51 ` [9front] " Jamie McClymont
2025-03-12 12:28 ` [9front] " Dan Cross
0 siblings, 2 replies; 27+ messages in thread
From: hiro @ 2025-03-12 10:50 UTC (permalink / raw)
To: 9front
I'd like to answer Russ, but I'm scared to post to 9fans without
consulting here first.
A second pair of eyes to censor/revise my bullshit would be welcome:
> Like rfork(2) or sbrk(2) or other system calls, I argue that these
> specific operations (read time and read entropy) have become so fundamental
> to modern programs that they should be available without the ceremony and
> conceptual overhead of managing long-term fds.
Wouldn't high-accuracy timing, scheduling information, probing of entropy,
etc., in these times of pipelining and multithreading, be security-critical
and thus require access control, granularity limits, and rate limits?
Are there no other paths left for optimizing per-process file descriptor
table maintenance?
I remember Linux people postponed the problem of running out of FDs by
switching to OS nesting via virtualization (a Python expert I met claimed
this to be a fact).
On plan9, can't we already share FDs between multiple processes? Maybe we
can make FD sharing more granular? Allow managing sets of FDs to be shared
and unshared easily. Bonus: not just at fork but also post-fork.
I realise, done right, this approaches a file hierarchy. Which we already
have. And we can tunnel all access through a single FD, via 9p.
Maybe make 9p the new syscall.
On Mon, Mar 10, 2025 at 7:41 PM Russ Cox <rsc@swtch.com> wrote:
> Thanks for the feedback so far. Correcting cinap's email address on this
> reply.
>
> I don't believe having a #c/magicfds file is a net win. Certainly if it is
> writable / configurable, then much of the benefit is lost. The benefit is
> not having to do set up like opening files and holding an fd open at
> process startup, even temporarily. Like rfork(2) or sbrk(2) or other system
> calls, I argue that these specific operations (read time and read entropy)
> have become so fundamental to modern programs that they should be available
> without the ceremony and conceptual overhead of managing long-term fds. A
> readable one might be useful as documentation, but if it's a hard-coded set
> then the kread(2) man page should suffice.
>
> As for user-space time access, it is true that on 386 and amd64 we could
> use RDTSC and saved clock parameters. We've actually moved away from that
> on basically every operating system at this point, because the parameters
> do change, and we need to know when to update them. Or the kernel would
> have to share a memory page with the parameters with the user process, and
> then that layout becomes part of the kernel interface. Of course we could
> add shared code that the kernel leaves to be called as well. All of that
> seems too complex and bespoke. (On Linux that's exactly what happens, but I
> repeat myself.) And I don't know what we'd do instead on arm.
>
> The Blue Gene system call batching also seems a bit much to me, especially
> since (1) we want to not have to keep the fd open, (2) we don't want to
> hard-code the target pointer for the read, and (3) ideally we don't want to
> go through a per-process setup dance that includes opening files each time
> a process is created. I wouldn't mind the per-process setup if (1) and (2)
> were not problems as well.
>
> Best,
> Russ
* Re: [9front] [9fans] monotonic time and randomness on Plan 9
2025-03-12 10:50 ` [9front] Re: [9fans] monotonic time and randomness on Plan 9 hiro
@ 2025-03-12 11:51 ` Jamie McClymont
2025-03-12 12:36 ` hiro
2025-03-12 12:28 ` [9front] " Dan Cross
1 sibling, 1 reply; 27+ messages in thread
From: Jamie McClymont @ 2025-03-12 11:51 UTC (permalink / raw)
To: 9front
> Wouldn't high-accuracy timing, scheduling information, probing of entropy, etc., in these times of pipelining and multithreading, be security-critical and thus require access control, granularity limits, and rate limits?
IMO, no.
In the case of random number generation: we know how to make good random number generators where it is computationally impossible to determine the internal state from the output values. If we are not using one of those, it’s something we can and should fix, rather than designing kernel interfaces around the need to block access to a broken RNG.
In the case of high-accuracy timing, we already expose RDTSC to userspace, which enables you to measure how long something takes, which is all that is necessary to exploit speculative-execution attacks. If it were not available, a process could approximate it well enough by having a second thread increment a shared counter in a loop.
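To make that concrete, here is a minimal sketch of such a counter-thread
clock in Plan 9 C (my illustration, nothing shipped; it assumes thread(2)'s
proccreate, and the counter is uncalibrated):

    #include <u.h>
    #include <libc.h>
    #include <thread.h>

    ulong ticks;    /* shared, deliberately unsynchronised */

    /* one proc does nothing but advance the counter */
    void
    spinner(void*)
    {
            for(;;)
                    ticks++;
    }

    void
    threadmain(int, char**)
    {
            ulong t0, t1;

            proccreate(spinner, nil, 4096);
            t0 = ticks;
            /* ... code being timed ... */
            t1 = ticks;
            print("%lud ticks elapsed (uncalibrated)\n", t1 - t0);
            threadexits(nil);
    }

The deltas are only meaningful relative to one another, but that is already
enough for the timing side channels in question.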
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 10:50 ` [9front] Re: [9fans] monotonic time and randomness on Plan 9 hiro
2025-03-12 11:51 ` [9front] " Jamie McClymont
@ 2025-03-12 12:28 ` Dan Cross
2025-03-12 13:15 ` hiro
2025-03-12 15:27 ` Kurt H Maier
1 sibling, 2 replies; 27+ messages in thread
From: Dan Cross @ 2025-03-12 12:28 UTC (permalink / raw)
To: 9front
On Wed, Mar 12, 2025 at 6:53 AM hiro <23hiro@gmail.com> wrote:
> I'd like to answer Russ, but I'm scared to post to 9fans without consulting here first.
> A second pair of eyes to censor/revise my bullshit would be welcome:
I hope that no one is scared to post to 9fans. Healthy communities
need to allow for healthy and robust debate about technical matters.
As long as it's presented in a collegial manner, no one should be
afraid to post.
I'll try to address some of your specific points, but confession: I
haven't been following the discussion in 9fans, so may have some of
the details wrong.
> > Like rfork(2) or sbrk(2) or other system calls, I argue that these specific operations (read time and read entropy) have become so fundamental to modern programs that they should be available without the ceremony and conceptual overhead of managing long-term fds.
>
> Wouldn't high-accuracy timing, scheduling information, probing of entropy, etc., in these times of pipelining and multithreading, be security-critical and thus require access control, granularity limits, and rate limits?
I agree with Jamie McClymont's response, here. Fundamental machine
instructions (like `RDTSC` on x86) are difficult to govern in this
way, and unless very carefully designed, such guards are easy to work
around.
Moreover, high-accuracy timing, entropy generation, and so on are very
useful for legitimate applications. Limiting access to those things
due to probably-specious security concerns cuts out the legitimate use
cases, too.
That said, the random number instructions on e.g. x86 are not always
available, and on some platforms rely on firmware to set their state
(e.g., AMD EPYC requires getting the PSP involved). Ostensibly,
programs must check for it by looking at a CPUID leaf before issuing
the instruction, but trapping and emulating in the kernel is onerous;
`/dev/random` is cleaner from that perspective. A system call may be
fine, but I'm less convinced about that than I am about a time of day
system call. Then again, I also think it's a less-frequent operation;
often programs will grab some entropy from the OS to seed their own
pseudorandom number generator.
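That seed-once pattern might look like the following sketch in Plan 9 C (it
leans on libc's srand/lrand for brevity; a real program would want a
cryptographic generator):

    #include <u.h>
    #include <libc.h>

    /* pull a few bytes of kernel entropy once, then stay in userland */
    void
    seedfromkernel(void)
    {
            long seed;
            int fd;

            fd = open("/dev/random", OREAD);
            if(fd < 0 || read(fd, &seed, sizeof seed) != sizeof seed)
                    sysfatal("no entropy: %r");
            close(fd);
            srand(seed);    /* later lrand() calls never enter the kernel */
    }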
> Are there no other paths left for optimizing per-process file descriptor table maintenance?
I'm not sure it's about optimization of file descriptor maintenance,
but rather about the implied complexity that's pushed onto each
program that wants to grab a time stamp.
> I remember Linux people postponed the problem of running out of FDs by switching to OS nesting via virtualization (a Python expert I met claimed this to be a fact).
I don't believe that's true. The original impetus for
containerization (what I presume you're referring to, though note
that's rather different than true virtualization a la multiple
independent instances of an operating system running in a virtual
machine on some physical host) was to isolate services that made
assumptions about not being co-resident with other instances of
themselves, making it possible to host multiple instances of those
services on the same physical machine. This was essential for
utilization efficiency in hyperscale data centers, though one might
reasonably argue that containers are, perhaps, not the best way to
address the problem. The concept has now morphed into a grab-bag of
different resource isolation mechanisms that can be combined to
sandbox, say, process (as in, PID), open file, and network namespaces,
in addition to the file namespace.
But I don't think people were worried about running out of file
descriptors: I just checked on a Linux machine, and the hard limit on
the maximum number of file descriptors for a single process is
configured as 2^19.
> On plan9, can't we already share FDs between multiple processes? Maybe we can make FD sharing more granular? Allow managing sets of FDs to be shared and unshared easily. Bonus: not just at fork but also post-fork.
Sure. But we can also close those file descriptors, meaning that
they'd have to be reopened if some library routine expects them to be
ever-present.
> I realise, done right, this approaches a file hierarchy. Which we already have. And we can tunnel all access through a single FD, via 9p.
>
> Maybe make 9p the new syscall.
9p isn't appropriate for everything, and I can imagine how this might
have some very unintended consequences if not carefully managed; for
instance, a program might end up accidentally reading time from
another host when it doesn't intend to.
- Dan C.
* Re: [9front] [9fans] monotonic time and randomness on Plan 9
2025-03-12 11:51 ` [9front] " Jamie McClymont
@ 2025-03-12 12:36 ` hiro
0 siblings, 0 replies; 27+ messages in thread
From: hiro @ 2025-03-12 12:36 UTC (permalink / raw)
To: 9front
> In the case of random number generation: we know how to make good random
> number generators where it is computationally impossible to determine the
> internal state from the output values. If we are not using one of those,
> it’s something we can and should fix, rather than designing kernel
> interfaces around the need to block access to a broken RNG.
In the case of the RNG I was concerned about resource exhaustion, hence
the mention of rate-limiting. It is not the amount of randomness that is
my perceived risk; it is the cost of creation.
> If it were not available, a process could approximate it well enough by
> having a second thread increment a shared counter in a loop.
Thread? You mean process? You might be able to rely on inter-process jitter
being high enough to make this a low-accuracy clock.
There is a user=none that can do very little, but if it can exhaust all
FDs, all PIDs, and all computational power, then the high accuracy of
some other process's time management might become quite meaningless (DoS).
Why do I seem to conflate timing and randomness? Both are costly.
Why is time costly? To get high-accuracy time into a process
context, you also need free computational resources (guaranteed
uninterrupted on-CPU time for the complete time-critical procedure),
locality (cache synchronization delays), and low syscall overhead (the
total delay). You can approach this with multiple techniques, adding
complexity and often resource underutilization. One example is pinning the
time-critical process to one core and spinning, polling even if nothing
needs to be done, lest the core's pipeline be corrupted by competing
workloads. I believe this way you do not need to care about scheduler and
interrupt-handling complexity, but of course it's not very optimal.
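A sketch of that pin-and-spin technique, assuming proc(3)'s "wired" ctl
message (untested; the polling body is left empty):

    #include <u.h>
    #include <libc.h>

    /* trade a whole core for never being rescheduled mid-measurement */
    void
    pinandspin(int core)
    {
            char ctl[64];
            int fd;

            snprint(ctl, sizeof ctl, "/proc/%d/ctl", getpid());
            fd = open(ctl, OWRITE);
            if(fd < 0 || fprint(fd, "wired %d", core) < 0)
                    sysfatal("wire: %r");
            close(fd);
            for(;;){
                    /* busy-poll for time-critical work; never sleep */
            }
    }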
OTOH, if your process is not time-critical, why do you need accurate time?
A lowest-accuracy yet monotonic clock seems to be the common ground that
almost nobody would like to give up. All the other stuff is at best luxury.
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 12:28 ` [9front] " Dan Cross
@ 2025-03-12 13:15 ` hiro
2025-03-12 14:59 ` Dan Cross
2025-03-12 15:02 ` ron minnich
2025-03-12 15:27 ` Kurt H Maier
1 sibling, 2 replies; 27+ messages in thread
From: hiro @ 2025-03-12 13:15 UTC (permalink / raw)
To: 9front
> As long as it's presented in a collegial manner, no one should be
> afraid to post.
I'm afraid my questions might be interpreted as statements; here,
people are used to handling my probes with less heartache (on both
sides). So I'd like to condense it first, so that Russ doesn't lose
interest in the discussion just because of my incompetence. Thank you
all for helping me with this.
> I agree with Jamie McClymont's response, here. Fundamental machine
> instructions (like `RDTSC` on x86) are difficult to govern in this
> way, and unless very carefully designed, such guards are easy to work
> around.
I wouldn't want to govern RDTSC.
RDTSC is nice because it's just a counter that is not synchronised. By
using RDTSC there is no leakage of actual time as perceived by another
process on another core. This is good.
Time is distinct in that it has a reference/synchronisation point.
That's more costly and maybe even a side-channel. That I would want to
govern.
> Moreover, high-accuracy timing, entropy generation, and so on are very
> useful for legitimate applications. Limiting access to those things
> due to probably-specious security concerns cuts out the legitimate use
> cases, too.
Agreed. Being allowed to use any (however limited) resource
whatsoever (as user none) is already questionable. The limits are not
very effective right now, but I hope it can be improved
over time, because I love the security model in Plan 9, where you start
with the least possible privilege and work yourself up (as opposed to
other systems where you mainly start as root and then drop privileges
along the way).
> That said, the random number instructions on e.g. x86 are not always
> available, and on some platforms rely on firmware to set their state
> (e.g., AMD EPYC requires getting the PSP involved). Ostensibly,
> programs must check for it by looking at a CPUID leaf before issuing
> the instruction, but trapping and emulating in the kernel is onerous;
Doing security in the kernel has some benefits, but without protection
from resource exhaustion, maybe doing the RNG in userland is worth
it for the common case? I would see a benefit in starting up the
process with existing entropy/seeds at fork and then doing the rest
of the RNG during the process lifetime in userland only.
But I'm by no means convinced of anything here.
> `/dev/random` is cleaner from that perspective. A system call may be
> fine, but I'm less convinced about that than I am about a time of day
> system call. Then again, I also think it's a less-frequent operation;
> often programs will grab some entropy from the OS to seed their own
> pseudorandom number generator.
Oh, I guess my thought is not so original. That's great :D
> > Are there no other paths left for optimizing per-process file descriptor table maintenance?
>
> I'm not sure it's about optimization of file descriptor maintenance,
> but rather about the implied complexity that's pushed onto each
> program that wants to grab a time stamp.
Basically, what we're saying here is that reading a file is too
difficult. Maybe we can fix this with another abstraction? I see files
as very valuable, for example because they have names and come in named
folder hierarchies. It's definitely worth preserving this concept,
even if we could optimize the interface, as we dreamed of doing with
batching for 9p2020.
Why do you say it's not an optimization? What is the problem exactly:
is it the complexity of a filepath, or is it the complexity of the
sequence of syscalls required to read a small file like that?
And before changing 9p and the syscall interface, remember also that
userland libraries can be used to hide the sequence behind a single
function call.
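For example, a sketch of such a wrapper (names are mine; it assumes the
four text fields cons(3) documents for /dev/time: seconds, nanoseconds,
ticks, and tick frequency):

    #include <u.h>
    #include <libc.h>

    /* hide the whole open/read/parse/close sequence behind one call */
    vlong
    timens(void)
    {
            char buf[128], *f[4];
            int fd, n;

            fd = open("/dev/time", OREAD);
            if(fd < 0)
                    return -1;
            n = read(fd, buf, sizeof buf - 1);
            close(fd);
            if(n <= 0)
                    return -1;
            buf[n] = 0;
            if(tokenize(buf, f, nelem(f)) < 2)
                    return -1;
            return strtoll(f[1], nil, 10);  /* nanoseconds since epoch */
    }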
BTW: how often do we need accurate time? If we need it once per hour
or so, opening and closing the FD repeatedly wouldn't hurt anyone, would it?
It seemed so far to be implied that the FD has to either stay open forever
or be replaced with a syscall. Nobody seemed to mention that it can be
closed and another FD reopened later on.
I clearly don't get the gist of the original problem. Are programs
that open many FDs at start somehow symptomatic of a certain kind of
low-quality software engineering trend these days that we are trying
really hard to avoid? I haven't looked! If true, are we fixing the
symptoms by cargo-culting older Unix, or are we actually fixing the
root of the problem?
> > I remember Linux people postponed the problem of running out of FDs by switching to OS nesting via virtualization (a Python expert I met claimed this to be a fact).
>
> I don't believe that's true. The original impetus for
> containerization (what I presume you're referring to, though note
> that's rather different than true virtualization a la multiple
> independent instances of an operating system running in a virtual
> machine on some physical host) was to isolate services that made
> assumptions about not being co-resident with other instances of
> themselves, making it possible to host multiple instances of those
> services on the same physical machine. This was essential for
> utilization efficiency in hyperscale data centers, though one might
> reasonably argue that containers are, perhaps, not the best way to
> address the problem. The concept has now morphed into a grab-bag of
> different resource isolation mechanisms that can be combined to
> sandbox, say, process (as in, PID), open file, and network namespaces,
> in addition to the file namespace.
No, I was indeed speaking of CPU-level virtualization. If we need 1000
FDs, but every kernel can only handle 10 FDs, you can just start 100
VMs with 10 FDs each to solve the problem. That's how my Python expert
explained it to me, sorry if it's not true ;)
Containers would only worsen this problem.
> Sure. But we can also close those file descriptors, meaning that
> they'd have to be reopened if some library routine expects them to be
> ever-present.
Exactly. I would understand if libraries chose not to close it because
processes are traditionally short-lived and the cost of an open FD is low
(at least on Linux, as you just confirmed).
> 9p isn't appropriate for everything, and I can imagine how this might
> have some very unintended consequences if not carefully managed; for
> instance, a program might end up accidentally reading time from
> another host when it doesn't intend to.
Yes, time is really a weird concept :)
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 13:15 ` hiro
@ 2025-03-12 14:59 ` Dan Cross
2025-03-14 23:33 ` hiro
2025-03-12 15:02 ` ron minnich
1 sibling, 1 reply; 27+ messages in thread
From: Dan Cross @ 2025-03-12 14:59 UTC (permalink / raw)
To: 9front
On Wed, Mar 12, 2025 at 9:20 AM hiro <23hiro@gmail.com> wrote:
> > As long as it's presented in a collegial manner, no one should be
> > afraid to post.
>
> I'm afraid my questions might be interpreted as statements; here,
> people are used to handling my probes with less heartache (on both
> sides). So I'd like to condense it first, so that Russ doesn't lose
> interest in the discussion just because of my incompetence. Thank you
> all for helping me with this.
My advice is: don't worry about that. Keep the discussion centralized
on 9fans; Russ is genuinely one of the nicest people I know, he won't
bite.
> > I agree with Jamie McClymont's response, here. Fundamental machine
> > instructions (like `RDTSC` on x86) are difficult to govern in this
> > way, and unless very carefully designed, such guards are easy to work
> > around.
>
> I wouldn't want to govern RDTSC.
>
> RDTSC is nice because it's just a counter that is not synchronised. By
> using RDTSC there is no leakage of actual time as perceived by another
> process on another core. This is good.
>
> Time is distinct in that it has a reference/synchronisation point.
> That's more costly and maybe even a side-channel. That I would want to
> govern.
But taking a timestamp and then computing an offset based on the TSC
is a fine way to _approximate_ time. It's not as accurate as having
the kernel do it for you, as Russ indicated, but if you allow the user
to read the clock at all, ever, _and_ allow them access to a timing
source that's locked to some stable reference (like the CPU clock
frequency, DVFS notwithstanding) then you've lost, as far as the
security of the ToD clock across, say, hyperthread buddy pairs goes.
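A sketch makes the fragility concrete; assume libc's cycles() (RDTSC
underneath) and a calibration triple (base, c0, hz) captured once at
startup. Nothing tells the process when the kernel steps the clock or when
hz drifts, which is exactly Russ's objection:

    #include <u.h>
    #include <libc.h>

    static vlong    base;           /* ToD in ns at calibration */
    static uvlong   c0, hz;         /* cycles at calibration; cycles/sec */

    /* approximate wall-clock nanoseconds from the cycle counter */
    vlong
    approxns(void)
    {
            uvlong d;

            cycles(&d);
            d -= c0;
            /* split the conversion so d*1e9 cannot overflow 64 bits */
            return base + (d/hz)*1000000000LL + (d%hz)*1000000000LL/hz;
    }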
> > Moreover, high-accuracy timing, entropy generation, and so on are very
> > useful for legitimate applications. Limiting access to those things
> > due to probably-specious security concerns cuts out the legitimate use
> > cases, too.
>
> Agreed. Being allowed to use any (however limited) resource
> whatsoever (as user none) is already questionable. The limits are not
> very effective right now, but I hope it can be improved
> over time, because I love the security model in Plan 9, where you start
> with the least possible privilege and work yourself up (as opposed to
> other systems where you mainly start as root and then drop privileges
> along the way).
I wouldn't describe it that way, but honestly, I think arguing against
a ToD system call on security grounds isn't going to be persuasive.
> > That said, the random number instructions on e.g. x86 are not always
> > available, and on some platforms rely on firmware to set their state
> > (e.g., AMD EPYC requires getting the PSP involved). Ostensibly,
> > programs must check for it by looking at a CPUID leaf before issuing
> > the instruction, but trapping and emulating in the kernel is onerous;
>
> Doing security in the kernel has some benefits, but without protection
> from resource exhaustion, maybe doing the RNG in userland is worth
> it for the common case?
Just so. The kernel has some limited pool of entropy that it adds to
from time to time, but if you have some need for a substantial amount
of randomness, you're better off taking matters into your own hands.
The point with mentioning RDRAND is to ask the question, "how does one
_best_ provide the programmer with the tools needed to accomplish a
task at hand?" Those tools, in turn, shape the design of programs. In
the case of entropy, if a system provides a call and documents it as
the preferred mechanism for getting some, then the design point of
"what do we do if they invoke an instruction that may or may not
exist?" becomes at least a little easier. Maybe the answer is "we give
them a little bit of entropy from our pool..." but at least the
program isn't sitting there chewing itself up generating illegal
instruction exceptions that the kernel is fielding on its behalf.
> I would see a benefit in starting up the
> process with existing entropy/seeds at fork and then doing the rest
> of the RNG during the process lifetime in userland only.
> But I'm by no means convinced of anything here.
That's an approach one could take, sure. But an issue there is that it
injects complexity into process creation in a way that is difficult to
hide from userspace; you'd essentially need a hook into an RNG library
that runs every time you `fork` and with the extant `rfork` semantics
this could lead to weird races and so on, and doesn't cover the case
where a user provides their own RNG, as opposed to using one from libc
or similar. That it doesn't compose well with the existing primitives
suggests to me that it's not a great approach in a plan9-ish context.
> > `/dev/random` is cleaner from that perspective. A system call may be
> > fine, but I'm less convinced about that than I am about a time of day
> > system call. Then again, I also think it's a less-frequent operation;
> > often programs will grab some entropy from the OS to seed their own
> > pseudorandom number generator.
>
> Oh, I guess my thought is not so original. That's great :D
>
> > > Are there no other paths left for optimizing per-process file descriptor table maintenance?
> >
> > I'm not sure it's about optimization of file descriptor maintenance,
> > but rather about the implied complexity that's pushed onto each
> > program that wants to grab a time stamp.
>
> Basically, what we're saying here is that reading a file is too
> difficult. Maybe we can fix this with another abstraction?
Well yeah, but looking at the abstractions at hand in plan9-ish style
systems, you've got basically three choices: a file, synthesized by
the operating system, that you interact with via the usual
open/close/read/write; a system call; or something involving virtual
memory (e.g., a page shared between the kernel and userspace). The
last one has very little precedent in plan9, and raises a whole host
of design questions, so I'd probably rule it out; at least for
entropy, less so for time. That leaves the first two: for entropy,
I'd argue for the file, but for time, a system call seems appropriate.
> I see files
> as very valuable, for example because they have names and come in named
> folder hierarchies. It's definitely worth preserving this concept,
> even if we could optimize the interface, as we dreamed of doing with
> batching for 9p2020.
I agree with this, but I don't think anyone is arguing for moving away
from that model, so I'm not sure how relevant it is to the question at
hand. I think a better way to phrase it is, where does one draw the
line between whether something should be provided by a file
abstraction versus a system call?
To illustrate, look at a slightly different case: process creation a
la `rfork()`. Plan 9 has always had `/proc`; one could imagine a
`#p/clone` that worked something like `/net/tcp/clone`: you open this
file and read from it, and a new process springs into existence, the
child of the current process, that is a duplicate of the parent except
that the data read is the path to a new directory representing the
child process in the parent, or the directory of the parent in the
child (or nil on failure). Perhaps one might write a set of textual
flags that represent the various flags one can pass to `rfork` to
control sharing and so forth, before reading. On the face of it, it
doesn't immediately seem like an awful interface, and I have no doubt
that something like this _could_ be built. But we _don't_ have that;
instead, `rfork()` is a system call. Why is that? I don't claim to
know definitively, but I imagine that someone might have at least
considered it once. Likely, it's too hard to get the details right
(some problems with concurrency spring to mind immediately, if the
read is into a non-stack buffer; what would it mean if I tried to
access a remote machine's `/proc/clone`; etc), and process creation is
so frequent that the overhead of interacting with the file API isn't
worth it for some imagined gain in elegance or orthogonality. Anyway,
I think the question is at least worth pondering if we're asking
whether something should be elevated to the level of a syscall. In the
`rfork` case, being a system call really is simpler and more robust
for such a fundamental operation. I believe that's the case Russ is
trying to make, as well. It will always be a tradeoff; the key is
deciding whether that tradeoff is worth it along some important
dimension.
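For concreteness, user code against that imagined interface might read as
follows. This is purely hypothetical: nothing like it exists in any Plan 9
kernel, and the flag strings are invented.

    #include <u.h>
    #include <libc.h>

    void
    clonedemo(void)
    {
            char dir[128];
            long n;
            int fd;

            fd = open("/proc/clone", ORDWR);
            if(fd < 0)
                    sysfatal("clone: %r");
            fprint(fd, "nowait nonote");        /* invented rfork-style flags */
            n = read(fd, dir, sizeof dir - 1);  /* parent and child both return here */
            if(n <= 0)
                    sysfatal("fork failed");    /* the nil-on-failure case */
            dir[n] = 0;
            /* parent: dir names the child's proc directory; child: the parent's */
            close(fd);
    }

The concurrency problem is visible in the buffer: if dir lived in shared
memory rather than on this process's stack, parent and child would race on
it after the read.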
> Why do you say it's not an optimization? What is the problem exactly:
> is it the complexity of a filepath, or is it the complexity of the
> sequence of syscalls required to read a small file like that?
Sorry, I don't think I was clear: it's not that the system call isn't
an optimization (it is), but rather that that's only part of the
concern. I suspect that the complexity of maintaining the file
descriptor _combined_ with the performance implications of not doing
so is an issue. I don't really know the context (as I mentioned, I
hadn't been paying attention to 9fans until Skip mentioned this to me
yesterday), but if we accept as a given the premise that the current
mechanism is too slow, and that it would be good to address that, then
we're left to wonder how we might go about doing that.
> And before changing 9p
Surely no one is suggesting changing 9p to do this?
> and the syscall interface, remember also that
> userland libraries can be used to hide the sequence behind a single
> function call.
Yes, but at a significant cost in overhead.
> BTW: how often do we need accurate time?
Oh, you need it All. The. Time. On busy servers, many, many times a
second. Rates in the kilohertz, spread across many cores.
Of course it depends on the application, but in a lot of production
settings, the need is almost absurdly frequent and the overhead of
even a system call is absolutely brutal, which is why there's been so
much effort in the Linux world to get that stuff into userspace: first
by doing an initial read and then incrementing off of the TSC, and
then by using e.g. the vdso.
I don't know how much it matters on plan9. I know some folks using it
for HPC-style applications have data showing that it really matters.
Go cares a lot because they do a lot with high-precision timers in the
runtime. Most 9fans probably don't need it, but on the high end you
really want to reduce the overhead as much as possible; the closer you
can get it to just being a load from RAM, the better.
> If we need it once per hour
> or so, opening and closing the FD repeatedly wouldn't hurt anyone, would it?
> It seemed so far to be implied that the FD has to either stay open forever
> or be replaced with a syscall. Nobody seemed to mention that it can be
> closed and another FD reopened later on.
I think it's being taken as a given that the overhead of
open/read/close is too high, and so if one wants to retain the file
metaphor for accessing time, that implies keeping an open FD to some
device around. But file descriptors are a process-global resource, and
can be closed by totally unrelated bits of code: there's no concept of
a "private" FD in the way that there is for "private memory", such as
a static inside of a function, or the stack segment of a process or
something. So whatever code is used to return the time has to deal
with a bunch of weird failure cases, which adds both complexity _and_
overhead. And there's the matter of what format the data should take:
traditionally, plan 9 would return text (e.g., `cat /dev/time` right
now). That adds overhead since it probably has to be parsed. If you
need nanosecond resolution for the phase-locked plesiochronous
timer that ensures consistency across your HPC cluster or whatever,
you may not be able to easily provide that if you've got to read and
parse the data. Presumably a system call just copies into a
native-order binary vlong or something.
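Presumably something like this sketch, going by cons(3)'s description of
/dev/bintime as big-endian 8-byte counters:

    #include <u.h>
    #include <libc.h>

    /* decode the leading big-endian uvlong from /dev/bintime:
     * a fixed-cost byte shuffle instead of a text parse */
    vlong
    bintimens(int fd)
    {
            uchar b[8];
            vlong t;
            int i;

            if(pread(fd, b, sizeof b, 0) != sizeof b)
                    return -1;
            t = 0;
            for(i = 0; i < 8; i++)
                    t = t<<8 | b[i];
            return t;       /* ns since epoch */
    }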
> I clearly don't get the gist of the original problem. Are programs
> that open many FDs at start somehow symptomatic of a certain kind of
> low-quality software engineering trend these days that we are trying
> really hard to avoid? I haven't looked! If true, are we fixing the
> symptoms by cargo-culting older Unix, or are we actually fixing the
> root of the problem?
I don't think it's a question of the engineering quality inherent in
one solution versus the other, nor is it a matter of merely
cargo-culting Unix/Linux to get a slap-job hack in place to address
the immediate need.
Rather, it is about identifying a specific need and the best way to
serve that need. In the context of a plan9 system, I would argue that
a system call for high precision timing is actually a pretty good
engineering tradeoff: it's not a terribly invasive change, avoids a
lot of complex (and honestly kind of weird) edge cases, and provides
an adequate, if imperfect, solution to a real-world problem.
> > > I remember Linux people postponed the problem of running out of FDs by switching to OS nesting via virtualization (a Python expert I met claimed this to be a fact).
> >
> > [snip]
>
> No, I was indeed speaking of CPU-level virtualization. If we need 1000
> FDs, but every kernel can only handle 10 FDs, you can just start 100
> VMs with 10 FDs each to solve the problem. That's how my Python expert
> explained it to me, sorry if it's not true ;)
> Containers would only worsen this problem.
Oh, that hasn't been a problem for decades now. No offense to your
Python expert, but I think they are probably mistaken.
Incidentally, modern VMs were introduced for flexibility and
utilization, much in the same way that containers were. I don't think
considerations about only being able to hold X many files open at a
time really entered into it. Sockets maybe, but I doubt it was an
upper bound on the number of socket descriptors available so much as
scalability in a particular system; e.g., POSIX (and plan 9) semantics
around file descriptor allocation practically force it to be quadratic
unless you're willing to throw a fair bit of per-process complexity at
it; if you've got a lot of network connections coming and going, that
could be a bottleneck.
> > Sure. But we can also close those file descriptors, meaning that
> > they'd have to be reopened if some library routine expects them to be
> > ever-present.
>
> Exactly. I would understand if libraries chose not to close it because
> processes are traditionally short-lived and the cost of an open FD is low
> (at least on Linux, as you just confirmed).
Yeah. It's more an issue of what happens to the library if some other
bit of code in the process yanks the FD out from under it. If some FDs
become magical, in that you start to restrict whether they can be
closed, or by who, or whatever, then how far do you push the
complexity that that approach implies? And is it worth it to avoid
just adding a system call?
> > 9p isn't appropriate for everything, and I can imagine how this might
> > have some very unintended consequences if not carefully managed; for
> > instance, a program might end up accidentally reading time from
> > another host when it doesn't intend to.
>
> Yes, time is really a weird concept :)
Ain't that the truth.
- Dan C.
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 13:15 ` hiro
2025-03-12 14:59 ` Dan Cross
@ 2025-03-12 15:02 ` ron minnich
2025-03-12 17:43 ` Dan Cross
2025-03-12 21:30 ` qwx
1 sibling, 2 replies; 27+ messages in thread
From: ron minnich @ 2025-03-12 15:02 UTC (permalink / raw)
To: 9front
I thought this was a pretty reasonable discussion.
re this one comment: "I remember Linux people postponed the problem of
running out of FDs by switching to OS nesting via virtualization (a Python
expert I met claimed this to be a fact)." -- I think that's not really how
it happened at all. VMware kicked off all kinds of interest in
virtualization, Xen pushed it up a notch, and then in 2006 KVM came along
for Linux and it took off.
I can understand a python person thinking that was what happened given
Python's penchant for using FDs (the Sugar environment on OLPC had this
problem of opening up 10,000 files on startup). But that's not really how
it went. The problems of python and fds did not drive virtualization.
I agree with Russ that time and random are special, but I disagree with
his idea that you need a special system call. I've also demonstrated you
can get what he wants without having to leave an fd allocated. But I
understand his points, and figured it was not worth pushing, because our
process private system calls / currying are unlikely to make it into all
the Plan 9 kernels out there. That said, I decided not to lose those ideas,
and will likely be bringing them back to life in the NIX branch, which I'll
probably maintain as a permafork of 9front.
On Wed, Mar 12, 2025 at 6:21 AM hiro <23hiro@gmail.com> wrote:
> [snip]
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 12:28 ` [9front] " Dan Cross
2025-03-12 13:15 ` hiro
@ 2025-03-12 15:27 ` Kurt H Maier
2025-03-12 15:50 ` Dan Cross
1 sibling, 1 reply; 27+ messages in thread
From: Kurt H Maier @ 2025-03-12 15:27 UTC (permalink / raw)
To: 9front
On Wed, Mar 12, 2025 at 08:28:27AM -0400, Dan Cross wrote:
>
> 9p isn't appropriate for everything, and I can imagine how this might
> have some very unintended consequences if not carefully managed; for
> instance, a program might end up accidentally reading time from
> another host when it doesn't intend to.
This is an argument *for* 9p, not against it; the idea that it would be
easy to do this is what makes the system appealing. Every feature built
outside of the 9p paradigm is an added complexity to deal with. If 9p
is not the answer, the question wasn't about Plan 9.
khm
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 15:27 ` Kurt H Maier
@ 2025-03-12 15:50 ` Dan Cross
2025-03-12 19:51 ` Kurt H Maier
2025-03-13 3:15 ` Ori Bernstein
0 siblings, 2 replies; 27+ messages in thread
From: Dan Cross @ 2025-03-12 15:50 UTC (permalink / raw)
To: 9front
On Wed, Mar 12, 2025 at 11:31 AM Kurt H Maier <khm@sciops.net> wrote:
> On Wed, Mar 12, 2025 at 08:28:27AM -0400, Dan Cross wrote:
> > 9p isn't appropriate for everything, and I can imagine how this might
> > have some very unintended consequences if not carefully managed; for
> > instance, a program might end up accidentally reading time from
> > another host when it doesn't intend to.
>
> This is an argument *for* 9p, not against it; the idea that it would be
> easy to do this is what makes the system appealing.
But you can already do that today. Simply read /dev/time from the
remote system; I don't think anyone is suggesting that that go away in
favor of a system call. It's not nanosecond-resolution, granted, but
you wouldn't really expect it to be over an arbitrary network with
today's technology anyway.
> Every feature built
> outside of the 9p paradigm is an added complexity to deal with. If 9p
> is not the answer, the question wasn't about Plan 9.
Processes are pretty fundamental, and their creation is outside of the
filesystem metaphor.
- Dan C.
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 15:02 ` ron minnich
@ 2025-03-12 17:43 ` Dan Cross
2025-03-12 21:30 ` qwx
1 sibling, 0 replies; 27+ messages in thread
From: Dan Cross @ 2025-03-12 17:43 UTC (permalink / raw)
To: 9front
On Wed, Mar 12, 2025 at 11:05 AM ron minnich <rminnich@gmail.com> wrote:
> [snip]
> I agree with Russ that time and random are special, but I disagree with his idea that you need a special system call. I've also demonstrated you can get what he wants without having to leave an fd allocated. But I understand his points, and figured it was not worth pushing, because our process private system calls / currying are unlikely to make it into all the Plan 9 kernels out there. That said, I decided not to lose those ideas, and will likely be bringing them back to life in the NIX branch, which I'll probably maintain as a permafork of 9front.
Having now caught up on the 9fans thread, I think the private curried
syscall thing is pretty cool and would address the issue in a very
general way. Magical negative file descriptors would be awful, and I
think the (less general) kread/kwrite is somewhere in the middle. It
strikes me that there are some implementation questions I'd raise
around it, but that's probably a separate discussion for 9fans or over
a cup of coffee.
I agree that efficient access to monotonic time is special; I disagree
that randomness is similarly special and frankly don't see the urgency
there. The change that Russ ultimately proposed, extending the data
returned by reading `/dev/bintime`, seems like an acceptable solution
in the short term, just to get access to the relevant data at all.
- Dan C.
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 15:50 ` Dan Cross
@ 2025-03-12 19:51 ` Kurt H Maier
2025-03-12 19:58 ` Dan Cross
2025-03-12 20:01 ` [9front] " Jacob Moody
2025-03-13 3:15 ` Ori Bernstein
1 sibling, 2 replies; 27+ messages in thread
From: Kurt H Maier @ 2025-03-12 19:51 UTC (permalink / raw)
To: 9front
On Wed, Mar 12, 2025 at 11:50:38AM -0400, Dan Cross wrote:
>
> But you can already do that today. Simply read /dev/time from the
> remote system; I don't think anyone is suggesting that that go away in
> favor of a system call. It's not nanosecond-resolution, granted, but
> you wouldn't really expect it to be over an arbitrary network with
> today's technology anyway.
Yes, I know you can do that today. Please note that nobody has been
worried about accidentally reading the wrong /dev/time. Nevertheless
it's important to maintain the focus of the design: today NIX is
rebasing on 9front; tomorrow, who knows, maybe someone shows up with
an InfiniBand driver.
> Processes are pretty fundamental, and their creation is outside of the
> filesystem metaphor.
The concept predates 9p, so I don't really find this argument
compelling, especially since /proc definitely exists and could be
extended to cover this sort of thing. Whoever tried it would have a lot
of fun along the way.
khm
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 19:51 ` Kurt H Maier
@ 2025-03-12 19:58 ` Dan Cross
2025-03-12 20:11 ` Kurt H Maier
2025-03-12 20:01 ` [9front] " Jacob Moody
1 sibling, 1 reply; 27+ messages in thread
From: Dan Cross @ 2025-03-12 19:58 UTC (permalink / raw)
To: 9front
On Wed, Mar 12, 2025 at 3:55 PM Kurt H Maier <khm@sciops.net> wrote:
> On Wed, Mar 12, 2025 at 11:50:38AM -0400, Dan Cross wrote:
> > But you can already do that today. Simply read /dev/time from the
> > remote system; I don't think anyone is suggesting that that go away in
> > favor of a system call. It's not nanosecond-resolution, granted, but
> > you wouldn't really expect it to be over an arbitrary network with
> > today's technology anyway.
>
> Yes, I know you can do that today. Please note that nobody has been
> worried about accidentally reading the wrong /dev/time. Nevertheless
> it's important to maintain the focus of the design: today NIX is
> rebasing on 9front; tomorrow, who knows, maybe someone shows up with
> an infiniband driver.
>
> > Processes are pretty fundamental, and their creation is outside of the
> > filesystem metaphor.
>
> The concept predates 9p,
So does time.
> so I don't really find this argument
> compelling, especially since /proc definitely exists and could be
> extended to cover this sort of thing. Whoever tried it would have a lot
> of fun along the way.
Anyway, I don't see what the big deal is. From the 9fans thread, Russ
came up with something that's not perfect, but agreeable to most
everyone, works over 9p, and doesn't involve a new system call.
- Dan C.
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 19:51 ` Kurt H Maier
2025-03-12 19:58 ` Dan Cross
@ 2025-03-12 20:01 ` Jacob Moody
1 sibling, 0 replies; 27+ messages in thread
From: Jacob Moody @ 2025-03-12 20:01 UTC (permalink / raw)
To: 9front
On 3/12/25 14:51, Kurt H Maier wrote:
> On Wed, Mar 12, 2025 at 11:50:38AM -0400, Dan Cross wrote:
>>
>> But you can already do that today. Simply read /dev/time from the
>> remote system; I don't think anyone is suggesting that that go away in
>> favor of a system call. It's not nanosecond-resolution, granted, but
>> you wouldn't really expect it to be over an arbitrary network with
>> today's technology anyway.
>
> Yes, I know you can do that today. Please note that nobody has been
> worried about accidentally reading the wrong /dev/time. Nevertheless
> it's important to maintain the focus of the design: today NIX is
> rebasing on 9front; tomorrow, who knows, maybe someone shows up with
> an infiniband driver.
>
>> Processes are pretty fundamental, and their creation is outside of the
>> filesystem metaphor.
>
> The concept predates 9p, so I don't really find this argument
> compelling, especially since /proc definitely exists and could be
> extended to cover this sort of thing. Whoever tried it would have a lot
> of fun along the way.
Ori has discussed some of this before; I think it's an interesting idea to
consider a world where we have /proc/clone instead of rfork().
At this point the syscall-only interface is entrenched; however, I think it
would be interesting to explore an implementation like this and see what new
things it unlocks for us.
Just wanted to say that this is something which has been discussed, and there
is at least a desire to explore what it would look like if practically implemented.
Maybe some of the original authors have some context as to why this wasn't done;
I would be curious to know.
Thanks,
moody
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 19:58 ` Dan Cross
@ 2025-03-12 20:11 ` Kurt H Maier
2025-03-14 12:04 ` theinicke
2025-03-15 10:21 ` [9front] " Shawn Rutledge
0 siblings, 2 replies; 27+ messages in thread
From: Kurt H Maier @ 2025-03-12 20:11 UTC (permalink / raw)
To: 9front
On Wed, Mar 12, 2025 at 03:58:48PM -0400, Dan Cross wrote:
>
> So does time.
That doesn't explain 'the Unix epoch'.
> Anyway, I don't see what the big deal is. From the 9fans thread, Russ
> came up with something that's not perfect, but agreeable to most
> everyone, works over 9p, and doesn't involve a new system call.
The big deal is that this is the third (or fourth, depending on how you
choose to count these things) ride on the "golang devs shitting up the
kernel" roller coaster. It was very nice of Russ to send an email to
someone about it this time, but once again we found ourselves in a
"commit first, ask questions later" situation. Last time I had to beg
in the golang ticket tracker for some documentation after we got a
stream of people joining the IRC channel and demanding to know why Go
stopped building on 9front.
I understand that this is a minor issue to most people, and things are
slowly improving, but we've been dealing with the knock-on effects of
golang flailing for many, many years now. If I thought it had a chance
in hell of succeeding I'd follow their 'proposals' documentation to
suggest discontinuing plan 9 support entirely. It would be unfortunate
for the six people who actually run golang programs on plan 9, but it
would solve this problem forever.
khm
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 15:02 ` ron minnich
2025-03-12 17:43 ` Dan Cross
@ 2025-03-12 21:30 ` qwx
1 sibling, 0 replies; 27+ messages in thread
From: qwx @ 2025-03-12 21:30 UTC (permalink / raw)
To: 9front
On Wed Mar 12 16:02:36 +0100 2025, rminnich@gmail.com wrote:
> I thought this was a pretty reasonable discussion.
>
> re this one comment: "I remember linux people postponed the problem of
> running out of FDs by switching to OS nesting via virtualization (a python
> expert i met claimed this to be a fact)." -- I think that's not really how
it happened at all. VMware kicked off all kinds of interest in
virtualization, Xen pushed it up a notch, and then in 2006 KVM came along
for Linux and it took off.
>
> I can understand a python person thinking that was what happened given
> Python's penchant for using FDs (the Sugar environment on OLPC had this
> problem of opening up 10,000 files on startup). But that's not really how
> it went. The problems of python and fds did not drive virtualization.
>
> I agree with Russ that time and random are special, but I disagree with
> his idea that you need a special system call. I've also demonstrated you
> can get what he wants without having to leave an fd allocated. But I
> understand his points, and figured it was not worth pushing, because our
> process private system calls / currying are unlikely to make it into all
> the Plan 9 kernels out there. That said, I decided not to lose those ideas,
> and will likely be bringing them back to life in the NIX branch, which I'll
> probably maintain as a permafork of 9front.
One thing I'm personally missing in the midst of all of this is actual
numbers. Some of the proposals, like the Blue Gene ones, actually had
measurements included, while others not so much, and while the
effects might sometimes seem obvious, actual benchmarking would be
great. We have some tools right now in 9front to do some
measurements, but a more complete and usable benchmarking toolkit
would be immensely useful. I've seen mentions of such tools in papers
here and there (example: [1]), but I suspect there are a lot of
snippets of code written for various projects that never ended up
being released as an individual tool. I'd be interested to see some
of that code being made public, or at least discussed. Now that there
is renewed effort on several Plan 9 projects, some of those tools
could actually be fixed and completed collaboratively.
Thanks,
qwx
[1] https://doc.cat-v.org/plan_9/misc/adding_a_syscall_to_plan_9/plan9_syscall_howto.pdf
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 15:50 ` Dan Cross
2025-03-12 19:51 ` Kurt H Maier
@ 2025-03-13 3:15 ` Ori Bernstein
2025-03-13 16:32 ` Dan Cross
1 sibling, 1 reply; 27+ messages in thread
From: Ori Bernstein @ 2025-03-13 3:15 UTC (permalink / raw)
To: 9front; +Cc: Dan Cross
On Wed, 12 Mar 2025 11:50:38 -0400
Dan Cross <crossd@gmail.com> wrote:
> > Every feature built
> > outside of the 9p paradigm is an added complexity to deal with. If 9p
> > is not the answer, the question wasn't about Plan 9.
>
> Processes are pretty fundamental, and their creation is outside of the
> filesystem metaphor.
>
> - Dan C.
I always thought that it would have been interesting to do that part
differently, too.
At the very least, it would have been nice for fork() to return an fd
open to /proc/$pid/ctl. Maybe it's not too late: rfork(RFCTL) or such.
--
Ori Bernstein <ori@eigenstate.org>
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-13 3:15 ` Ori Bernstein
@ 2025-03-13 16:32 ` Dan Cross
0 siblings, 0 replies; 27+ messages in thread
From: Dan Cross @ 2025-03-13 16:32 UTC (permalink / raw)
To: Ori Bernstein; +Cc: 9front
On Wed, Mar 12, 2025 at 11:15 PM Ori Bernstein <ori@eigenstate.org> wrote:
> On Wed, 12 Mar 2025 at 11:50:38 -0400 Dan Cross <crossd@gmail.com> wrote:
> > Processes are pretty fundamental, and their creation is outside of the
> > filesystem metaphor.
>
> I always thought that it would have been interesting to do that part
> differently, too.
I've thought about something along these lines from time to time as
well (e.g., the sketch of `#p/clone` I outlined in my email to hiro).
It raises lots of questions, particularly when one thinks of the
implications of extending it over the network: suppose I import
`/proc` from some machine and open `/proc/clone` from there:
presumably I end up basically forking exportfs on the remote side;
probably not what I intended. It'd be kinda fun and funky to think
about `/proc/$pid/clone` and what that might mean: potentially, a
process totally outside of the usual ancestral hierarchy can cause a
fork at almost any time, perhaps even initiated from a machine
elsewhere on the network. I haven't thought through all of the
implications, but at first blush I'd imagine they'd be rather profound
with respect to the process model overall.
> At the very least, it would have been nice for fork() to return an fd
> open to /proc/$pid/ctl. Maybe it's not too late: rfork(RFCTL) or such.
When you say return an fd from fork, I'm guessing you don't mean the
actual return value of the stub function; the problem there is a) a
bunch of existing code, and b) even if you introduce a new flag for
rfork, you still need some way to distinguish parent from child, and
the return value is all you've got. I guess you could derive it from
the file descriptor, but that'd be more work for the common case. I
suppose if you only get the fd back if RFCTL is set, you're already in
a context where you'd be prepared to do that work anyway.
There's always passing an out parameter by pointer or something;
possibly with nil meaning "don't care". You could introduce a new stub
function and redefine the existing rfork stub in terms of that.
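As a loose sketch of the out-parameter shape, done purely in userspace;
RFCTL and rforkctl are invented names, and this is exactly the "open it
after fork" path, race included:

#include <u.h>
#include <libc.h>

enum { RFCTL = 1<<20 };	/* invented flag; no kernel defines this */

/* like rfork(2), but if RFCTL is set, also hand the parent an fd
   open on the child's /proc/$pid/ctl */
int
rforkctl(int flags, int *ctlfd)
{
	char buf[32];
	int pid;

	pid = rfork(flags & ~RFCTL);
	if(pid <= 0 || !(flags & RFCTL) || ctlfd == nil)
		return pid;	/* child, error, or caller doesn't care */
	snprint(buf, sizeof buf, "/proc/%d/ctl", pid);
	/* racy: the child may already have exited; a kernel-side
	   RFCTL could close that window */
	*ctlfd = open(buf, OWRITE);
	return pid;
}

The only thing a real kernel flag would buy over this is closing that
race and saving a walk; whether that's worth a new flag is the question.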
What's your use case, though? What advantage would that confer over
just creating the filename and opening it after fork?
Perhaps this is something folks might want to weigh in on over on 9fans?
- Dan C.
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 20:11 ` Kurt H Maier
@ 2025-03-14 12:04 ` theinicke
2025-03-14 21:26 ` Kurt H Maier
2025-03-15 10:21 ` [9front] " Shawn Rutledge
1 sibling, 1 reply; 27+ messages in thread
From: theinicke @ 2025-03-14 12:04 UTC (permalink / raw)
To: 9front
Quoth Kurt H Maier <khm@sciops.net>:
>[...] I'd follow their 'proposals' documentation to
> suggest discontinuing plan 9 support entirely. It would be unfortunate
> for the six people who actually run golang programs on plan 9, but it
> would solve this problem forever.
Since it is not really very productive, I most of the time decide to stay silent
when you mention something like this, but I think it is wrong to assume
that no one is using X or Y on Plan 9/9front, just because some of us stay quiet...
Personally I am very much using golang on 9front; similarly I heard an argument lately against json(2),
which I also use but did not complain about it allegedly not being used.
I just think it is not fair to assume something is not used on the system
just because you may not hear so much from the actual users of said language/program/library.
--
Tobias Heinicke
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-14 12:04 ` theinicke
@ 2025-03-14 21:26 ` Kurt H Maier
2025-03-14 22:41 ` Paul Lalonde
0 siblings, 1 reply; 27+ messages in thread
From: Kurt H Maier @ 2025-03-14 21:26 UTC (permalink / raw)
To: 9front
On Fri, Mar 14, 2025 at 01:04:51PM +0100, theinicke@pheist.org wrote:
>
> Since it is not really very productive, I most of the time decide to stay silent
> when you mention something like this, but I think it is wrong to assume
> that no one is using X or Y on Plan 9/9front, just because some of us stay quiet...
I appreciate the feedback but unless I hear from at least five more
people I don't know which way to revise my estimate.
> Personally I am very much using golang on 9front; similarly I heard an argument lately against json(2),
> which I also use but did not complain about it allegedly not being used.
json(2), being in-tree, is a much easier maintenance burden than Go,
which is not only not in-tree but isn't meaningfully tested against our
tree.
> I just think it is not fair to assume something is not used on the system
> just because you may not hear so much from the actual users of said language/program/library.
I think it's perfectly fair.
khm
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-14 21:26 ` Kurt H Maier
@ 2025-03-14 22:41 ` Paul Lalonde
2025-03-15 1:19 ` Lyndon Nerenberg (VE7TFX/VE6BBM)
0 siblings, 1 reply; 27+ messages in thread
From: Paul Lalonde @ 2025-03-14 22:41 UTC (permalink / raw)
To: 9front
Well, add at least one other golang user on 9front.
On Fri, Mar 14, 2025, 2:30 p.m. Kurt H Maier <khm@sciops.net> wrote:
> On Fri, Mar 14, 2025 at 01:04:51PM +0100, theinicke@pheist.org wrote:
> >
> > Since it is not really very productive, I most of the time decide to stay
> > silent when you mention something like this, but I think it is wrong to
> > assume that no one is using X or Y on Plan 9/9front, just because some of
> > us stay quiet...
>
> I appreciate the feedback but unless I hear from at least five more
> people I don't know which way to revise my estimate.
>
> > Personally I am very much using golang on 9front; similarly I heard an
> > argument lately against json(2), which I also use but did not complain
> > about it allegedly not being used.
>
> json(2), being in-tree, is a much easier maintenance burden than Go,
> which is not only not in-tree but isn't meaningfully tested against our
> tree.
>
> > I just think it is not fair to assume something is not used on the system
> > just because you may not hear so much from the actual users of said
> language/program/library.
>
> I think it's perfectly fair.
>
> khm
>
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-12 14:59 ` Dan Cross
@ 2025-03-14 23:33 ` hiro
2025-03-17 15:49 ` Dan Cross
0 siblings, 1 reply; 27+ messages in thread
From: hiro @ 2025-03-14 23:33 UTC (permalink / raw)
To: 9front
> My advice is: don't worry about that. Keep the discussion centralized
> on 9fans; Russ is genuinely one of the nicest people I know, he won't
> bite.
That's reassuring. Thank you.
> But taking a timestamp and then computing an offset based on the TSC
> is a fine way to _approximate_ time. It's not as accurate as having
> the kernel do it for you, as Russ indicated, but if you allow the user
Why not? The syscall delays might be longer than the time it takes to
just read the TSC. What makes you think that the inaccuracy caused by
syscall overhead is acceptable?
> to read the clock at all, ever, _and_ allow them access to timing
> source that's locked to some stable reference (like the CPU clock
> frequency, DVFS notwithstanding) then you've lost, as far as the
> security of the ToD clock across, say, hyperthread buddy pairs goes.
Ah, I did forget about hyperthreading. I was assuming no
hyperthreading and instead was looking at the edge-case of inter-core
side-channels.
Anyway, the point is I don't see many legitimate applications for such
tight synchronization in the first place.
Why can't we live with the inaccuracy that comes with the TSC, as it's
a monotonic counter by design anyway?
> I wouldn't describe it that way, but honestly, I think arguing against
> a ToD system call on security grounds isn't going to be persuasive.
Now you see another reason I didn't want to bother Russ: I do not want
to suggest this at all :)
Indeed, artificially reducing timing resolution seems particularly
short-sighted to me. I can understand the motive, even though I
disagree.
Either way I want the use-cases and motives documented. Artificially
increasing timing accuracy should be grounded in a proper use-case.
The natural response is to change nothing. But if it's *worth*
increasing time accuracy I would like to know why and how.
Also, again, it is not obvious to me why calling an extra syscall
ensures high accuracy from the point of view of a userland
application. I would really like an example.
In the real world, when very accurate timestamps are necessary one
generally lets the hardware insert them (for example ptp works like
this and thus requires special ethernet hardware support). Our
drivers/interrupt handlers, schedulers, etc. introduce delays, so
system designers learned to not expose time-critical stuff to a whole
stack of both operating systems and applications.
> That's an approach one could take, sure. But an issue there is that it
> injects complexity into process creation in a way that is difficult to
> hide from userspace; you'd essentially need a hook into an RNG library
> that runs every time you `fork` and with the extant `rfork` semantics
> this could lead to weird races and so on, and doesn't cover the case
> where a user provides their own RNG, as opposed to using one from libc
> or similar. That it doesn't compose well with the existing primitives
> suggests to me that it's not a great approach in a plan9-ish context.
Right, so back to /dev/random, good enough for getting a seed into userspace.
> Well yeah, but looking at the abstractions at hand in plan9-ish style
> systems, you've got basically three choices: a file, synthesized by
> the operating system, that you interact with via the usual
> open/close/read/write; a system call; or something involving virtual
> memory (e.g., a page shared between the kernel and userspace). The
> latter one has very little precedent in plan9, and raises a whole host
> of design questions, so I'd probably rule it out; at least for
> entropy, less so for time. That leaves the first two: for entropy,
> I'd argue for the file, but for time, a system call seems appropriate.
I see we all agree there's no downside to /dev/random.
This raises the question: why did Russ want a syscall for reading from
/dev/random?
I feel like nobody knows, and I feel emboldened enough to finally put
this question to him.
For time the syscall would be appropriate, if two things are true:
1) higher accuracy than possible with a read() from /dev/bintime is needed
2) a syscall is faster than a register read.
if 1 is false, then the complexity seems wasteful.
if 2 is false, then accuracy might not be the goal, which makes me
question the original supposed intention.
Am I still missing something?
> line between whether something should be provided by a file
> abstraction versus a system call?
A file abstraction *can* be used by a system call already: read(fd)
where fd points to an open file.
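A minimal sketch of that path as it works today, assuming /dev/bintime's
documented format (the time in nanoseconds since the epoch as an 8-byte
big-endian integer):

#include <u.h>
#include <libc.h>

vlong
bintimens(void)
{
	uchar b[8];
	vlong ns;
	int fd, i;

	fd = open("/dev/bintime", OREAD);
	if(fd < 0)
		sysfatal("open /dev/bintime: %r");
	if(read(fd, b, sizeof b) != sizeof b)
		sysfatal("read: %r");
	close(fd);
	ns = 0;
	for(i = 0; i < 8; i++)
		ns = ns<<8 | b[i];	/* big-endian to native */
	return ns;
}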
A system call change is bad for binary compatibility. open/read
syscalls on file paths are much more flexible.
The line so far seems to be a matter of historical circumstance. For
example, we can create a new pipe with a syscall because the syscall
was there first.
But what new feature has ever needed a new syscall in the last 10
years? And what is the rationale why it couldn't be done without one?
I don't think there's actually precedent. While there are new ioctl
bugs every few days, we seem to move the other way on plan9, away from
syscalls, to simple text-based, human-debuggable, nameable interfaces.
> To illustrate, look at a slightly different case: process creation a
> la `rfork()`. Plan 9 has always had `/proc`; one could imagine a
> `#p/clone` that worked something like `/net/tcp/clone`: you open this
> file and read from it, and a new process springs into existence, the
> child of the current process, that is a duplicate of the parent except
> that the data read is the path to a new directory representing the
> child process in the parent, or the directory of the parent in the
> child (or nil on failure). Perhaps one might write a set of textual
> flags that represent the various flags one can pass to `rfork` to
> control sharing and so forth, before reading. On the face of it, it
> doesn't immediately seem like an awful interface, and I have no doubt
> that something like this _could_ be built. But we _don't_ have that;
> instead, `rfork()` is a system call. Why is that? I don't claim to
> know definitively, but I imagine that someone might have at least
> considered it once. Likely, it's too hard to get the details right
> (some problems with concurrency spring to mind immediately, if the
> read is into a non-stack buffer; what would it mean if I tried to
> access a remote machine's `/proc/clone`; etc), and process creation is
> so frequent that the overhead of interacting with the file API isn't
> worth it for some imagined gain in elegance or orthogonality. Anyway,
> I think the question is at least worth pondering if we're asking
> whether something should be elevated to the level of a syscall. In the
> `rfork` case, being a system call really is simpler and more robust
> for such a fundamental operation. I believe that's the case Russ is
> trying to make, as well. It will always be a tradeoff; the key is
> deciding whether that tradeoff is worth it along some important
> dimension.
The reason fork is a syscall is clearly just historical. Could it be
done differently today if we did it from scratch? Sure! Do we want
this? No, we are quite conservative towards such fundamental changes.
I think we have to be very careful and think this through.
But I'd love a good proposal for how to send beta-level simulations of my
process and process context across universes at FTL speed. This would
require solving the problem with dangling FDs, which would be a much
greater excuse to rethink how our FDs (and syscalls) work. The current
best-practice implementations of "global namespace" will not be good
enough for cp /mnt/earth/proc/123 /mnt/alphacentauri/proc/. FDs would
need some kind of magic cross-kernel redirection, or we replace FDs
with something that has actual names.
Until then I think it's ok to fork.
> > and the syscall interface, remember also that
> > userland Libraries can be used to hide the sequence into a single
> > function call.
>
> Yes, but at a significant cost in overhead.
>
> > BTW: how often do we need accurate time?
>
> Oh, you need it All. The. Time. On busy servers, many, many times a
> second. Rates in the kilohertz, spread across many cores.
time yes, but accurate time? why?
maybe the problem is that not all of us here realize that even
high-resolution timestamps can become practically inaccurate.
how accurate are we speaking? personally, i don't get it at all. in
99% of all cases i just care about 1-second resolution, as long as
everything is monotonic.
in the remaining 1% i might misuse time as a half-assed indicator of
ordering (generally unreliable).
> Of course it depends on the application, but in a lot of production
> settings, the need is almost absurdly frequent and the overhead of
> even a system call is absolutely brutal,
I feared the same. Maybe Russ missed this. I shall inquire.
> much effort in the Linux world to get that stuff into userspace: first
> by doing an initial read and then incrementing off of the TSC, and
> then by using e.g. the vdso.
Yeah, it's complex, but I can imagine it's worth it for some edge-cases.
> Go cares a lot because they do a lot with high-precision timers in the
> runtime.
Do you mean TSC, or actual high-accuracy time sources?
TSC is cheap and would make sense to me. Other kinds of timers would seem fishy.
> Most 9fans probably don't need it, but on the high end you
> really want to reduce the overhead as much as possible; the closer you
> can get it just being a load from RAM the better.
Time isn't generated by RAM. It's a property of the universe. :P
> I think it's being taken as a given that the overhead of
> open/read/close is too high, and so if one wants to retain the file
> metaphor for accessing time, that implies keeping an open FD to some
> device around. But file descriptors are a process-global resource, and
> can be closed by totally unrelated bits of code: there's no concept of
> a "private" FD in the way that there is for "private memory", such as
> a static inside of a function, or the stack segment of a process or
> something. So whatever code is used to return the time has to deal
> with a bunch of weird failure cases, which adds both complexity _and_
> overhead.
Yes, which is no different from how we already have to handle stdin/out etc.
adding a stdtime in golang doesn't seem like too much clutter to me personally.
if all code only ever does single reads on the time fd, i see no
problem with parallel access either.
i see no reason why any process should close the time fd, and given
that, where would those weird failure cases come from?
> And there's the matter of what format the data should take:
> traditionally, plan 9 would return text (e.g., `cat /dev/time` right
> now). That adds overhead since it probably has to be parsed. If you
> need nanosecond resolution for the phase-locked plesiosynchronous
> timer that ensures consistency across your HPC cluster or whatever,
are they making PLL as a service in the cloud now?
seriously, for *actual* practically high-accuracy low-latency work you
would not want scheduling delays and syscall overheads.
sure you can have high-accuracy timestamps at high latencies, no
problem. but if the latency is high, again, why did you want
high-accuracy in the first place?
maybe just use dedicated hardware like everybody else?
or better, show me your magical "real" "time" system that can do
anything close to this.
you just can't have high-accuracy scheduling with guarantees AND lower
latency than usual AND high processing power all at the same time.
> you may not be able to easily provide that if you've got to read and
> parse the data. Presumably a system call just copies into a
> native-order binary vlong or something.
Unlike simple counters like the TSC, time/date generally has to be parsed
to be meaningful. That parsing is likely expensive, because multiple
political systems, calendars, timezones, and many historical events
potentially have to be taken into account...
yes, it sounds like too much overhead for my taste. just use a simple
counter like the TSC.
> Rather, it is about identifying a specific need and the best way to
> serve that need. In the context of a plan9 system, I would argue that
> a system call for high precision timing is actually a pretty good
> engineering tradeoff: it's not a terribly invasive change, avoids a
> lot of complex (and honestly kind of weird) edge cases, and provides
> an adequate, if imperfect, solution to a real-world problem.
What exactly is that real-world problem? It has not been described
yet. Everything was vague; HPC was mentioned now as very sensitive (to
slow hardware, phase noise, etc.), but which exact HPC SOFTWARE
problem is this software solution solving?
> Oh, that hasn't been a problem for decades now. No offense to your
> python expert, but I think they are probably mistaken.
That's a relief tbh :)
> Incidentally, modern VMs were introduced for flexibility and
> utilization, much in the same way that containers were. I don't think
> considerations about only being able to hold X many files open at a
> time really entered into it. Sockets maybe, but I doubt it was an
> upper bound on the number of socket descriptors available so much as
> scalability in a particular system; e.g., POSIX (and plan 9) semantics
> around file descriptor allocation practically force it to be quadratic
> unless you're willing to throw a fair bit of per-process complexity at
> it; if you've got a lot of network connections coming and going, that
> could be a bottleneck.
yes, could have been sockets and not fds.
> Yeah. It's more an issue of what happens to the library if some other
> bit of code in the process yanks the FD out from under it. If some FDs
> become magical, in that you start to restrict whether they can be
> closed, or by who, or whatever, then how far do you push the
> complexity that that approach implies? And is it worth it to avoid
> just adding a system call?
Isn't that issue shared by all programs that use any FDs? Have you
seen any code here that goes around and closes all FDs?
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-14 22:41 ` Paul Lalonde
@ 2025-03-15 1:19 ` Lyndon Nerenberg (VE7TFX/VE6BBM)
0 siblings, 0 replies; 27+ messages in thread
From: Lyndon Nerenberg (VE7TFX/VE6BBM) @ 2025-03-15 1:19 UTC (permalink / raw)
To: 9front, Paul Lalonde
Paul Lalonde writes:
> Well, add at least one other golang user on 9front.
+1
* Re: [9front] [9fans] monotonic time and randomness on Plan 9
2025-03-12 20:11 ` Kurt H Maier
2025-03-14 12:04 ` theinicke
@ 2025-03-15 10:21 ` Shawn Rutledge
2025-03-15 13:33 ` Paul Lalonde
1 sibling, 1 reply; 27+ messages in thread
From: Shawn Rutledge @ 2025-03-15 10:21 UTC (permalink / raw)
To: 9front
> On Mar 12, 2025, at 21:11, Kurt H Maier <khm@sciops.net> wrote:
>
> The big deal is that this is the third (or fourth, depending on how you
> choose to count these things) ride on the "golang devs shitting up the
> kernel" roller coaster.
Well, ideally, wouldn’t it be nice if it were possible to get them to take Plan 9 support seriously?
We know there are some smart people involved, so why doesn't the end result turn out better?
I tried golang on 9front. I had some conversation about it on IRC, and what I got was don’t go there, because go reinvents too many things. And I can see for myself that the binaries are inexplicably large relative to how much code I wrote. But they run, and it’s nice to have the option.
I don’t understand the bloat. A statically-linked C program and a statically-linked go program ought to be comparable, even if they did rewrite too many things. IMO shared libraries are a decent way to share blocks of code that are not expected to change much, but we don’t have that in either case.
So I haven’t quite made up my mind whether I should be more of a golang user. Portability is appealing. But why doesn’t the compiler/linker optimize more, and leave out the code that you aren’t using, whatever it is?
I guess the plan 9 answer is just keep it simple: don’t write such complex programs that you need to worry about bloat or sharing large volumes of code, or that you find it too onerous to write in C.
* Re: [9front] [9fans] monotonic time and randomness on Plan 9
2025-03-15 10:21 ` [9front] " Shawn Rutledge
@ 2025-03-15 13:33 ` Paul Lalonde
0 siblings, 0 replies; 27+ messages in thread
From: Paul Lalonde @ 2025-03-15 13:33 UTC (permalink / raw)
To: 9front
To a first approximation we have zero plan9 users. Of that tiny number, an
even smaller number use Go.
I'm astounded, pleased, and grateful that the Go devs support the platform
at all, which largely they do out of nostalgia and community service.
Regarding bloat, Go includes a significant runtime in every application -
about one MB worth. That gets you memory management, channels, coroutines,
threading, slices, maps, and more - effectively the core language features
that any reasonably complex Go program comes to rely on. The choice was to
commit to a baseline environment in order to simplify other parts of the
toolchain. For my use cases, this is acceptable. Your mileage will vary.
Paul
On Sat, Mar 15, 2025 at 4:13 AM Shawn Rutledge <lists@ecloud.org> wrote:
>
> > On Mar 12, 2025, at 21:11, Kurt H Maier <khm@sciops.net> wrote:
> >
> > The big deal is that this is the third (or fourth, depending on how you
> > choose to count these things) ride on the "golang devs shitting up the
> > kernel" roller coaster.
>
> Well, ideally, wouldn’t it be nice if it were possible to get them to take
> Plan 9 support seriously?
>
> We know there are some smart people involved, so why doesn't the end
> result turn out better?
>
> I tried golang on 9front. I had some conversation about it on IRC, and
> what I got was don’t go there, because go reinvents too many things. And I
> can see for myself that the binaries are inexplicably large relative to how
> much code I wrote. But they run, and it’s nice to have the option.
>
> I don’t understand the bloat. A statically-linked C program and a
> statically-linked go program ought to be comparable, even if they did
> rewrite too many things. IMO shared libraries are a decent way to share
> blocks of code that are not expected to change much, but we don’t have that
> in either case.
>
> So I haven’t quite made up my mind whether I should be more of a golang
> user. Portability is appealing. But why doesn’t the compiler/linker
> optimize more, and leave out the code that you aren’t using, whatever it is?
>
> I guess the plan 9 answer is just keep it simple: don’t write such complex
> programs that you need to worry about bloat or sharing large volumes of
> code, or that you find it too onerous to write in C.
>
>
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-14 23:33 ` hiro
@ 2025-03-17 15:49 ` Dan Cross
2025-03-17 22:13 ` ron minnich
2025-03-18 2:16 ` hiro
0 siblings, 2 replies; 27+ messages in thread
From: Dan Cross @ 2025-03-17 15:49 UTC (permalink / raw)
To: 9front
On Fri, Mar 14, 2025 at 7:35 PM hiro <23hiro@gmail.com> wrote:
> > My advice is: don't worry about that. Keep the discussion centralized
> > on 9fans; Russ is genuinely one of the nicest people I know, he won't
> > bite.
>
> That's reassuring. Thank you.
No problem, but it's probably worth pointing out that, at this point,
the conversation on 9fans has changed and most of this is now
irrelevant. Also, I hadn't really understood the problem under
consideration, and why this was being discussed at all (I mentioned in
my earlier email that I hadn't paid attention to 9fans in many years
until Skip told me about this conversation; having now caught up with
the context, much of what I wrote earlier is irrelevant), so sorry for
the noise on that front. I'll make some general comments, though, in
response to some of your questions below that are (I think) largely
independent of the original.
As it happened, the issue on 9fans was about the need for tracking
monotonic time: that is, time that can't go backwards (like the time
of day clock can), not performance or the other considerations that I
have been talking about. Monotonicity is a very different problem than
accommodating the need to get accurate timestamps thousands of times a
second.
> > But taking a timestamp and then computing an offset based on the TSC
> > is a fine way to _approximate_ time. It's not as accurate as having
> > the kernel do it for you, as Russ indicated, but if you allow the user
>
> Why not? The syscall delays might be longer than the time it takes to
> just read the TSC. What makes you think that the inaccuracy caused by
> syscall overhead is acceptable?
This is a difference between accuracy and precision, and is a matter
of what one is tracking.
A system call suffers from some overhead (though that overhead may be
low for simple calls: reading the clock is pretty simple, just copying
some data into a register and returning to userspace), which limits
precision, but the kernel is better equipped to track time than
userspace programs are, since it can coordinate with something like
timesync to make smooth adjustments based on the offset from some
external reference, so it's more accurate. So while just doing `RDTSC`
gets you very _precise_ cycle counts, if you're using it to compute
wall clock time based on a delta from some measurement, it will
eventually drift away from your external reference, and be less
_accurate_.
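To make the distinction concrete, here is a rough sketch of the userspace
approximation in question (Plan 9 C; cycles() reads the processor cycle
counter on machines that have one, and the fixed ns-per-cycle ratio is
precisely the part that drifts):

#include <u.h>
#include <libc.h>

static vlong t0;	/* wall-clock ns at calibration */
static uvlong c0;	/* cycle counter at calibration */
static double nspercyc;	/* assumed constant: the flawed assumption */

void
tcalibrate(void)
{
	uvlong c1;
	vlong t1;

	cycles(&c0);
	t0 = nsec();
	sleep(100);	/* let a baseline interval accumulate */
	cycles(&c1);
	t1 = nsec();
	nspercyc = (double)(t1 - t0) / (double)(c1 - c0);
}

vlong
fastnow(void)
{
	uvlong c;

	/* cheap and precise, but never re-synced: it drifts away
	   from the kernel's disciplined clock */
	cycles(&c);
	return t0 + (vlong)((c - c0) * nspercyc);
}

The kernel, coordinating with timesync, can keep re-deriving nspercyc
against an external reference; a one-shot userspace snapshot like this
cannot, which is the accuracy loss being described.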
For something like mounting a side-channel attack, accuracy vis-à-vis an
actual agreed-upon time (like UTC) doesn't matter very much; for
something like maintaining consistency across a globally distributed
database, it may matter a lot (e.g., Google's Spanner uses
plesiosynchronous timing to order things relative to UTC).
> > to read the clock at all, ever, _and_ allow them access to timing
> > source that's locked to some stable reference (like the CPU clock
> > frequency, DVFS notwithstanding) then you've lost, as far as the
> > security of the ToD clock across, say, hyperthread buddy pairs goes.
>
> Ah, I did forget about hyperthreading. I was assuming no
> hyperthreading and instead was looking at the edge-case of inter-core
> side-channels.
> Anyway, the point is I don't see many legitimate applications for such
> tight synchronization in the first place.
> Why can't we live with the inaccuracy that comes with the TSC, as it's
> a monotonic counter by design anyway?
Then I don't understand the objection. The security concerns around
side-channels come from having access to a high-frequency,
high-precision timing source, not the wall clock time.
> > I wouldn't describe it that way, but honestly, I think arguing against
> > a ToD system call on security grounds isn't going to be persuasive.
>
> Now you see another reason I didn't want to bother Russ: I do not want
> to suggest this at all :)
It seems that the solution that was ultimately agreed upon in 9fans
was just to extend the data returned from /dev/time and /dev/bintime
to include a monotonic counter. So all of this business about system
calls and so forth is moot.
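For concreteness: per cons(3), a read of /dev/time already yields four
text numbers (seconds and nanoseconds since the epoch, then a ticker
count and its frequency), and the last two are enough to derive a
monotonic nanosecond value. A hedged sketch:

#include <u.h>
#include <libc.h>

/* derive monotonic nanoseconds from the ticks and frequency
   fields of /dev/time; fd is opened once and held by the caller */
vlong
monons(int fd)
{
	char buf[128];
	char *f[4];
	vlong ticks, hz;
	int n;

	n = pread(fd, buf, sizeof buf - 1, 0);
	if(n <= 0)
		sysfatal("pread: %r");
	buf[n] = 0;
	if(tokenize(buf, f, nelem(f)) != 4)
		sysfatal("unexpected /dev/time format");
	ticks = strtoll(f[2], nil, 10);
	hz = strtoll(f[3], nil, 10);
	/* split the conversion to avoid overflowing a vlong
	   at high tick rates */
	return (ticks/hz)*1000000000LL + (ticks%hz)*1000000000LL/hz;
}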
> Indeed, artificially reducing timing resolution seems particularly
> short-sighted to me. I can understand the motive, even though I
> disagree.
> Either way I want the use-cases and motives documented. Artificially
> increasing timing accuracy should be grounded in a proper use-case.
> The natural response is to change nothing. But if it's *worth*
> increasing time accuracy I would like to know why and how.
This wasn't even the important thing; just that time didn't go backwards. :-D
> Also, again, it is not obvious to me why calling an extra syscall
> ensures high accuracy from the point of view of a userland
> application. I would really like an example.
> In the real world, when very accurate timestamps are necessary one
> generally lets the hardware insert them (for example ptp works like
> this and thus requires special ethernet hardware support). Our
> drivers/interrupt handlers, schedulers, etc. introduce delays, so
> system designers learned to not expose time-critical stuff to a whole
> stack of both operating systems and applications.
That's not exactly how PTP works. IEEE 1588 does define a mechanism so
that an OS can get a timestamp from ethernet, avoiding a lot of
software jitter by giving you a signal of the form, "at the time this
interrupt was generated, the time was X." But it can also use
something like a PPS signal from a GPS receiver as an input source.
Regardless, for software to make use of that time, you've still got to
get the OS involved, assuming you've got an operating system at all
(and not just some microcontroller thing that's running on bare
metal). But the OS is synchronizing time to a very stable, accurate,
and precise time source, so you get high accuracy. And if the path
length between the program and the OS's notion of time is short
enough, it _can_ be very precise, as well.
In timing on real-world computers, you will always have some amount of
error; the trick is to usefully bound that. This means you've got to
do some math around your timing sources, current notion of time, rate
at which you're advancing the clock, etc. To get good accuracy, these
equations have lots of terms with varying coefficients, which is why
this has largely shifted to being centralized in the kernel (or
provided by the kernel, as in the case of the vdso), which is closest
to the clock.
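A toy illustration of one such term; the gain is invented, and a real
discipline (NTP's PLL/FLL, or timesync) uses a much richer model:

#include <u.h>
#include <libc.h>

/* one update step of a toy clock discipline: nudge the rate at
   which the local clock advances toward an external reference */
void
discipline(double *rate, vlong local, vlong ref)
{
	double offset;

	offset = (double)(ref - local);	/* how far off we are, in ns */
	*rate += 1e-12 * offset;	/* proportional correction only */
}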
> > That's an approach one could take, sure. But an issue there is that it
> > injects complexity into process creation in a way that is difficult to
> > hide from userspace; you'd essentially need a hook into an RNG library
> > that runs every time you `fork` and with the extant `rfork` semantics
> > this could lead to weird races and so on, and doesn't cover the case
> > where a user provides their own RNG, as opposed to using one from libc
> > or similar. That it doesn't compose well with the existing primitives
> > suggests to me that it's not a great approach in a plan9-ish context.
>
> Right, so back to /dev/random, good enough for getting a seed into userspace.
+1
> > Well yeah, but looking at the abstractions at hand in plan9-ish style
> > systems, you've got basically three choices: a file, synthesized by
> > the operating system, that you interact with via the usual
> > open/close/read/write; a system call; or something involving virtual
> > memory (e.g., a page shared between the kernel and userspace). The
> > latter one has very little precedent in plan9, and raises a whole host
> > of design questions, so I'd probably rule it out; at least for
> > entropy, less so for time. That leaves the first two: for entropy,
> > I'd argue for the file, but for time, a system call seems appropriate.
>
> I see we all agree there's no downside to /dev/random.
> This raises the question: why did Russ want a syscall for reading from
> /dev/random?
Dunno. I can't speak for him on that, of course, but I gather he feels
that it's so fundamental it shouldn't require opening a file first.
I'm not sure why.
> I feel like nobody knows, and I feel emboldened enough to finally put
> this question to him.
>
> For time the syscall would be appropriate, if two things are true:
> 1) higher accuracy than possible with a read() from /dev/bintime is needed
> 2) a syscall is faster than a register read.
I'm not sure I understand what you mean by (2). A system call, since
it involves multiple register moves, will by definition be slower than
a single register move.
> if 1 is false, then the complexity seems wasteful.
> if 2 is false, then accuracy might not be the goal, which makes me
> question the original supposed intention.
>
> Am I still missing something?
This is the part that's been overcome by the events of the discussion
on 9fans. The real issue was just having a monotonic time source
available; Russ's argument that this should always be available as an
ambient source seemed like something he was proposing as a more
general thing. But that's changed and no one is talking about magic
"kernel" file-descriptors or system calls now.
> > line between whether something should be provided by a file
> > abstraction versus a system call?
>
> A file abstraction *can* be used by a system call already: read(fd)
> where fd points to an open file.
Heh, sure. But the context here is the distinction between a
special-purpose system call versus a general-purpose read against a
special file. Surely this was clear?
It may be worth mentioning why one might prefer the former, though,
and that has to do with the "path" length in the kernel. A special
system call to return the current notion of time is likely very short;
there's some necessary trap handling goo on entry into the kernel
(mostly saving a bunch of state and getting running on the kthread for
that proc, then jumping to the syscall dispatch function), and
similarly on exiting the kernel (unwinding all of the entry stuff and
restoring state; maybe handling note delivery, possibly doing a
context switch, etc), but that stuff is bounded and fairly
predictable: most of the time we're not delivering notes, and we can
time those things and see how long these parts usually take. What's
more interesting, however, is what happens while we're in the kernel,
between when we enter and exit. And that can vary _wildly_.
Consider this hypothetical system call to read the time, for example:
conceptually, this is pretty simple, and we can imagine that its
handler probably just copies the system's notion of the current time
into a register and returns. The code that does that probably compiles
down to a handful of instructions (maybe, I dunno, 10 or 20 or
something like that). Maybe you might want to wrap the time read in a
lock (though if it's updated atomically that feels unnecessary; a
barrier would probably be good enough on non-TSO machines) but it's
still a pretty short path.
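In rough code, and purely as a sketch in the style of the port code (the
syscall is invented; todget() in port/tod.c is the kernel's existing
time source):

/* hypothetical handler; no such syscall exists in any tree */
uintptr
sysnsec(va_list list)
{
	USED(list);
	/* a read of the kernel's notion of time: nothing to
	   validate, nothing to sleep on */
	return (uintptr)todget(nil);	/* truncates on 32-bit; a real
					   design would copy out a vlong */
}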
On the other hand, `rfork` is much more expensive: you've got to
allocate a new process, which may allocate memory, and depending on
the flags you give it, you may have to create a new address space, copy memory
from the parent to the child, duplicate a bunch of state (all open
files, which is going to lock a bunch of chans, incref them, etc, plus
environment variables, current working directory, etc). The system
might be short of RAM or procs or whatever, so the parent might sleep,
other things may run in the interval, etc, etc, etc.
Similarly with doing a `read`: this has to indirect through the chan
and get to the thing you're actually reading, but the generic
machinery doesn't know that that's just a clock, so it has to be
prepared to sleep or whatever; moreover, since `read` is so general,
it's going to have to copyout into user space and that means it'll
have to inspect the address space to make sure that where it's writing
to isn't garbage, handle generating an error if it is, and so on.
No matter how you shake it, the general mechanisms are a lot more expensive.
Now that doesn't a priori mean that adding a new system call is always
the best thing to do, and I wouldn't argue for that; as I said, it's
about making sound engineering tradeoffs. But it is appropriate to
recognize that not all system calls are created equal in terms of
complexity or cost.
In this case, the consensus has come down on the side of just
extending the information returned from the relevant files. Good to
go.
> A system call change is bad for binary compatibility. open/read
> syscalls on file paths are much more flexible.
Flexibility has a cost (see below).
But I think you're onto something with concerns about binary
compatibility. In my opinion, that's the best argument so far against
something more invasive, like a new system call.
> The line so far seems to be a matter of historical circumstance. For
> example, we can create a new pipe with a syscall because the syscall
> was there first.
That "it was here first" reasoning seems specious to me. There's
nothing magical about rfork or pipe that implies that they must be
system calls. Pipe, in particular, could very reasonably be
implemented via a filesystem abstraction. Again, I'd ask one to think
about why they are not. Did it not occur to the original implementers?
Possibly, I suppose. Did they think about it but discard the idea?
Again, possibly. But they aren't (or sadly in some cases weren't)
infallible, and that doesn't mean that the decisions they made almost
40 years ago are still the best ways to go about things. The
environment and context changes, and it's important to be open to
reevaluating things in light of that. That doesn't mean that one MUST
change how one does things, of course, but it's ok to at least think
about it every now and again.
> But what new feature has ever needed a new syscall in the last 10
> years? And what is the rationale why it couldn't be done without one?
> I don't think there's actually precedent. While there are new ioctl
> bugs every few days, we seem to move the other way on plan9, away from
> syscalls, to simple text-based, human-debuggable, nameable interfaces.
This I'm going to push back on.
We tend to think of the interface between the kernel and userspace
programs on plan9 as being very simple and lean, because the number of
system calls is very small compared to other systems. However, this
isn't actually true: the system interface is extremely wide, because
so many devices expose special files that users can write into to
modify system behavior. Most of those use text, which has a number of
attractive advantages, but most of that text is unstructured, badly
documented, idiosyncratic, and parsed in an ad hoc manner; some of it
is downright buggy (how many devices tolerate splitting a ctl message
across multiple write()'s?). There's actually a lot of complexity
hiding in there.
Of course, the system largely works because most users know not to do
that, and the standard tools help here (`echo` accumulates its
arguments into a buffer and emits them all in a single write(), for
example), but this is acquired knowledge. And even then sometimes
things do weird stuff; I've had some devices fail to parse ctl
messages that had a newline at the end of the command string, while
others required a newline.
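In code, the acquired discipline amounts to little more than this (the
helper is illustrative):

#include <u.h>
#include <libc.h>

/* send one complete command per write(); many devices won't
   reassemble a ctl message split across writes */
void
ctlcmd(int ctlfd, char *cmd)
{
	long n;

	n = strlen(cmd);
	if(write(ctlfd, cmd, n) != n)
		sysfatal("ctl write: %r");
}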
Don't get me wrong; I think that the plan9 way of doing this sort of
thing is much better than `ioctl`, but we should be clear that it
doesn't mean that the plan9 system interface is inherently simpler or
more constrained because we're writing text into ctl files instead of
plumbing random binary structures as in Unix.
> > To illustrate, look at a slightly different case: process creation a
> > la `rfork()`. Plan 9 has always had `/proc`; one could imagine a
> > `#p/clone` that worked something like `/net/tcp/clone`: you open this
> > file and read from it, and a new process springs into existence, the
> > child of the current process, that is a duplicate of the parent except
> > that the data read is the path to a new directory representing the
> > child process in the parent, or the directory of the parent in the
> > child (or nil on failure). Perhaps one might write a set of textual
> > flags that represent the various flags one can pass to `rfork` to
> > control sharing and so forth, before reading. On the face of it, it
> > doesn't immediately seem like an awful interface, and I have no doubt
> > that something like this _could_ be built. But we _don't_ have that;
> > instead, `rfork()` is a system call. Why is that? I don't claim to
> > know definitively, but I imagine that someone might have at least
> > considered it once. Likely, it's too hard to get the details right
> > (some problems with concurrency spring to mind immediately, if the
> > read is into a non-stack buffer; what would it mean if I tried to
> > access a remote machine's `/proc/clone`; etc), and process creation is
> > so frequent that the overhead of interacting with the file API isn't
> > worth it for some imagined gain in elegance or orthogonality. Anyway,
> > I think the question is at least worth pondering if we're asking
> > whether something should be elevated to the level of a syscall. In the
> > `rfork` case, being a system call really is simpler and more robust
> > for such a fundamental operation. I believe that's the case Russ is
> > trying to make, as well. It will always be a tradeoff; the key is
> > deciding whether that tradeoff is worth it along some important
> > dimension.
>
> The reason fork is a syscall is clearly just historical.
That may be, but I don't know that this statement can be made
definitively from available evidence; it's not clear at all.
> Could it be
> done differently today if we did it from scratch? Sure! Do we want
> this? No, we are quite conservative towards such fundamental changes.
> I think we have to be very careful and think this through.
I agree; that was kind of my point. If you were to expose a clone file
or something for processes, you'd suddenly have to confront a number
of emergent behaviors that, perhaps, hadn't been anticipated.
Reducing complexity in one area (one fewer system call! Yay!) would be
replaced by complexity in another area (weird stuff happens when you
clone over a network! Boo....). Like I said, it's a tradeoff.
> But I'd love a good proposal for how to send beta-level simulations of my
> process and process context across universes at FTL speed. This would
> require solving the problem with dangling FDs, which would be a much
> greater excuse to rethink how our FDs (and syscalls) work. The current
> best-practice implementations of "global namespace" will not be good
> enough for cp /mnt/earth/proc/123 /mnt/alphacentauri/proc/. FDs would
> need some kind of magic cross-kernel redirection, or we replace FDs
> with something that has actual names.
> Until then I think it's ok to fork.
>
> > > and the syscall interface, remember also that
> > > userland Libraries can be used to hide the sequence into a single
> > > function call.
> >
> > Yes, but at a significant cost in overhead.
> >
> > > BTW: how often do we need accurate time?
> >
> > Oh, you need it All. The. Time. On busy servers, many, many times a
> > second. Rates in the kilohertz, spread across many cores.
>
> time yes, but accurate time? why?
> maybe the problem is that not all of us here realize that even
> high-resolution timestamps can become practically inaccurate.
> how accurate are we speaking? personally, i don't get it at all. in
> 99% of all cases i just care about 1-second resolution, as long as
> everything is monotonic.
> in the remaining 1% i might misuse time as a half-assed indicator of
> ordering (generally unreliable).
So something that I've had to confront as a kernel person is that the
universe of interesting userspace software is very large, and part of
my job is to accommodate big chunks of it. I can argue that userspace
software that doesn't work the way I think it ought to should change,
but I can't really control that. If I'm lucky, I can convince some
people to change their programs; if I'm really lucky, they may think
my way is better and push for it in other systems. But most of the
time, if I want folks to use my system I've got to accommodate them
somehow. At all times, I _should_ try to understand their use cases.
Moreover, userspace stuff may be driven by considerations that are
much different than my own. But the overall point is that what I care
about based on my own sense of aesthetics and experience may not be
all that relevant to someone else, and generally speaking, those folks
aren't dummies.
> > Of course it depends on the application, but in a lot of production
> > settings, the need is almost absurdly frequent and the overhead of
> > even a system call is absolutely brutal,
>
> I feared the same. Maybe Russ missed this. I shall inquire.
It turns out it wasn't about that after all. Mea culpa there. :-)
> > much effort in the Linux world to get that stuff into userspace: first
> > by doing an initial read and then incrementing off of the TSC, and
> > then by using e.g. the vdso.
>
> Yeah, it's complex, but I can imagine it's worth it for some edge-cases.
>
> > Go cares a lot because they do a lot with high-precision timers in the
> > runtime.
>
> Do you mean TSC, or actual high-accuracy time sources?
> TSC is cheap and would make sense to me. Other kinds of timers would seem fishy.
I dunno. I don't actually know what Go chooses to use as a timing
source in the runtime. I do know that their runtime maintains a
callout list ordered by time, and that they cancel a lot of timers, so
they keep track of time pretty aggressively. Clearly, the runtime
reacts badly to non-monotonic time.
> > Most 9fans probably don't need it, but on the high end you
> > really want to reduce the overhead as much as possible; the closer you
> > can get it just being a load from RAM the better.
>
> Time isn't generated by RAM. It's a property of the universe. :P
Heh. What I mean is retrieving it in a userspace program. If an
accurate time value was available as cheaply as a load, that would be
ideal.
> > I think it's being taken as a given that the overhead of
> > open/read/close is too high, and so if one wants to retain the file
> > metaphor for accessing time, that implies keeping an open FD to some
> > device around. But file descriptors are a process-global resource, and
> > can be closed by totally unrelated bits of code: there's no concept of
> > a "private" FD in the way that there is for "private memory", such as
> > a static inside of a function, or the stack segment of a process or
> > something. So whatever code is used to return the time has to deal
> > with a bunch of weird failure cases, which adds both complexity _and_
> > overhead.
>
> Yes, which is no different from how we already have to handle stdin/out etc.
> adding a stdtime in golang doesn't seem like too much clutter to me personally.
> if all code only ever does single reads on the time fd, i see no
> problem with parallel access either.
> i see no reason why any process should close the time fd, and given
> that, where would those weird failure cases come from?
Here's where I admit I veered very far from the context. My premise,
that open/read/close being too expensive was accepted as a given, was
simply wrong.
> > And there's the matter of what format the data should take:
> > traditionally, plan 9 would return text (e.g., `cat /dev/time` right
> > now). That adds overhead since it probably has to be parsed. If you
> > need nanosecond resolution for the phase-locked plesiosynchronous
> > timer that ensures consistency across your HPC cluster or whatever,
>
> are they making PLL as a service in the cloud now?
> seriously, for *actual* practically high-accuracy low-latency work you
> would not want scheduling delays and syscall overheads.
Consider something like NIX, where you've got cores that are
effectively dedicated to a particular job; Akaros did similar things
with spatial scheduling across lots of cores. There's no scheduling
overhead, really, because nothing is scheduled on those cores; with
Akaros, interference from the OS was almost exclusively something the
program itself requested, up to things required for the machine to
work reliably (like TLB shootdowns). So once you've chopped out that
noise, you're left with bounding the overhead of getting the value as
closely as you can, and minimizing that overhead as much as you can.
That's doable, as some of the HARE work showed (I think Ron's curried
syscall closures are really pretty slick).
> sure you can have high-accuracy timestamps at high latencies, no
> problem. but if the latency is high, again, why did you want
> high-accuracy in the first place?
It really depends on the application, doesn't it? Again, there's a lot
of userspace software out there; a lot more than kernel software. I
really can't speak to all of it. Perhaps this is a good question to
ask, "Have you considered another way of doing it?" But if we get a
good answer to that question ("yes, but we're doing it this way
because X, Y and Z..."), we've got to deal with that. In that spirit,
I kind of feel like this is asking the wrong question.
> maybe just use dedicated hardware like everybody else?
I'm not sure what dedicated hardware you are referring to?
> or better, show me your magical "real" "time" system that can do
> anything close to this.
> you just can't have high-accuracy scheduling with guarantees AND lower
> latency than usual AND high processing power all at the same time.
>
> > you may not be able to easily provide that if you've got to read and
> > parse the data. Presumably a system call just copies into a
> > native-order binary vlong or something.
>
> Unlike simple counters like the TSC, time/date generally has to be parsed
> to be meaningful. That parsing is likely expensive, because multiple
> political systems, calendars, timezones, and many historical events
> potentially have to be taken into account...
> yes, it sounds like too much overhead for my taste. just use a simple
> counter like the TSC.
>
> > Rather, it is about identifying a specific need and the best way to
> > serve that need. In the context of a plan9 system, I would argue that
> > a system call for high precision timing is actually a pretty good
> > engineering tradeoff: it's not a terribly invasive change, avoids a
> > lot of complex (and honestly kind of weird) edge cases, and provides
> > an adequate, if imperfect, solution to a real-world problem.
>
> What exactly is that real-world problem? It has not been described
> yet. Everything was vague; HPC was mentioned now as very sensitive (to
> slow hardware, phase noise, etc.), but which exact HPC SOFTWARE
> problem is this software solution solving?
>
> > Oh, that hasn't been a problem for decades now. No offense to your
> > python expert, but I think they are probably mistaken.
>
> That's a relief tbh :)
>
> > Incidentally, modern VMs were introduced for flexibility and
> > utilization, much in the same way that containers were. I don't think
> > considerations about only being able to hold X many files open at a
> > time really entered into it. Sockets maybe, but I doubt it was an
> > upper bound on the number of socket descriptors available so much as
> > scalability in a particular system; e.g., POSIX (and plan 9) semantics
> > around file descriptor allocation practically force it to be quadratic
> > unless you're willing to throw a fair bit of per-process complexity at
> > it; if you've got a lot of network connections coming and going, that
> > could be a bottleneck.
>
> yes, could have been sockets and not fds.
>
> > Yeah. It's more an issue of what happens to the library if some other
> > bit of code in the process yanks the FD out from under it. If some FDs
> > become magical, in that you start to restrict whether they can be
> > closed, or by who, or whatever, then how far do you push the
> > complexity that that approach implies? And is it worth it to avoid
> > just adding a system call?
>
> Isn't that issue shared by all programs that use any FDs? Have you
> seen any code here that goes around and closes all FDs?
Not only have I seen that code, I've written it: `for (int i = 3; i <
whatever; i++) close(i); // Whoops.` Probably not my finest work, but
a lot of assumptions are hidden in a lot of code.
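To make the hazard concrete, here's a minimal sketch of the failure
mode (Plan 9 C; I'm using /dev/bintime as the cached file, but any
library-cached fd will do):

#include <u.h>
#include <libc.h>

/* A "library" that caches an fd, and unrelated code that closes it. */
static int timefd = -1;

static long
cachedread(char *buf, long n)
{
	if(timefd < 0)
		timefd = open("/dev/bintime", OREAD);
	return pread(timefd, buf, n, 0);
}

void
main(void)
{
	char buf[24];
	int i;

	print("first read: %ld bytes\n", cachedread(buf, sizeof buf));
	for(i = 3; i < 20; i++)
		close(i);	/* the whoops loop */
	/* timefd still looks valid to the library, but points at nothing */
	print("second read: %ld bytes\n", cachedread(buf, sizeof buf));
	exits(nil);
}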
The cool thing about the curried syscall stuff is that it's immune to
this: the file descriptor can be closed, but the kernel retains a
reference to the Chan for the time device for the duration of the
process. That's a very elegant solution.
- Dan C.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-17 15:49 ` Dan Cross
@ 2025-03-17 22:13 ` ron minnich
2025-03-18 2:16 ` hiro
1 sibling, 0 replies; 27+ messages in thread
From: ron minnich @ 2025-03-17 22:13 UTC (permalink / raw)
To: 9front
[-- Attachment #1: Type: text/plain, Size: 30861 bytes --]
by the way, related but not related: there was a GPS receiver, years
ago, that did the following to adjust time forward 1 second:
go back X seconds, go forward X+1 seconds.
This did not end well.
But it was ok because the customers who bought it had many devices, from
several vendors, and were able to handle this by knocking that device out
of contention and, eventually, out of data centers :-)
monotonic time can be harder than it looks. Even the easy solutions (just
buy hardware!) can be a problem.
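in toy form, the failure for anyone timing an interval across that
excursion looks like this (a made-up sketch, Plan 9 C, numbers
invented):

#include <u.h>
#include <libc.h>

static vlong wallclock = 1000;	/* toy settable clock, in seconds */

void
main(void)
{
	vlong t0, t1;

	t0 = wallclock;		/* someone samples "now" */
	wallclock -= 10;	/* receiver steps back X=10 seconds... */
	t1 = wallclock;		/* ...a sample lands inside the excursion */
	wallclock += 11;	/* ...then steps forward X+1 */
	print("elapsed: %lld\n", t1 - t0);	/* negative: time ran backwards */
	exits(nil);
}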
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [9front] Re: [9fans] monotonic time and randomness on Plan 9
2025-03-17 15:49 ` Dan Cross
2025-03-17 22:13 ` ron minnich
@ 2025-03-18 2:16 ` hiro
1 sibling, 0 replies; 27+ messages in thread
From: hiro @ 2025-03-18 2:16 UTC (permalink / raw)
To: 9front
ok, this cleared up a lot for me. thanks for the lengthy answer.
and in the meantime i have finished reading the other thread, too.
and ron, thanks for bringing up monotonic time actually being
difficult. now that i think about it, you must be quite right: even
monotonic counting might be harder on multicore than i assumed at
first.
a single monotonic counter: trivial
many unsynchronized monotonic counters: trivial
guaranteeing they will appear monotonic to a process that switches
between cores: uhhhh
and then you still have to add time to this offset: oh no, i give up...
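for what it's worth, the classic band-aid for that last step is to
clamp every reader behind one gate. a minimal plan 9 c sketch (nsec()
stands in for whatever raw per-core source you have; it serializes all
readers, so it trades throughput for the guarantee):

#include <u.h>
#include <libc.h>

static Lock mtlock;
static vlong mtlast;

/* never repeats, never goes backwards, whatever the cores'
 * raw counters do */
vlong
monotime(void)
{
	vlong t;

	t = nsec();
	lock(&mtlock);
	if(t <= mtlast)
		t = mtlast + 1;	/* clamp */
	mtlast = t;
	unlock(&mtlock);
	return t;
}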
thanks everyone, these small changes served as a very informative
example to study.
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread
Thread overview: 27+ messages
[not found] <CADSkJJUJrSNjFUkD_M1nHGY=9E5ABsu8oJZmHZhX1=mDiHCKmw@mail.gmail.com>
[not found] ` <50091B7893081EF7BCE34BCE6AE78DB2@eigenstate.org>
[not found] ` <CADSkJJXo2s-ttzB+3S0Nb_GQH35Fq9kjNrodUOEO6a8qngTfSA@mail.gmail.com>
2025-03-12 10:50 ` [9front] Re: [9fans] monotonic time and randomness on Plan 9 hiro
2025-03-12 11:51 ` [9front] " Jamie McClymont
2025-03-12 12:36 ` hiro
2025-03-12 12:28 ` [9front] " Dan Cross
2025-03-12 13:15 ` hiro
2025-03-12 14:59 ` Dan Cross
2025-03-14 23:33 ` hiro
2025-03-17 15:49 ` Dan Cross
2025-03-17 22:13 ` ron minnich
2025-03-18 2:16 ` hiro
2025-03-12 15:02 ` ron minnich
2025-03-12 17:43 ` Dan Cross
2025-03-12 21:30 ` qwx
2025-03-12 15:27 ` Kurt H Maier
2025-03-12 15:50 ` Dan Cross
2025-03-12 19:51 ` Kurt H Maier
2025-03-12 19:58 ` Dan Cross
2025-03-12 20:11 ` Kurt H Maier
2025-03-14 12:04 ` theinicke
2025-03-14 21:26 ` Kurt H Maier
2025-03-14 22:41 ` Paul Lalonde
2025-03-15 1:19 ` Lyndon Nerenberg (VE7TFX/VE6BBM)
2025-03-15 10:21 ` [9front] " Shawn Rutledge
2025-03-15 13:33 ` Paul Lalonde
2025-03-12 20:01 ` [9front] " Jacob Moody
2025-03-13 3:15 ` Ori Bernstein
2025-03-13 16:32 ` Dan Cross