9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] monotonic time and randomness on Plan 9
@ 2025-03-10 15:13 Russ Cox
  2025-03-10 16:51 ` Skip Tavakkolian
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Russ Cox @ 2025-03-10 15:13 UTC (permalink / raw)
  To: 9fans; +Cc: cinap lenrek

Hi all,

Cinap pointed out in the other thread that nsec had been added and then
abandoned because it wasn't right. That turns out only to be half wrong -
it's not true today but it probably should be true in the future. We do
need a time-related special system call, but not that one.

I just saw a Go program crash because it observed monotonic time move
backward. That happened because on Plan 9, Go does not have easy access to
monotonic time, only Unix time. And when Unix time moves backward (like
timesync makes it do) then Go sees that as monotonic time moving backward.
The ironic thing is that #c/bintime has all the info Go needs, but Go
stopped using it.

The nsec system call was added to avoid needing to keep #c/bintime open in
all programs, and to avoid problems such as that fd accidentally landing on
a standard fd (0, 1, 2). But nsec is too specialized. bintime returns more than just Unix
nanoseconds. The right answer would have been to add a readbintime(p, n)
system call that acts like pread(/dev/bintime, p, n, 0), dispatching to the
kernel's readbintime function. I suggest we actually do that, which would
make monotonic time access work right.
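
To make "more than just Unix nanoseconds" concrete: as cons(3) describes it, a read of bintime returns 24 bytes holding three big-endian 8-byte integers (nanoseconds since the epoch, clock ticks, and clock frequency). The following is a minimal sketch in portable C of decoding such a buffer and deriving a monotonic reading from the tick count alone. The names Bintime, decodebintime, and monons are made up for illustration, and the sketch assumes the tick counter is not stepped when timesync adjusts the wall clock.

```c
#include <stdint.h>

/* Hypothetical names; layout assumed from cons(3): three big-endian 64-bit values. */
typedef struct Bintime Bintime;
struct Bintime {
	int64_t  nsec;	/* nanoseconds since the epoch; may be stepped */
	uint64_t ticks;	/* clock ticks */
	uint64_t hz;	/* clock frequency in ticks per second */
};

/* Assemble a big-endian 64-bit value from 8 bytes. */
static uint64_t
be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for(i = 0; i < 8; i++)
		v = v<<8 | p[i];
	return v;
}

/* Decode a 24-byte buffer as read from bintime. */
void
decodebintime(const unsigned char buf[24], Bintime *b)
{
	b->nsec = (int64_t)be64(buf);
	b->ticks = be64(buf+8);
	b->hz = be64(buf+16);
}

/*
 * Monotonic nanoseconds computed from ticks and hz only.
 * Dividing before multiplying keeps the intermediate values
 * within 64 bits as long as hz is below about 18 GHz.
 */
uint64_t
monons(const Bintime *b)
{
	if(b->hz == 0)
		return 0;
	return b->ticks/b->hz*1000000000ULL + b->ticks%b->hz*1000000000ULL/b->hz;
}
```

A caller would fill the buffer with one read of #c/bintime and use nsec for wall-clock time and monons for intervals, so a backward step in nsec never shows up in interval measurements.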

While we are avoiding pre-opened file descriptors, the other thing modern
operating systems have come to realize is that /dev/random is important
enough to be able to access without a file descriptor. It would be good to
add a readcrypto(p, n) system call at the same time.

Perhaps there should not be two new system calls. Perhaps it should be one
new readspecial(id, p, n) system call.

Or perhaps there should be no new system calls, and instead pread should
accept a few distinguished negative file descriptors. Obviously fd=-1 has
to keep returning an error, but perhaps we should define that -2 is
#c/bintime and -3 is #c/random. Or if -2 is too close to -1, we could use
-1000 and -1001.

Personally I think the negative numbers are a bit too special, and I'd be
inclined to add two new system calls kread(kfd, data, n, off) and
kwrite(kfd, data, n, off), which are like pread and pwrite except that they
operate on "kernel file descriptors", which are small integers that are
always open and refer to specific kernel resources. The initial set of
kernel file descriptors are

    0 #c/bintime
    1 #c/random

This set could be extended over time; use of an unrecognized kfd would
return an error. This approach solves the "keep special fds open" problem
directly, without abandoning Plan 9's "everything is a file" quite as much
as nsec(2) does or readbintime(2) or readcrypto(2) would.
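
The kfd idea can be modeled in user space to show the intended shape of the interface. This is only a sketch under stated assumptions: kread does not exist in any kernel, the constants Kbintime and Krandom and the two stand-in device routines are hypothetical, and the "random" routine returns fixed placeholder bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of the proposed kread; not a real system call. */
enum { Kbintime = 0, Krandom = 1 };

typedef long Kreadfn(void *data, long n, long long off);

/* Stand-in for the kernel's bintime read routine: at most 24 bytes. */
static long
fakebintime(void *data, long n, long long off)
{
	(void)off;
	if(n > 24)
		n = 24;
	memset(data, 0, (size_t)n);	/* a real kernel would fill in time values */
	return n;
}

/* Stand-in for the kernel's random read routine. */
static long
fakerandom(void *data, long n, long long off)
{
	(void)off;
	memset(data, 0x5a, (size_t)n);	/* placeholder bytes, NOT random */
	return n;
}

/* Fixed table: every entry is always open, nothing to set up per process. */
static Kreadfn *ktab[] = {
	[Kbintime] = fakebintime,
	[Krandom] = fakerandom,
};

/* Like pread, except kfd indexes the fixed table above. */
long
kread(int kfd, void *data, long n, long long off)
{
	if(kfd < 0 || (size_t)kfd >= sizeof ktab/sizeof ktab[0] || ktab[kfd] == NULL)
		return -1;	/* unrecognized kfd is an error */
	if(n < 0)
		return -1;
	return ktab[kfd](data, n, off);
}
```

Extending the set means adding a table entry; programs built against an older table keep working, and programs asking for a kfd the kernel does not know get an error they can fall back from.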

Thoughts?

Best,
Russ

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-Mc872d415f55dab215c26a530
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-10 15:13 [9fans] monotonic time and randomness on Plan 9 Russ Cox
@ 2025-03-10 16:51 ` Skip Tavakkolian
  2025-03-10 17:09 ` ron minnich
  2025-03-10 17:12 ` ori
  2 siblings, 0 replies; 16+ messages in thread
From: Skip Tavakkolian @ 2025-03-10 16:51 UTC (permalink / raw)
  To: 9fans; +Cc: cinap lenrek

Following the magic fd's approach, would it make sense to have a
#c/magicfds that can be read to get the list of the FDs, or have an ID written
into it to get an existing or new FD returned?

------------------------------------------
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-M7f82561d19843787329e9834

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-10 15:13 [9fans] monotonic time and randomness on Plan 9 Russ Cox
  2025-03-10 16:51 ` Skip Tavakkolian
@ 2025-03-10 17:09 ` ron minnich
  2025-03-10 17:12 ` ori
  2 siblings, 0 replies; 16+ messages in thread
From: ron minnich @ 2025-03-10 17:09 UTC (permalink / raw)
  To: 9fans; +Cc: cinap lenrek

I think you can preserve the plan 9 model, get an efficient solution, and
do so without adding system calls. We did it on Blue Gene.

On Blue Gene, we had issues with using /dev/bintime because even doing a
read(fd, pointer, size) had a lot of jitter, due to operations such as
okaddr.  Overhead was even more critical for the global barrier network
(gib), which could do a full barrier across 64k nodes in 125ns.

Paper here:
https://www.researchgate.net/publication/265264528_Using_Currying_and_process-private_system_calls_to_break_the_one-microsecond_system_call_barrier

The model we used implemented currying of syscall arguments: write(fd,
pointer, size) became a new system call, with the fd and pointer verified;
and process private system calls, so that we were not polluting the global
system call table with a bunch of new system calls. Process private system
calls build on Plan 9's model of per-process resources; it worked very well
for us.

We added the support via the gib ctl file, so we did not need to
change system calls. In the example below, gib is the global barrier, and
gibctl is its ctl file.

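int cfd, gfd, scnum = 256;
char area[1], *cmd;
gfd = open("/dev/gib", ORDWR);
cfd = open("/dev/gib0ctl", OWRITE);
/* bind gfd, area, and its size to process-private system call scnum */
cmd = smprint("fastwrite %d %d 0x%p %d", scnum, gfd, area, (int)sizeof(area));
write(cfd, cmd, strlen(cmd));
free(cmd);
close(cfd);
/* from here on, each docall(scnum) performs the pre-checked write */
docall(scnum);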

We could keep calling docall(scnum), and most of the checking code in
syscall was bypassed. A curried write or read system call had no more
overhead than sysr1.
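
The split between a checked setup step and an unchecked fast path can be modeled in a few lines of ordinary C. This is a user-space sketch only, not the Blue Gene kernel code: the table sizes, the sink array standing in for devices, and the Curried structure are all invented for illustration, and only the names fastwrite and docall come from the example above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a process-private, curried write. */
enum { Nfd = 4, Nprivate = 8, Base = 256 };

typedef struct Curried Curried;
struct Curried {
	int	valid;
	int	fd;	/* verified once, at setup */
	const char *p;	/* verified once, at setup */
	size_t	n;
};

static char	sink[Nfd][64];	/* stand-in for per-fd devices */
static Curried	ptab[Nprivate];	/* per-process private call table */

/* Slow path: validate every argument, then bind them to scnum. */
int
fastwrite(int scnum, int fd, const char *p, size_t n)
{
	int i = scnum - Base;

	if(i < 0 || i >= Nprivate)
		return -1;
	if(fd < 0 || fd >= Nfd || p == NULL || n > sizeof sink[0])
		return -1;
	ptab[i] = (Curried){1, fd, p, n};
	return 0;
}

/* Fast path: find the slot and go; the arguments were checked at setup. */
long
docall(int scnum)
{
	Curried *c;
	int i = scnum - Base;

	if(i < 0 || i >= Nprivate || !ptab[i].valid)
		return -1;
	c = &ptab[i];
	memcpy(sink[c->fd], c->p, c->n);
	return (long)c->n;
}
```

The design trade is visible even in the model: docall is cheap and predictable because fd, pointer, and size are fixed at setup, which is also why the target buffer cannot vary from call to call.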

From the paper: "With the traditional write path, it took approximately
3,000 cycles per write. Since the BG/P uses 850 MHz PowerPC processors,
this means a normal write takes approximately 3.529 microseconds. However,
when using the private system calls, it only takes around 620 cycles to do
a write, or 0.729 microseconds."

This equaled the performance of an OS bypass solution, while not bypassing
the OS.

The bigger factor was the lack of jitter for IO. All the checking in
syscall is done once, not on every call.

We measured every program that ran on Plan 9 over a period of days, over a
LOT of system calls, and it turned out that, for the most part, programs call
read and write with the same fd, the same address, and the same size
(typically well under a page), so locking that fd, address, and size down
is a pretty big win.

Had we been able to use this for making bintime efficient, no nsec() system
call would ever have been needed. I don't think your kread and kwrite are
needed either.

It's nice to avoid adding more system calls, because as nsec() shows, if
they're not right, we're still stuck with them forever.

I think process private system calls can provide what you want. The code is
still out there in the blue gene kernel for plan 9. It's very small.


------------------------------------------
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-M8b2e41af7cb9d42af44daa90

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-10 15:13 [9fans] monotonic time and randomness on Plan 9 Russ Cox
  2025-03-10 16:51 ` Skip Tavakkolian
  2025-03-10 17:09 ` ron minnich
@ 2025-03-10 17:12 ` ori
  2025-03-10 18:16   ` Russ Cox
  2 siblings, 1 reply; 16+ messages in thread
From: ori @ 2025-03-10 17:12 UTC (permalink / raw)
  To: 9fans; +Cc: cinap_lenrek

Monotonic time would be useful, though it would be better to do it
without entering the kernel at all. Adding the nsec/cycle pair from the
last context switch to the tos seems reasonable. For an example
of how it could look, see:

        /sys/src/cmd/vmx/nanosec.c

Though this still uses nsec to get the initial offset, and is not safe
for multiple procs, it should give an idea of the shape of the solution.
Doing it in userspace seems better [faster, cheaper] than doing it with
a new syscall. For go's needs, /dev/bintime could be opened once at the
start of the program, the base time initialized, and then bintime could
be closed.

As far as file descriptors: My recollection from talking with folks at
hackathons is that OpenBSD added getentropy() in large part due to the needs
of pledge and chroot: if you cut off file system access, it becomes hard
to access /dev/random. If it's hard to construct sub-namespaces with devs,
it becomes hard to access /dev/random.  Other systems seem to have followed
suit for similar reasons.

Our situation is different; we approach sandboxing through controlling the
namespace[1], and it's easy to build a namespace where all the expected
names are in place.

I don't buy the argument around accidentally reusing file descriptors; any
file descriptor can end up in one of those slots, bintime, random, et al.
aren't special in that respect.

The reason contemporary operating systems can't deal with opening these
files is because they want to be very strict about access to the namespace,
or because the tools they use are anemic when it comes to namespacing. We
don't need to port over foreign systems' problems; we have enough of our own.

[1] we could do a better job, but there are steps being made. We've largely
    replaced 'rfork M' with a more granular option, have private /srv, etc.


------------------------------------------
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-M131dd00372c17a4c1b62d3f5

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-10 17:12 ` ori
@ 2025-03-10 18:16   ` Russ Cox
  2025-03-10 22:28     ` ori
  0 siblings, 1 reply; 16+ messages in thread
From: Russ Cox @ 2025-03-10 18:16 UTC (permalink / raw)
  To: 9fans; +Cc: cinap_lenrek

Thanks for the feedback so far. Correcting cinap's email address on this
reply.

I don't believe having a #c/magicfds file is a net win. Certainly if it is
writable / configurable, then much of the benefit is lost. The benefit is
not having to do set up like opening files and holding an fd open at
process startup, even temporarily. Like rfork(2) or sbrk(2) or other system
calls, I argue that these specific operations (read time and read entropy)
have become so fundamental to modern programs that they should be available
without the ceremony and conceptual overhead of managing long-term fds. A
readable one might be useful as documentation, but if it's a hard-coded set
then the kread(2) man page should suffice.

As for user-space time access, it is true that on 386 and amd64 we could
use RDTSC and saved clock parameters. We've actually moved away from that
on basically every operating system at this point, because the parameters
do change, and we need to know when to update them. Or the kernel would
have to share a memory page with the parameters with the user process, and
then that layout becomes part of the kernel interface. Of course we could
add shared code that the kernel leaves to be called as well. All of that
seems too complex and bespoke. (On Linux that's exactly what happens, but I
repeat myself.) And I don't know what we'd do instead on arm.

The Blue Gene system call batching also seems a bit much to me, especially
since (1) we want to not have to keep the fd open, (2) we don't want to
hard-code the target pointer for the read, and (3) ideally we don't want to
go through a per-process setup dance that includes opening files each time
a process is created. I wouldn't mind the per-process setup if (1) and (2)
were not problems as well.

Best,
Russ

------------------------------------------
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-M8cf21f21f8efb2e6ad43994c

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-10 18:16   ` Russ Cox
@ 2025-03-10 22:28     ` ori
  2025-03-10 22:44       ` ori
  0 siblings, 1 reply; 16+ messages in thread
From: ori @ 2025-03-10 22:28 UTC (permalink / raw)
  To: 9fans; +Cc: cinap_lenrek

From my recollection talking to the OpenBSD hackers that added
them, these syscalls were designed to solve 1.5 problems.

1: Primarily, sandboxes can interfere with getting the right files
   in the right place.

1.5: If a program is compromised, shellcode can dup over the
   random fd, allowing an attacker to replace a source of randomness
   with a known stream of values, which could lead to secretly broken
   cryptography.

For problem 1, I don't think we have an issue getting the right files
into sandboxes. We have to get our namespaces right, because so many of
our important resources are named, rather than accessed by ioctl or syscall.
And because we have to get everything else right, we also have to get
access to bintime and random right.

Note that for similar interaction with sandbox reasons, OpenBSD is adding
special hooks to the kernel for yp(8) https://man.openbsd.org/yp. If our
goal is to have go work in incomplete sandbox environments, we have a lot
of work cut out for us; I don't think it's a worthwhile goal.

For problem 1.5: If a program is compromised, there are so many other more
interesting things it can do. However, if that's your concern, we should
look into more general security hardening.

If the goal is performance, any syscall is much of a muchness, and keeping
things in userspace is the way to go. As far as this:

> Or the kernel would have to share a memory page with the parameters with
> the user process, and  then that layout becomes part of the kernel interface.

We already have that with the TOS. It even has a low res clock in there.

The last thing mentioned was accidentally opening things over standard FDs.
This strikes me as a very odd concern. If a system is going to accidentally
clobber a standard fd, it's going to be one of the calls churning through file
descriptors, and not one of these nearly static calls.

So, while I understand the desire for these calls in other contexts, the
reasons for adding them don't seem applicable.


------------------------------------------
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-M2157559319a4d6f3dfb00ca0

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-10 22:28     ` ori
@ 2025-03-10 22:44       ` ori
  2025-03-11  1:55         ` Charles Forsyth
  0 siblings, 1 reply; 16+ messages in thread
From: ori @ 2025-03-10 22:44 UTC (permalink / raw)
  To: 9fans; +Cc: cinap_lenrek

Quoth ori@eigenstate.org:
> If the goal is performance, any syscall is much of a muchness, and keeping
> things in userspace is the way to go. As far as this:
> 
> > Or the kernel would have to share a memory page with the parameters with
> > the user process, and  then that layout becomes part of the kernel interface.
> 
> We already have that with the TOS. It even has a low res clock in there.
> 

Also, before I forget: on 9front, userspace uses /dev/bintime; this has
come in handy in the past because I've bound static files over it for
testing. Losing that interposability would be a downside.
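
The interposability point can be illustrated outside Plan 9 too. In the sketch below the time source is just a path, so a test can point the reader at a static 24-byte file. On Plan 9 the same effect comes from bind(1) without changing the program at all. The function name readsample is hypothetical, and the 24-byte sample size assumes the bintime format described in cons(3).

```c
#include <stdio.h>

/*
 * Read one 24-byte sample from path. On Plan 9 the path would be
 * /dev/bintime; a test can pass (or bind) a static file instead.
 * Returns 0 on success, -1 if the file cannot be opened or is short.
 */
int
readsample(const char *path, unsigned char buf[24])
{
	FILE *f;
	size_t n;

	f = fopen(path, "rb");
	if(f == NULL)
		return -1;
	n = fread(buf, 1, 24, f);
	fclose(f);
	return n == 24 ? 0 : -1;
}
```

A system call that reaches the kernel's clock directly has no equivalent hook: there is no name to rebind, so a fixed-time test needs some other mechanism.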


------------------------------------------
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-M648d103890ab2ef0051f3dd3

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-10 22:44       ` ori
@ 2025-03-11  1:55         ` Charles Forsyth
  2025-03-11  5:34           ` ron minnich
  0 siblings, 1 reply; 16+ messages in thread
From: Charles Forsyth @ 2025-03-11  1:55 UTC (permalink / raw)
  To: 9fans; +Cc: cinap_lenrek

I've done that often enough!

On Mon, 10 Mar 2025 at 22:46, <ori@eigenstate.org> wrote:

> Also, before I forget: on 9front, userspace uses /dev/bintime; this has
> come in handy in the past because I've bound static files over it for
> testing. Losing that interposability would be a downside.
> 

------------------------------------------
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-M64df585a7a55e8f39ce8e734

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-11  1:55         ` Charles Forsyth
@ 2025-03-11  5:34           ` ron minnich
  2025-03-11 17:07             ` Russ Cox
  0 siblings, 1 reply; 16+ messages in thread
From: ron minnich @ 2025-03-11  5:34 UTC (permalink / raw)
  To: 9fans; +Cc: cinap_lenrek

it seems /dev/bintime will continue to be used no matter what.

The Blue Gene approach, I understand, won't quite do the trick, although it
was what we came up with after rejecting the "magic negative fd" and
kread/kwrite system call approach (we tried a lot of options in 2007). The
"magic negative fd" starts to look like ioctl, and we weren't comfortable
going down that road. The new system calls seem to only complicate the
picture. If I had to do this all over again, given the options presented so
far, I'd do what we did on Blue Gene. It preserves the use of /dev/bintime
as a source. Note, also, that the fd need not stay allocated from user
mode. Part of the currying involves getting the chan for the fd, and doing
an incref on that. User mode can close the fd once the process private
system call is set up. But, I understand if this seems a bit too much for
the task at hand.

In the HPC days we had realized that we could just return the time in nsecs
in an unused register, since the toolchain is caller save; user code could
decide to ignore that register, save for the one or two programs that use
it. This had the nice property of making system call measurement easy.

Further, one could shortstop sysr1 very early in the assembly language
syscall prolog, and the net effect would be a very low overhead system call
that happens to return time. We ended the project before trying that out.
This is a bit nicer than using TOS because it has zero impact on cache
lines or any other memory subsystem hardware. In the best case, it's just a
register to register move. I'm experimenting with that now on risc-v on
linux.

The low-res-clock-on-TOS is really nice, esp if it derives from a stable
clock. How precise does the clock need to be for Go? nsec? microsecond?

Also, for many reasons, I'd rather not have to do a system call to get a
precise clock. This kind of thing matters in HPC, and I'm back in that
world, so I care again :-)

Could there be two more TOS variables, M and N, such that nsec = (rdtsc *
N) / M? I.e. the kernel passes calibration on TOS and user code uses it to
compute nsec, not having to leave user mode? This M and N approach is very
commonly used in clock trees, which use integers, not floating point, to
implement clock scaling.
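
A worked version of the M and N arithmetic, as a sketch in portable C. It assumes the kernel would publish a counter frequency hz from which N/M = 1e9/hz is derived; the function names calibrate and tsc2nsec are invented here, and no such Tos fields exist today. Reducing the ratio by the gcd and dividing before multiplying keeps the computation in 64-bit integers.

```c
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. */
static uint64_t
gcd(uint64_t a, uint64_t b)
{
	uint64_t t;

	while(b != 0){
		t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Reduce 1e9/hz to lowest terms N/M, e.g. 3 GHz gives N=1, M=3. */
void
calibrate(uint64_t hz, uint64_t *n, uint64_t *m)
{
	uint64_t g = gcd(1000000000ULL, hz);

	*n = 1000000000ULL / g;
	*m = hz / g;
}

/*
 * nsec = tsc*N/M, computed as (tsc/M)*N + (tsc%M)*N/M so that the
 * remainder term never overflows when N*M fits in 64 bits.
 */
uint64_t
tsc2nsec(uint64_t tsc, uint64_t n, uint64_t m)
{
	if(m == 0)
		return 0;
	return tsc/m*n + tsc%m*n/m;
}
```

For a 2.4 GHz counter the reduced pair is N=5, M=12, so 24 cycles map to 10 ns. The open question raised elsewhere in the thread still applies: user code needs a way to learn when the kernel changes N and M.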



On Mon, Mar 10, 2025 at 7:27 PM Charles Forsyth <charles.forsyth@gmail.com>
wrote:

> I've done that often enough!
>
> On Mon, 10 Mar 2025 at 22:46, <ori@eigenstate.org> wrote:
>
>> Quoth ori@eigenstate.org:
>> > If the goal is performance, any syscall is much of a muchness, and
>> keeping
>> > things in userspace is the way to go. As far as this:
>> >
>> > > Or the kernel would have to share a memory page with the parameters
>> with
>> > > the user process, and  then that layout becomes part of the kernel
>> interface.
>> >
>> > We already have that with the TOS. It even has a low res clock in there.
>> >
>> 
>> Also, before I forget: on 9front, userspace uses /dev/bintime; this has
>> come in handy in the past because I've bound static files over it for
>> testing. Losing that interposability would be a downside.
>> 

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-Mfc8a8d467f4076826272ca27
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-11  5:34           ` ron minnich
@ 2025-03-11 17:07             ` Russ Cox
  2025-03-11 17:56               ` ron minnich
                                 ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Russ Cox @ 2025-03-11 17:07 UTC (permalink / raw)
  To: 9fans

As far as the top-of-stack info, it gets complicated. You have to hard-code
an algorithm that must be implemented in every user binary, and then it's a
pain to change. This is why in Go we've backed away from reading that info
on Mac and why we call into the VDSO on Linux instead of recreating that
code ourselves. Personally, I'm not too worried about the cost of fetching
the time. A system call is fine.

Based on this discussion, I've abandoned the idea of changing the system
calls, and I've updated Go to open /dev/bintime at startup and abandon
nsec. It also opens /dev/random at startup now too. If we're opening one,
two is not a big deal. That change is pending at https://go.dev/cl/656755.

However, a change to Plan 9 is still needed to provide monotonic time. At
first I was going to try to recreate it from the ticks and fasthz values in
/dev/bintime, but the value of fasthz can change over time as aux/timesync
deems it necessary, and if fasthz goes up, then 1e9*ticks/fasthz will go
down, making the derived time non-monotonic. It is also annoying to do that
calculation efficiently: more parameters are needed from the kernel.
Instead of exposing all those parameters, it is far easier and cleaner to
have the kernel maintain a monotonic time and simply expose that. I suggest
we add the monotonic time in nanoseconds as an extra field you can read
from /dev/time and /dev/bintime. If you ask for a big enough buffer, you
get it. If not, you don't. The diff is here:
https://github.com/rsc/plan9/commit/baf076425c.
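
To illustrate the "big enough buffer" rule, here is a sketch in C of how a
reader might decode the result of a read from /dev/bintime. It assumes the
fields are consecutive 8-byte big-endian integers in the order nsec, ticks,
fasthz, followed by the proposed monotonic nanoseconds, which matches the
layout in the 9front draft later in this thread. The Bintime struct and the
function names are hypothetical. The caller passes the byte count that the
read actually returned, so a kernel without the extra field simply leaves
hasmono unset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
typedef struct Bintime Bintime;
struct Bintime {
	uint64_t nsec;		/* nanoseconds since the epoch */
	uint64_t ticks;		/* fast clock ticks */
	uint64_t fasthz;	/* fast clock frequency */
	uint64_t mono;		/* monotonic nanoseconds, if present */
	int hasmono;		/* nonzero when the fourth field was returned */
};

/* Decode one 8-byte big-endian integer. */
static uint64_t
be64(const unsigned char *p)
{
	uint64_t v;
	int i;

	v = 0;
	for(i = 0; i < 8; i++)
		v = v<<8 | p[i];
	return v;
}

/*
 * n is the number of bytes the read returned.
 * Returns 0 on success, -1 if fewer than three fields came back.
 */
static int
parsebintime(const unsigned char *buf, size_t n, Bintime *t)
{
	if(n < 24)
		return -1;
	t->nsec = be64(buf);
	t->ticks = be64(buf+8);
	t->fasthz = be64(buf+16);
	t->hasmono = n >= 32;
	t->mono = t->hasmono ? be64(buf+24) : 0;
	return 0;
}
```

A program would ask for 32 bytes and fall back to another time source when
hasmono comes back zero.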

Best,
Russ

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-M94a3e524e0a77ddfec440be9
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-11 17:07             ` Russ Cox
@ 2025-03-11 17:56               ` ron minnich
  2025-03-14 19:18               ` ori
                                 ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: ron minnich @ 2025-03-11 17:56 UTC (permalink / raw)
  To: 9fans

sounds good to me.

On Tue, Mar 11, 2025 at 10:43 AM Russ Cox <rsc@swtch.com> wrote:

> As far as the top-of-stack info, it gets complicated. You have to
> hard-code an algorithm that must be implemented in every user binary, and
> then it's a pain to change. This is why in Go we've backed away from
> reading that info on Mac and why we call into the VDSO on Linux instead of
> recreating that code ourselves. Personally, I'm not too worried about the
> cost of fetching the time. A system call is fine.
>
> Based on this discussion, I've abandoned the idea of changing the system
> calls, and I've updated Go to open /dev/bintime at startup and abandon
> nsec. It also opens /dev/random at startup now too. If we're opening one,
> two is not a big deal. That change is pending at https://go.dev/cl/656755.
>
> However, a change to Plan 9 is still needed to provide monotonic time. At
> first I was going to try to recreate it from the ticks and fasthz values in
> /dev/bintime, but the value of fasthz can change over time as aux/timesync
> deems it necessary, and if fasthz goes up, then 1e9*ticks/fasthz will go
> down, making the derived time non-monotonic. It is also annoying to do that
> calculation efficiently: more parameters are needed from the kernel.
> Instead of exposing all those parameters, it is far easier and cleaner to
> have the kernel maintain a monotonic time and simply expose that. I suggest
> we add the monotonic time in nanoseconds as an extra field you can read
> from /dev/time and /dev/bintime. If you ask for a big enough buffer, you
> get it. If not, you don't. The diff is here:
> https://github.com/rsc/plan9/commit/baf076425c.
>
> Best,
> Russ

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-M1ca84cc218ac849453eac6b5
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-11 17:07             ` Russ Cox
  2025-03-11 17:56               ` ron minnich
@ 2025-03-14 19:18               ` ori
  2025-03-15  1:25               ` Alyssa M via 9fans
                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: ori @ 2025-03-14 19:18 UTC (permalink / raw)
  To: 9fans

The interface seems reasonable at a first glance; I'll look closer at the code
very soon.

Quoth Russ Cox <rsc@swtch.com>:
> As far as the top-of-stack info, it gets complicated. You have to hard-code
> an algorithm that must be implemented in every user binary, and then it's a
> pain to change. This is why in Go we've backed away from reading that info
> on Mac and why we call into the VDSO on Linux instead of recreating that
> code ourselves. Personally, I'm not too worried about the cost of fetching
> the time. A system call is fine.
> 
> Based on this discussion, I've abandoned the idea of changing the system
> calls, and I've updated Go to open /dev/bintime at startup and abandon
> nsec. It also opens /dev/random at startup now too. If we're opening one,
> two is not a big deal. That change is pending at https://go.dev/cl/656755.
> 
> However, a change to Plan 9 is still needed to provide monotonic time. At
> first I was going to try to recreate it from the ticks and fasthz values in
> /dev/bintime, but the value of fasthz can change over time as aux/timesync
> deems it necessary, and if fasthz goes up, then 1e9*ticks/fasthz will go
> down, making the derived time non-monotonic. It is also annoying to do that
> calculation efficiently: more parameters are needed from the kernel.
> Instead of exposing all those parameters, it is far easier and cleaner to
> have the kernel maintain a monotonic time and simply expose that. I suggest
> we add the monotonic time in nanoseconds as an extra field you can read
> from /dev/time and /dev/bintime. If you ask for a big enough buffer, you
> get it. If not, you don't. The diff is here:
> https://github.com/rsc/plan9/commit/baf076425c.
> 
> Best,
> Russ

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-Md69d5fa69a43388265201796
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-11 17:07             ` Russ Cox
  2025-03-11 17:56               ` ron minnich
  2025-03-14 19:18               ` ori
@ 2025-03-15  1:25               ` Alyssa M via 9fans
  2025-03-15 22:58                 ` Ron Minnich
  2025-03-17  3:17               ` ori
  2025-04-07 18:40               ` ori
  4 siblings, 1 reply; 16+ messages in thread
From: Alyssa M via 9fans @ 2025-03-15  1:25 UTC (permalink / raw)
  To: 9fans

On Tuesday, March 11, 2025, at 5:09 PM, Russ Cox wrote:
> Personally, I'm not too worried about the cost of fetching the time. A system call is fine.

If a read is fast enough, then perhaps open, read, close would also be fast enough. Or could be made so.
------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-Maa0a8121ccf2eea17400a649
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-15  1:25               ` Alyssa M via 9fans
@ 2025-03-15 22:58                 ` Ron Minnich
  0 siblings, 0 replies; 16+ messages in thread
From: Ron Minnich @ 2025-03-15 22:58 UTC (permalink / raw)
  To: 9fans, Russ Cox

So, Russ, just in case nobody else said it to you, thanks so much for
chasing this down. I think the solution you came up with is quite nice
-- need more precision? read more bits!  -- and, as a Go user on Plan
9, I'm glad to see Go continuing to work.

Go on Plan 9 is a first-class citizen in u-root, especially in our CI,
so it matters to us.


On Fri, Mar 14, 2025 at 6:37 PM Alyssa M via 9fans <9fans@9fans.net> wrote:
>
> On Tuesday, March 11, 2025, at 5:09 PM, Russ Cox wrote:
>
> Personally, I'm not too worried about the cost of fetching the time. A system call is fine.
>
> If a read is fast enough, then perhaps open, read, close would also be fast enough. Or could be made so.

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-Md16ebecd8c403fac364e1d68
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-11 17:07             ` Russ Cox
                                 ` (2 preceding siblings ...)
  2025-03-15  1:25               ` Alyssa M via 9fans
@ 2025-03-17  3:17               ` ori
  2025-04-07 18:40               ` ori
  4 siblings, 0 replies; 16+ messages in thread
From: ori @ 2025-03-17  3:17 UTC (permalink / raw)
  To: 9fans

Quoth Russ Cox <rsc@swtch.com>:
> As far as the top-of-stack info, it gets complicated. You have to hard-code
> an algorithm that must be implemented in every user binary, and then it's a
> pain to change. This is why in Go we've backed away from reading that info
> on Mac and why we call into the VDSO on Linux instead of recreating that
> code ourselves. Personally, I'm not too worried about the cost of fetching
> the time. A system call is fine.
> 
> Based on this discussion, I've abandoned the idea of changing the system
> calls, and I've updated Go to open /dev/bintime at startup and abandon
> nsec. It also opens /dev/random at startup now too. If we're opening one,
> two is not a big deal. That change is pending at https://go.dev/cl/656755.
> 
> However, a change to Plan 9 is still needed to provide monotonic time. At
> first I was going to try to recreate it from the ticks and fasthz values in
> /dev/bintime, but the value of fasthz can change over time as aux/timesync
> deems it necessary, and if fasthz goes up, then 1e9*ticks/fasthz will go
> down, making the derived time non-monotonic. It is also annoying to do that
> calculation efficiently: more parameters are needed from the kernel.
> Instead of exposing all those parameters, it is far easier and cleaner to
> have the kernel maintain a monotonic time and simply expose that. I suggest
> we add the monotonic time in nanoseconds as an extra field you can read
> from /dev/time and /dev/bintime. If you ask for a big enough buffer, you
> get it. If not, you don't. The diff is here:
> https://github.com/rsc/plan9/commit/baf076425c.
> 
> Best,
> Russ

Alright, here's a draft of a parallel change for 9front.

This also makes a related semantic change: while we're
touching all of the todget calls, it seems sensible to
trace our syscalls and edf deadlines using monotonic
timestamps too.

diff 6c59f4960d2641786557499443e6cdb5e250d064 uncommitted
--- a/sys/src/9/arm64/trap.c
+++ b/sys/src/9/arm64/trap.c
@@ -221,7 +221,7 @@
                        up->procctl = Proc_stopme;
                        procctl();
                        splx(s);
-                       startns = todget(nil);
+                       todget(nil, &startns);
                }
                
                if(scallnr >= nsyscall || systab[scallnr] == nil){
@@ -244,7 +244,7 @@
        }
        ureg->r0 = ret;
        if(up->procctl == Proc_tracesyscall){
-               stopns = todget(nil);
+               todget(nil, &stopns);
                sysretfmt(scallnr, (va_list) up->s.args, ret, startns, stopns);
                s = splhi();
                up->procctl = Proc_stopme;
--- a/sys/src/9/cycv/trap.c
+++ b/sys/src/9/cycv/trap.c
@@ -233,7 +233,7 @@
                        up->procctl = Proc_stopme;
                        procctl();
                        splx(s);
-                       startns = todget(nil);
+                       todget(nil, &startns);
                }
                
                if(scallnr >= nsyscall || systab[scallnr] == nil){
@@ -257,7 +257,7 @@
        
        ureg->r0 = ret;
        if(up->procctl == Proc_tracesyscall){
-               stopns = todget(nil);
+               todget(nil, &stopns);
                sysretfmt(scallnr, (va_list) up->s.args, ret, startns, stopns);
                s = splhi();
                up->procctl = Proc_stopme;
--- a/sys/src/9/kw/syscall.c
+++ b/sys/src/9/kw/syscall.c
@@ -186,7 +186,7 @@
                        up->procctl = Proc_stopme;
                        procctl();
                        splx(s);
-                       startns = todget(nil);
+                       todget(nil, &startns);
                }
                if(scallnr >= nsyscall || systab[scallnr] == nil){
                        postnote(up, 1, "sys: bad sys call", NDebug);
@@ -218,7 +218,7 @@
        ureg->r0 = ret;
 
        if(up->procctl == Proc_tracesyscall){
-               stopns = todget(nil);
+               todget(nil, &stopns);
                sysretfmt(scallnr, (va_list)up->s.args, ret, startns, stopns);
                s = splhi();
                up->procctl = Proc_stopme;
--- a/sys/src/9/mt7688/syscall.c
+++ b/sys/src/9/mt7688/syscall.c
@@ -53,7 +53,7 @@
                        up->procctl = Proc_stopme;
                        procctl();
                        splx(s);
-                       startns = todget(nil);
+                       todget(nil, &startns);
                }
 
                if(scallnr >= nsyscall || systab[scallnr] == nil){
@@ -89,7 +89,7 @@
        ureg->r1 = ret;
 
        if(up->procctl == Proc_tracesyscall){
-               stopns = todget(nil);
+               todget(nil, &stopns);
                sysretfmt(scallnr, (va_list)up->s.args, ret, startns, stopns);
                s = splhi();
                up->procctl = Proc_stopme;
--- a/sys/src/9/pc/devlml.c
+++ b/sys/src/9/pc/devlml.c
@@ -396,7 +396,7 @@
                statcom = lml->codedata->statCom[fno];
                jpgheader = (FrameHeader *)(lml->codedata->frag[fno].hdr + 2);
                jpgheader->frameNo = lml->jpgframeno;
-               jpgheader->ftime  = todget(nil);
+               jpgheader->ftime  = todget(nil, nil);
                jpgheader->frameSize = (statcom & 0x00ffffff) >> 1;
                jpgheader->frameSeqNo = statcom >> 24;
                wakeup(&lml->sleepjpg);
--- a/sys/src/9/pc/trap.c
+++ b/sys/src/9/pc/trap.c
@@ -493,7 +493,7 @@
                        up->procctl = Proc_stopme;
                        procctl();
                        splx(s);
-                       startns = todget(nil);
+                       todget(nil, &startns);
                }
 
                if(scallnr >= nsyscall || systab[scallnr] == nil){
@@ -528,7 +528,7 @@
        ureg->ax = ret;
 
        if(up->procctl == Proc_tracesyscall){
-               stopns = todget(nil);
+               todget(nil, &stopns);
                sysretfmt(scallnr, (va_list)up->s.args, ret, startns, stopns);
                s = splhi();
                up->procctl = Proc_stopme;
--- a/sys/src/9/pc64/trap.c
+++ b/sys/src/9/pc64/trap.c
@@ -472,7 +472,7 @@
                        up->procctl = Proc_stopme;
                        procctl();
                        splx(s);
-                       startns = todget(nil);
+                       todget(nil, &startns);
                }
                if(scallnr >= nsyscall || systab[scallnr] == nil){
                        postnote(up, 1, "sys: bad sys call", NDebug);
@@ -504,7 +504,7 @@
        }
 
        if(up->procctl == Proc_tracesyscall){
-               stopns = todget(nil);
+               todget(nil, &stopns);
                sysretfmt(scallnr, (va_list)up->s.args, ret, startns, stopns);
                s = splhi();
                up->procctl = Proc_stopme;
--- a/sys/src/9/port/devcons.c
+++ b/sys/src/9/port/devcons.c
@@ -817,7 +817,7 @@
 static uvlong uvorder = 0x0001020304050607ULL;
 
 static uchar*
-le2vlong(vlong *to, uchar *f)
+be2vlong(vlong *to, uchar *f)
 {
        uchar *t, *o;
        int i;
@@ -830,7 +830,7 @@
 }
 
 static uchar*
-vlong2le(uchar *t, vlong from)
+vlong2be(uchar *t, vlong from)
 {
        uchar *f, *o;
        int i;
@@ -845,7 +845,7 @@
 static long order = 0x00010203;
 
 static uchar*
-le2long(long *to, uchar *f)
+be2long(long *to, uchar *f)
 {
        uchar *t, *o;
        int i;
@@ -857,19 +857,6 @@
        return f+sizeof(long);
 }
 
-static uchar*
-long2le(uchar *t, long from)
-{
-       uchar *f, *o;
-       int i;
-
-       f = (uchar*)&from;
-       o = (uchar*)&order;
-       for(i = 0; i < sizeof(long); i++)
-               t[i] = f[o[i]];
-       return t+sizeof(long);
-}
-
 char *Ebadtimectl = "bad time control";
 
 /*
@@ -880,19 +867,20 @@
 static int
 readtime(ulong off, char *buf, int n)
 {
-       vlong   nsec, ticks;
+       vlong   nsec, ticks, mono;
        long sec;
-       char str[7*NUMSIZE];
+       char str[9*NUMSIZE];
 
-       nsec = todget(&ticks);
+       nsec = todget(&ticks, &mono);
        if(fasthz == 0LL)
                fastticks((uvlong*)&fasthz);
        sec = nsec/1000000000ULL;
-       snprint(str, sizeof(str), "%*lud %*llud %*llud %*llud ",
+       snprint(str, sizeof(str), "%*lud %*llud %*llud %*llud %*llud ",
                NUMSIZE-1, sec,
                VLNUMSIZE-1, nsec,
                VLNUMSIZE-1, ticks,
-               VLNUMSIZE-1, fasthz);
+               VLNUMSIZE-1, fasthz,
+               VLNUMSIZE-1, mono);
        return readstr(off, buf, n, str);
 }
 
@@ -926,23 +914,27 @@
 readbintime(char *buf, int n)
 {
        int i;
-       vlong nsec, ticks;
+       vlong nsec, ticks, mono;
        uchar *b = (uchar*)buf;
 
        i = 0;
        if(fasthz == 0LL)
                fastticks((uvlong*)&fasthz);
-       nsec = todget(&ticks);
+       nsec = todget(&ticks, &mono);
+       if(n >= 4*sizeof(uvlong)){
+               vlong2be(b+3*sizeof(uvlong), mono);
+               i += sizeof(uvlong);
+       }
        if(n >= 3*sizeof(uvlong)){
-               vlong2le(b+2*sizeof(uvlong), fasthz);
+               vlong2be(b+2*sizeof(uvlong), fasthz);
                i += sizeof(uvlong);
        }
        if(n >= 2*sizeof(uvlong)){
-               vlong2le(b+sizeof(uvlong), ticks);
+               vlong2be(b+sizeof(uvlong), ticks);
                i += sizeof(uvlong);
        }
        if(n >= 8){
-               vlong2le(b, nsec);
+               vlong2be(b, nsec);
                i += sizeof(vlong);
        }
        return i;
@@ -968,20 +960,20 @@
        case 'n':
                if(n < sizeof(vlong))
                        error(Ebadtimectl);
-               le2vlong(&delta, p);
+               be2vlong(&delta, p);
                todset(delta, 0, 0);
                break;
        case 'd':
                if(n < sizeof(vlong)+sizeof(long))
                        error(Ebadtimectl);
-               p = le2vlong(&delta, p);
-               le2long(&period, p);
+               p = be2vlong(&delta, p);
+               be2long(&period, p);
                todset(-1, delta, period);
                break;
        case 'f':
                if(n < sizeof(uvlong))
                        error(Ebadtimectl);
-               le2vlong(&fasthz, p);
+               be2vlong(&fasthz, p);
                if(fasthz <= 0)
                        error(Ebadtimectl);
                todsetfreq(fasthz);
--- a/sys/src/9/port/devloopback.c
+++ b/sys/src/9/port/devloopback.c
@@ -553,7 +553,7 @@
        bp = padblock(bp, Tmsize);
        if(BLEN(bp) < lb->minmtu)
                bp = adjustblock(bp, lb->minmtu);
-       ptime(bp->rp, todget(nil));
+       ptime(bp->rp, todget(nil, nil));
 
        link->packets++;
        link->bytes += n;
--- a/sys/src/9/port/devproc.c
+++ b/sys/src/9/port/devproc.c
@@ -281,7 +281,7 @@
        te->pid = p->pid;
        te->etype = etype;
        if (ts == 0)
-               te->time = todget(nil);
+               todget(nil, &te->time);
        else
                te->time = ts;
        tproduced++;
--- a/sys/src/9/port/edf.c
+++ b/sys/src/9/port/edf.c
@@ -195,7 +195,7 @@
                DPRINT("%lud release %lud[%s], r=%lud, d=%lud, t=%lud, S=%lud\n",
                        now, p->pid, statename[p->state], e->r, e->d, e->t, e->S);
                if(pt = proctrace){
-                       nowns = todget(nil);
+                       todget(nil, &nowns);
                        pt(p, SRelease, nowns);
                        pt(p, SDeadline, nowns + 1000LL*e->D);
                }
@@ -291,6 +291,7 @@
        Edf *e;
        void (*pt)(Proc*, int, vlong);
        long tns;
+       vlong tnow;
 
        e = p->edf;
        /* Called with edflock held */
@@ -315,8 +316,10 @@
                }else{
                        DPRINT("v");
                }
-               if(p->trace && (pt = proctrace))
-                       pt(p, SInte, todget(nil) + e->tns);
+               if(p->trace && (pt = proctrace)){
+                       todget(nil, &tnow);
+                       pt(p, SInte, tnow + e->tns);
+               }
                e->tmode = Trelative;
                e->tf = deadlineintr;
                e->ta = p;
--- a/sys/src/9/port/portfns.h
+++ b/sys/src/9/port/portfns.h
@@ -366,7 +366,7 @@
 ulong          tk2ms(ulong);
 #define                TK2MS(x) ((x)*(1000/HZ))
 uvlong         tod2fastticks(vlong);
-vlong          todget(vlong*);
+vlong          todget(vlong*, vlong*);
 void           todsetfreq(vlong);
 void           todinit(void);
 void           todset(vlong, vlong, int);
--- a/sys/src/9/port/sysproc.c
+++ b/sys/src/9/port/sysproc.c
@@ -1257,12 +1257,12 @@
        /* return in register on 64bit machine */
        if(sizeof(uintptr) == sizeof(vlong)){
                USED(list);
-               return (uintptr)todget(nil);
+               return (uintptr)todget(nil, nil);
        }
 
        v = va_arg(list, vlong*);
        evenaddr((uintptr)v);
        validaddr((uintptr)v, sizeof(vlong), 1);
-       *v = todget(nil);
+       *v = todget(nil, nil);
        return 0;
 }
--- a/sys/src/9/port/taslock.c
+++ b/sys/src/9/port/taslock.c
@@ -36,6 +36,7 @@
 lock(Lock *l)
 {
        int i;
+       vlong mono;
        uintptr pc;
 
        pc = getcallerpc(&l);
@@ -67,7 +68,8 @@
                                 */
                                print("inversion %#p pc %#p proc %lud held by pc %#p proc %lud\n",
                                        l, pc, up ? up->pid : 0, l->pc, l->p ? l->p->pid : 0);
-                               up->edf->d = todget(nil);       /* yield to process with lock */
+                               todget(nil, &mono);     /* yield to process with lock */
+                               up->edf->d = mono;
                        }
                        if(i++ > 100000000){
                                i = 0;
--- a/sys/src/9/port/tod.c
+++ b/sys/src/9/port/tod.c
@@ -44,7 +44,9 @@
        uvlong  udivider;       /* ticks = (µdivider*µs)>>31 */
        vlong   hz;             /* frequency of fast clock */
        vlong   last;           /* last reading of fast clock */
-       vlong   off;            /* offset from epoch to last */
+       vlong   off;            /* offset from epoch to last (ns) */
+       vlong   monolast;       /* last reading of fast clocks for monotonic time */
+       vlong   monooff;        /* offset from 0 to monolast (ns) */
        vlong   lasttime;       /* last return value from todget */
        vlong   delta;  /* add 'delta' each slow clock tick from sstart to send */
        ulong   sstart;         /* ... */
@@ -61,6 +63,7 @@
        ilock(&tod);
        tod.init = 1;                   /* prevent reentry via fastticks */
        tod.last = fastticks((uvlong *)&tod.hz);
+       tod.monolast = tod.last;
        iunlock(&tod);
        todsetfreq(tod.hz);
        addclock0link(todfix, 100);
@@ -67,14 +70,36 @@
 }
 
 /*
+ *  return monotonic ns; tod must be locked
+ */
+static vlong
+todmono(vlong ticks)
+{
+       uvlong x;
+       vlong diff;
+
+       if(tod.hz == 0) /* called from first todsetfreq */
+               return 0;
+       diff = ticks - tod.monolast;
+       mul64fract(&x, diff, tod.multiplier);
+       x += tod.monooff;
+       return x;
+}
+
+/*
  *  calculate multiplier
  */
 void
 todsetfreq(vlong f)
 {
+       vlong ticks;
+
        if (f <= 0)
                panic("todsetfreq: freq %lld <= 0", f);
        ilock(&tod);
+       ticks = fastticks(nil);
+       tod.monooff = todmono(ticks);
+       tod.monolast = ticks;
        tod.hz = f;
 
        /* calculate multiplier for time conversion */
@@ -125,10 +150,10 @@
  *  get time of day
  */
 vlong
-todget(vlong *ticksp)
+todget(vlong *ticksp, vlong *monop)
 {
        uvlong x;
-       vlong ticks, diff;
+       vlong ticks, diff, mono;
        ulong t;
 
        if(!tod.init)
@@ -159,16 +184,21 @@
        mul64fract(&x, diff, tod.multiplier);
        x += tod.off;
 
-       /* time can't go backwards */
+       /* time can't go backwards (except when /dev/[bin]time is written) */
        if(x < tod.lasttime)
                x = tod.lasttime;
        else
                tod.lasttime = x;
 
+       mono = 0;
+       if(monop != nil)
+               mono = todmono(ticks);
        iunlock(&tod);
 
        if(ticksp != nil)
                *ticksp = ticks;
+       if(monop != nil)
+               *monop = mono;
 
        return x;
 }
@@ -219,7 +249,7 @@
 long
 seconds(void)
 {
-       return (vlong)todget(nil) / TODFREQ;
+       return (vlong)todget(nil, nil) / TODFREQ;
 }
 
 uvlong
--- a/sys/src/9/sgi/trap.c
+++ b/sys/src/9/sgi/trap.c
@@ -595,7 +595,7 @@
                if(up->syscalltrace)
                        free(up->syscalltrace);
                up->syscalltrace = nil;
-               *startnsp = todget(nil);
+               todget(nil, startnsp);
        }
 }
 
@@ -602,12 +602,14 @@
 static void
 sctracefinish(ulong scallnr, ulong sp, int ret, vlong startns)
 {
+       vlong stopns;
        int s;
 
        if(up->procctl == Proc_tracesyscall){
+               todget(nil, &stopns);
                up->procctl = Proc_stopme;
                sysretfmt(scallnr, (va_list)(sp+BY2WD), ret,
-                       startns, todget(nil));
+                       startns, stopns);
                s = splhi();
                procctl();
                splx(s);
--- a/sys/src/9/teg2/syscall.c
+++ b/sys/src/9/teg2/syscall.c
@@ -202,7 +202,7 @@
 
        up->nerrlab = 0;
        ret = -1;
-       startns = todget(nil);
+       todget(nil, &startns);
 
        l1cache->wb();                  /* system is more stable with this */
        if(!waserror()){
@@ -240,7 +240,7 @@
        ureg->r0 = ret;
 
        if(up->procctl == Proc_tracesyscall){
-               stopns = todget(nil);
+               todget(nil, &stopns);
                sysretfmt(scallnr, (va_list)(sp+BY2WD), ret, startns, stopns);
                s = splhi();
                up->procctl = Proc_stopme;
--- a/sys/src/9/zynq/trap.c
+++ b/sys/src/9/zynq/trap.c
@@ -233,7 +233,7 @@
                        up->procctl = Proc_stopme;
                        procctl();
                        splx(s);
-                       startns = todget(nil);
+                       todget(nil, &startns);
                }
                if(scallnr >= nsyscall || systab[scallnr] == nil){
                        postnote(up, 1, "sys: bad sys call", NDebug);
@@ -256,7 +256,7 @@
        
        ureg->r0 = ret;
        if(up->procctl == Proc_tracesyscall){
-               stopns = todget(nil);
+               todget(nil, &stopns);
                sysretfmt(scallnr, (va_list) up->s.args, ret, startns, stopns);
                s = splhi();
                up->procctl = Proc_stopme;
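
The rebasing done by todmono and todsetfreq in the diff above can be
modelled outside the kernel. The sketch below is a simplified user-space
model in C, not the kernel code: the Mono struct and the function names are
hypothetical, and it uses plain 64-bit division where the kernel uses
mul64fract with a precomputed multiplier. It contrasts the naive
1e9*ticks/hz calculation, which jumps backward when the frequency estimate
rises, with a version that banks the elapsed nanoseconds at every frequency
change.

```c
#include <stdint.h>

#define NS 1000000000ULL

/* Hypothetical state, loosely mirroring tod.hz, tod.monolast, tod.monooff. */
typedef struct Mono Mono;
struct Mono {
	uint64_t hz;	/* current frequency estimate */
	uint64_t last;	/* tick count at the last frequency change */
	uint64_t off;	/* nanoseconds accumulated up to last */
};

/*
 * Convert ticks to ns at frequency hz. The split avoids overflow in
 * ticks*NS; it assumes hz is small enough that (hz-1)*NS fits in 64 bits.
 */
static uint64_t
ticks2ns(uint64_t ticks, uint64_t hz)
{
	return ticks/hz*NS + ticks%hz*NS/hz;
}

/* Naive: reinterprets every past tick at the current frequency. */
static uint64_t
naivens(uint64_t ticks, uint64_t hz)
{
	return ticks2ns(ticks, hz);
}

/* Rebased: only the ticks since the last change use the current frequency. */
static uint64_t
monons(Mono *m, uint64_t ticks)
{
	if(m->hz == 0)	/* not calibrated yet */
		return m->off;
	return m->off + ticks2ns(ticks - m->last, m->hz);
}

/* Bank the time elapsed so far, then switch to the new frequency. */
static void
monosetfreq(Mono *m, uint64_t ticks, uint64_t hz)
{
	m->off = monons(m, ticks);
	m->last = ticks;
	m->hz = hz;
}
```

With a 1000 Hz estimate, 5000 ticks reads as 5 seconds either way. If the
estimate is then raised to 2000 Hz at that same tick count, the naive value
drops to 2.5 seconds while the rebased value stays at 5 seconds and keeps
advancing from there.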


------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-M2c451362e4dff236d4dea432
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

* Re: [9fans] monotonic time and randomness on Plan 9
  2025-03-11 17:07             ` Russ Cox
                                 ` (3 preceding siblings ...)
  2025-03-17  3:17               ` ori
@ 2025-04-07 18:40               ` ori
  4 siblings, 0 replies; 16+ messages in thread
From: ori @ 2025-04-07 18:40 UTC (permalink / raw)
  To: 9fans

Quoth Russ Cox <rsc@swtch.com>:
> The diff is here:
> https://github.com/rsc/plan9/commit/baf076425c.


Just pushed the 9front version of this; the main difference
for 9front was that I also looked over the call sites and
changed most of them to use the monotonic time.


------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T59810df4fe34a033-Mb39b73765e263dee325d4907
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
