[9front] [PATCH] libc: replace lrand's algorithm

9front - general discussion about 9front
 help / color / mirror / Atom feed

* [9front] [PATCH] libc: replace lrand's algorithm
@ 2024-03-29  8:15 Eolien55
  0 siblings, 0 replies; 15+ messages in thread
From: Eolien55 @ 2024-03-29  8:15 UTC (permalink / raw)
  To: 9front

[-- Attachment #1: Type: text/plain, Size: 953 bytes --]

Hello,

The current lrand's algorithm is quite suboptimal.
After benchmarking, we find:
Current algorithm:
repeated srand with different values: 820.740000
repeated lrand: 643.880000

Proposed algorithm (PCG32-XSH-RR[1]):
repeated srand with different valudes: 1.920000
repeated lrand: 600.590000

I've attached the benchmark code for reproducibility.
Though the lrand calls performance aren't themselves
all that bad, the design of the algorithm is not very
good. It has 4856 bytes of state (vs 4 bytes for PCG),
is very expensive to seed, as the benchmarks show
(because it actually uses an other PRNG to generate a
somewhat random state) and has bad statistical properties.
See [2] for a more detailed rationale and supplementary
materials on why this specific algorithm is suboptimal.

Regards,
Elie Le Vaillant

[1]: https://pcg-random.org
[2]: https://leviathansecurity.com/blog/attacking-gos-lagged-fibonacci-generator

[-- Attachment #2.1: Type: text/plain, Size: 376 bytes --]

from postmaster@9front:
The following attachment had content that we can't
prove to be harmless.  To avoid possible automatic
execution, we changed the content headers.
The original header was:

	Content-Disposition: attachment;
 filename*0*=UTF-8''0001-libc-replace-lrand-s-algorithm.patch
	Content-Type: text/x-diff;charset=utf-8
	Content-Transfer-Encoding: quoted-printable

[-- Attachment #2.2: file.suspect --]
[-- Type: application/octet-stream, Size: 2585 bytes --]

From 17a6a011ac9654e78a267474d534e8511dcd03fc Mon Sep 17 00:00:00 2001
From: Elie Le Vaillant <eolien55@disroot.org>
Date: Thu, 28 Mar 2024 23:03:08 +0100
Subject: [PATCH] libc: replace lrand's algorithm

The previous algorithm (an ALFG) was big (4856 bytes), somewhat complex
(60 sloc), very expansive to seed, and had bad statistical properties[1].
This replaces it with PCG32-XSH-RR[2], which is smaller (4 bytes), simpler
(21 sloc), way less expansive to seed, a bit faster, and has better
statistical properties.

[1]: https://leviathansecurity.com/blog/attacking-gos-lagged-fibonacci-generator
[2]: https://pcg-random.org
---
 sys/src/libc/port/lrand.c | 68 +++++++--------------------------------
 1 file changed, 12 insertions(+), 56 deletions(-)

diff --git a/sys/src/libc/port/lrand.c b/sys/src/libc/port/lrand.c
index 2ebb39622..ae08ed24e 100644
--- a/sys/src/libc/port/lrand.c
+++ b/sys/src/libc/port/lrand.c
@@ -2,82 +2,38 @@
 #include	<libc.h>
 
 /*
+ *	PCG32 XSH-RR variant
+ *	constants taken from github.com/pcg-c
  *	algorithm by
- *	D. P. Mitchell & J. A. Reeds
+ *	Melissa E. O’Neill
  */
 
-#define	LEN	607
-#define	TAP	273
-#define	MASK	0x7fffffffL
-#define	A	48271
-#define	M	2147483647
-#define	Q	44488
-#define	R	3399
-#define	NORM	(1.0/(1.0+MASK))
-
-static	ulong	rng_vec[LEN];
-static	ulong*	rng_tap = rng_vec;
-static	ulong*	rng_feed = 0;
+static	ulong	state;
 static	Lock	lk;
 
-static void
-isrand(long seed)
-{
-	long lo, hi, x;
-	int i;
-
-	rng_tap = rng_vec;
-	rng_feed = rng_vec+LEN-TAP;
-	seed = seed%M;
-	if(seed < 0)
-		seed += M;
-	if(seed == 0)
-		seed = 89482311;
-	x = seed;
-	/*
-	 *	Initialize by x[n+1] = 48271 * x[n] mod (2**31 - 1)
-	 */
-	for(i = -20; i < LEN; i++) {
-		hi = x / Q;
-		lo = x % Q;
-		x = A*lo - R*hi;
-		if(x < 0)
-			x += M;
-		if(i >= 0)
-			rng_vec[i] = x;
-	}
-}
-
 void
 srand(long seed)
 {
 	lock(&lk);
-	isrand(seed);
+	state = (ulong)seed;
 	unlock(&lk);
 }
 
 long
 lrand(void)
 {
-	ulong x;
+	u64int oldstate = state;
+	u32int r, x;
 
 	lock(&lk);
 
-	rng_tap--;
-	if(rng_tap < rng_vec) {
-		if(rng_feed == 0) {
-			isrand(1);
-			rng_tap--;
-		}
-		rng_tap += LEN;
-	}
-	rng_feed--;
-	if(rng_feed < rng_vec)
-		rng_feed += LEN;
-	x = (*rng_feed + *rng_tap) & MASK;
-	*rng_feed = x;
+	state *= 747796405U;
+	state += 2891336453U;
 
 	unlock(&lk);
 
+	r = oldstate >> (64 - 5);
+	x = (oldstate ^ (oldstate >> 18)) >> (32 - 5);
+	x = (x >> (-r & 31)) | (x << r);
 	return x;
 }
-- 
2.44.0



[-- Attachment #3: bench.c --]
[-- Type: text/x-c, Size: 567 bytes --]

#include <u.h>
#include <libc.h>

#define CLOCKS_PER_SEC 1000000L

void
main(int argc, char *argv[])
{
	long start;
	long newseed = time(0);
	double elapsed;

	start = clock();

	for(long i = 0; i < 0xffffff; i++) {
		srand(newseed);
		newseed = lrand();
	}

	elapsed = ((double)(clock() - start))/(double)CLOCKS_PER_SEC;
	print("repeated srand with different values: %f\n", elapsed);

	start = clock();

	for(long i = 0; i < 0xffffffff; i++) {
		lrand();
	}

	elapsed = ((double)(clock() - start))/(double)CLOCKS_PER_SEC;
	print("repeated lrand: %f\n", elapsed);
}


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-04-20 12:46         ` cinap_lenrek
  2024-04-20 12:58           ` cinap_lenrek
@ 2024-04-20 13:01           ` cinap_lenrek
  1 sibling, 0 replies; 15+ messages in thread
From: cinap_lenrek @ 2024-04-20 13:01 UTC (permalink / raw)
  To: 9front

actually, no. lrand() is fine.

theres no change needed i think:

	r = (s[0] + s[1]) >> 33;

note, we shift 33. not 32 that
means the MSB is always 0.

so disregard.

--
cinap

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-04-20 12:46         ` cinap_lenrek
@ 2024-04-20 12:58           ` cinap_lenrek
  2024-04-20 13:01           ` cinap_lenrek
  1 sibling, 0 replies; 15+ messages in thread
From: cinap_lenrek @ 2024-04-20 12:58 UTC (permalink / raw)
  To: 9front

as the first step, i'm going to just fix the kernels
lrand() function to be compliant with the spec.

you can generate 32-bit unsigned random number by
calling nrand() or lrand() twice. which seems to
be what the majority of the user in the kernel are
doing.

let consider this a improvement for later.

--
cinap

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-04-02 14:37       ` Eolien55
  2024-04-02 14:52         ` Stanley Lieber
  2024-04-03  9:07         ` qwx
@ 2024-04-20 12:46         ` cinap_lenrek
  2024-04-20 12:58           ` cinap_lenrek
  2024-04-20 13:01           ` cinap_lenrek
  2 siblings, 2 replies; 15+ messages in thread
From: cinap_lenrek @ 2024-04-20 12:46 UTC (permalink / raw)
  To: 9front

> I found (rand()<<16) + rand() in kernel's devsdp, which is
> strange considering lrand already has 32 bits of output. I
> think this should be considered a bug.

The manpage states:

          Rand returns a uniform pseudo-random number x, 0≤ x <2^15.

          Lrand returns a uniform long x, 0≤ x <2^31.

So lrand() only return a positive number (so really, is 31 bit
generator, not 32 bit).

Also, rand() returns only 15-bit integer. so doing rand()<<16
results in a bias as it will never set bit 15.

The lrand() implementation in the kernel is not complient
which i think is a mistake. (BUG!)

libc does not provide a function for unsigned 32-bit.
having one might be a good start.

say, lets define:

ulong ulrand(void)

in range 0<=ulrand()<2^32

and use that as the basis for all the other functions
instead of lrand().

Then as you say, constructs like rand()<<16 | rand()
should be revisited.

--
cinap

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-04-03  9:07         ` qwx
  2024-04-03 19:50           ` Ori Bernstein
  2024-04-03 19:54           ` Ori Bernstein
@ 2024-04-04 21:00           ` Eolien55
  2 siblings, 0 replies; 15+ messages in thread
From: Eolien55 @ 2024-04-04 21:00 UTC (permalink / raw)
  To: 9front

qwx@sciops.net wrote:
> I think at this point it would be better to look at a patch, with the
> changes mentioned after the first post, and a sweep through the other
> implementations.  There are some odd occurrences as you found; libc's
> frand() is also somewhat suspect but will likely need to change if
> lrand() does.  We need a 16bit, 32bit, floating point, and possibly
> 64bit variant; this will clean up some code and remove some
> redundancies which is always nice.  Perhaps take the extreme route and
> do *all* of the changes you'd like to see, so we could more easily
> discuss each case.

I don't currently have the time, but I think I'll work on it, probably
next week. Having a good double/float PRNG surprisingly hard to implement.
See [1] for multiple approaches.

> We then need to check if there's a performance impact on non-amd64
> arches.  For xoroshiro128+ vs. pcg I'd be interested to know if
> there's any significant difference between the two in statistical
> properties and performance; if we make a change, it'd be nice to
> pick the best alternative and make it once and for all.  Code size
> and simplicity also counts.  Is there any other variant that should
> be assessed?

I believe PCG has the smallest code size, and is also the one with
best statistical properties, all things considered.  Other
alternatives of xoroshiro have better statistical properties, such as
xoroshiro128**.  However, if we plan on having 64-bit generators,
xoroshiro256** will probably be simpler and shorter to implement.
PCG64 requires 128-bit operations, which need to be emulated (and that
is quite complex).  xoroshiro256** would be simpler, since it uses 4
32-bit integers.  I am a bit skeptic of xoroshiro256**'s statistical
quality.  Its output function can be easily inversed, by multiplying
by 57 for example, which reveals the inner LFSR, with all its
statistical flaws.  Multiplying it by 57 yields very very
statistically bad generator (it fails within 4MB!).  It is also "too
big to fail", in that empirical tests cannot reasonably show its
flaws, for its period is simply too large for it to be fully tested.
See [2] for more material on the matter.

But that's very much "my opinion man" territory. I think I'd choose
PCG because its quality is good despite its small state, and because
reduced versions of it pass statistical tests (whereas many other
'reduced' (as in, with smaller state as normal) PRNGs fail them even
with larger state). In the end, both are very comparable. If statistical
quality matters more than code size, and if emulating 128-bit addition
and multiplication isn't a problem, then I believe PCG should be chosen.
If it's the other way around, I'd probably go for xoroshiro**.

Cheers,
Elie Le Vaillant

[1] http://mumble.net/~campbell/tmp/random_real.c
[2] https://pcg-random.org/blog/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-04-03  9:07         ` qwx
  2024-04-03 19:50           ` Ori Bernstein
@ 2024-04-03 19:54           ` Ori Bernstein
  2024-04-04 21:00           ` Eolien55
  2 siblings, 0 replies; 15+ messages in thread
From: Ori Bernstein @ 2024-04-03 19:54 UTC (permalink / raw)
  To: 9front; +Cc: qwx

On Wed, 03 Apr 2024 11:07:22 +0200
qwx@sciops.net wrote:

> I think at this point it would be better to look at a patch, with the
> changes mentioned after the first post, and a sweep through the other
> implementations.  There are some odd occurrences as you found; libc's
> frand() is also somewhat suspect but will likely need to change if
> lrand() does.  We need a 16bit, 32bit, floating point, and possibly
> 64bit variant; this will clean up some code and remove some
> redundancies which is always nice.  Perhaps take the extreme route and
> do *all* of the changes you'd like to see, so we could more easily
> discuss each case.  We then need to check if there's a performance
> impact on non-amd64 arches.  For xoroshiro128+ vs.  pcg I'd be
> interested to know if there's any significant difference between the
> two in statistical properties and performance; if we make a change,
> it'd be nice to pick the best alternative and make it once and for
> all.  Code size and simplicity also counts.  Is there any other
> variant that should be assessed?

Performance for this RNG is, tbh, pretty uninteresting, at
least for any of the options under consideration. All of them
are good.

Better statistical properties, smaller and cleaner code, are
both good reasons to change this, and I'd be happy to take
a correct patch to improve that.

It's not an urgent need -- anything that strongly depends
on the statistical properties of a random stream should
possibly consider a CSPRNG -- but it's nice to have.

Another question here: Would it be sensible to just ditch
all of this tomfoolery with "good enough" randomness, and
just move to a seedable CSPRNG?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-04-03  9:07         ` qwx
@ 2024-04-03 19:50           ` Ori Bernstein
  2024-04-03 19:54           ` Ori Bernstein
  2024-04-04 21:00           ` Eolien55
  2 siblings, 0 replies; 15+ messages in thread
From: Ori Bernstein @ 2024-04-03 19:50 UTC (permalink / raw)
  To: 9front; +Cc: qwx

On Wed, 03 Apr 2024 11:07:22 +0200
qwx@sciops.net wrote:

> On Tue Apr  2 16:38:36 +0200 2024, eolien55@disroot.org wrote:
> > qwx@sciops.net wrote:
> > > I like the idea of simplifying the tree and deleting some code,
> > > but keep in mind that the xoroshiro128+ variant is what is
> > > exposed by /dev/random via truerand(2).
> > 
> > That's not quite it. What is exposed by /dev/random is actually
> > a Chacha-based CSPRNG. truerand(2) is a cryptographic PRNG.
> 
> Ah, my bad, sorry.
> 
> 
> > However, the kernel itself (via tcp, ip, sdp, esp...) uses
> > a non-cryptographic PRNG (in nrand and rand calls, in the
> > kernel). The kernel's PRNG was changed to xoroshiro128+
> > in 10275ad6dd2. I believe it would make sense to either
> > change libc's lrand to xoroshiro128+ too, or to change
> > them both at the same time.
> [...]
> > > Most people are
> > > uninsterested in a change here because of the dichotomy between
> > > rand and truerand, the negligeable performance impact, etc.,
> > > and beyond this, it gets into "this is like, my opinion man"
> > > territory.  I wouldn't mind replacing libc's lrand with a PCG
> > > variant also exposing a 64bit version, which we don't have, if
> > > it indeed also improves the statistical properties of the prng.
> > > But that's like, my opinion man.
> > 
> > Well, this is feasible. Both PCG and xoroshiro require 128 bits
> > of state for a 64-bit output. kernel's implementation simulates
> > 128-bits operations using 2 64-bit integers. It should be feasible
> > to implement a PC64 with the same approach, or to implement
> > xoroshiro128+ (or another variant, for statistical quality of the
> > lower bits).
> > 
> > I understand the change here is highly subjective, and has
> > negligible performance impact (except for seeding/re-seeding).
> > I still believe we could, and should, implement a better PRNG
> > for libc. Kernel's PRNG has changed; why not libc's?
> > 
> > I found (rand()<<16) + rand() in kernel's devsdp, which is
> > strange considering lrand already has 32 bits of output. I
> > think this should be considered a bug.
> 
> I think at this point it would be better to look at a patch, with the
> changes mentioned after the first post, and a sweep through the other
> implementations.  There are some odd occurrences as you found; libc's
> frand() is also somewhat suspect but will likely need to change if
> lrand() does.  We need a 16bit, 32bit, floating point, and possibly
> 64bit variant; this will clean up some code and remove some
> redundancies which is always nice.  Perhaps take the extreme route and
> do *all* of the changes you'd like to see, so we could more easily
> discuss each case.  We then need to check if there's a performance
> impact on non-amd64 arches.  For xoroshiro128+ vs.  pcg I'd be
> interested to know if there's any significant difference between the
> two in statistical properties and performance; if we make a change,
> it'd be nice to pick the best alternative and make it once and for
> all.  Code size and simplicity also counts.  Is there any other
> variant that should be assessed?
> 
> 
> > > I agree, though again, I don't know what the impact of such
> > > a change would be.
> > 
> > Arguably, very negligible. Both use random tests, and quality is
> > somewhat important in these context. A little more than in, say,
> > fortune(1), but way less than in tls(3). However, venti's randtest
> > uses a hard-coded PRNG instead of libc's.
> > 
> > Cheers,
> > Elie Le Vaillant
> 
> In that case, I wouldn't touch venti either.  There are at least two
> projects for replacing venti with a new implementation; imo any such
> changes should be made there.
> 
> Cheers,
> qwx


-- 
Ori Bernstein <ori@eigenstate.org>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-04-02 14:37       ` Eolien55
  2024-04-02 14:52         ` Stanley Lieber
@ 2024-04-03  9:07         ` qwx
  2024-04-03 19:50           ` Ori Bernstein
                             ` (2 more replies)
  2024-04-20 12:46         ` cinap_lenrek
  2 siblings, 3 replies; 15+ messages in thread
From: qwx @ 2024-04-03  9:07 UTC (permalink / raw)
  To: 9front

On Tue Apr  2 16:38:36 +0200 2024, eolien55@disroot.org wrote:
> qwx@sciops.net wrote:
> > I like the idea of simplifying the tree and deleting some code,
> > but keep in mind that the xoroshiro128+ variant is what is
> > exposed by /dev/random via truerand(2).
> 
> That's not quite it. What is exposed by /dev/random is actually
> a Chacha-based CSPRNG. truerand(2) is a cryptographic PRNG.

Ah, my bad, sorry.


> However, the kernel itself (via tcp, ip, sdp, esp...) uses
> a non-cryptographic PRNG (in nrand and rand calls, in the
> kernel). The kernel's PRNG was changed to xoroshiro128+
> in 10275ad6dd2. I believe it would make sense to either
> change libc's lrand to xoroshiro128+ too, or to change
> them both at the same time.
[...]
> > Most people are
> > uninsterested in a change here because of the dichotomy between
> > rand and truerand, the negligeable performance impact, etc.,
> > and beyond this, it gets into "this is like, my opinion man"
> > territory.  I wouldn't mind replacing libc's lrand with a PCG
> > variant also exposing a 64bit version, which we don't have, if
> > it indeed also improves the statistical properties of the prng.
> > But that's like, my opinion man.
> 
> Well, this is feasible. Both PCG and xoroshiro require 128 bits
> of state for a 64-bit output. kernel's implementation simulates
> 128-bits operations using 2 64-bit integers. It should be feasible
> to implement a PC64 with the same approach, or to implement
> xoroshiro128+ (or another variant, for statistical quality of the
> lower bits).
> 
> I understand the change here is highly subjective, and has
> negligible performance impact (except for seeding/re-seeding).
> I still believe we could, and should, implement a better PRNG
> for libc. Kernel's PRNG has changed; why not libc's?
> 
> I found (rand()<<16) + rand() in kernel's devsdp, which is
> strange considering lrand already has 32 bits of output. I
> think this should be considered a bug.

I think at this point it would be better to look at a patch, with the
changes mentioned after the first post, and a sweep through the other
implementations.  There are some odd occurrences as you found; libc's
frand() is also somewhat suspect but will likely need to change if
lrand() does.  We need a 16bit, 32bit, floating point, and possibly
64bit variant; this will clean up some code and remove some
redundancies which is always nice.  Perhaps take the extreme route and
do *all* of the changes you'd like to see, so we could more easily
discuss each case.  We then need to check if there's a performance
impact on non-amd64 arches.  For xoroshiro128+ vs.  pcg I'd be
interested to know if there's any significant difference between the
two in statistical properties and performance; if we make a change,
it'd be nice to pick the best alternative and make it once and for
all.  Code size and simplicity also counts.  Is there any other
variant that should be assessed?


> > I agree, though again, I don't know what the impact of such
> > a change would be.
> 
> Arguably, very negligible. Both use random tests, and quality is
> somewhat important in these context. A little more than in, say,
> fortune(1), but way less than in tls(3). However, venti's randtest
> uses a hard-coded PRNG instead of libc's.
> 
> Cheers,
> Elie Le Vaillant

In that case, I wouldn't touch venti either.  There are at least two
projects for replacing venti with a new implementation; imo any such
changes should be made there.

Cheers,
qwx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-04-02 14:52         ` Stanley Lieber
@ 2024-04-02 20:18           ` Eolien55
  0 siblings, 0 replies; 15+ messages in thread
From: Eolien55 @ 2024-04-02 20:18 UTC (permalink / raw)
  To: 9front

Stanley Lieber <sl@stanleylieber.com> wrote:
> > A little more than in, say, fortune(1)
> 
> whatever happened to free will, buddy
> 
> http://okturing.com/src/19101/body
> 
> sl

Your message is very cryptic, I don't understand it at all :'(
Cheers,
Elie Le Vaillant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-04-02 14:37       ` Eolien55
@ 2024-04-02 14:52         ` Stanley Lieber
  2024-04-02 20:18           ` Eolien55
  2024-04-03  9:07         ` qwx
  2024-04-20 12:46         ` cinap_lenrek
  2 siblings, 1 reply; 15+ messages in thread
From: Stanley Lieber @ 2024-04-02 14:52 UTC (permalink / raw)
  To: 9front

> A little more than in, say, fortune(1)

whatever happened to free will, buddy

http://okturing.com/src/19101/body

sl

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-04-01 15:47     ` qwx
@ 2024-04-02 14:37       ` Eolien55
  2024-04-02 14:52         ` Stanley Lieber
                           ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Eolien55 @ 2024-04-02 14:37 UTC (permalink / raw)
  To: 9front

qwx@sciops.net wrote:
> I like the idea of simplifying the tree and deleting some code,
> but keep in mind that the xoroshiro128+ variant is what is
> exposed by /dev/random via truerand(2).

That's not quite it. What is exposed by /dev/random is actually
a Chacha-based CSPRNG. truerand(2) is a cryptographic PRNG.
However, the kernel itself (via tcp, ip, sdp, esp...) uses
a non-cryptographic PRNG (in nrand and rand calls, in the
kernel). The kernel's PRNG was changed to xoroshiro128+
in 10275ad6dd2. I believe it would make sense to either
change libc's lrand to xoroshiro128+ too, or to change
them both at the same time.

> It's of little use to touch the APE code at all at this point.

Alright then, I wasn't sure what was its state in current 9front.

> Most people are
> uninsterested in a change here because of the dichotomy between
> rand and truerand, the negligeable performance impact, etc.,
> and beyond this, it gets into "this is like, my opinion man"
> territory.  I wouldn't mind replacing libc's lrand with a PCG
> variant also exposing a 64bit version, which we don't have, if
> it indeed also improves the statistical properties of the prng.
> But that's like, my opinion man.

Well, this is feasible. Both PCG and xoroshiro require 128 bits
of state for a 64-bit output. kernel's implementation simulates
128-bits operations using 2 64-bit integers. It should be feasible
to implement a PC64 with the same approach, or to implement
xoroshiro128+ (or another variant, for statistical quality of the
lower bits).

I understand the change here is highly subjective, and has
negligible performance impact (except for seeding/re-seeding).
I still believe we could, and should, implement a better PRNG
for libc. Kernel's PRNG has changed; why not libc's?

I found (rand()<<16) + rand() in kernel's devsdp, which is
strange considering lrand already has 32 bits of output. I
think this should be considered a bug.

> I agree, though again, I don't know what the impact of such
> a change would be.

Arguably, very negligible. Both use random tests, and quality is
somewhat important in these context. A little more than in, say,
fortune(1), but way less than in tls(3). However, venti's randtest
uses a hard-coded PRNG instead of libc's.

Cheers,
Elie Le Vaillant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-03-29 16:02   ` Eolien55
@ 2024-04-01 15:47     ` qwx
  2024-04-02 14:37       ` Eolien55
  0 siblings, 1 reply; 15+ messages in thread
From: qwx @ 2024-04-01 15:47 UTC (permalink / raw)
  To: 9front

On Fri Mar 29 17:02:32 +0100 2024, eolien55@disroot.org wrote:
> > lrand is used by rand(2) itself and is used throughout the system
> > and elsewhere, and who knows what impact changing it might have.
> 
> I looked through every usage of lrand-related functions in the codebase,
> and it seems that all programs use it simply as "a uniform pseudo-random
> number generator", without other expectations on the output (things
> that would vary from algorithm to algorithm). However the manpage should
> be changed to reflect the algorithm change.

I agree, it doesn't hurt to at least mention this in the manpage.


> However, through looking at the full source tree, it appears there
> are at least 3 lrand duplicates in the source tree, each with its
> own algorithm: one for the kernel, in /sys/src/9/port/random.c, which
> uses xoroshiro128+, which is quite good; one for normal userspace in
> /sys/src/libc/lrand.c, which uses the aforementioned ALFG, which is
> quite suboptimal; and one for ape in /sys/src/lib/v/rand.c, which
> seemingly points to this[1] article. The algorithm is not in common
> use, so maybe it should be changed to one whose properties are
> better-known?
> 
> Given that kernel's lrand is xoroshiro128+, it would make sense to
> either change kernel's lrand to PCG too, or change the proposed
> algorithm to xoroshiro128+.

I like the idea of simplifying the tree and deleting some code,
but keep in mind that the xoroshiro128+ variant is what is
exposed by /dev/random via truerand(2).  It's of little use to
touch the APE code at all at this point.  Most people are
uninsterested in a change here because of the dichotomy between
rand and truerand, the negligeable performance impact, etc.,
and beyond this, it gets into "this is like, my opinion man"
territory.  I wouldn't mind replacing libc's lrand with a PCG
variant also exposing a 64bit version, which we don't have, if
it indeed also improves the statistical properties of the prng.
But that's like, my opinion man.


> > Applications which need guarantees about distribution or
> > performance are not supposed to use rand(2) to begin with.
> 
> Well in that case, some code in /sys/src/cmd needs to be changed
> to not use rand, like spin and venti.

I agree, though again, I don't know what the impact of such
a change would be.

> Cheers,
> Elie Le Vaillant


Cheers,
qwx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-03-29 12:39 ` qwx
@ 2024-03-29 16:02   ` Eolien55
  2024-04-01 15:47     ` qwx
  0 siblings, 1 reply; 15+ messages in thread
From: Eolien55 @ 2024-03-29 16:02 UTC (permalink / raw)
  To: 9front

qwx@sciops.net wrote:
> On Fri Mar 29 09:19:39 +0100 2024, eolien55@disroot.org wrote:
> Here `state' is 64-bit, whereas in your code, it's a ulong,
> ie. 32-bit, which seems to break everything.

I've appeared to make a rookie mistake. It should obviously be
a u64int.

> Furthermore, your code seems to have flipped part of the last
> statement and return the equivalent of:
> 
> 	return (xorshifted >> ((-rot) & 31)) | (xorshifted << rot);
> 
> Is that intentional?

Well it is a right rotate instead of a left rotate. I don't believe
it changes anything relevant. Maybe it does, but I'm not sure.
Anyway it should probably indeed be changed to avoid such "puzzled
moments" (code is more often read than written).

> Moreover, for the benchmark itself, while seeding is significantly
> faster, actually calling lrand barely is; the new code is only
> 1.07x faster than the old.  The scale is also minuscule, around
> 42 seconds for circa 4.3 billion calls (you're looping on a signed
> long from 0 to -1).  Am I misreading?

You are not misreading. The main issues of the current algorithm are
not so much ones of speed (von Neumann's middle square algorithm is
incredibly fast but has extremely poor quality), but ones of state
size and statistical quality. Additive Lagged Fibonacci Generators
(ALFG) such as this one, have bad statistical properties, uncovered by
both mathematical analysis and empirical testing.

The scale is indeed miniscule, and the performance benefit arguably
small, but I do not think that performance alone should decide.

> lrand is used by rand(2) itself and is used throughout the system
> and elsewhere, and who knows what impact changing it might have.

I looked through every usage of lrand-related functions in the codebase,
and it seems that all programs use it simply as "a uniform pseudo-random
number generator", without other expectations on the output (things
that would vary from algorithm to algorithm). However the manpage should
be changed to reflect the algorithm change.

However, through looking at the full source tree, it appears there
are at least 3 lrand duplicates in the source tree, each with its
own algorithm: one for the kernel, in /sys/src/9/port/random.c, which
uses xoroshiro128+, which is quite good; one for normal userspace in
/sys/src/libc/lrand.c, which uses the aforementioned ALFG, which is
quite suboptimal; and one for ape in /sys/src/lib/v/rand.c, which
seemingly points to this[1] article. The algorithm is not in common
use, so maybe it should be changed to one whose properties are
better-known?

Given that kernel's lrand is xoroshiro128+, it would make sense to
either change kernel's lrand to PCG too, or change the proposed
algorithm to xoroshiro128+.

> Applications which need guarantees about distribution or
> performance are not supposed to use rand(2) to begin with.

Well in that case, some code in /sys/src/cmd needs to be changed
to not use rand, like spin and venti.

Cheers,
Elie Le Vaillant

[1] https://dl.acm.org/doi/pdf/10.1145/63039.63042

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9front] [PATCH] libc: replace lrand's algorithm
  2024-03-29  8:19 Eolien55
@ 2024-03-29 12:39 ` qwx
  2024-03-29 16:02   ` Eolien55
  0 siblings, 1 reply; 15+ messages in thread
From: qwx @ 2024-03-29 12:39 UTC (permalink / raw)
  To: 9front

On Fri Mar 29 09:19:39 +0100 2024, eolien55@disroot.org wrote:

> Hello,
> 
> The current lrand's algorithm is quite suboptimal.
> After benchmarking, we find:
> Current algorithm:
> repeated srand with different values: 820.740000
> repeated lrand: 643.880000
> 
> Proposed algorithm (PCG32-XSH-RR[1]):
> repeated srand with different valudes: 1.920000
> repeated lrand: 600.590000

Hello, 

I'm a bit puzzled by both the results and the implementation.
If I look at the site you point to, the download page [1] has
this snippet:

typedef struct { uint64_t state;  uint64_t inc; } pcg32_random_t;

 uint32_t pcg32_random_r(pcg32_random_t* rng)
 {
     uint64_t oldstate = rng->state;
     // Advance internal state
     rng->state = oldstate * 6364136223846793005ULL + (rng->inc|1);
     // Calculate output function (XSH RR), uses old state for max ILP
     uint32_t xorshifted = ((oldstate >> 18u) ^ oldstate) >> 27u;
     uint32_t rot = oldstate >> 59u;
     return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
 }

Here `state' is 64-bit, whereas in your code, it's a ulong,
ie. 32-bit, which seems to break everything.  Furthermore, your
code seems to have flipped part of the last statement and return
the equivalent of:

	return (xorshifted >> ((-rot) & 31)) | (xorshifted << rot);

Is that intentional?

Moreover, for the benchmark itself, while seeding is significantly
faster, actually calling lrand barely is; the new code is only
1.07x faster than the old.  The scale is also minuscule, around
42 seconds for circa 4.3 billion calls (you're looping on a signed
long from 0 to -1).  Am I misreading?

lrand is used by rand(2) itself and is used throughout the system
and elsewhere, and who knows what impact changing it might have.
Applications which need guarantees about distribution or
performance are not supposed to use rand(2) to begin with.
Therefore, these numbers alone do not justify, in my opinion,
replacing the current implementation.

Cheers,
qwx

[1] https://www.pcg-random.org/download.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [9front] [PATCH] libc: replace lrand's algorithm
@ 2024-03-29  8:19 Eolien55
  2024-03-29 12:39 ` qwx
  0 siblings, 1 reply; 15+ messages in thread
From: Eolien55 @ 2024-03-29  8:19 UTC (permalink / raw)
  To: 9front

[-- Attachment #1: Type: text/plain, Size: 953 bytes --]

Hello,

The current lrand's algorithm is quite suboptimal.
After benchmarking, we find:
Current algorithm:
repeated srand with different values: 820.740000
repeated lrand: 643.880000

Proposed algorithm (PCG32-XSH-RR[1]):
repeated srand with different valudes: 1.920000
repeated lrand: 600.590000

I've attached the benchmark code for reproducibility.
Though the lrand calls performance aren't themselves
all that bad, the design of the algorithm is not very
good. It has 4856 bytes of state (vs 4 bytes for PCG),
is very expensive to seed, as the benchmarks show
(because it actually uses an other PRNG to generate a
somewhat random state) and has bad statistical properties.
See [2] for a more detailed rationale and supplementary
materials on why this specific algorithm is suboptimal.

Regards,
Elie Le Vaillant

[1]: https://pcg-random.org
[2]: https://leviathansecurity.com/blog/attacking-gos-lagged-fibonacci-generator

[-- Attachment #2.1: Type: text/plain, Size: 376 bytes --]

from postmaster@9front:
The following attachment had content that we can't
prove to be harmless.  To avoid possible automatic
execution, we changed the content headers.
The original header was:

	Content-Disposition: attachment;
 filename*0*=UTF-8''0001-libc-replace-lrand-s-algorithm.patch
	Content-Type: text/x-diff;charset=utf-8
	Content-Transfer-Encoding: quoted-printable

[-- Attachment #2.2: file.suspect --]
[-- Type: application/octet-stream, Size: 2585 bytes --]

From 17a6a011ac9654e78a267474d534e8511dcd03fc Mon Sep 17 00:00:00 2001
From: Elie Le Vaillant <eolien55@disroot.org>
Date: Thu, 28 Mar 2024 23:03:08 +0100
Subject: [PATCH] libc: replace lrand's algorithm

The previous algorithm (an ALFG) was big (4856 bytes), somewhat complex
(60 sloc), very expansive to seed, and had bad statistical properties[1].
This replaces it with PCG32-XSH-RR[2], which is smaller (4 bytes), simpler
(21 sloc), way less expansive to seed, a bit faster, and has better
statistical properties.

[1]: https://leviathansecurity.com/blog/attacking-gos-lagged-fibonacci-generator
[2]: https://pcg-random.org
---
 sys/src/libc/port/lrand.c | 68 +++++++--------------------------------
 1 file changed, 12 insertions(+), 56 deletions(-)

diff --git a/sys/src/libc/port/lrand.c b/sys/src/libc/port/lrand.c
index 2ebb39622..ae08ed24e 100644
--- a/sys/src/libc/port/lrand.c
+++ b/sys/src/libc/port/lrand.c
@@ -2,82 +2,38 @@
 #include	<libc.h>
 
 /*
+ *	PCG32 XSH-RR variant
+ *	constants taken from github.com/pcg-c
  *	algorithm by
- *	D. P. Mitchell & J. A. Reeds
+ *	Melissa E. O’Neill
  */
 
-#define	LEN	607
-#define	TAP	273
-#define	MASK	0x7fffffffL
-#define	A	48271
-#define	M	2147483647
-#define	Q	44488
-#define	R	3399
-#define	NORM	(1.0/(1.0+MASK))
-
-static	ulong	rng_vec[LEN];
-static	ulong*	rng_tap = rng_vec;
-static	ulong*	rng_feed = 0;
+static	ulong	state;
 static	Lock	lk;
 
-static void
-isrand(long seed)
-{
-	long lo, hi, x;
-	int i;
-
-	rng_tap = rng_vec;
-	rng_feed = rng_vec+LEN-TAP;
-	seed = seed%M;
-	if(seed < 0)
-		seed += M;
-	if(seed == 0)
-		seed = 89482311;
-	x = seed;
-	/*
-	 *	Initialize by x[n+1] = 48271 * x[n] mod (2**31 - 1)
-	 */
-	for(i = -20; i < LEN; i++) {
-		hi = x / Q;
-		lo = x % Q;
-		x = A*lo - R*hi;
-		if(x < 0)
-			x += M;
-		if(i >= 0)
-			rng_vec[i] = x;
-	}
-}
-
 void
 srand(long seed)
 {
 	lock(&lk);
-	isrand(seed);
+	state = (ulong)seed;
 	unlock(&lk);
 }
 
 long
 lrand(void)
 {
-	ulong x;
+	u64int oldstate = state;
+	u32int r, x;
 
 	lock(&lk);
 
-	rng_tap--;
-	if(rng_tap < rng_vec) {
-		if(rng_feed == 0) {
-			isrand(1);
-			rng_tap--;
-		}
-		rng_tap += LEN;
-	}
-	rng_feed--;
-	if(rng_feed < rng_vec)
-		rng_feed += LEN;
-	x = (*rng_feed + *rng_tap) & MASK;
-	*rng_feed = x;
+	state *= 747796405U;
+	state += 2891336453U;
 
 	unlock(&lk);
 
+	r = oldstate >> (64 - 5);
+	x = (oldstate ^ (oldstate >> 18)) >> (32 - 5);
+	x = (x >> (-r & 31)) | (x << r);
 	return x;
 }
-- 
2.44.0



[-- Attachment #3: bench.c --]
[-- Type: text/x-c, Size: 567 bytes --]

#include <u.h>
#include <libc.h>

#define CLOCKS_PER_SEC 1000000L

void
main(int argc, char *argv[])
{
	long start;
	long newseed = time(0);
	double elapsed;

	start = clock();

	for(long i = 0; i < 0xffffff; i++) {
		srand(newseed);
		newseed = lrand();
	}

	elapsed = ((double)(clock() - start))/(double)CLOCKS_PER_SEC;
	print("repeated srand with different values: %f\n", elapsed);

	start = clock();

	for(long i = 0; i < 0xffffffff; i++) {
		lrand();
	}

	elapsed = ((double)(clock() - start))/(double)CLOCKS_PER_SEC;
	print("repeated lrand: %f\n", elapsed);
}


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-04-20 13:02 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-29  8:15 [9front] [PATCH] libc: replace lrand's algorithm Eolien55
2024-03-29  8:19 Eolien55
2024-03-29 12:39 ` qwx
2024-03-29 16:02   ` Eolien55
2024-04-01 15:47     ` qwx
2024-04-02 14:37       ` Eolien55
2024-04-02 14:52         ` Stanley Lieber
2024-04-02 20:18           ` Eolien55
2024-04-03  9:07         ` qwx
2024-04-03 19:50           ` Ori Bernstein
2024-04-03 19:54           ` Ori Bernstein
2024-04-04 21:00           ` Eolien55
2024-04-20 12:46         ` cinap_lenrek
2024-04-20 12:58           ` cinap_lenrek
2024-04-20 13:01           ` cinap_lenrek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).