caml-list - the Caml user's mailing list
* Ocaml PRNG Passes Diehard Tests
@ 2008-07-16  1:24 Will Farr
From: Will Farr @ 2008-07-16  1:24 UTC
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 944 bytes --]

Hello everyone,

I was curious yesterday to see how good the OCaml PRNG algorithm was.
After a bit of googling, I couldn't find any tests of it with the
"gold standard" of PRNG tests, Diehard (see
http://en.wikipedia.org/wiki/Diehard_tests; from that site, there is a
link to obtain C and FORTRAN programs which test a binary file of
random bits).  So, I ran the Diehard test suite myself, using about
30MB of random bits generated from the stdlib Random module.  Good
news: it passed.

Attached is the output of the test; basically, "passing" means that  
none of the P-values was too close to 0.000000 or 1.000000.  See the  
Diehard NOTES file for more information, and also the note at the top  
of the output regarding the significance of large or small p-values.

I just wanted to send this message so that future googlers, like me
yesterday, will know that this test has been run, and that the OCaml
PRNG passed.

Will


[-- Attachment #2: diehard_test_output.txt --]
[-- Type: text/plain, Size: 39458 bytes --]


				NOTE

	Most of the tests in DIEHARD return a p-value, which
	should be uniform on [0,1) if the input file contains truly
	independent random bits.   Those p-values are obtained by
	p=1-F(X), where F is the assumed distribution of the sample
	random variable X---often normal. But that assumed F is often just
	an asymptotic approximation, for which the fit will be worst
	in the tails. Thus you should not be surprised by occasional
	p-values near 0 or 1, such as .0012 or .9983. When a bit
	stream really FAILS BIG, you will get p's of 0 or 1 to six
	or more places.  By all means, do not, as a Statistician
	might, think that a p < .025 or p > .975 means that the RNG
	has "failed the test at the .05 level".  Such p's happen
	among the hundreds that DIEHARD produces, even with good RNGs.
	So keep in mind that "p happens".
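
The warning in the note can be made concrete with a little arithmetic. The sketch below is not part of Diehard; it assumes the p-values are independent and exactly uniform, which is only an idealisation, and the function name is made up for illustration. It computes the chance that at least one of n such p-values falls below .025 or above .975.

```python
def prob_any_extreme(n_tests, tail=0.025):
    """Chance that at least one of n_tests independent uniform p-values
    lies in [0, tail) or (1 - tail, 1]. Independence is an assumption."""
    inside = 1.0 - 2.0 * tail          # probability a single p-value is "unremarkable"
    return 1.0 - inside ** n_tests

if __name__ == "__main__":
    for n in (1, 10, 100, 200):
        print(n, round(prob_any_extreme(n), 4))
```

With a couple of hundred p-values, seeing a few beyond those limits is therefore close to certain even for an ideal generator.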

	Enter the name of the file to be tested.
	This must be a form="unformatted",access="direct" binary
	file of about 10-12 million bytes. Enter file name: 


		HERE ARE YOUR CHOICES:

		1   Birthday Spacings
		2   Overlapping Permutations
		3   Ranks of 31x31 and 32x32 matrices
		4   Ranks of 6x8 Matrices
		5   Monkey Tests on 20-bit Words
		6   Monkey Tests OPSO,OQSO,DNA
		7   Count the 1's in a Stream of Bytes
		8   Count the 1's in Specific Bytes
		9   Parking Lot Test
		10  Minimum Distance Test
		11  Random Spheres Test
		12  The Squeeze Test
		13  Overlapping Sums Test
		14  Runs Test
		15  The Craps Test
		16  All of the above

	To choose any particular tests, enter corresponding numbers.
	Enter 16 for all tests. If you want to perform all but a few
	tests, enter corresponding numbers preceded by "-" sign.
	Tests are executed in the order they are entered.

	Enter your choices.

	|-------------------------------------------------------------|
	|           This is the BIRTHDAY SPACINGS TEST                |
	|Choose m birthdays in a "year" of n days.  List the spacings |
	|between the birthdays.  Let j be the number of values that   |
	|occur more than once in that list, then j is asymptotically  |
	|Poisson distributed with mean m^3/(4n).  Experience shows n  |
	|must be quite large, say n>=2^18, for comparing the results  |
	|to the Poisson distribution with that mean.  This test uses  |
	|n=2^24 and m=2^10, so that the underlying distribution for j |
	|is taken to be Poisson with lambda=2^30/(2^26)=16. A sample  |
	|of 200 j's is taken, and a chi-square goodness of fit test   |
	|provides a p value.  The first test uses bits 1-24 (counting |
	|from the left) from integers in the specified file.  Then the|
	|file is closed and reopened, then bits 2-25 of the same inte-|
	|gers are used to provide birthdays, and so on to bits 9-32.  |
	|Each set of bits provides a p-value, and the nine p-values   |
	|provide a sample for a KSTEST.                               |
	|------------------------------------------------------------ |

		RESULTS OF BIRTHDAY SPACINGS TEST FOR random_bits.dat
	(no_bdays=1024, no_days/yr=2^24, lambda=16.00, sample size=500)

	Bits used	mean		chisqr		p-value
	 1 to 24	15.62		13.5401		0.699348
	 2 to 25	15.65		24.4252		0.108335
	 3 to 26	15.63		18.7935		0.340514
	 4 to 27	15.88		20.6086		0.244290
	 5 to 28	15.79		14.0140		0.666114
	 6 to 29	15.55		13.0174		0.735020
	 7 to 30	16.08		25.8071		0.078047
	 8 to 31	15.82		17.5427		0.418225
	 9 to 32	15.60		29.1235		0.033404

			degrees of freedom: 17
	---------------------------------------------------------------
		p-value for KStest on those 9 p-values: 0.279683
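
The Poisson mean quoted in the description follows directly from m and n. The sketch below is an illustration, not Diehard's code; the function names are hypothetical. It recomputes lambda = m^3/(4n) and evaluates the Poisson probabilities that the observed counts of j would be compared against.

```python
import math

def birthday_lambda(m, n):
    """Asymptotic Poisson mean for the number of repeated spacings."""
    return m ** 3 / (4 * n)

def poisson_pmf(k, lam):
    """P(J = k) for a Poisson variable, computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

lam = birthday_lambda(2 ** 10, 2 ** 24)   # m = 2^10 birthdays, n = 2^24 days

if __name__ == "__main__":
    print("lambda =", lam)
    for k in (8, 12, 16, 20, 24):
        print(k, round(poisson_pmf(k, lam), 5))
```

The sample means in the table (15.55 to 16.08) sit close to this value of 16.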


	|-------------------------------------------------------------|
	|           THE OVERLAPPING 5-PERMUTATION TEST                |
	|This is the OPERM5 test.  It looks at a sequence of one mill-|
	|ion 32-bit random integers.  Each set of five consecutive    |
	|integers can be in one of 120 states, for the 5! possible or-|
	|derings of five numbers.  Thus the 5th, 6th, 7th,...numbers  |
	|each provide a state. As many thousands of state transitions |
	|are observed,  cumulative counts are made of the number of   |
	|occurrences of each state.  Then the quadratic form in the   |
	|weak inverse of the 120x120 covariance matrix yields a test  |
	|equivalent to the likelihood ratio test that the 120 cell    |
	|counts came from the specified (asymptotically) normal dis-  |
	|tribution with the specified 120x120 covariance matrix (with |
	|rank 99).  This version uses 1,000,000 integers, twice.      |
	|-------------------------------------------------------------|

			OPERM5 test for file 
		  (For samples of 1,000,000 consecutive 5-tuples)

			  sample 1 
	chisquare=112.576986 with df=99; p-value= 0.165806
	_______________________________________________________________

			  sample 2 
	chisquare=112.879031 with df=99; p-value= 0.160951
	_______________________________________________________________


	|-------------------------------------------------------------|
	|This is the BINARY RANK TEST for 31x31 matrices. The leftmost|
	|31 bits of 31 random integers from the test sequence are used|
	|to form a 31x31 binary matrix over the field {0,1}. The rank |
	|is determined. That rank can be from 0 to 31, but ranks< 28  |
	|are rare, and their counts are pooled with those for rank 28.|
	|Ranks are found for 40,000 such random matrices and a chisqu-|
	|are test is performed on counts for ranks 31,30,29 and <=28. |
	|-------------------------------------------------------------|
		Rank test for binary matrices (31x31) from random_bits.dat

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=28	218         	211.4       	0.205       	0.205       
	r=29	5035        	5134.0      	1.909       	2.114       
	r=30	23321       	23103.0     	2.056       	4.171       
	r=31	11426       	11551.5     	1.364       	5.534       

		chi-square = 5.534 with df = 3;  p-value = 0.137
	--------------------------------------------------------------

	|-------------------------------------------------------------|
	|This is the BINARY RANK TEST for 32x32 matrices. A random 32x|
	|32 binary matrix is formed, each row a 32-bit random integer.|
	|The rank is determined. That rank can be from 0 to 32, ranks |
	|less than 29 are rare, and their counts are pooled with those|
	|for rank 29.  Ranks are found for 40,000 such random matrices|
	|and a chisquare test is performed on counts for ranks  32,31,|
	|30 and <=29.                                                 |
	|-------------------------------------------------------------|
		Rank test for binary matrices (32x32) from random_bits.dat

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=29	211         	211.4       	0.001       	0.001       
	r=30	5071        	5134.0      	0.773       	0.774       
	r=31	23128       	23103.0     	0.027       	0.801       
	r=32	11590       	11551.5     	0.128       	0.929       

		chi-square = 0.929 with df = 3;  p-value = 0.818
	--------------------------------------------------------------
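
The chi-square figures in the two rank tables can be rechecked from the printed OBSERVED and EXPECTED columns. The sketch below is an illustration rather than Diehard's own routine. It uses the closed form for the upper tail of a chi-square variable with 3 degrees of freedom, and because the expected counts are taken as printed (rounded to one decimal), the results agree with the output only to about two or three places.

```python
import math

def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_upper_tail_df3(x):
    """P(X > x) for chi-square with 3 degrees of freedom (closed form)."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

expected = [211.4, 5134.0, 23103.0, 11551.5]      # as printed in both tables
observed_31 = [218, 5035, 23321, 11426]           # 31x31 table
observed_32 = [211, 5071, 23128, 11590]           # 32x32 table

if __name__ == "__main__":
    for name, obs in (("31x31", observed_31), ("32x32", observed_32)):
        chi = pearson_chi_square(obs, expected)
        print(name, round(chi, 3), round(chi2_upper_tail_df3(chi), 3))
```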

	|-------------------------------------------------------------|
	|This is the BINARY RANK TEST for 6x8 matrices.  From each of |
	|six random 32-bit integers from the generator under test, a  |
	|specified byte is chosen, and the resulting six bytes form a |
	|6x8 binary matrix whose rank is determined.  That rank can be|
	|from 0 to 6, but ranks 0,1,2,3 are rare; their counts are    |
	|pooled with those for rank 4. Ranks are found for 100,000    |
	|random matrices, and a chi-square test is performed on       |
	|counts for ranks 6,5 and (0,...,4) (pooled together).        |
	|-------------------------------------------------------------|

		Rank test for binary matrices (6x8) from random_bits.dat

			      bits  1 to  8

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	953         	944.3       	0.080       	0.080       
	r=5	21627       	21743.9     	0.628       	0.709       
	r=6	77420       	77311.8     	0.151       	0.860       

		chi-square = 0.860 with df = 2;  p-value = 0.650
	--------------------------------------------------------------

			      bits  2 to  9

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	926         	944.3       	0.355       	0.355       
	r=5	21478       	21743.9     	3.252       	3.606       
	r=6	77596       	77311.8     	1.045       	4.651       

		chi-square = 4.651 with df = 2;  p-value = 0.098
	--------------------------------------------------------------

			      bits  3 to 10

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	936         	944.3       	0.073       	0.073       
	r=5	21831       	21743.9     	0.349       	0.422       
	r=6	77233       	77311.8     	0.080       	0.502       

		chi-square = 0.502 with df = 2;  p-value = 0.778
	--------------------------------------------------------------

			      bits  4 to 11

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	926         	944.3       	0.355       	0.355       
	r=5	21645       	21743.9     	0.450       	0.804       
	r=6	77429       	77311.8     	0.178       	0.982       

		chi-square = 0.982 with df = 2;  p-value = 0.612
	--------------------------------------------------------------

			      bits  5 to 12

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	926         	944.3       	0.355       	0.355       
	r=5	21771       	21743.9     	0.034       	0.388       
	r=6	77303       	77311.8     	0.001       	0.389       

		chi-square = 0.389 with df = 2;  p-value = 0.823
	--------------------------------------------------------------

			      bits  6 to 13

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	978         	944.3       	1.203       	1.203       
	r=5	21803       	21743.9     	0.161       	1.363       
	r=6	77219       	77311.8     	0.111       	1.475       

		chi-square = 1.475 with df = 2;  p-value = 0.478
	--------------------------------------------------------------

			      bits  7 to 14

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	973         	944.3       	0.872       	0.872       
	r=5	21707       	21743.9     	0.063       	0.935       
	r=6	77320       	77311.8     	0.001       	0.936       

		chi-square = 0.936 with df = 2;  p-value = 0.626
	--------------------------------------------------------------

			      bits  8 to 15

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	948         	944.3       	0.014       	0.014       
	r=5	21778       	21743.9     	0.053       	0.068       
	r=6	77274       	77311.8     	0.018       	0.086       

		chi-square = 0.086 with df = 2;  p-value = 0.958
	--------------------------------------------------------------

			      bits  9 to 16

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	979         	944.3       	1.275       	1.275       
	r=5	21810       	21743.9     	0.201       	1.476       
	r=6	77211       	77311.8     	0.131       	1.607       

		chi-square = 1.607 with df = 2;  p-value = 0.448
	--------------------------------------------------------------

			      bits 10 to 17

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	923         	944.3       	0.480       	0.480       
	r=5	21836       	21743.9     	0.390       	0.871       
	r=6	77241       	77311.8     	0.065       	0.935       

		chi-square = 0.935 with df = 2;  p-value = 0.626
	--------------------------------------------------------------

			      bits 11 to 18

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	953         	944.3       	0.080       	0.080       
	r=5	21821       	21743.9     	0.273       	0.354       
	r=6	77226       	77311.8     	0.095       	0.449       

		chi-square = 0.449 with df = 2;  p-value = 0.799
	--------------------------------------------------------------

			      bits 12 to 19

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	991         	944.3       	2.310       	2.310       
	r=5	21512       	21743.9     	2.473       	4.783       
	r=6	77497       	77311.8     	0.444       	5.226       

		chi-square = 5.226 with df = 2;  p-value = 0.073
	--------------------------------------------------------------

			      bits 13 to 20

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	982         	944.3       	1.505       	1.505       
	r=5	21617       	21743.9     	0.741       	2.246       
	r=6	77401       	77311.8     	0.103       	2.349       

		chi-square = 2.349 with df = 2;  p-value = 0.309
	--------------------------------------------------------------

			      bits 14 to 21

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	968         	944.3       	0.595       	0.595       
	r=5	21644       	21743.9     	0.459       	1.054       
	r=6	77388       	77311.8     	0.075       	1.129       

		chi-square = 1.129 with df = 2;  p-value = 0.569
	--------------------------------------------------------------

			      bits 15 to 22

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	954         	944.3       	0.100       	0.100       
	r=5	21817       	21743.9     	0.246       	0.345       
	r=6	77229       	77311.8     	0.089       	0.434       

		chi-square = 0.434 with df = 2;  p-value = 0.805
	--------------------------------------------------------------

			      bits 16 to 23

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	918         	944.3       	0.732       	0.732       
	r=5	21890       	21743.9     	0.982       	1.714       
	r=6	77192       	77311.8     	0.186       	1.900       

		chi-square = 1.900 with df = 2;  p-value = 0.387
	--------------------------------------------------------------

			      bits 17 to 24

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	924         	944.3       	0.436       	0.436       
	r=5	21764       	21743.9     	0.019       	0.455       
	r=6	77312       	77311.8     	0.000       	0.455       

		chi-square = 0.455 with df = 2;  p-value = 0.797
	--------------------------------------------------------------

			      bits 18 to 25

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	912         	944.3       	1.105       	1.105       
	r=5	21644       	21743.9     	0.459       	1.564       
	r=6	77444       	77311.8     	0.226       	1.790       

		chi-square = 1.790 with df = 2;  p-value = 0.409
	--------------------------------------------------------------

			      bits 19 to 26

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	919         	944.3       	0.678       	0.678       
	r=5	21679       	21743.9     	0.194       	0.872       
	r=6	77402       	77311.8     	0.105       	0.977       

		chi-square = 0.977 with df = 2;  p-value = 0.614
	--------------------------------------------------------------

			      bits 20 to 27

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	941         	944.3       	0.012       	0.012       
	r=5	21524       	21743.9     	2.224       	2.235       
	r=6	77535       	77311.8     	0.644       	2.880       

		chi-square = 2.880 with df = 2;  p-value = 0.237
	--------------------------------------------------------------

			      bits 21 to 28

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	909         	944.3       	1.320       	1.320       
	r=5	21828       	21743.9     	0.325       	1.645       
	r=6	77263       	77311.8     	0.031       	1.676       

		chi-square = 1.676 with df = 2;  p-value = 0.433
	--------------------------------------------------------------

			      bits 22 to 29

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	923         	944.3       	0.480       	0.480       
	r=5	21640       	21743.9     	0.496       	0.977       
	r=6	77437       	77311.8     	0.203       	1.180       

		chi-square = 1.180 with df = 2;  p-value = 0.554
	--------------------------------------------------------------

			      bits 23 to 30

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	867         	944.3       	6.328       	6.328       
	r=5	21837       	21743.9     	0.399       	6.726       
	r=6	77296       	77311.8     	0.003       	6.730       

		chi-square = 6.730 with df = 2;  p-value = 0.035
	--------------------------------------------------------------

			      bits 24 to 31

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	893         	944.3       	2.787       	2.787       
	r=5	21565       	21743.9     	1.472       	4.259       
	r=6	77542       	77311.8     	0.685       	4.944       

		chi-square = 4.944 with df = 2;  p-value = 0.084
	--------------------------------------------------------------

			      bits 25 to 32

	RANK	OBSERVED	EXPECTED	(O-E)^2/E	SUM

	r<=4	931         	944.3       	0.187       	0.187       
	r=5	21571       	21743.9     	1.375       	1.562       
	r=6	77498       	77311.8     	0.448       	2.011       

		chi-square = 2.011 with df = 2;  p-value = 0.366
	--------------------------------------------------------------
	    TEST SUMMARY, 25 tests on 100,000 random 6x8 matrices
	    These should be 25 uniform [0,1] random variates:
 
	0.650488    	0.097735    	0.777957    	0.611969    	0.823073     
	0.478379    	0.626327    	0.957693    	0.447652    	0.626444     
	0.799012    	0.073299    	0.309028    	0.568671    	0.804902     
	0.386781    	0.796531    	0.408635    	0.613610    	0.236951     
	0.432646    	0.554418    	0.034569    	0.084404    	0.365932    
		The KS test for those 25 supposed UNI's yields
			KS p-value = 0.683614
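
For the 6x8 tables the p-values have a particularly simple form: with 2 degrees of freedom the chi-square upper tail is exp(-x/2). The sketch below (illustrative names, not Diehard's code) rechecks two of the table p-values and then computes the plain Kolmogorov-Smirnov distance of the 25 summary values from the uniform distribution. It deliberately stops at the distance; how Diehard's KSTEST routine turns its statistic into the p-value 0.683614 is not reproduced here.

```python
import math

def chi2_upper_tail_df2(x):
    """P(X > x) for chi-square with 2 degrees of freedom."""
    return math.exp(-x / 2.0)

def ks_distance(values):
    """Largest gap between the empirical CDF and the uniform CDF on [0, 1]."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

summary = [
    0.650488, 0.097735, 0.777957, 0.611969, 0.823073,
    0.478379, 0.626327, 0.957693, 0.447652, 0.626444,
    0.799012, 0.073299, 0.309028, 0.568671, 0.804902,
    0.386781, 0.796531, 0.408635, 0.613610, 0.236951,
    0.432646, 0.554418, 0.034569, 0.084404, 0.365932,
]

if __name__ == "__main__":
    print(round(chi2_upper_tail_df2(0.860), 3))   # bits 1 to 8
    print(round(chi2_upper_tail_df2(6.730), 3))   # bits 23 to 30
    print(round(ks_distance(summary), 6))
```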

	|-------------------------------------------------------------|
	|                  THE BITSTREAM TEST                         |
	|The file under test is viewed as a stream of bits. Call them |
	|b1,b2,... .  Consider an alphabet with two "letters", 0 and 1|
	|and think of the stream of bits as a succession of 20-letter |
	|"words", overlapping.  Thus the first word is b1b2...b20, the|
	|second is b2b3...b21, and so on.  The bitstream test counts  |
	|the number of missing 20-letter (20-bit) words in a string of|
	|2^21 overlapping 20-letter words.  There are 2^20 possible 20|
	|letter words.  For a truly random string of 2^21+19 bits, the|
	|number of missing words j should be (very close to) normally |
	|distributed with mean 141,909 and sigma 428.  Thus           |
	| (j-141909)/428 should be a standard normal variate (z score)|
	|that leads to a uniform [0,1) p value.  The test is repeated |
	|twenty times.                                                |
	|-------------------------------------------------------------|

		THE OVERLAPPING 20-TUPLES BITSTREAM  TEST for random_bits.dat
	 (20 bits/word, 2097152 words 20 bitstreams. No. missing words 
	  should average 141909.33 with sigma=428.00.)
	----------------------------------------------------------------
		   BITSTREAM test results for random_bits.dat.

	Bitstream	No. missing words	z-score		p-value
	   1		142104 			 0.45		0.324613
	   2		142374 			 1.09		0.138811
	   3		141938 			 0.07		0.473296
	   4		141485 			-0.99		0.839261
	   5		141957 			 0.11		0.455658
	   6		141311 			-1.40		0.918939
	   7		141881 			-0.07		0.526387
	   8		141947 			 0.09		0.464933
	   9		141425 			-1.13		0.871101
	   10		142360 			 1.05		0.146178
	   11		141702 			-0.48		0.685955
	   12		141642 			-0.62		0.733884
	   13		141748 			-0.38		0.646891
	   14		141860 			-0.12		0.545879
	   15		141920 			 0.02		0.490055
	   16		142069 			 0.37		0.354552
	   17		141131 			-1.82		0.965508
	   18		141850 			-0.14		0.555125
	   19		141390 			-1.21		0.887509
	   20		142132 			 0.52		0.301442
	----------------------------------------------------------------
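
The mean of 141909.33 and the p-values in this table can be rechecked by hand. The sketch below is an illustration with hypothetical function names. It assumes the usual occupancy approximation (each of the 2^20 possible words is missed with probability about exp(-2^21/2^20) = exp(-2)) and reads the table's p-value as the upper normal tail of the z-score; sigma = 428 is simply taken from the output.

```python
import math

CELLS = 2 ** 20      # possible 20-bit words
WORDS = 2 ** 21      # overlapping words examined per bitstream

def expected_missing(cells=CELLS, words=WORDS):
    """Approximate expected number of words that never appear."""
    return cells * math.exp(-words / cells)

def z_and_p(missing, mean=141909.33, sigma=428.0):
    """z-score of the missing-word count and its upper-tail normal probability."""
    z = (missing - mean) / sigma
    return z, 0.5 * math.erfc(z / math.sqrt(2.0))

if __name__ == "__main__":
    print(round(expected_missing(), 2))
    for missing in (142104, 141485, 141131):   # bitstreams 1, 4 and 17
        z, p = z_and_p(missing)
        print(missing, round(z, 2), round(p, 6))
```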

	|-------------------------------------------------------------|
	|        OPSO means Overlapping-Pairs-Sparse-Occupancy        |
	|The OPSO test considers 2-letter words from an alphabet of   |
	|1024 letters.  Each letter is determined by a specified ten  |
	|bits from a 32-bit integer in the sequence to be tested. OPSO|
	|generates  2^21 (overlapping) 2-letter words  (from 2^21+1   |
	|"keystrokes")  and counts the number of missing words---that |
	|is 2-letter words which do not appear in the entire sequence.|
	|That count should be very close to normally distributed with |
	|mean 141,909, sigma 290. Thus (missingwrds-141909)/290 should|
	|be a standard normal variable. The OPSO test takes 32 bits at|
	|a time from the test file and uses a designated set of ten   |
	|consecutive bits. It then restarts the file for the next de- |
	|signated 10 bits, and so on.                                 |
	|------------------------------------------------------------ |

			   OPSO test for file random_bits.dat

	Bits used	No. missing words	z-score		p-value
	23 to 32  		142250 		 1.1747		0.120053
	22 to 31  		142147 		 0.8196		0.206236
	21 to 30  		141982 		 0.2506		0.401067
	20 to 29  		141833 		-0.2632		0.603804
	19 to 28  		141874 		-0.1218		0.548482
	18 to 27  		142149 		 0.8264		0.204275
	17 to 26  		142052 		 0.4920		0.311372
	16 to 25  		142074 		 0.5678		0.285076
	15 to 24  		141864 		-0.1563		0.562106
	14 to 23  		141626 		-0.9770		0.835715
	13 to 22  		141925 		 0.0540		0.478454
	12 to 21  		141388 		-1.7977		0.963887
	11 to 20  		142759 		 2.9299		0.001695
	10 to 19  		141688 		-0.7632		0.777330
	9 to 18  		141660 		-0.8598		0.805039
	8 to 17  		141806 		-0.3563		0.639196
	7 to 16  		142113 		 0.7023		0.241243
	6 to 15  		142030 		 0.4161		0.338667
	5 to 14  		141731 		-0.6149		0.730700
	4 to 13  		142013 		 0.3575		0.360365
	3 to 12  		141780 		-0.4460		0.672189
	2 to 11  		141718 		-0.6598		0.745296
	1 to 10  		142029 		 0.4127		0.339930
	-----------------------------------------------------------------
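
To make the OPSO procedure concrete, here is a small sketch of its two ingredients: pulling a designated bit field out of a 32-bit integer (bits numbered 1 to 32 from the left, as in the table) and counting which overlapping words never occur. It is not Diehard's implementation; the function names are invented, and a Python set stands in for whatever occupancy table the real program uses.

```python
def extract_bits(word, first, last):
    """Return bits first..last of a 32-bit word, counting from the left (1-based)."""
    width = last - first + 1
    return (word >> (32 - last)) & ((1 << width) - 1)

def count_missing_words(letters, bits_per_letter, word_len):
    """Count words over the alphabet that never occur among the overlapping
    word_len-letter words formed from the sequence of letters."""
    total_bits = bits_per_letter * word_len
    mask = (1 << total_bits) - 1
    seen = set()
    code = 0
    for i, letter in enumerate(letters):
        code = ((code << bits_per_letter) | letter) & mask   # slide the window
        if i >= word_len - 1:
            seen.add(code)
    return (1 << total_bits) - len(seen)

if __name__ == "__main__":
    # Toy run: 4-letter alphabet, 2-letter words, 16 possible words.
    print(count_missing_words([0, 1, 2, 3], bits_per_letter=2, word_len=2))
```

For OPSO proper one would feed it 2^21+1 letters obtained with `extract_bits(word, 23, 32)` and so on, with `bits_per_letter=10` and `word_len=2`; the same counter covers OQSO (5 bits, 4 letters) and DNA (2 bits, 10 letters).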

	|------------------------------------------------------------ |
	|    OQSO means Overlapping-Quadruples-Sparse-Occupancy       |
	|  The test OQSO is similar, except that it considers 4-letter|
	|words from an alphabet of 32 letters, each letter determined |
	|by a designated string of 5 consecutive bits from the test   |
	|file, elements of which are assumed 32-bit random integers.  |
	|The mean number of missing words in a sequence of 2^21 four- |
	|letter words,  (2^21+3 "keystrokes"), is again 141909, with  |
	|sigma = 295.  The mean is based on theory; sigma comes from  |
	|extensive simulation.                                        |
	|------------------------------------------------------------ |

			   OQSO test for file random_bits.dat

	Bits used	No. missing words	z-score		p-value
	28 to 32  		141502 		-1.3808		0.916327
	27 to 31  		141787 		-0.4147		0.660811
	26 to 30  		142163 		 0.8599		0.194923
	25 to 29  		141965 		 0.1887		0.425159
	24 to 28  		142178 		 0.9107		0.181215
	23 to 27  		142309 		 1.3548		0.087738
	22 to 26  		142080 		 0.5785		0.281449
	21 to 25  		142355 		 1.5107		0.065427
	20 to 24  		141852 		-0.1943		0.577045
	19 to 23  		142272 		 1.2294		0.109463
	18 to 22  		142586 		 2.2938		0.010901
	17 to 21  		142318 		 1.3853		0.082977
	16 to 20  		141781 		-0.4350		0.668225
	15 to 19  		142096 		 0.6328		0.263439
	14 to 18  		141860 		-0.1672		0.566402
	13 to 17  		141862 		-0.1604		0.563733
	12 to 16  		141693 		-0.7333		0.768319
	11 to 15  		142057 		 0.5006		0.308335
	10 to 14  		141957 		 0.1616		0.435813
	9 to 13  		141684 		-0.7638		0.777516
	8 to 12  		142417 		 1.7209		0.042633
	7 to 11  		141931 		 0.0735		0.470721
	6 to 10  		141884 		-0.0859		0.534213
	5 to 9  		141990 		 0.2735		0.392251
	4 to 8  		142251 		 1.1582		0.123390
	3 to 7  		141318 		-2.0045		0.977492
	2 to 6  		141668 		-0.8181		0.793341
	1 to 5  		141784 		-0.4248		0.664526
	-----------------------------------------------------------------

	|------------------------------------------------------------ |
	|    The DNA test considers an alphabet of 4 letters: C,G,A,T,|
	|determined by two designated bits in the sequence of random  |
	|integers being tested.  It considers 10-letter words, so that|
	|as in OPSO and OQSO, there are 2^20 possible words, and the  |
	|mean number of missing words from a string of 2^21  (over-   |
	|lapping)  10-letter  words (2^21+9 "keystrokes") is 141909.  |
	|The standard deviation sigma=339 was determined as for OQSO  |
	|by simulation.  (Sigma for OPSO, 290, is the true value (to  |
	|three places), not determined by simulation.)                |
	|------------------------------------------------------------ |

			   DNA test for file random_bits.dat

	Bits used	No. missing words	z-score		p-value
	31 to 32  		142042 		 0.3914		0.347767
	30 to 31  		141967 		 0.1701		0.432459
	29 to 30  		141171 		-2.1780		0.985296
	28 to 29  		141656 		-0.7473		0.772555
	27 to 28  		142171 		 0.7719		0.220090
	26 to 27  		142144 		 0.6922		0.244393
	25 to 26  		141400 		-1.5024		0.933509
	24 to 25  		142600 		 2.0374		0.020806
	23 to 24  		142162 		 0.7453		0.228033
	22 to 23  		142389 		 1.4150		0.078541
	21 to 22  		142229 		 0.9430		0.172846
	20 to 21  		141519 		-1.1514		0.875219
	19 to 20  		142342 		 1.2763		0.100922
	18 to 19  		141621 		-0.8505		0.802485
	17 to 18  		142090 		 0.5329		0.297034
	16 to 17  		142030 		 0.3560		0.360936
	15 to 16  		141751 		-0.4671		0.679768
	14 to 15  		142118 		 0.6155		0.269097
	13 to 14  		141816 		-0.2753		0.608461
	12 to 13  		142293 		 1.1318		0.128866
	11 to 12  		141468 		-1.3019		0.903518
	10 to 11  		141956 		 0.1377		0.445251
	9 to 10  		142520 		 1.8014		0.035821
	8 to 9  		142066 		 0.4622		0.321986
	7 to 8  		141936 		 0.0787		0.468646
	6 to 7  		142645 		 2.1701		0.014999
	5 to 6  		142123 		 0.6303		0.264251
	4 to 5  		142009 		 0.2940		0.384374
	3 to 4  		142251 		 1.0079		0.156757
	2 to 3  		142397 		 1.4386		0.075138
	1 to 2  		142314 		 1.1937		0.116294
	-----------------------------------------------------------------

	|-------------------------------------------------------------|
	|    This is the COUNT-THE-1's TEST on a stream of bytes.     |
	|Consider the file under test as a stream of bytes (four per  |
	|32-bit integer).  Each byte can contain from 0 to 8 1's,     |
	|with probabilities 1,8,28,56,70,56,28,8,1 over 256.  Now let |
	|the stream of bytes provide a string of overlapping  5-letter|
	|words, each "letter" taking values A,B,C,D,E. The letters are|
	|determined by the number of 1's in a byte: 0,1,or 2 yield A, |
	|3 yields B, 4 yields C, 5 yields D and 6,7 or 8 yield E. Thus|
	|we have a monkey at a typewriter hitting five keys with vari-|
	|ous probabilities (37,56,70,56,37 over 256).  There are 5^5  |
	|possible 5-letter words, and from a string of 256,000 (over- |
	|lapping) 5-letter words, counts are made on the frequencies  |
	|for each word.   The quadratic form in the weak inverse of   |
	|the covariance matrix of the cell counts provides a chisquare|
	|test: Q5-Q4, the difference of the naive Pearson sums of     |
	|(OBS-EXP)^2/EXP on counts for 5- and 4-letter cell counts.   |
	|-------------------------------------------------------------|

		Test result for the byte stream from random_bits.dat
	  (Degrees of freedom: 5^5-5^4=2500; sample size: 2560000)

			chisquare	z-score		p-value
			2506.37		 0.090		0.464108
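
The letter probabilities and the z-score in this test are easy to recheck. The sketch below is illustrative, with made-up function names. It derives the weights 37,56,70,56,37 from the binomial counts, maps a byte to its letter, and assumes the z-score is the usual normal approximation for a chi-square with 5^5 - 5^4 = 3125 - 625 = 2500 degrees of freedom (mean 2500, variance 5000), with the p-value taken as the upper tail.

```python
import math

def letter_weights():
    """Numerators (over 256) of the probabilities of letters A..E."""
    c = [math.comb(8, k) for k in range(9)]      # 1,8,28,56,70,56,28,8,1
    return [sum(c[0:3]), c[3], c[4], c[5], sum(c[6:9])]

def letter_of_byte(byte):
    """Map a byte to A..E according to its number of 1 bits."""
    return "AAABCDEEE"[bin(byte & 0xFF).count("1")]

def z_and_p(chisq, df=5 ** 5 - 5 ** 4):
    """Normal approximation: z = (chisq - df) / sqrt(2 df), p = upper tail."""
    z = (chisq - df) / math.sqrt(2.0 * df)
    return z, 0.5 * math.erfc(z / math.sqrt(2.0))

if __name__ == "__main__":
    print(letter_weights())
    print(letter_of_byte(0b00000111), letter_of_byte(0xFF))
    z, p = z_and_p(2506.37)
    print(round(z, 3), round(p, 6))
```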

	|-------------------------------------------------------------|
	|    This is the COUNT-THE-1's TEST for specific bytes.       |
	|Consider the file under test as a stream of 32-bit integers. |
	|From each integer, a specific byte is chosen, say the left-  |
	|most: bits 1 to 8. Each byte can contain from 0 to 8 1's,    |
	|with probabilities 1,8,28,56,70,56,28,8,1 over 256.  Now let |
	|the specified bytes from successive integers provide a string|
	|of (overlapping) 5-letter words, each "letter" taking values |
	|A,B,C,D,E. The letters are determined  by the number of 1's, |
	|in that byte: 0,1,or 2 ---> A, 3 ---> B, 4 ---> C, 5 ---> D, |
	|and  6,7 or 8 ---> E.  Thus we have a monkey at a typewriter |
	|hitting five keys with various probabilities: 37,56,70,      |
	|56,37 over 256. There are 5^5 possible 5-letter words, and   |
	|from a string of 256,000 (overlapping) 5-letter words, counts|
	|are made on the frequencies for each word. The quadratic form|
	|in the weak inverse of the covariance matrix of the cell     |
	|counts provides a chisquare test: Q5-Q4, the difference of   |
	|the naive Pearson  sums of (OBS-EXP)^2/EXP on counts for 5-  |
	|and 4-letter cell  counts.                                   |
	|-------------------------------------------------------------|

		Test results for specific bytes from random_bits.dat
	  (Degrees of freedom: 5^5-5^4=2500; sample size: 256000)

	bits used	chisquare	z-score		p-value
	1 to 8  	2523.05		 0.326		0.372237
	2 to 9  	2579.53		 1.125		0.130362
	3 to 10  	2530.71		 0.434		0.332053
	4 to 11  	2503.56		 0.050		0.479951
	5 to 12  	2400.25		-1.411		0.920835
	6 to 13  	2519.21		 0.272		0.392949
	7 to 14  	2534.78		 0.492		0.311395
	8 to 15  	2340.97		-2.249		0.987745
	9 to 16  	2540.82		 0.577		0.281855
	10 to 17  	2405.40		-1.338		0.909529
	11 to 18  	2490.96		-0.128		0.550855
	12 to 19  	2446.31		-0.759		0.776157
	13 to 20  	2511.18		 0.158		0.437160
	14 to 21  	2400.46		-1.408		0.920394
	15 to 22  	2512.23		 0.173		0.431340
	16 to 23  	2569.34		 0.981		0.163401
	17 to 24  	2465.64		-0.486		0.686513
	18 to 25  	2432.04		-0.961		0.831735
	19 to 26  	2427.99		-1.018		0.845751
	20 to 27  	2575.01		 1.061		0.144390
	21 to 28  	2543.58		 0.616		0.268834
	22 to 29  	2497.48		-0.036		0.514194
	23 to 30  	2384.28		-1.637		0.949140
	24 to 31  	2426.13		-1.045		0.851921
	25 to 32  	2582.89		 1.172		0.120539
	|-------------------------------------------------------------|
	|              THIS IS A PARKING LOT TEST                     |
	|In a square of side 100, randomly "park" a car---a circle of |
	|radius 1.   Then try to park a 2nd, a 3rd, and so on, each   |
	|time parking "by ear".  That is, if an attempt to park a car |
	|causes a crash with one already parked, try again at a new   |
	|random location. (To avoid path problems, consider parking   |
	|helicopters rather than cars.)   Each attempt leads to either|
	|a crash or a success, the latter followed by an increment to |
	|the list of cars already parked. If we plot n: the number of |
	|attempts, versus k: the number successfully parked, we get a |
	|curve that should be similar to those provided by a perfect  |
	|random number generator.  Theory for the behavior of such a  |
	|random curve seems beyond reach, and as graphics displays are|
	|not available for this battery of tests, a simple characteriz|
	|ation of the random experiment is used: k, the number of cars|
	|successfully parked after n=12,000 attempts. Simulation shows|
	|that k should average 3523 with sigma 21.9 and is very close |
	|to normally distributed.  Thus (k-3523)/21.9 should be a st- |
	|andard normal variable, which, converted to a uniform varia- |
	|ble, provides input to a KSTEST based on a sample of 10.     |
	|-------------------------------------------------------------|

		CDPARK: result of 10 tests on file random_bits.dat
	  (Of 12000 tries, the average no. of successes should be 
	   3523.0 with sigma=21.9)

	   No. successes		z-score		p-value
		3510		-0.5936		0.723613
		3520		-0.1370		0.554479
		3540		 0.7763		0.218799
		3516		-0.3196		0.625377
		3545		 1.0046		0.157553
		3518		-0.2283		0.590298
		3511		-0.5479		0.708135
		3535		 0.5479		0.291865
		3528		 0.2283		0.409702
		3515		-0.3653		0.642555
	  Square side=100, avg. no. parked=3523.80 sample std.=11.81
	     p-value of the KSTEST for those 10 p-values: 0.475730
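
The z-score and p-value columns above can be reproduced from the stated mean (3523) and sigma (21.9). A minimal sketch, assuming the p-value is 1 - Phi(z), which is what the printed rows suggest; the function names are illustrative and are not taken from the DIEHARD source:

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def parking_row(k, mean=3523.0, sigma=21.9):
    """Return (z-score, p-value) for k cars parked.

    Assumption: p = 1 - Phi(z), inferred from the printed table.
    """
    z = (k - mean) / sigma
    return z, 1.0 - normal_cdf(z)

# Recompute two rows of the table: 3510 and 3545 successes.
for k in (3510, 3545):
    z, p = parking_row(k)
    print(f"{k}\t{z:.4f}\t{p:.6f}")
```

Small differences in the last digits are possible if DIEHARD uses its own approximation to the normal CDF.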


	|-------------------------------------------------------------|
	|              THE MINIMUM DISTANCE TEST                      |
	|It does this 100 times:  choose n=8000 random points in a    |
	|square of side 10000.  Find d, the minimum distance between  |
	|the (n^2-n)/2 pairs of points.  If the points are truly inde-|
	|pendent uniform, then d^2, the square of the minimum distance|
	|should be (very close to) exponentially distributed with mean|
	|.995 .  Thus 1-exp(-d^2/.995) should be uniform on [0,1) and |
	|a KSTEST on the resulting 100 values serves as a test of uni-|
	|formity for random points in the square. Test numbers=0 mod 5|
	|are printed but the KSTEST is based on the full set of 100   |
	|random choices of 8000 points in the 10000x10000 square.     |
	|-------------------------------------------------------------|

		This is the MINIMUM DISTANCE test for file random_bits.dat

	Sample no.	 d^2		 mean		equiv uni
	   5		1.9751		1.2321		0.862617
	   10		0.2380		0.9142		0.212766
	   15		1.8802		0.8352		0.848868
	   20		1.1498		0.7255		0.685132
	   25		0.4578		0.6935		0.368761
	   30		1.8764		0.8438		0.848298
	   35		0.2470		0.8726		0.219838
	   40		0.6464		0.8765		0.477772
	   45		1.8605		0.8845		0.845853
	   50		0.3302		0.8594		0.282419
	   55		0.8722		0.8545		0.583784
	   60		0.1641		0.8961		0.152021
	   65		2.1060		0.9103		0.879553
	   70		0.9085		0.9274		0.598724
	   75		1.2703		0.9452		0.721038
	   80		0.3579		0.9392		0.302146
	   85		0.0035		0.9418		0.003463
	   90		0.1291		0.9291		0.121687
	   95		1.5258		0.9393		0.784220
	   100		2.2757		0.9502		0.898440
	--------------------------------------------------------------
	Result of KS test on 100 transformed mindist^2's: p-value=0.650683
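
The "equiv uni" column above follows from the d^2 column through the transform 1-exp(-d^2/.995) given in the test description. A minimal sketch of that transform, with a brute-force minimum squared distance for illustration only (the helper names are hypothetical, and the O(n^2) search is not how one would handle 8000 points in practice):

```python
import math
from itertools import combinations

def min_dist_sq(points):
    """Smallest squared distance over all (n^2-n)/2 pairs of 2-D points."""
    return min((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in combinations(points, 2))

def exp_to_uniform(x, mean):
    """Map an exponential variate with the given mean to [0,1)."""
    return 1.0 - math.exp(-x / mean)

# Sample no. 5 in the table: d^2 = 1.9751
print(f"{exp_to_uniform(1.9751, 0.995):.6f}")
```

Because the table prints d^2 to only four decimals, the recomputed value can differ from the printed one in the last digit or two.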


	|-------------------------------------------------------------|
	|             THE 3DSPHERES TEST                              |
	|Choose  4000 random points in a cube of edge 1000.  At each  |
	|point, center a sphere large enough to reach the next closest|
	|point. Then the volume of the smallest such sphere is (very  |
	|close to) exponentially distributed with mean 120pi/3.  Thus |
	|the radius cubed is exponential with mean 30. (The mean is   |
	|obtained by extensive simulation).  The 3DSPHERES test gener-|
	|ates 4000 such spheres 20 times.  Each min radius cubed leads|
	|to a uniform variable by means of 1-exp(-r^3/30.), then a    |
	| KSTEST is done on the 20 p-values.                          |
	|-------------------------------------------------------------|

		    The 3DSPHERES test for file random_bits.dat

		sample no	r^3		equiv. uni.
		   1		5.033		0.154460
		   2		28.673		0.615488
		   3		55.192		0.841142
		   4		32.116		0.657175
		   5		9.227		0.264769
		   6		46.907		0.790612
		   7		6.964		0.207162
		   8		115.840		0.978960
		   9		30.363		0.636542
		   10		2.626		0.083814
		   11		14.417		0.381564
		   12		0.647		0.021351
		   13		79.013		0.928193
		   14		12.904		0.349567
		   15		11.930		0.328104
		   16		34.189		0.680066
		   17		3.866		0.120913
		   18		20.836		0.500688
		   19		74.758		0.917249
		   20		25.094		0.566764
	--------------------------------------------------------------
		p-value for KS test on those 20 p-values: 0.999868


	|-------------------------------------------------------------|
	|                 This is the SQUEEZE test                    |
	| Random integers are floated to get uniforms on [0,1). Start-|
	| ing with k=2^31-1=2147483647, it finds j, the number of     |
	| iterations necessary to reduce k to 1, using the reduction  |
	| k=ceiling(k*U), with U provided by floating integers from   |
	| the file being tested.  Such j's are found 100,000 times,   |
	| then counts for the number of times j was <=6,7,...,47,>=48 |
	| are used to provide a chi-square test for cell frequencies. |
	|-------------------------------------------------------------|

			RESULTS OF SQUEEZE TEST FOR random_bits.dat

		    Table of standardized frequency counts
		(obs-exp)/sqrt(exp)  for j=(1,..,6), 7,...,47,(48,...)
		-0.8  	-0.3  	 0.1  	 0.8  	 1.8  	 0.9  
		-0.5  	 0.4  	-0.5  	 1.4  	 0.2  	-0.9  
		 0.6  	 0.5  	 0.5  	-0.2  	-1.8  	-1.2  
		 1.1  	-0.6  	 0.1  	-0.6  	 0.2  	 2.0  
		-0.4  	-0.2  	 1.1  	 0.2  	 0.2  	-0.9  
		 1.8  	-2.2  	-1.5  	 0.2  	 0.5  	-0.7  
		-1.6  	-0.7  	 1.3  	-1.3  	-0.6  	 0.0  
		-1.1  
		Chi-square with 42 degrees of freedom:41.988000
		z-score=-0.001309, p-value=0.471495
	_____________________________________________________________
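
As a consistency check on the SQUEEZE output above: the 43 tabulated cells include negative values, so they appear to be signed standardized residuals, (obs-exp)/sqrt(exp), whose squares sum to the chi-square statistic. A minimal sketch under that assumption, which also recomputes the reported z-score as (chisq - df)/sqrt(2*df):

```python
import math

# The 43 cell values as printed (rounded to one decimal place).
cells = [
    -0.8, -0.3,  0.1,  0.8,  1.8,  0.9,
    -0.5,  0.4, -0.5,  1.4,  0.2, -0.9,
     0.6,  0.5,  0.5, -0.2, -1.8, -1.2,
     1.1, -0.6,  0.1, -0.6,  0.2,  2.0,
    -0.4, -0.2,  1.1,  0.2,  0.2, -0.9,
     1.8, -2.2, -1.5,  0.2,  0.5, -0.7,
    -1.6, -0.7,  1.3, -1.3, -0.6,  0.0,
    -1.1,
]

df = len(cells) - 1                       # 43 cells give 42 degrees of freedom
chi_sq = sum(c * c for c in cells)        # approximate, since cells are rounded
z = (41.988 - df) / math.sqrt(2 * df)     # normal approximation to chi-square

print(f"sum of squares = {chi_sq:.2f} (reported chi-square: 41.988)")
print(f"z-score = {z:.6f}")
```

The sum of squares only approximates the reported statistic because the printed cells are rounded.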


	|-------------------------------------------------------------|
	|            The  OVERLAPPING SUMS test                       |
	|Integers are floated to get a sequence U(1),U(2),... of uni- |
	|form [0,1) variables.  Then overlapping sums,                |
	|  S(1)=U(1)+...+U(100), S(2)=U(2)+...+U(101),... are formed. |
	|The S's are virtually normal with a certain covariance mat-  |
	|rix.  A linear transformation of the S's converts them to a  |
	|sequence of independent standard normals, which are converted|
	|to uniform variables for a KSTEST.                           |
	|-------------------------------------------------------------|

			Results of the OSUM test for random_bits.dat

			Test no			p-value
			  1 			0.036963
			  2 			0.207521
			  3 			0.762572
			  4 			0.807049
			  5 			0.742338
			  6 			0.638971
			  7 			0.243466
			  8 			0.333624
			  9 			0.552425
			  10 			0.040012
	_____________________________________________________________

		p-value for 10 kstests on 100 kstests:0.630801

	|-------------------------------------------------------------|
	|    This is the RUNS test.  It counts runs up, and runs down,|
	|in a sequence of uniform [0,1) variables, obtained by float- |
	|ing the 32-bit integers in the specified file. This example  |
	|shows how runs are counted: .123,.357,.789,.425,.224,.416,.95|
	|contains an up-run of length 3, a down-run of length 2 and an|
	|up-run of (at least) 2, depending on the next values.  The   |
	|covariance matrices for the runs-up and runs-down are well   |
	|known, leading to chisquare tests for quadratic forms in the |
	|weak inverses of the covariance matrices.  Runs are counted  |
	|for sequences of length 10,000.  This is done ten times. Then|
	|another three sets of ten.                                   |
	|-------------------------------------------------------------|

			The RUNS test for file random_bits.dat
		(Up and down runs in a sequence of 10000 numbers)
				Set 1
		 runs up; ks test for 10 p's: 0.660146
		 runs down; ks test for 10 p's: 0.958641
				Set 2
		 runs up; ks test for 10 p's: 0.893421
		 runs down; ks test for 10 p's: 0.899466
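
The counting rule from the worked example in the RUNS description (.123,.357,.789,.425,.224,.416,.95) can be sketched as follows. This only illustrates that example; it is not DIEHARD's actual routine, which goes on to form quadratic forms with the covariance matrices. The function name and the treatment of ties (counted as "down") are assumptions:

```python
def split_runs(xs):
    """Split a sequence into consecutive up-runs and down-runs.

    Each run starts at the element that broke the previous run, as in the
    example from the test description. Ties are treated as "down".
    Returns a list of (direction, length) pairs.
    """
    runs = []
    i, n = 0, len(xs)
    while i < n:
        j = i + 1
        if j >= n:
            # A lone trailing element: its direction is not yet known.
            runs.append(("unknown", 1))
            break
        going_up = xs[j] > xs[i]
        # Extend the run while successive comparisons keep the same direction.
        while j < n and (xs[j] > xs[j - 1]) == going_up:
            j += 1
        runs.append(("up" if going_up else "down", j - i))
        i = j
    return runs

print(split_runs([.123, .357, .789, .425, .224, .416, .95]))
```

For the example sequence this yields an up-run of length 3, a down-run of length 2 and an up-run of length 2, matching the description.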

	|-------------------------------------------------------------|
	|This is the CRAPS TEST. It plays 200,000 craps games, counts |
	|the number of wins and the number of throws necessary to end |
	|each game.  The number of wins should be (very close to) a   |
	|normal with mean 200000p and variance 200000p(1-p), and      |
	|p=244/495.  Throws necessary to complete the game can vary   |
	|from 1 to infinity, but counts for all>21 are lumped with 21.|
	|A chi-square test is made on the no.-of-throws cell counts.  |
	|Each 32-bit integer from the test file provides the value for|
	|the throw of a die, by floating to [0,1), multiplying by 6   |
	|and taking 1 plus the integer part of the result.            |
	|-------------------------------------------------------------|

		RESULTS OF CRAPS TEST FOR random_bits.dat 
	No. of wins:  Observed	Expected
	                 97950        98585.858586
		z-score=-2.844, pvalue=0.99777

	Analysis of Throws-per-Game:

	Throws	Observed	Expected	Chisq	 Sum of (O-E)^2/E
	1	66490		66666.7		0.468		0.468
	2	37922		37654.3		1.903		2.371
	3	27017		26954.7		0.144		2.515
	4	19286		19313.5		0.039		2.554
	5	13849		13851.4		0.000		2.554
	6	9820		9943.5		1.535		4.089
	7	7043		7145.0		1.457		5.546
	8	5099		5139.1		0.312		5.859
	9	3747		3699.9		0.600		6.459
	10	2719		2666.3		1.042		7.501
	11	1867		1923.3		1.650		9.151
	12	1444		1388.7		2.199		11.349
	13	1006		1003.7		0.005		11.355
	14	700		726.1		0.941		12.296
	15	562		525.8		2.487		14.783
	16	373		381.2		0.174		14.957
	17	262		276.5		0.764		15.722
	18	220		200.8		1.830		17.551
	19	169		146.0		3.629		21.180
	20	113		106.2		0.433		21.613
	21	292		287.1		0.083		21.697

	Chisq=  21.70 for 20 degrees of freedom, p= 0.35720

		SUMMARY of craptest on random_bits.dat
	 p-value for no. of wins: 0.997772
	 p-value for throws/game: 0.357199
	_____________________________________________________________
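
The "No. of wins" figures above follow from the formulas in the CRAPS description: mean 200000p and variance 200000p(1-p) with p=244/495. A minimal sketch, assuming the p-value is 1 - Phi(z) (inferred from the printed numbers) and that a 32-bit integer is floated by dividing by 2^32; the function names are illustrative:

```python
import math

GAMES = 200_000
P_WIN = 244 / 495

def normal_cdf(z):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def wins_stats(observed):
    """Return (expected wins, z-score, p-value) for the number of wins."""
    expected = GAMES * P_WIN
    sigma = math.sqrt(GAMES * P_WIN * (1.0 - P_WIN))
    z = (observed - expected) / sigma
    return expected, z, 1.0 - normal_cdf(z)   # assumption: p = 1 - Phi(z)

def die(u32):
    """Map a 32-bit unsigned integer to a die face 1..6, as described."""
    return 1 + int(6 * (u32 / 2**32))

expected, z, p = wins_stats(97950)
print(f"expected={expected:.6f}  z={z:.3f}  p={p:.6f}")
```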


[-- Attachment #3: Type: text/plain, Size: 1 bytes --]


