Message-Id: 
<38585173-A301-4E1D-8718-33DF0DE6DF58@mit.edu>
From: Will Farr
To: caml-list@inria.fr
Content-Type: multipart/mixed; boundary=Apple-Mail-5--506437555
Mime-Version: 1.0 (Apple Message framework v926)
Subject: Ocaml PRNG Passes Diehard Tests
Date: Wed, 16 Jul 2008 10:24:07 +0900
X-Mailer: Apple Mail (2.926)
X-Attachments: name="diehard_test_output.txt"

--Apple-Mail-5--506437555
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit

Hello everyone,

I was curious yesterday to see how good the OCaml PRNG algorithm was.
After a bit of googling, I couldn't find any tests of it with the
"gold standard" of PRNG tests, Diehard (see
http://en.wikipedia.org/wiki/Diehard_tests ; from that site, there is
a link to obtain C and FORTRAN programs which test a binary file of
random bits). So, I ran the Diehard test suite myself, using about
30MB of random bits generated from the stdlib Random module.

Good news: it passed. Attached is the output of the test; basically,
"passing" means that none of the P-values was too close to 0.000000 or
1.000000. See the Diehard NOTES file for more information, and also
the note at the top of the output regarding the significance of large
or small p-values.

I just wanted to send this message so that future googlers like me
yesterday will know that this test has been run, and that the OCaml
PRNG passed.
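For anyone who wants to check the attached tables by hand, here is a minimal Python sketch; it is not part of the original run. It assumes that the p-values printed next to a z-score are the upper-tail normal probability 1 - Phi(z), and that the mean is the 141909.33 printed in the BITSTREAM header. The function names are made up for illustration. The `fails_big` check follows the rule of thumb from the NOTE at the top of the output: only p-values that are 0 or 1 to six places count as a real failure.

```python
import math


def z_score(missing_words: int, mean: float, sigma: float) -> float:
    """Standardize a missing-word count, as in the BITSTREAM/OPSO tables."""
    return (missing_words - mean) / sigma


def upper_tail_p(z: float) -> float:
    """Upper-tail probability 1 - Phi(z) of a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def fails_big(p: float, places: int = 6) -> bool:
    """True when p rounds to exactly 0 or 1 at the given number of places."""
    rounded = round(p, places)
    return rounded == 0.0 or rounded == 1.0


if __name__ == "__main__":
    # BITSTREAM row 1: 142104 missing words, sigma = 428
    # (the attached output lists z = 0.45, p = 0.324613).
    z1 = z_score(142104, 141909.33, 428.0)
    print(f"bitstream 1: z = {z1:.2f}, p = {upper_tail_p(z1):.6f}")

    # OPSO bits 11 to 20: 142759 missing words, sigma = 290
    # (the attached output lists z = 2.9299, p = 0.001695).
    z2 = z_score(142759, 141909.33, 290.0)
    p2 = upper_tail_p(z2)
    print(f"OPSO 11-20:  z = {z2:.4f}, p = {p2:.6f}, fails big: {fails_big(p2)}")
```

Under these assumptions, even the smallest p-value in the OPSO table is nowhere near a six-place 0 or 1, which matches the "it passed" reading above.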
Will

--Apple-Mail-5--506437555
Content-Disposition: attachment; filename=diehard_test_output.txt
Content-Type: text/plain; x-unix-mode=0644; name="diehard_test_output.txt"
Content-Transfer-Encoding: 7bit

 NOTE  Most of the tests in DIEHARD return a p-value, which
 should be uniform on [0,1) if the input file contains truly
 independent random bits.  Those p-values are obtained by
 p=1-F(X), where F is the assumed distribution of the sample
 random variable X---often normal. But that assumed F is often
 just an asymptotic approximation, for which the fit will be
 worst in the tails. Thus you should not be surprised with
 occasional p-values near 0 or 1, such as .0012 or .9983. When
 a bit stream really FAILS BIG, you will get p`s of 0 or 1 to
 six or more places.  By all means, do not, as a Statistician
 might, think that a p < .025 or p> .975 means that the RNG
 has "failed the test at the .05 level".  Such p`s happen
 among the hundreds that DIEHARD produces, even with good RNGs.
 So keep in mind that "p happens"

 Enter the name of the file to be tested.
 This must be a form="unformatted",access="direct" binary
 file of about 10-12 million bytes. Enter file name:

      HERE ARE YOUR CHOICES:
       1   Birthday Spacings
       2   Overlapping Permutations
       3   Ranks of 31x31 and 32x32 matrices
       4   Ranks of 6x8 Matrices
       5   Monkey Tests on 20-bit Words
       6   Monkey Tests OPSO,OQSO,DNA
       7   Count the 1`s in a Stream of Bytes
       8   Count the 1`s in Specific Bytes
       9   Parking Lot Test
      10   Minimum Distance Test
      11   Random Spheres Test
      12   The Sqeeze Test
      13   Overlapping Sums Test
      14   Runs Test
      15   The Craps Test
      16   All of the above

 To choose any particular tests, enter corresponding numbers.
 Enter 16 for all tests. If you want to perform all but a few
 tests, enter corresponding numbers preceded by "-" sign.
 Tests are executed in the order they are entered.

 Enter your choices.

|-------------------------------------------------------------|
|             This is the BIRTHDAY SPACINGS TEST              |
|Choose m birthdays in a "year" of n days.  List the spacings |
|between the birthdays.  Let j be the number of values that   |
|occur more than once in that list, then j is asymptotically  |
|Poisson distributed with mean m^3/(4n).  Experience shows n  |
|must be quite large, say n>=2^18, for comparing the results  |
|to the Poisson distribution with that mean.  This test uses  |
|n=2^24 and m=2^10, so that the underlying distribution for j |
|is taken to be Poisson with lambda=2^30/(2^26)=16. A sample  |
|of 200 j''s is taken, and a chi-square goodness of fit test  |
|provides a p value.  The first test uses bits 1-24 (counting |
|from the left) from integers in the specified file.  Then the|
|file is closed and reopened, then bits 2-25 of the same inte-|
|gers are used to provide birthdays, and so on to bits 9-32.  |
|Each set of bits provides a p-value, and the nine p-values   |
|provide a sample for a KSTEST.                               |
|------------------------------------------------------------ |

        RESULTS OF BIRTHDAY SPACINGS TEST FOR random_bits.dat
 (no_bdays=1024, no_days/yr=2^24, lambda=16.00, sample size=500)

    Bits used       mean      chisqr      p-value
     1 to 24       15.62     13.5401     0.699348
     2 to 25       15.65     24.4252     0.108335
     3 to 26       15.63     18.7935     0.340514
     4 to 27       15.88     20.6086     0.244290
     5 to 28       15.79     14.0140     0.666114
     6 to 29       15.55     13.0174     0.735020
     7 to 30       16.08     25.8071     0.078047
     8 to 31       15.82     17.5427     0.418225
     9 to 32       15.60     29.1235     0.033404

    degree of freedoms is: 17
---------------------------------------------------------------
    p-value for KStest on those 9 p-values: 0.279683

|-------------------------------------------------------------|
|             THE OVERLAPPING 5-PERMUTATION TEST              |
|This is the OPERM5 test.  It looks at a sequence of one mill-|
|ion 32-bit random integers.  Each set of five consecutive    |
|integers can be in one of 120 states, for the 5! possible or-|
|derings of five numbers.  Thus the 5th, 6th, 7th,...numbers |
|each provide a state. As many thousands of state transitions |
|are observed, cumulative counts are made of the number of    |
|occurences of each state.  Then the quadratic form in the    |
|weak inverse of the 120x120 covariance matrix yields a test  |
|equivalent to the likelihood ratio test that the 120 cell    |
|counts came from the specified (asymptotically) normal dis-  |
|tribution with the specified 120x120 covariance matrix (with |
|rank 99).  This version uses 1,000,000 integers, twice.      |
|-------------------------------------------------------------|

    OPERM5 test for file
    (For samples of 1,000,000 consecutive 5-tuples)

    sample 1
    chisquare=112.576986 with df=99; p-value= 0.165806
_______________________________________________________________
    sample 2
    chisquare=112.879031 with df=99; p-value= 0.160951
_______________________________________________________________

|-------------------------------------------------------------|
|This is the BINARY RANK TEST for 31x31 matrices. The leftmost|
|31 bits of 31 random integers from the test sequence are used|
|to form a 31x31 binary matrix over the field {0,1}. The rank |
|is determined. That rank can be from 0 to 31, but ranks< 28  |
|are rare, and their counts are pooled with those for rank 28.|
|Ranks are found for 40,000 such random matrices and a chisqu-|
|are test is performed on counts for ranks 31,30,28 and <=28. |
|-------------------------------------------------------------|

    Rank test for binary matrices (31x31) from random_bits.dat

    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=28         218       211.4       0.205     0.205
    r=29         5035      5134.0       1.909     2.114
    r=30        23321     23103.0       2.056     4.171
    r=31        11426     11551.5       1.364     5.534

    chi-square = 5.534 with df = 3; p-value = 0.137
--------------------------------------------------------------

|-------------------------------------------------------------|
|This is the BINARY RANK TEST for 32x32 matrices. A random 32x|
|32 binary matrix is formed, each row a 32-bit random integer.|
|The rank is determined. That rank can be from 0 to 32, ranks |
|less than 29 are rare, and their counts are pooled with those|
|for rank 29.  Ranks are found for 40,000 such random matrices|
|and a chisquare test is performed on counts for ranks  32,31,|
|30 and <=29.                                                 |
|-------------------------------------------------------------|

    Rank test for binary matrices (32x32) from random_bits.dat

    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=29         211       211.4       0.001     0.001
    r=30         5071      5134.0       0.773     0.774
    r=31        23128     23103.0       0.027     0.801
    r=32        11590     11551.5       0.128     0.929

    chi-square = 0.929 with df = 3; p-value = 0.818
--------------------------------------------------------------

|-------------------------------------------------------------|
|This is the BINARY RANK TEST for 6x8 matrices.  From each of |
|six random 32-bit integers from the generator under test, a  |
|specified byte is chosen, and the resulting six bytes form a |
|6x8 binary matrix whose rank is determined.  That rank can be|
|from 0 to 6, but ranks 0,1,2,3 are rare; their counts are    |
|pooled with those for rank 4. Ranks are found for 100,000    |
|random matrices, and a chi-square test is performed on       |
|counts for ranks 6,5 and (0,...,4) (pooled together).        |
|-------------------------------------------------------------|

    Rank test for binary matrices (6x8) from random_bits.dat

    bits 1 to 8
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          953       944.3       0.080     0.080
    r=5         21627     21743.9       0.628     0.709
    r=6         77420     77311.8       0.151     0.860
    chi-square = 0.860 with df = 2; p-value = 0.650
--------------------------------------------------------------
    bits 2 to 9
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          926       944.3       0.355     0.355
    r=5         21478     21743.9       3.252     3.606
    r=6         77596     77311.8       1.045     4.651
    chi-square = 4.651 with df = 2; p-value = 0.098
--------------------------------------------------------------
    bits 3 to 10
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          936       944.3       0.073     0.073
    r=5         21831     21743.9       0.349     0.422
    r=6         77233     77311.8       0.080     0.502
    chi-square = 0.502 with df = 2; p-value = 0.778
--------------------------------------------------------------
    bits 4 to 11
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          926       944.3       0.355     0.355
    r=5         21645     21743.9       0.450     0.804
    r=6         77429     77311.8       0.178     0.982
    chi-square = 0.982 with df = 2; p-value = 0.612
--------------------------------------------------------------
    bits 5 to 12
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          926       944.3       0.355     0.355
    r=5         21771     21743.9       0.034     0.388
    r=6         77303     77311.8       0.001     0.389
    chi-square = 0.389 with df = 2; p-value = 0.823
--------------------------------------------------------------
    bits 6 to 13
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          978       944.3       1.203     1.203
    r=5         21803     21743.9       0.161     1.363
    r=6         77219     77311.8       0.111     1.475
    chi-square = 1.475 with df = 2; p-value = 0.478
--------------------------------------------------------------
    bits 7 to 14
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          973       944.3       0.872     0.872
    r=5         21707     21743.9       0.063     0.935
    r=6         77320     77311.8       0.001     0.936
    chi-square = 0.936 with df = 2; p-value = 0.626
--------------------------------------------------------------
    bits 8 to 15
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          948       944.3       0.014     0.014
    r=5         21778     21743.9       0.053     0.068
    r=6         77274     77311.8       0.018     0.086
    chi-square = 0.086 with df = 2; p-value = 0.958
--------------------------------------------------------------
    bits 9 to 16
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          979       944.3       1.275     1.275
    r=5         21810     21743.9       0.201     1.476
    r=6         77211     77311.8       0.131     1.607
    chi-square = 1.607 with df = 2; p-value = 0.448
--------------------------------------------------------------
    bits 10 to 17
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          923       944.3       0.480     0.480
    r=5         21836     21743.9       0.390     0.871
    r=6         77241     77311.8       0.065     0.935
    chi-square = 0.935 with df = 2; p-value = 0.626
--------------------------------------------------------------
    bits 11 to 18
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          953       944.3       0.080     0.080
    r=5         21821     21743.9       0.273     0.354
    r=6         77226     77311.8       0.095     0.449
    chi-square = 0.449 with df = 2; p-value = 0.799
--------------------------------------------------------------
    bits 12 to 19
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          991       944.3       2.310     2.310
    r=5         21512     21743.9       2.473     4.783
    r=6         77497     77311.8       0.444     5.226
    chi-square = 5.226 with df = 2; p-value = 0.073
--------------------------------------------------------------
    bits 13 to 20
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          982       944.3       1.505     1.505
    r=5         21617     21743.9       0.741     2.246
    r=6         77401     77311.8       0.103     2.349
    chi-square = 2.349 with df = 2; p-value = 0.309
--------------------------------------------------------------
    bits 14 to 21
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          968       944.3       0.595     0.595
    r=5         21644     21743.9       0.459     1.054
    r=6         77388     77311.8       0.075     1.129
    chi-square = 1.129 with df = 2; p-value = 0.569
--------------------------------------------------------------
    bits 15 to 22
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          954       944.3       0.100     0.100
    r=5         21817     21743.9       0.246     0.345
    r=6         77229     77311.8       0.089     0.434
    chi-square = 0.434 with df = 2; p-value = 0.805
--------------------------------------------------------------
    bits 16 to 23
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          918       944.3       0.732     0.732
    r=5         21890     21743.9       0.982     1.714
    r=6         77192     77311.8       0.186     1.900
    chi-square = 1.900 with df = 2; p-value = 0.387
--------------------------------------------------------------
    bits 17 to 24
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          924       944.3       0.436     0.436
    r=5         21764     21743.9       0.019     0.455
    r=6         77312     77311.8       0.000     0.455
    chi-square = 0.455 with df = 2; p-value = 0.797
--------------------------------------------------------------
    bits 18 to 25
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          912       944.3       1.105     1.105
    r=5         21644     21743.9       0.459     1.564
    r=6         77444     77311.8       0.226     1.790
    chi-square = 1.790 with df = 2; p-value = 0.409
--------------------------------------------------------------
    bits 19 to 26
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          919       944.3       0.678     0.678
    r=5         21679     21743.9       0.194     0.872
    r=6         77402     77311.8       0.105     0.977
    chi-square = 0.977 with df = 2; p-value = 0.614
--------------------------------------------------------------
    bits 20 to 27
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          941       944.3       0.012     0.012
    r=5         21524     21743.9       2.224     2.235
    r=6         77535     77311.8       0.644     2.880
    chi-square = 2.880 with df = 2; p-value = 0.237
--------------------------------------------------------------
    bits 21 to 28
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          909       944.3       1.320     1.320
    r=5         21828     21743.9       0.325     1.645
    r=6         77263     77311.8       0.031     1.676
    chi-square = 1.676 with df = 2; p-value = 0.433
--------------------------------------------------------------
    bits 22 to 29
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          923       944.3       0.480     0.480
    r=5         21640     21743.9       0.496     0.977
    r=6         77437     77311.8       0.203     1.180
    chi-square = 1.180 with df = 2; p-value = 0.554
--------------------------------------------------------------
    bits 23 to 30
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          867       944.3       6.328     6.328
    r=5         21837     21743.9       0.399     6.726
    r=6         77296     77311.8       0.003     6.730
    chi-square = 6.730 with df = 2; p-value = 0.035
--------------------------------------------------------------
    bits 24 to 31
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          893       944.3       2.787     2.787
    r=5         21565     21743.9       1.472     4.259
    r=6         77542     77311.8       0.685     4.944
    chi-square = 4.944 with df = 2; p-value = 0.084
--------------------------------------------------------------
    bits 25 to 32
    RANK     OBSERVED    EXPECTED   (O-E)^2/E       SUM
    r<=4          931       944.3       0.187     0.187
    r=5         21571     21743.9       1.375     1.562
    r=6         77498     77311.8       0.448     2.011
    chi-square = 2.011 with df = 2; p-value = 0.366
--------------------------------------------------------------

    TEST SUMMARY, 25 tests on 100,000 random 6x8 matrices
    These should be 25 uniform [0,1] random variates:

    0.650488    0.097735    0.777957    0.611969    0.823073
    0.478379    0.626327    0.957693    0.447652    0.626444
    0.799012    0.073299    0.309028    0.568671    0.804902
    0.386781    0.796531    0.408635    0.613610    0.236951
    0.432646    0.554418    0.034569    0.084404    0.365932

    The KS test for those 25 supposed UNI's yields
    KS p-value = 0.683614

|-------------------------------------------------------------|
|                     THE BITSTREAM TEST                      |
|The file under test is viewed as a stream of bits. Call them |
|b1,b2,... .  Consider an alphabet with two "letters", 0 and 1|
|and think of the stream of bits as a succession of 20-letter |
|"words", overlapping.  Thus the first word is b1b2...b20, the|
|second is b2b3...b21, and so on.  The bitstream test counts  |
|the number of missing 20-letter (20-bit) words in a string of|
|2^21 overlapping 20-letter words.  There are 2^20 possible 20|
|letter words.  For a truly random string of 2^21+19 bits, the|
|number of missing words j should be (very close to) normally |
|distributed with mean 141,909 and sigma 428.  Thus           |
| (j-141909)/428 should be a standard normal variate (z score)|
|that leads to a uniform [0,1) p value.  The test is repeated |
|twenty times.                                                |
|-------------------------------------------------------------|

    THE OVERLAPPING 20-TUPLES BITSTREAM TEST for random_bits.dat
    (20 bits/word, 2097152 words 20 bitstreams. No. missing words
    should average 141909.33 with sigma=428.00.)
----------------------------------------------------------------
    BITSTREAM test results for random_bits.dat.

    Bitstream   No. missing words   z-score    p-value
         1           142104           0.45    0.324613
         2           142374           1.09    0.138811
         3           141938           0.07    0.473296
         4           141485          -0.99    0.839261
         5           141957           0.11    0.455658
         6           141311          -1.40    0.918939
         7           141881          -0.07    0.526387
         8           141947           0.09    0.464933
         9           141425          -1.13    0.871101
        10           142360           1.05    0.146178
        11           141702          -0.48    0.685955
        12           141642          -0.62    0.733884
        13           141748          -0.38    0.646891
        14           141860          -0.12    0.545879
        15           141920           0.02    0.490055
        16           142069           0.37    0.354552
        17           141131          -1.82    0.965508
        18           141850          -0.14    0.555125
        19           141390          -1.21    0.887509
        20           142132           0.52    0.301442
----------------------------------------------------------------

|-------------------------------------------------------------|
|        OPSO means Overlapping-Pairs-Sparse-Occupancy        |
|The OPSO test considers 2-letter words from an alphabet of   |
|1024 letters.  Each letter is determined by a specified ten  |
|bits from a 32-bit integer in the sequence to be tested. OPSO|
|generates 2^21 (overlapping) 2-letter words (from 2^21+1     |
|"keystrokes") and counts the number of missing words---that  |
|is 2-letter words which do not appear in the entire sequence.|
|That count should be very close to normally distributed with |
|mean 141,909, sigma 290. Thus (missingwrds-141909)/290 should|
|be a standard normal variable. The OPSO test takes 32 bits at|
|a time from the test file and uses a designated set of ten   |
|consecutive bits. It then restarts the file for the next de- |
|signated 10 bits, and so on.                                 |
|------------------------------------------------------------ |

    OPSO test for file random_bits.dat

    Bits used   No. missing words   z-score    p-value
    23 to 32         142250          1.1747   0.120053
    22 to 31         142147          0.8196   0.206236
    21 to 30         141982          0.2506   0.401067
    20 to 29         141833         -0.2632   0.603804
    19 to 28         141874         -0.1218   0.548482
    18 to 27         142149          0.8264   0.204275
    17 to 26         142052          0.4920   0.311372
    16 to 25         142074          0.5678   0.285076
    15 to 24         141864         -0.1563   0.562106
    14 to 23         141626         -0.9770   0.835715
    13 to 22         141925          0.0540   0.478454
    12 to 21         141388         -1.7977   0.963887
    11 to 20         142759          2.9299   0.001695
    10 to 19         141688         -0.7632   0.777330
     9 to 18         141660         -0.8598   0.805039
     8 to 17         141806         -0.3563   0.639196
     7 to 16         142113          0.7023   0.241243
     6 to 15         142030          0.4161   0.338667
     5 to 14         141731         -0.6149   0.730700
     4 to 13         142013          0.3575   0.360365
     3 to 12         141780         -0.4460   0.672189
     2 to 11         141718         -0.6598   0.745296
     1 to 10         142029          0.4127   0.339930
-----------------------------------------------------------------

|------------------------------------------------------------ |
|     OQSO means Overlapping-Quadruples-Sparse-Occupancy      |
|  The test OQSO is similar, except that it considers 4-letter|
|words from an alphabet of 32 letters, each letter determined |
|by a designated string of 5 consecutive bits from the test   |
|file, elements of which are assumed 32-bit random integers.  |
|The mean number of missing words in a sequence of 2^21 four- |
|letter words, (2^21+3 "keystrokes"), is again 141909, with   |
|sigma = 295.  The mean is based on theory; sigma comes from  |
|extensive simulation.                                        |
|------------------------------------------------------------ |

    OQSO test for file random_bits.dat

    Bits used   No. missing words   z-score    p-value
    28 to 32         141502         -1.3808   0.916327
    27 to 31         141787         -0.4147   0.660811
    26 to 30         142163          0.8599   0.194923
    25 to 29         141965          0.1887   0.425159
    24 to 28         142178          0.9107   0.181215
    23 to 27         142309          1.3548   0.087738
    22 to 26         142080          0.5785   0.281449
    21 to 25         142355          1.5107   0.065427
    20 to 24         141852         -0.1943   0.577045
    19 to 23         142272          1.2294   0.109463
    18 to 22         142586          2.2938   0.010901
    17 to 21         142318          1.3853   0.082977
    16 to 20         141781         -0.4350   0.668225
    15 to 19         142096          0.6328   0.263439
    14 to 18         141860         -0.1672   0.566402
    13 to 17         141862         -0.1604   0.563733
    12 to 16         141693         -0.7333   0.768319
    11 to 15         142057          0.5006   0.308335
    10 to 14         141957          0.1616   0.435813
     9 to 13         141684         -0.7638   0.777516
     8 to 12         142417          1.7209   0.042633
     7 to 11         141931          0.0735   0.470721
     6 to 10         141884         -0.0859   0.534213
     5 to  9         141990          0.2735   0.392251
     4 to  8         142251          1.1582   0.123390
     3 to  7         141318         -2.0045   0.977492
     2 to  6         141668         -0.8181   0.793341
     1 to  5         141784         -0.4248   0.664526
-----------------------------------------------------------------

|------------------------------------------------------------ |
|    The DNA test considers an alphabet of 4 letters: C,G,A,T,|
|determined by two designated bits in the sequence of random  |
|integers being tested.  It considers 10-letter words, so that|
|as in OPSO and OQSO, there are 2^20 possible words, and the  |
|mean number of missing words from a string of 2^21 (over-    |
|lapping) 10-letter words (2^21+9 "keystrokes") is 141909.    |
|The standard deviation sigma=339 was determined as for OQSO  |
|by simulation.  (Sigma for OPSO, 290, is the true value (to  |
|three places), not determined by simulation.                 |
|------------------------------------------------------------ |

    DNA test for file random_bits.dat

    Bits used   No. missing words   z-score    p-value
    31 to 32         142042          0.3914   0.347767
    30 to 31         141967          0.1701   0.432459
    29 to 30         141171         -2.1780   0.985296
    28 to 29         141656         -0.7473   0.772555
    27 to 28         142171          0.7719   0.220090
    26 to 27         142144          0.6922   0.244393
    25 to 26         141400         -1.5024   0.933509
    24 to 25         142600          2.0374   0.020806
    23 to 24         142162          0.7453   0.228033
    22 to 23         142389          1.4150   0.078541
    21 to 22         142229          0.9430   0.172846
    20 to 21         141519         -1.1514   0.875219
    19 to 20         142342          1.2763   0.100922
    18 to 19         141621         -0.8505   0.802485
    17 to 18         142090          0.5329   0.297034
    16 to 17         142030          0.3560   0.360936
    15 to 16         141751         -0.4671   0.679768
    14 to 15         142118          0.6155   0.269097
    13 to 14         141816         -0.2753   0.608461
    12 to 13         142293          1.1318   0.128866
    11 to 12         141468         -1.3019   0.903518
    10 to 11         141956          0.1377   0.445251
     9 to 10         142520          1.8014   0.035821
     8 to  9         142066          0.4622   0.321986
     7 to  8         141936          0.0787   0.468646
     6 to  7         142645          2.1701   0.014999
     5 to  6         142123          0.6303   0.264251
     4 to  5         142009          0.2940   0.384374
     3 to  4         142251          1.0079   0.156757
     2 to  3         142397          1.4386   0.075138
     1 to  2         142314          1.1937   0.116294
-----------------------------------------------------------------

|-------------------------------------------------------------|
|    This is the COUNT-THE-1''s TEST on a stream of bytes.    |
|Consider the file under test as a stream of bytes (four per  |
|32 bit integer).  Each byte can contain from 0 to 8 1''s,    |
|with probabilities 1,8,28,56,70,56,28,8,1 over 256.  Now let |
|the stream of bytes provide a string of overlapping  5-letter|
|words, each "letter" taking values A,B,C,D,E. The letters are|
|determined by the number of 1''s in a byte: 0,1,or 2 yield A,|
|3 yields B, 4 yields C, 5 yields D and 6,7 or 8 yield E. Thus|
|we have a monkey at a typewriter hitting five keys with vari-|
|ous probabilities (37,56,70,56,37 over 256).  There are 5^5  |
|possible 5-letter words, and from a string of 256,000 (over- |
|lapping) 5-letter words, counts are made on the frequencies  |
|for each word.  The quadratic form in the weak inverse of    |
|the covariance matrix of the cell counts provides a chisquare|
|test: Q5-Q4, the difference of the naive Pearson sums of     |
|(OBS-EXP)^2/EXP on counts for 5- and 4-letter cell counts.   |
|-------------------------------------------------------------|

    Test result for the byte stream from random_bits.dat
    (Degrees of freedom: 5^4-5^3=2500; sample size: 2560000)

    chisquare    z-score    p-value
      2506.37      0.090   0.464108

|-------------------------------------------------------------|
|     This is the COUNT-THE-1''s TEST for specific bytes.     |
|Consider the file under test as a stream of 32-bit integers. |
|From each integer, a specific byte is chosen , say the left- |
|most: bits 1 to 8. Each byte can contain from 0 to 8 1''s,   |
|with probabilitie 1,8,28,56,70,56,28,8,1 over 256.  Now let  |
|the specified bytes from successive integers provide a string|
|of (overlapping) 5-letter words, each "letter" taking values |
|A,B,C,D,E.  The letters are determined by the number of 1''s,|
|in that byte: 0,1,or 2 ---> A, 3 ---> B, 4 ---> C, 5 ---> D, |
|and 6,7 or 8 ---> E. Thus we have a monkey at a typewriter   |
|hitting five keys with with various probabilities: 37,56,70, |
|56,37 over 256. There are 5^5 possible 5-letter words, and   |
|from a string of 256,000 (overlapping) 5-letter words, counts|
|are made on the frequencies for each word. The quadratic form|
|in the weak inverse of the covariance matrix of the cell     |
|counts provides a chisquare test: Q5-Q4, the difference of   |
|the naive Pearson sums of (OBS-EXP)^2/EXP on counts for 5-   |
|and 4-letter cell counts.                                    |
|-------------------------------------------------------------|

    Test results for specific bytes from random_bits.dat
    (Degrees of freedom: 5^4-5^3=2500; sample size: 256000)

    bits used    chisquare    z-score    p-value
     1 to  8       2523.05      0.326   0.372237
     2 to  9       2579.53      1.125   0.130362
     3 to 10       2530.71      0.434   0.332053
     4 to 11       2503.56      0.050   0.479951
     5 to 12       2400.25     -1.411   0.920835
     6 to 13       2519.21      0.272   0.392949
     7 to 14       2534.78      0.492   0.311395
     8 to 15       2340.97     -2.249   0.987745
     9 to 16       2540.82      0.577   0.281855
    10 to 17       2405.40     -1.338   0.909529
    11 to 18       2490.96     -0.128   0.550855
    12 to 19       2446.31     -0.759   0.776157
    13 to 20       2511.18      0.158   0.437160
    14 to 21       2400.46     -1.408   0.920394
    15 to 22       2512.23      0.173   0.431340
    16 to 23       2569.34      0.981   0.163401
    17 to 24       2465.64     -0.486   0.686513
    18 to 25       2432.04     -0.961   0.831735
    19 to 26       2427.99     -1.018   0.845751
    20 to 27       2575.01      1.061   0.144390
    21 to 28       2543.58      0.616   0.268834
    22 to 29       2497.48     -0.036   0.514194
    23 to 30       2384.28     -1.637   0.949140
    24 to 31       2426.13     -1.045   0.851921
    25 to 32       2582.89      1.172   0.120539

|-------------------------------------------------------------|
|                 THIS IS A PARKING LOT TEST                  |
|In a square of side 100, randomly "park" a car---a circle of |
|radius 1.  Then try to park a 2nd, a 3rd, and so on, each    |
|time parking "by ear".  That is, if an attempt to park a car |
|causes a crash with one already parked, try again at a new   |
|random location. (To avoid path problems, consider parking   |
|helicopters rather than cars.)   Each attempt leads to either|
|a crash or a success, the latter followed by an increment to |
|the list of cars already parked. If we plot n: the number of |
|attempts, versus k: the number successfully parked, we get a |
|curve that should be similar to those provided by a perfect  |
|random number generator.  Theory for the behavior of such a  |
|random curve seems beyond reach, and as graphics displays are|
|not available for this battery of tests, a simple characteriz|
|ation of the random experiment is used: k, the number of cars|
|successfully parked after n=12,000 attempts. Simulation shows|
|that k should average 3523 with sigma 21.9 and is very close |
|to normally distributed.  Thus (k-3523)/21.9 should be a st- |
|andard normal variable, which, converted to a uniform varia- |
|ble, provides input to a KSTEST based on a sample of 10.     |
|-------------------------------------------------------------|

    CDPARK: result of 10 tests on file random_bits.dat
    (Of 12000 tries, the average no. of successes
    should be 3523.0 with sigma=21.9)

    No. succeses    z-score    p-value
        3510        -0.5936   0.723613
        3520        -0.1370   0.554479
        3540         0.7763   0.218799
        3516        -0.3196   0.625377
        3545         1.0046   0.157553
        3518        -0.2283   0.590298
        3511        -0.5479   0.708135
        3535         0.5479   0.291865
        3528         0.2283   0.409702
        3515        -0.3653   0.642555

    Square side=100, avg. no. parked=3523.80 sample std.=11.81
    p-value of the KSTEST for those 10 p-values: 0.475730

|-------------------------------------------------------------|
|                  THE MINIMUM DISTANCE TEST                  |
|It does this 100 times: choose n=8000 random points in a     |
|square of side 10000.  Find d, the minimum distance between  |
|the (n^2-n)/2 pairs of points.  If the points are truly inde-|
|pendent uniform, then d^2, the square of the minimum distance|
|should be (very close to) exponentially distributed with mean|
|.995 . Thus 1-exp(-d^2/.995) should be uniform on [0,1) and |
|a KSTEST on the resulting 100 values serves as a test of uni-|
|formity for random points in the square. Test numbers=0 mod 5|
|are printed but the KSTEST is based on the full set of 100   |
|random choices of 8000 points in the 10000x10000 square.     |
|-------------------------------------------------------------|

    This is the MINIMUM DISTANCE test for file random_bits.dat

    Sample no.     d^2      mean    equiv uni
         5       1.9751   1.2321    0.862617
        10       0.2380   0.9142    0.212766
        15       1.8802   0.8352    0.848868
        20       1.1498   0.7255    0.685132
        25       0.4578   0.6935    0.368761
        30       1.8764   0.8438    0.848298
        35       0.2470   0.8726    0.219838
        40       0.6464   0.8765    0.477772
        45       1.8605   0.8845    0.845853
        50       0.3302   0.8594    0.282419
        55       0.8722   0.8545    0.583784
        60       0.1641   0.8961    0.152021
        65       2.1060   0.9103    0.879553
        70       0.9085   0.9274    0.598724
        75       1.2703   0.9452    0.721038
        80       0.3579   0.9392    0.302146
        85       0.0035   0.9418    0.003463
        90       0.1291   0.9291    0.121687
        95       1.5258   0.9393    0.784220
       100       2.2757   0.9502    0.898440
--------------------------------------------------------------
    Result of KS test on 100 transformed mindist^2's: p-value=0.650683

|-------------------------------------------------------------|
|                     THE 3DSPHERES TEST                      |
|Choose 4000 random points in a cube of edge 1000.  At each   |
|point, center a sphere large enough to reach the next closest|
|point. Then the volume of the smallest such sphere is (very  |
|close to) exponentially distributed with mean 120pi/3.  Thus |
|the radius cubed is exponential with mean 30. (The mean is   |
|obtained by extensive simulation).  The 3DSPHERES test gener-|
|ates 4000 such spheres 20 times.  Each min radius cubed leads|
|to a uniform variable by means of 1-exp(-r^3/30.), then a    |
| KSTEST is done on the 20 p-values.                          |
|-------------------------------------------------------------|

    The 3DSPHERES test for file random_bits.dat

    sample no       r^3     equiv. uni.
         1         5.033     0.154460
         2        28.673     0.615488
         3        55.192     0.841142
         4        32.116     0.657175
         5         9.227     0.264769
         6        46.907     0.790612
         7         6.964     0.207162
         8       115.840     0.978960
         9        30.363     0.636542
        10         2.626     0.083814
        11        14.417     0.381564
        12         0.647     0.021351
        13        79.013     0.928193
        14        12.904     0.349567
        15        11.930     0.328104
        16        34.189     0.680066
        17         3.866     0.120913
        18        20.836     0.500688
        19        74.758     0.917249
        20        25.094     0.566764
--------------------------------------------------------------
    p-value for KS test on those 20 p-values: 0.999868

|-------------------------------------------------------------|
|                  This is the SQUEEZE test                   |
| Random integers are floated to get uniforms on [0,1). Start-|
| ing with k=2^31=2147483647, the test finds j, the number of |
| iterations necessary to reduce k to 1, using the reduction  |
| k=ceiling(k*U), with U provided by floating integers from   |
| the file being tested.  Such j''s are found 100,000 times,  |
| then counts for the number of times j was <=6,7,...,47,>=48 |
| are used to provide a chi-square test for cell frequencies. |
|-------------------------------------------------------------|

    RESULTS OF SQUEEZE TEST FOR random_bits.dat
    Table of standardized frequency counts
    (obs-exp)^2/exp for j=(1,..,6), 7,...,47,(48,...)

    -0.8    -0.3     0.1     0.8     1.8     0.9
    -0.5     0.4    -0.5     1.4     0.2    -0.9
     0.6     0.5     0.5    -0.2    -1.8    -1.2
     1.1    -0.6     0.1    -0.6     0.2     2.0
    -0.4    -0.2     1.1     0.2     0.2    -0.9
     1.8    -2.2    -1.5     0.2     0.5    -0.7
    -1.6    -0.7     1.3    -1.3    -0.6     0.0
    -1.1

    Chi-square with 42 degrees of freedom:41.988000
    z-score=-0.001309, p-value=0.471495
_____________________________________________________________

|-------------------------------------------------------------|
|                  The OVERLAPPING SUMS test                  |
|Integers are floated to get a sequence U(1),U(2),... of uni- |
|form [0,1) variables.  Then overlapping sums,                |
|  S(1)=U(1)+...+U(100), S2=U(2)+...+U(101),... are formed.   |
|The S''s are virtually normal with a certain covariance mat- |
|rix.  A linear transformation of the S''s converts them to a |
|sequence of independent standard normals, which are converted|
|to uniform variables for a KSTEST.                           |
|-------------------------------------------------------------|

    Results of the OSUM test for random_bits.dat

    Test no     p-value
        1      0.036963
        2      0.207521
        3      0.762572
        4      0.807049
        5      0.742338
        6      0.638971
        7      0.243466
        8      0.333624
        9      0.552425
       10      0.040012
_____________________________________________________________
    p-value for 10 kstests on 100 kstests:0.630801

|-------------------------------------------------------------|
|    This is the RUNS test.  It counts runs up, and runs down,|
|in a sequence of uniform [0,1) variables, obtained by float- |
|ing the 32-bit integers in the specified file. This example  |
|shows how runs are counted: .123,.357,.789,.425,.224,.416,.95|
|contains an up-run of length 3, a down-run of length 2 and an|
|up-run of (at least) 2, depending on the next values.  The   |
|covariance matrices for the runs-up and runs-down are well   |
|known, leading to chisquare tests for quadratic forms in the |
|weak inverses of the covariance matrices.  Runs are counted  |
|for sequences of length 10,000.  This is done ten times. Then|
|another three sets of ten.                                   |
|-------------------------------------------------------------|

    The RUNS test for file random_bits.dat
    (Up and down runs in a sequence of 10000 numbers)

    Set 1
      runs up;   ks test for 10 p's: 0.660146
      runs down; ks test for 10 p's: 0.958641
    Set 2
      runs up;   ks test for 10 p's: 0.893421
      runs down; ks test for 10 p's: 0.899466

|-------------------------------------------------------------|
|This the CRAPS TEST.  It plays 200,000 games of craps, counts|
|the number of wins and the number of throws necessary to end |
|each game.  The number of wins should be (very close to) a   |
|normal with mean 200000p and variance 200000p(1-p), and      |
|p=244/495. 
Throws necessary to complete the game can vary | |from 1 to infinity, but counts for all>21 are lumped with 21.| |A chi-square test is made on the no.-of-throws cell counts. | |Each 32-bit integer from the test file provides the value for| |the throw of a die, by floating to [0,1), multiplying by 6 | |and taking 1 plus the integer part of the result. | |-------------------------------------------------------------| RESULTS OF CRAPS TEST FOR random_bits.dat No. of wins: Observed Expected 97950 98585.858586 z-score=-2.844, pvalue=0.99777 Analysis of Throws-per-Game: Throws Observed Expected Chisq Sum of (O-E)^2/E 1 66490 66666.7 0.468 0.468 2 37922 37654.3 1.903 2.371 3 27017 26954.7 0.144 2.515 4 19286 19313.5 0.039 2.554 5 13849 13851.4 0.000 2.554 6 9820 9943.5 1.535 4.089 7 7043 7145.0 1.457 5.546 8 5099 5139.1 0.312 5.859 9 3747 3699.9 0.600 6.459 10 2719 2666.3 1.042 7.501 11 1867 1923.3 1.650 9.151 12 1444 1388.7 2.199 11.349 13 1006 1003.7 0.005 11.355 14 700 726.1 0.941 12.296 15 562 525.8 2.487 14.783 16 373 381.2 0.174 14.957 17 262 276.5 0.764 15.722 18 220 200.8 1.830 17.551 19 169 146.0 3.629 21.180 20 113 106.2 0.433 21.613 21 292 287.1 0.083 21.697 Chisq= 21.70 for 20 degrees of freedom, p= 0.35720 SUMMARY of craptest on random_bits.dat p-value for no. of wins: 0.997772 p-value for throws/game: 0.357199 _____________________________________________________________ --Apple-Mail-5--506437555 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit --Apple-Mail-5--506437555--