From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id WAA09175; Mon, 2 Sep 2002 22:17:13 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id WAA09885 for ; Mon, 2 Sep 2002 22:17:12 +0200 (MET DST) Received: from lobus.fungible.com (adsl-64-161-114-6.dsl.snfc21.pacbell.net [64.161.114.6]) by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id g82KHAD02109 for ; Mon, 2 Sep 2002 22:17:10 +0200 (MET DST) Received: by lobus.fungible.com (Postfix, from userid 1000) id 010407F66; Mon, 2 Sep 2002 13:17:06 -0700 (PDT) Date: Mon, 2 Sep 2002 10:18:13 -0700 From-Tims-Fingers: true To: john.chu@east.sun.com Cc: Caml-list@inria.fr In-reply-to: <15731.25402.232987.3449@gargle.gargle.HOWL> (john.chu@east.sun.com) Subject: Re: [Caml-list] Holding a set of random integers of very wide range? References: <3D70203F.1000106@ozemail.com.au> <15731.25402.232987.3449@gargle.gargle.HOWL> Message-Id: <20020902201706.010407F66@lobus.fungible.com> From: tim@fungible.com (Tim Freeman) Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk >For reasons I tried to explain in a previous draft of this e-mail and >has been cut due to the amount of space it took, I need to generate >multiple permuted lists of integers ranging from 0 to approximately >2^100 (or more, it's a bit open-ended unfortunately). > >Since I only need one value at a time, I can use a lazy list for >this. i.e., the head holds a value and the tail is a suspension of the >state I need to generate the next value. No big deal there. > >The big deal, for me anyways, is that the state I need is tracking the >integers that I've already used (so that I don't generate the same one >twice) given that the range of possible values is so large. If your integers have enough bits, then a list of random numbers is indistinguishable from a random permutation. This assumes you only have time to look at a reasonable number of values from the permutation. Do the math to figure out what "big enough" is. For example, in the past year I designed a scheme that assumed that two files were identical if and only if their 64-bit checksums were identical, and there were only a million or so files, so I did the math and concluded that the chances of a collision were small enough. It worked fine. If you're paranoid then use the MD5 checksums in the Digest module to generate random numbers; otherwise you can use a linear congruential pseudo-random number generator as mentioned by another poster and hope that there's no interesting interaction between the number generation scheme and the other details of your algorithm. >I'm currently using the very low-tech solution of maintaining a list >of ranges of used values. (i.e., when I use a value that would link 2 >different ranges, I go coalesce the list. The worst case would be if >I picked all the odd integers or all the even integers.) > >Given the recent talk of Patricia trees and bit sets, I was wondering >if some variant of either one of those, or some other representation >would be better for what I'm doing? (I figure at the very least I >ought to be using a tree of ranges of used values. I'm currently >using the Big_int module to represent my integers.) If your integers are large enough, then you can ignore the possiblity of a collision and no data structure is needed. Otherwise, is there a reason not to use a hash table? -- Tim Freeman tim@fungible.com GPG public key fingerprint ECDF 46F8 3B80 BB9E 575D 7180 76DF FE00 34B1 5C78 ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners