caml-list - the Caml user's mailing list
help / color / mirror / Atom feed
* [Caml-list] Choosing a random element in a Map or a Set
@ 2021-03-18  9:44 jean-marc.alliot
2021-03-18 10:06 ` François Pottier
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: jean-marc.alliot @ 2021-03-18  9:44 UTC (permalink / raw)
To: caml-list

Dear list,

I am trying to take a random element from a Map or a Set.

Currently, I generate one random int between 1 et Card(map), and I iter
until I reach this element. The solution is in O(n) which is not great...

I was about to code the following solution: take the current Map code
and add a function which follows the balanced binary tree from the root
and takes randomly a left or right turn at each node until we reach a
leaf (we only need to generate one random int and use its binary
representation to choose the left or right direction at each node). This
should be in O(log(n))

Before I start coding like an idiot:
1) Is there another, more intelligent, solution?
2) Is my solution biased? I think it is not, as long as the tree is
properly balanced but...

Thanks

Jean-Marc

• * Re: [Caml-list] Choosing a random element in a Map or a Set
2021-03-18  9:44 [Caml-list] Choosing a random element in a Map or a Set jean-marc.alliot
@ 2021-03-18 10:06 ` François Pottier
2021-03-18 14:20   ` Simon Cruanes
2021-03-18 11:10 ` Nicolas Barnier
` (2 subsequent siblings)
From: François Pottier @ 2021-03-18 10:06 UTC (permalink / raw)
To: jean-marc.alliot, caml-list

Le 18/03/2021 à 10:44, jean-marc.alliot@irit.fr a écrit :
> 1) Is there another, more intelligent, solution?
> 2) Is my solution biased? I think it is not, as long as the tree is
> properly balanced but...

The tree is only approximately balanced, so this approach should be
OK if you are willing to accept some bias, but it is not perfectly
uniform.

If it is important to obtain a uniform distribution, then I would
suggest: 1- picking a random index in the interval [0, n); and 2-
using a data structure that allows efficient indexing. Finger trees
come to mind (see the papers by Hinze and Paterson and by Sozeau),
but there are probably others.

--
François Pottier
francois.pottier@inria.fr
http://cambium.inria.fr/~fpottier/

• * Re: [Caml-list] Choosing a random element in a Map or a Set
2021-03-18  9:44 [Caml-list] Choosing a random element in a Map or a Set jean-marc.alliot
2021-03-18 10:06 ` François Pottier
@ 2021-03-18 11:10 ` Nicolas Barnier
2021-03-18 11:32   ` François Pottier
2021-03-19  1:32   ` Francois Berenger
2021-03-18 12:34 ` Ivan Gotovchits
2021-03-18 12:48 ` Ivan Gotovchits
3 siblings, 2 replies; 9+ messages in thread
From: Nicolas Barnier @ 2021-03-18 11:10 UTC (permalink / raw)
To: caml-list

Hi Jean-Marc,

Le 18/03/2021 à 10:44, jean-marc.alliot@irit.fr a écrit :
> I am trying to take a random element from a Map or a Set.
>
> Currently, I generate one random int between 1 et Card(map), and I iter
> until I reach this element. The solution is in O(n) which is not great...
>
> I was about to code the following solution: take the current Map code
> and add a function which follows the balanced binary tree from the root
> and takes randomly a left or right turn at each node until we reach a
> leaf (we only need to generate one random int and use its binary
> representation to choose the left or right direction at each node). This
> should be in O(log(n))
>
> Before I start coding like an idiot:
> 1) Is there another, more intelligent, solution?

If using a Map or a Set is not compulsory, you can design a data structure
supporting insert, delete, search and random element in constant amortized
time with an extensible array and a hash-table as explained here:

https://www.geeksforgeeks.org/design-a-data-structure-that-supports-insert-delete-search-and-getrandom-in-constant-time/

I don't know if there are any good implementation of (Python-like)
extensible arrays in a widespread OCaml library, but François Pottier
published one as part of another superb data structure (union-find):

https://gitlab.inria.fr/fpottier/unionfind/-/blob/master/src/StoreVector.ml

You may want to adjust the ratio (2 here) by which the array is resized when
necessary.

Hope this helps,

-- Nicolas

• * Re: [Caml-list] Choosing a random element in a Map or a Set
2021-03-18  9:44 [Caml-list] Choosing a random element in a Map or a Set jean-marc.alliot
2021-03-18 10:06 ` François Pottier
2021-03-18 11:10 ` Nicolas Barnier
@ 2021-03-18 12:34 ` Ivan Gotovchits
2021-03-18 12:48 ` Ivan Gotovchits
3 siblings, 0 replies; 9+ messages in thread
From: Ivan Gotovchits @ 2021-03-18 12:34 UTC (permalink / raw)
To: jean-marc.alliot; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 2455 bytes --]

On Thu, Mar 18, 2021 at 5:41 AM <jean-marc.alliot@irit.fr> wrote:

> Dear list,
>
> I am trying to take a random element from a Map or a Set.
>
> Currently, I generate one random int between 1 et Card(map), and I iter
> until I reach this element. The solution is in O(n) which is not great...
>
> I was about to code the following solution: take the current Map code
> and add a function which follows the balanced binary tree from the root
> and takes randomly a left or right turn at each node until we reach a
> leaf (we only need to generate one random int and use its binary
> representation to choose the left or right direction at each node). This
> should be in O(log(n))
>
> Before I start coding like an idiot:
> 1) Is there another, more intelligent, solution?
>

I would create an array of keys and then randomly choose a key from that
array. If you want to be on the functional side, you can shuffle the array
and turn it into a list.
This solution involves some duplication but is easy to implement and is
especially useful if you do not what repetitions in your selection.

An alternative solution that doesn't suffer from an extra memory overhead
would be writing a function that generates a random key (instead of a
random integer). It is not always possible,
especially when the domain of keys is infinite. However, if your set/map is
total (i.e., maps all keys in the domain to values) then it is the perfect
solution.

Finally, as an amalgamation of the above approaches, you can hash-cons your
keys (essentially turning keys into integers). It usually involves an extra
data structure to maintain your index,
but gives you the additional benefits of hash-consing, e.g., faster and
tighter maps and sets.

> 2) Is my solution biased? I think it is not, as long as the tree is
> properly balanced but...
>

OCaml trees are relaxed AVL trees with the maximum difference between trees
height equal to 2 (not 1 as in classical AVL). For small trees, it is quite
a substantial difference, e.g., imagine a tree of 9 elements with the left
tree having a height of 3 and the right of height 1.
The right tree will have only one element and the left will have 7
elements. Therefore, the probability of selecting the right element will be
1/2 vs 1/9. Even for larger trees, it remains the problem as a large tree
will be far from uniform.

>
> Thanks
>
> Jean-Marc
>
>
>

[-- Attachment #2: Type: text/html, Size: 3348 bytes --]

• * Re: [Caml-list] Choosing a random element in a Map or a Set
2021-03-18  9:44 [Caml-list] Choosing a random element in a Map or a Set jean-marc.alliot
` (2 preceding siblings ...)
2021-03-18 12:34 ` Ivan Gotovchits
@ 2021-03-18 12:48 ` Ivan Gotovchits
3 siblings, 0 replies; 9+ messages in thread
From: Ivan Gotovchits @ 2021-03-18 12:48 UTC (permalink / raw)
To: jean-marc.alliot; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1014 bytes --]

On Thu, Mar 18, 2021 at 5:41 AM <jean-marc.alliot@irit.fr> wrote:

> Dear list,
>
> I am trying to take a random element from a Map or a Set.
>
> Currently, I generate one random int between 1 et Card(map), and I iter
> until I reach this element. The solution is in O(n) which is not great...
>
> I was about to code the following solution: take the current Map code
>

As a side note, in case if you decide to pursue this idea. You don't really
need to take the code as you can implement your solution using the public
interface.
For that, you need only these two public functions: `find_first` and
`split`. The `find_first` function will help us to extract the top element
(`let top t = fst@@find_first (fun _ -> true) t`),
and `let destruct t = split (top t) t` will return you the left and the
right subtrees, and the top element.  The `destruct` function will not
allocate or rebalance any trees and conceptually
will just return pointers to the corresponding subtrees. So it is O(1) both
in space and time.

[-- Attachment #2: Type: text/html, Size: 1420 bytes --]