caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Diego Olivier Fernandez Pons <Diego.FERNANDEZ_PONS@etu.upmc.fr>
To: Yamagata Yoriyuki <yoriyuki@mbg.sphere.ne.jp>
Cc: caml-list@inria.fr
Subject: Re : [Caml-list] Set/Map with intervals and/or order-related operations
Date: Fri, 30 May 2003 14:00:00 +0200 (DST)	[thread overview]
Message-ID: <Pine.A41.4.44.0305300954130.3408066-100000@ibm1.cicrp.jussieu.fr> (raw)
In-Reply-To: <20030529.224025.59459731.yoriyuki@mbg.sphere.ne.jp>

    Bonjour,

> Perhaps I will develop my own since I need balancing schema, and a
> Map-like structure (not only a Set-like structure.) based on Diet.
> I only need queries for individual elements, not intervals. So I
> don't think a k-d tree is appropriate. Also I need the data
> structure purely functional, because the cost of copying is not
> negligible. So, even though several people points out the existence
> of C implementations, I will opt an ocaml one. The dom type of
> FaCiLe is essentially a list of intervals. Maybe I can use Baire
> implementations.

I am afraid I haven't yet understood your problem and I will need more
explanations to be able to help you.

There are several points in your message I would like to comment :
- Tree-like data structure balancing
- Diet (discrete interval encoding scheme)

1. Balancing tree-like data structure

You should separate the design of the tree-like data structure and its
balancing.

First write down your unbalanced data structure and get your
algorithms right, only then choose one and implement one of the
multiple balancing schemes available

The trick is very simple (Stephen Adams) : encapsulate the balancing
information in << smart constructors >>.

At the beginning this constructors will be trivial

let makeTree = fun l v r -> N (l, v, r)

and your functions will look like this

let rec insert x = function
  | E -> single x
  | N (l, v, r) ->
     match compare x v with
        | n when n < 0 -> makeTree (insert x l) v r
        | n when n > 0 -> makeTree l v (insert x r)
        | _ -> failwith "already in the data structure"

Once everything goes right, you will add a balancing constructor and
replace your [makeTree] by the balancing constructor [makeBTree].

Sometimes you will find usefull not to recompute all the balancing
information at every node, then you just need to add a cache. This is
of course optional.

...
| E -> injection x
| N (l, v, r, _) -> (* the last int is a cache for the constructor *)
...

Your balanced function will look like this

let rec insert x = function
  | E -> single x
  | N (l, v, r, _) ->
     match compare x v with
        | n when n < 0 -> makeBTree (insert x l) v r
        | n when n > 0 -> makeBTree l v (insert x r)
        | _ -> failwith "already in the data structure"

Conclusion : You do not need to bother with balancing schemes from the
beginning. Once your unbalanced data structure works fine, you can add
them very easely.

Baire contains a few examples :
- unbalancedSet.ml contains unbalanced trees and AVL-trees without
cache (does not change the data type) (0)
- avlSet.ml contains avl sets with cache (1)
- weighBalancedSet.ml contains WB sets with cache (2)
...

(1) and (2) are just a copy-paste of (0) with the described
modifications and (1) and (2) share all but 40 lines of code (namely
the 4 balancing constructors).


2. Discrete interval encoding tree

The idea of diet is to replace successive integers by an interval

{3} {5} {6} {7} {9} -> {3} [5-7] {9}

Then, a Diet is just a tree of intervals and singletons with some
join/separate instructions. Since I do not understand what you mean by
'map-like operations' I don't know if this is appropriate.

{3, 4} {5, 2} {6, 2} {7, 3} {9, 4} -> {3, 4} [5-6, 2] {7, 3} {9, 4}

This will work (I mean with a significant compression rate) only if
you can ensure that consecutive bindings will have very often the same
image.

Or are you binding sigletons and intervals to values ?

{3, 4} {[5-7], 2} {[6-10], 5} {9, 4}

find 5 myDataStructure -> [(5, 2); (5, 5)]

Are the intervals supposed to be disjoint ?

{3, 4} {[5-7], 2} {[8-10], 3}

find 5 myDataStructure -> (5, 2)

Or are you binding keys to multiple values

{3, {4}} {5, {2, 5}} {6, {2, 5}} {7, {2, 5}} {8, 5} {9, {4, 5}} {10,
{5}}

find 5 myDataStructure -> [(5, 2); (5, 5)]

Multidimensional data structure (k-d trees, range trees, cartesian
trees) may be what you are looking for. It depends highly on what you
mean by "set/map with intervals" and "order related operations".

Could you give a few examples of what you expect of such a data
structure ?

        Diego Olivier

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


      reply	other threads:[~2003-05-30 12:00 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-05-25 10:26 Yamagata Yoriyuki
2003-05-25 10:48 ` Eray Ozkural
2003-05-26  7:59   ` Diego Olivier Fernandez Pons
2003-05-26  7:11 ` Diego Olivier Fernandez Pons
2003-05-29 13:40 ` Yamagata Yoriyuki
2003-05-30 12:00   ` Diego Olivier Fernandez Pons [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.A41.4.44.0305300954130.3408066-100000@ibm1.cicrp.jussieu.fr \
    --to=diego.fernandez_pons@etu.upmc.fr \
    --cc=caml-list@inria.fr \
    --cc=yoriyuki@mbg.sphere.ne.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).