caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Re: [Caml-list] Suggestion about balanced trees in stdlib
  2002-05-10 16:59 [Caml-list] Suggestion about balanced trees in stdlib Gerd Stolpmann
@ 2002-05-10 16:56 ` Alexander V.Voinov
  2002-05-10 22:31 ` Stefano Zacchiroli
  2002-05-13  7:23 ` Mattias Waldau
  2 siblings, 0 replies; 10+ messages in thread
From: Alexander V.Voinov @ 2002-05-10 16:56 UTC (permalink / raw)
  To: info; +Cc: caml-list

Hi

From: Gerd Stolpmann <info@gerd-stolpmann.de>
Subject: [Caml-list] Suggestion about balanced trees in stdlib
Date: Fri, 10 May 2002 18:59:54 +0200

> Balanced_tree would have the following operations:
> 
> - all of Set
> - all of Map
> - operations on intervals: "iter_interval", "fold_interval", and "interval"
>   (returning the subset)

Yes, this would be great, together with the rest of the proposal.

Alexander
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Caml-list] Suggestion about balanced trees in stdlib
@ 2002-05-10 16:59 Gerd Stolpmann
  2002-05-10 16:56 ` Alexander V.Voinov
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Gerd Stolpmann @ 2002-05-10 16:59 UTC (permalink / raw)
  To: caml-list

Hello list,

I have always wondered why there are two implementations of balanced trees
in the standard library: Set and Map. The set of operations is different,
but the representation is (almost) identical. Furthermore, neither of the
two modules claims to implement balanced trees in the interface; for example
Set.iter does not specify the order of iteration although the implementation
iterates linearly from the smallest to the largest element.

The idea is to introduce a third module Balanced_tree, defining all of the
operations for the two mentioned modules, but announcing them as balanced
trees. This would have the advantage that the operations could be specified
in a more complete way (Balanced_tree.iter is specified as iterating in an
ascending way), and it would be reasonable to add further operations that
know about the representation.

Of course, Set and Map would recur to that more special module.

Balanced_tree would have the representation of Map, i.e. the elements are
pairs of keys and attached values. To emulate Set, the value () is used.

Balanced_tree would have the following operations:

- all of Set
- all of Map
- operations on intervals: "iter_interval", "fold_interval", and "interval"
  (returning the subset)

The interval operations have an important application: One can program indexes
that can be accessed by coordinates (e.g.: return all objects that are currently
visible in the viewbox).

What do you think about this?

Gerd
-- 
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 45             
64293 Darmstadt     EMail:   gerd@gerd-stolpmann.de
Germany                     
----------------------------------------------------------------------------
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Suggestion about balanced trees in stdlib
  2002-05-10 16:59 [Caml-list] Suggestion about balanced trees in stdlib Gerd Stolpmann
  2002-05-10 16:56 ` Alexander V.Voinov
@ 2002-05-10 22:31 ` Stefano Zacchiroli
  2002-05-10 23:05   ` Alexander V.Voinov
  2002-05-11 15:47   ` Gerd Stolpmann
  2002-05-13  7:23 ` Mattias Waldau
  2 siblings, 2 replies; 10+ messages in thread
From: Stefano Zacchiroli @ 2002-05-10 22:31 UTC (permalink / raw)
  To: caml-list

On Fri, May 10, 2002 at 06:59:54PM +0200, Gerd Stolpmann wrote:
> Of course, Set and Map would recur to that more special module.
> 
> Balanced_tree would have the representation of Map, i.e. the elements are
> pairs of keys and attached values. To emulate Set, the value () is used.

I don't know how the compiler treat unit values wrt optimization or
such, but using an unit in each node to emulate Set seems to me
inefficient for a module of the standard library.

I think, but I haven't thought a lot about it, that the problem can be
solved using an implementation polymorphic in the leaf type hidden to
the use that access Set and Map through functors as usual.

Anyway the idea seems really good.

I also suggest to add different kind of visit (i.e. not only iter) like
{pre,post,in}visit via different functions or specifing the desired
behaviour when using the functor or both.

Cheers.

-- 
Stefano Zacchiroli - undergraduate student of CS @ Univ. Bologna, Italy
zack@cs.unibo.it | ICQ# 33538863 | http://www.cs.unibo.it/~zacchiro
"I know you believe you understood what you think I said, but I am not
sure you realize that what you heard is not what I meant!" -- G.Romney
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Suggestion about balanced trees in stdlib
  2002-05-10 22:31 ` Stefano Zacchiroli
@ 2002-05-10 23:05   ` Alexander V.Voinov
  2002-05-11  7:20     ` Stefano Zacchiroli
  2002-05-11 15:47   ` Gerd Stolpmann
  1 sibling, 1 reply; 10+ messages in thread
From: Alexander V.Voinov @ 2002-05-10 23:05 UTC (permalink / raw)
  To: zack; +Cc: caml-list

Hi

From: Stefano Zacchiroli <zack@cs.unibo.it>
Subject: Re: [Caml-list] Suggestion about balanced trees in stdlib
Date: Sat, 11 May 2002 00:31:10 +0200

> On Fri, May 10, 2002 at 06:59:54PM +0200, Gerd Stolpmann wrote:
> > Of course, Set and Map would recur to that more special module.
> > 
> > Balanced_tree would have the representation of Map, i.e. the elements are
> > pairs of keys and attached values. To emulate Set, the value () is used.
> 
> I don't know how the compiler treat unit values wrt optimization or
> such, but using an unit in each node to emulate Set seems to me
> inefficient for a module of the standard library.

Is 1 (integer one) better? :-)

Alexander
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Suggestion about balanced trees in stdlib
  2002-05-10 23:05   ` Alexander V.Voinov
@ 2002-05-11  7:20     ` Stefano Zacchiroli
  0 siblings, 0 replies; 10+ messages in thread
From: Stefano Zacchiroli @ 2002-05-11  7:20 UTC (permalink / raw)
  To: caml-list

On Fri, May 10, 2002 at 04:05:58PM -0700, Alexander V.Voinov wrote:
> > I don't know how the compiler treat unit values wrt optimization or
> > such, but using an unit in each node to emulate Set seems to me
> > inefficient for a module of the standard library.
> 
> Is 1 (integer one) better? :-)

No, is the same, they are both unboxed integers.

Cheers.

-- 
Stefano Zacchiroli - undergraduate student of CS @ Univ. Bologna, Italy
zack@cs.unibo.it | ICQ# 33538863 | http://www.cs.unibo.it/~zacchiro
"I know you believe you understood what you think I said, but I am not
sure you realize that what you heard is not what I meant!" -- G.Romney
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Suggestion about balanced trees in stdlib
  2002-05-10 22:31 ` Stefano Zacchiroli
  2002-05-10 23:05   ` Alexander V.Voinov
@ 2002-05-11 15:47   ` Gerd Stolpmann
  2002-05-11 17:18     ` Stefano Zacchiroli
  1 sibling, 1 reply; 10+ messages in thread
From: Gerd Stolpmann @ 2002-05-11 15:47 UTC (permalink / raw)
  To: Stefano Zacchiroli; +Cc: caml-list


On 2002.05.11 00:31 Stefano Zacchiroli wrote:
> On Fri, May 10, 2002 at 06:59:54PM +0200, Gerd Stolpmann wrote:
> > Of course, Set and Map would recur to that more special module.
> > 
> > Balanced_tree would have the representation of Map, i.e. the elements are
> > pairs of keys and attached values. To emulate Set, the value () is used.
> 
> I don't know how the compiler treat unit values wrt optimization or
> such, but using an unit in each node to emulate Set seems to me
> inefficient for a module of the standard library.
> 
> I think, but I haven't thought a lot about it, that the problem can be
> solved using an implementation polymorphic in the leaf type hidden to
> the use that access Set and Map through functors as usual.

type 'a map = Empty | Node of 'a map * key * 'a * 'a map * int

type set = Empty | Node of set * elt * set * int

The difference is that every map node has 6 words, and every set node
consists of 5 words. My suggestion is to prefer the map representation,
and define set = unit map, wasting one word per node, and making sets
20% larger.

The other possibility would be to prefer the set representation, e.g.

type 'a baltree = Empty | Node of 'a baltree * 'a elt * 'a baltree * int

and to reduce map and set with some functor tricks to this representation.
Now sets are not larger, but maps require 8 words of memory, because
you have to use an additional pair 'a elt = ('a, map_elt), i.e. maps are
33% larger. I think this is the worse choice.

BTW, if memory consumption matters, the current representation can be still
optimized by removing the balance factor, and encoding it in the variant
tag:

type set = Empty | Node_0 of set * elt * set | Node_l of <same> | Node_r of <same>

Of course, the readability of the code would suffer, but one can argue that 
the overall performance is more important than code beauty for the standard 
library. However, the point is that it is a matter of discussion what is
acceptable in the stdlib, because there are a lot of views to this central
library. I would prefer to pay an increased memory consumption of 20% for
sets, and to get the possibility to use balanced trees directly for that
costs. Maybe other people have different views on this.

> Anyway the idea seems really good.
> 
> I also suggest to add different kind of visit (i.e. not only iter) like
> {pre,post,in}visit via different functions or specifing the desired
> behaviour when using the functor or both.

Why not export functions that access the subtrees directly:
left_subtree, right_subtree, top_element.

Gerd
-- 
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 45             
64293 Darmstadt     EMail:   gerd@gerd-stolpmann.de
Germany                     
----------------------------------------------------------------------------
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Suggestion about balanced trees in stdlib
  2002-05-11 15:47   ` Gerd Stolpmann
@ 2002-05-11 17:18     ` Stefano Zacchiroli
  2002-05-11 17:36       ` Dave Mason
  2002-05-12  0:03       ` polux moon
  0 siblings, 2 replies; 10+ messages in thread
From: Stefano Zacchiroli @ 2002-05-11 17:18 UTC (permalink / raw)
  To: caml-list

On Sat, May 11, 2002 at 05:47:04PM +0200, Gerd Stolpmann wrote:
> type 'a map = Empty | Node of 'a map * key * 'a * 'a map * int
> 
> type set = Empty | Node of set * elt * set * int
> 
> The difference is that every map node has 6 words, and every set node
> consists of 5 words. My suggestion is to prefer the map representation,
> and define set = unit map, wasting one word per node, and making sets
> 20% larger.

Ok, I agree that this is the "clean" way to the goal.

Anyway I think that memory consumption is crucial for the standard
library, this is why I'm proposing a dirty trick for the set/map matter.
Usually I dislike tricks and prefer clean programming, but IMHO a
standard library is somewhat a particular case.

<WARNING_dirty_trick>

Why not use a single source file that implement both set and map using
some campl4 tricks like pa_ifdef module?
You may have two different definitions of tree and a pool of accessor
functions working on a tree and returning single components (hoping that
the compiler inline this kind of functions).
In the set case you doesn't need to use the value component of a tree
so you can avoid to compile the relative accessor function (poor gain)
and you can also compile a tree tuple without a value component.

</WARNING_dirty_trick>

[ Question: gurus, do you thinks this is a feasible solution or I'm
missing something? ]

I know, this is a really dirty trick. Also I don't know if is applicable
because I have never looked at the implementations set.ml and map.ml but
just as an idea it can work.

Pros:
- reduce memory consumption for Set
- no code duplication

Cons:
- objects (.cm{a,o,x...}) duplication
- really dirty!
- last time I looked at it camlp4 pa_ifdef module doesn't permit
  multiple declaration using only one "if" construct so you incurs in
  situations like:
     ifdef FOO then let bar = ...
     ifdef FOO then let quux = ...
     ifdef FOO then let baz = ...

Just a [dirty] idea.
Cheers.

-- 
Stefano Zacchiroli - undergraduate student of CS @ Univ. Bologna, Italy
zack@cs.unibo.it | ICQ# 33538863 | http://www.cs.unibo.it/~zacchiro
"I know you believe you understood what you think I said, but I am not
sure you realize that what you heard is not what I meant!" -- G.Romney
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Suggestion about balanced trees in stdlib
  2002-05-11 17:18     ` Stefano Zacchiroli
@ 2002-05-11 17:36       ` Dave Mason
  2002-05-12  0:03       ` polux moon
  1 sibling, 0 replies; 10+ messages in thread
From: Dave Mason @ 2002-05-11 17:36 UTC (permalink / raw)
  To: caml-list

A less-dirty idea:

Sorry, I scanned Gerd's comment and deleted it.  I agree unifying Set
and Map would be good, but don't like the idea of *increasing* the
size of an existing module's data structure.  And I'm not crazy about
Stefano's dirty trick.  :-)

Gerd suggested that the int weighting factor could be turned into a
constructor.  Doing that would save the word that making a Set be a
unit Map would cost, so Set would be the same size as now, Map would
be smaller, and they'd be integrated.  (Yes, you could make Set yet
smaller, but this way that complexity (of constructors) is only in one
place.

You could also save that extra word, and make it a little faster with
the ugly trick of using recursive records, where a record that points
to itself is the sentinel for Empty... but then the = operation
becomes non-terminating, so this isn't something that I think is worth
it in a stdlib kind of module.

Sorry if I missed something, and this is senseless mumbling...

../Dave
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Suggestion about balanced trees in stdlib
  2002-05-11 17:18     ` Stefano Zacchiroli
  2002-05-11 17:36       ` Dave Mason
@ 2002-05-12  0:03       ` polux moon
  1 sibling, 0 replies; 10+ messages in thread
From: polux moon @ 2002-05-12  0:03 UTC (permalink / raw)
  To: caml-list

In a tree, there are many leaf 
> type set = Empty | Node of set * elt * set * int
Why not add   Leaf of elt 
Instead of using Node(Empty,some value,Empty,0) u can use Leaf(some value).
It is clear and just add some code.
U win 3 word per leaf. (in a tree the leaf are about the half of the nodes)

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [Caml-list] Suggestion about balanced trees in stdlib
  2002-05-10 16:59 [Caml-list] Suggestion about balanced trees in stdlib Gerd Stolpmann
  2002-05-10 16:56 ` Alexander V.Voinov
  2002-05-10 22:31 ` Stefano Zacchiroli
@ 2002-05-13  7:23 ` Mattias Waldau
  2 siblings, 0 replies; 10+ messages in thread
From: Mattias Waldau @ 2002-05-13  7:23 UTC (permalink / raw)
  To: 'Gerd Stolpmann', caml-list

> - operations on intervals: "iter_interval", "fold_interval", 
> and "interval"

In order to solve the mentioned operations, I usually use Splay-trees,
which is part of the CDK. They have the following operations.

    val floor: 'a t -> key -> key * 'a
	(* [floor s c] returns the greatest element with [key <= c] *)
    val ceil: 'a t -> key -> key * 'a
	(* [ceil s c] returns the smallest element with [key >= c] *)
    val prev: 'a t -> key -> key * 'a
	(* [prev s c] returns the greatest element with [key < c] *)
    val next: 'a t -> key -> key * 'a
	(* [next s c] returns the smallest element with [key > c] *)

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2002-05-13  7:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-05-10 16:59 [Caml-list] Suggestion about balanced trees in stdlib Gerd Stolpmann
2002-05-10 16:56 ` Alexander V.Voinov
2002-05-10 22:31 ` Stefano Zacchiroli
2002-05-10 23:05   ` Alexander V.Voinov
2002-05-11  7:20     ` Stefano Zacchiroli
2002-05-11 15:47   ` Gerd Stolpmann
2002-05-11 17:18     ` Stefano Zacchiroli
2002-05-11 17:36       ` Dave Mason
2002-05-12  0:03       ` polux moon
2002-05-13  7:23 ` Mattias Waldau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).