From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <furr@cs.umd.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.4 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE 
	autolearn=disabled version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38])
	by yquem.inria.fr (Postfix) with ESMTP id 0BBD2BC6B
	for <caml-list@yquem.inria.fr>; Mon,  4 Jun 2007 17:33:36 +0200 (CEST)
Received: from dispatch.cs.umd.edu (dispatch.cs.umd.edu [128.8.128.60])
	by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l54FXYKB012091
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL)
	for <caml-list@yquem.inria.fr>; Mon, 4 Jun 2007 17:33:35 +0200
Received: from [192.168.0.2] (c-69-243-63-39.hsd1.md.comcast.net [69.243.63.39])
	(authenticated bits=0)
	by dispatch.cs.umd.edu (8.13.1/8.12.5) with ESMTP id l54FXODP030709
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 4 Jun 2007 11:33:25 -0400
Message-ID: <466430C4.7020806@cs.umd.edu>
Date: Mon, 04 Jun 2007 11:33:24 -0400
From: Mike Furr <furr@cs.umd.edu>
Reply-To: osp2007-ocaml-reins@googlegroups.com
User-Agent: Mozilla-Thunderbird 2.0.0.0 (X11/20070521)
MIME-Version: 1.0
To: caml-list@yquem.inria.fr
Cc: osp2007-ocaml-reins@googlegroups.com
Subject: Re: [Caml-list] Comparison of OCaml and MLton for numerics
References: <5195a210705302250u6a9e5adey4ed857480f9e5cd8@mail.gmail.com>	<200706011813.48515.jon@ffconsultancy.com>	<46641BC1.9090305@cs.umd.edu> <200706041539.03286.jon@ffconsultancy.com>
In-Reply-To: <200706041539.03286.jon@ffconsultancy.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-CSD-MailScanner-Information: Please email staff@cs.umd.edu for more information
X-CSD-MailScanner: Found to be clean
X-CSD-MailScanner-SpamCheck: not spam, SpamAssassin (not cached,
	score=-4.399, required 5, autolearn=not spam, ALL_TRUSTED -1.80,
	BAYES_00 -2.60)
X-CSD-MailScanner-From: furr@cs.umd.edu
X-Miltered: at discorde with ID 466430CE.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)!
X-Spam: no; 0.00; ocaml:01 ocaml:01 filliatre:01 okasaki:01 markus:01 mottl:01 stdlib:01 iter:01 posts:01 semantics:01 read-only:01 indirection:01 functors:01 polymorphism:01 trivial:01 

   For now,  I just want to get Jon Harrop wrote:
> On Monday 04 June 2007 15:03:45 Mike Furr wrote:
>> My OCaml summer project to build an improved data structure library
> 
> This is an absolutely incredible project and I am eagerly awaiting your 
> results. As I am sure you already know, Jean-Christophe Filliatre, Chris 
> Okasaki and Markus Mottl have done fantastic work in this area already. I 
> would be very happy to see this collated. Diego Fernandez also did some good 
> work with Baire but it seems to have evaporated.

Yes, their work is a major influence of the project.

> I'll gladly submit my stdlib2 if you're interested. Its main benefit is 
> autogenerated functions (like iter, map, fold) that act upon tuples of 
> different arities:

Great, I would love to see what you've done!  Also, if you interested, 
you can subscribe to the mailing list for the project (no posts yet as 
I'm just getting started, but perhaps we should move our conversation 
there... CC'd).

http://groups.google.com/group/osp2007-ocaml-reins

> You might also consider lazy rebalancing, as rebalancing is both expensive and 
> often unnecessary.

Initially, I plan on supporting lazy rebalancing via the use of zippers. 
   This will also allow splay-like find semantics for arbitrary tree 
structures (although not quite as efficient as a native splay tree which 
of course will also be available).

> A functional array implemented as a balanced binary tree needs the number of 
> elements (size) of a sub tree just to index into the tree. So that read-only 
> operation must indirect unnecessarily using your approach. However, the 
> indirection is a much smaller overhead compared to the cost of functors or 
> polymorphism when the comparison functions are trivial (which they normally 
> are) using the current approach.

Indeed, this example may require a custom implementation to be 
efficient.  However, I have lots to do for the summer and will likely 
leave such performance tweaking until the end of the project.

> My feeling is that the use of functors in Set and Map are more of a hindrance 
> than a benefit. The ubiquity of Hashtbl is testament to this.

I'm not exactly sure what you mean by this.  Are you suggesting the 
performance is a hindrance or that the need for a separate 'module M = 
SMap.Make(...)' line is annoying?  One goal of my library will be that 
it will allow you to change data structures very easily.  For instance 
you might write the following while initially developing your code

   module MyMap = Poly.Map

This map would use the standard polymorphic compare function with keys 
of type 'a and values of type 'b (for some a,b).  Later, once your code 
has matured, you might know your keys are always of type (int*int) list.
Then you would simply change the module def to:

  module MyMap = Mono.AVLMap(List(Pair(Int,Int)))

and now you have a specific data structure which takes advantage of the 
faster monomorphic compare function.  The library will include modules 
for all of the base types (int, float, etc...) as well as hoisting 
common type constructors into functors (such as List() above). 
Therefore, you don't have to write any annoying boilerplate code to use 
the functors.

As far as performance goes, I plan on using functors where it makes the 
code most elegant and easy to use.  With all of the mention of 
ocamldefun on this mailing list, I imagine it won't be too long before 
we have version for ocaml 3.10 (or else I may write it myself!)

>> However, this has the huge benefit of allowing me to only write *one*
>> version of find, min/max, fold, iter, etc. for all of the trees I
>> implement, which in and of itself is a big win. :)
> 
> Perhaps we should consider the alternative approach where those functions are 
> higher-order functions over the necessary low-level tree functions. This 
> would only need support for inlining to recover optimal performance and it 
> also lets you write the core functions only once.

Perhaps.  This would require passing around a dictionary of functions 
and I'm not sure the resulting code would be as clean or any faster than 
functors.  Another way would be to build them on top of the libraries 
zipper-based iterators (but this will likely be less efficient).  I plan 
on doing this anyway to support arbitrary traversal strategies.  Again, 
perhaps at the end of the summer I can look into the performance 
characteristics of these ideas.

Cheers,
-Mike