From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 0BBD2BC6B for ; Mon, 4 Jun 2007 17:33:36 +0200 (CEST) Received: from dispatch.cs.umd.edu (dispatch.cs.umd.edu [128.8.128.60]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l54FXYKB012091 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL) for ; Mon, 4 Jun 2007 17:33:35 +0200 Received: from [192.168.0.2] (c-69-243-63-39.hsd1.md.comcast.net [69.243.63.39]) (authenticated bits=0) by dispatch.cs.umd.edu (8.13.1/8.12.5) with ESMTP id l54FXODP030709 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 4 Jun 2007 11:33:25 -0400 Message-ID: <466430C4.7020806@cs.umd.edu> Date: Mon, 04 Jun 2007 11:33:24 -0400 From: Mike Furr Reply-To: osp2007-ocaml-reins@googlegroups.com User-Agent: Mozilla-Thunderbird 2.0.0.0 (X11/20070521) MIME-Version: 1.0 To: caml-list@yquem.inria.fr Cc: osp2007-ocaml-reins@googlegroups.com Subject: Re: [Caml-list] Comparison of OCaml and MLton for numerics References: <5195a210705302250u6a9e5adey4ed857480f9e5cd8@mail.gmail.com> <200706011813.48515.jon@ffconsultancy.com> <46641BC1.9090305@cs.umd.edu> <200706041539.03286.jon@ffconsultancy.com> In-Reply-To: <200706041539.03286.jon@ffconsultancy.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-CSD-MailScanner-Information: Please email staff@cs.umd.edu for more information X-CSD-MailScanner: Found to be clean X-CSD-MailScanner-SpamCheck: not spam, SpamAssassin (not cached, score=-4.399, required 5, autolearn=not spam, ALL_TRUSTED -1.80, BAYES_00 -2.60) X-CSD-MailScanner-From: furr@cs.umd.edu X-Miltered: at discorde with ID 466430CE.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; ocaml:01 ocaml:01 filliatre:01 okasaki:01 markus:01 mottl:01 stdlib:01 iter:01 posts:01 semantics:01 read-only:01 indirection:01 functors:01 polymorphism:01 trivial:01 For now, I just want to get Jon Harrop wrote: > On Monday 04 June 2007 15:03:45 Mike Furr wrote: >> My OCaml summer project to build an improved data structure library > > This is an absolutely incredible project and I am eagerly awaiting your > results. As I am sure you already know, Jean-Christophe Filliatre, Chris > Okasaki and Markus Mottl have done fantastic work in this area already. I > would be very happy to see this collated. Diego Fernandez also did some good > work with Baire but it seems to have evaporated. Yes, their work is a major influence of the project. > I'll gladly submit my stdlib2 if you're interested. Its main benefit is > autogenerated functions (like iter, map, fold) that act upon tuples of > different arities: Great, I would love to see what you've done! Also, if you interested, you can subscribe to the mailing list for the project (no posts yet as I'm just getting started, but perhaps we should move our conversation there... CC'd). http://groups.google.com/group/osp2007-ocaml-reins > You might also consider lazy rebalancing, as rebalancing is both expensive and > often unnecessary. Initially, I plan on supporting lazy rebalancing via the use of zippers. This will also allow splay-like find semantics for arbitrary tree structures (although not quite as efficient as a native splay tree which of course will also be available). > A functional array implemented as a balanced binary tree needs the number of > elements (size) of a sub tree just to index into the tree. So that read-only > operation must indirect unnecessarily using your approach. However, the > indirection is a much smaller overhead compared to the cost of functors or > polymorphism when the comparison functions are trivial (which they normally > are) using the current approach. Indeed, this example may require a custom implementation to be efficient. However, I have lots to do for the summer and will likely leave such performance tweaking until the end of the project. > My feeling is that the use of functors in Set and Map are more of a hindrance > than a benefit. The ubiquity of Hashtbl is testament to this. I'm not exactly sure what you mean by this. Are you suggesting the performance is a hindrance or that the need for a separate 'module M = SMap.Make(...)' line is annoying? One goal of my library will be that it will allow you to change data structures very easily. For instance you might write the following while initially developing your code module MyMap = Poly.Map This map would use the standard polymorphic compare function with keys of type 'a and values of type 'b (for some a,b). Later, once your code has matured, you might know your keys are always of type (int*int) list. Then you would simply change the module def to: module MyMap = Mono.AVLMap(List(Pair(Int,Int))) and now you have a specific data structure which takes advantage of the faster monomorphic compare function. The library will include modules for all of the base types (int, float, etc...) as well as hoisting common type constructors into functors (such as List() above). Therefore, you don't have to write any annoying boilerplate code to use the functors. As far as performance goes, I plan on using functors where it makes the code most elegant and easy to use. With all of the mention of ocamldefun on this mailing list, I imagine it won't be too long before we have version for ocaml 3.10 (or else I may write it myself!) >> However, this has the huge benefit of allowing me to only write *one* >> version of find, min/max, fold, iter, etc. for all of the trees I >> implement, which in and of itself is a big win. :) > > Perhaps we should consider the alternative approach where those functions are > higher-order functions over the necessary low-level tree functions. This > would only need support for inlining to recover optimal performance and it > also lets you write the core functions only once. Perhaps. This would require passing around a dictionary of functions and I'm not sure the resulting code would be as clean or any faster than functors. Another way would be to build them on top of the libraries zipper-based iterators (but this will likely be less efficient). I plan on doing this anyway to support arbitrary traversal strategies. Again, perhaps at the end of the summer I can look into the performance characteristics of these ideas. Cheers, -Mike