From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.5 required=5.0 tests=DNS_FROM_RFC_ABUSE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id B7585BC0A for ; Mon, 4 Jun 2007 16:04:06 +0200 (CEST) Received: from consign.cs.umd.edu (consign.cs.umd.edu [128.8.128.61]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l54E44XA014764 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL) for ; Mon, 4 Jun 2007 16:04:06 +0200 Received: from [192.168.0.2] (c-69-243-63-39.hsd1.md.comcast.net [69.243.63.39]) (authenticated bits=0) by consign.cs.umd.edu (8.13.1/8.12.5) with ESMTP id l54E3krh020382 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Mon, 4 Jun 2007 10:03:50 -0400 Message-ID: <46641BC1.9090305@cs.umd.edu> Date: Mon, 04 Jun 2007 10:03:45 -0400 From: Mike Furr User-Agent: Mozilla-Thunderbird 2.0.0.0 (X11/20070521) MIME-Version: 1.0 To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Comparison of OCaml and MLton for numerics References: <5195a210705302250u6a9e5adey4ed857480f9e5cd8@mail.gmail.com> <3d13dcfc0706010449k53f1c364gfd4db47c7c258725@mail.gmail.com> <200706011813.48515.jon@ffconsultancy.com> In-Reply-To: <200706011813.48515.jon@ffconsultancy.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-CSD-MailScanner-Information: Please email staff@cs.umd.edu for more information X-CSD-MailScanner: Found to be clean X-CSD-MailScanner-SpamCheck: not spam, SpamAssassin (not cached, score=-4.32, required 5, autolearn=not spam, ALL_TRUSTED -1.80, AWL 0.08, BAYES_00 -2.60) X-CSD-MailScanner-From: furr@cs.umd.edu X-Miltered: at concorde with ID 46641BD4.002 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; ocaml:01 stdlib:01 functorize:01 stdlib:01 ocaml:01 speedup:01 abstractions:01 markus:01 flatten:01 node:01 node:01 indirection:01 mixin:01 indirection:01 read-only:01 Jon Harrop wrote: > Actually, while we're here. I've long thought that the stdlib should provide > abstract implementations of concrete data structures like RB trees and AVL > trees, and functorize the Set and Map modules over the tree type they use. > This would let people add new abstract data structures (I like purely > functional sequences based on AVL trees) built upon solid concrete data > structures from the stdlib rather than cutting and pasting code (one of the > more embarassing OCaml FAQs). My OCaml summer project to build an improved data structure library will (hopefully) address many of the issues you have raised. Although I'm just getting started, I have already started using your suggestion of including a 1-element constructor for my trees as it does indeed seem to give a noticeable speedup. > Making this feasible by optimizing away the abstractions requires more than > just defunctorizing though. You need to partially specialize by type, as > Markus says. You also need to do whole-program transformations to flatten > data structures. For example, a set would require: > > Node of 'a t * 'a * 'a t * height > > and a sequence would require: > > Node of 'a t * 'a * 'a t * height * size > > So a generic OCaml solution would add an indirection to the metadata that > could be flattened out. Indeed this would be preferable. Right now, my tree implementations include a generic tree mixin with the type constructor: Node of ('data,'inv) t * 'data * ('data,'inv) t * 'inv forcing all invariant information into the last cell. Implementations that require more than a single value here must use an extra level of indirection (so in your example, 'inv = height * size). I don't think this will be too bad for performance since these values only need to be retrieved for re-balancing leaving read-only operations unaffected. However, this has the huge benefit of allowing me to only write *one* version of find, min/max, fold, iter, etc. for all of the trees I implement, which in and of itself is a big win. :) -Mike