From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by sympa.inria.fr (Postfix) with ESMTPS id 592187EE73 for ; Mon, 5 Nov 2012 11:12:55 +0100 (CET) Received-SPF: None (mail1-smtp-roc.national.inria.fr: no sender authenticity information available from domain of gabriel.scherer@gmail.com) identity=pra; client-ip=209.85.223.182; receiver=mail1-smtp-roc.national.inria.fr; envelope-from="gabriel.scherer@gmail.com"; x-sender="gabriel.scherer@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail1-smtp-roc.national.inria.fr: domain of gabriel.scherer@gmail.com designates 209.85.223.182 as permitted sender) identity=mailfrom; client-ip=209.85.223.182; receiver=mail1-smtp-roc.national.inria.fr; envelope-from="gabriel.scherer@gmail.com"; x-sender="gabriel.scherer@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail1-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-ie0-f182.google.com) identity=helo; client-ip=209.85.223.182; receiver=mail1-smtp-roc.national.inria.fr; envelope-from="gabriel.scherer@gmail.com"; x-sender="postmaster@mail-ie0-f182.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: As4BAKOQl1DRVd+2m2dsb2JhbABEwzoIIwEBAQEBCAkLCRQngh4BAQUnGQEbCRQBAwwGBQsNLiEBAREBBQEcBhMIh28BAw8Lm2CMMYJ3hCUKGScNWYh1AQUMiw1oBYY3A5QmgVWBHIoXgzAWKYQS X-IronPort-AV: E=Sophos;i="4.80,714,1344204000"; d="scan'208";a="180155340" Received: from mail-ie0-f182.google.com ([209.85.223.182]) by mail1-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 05 Nov 2012 11:12:54 +0100 Received: by mail-ie0-f182.google.com with SMTP id k10so13709882iea.27 for ; Mon, 05 Nov 2012 02:12:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:content-transfer-encoding; bh=qMZLtTV7v/+u85p3XhrRKhGwrIT7l2DTmd8a9N5Byfc=; b=kWUPIbTgcpRt8z/1tUxc+VSpyhaiBxEnp4NuovKTcMGc2WvSP004SjUlknrZYfoTpt 7keaaqSX52HkRGVL3QFjj3tTHPhANqrbaHHAKUFPL6ufNpEaqTNCkj6gbnK/mPObetIl Yh/AS9nGBp99azfVHwj6jvw1bEE6HkHFnrwtzVYgyvCENkiAU6ngSmlhgGPCazPPspvv VMRYWhVbkhNaGk7yNZ2vyoeBKLpi8hGXY4BU4yfZlxJmnvaDTV6jB/tYTBKuV+/DrE4C blceJ0KRiqT6RXy0gNbiC8utLO2CikeTAADkYnmmNEUsJkaG+a58rPpkzR5VkZWCTXgO RIWw== Received: by 10.50.1.200 with SMTP id 8mr8844598igo.51.1352110373239; Mon, 05 Nov 2012 02:12:53 -0800 (PST) MIME-Version: 1.0 Received: by 10.50.65.9 with HTTP; Mon, 5 Nov 2012 02:12:11 -0800 (PST) In-Reply-To: <9da48315-639b-4de2-a3d5-f86685602425@googlegroups.com> References: <9da48315-639b-4de2-a3d5-f86685602425@googlegroups.com> From: Gabriel Scherer Date: Mon, 5 Nov 2012 11:12:11 +0100 Message-ID: To: Radu Grigore Cc: fa.caml@googlegroups.com, Caml List Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: [Caml-list] Writing the function Set.map using first-class modules and 4.00 inference You haven't tested the bad case, where the list is nearly sorted but not exactly -- random lists are a good-case behavior as unsortedness is detected very early. I don't expect the performances to be that bad ("len" avoids any allocation). Another problem is that you call the user-defined comparison more than necessary, which can be problematic if it is costly -- you can solve that by rather using of_sorted_list to build the partial result list before raising Unsolved rather than rebuilding it from the ground up. I have played with such dynamic algorithm selection in the Map implementation of Batteries=B9, but I'm not really satisfied with it: you end up making guesses about the user situation which are hard to evaluate (we have no real-world-looking data to measure against) and pay costs that a demanding user could request to skip by having of_sorted_list and of_list exposed directly, and making the choice herself. =B9: the use case was similar: there is an implementation of non-functorized maps inherited from Extlib that some user find convenient to use, where maps are records carrying their own comparison operation, but then the implementation of binary functions such as union/diff/intersect and merge are very different depending on whether the two comparison operations provided are compatible or not. https://github.com/ocaml-batteries-team/batteries-included/blob/master/sr= c/batMap.ml#L571 A note on API design: the type abstraction of Set and Map are inconvenient when users want to implement better performing operations on top of the balanced tree model. My personal opinion is that we should preserve the abstraction of Set and Map, reject all operations that rely on a representation assumption of sorted balanced trees (unfortunately some of them, such as split, have already crept in), and provide as a different module a "purely algorithmic" transparent definition of balanced binary trees that users could extend easily and on which they would build their own data structures. I insist on having both abstract purpose-oriented modules and concrete algorithm-oriented data structures, and not mixing them as is currently done. Not relying on ordered representations would be useful if we considered moving to a different representation for sets and maps. Thibault Suzanne has been experimenting with Hash Array Mapped Tries (HAMT) in OCaml and the results are relatively promising, but this implementation would have a hard time transparently replacing stdlib's Set and Map because of their exposed balanced implementation details. https://gitorious.org/ocaml-hamt On Mon, Nov 5, 2012 at 9:05 AM, Radu Grigore wrote: > Just to clarify what I meant, see the attached (hackish) benchmarks and p= atch. > > -- begin benchmarks.ml -- > (* Times (in seconds) > > 1000000 random lists of length 10 > of_list1 9.23 > of_list2 9.80 > S.of_list 10.15 (+3.6% compared to of_list2) > > 1 random list of length 10000000 > of_list1 Stack_overflow > of_list2 102.39 > S.of_list 104.15 (+1.8% compared to of_list2) > > 1000000 sorted lists of length 10 > of_list1 11.14 > of_list2 12.51 > S.of_list 4.53 (-63% compared to of_list2) > > 1 sorted list of length 10000000 > of_list1 Stack_overflow > of_list2 71.91 > S.of_list 6.73 (-90% compared to of_list2) > > *) > open Printf > > module S =3D Set.Make (struct type t =3D int let compare =3D compare end) > > let of_list1 xs =3D List.fold_right S.add xs S.empty > let of_list2 xs =3D List.fold_left (fun x y -> S.add y x ) S.empty xs > > let mk_list n =3D > let xs =3D ref [] in > for i =3D n downto 1 do xs :=3D i (*Random.int max_int*) :: !xs done; > !xs > > let time s f xss =3D > let a =3D Sys.time () in > List.iter (fun xs -> ignore (f xs)) xss; > let b =3D Sys.time () in > printf "%s %.2f\n" s (b -. a) > > let _ =3D > Random.init 0; > let n =3D 1 in > let xss =3D ref [] in > for i =3D 1 to n do xss :=3D mk_list 10000000 :: !xss done; > (* time "of_list1" of_list1 !xss; *) > time "of_list2" of_list2 !xss; > time "S.of_list" S.of_list !xss > -- end benchmarks.ml -- > > > > --begin of_list.patch-- > Index: stdlib/moreLabels.mli > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- stdlib/moreLabels.mli (revision 13061) > +++ stdlib/moreLabels.mli (working copy) > @@ -155,6 +155,7 @@ > val partition : f:(elt -> bool) -> t -> t * t > val cardinal : t -> int > val elements : t -> elt list > + val of_list : elt list -> t > val min_elt : t -> elt > val max_elt : t -> elt > val choose : t -> elt > Index: stdlib/set.ml > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- stdlib/set.ml (revision 13061) > +++ stdlib/set.ml (working copy) > @@ -43,6 +43,7 @@ > val partition: (elt -> bool) -> t -> t * t > val cardinal: t -> int > val elements: t -> elt list > + val of_list: elt list -> t > val min_elt: t -> elt > val max_elt: t -> elt > val choose: t -> elt > @@ -343,6 +344,29 @@ > Empty -> accu > | Node(l, v, r, _) -> elements_aux (v :: elements_aux accu r) l > > + exception Unsorted > + > + let rec of_sorted_list n xs =3D > + if n =3D 0 then (empty, xs) else begin > + let l, xs =3D of_sorted_list (n / 2) xs in > + let v, xs =3D (match xs with v :: xs -> v, xs | [] -> assert fal= se) in > + let r, xs =3D of_sorted_list (n - n / 2 - 1) xs in > + (create l v r, xs) > + end > + > + let of_list =3D function > + [] -> empty > + | (x :: xs) as ys -> begin > + let rec len n x =3D function > + [] -> n > + | z :: zs -> > + if Ord.compare x z < 0 > + then len (n + 1) z zs > + else raise Unsorted in > + try let n =3D len 1 x xs in let set, _ =3D of_sorted_list n ys= in set > + with Unsorted -> List.fold_left (fun t elt -> add elt t) empty= ys > + end > + > let elements s =3D > elements_aux [] s > > Index: stdlib/set.mli > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- stdlib/set.mli (revision 13061) > +++ stdlib/set.mli (working copy) > @@ -121,6 +121,9 @@ > to the ordering [Ord.compare], where [Ord] is the argument > given to {!Set.Make}. *) > > + val of_list : elt list -> t > + (** Makes a set out of a list. *) > + > val min_elt: t -> elt > (** Return the smallest element of the given set > (with respect to the [Ord.compare] ordering), or raise > --end of_list.patch--