From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 409EE7EFCD for ; Wed, 29 Oct 2014 11:12:04 +0100 (CET) Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of gabriel.scherer@gmail.com) identity=pra; client-ip=209.85.214.179; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="gabriel.scherer@gmail.com"; x-sender="gabriel.scherer@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of gabriel.scherer@gmail.com designates 209.85.214.179 as permitted sender) identity=mailfrom; client-ip=209.85.214.179; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="gabriel.scherer@gmail.com"; x-sender="gabriel.scherer@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-ob0-f179.google.com) identity=helo; client-ip=209.85.214.179; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="gabriel.scherer@gmail.com"; x-sender="postmaster@mail-ob0-f179.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvcBACC8UFTRVdazlGdsb2JhbABTCYNiWASCOkjLP4dJAoERBxYBAQEBAREBAQEBBwsLCRIwhAMBAQQMBhEEGQEbHQEDDAYDAgsNAgImAgIiAREBBQEcGSKICQEDEg2TbpA+boszgXKDEYhjChknDWeFUQEBAQEGAQEBAQEBFgEFDoEejUCBNQ0EBgEBHDMHgneBVAWTE4NChEGCUoExESuGPIwZGCmFLjwvAYEGCBeBJQEBAQ X-IPAS-Result: AvcBACC8UFTRVdazlGdsb2JhbABTCYNiWASCOkjLP4dJAoERBxYBAQEBAREBAQEBBwsLCRIwhAMBAQQMBhEEGQEbHQEDDAYDAgsNAgImAgIiAREBBQEcGSKICQEDEg2TbpA+boszgXKDEYhjChknDWeFUQEBAQEGAQEBAQEBFgEFDoEejUCBNQ0EBgEBHDMHgneBVAWTE4NChEGCUoExESuGPIwZGCmFLjwvAYEGCBeBJQEBAQ X-IronPort-AV: E=Sophos;i="5.04,809,1406584800"; d="scan'208";a="85199251" Received: from mail-ob0-f179.google.com ([209.85.214.179]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 29 Oct 2014 11:12:02 +0100 Received: by mail-ob0-f179.google.com with SMTP id m8so2119936obr.10 for ; Wed, 29 Oct 2014 03:12:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:content-transfer-encoding; bh=5pEWBNm09CDsxKTnyOY21CZrTvcp2gr8n40T6JSZSAQ=; b=0GCW3Qe4WbbSxIH4Hi2v2JpJ/X7uECajf9Fr0QjjZoBbvGYG0HQIpW+QZOM3VoZvVe HbQQ/7CWmNdh5LhgT6jGATIL/+46OwoQTFXNG3WkalvRYZ+uqfvQFay764L17cZLXniR ylS2iqrOZbesFulYo0DTuxUBS4JBfQRkoyF1Wij5EBOKXIN6WQbajiqdxZfNT73/dJzm rNZhzS74QKJQ+z4XLnYeE6azGLktcVap+7bzkwCH1po1rq/aq1fTUOT8PiMGowJb9R+v ZMasz7MWaW1RfnXn0/EhWenf04PCtMJVk5xdQdk/EfD7Un5lCmC2jYczAyhVT96jUs1L nqBA== X-Received: by 10.202.48.212 with SMTP id w203mr7476513oiw.30.1414577521280; Wed, 29 Oct 2014 03:12:01 -0700 (PDT) MIME-Version: 1.0 Received: by 10.76.105.196 with HTTP; Wed, 29 Oct 2014 03:11:21 -0700 (PDT) In-Reply-To: <20141001102943.AA0C0C382A@www1.g3.pair.com> References: <20141001102943.AA0C0C382A@www1.g3.pair.com> From: Gabriel Scherer Date: Wed, 29 Oct 2014 11:11:21 +0100 Message-ID: To: oleg@okmij.org Cc: Goswin von Brederlow , caml users , =?UTF-8?B?RnLDqWTDqXJpYyBCb3Vy?= Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Caml-list] Why List.map does not be implemented During a discussion of Tail-Recursion Modulo Cons with Fr=C3=A9d=C3=A9ric B= our, I had an amusing idea of how to fix that problem pointed out by Oleg. The idea is that we can detect at mutation time whether non-linear control effect happened, and in that case do a deep copy of the partially-filled structure. It sounds slightly crazy but works beautifully. Here is the current implementation style of `map` in Batteries; it is merely an abstraction over Oleg's code example above (the decision to implement destination-passing style was not made by Batteries, but in the former Extlib project, but the small abstraction layer is due to Fran=C3=A7ois B=C3=A9renger as part of Batteries). type 'a mut_list =3D { hd: 'a; mutable tl: 'a list } external inj : 'a mut_list -> 'a list =3D "%identity" module Acc1 =3D struct let create x =3D { hd =3D x; tl =3D [] } let accum acc x =3D let cell =3D create x in acc.tl <- inj cell; cell let return head =3D inj head end let map_dst f =3D function | [] -> [] | x :: xs -> let open Acc1 in let rec loop f acc =3D function | [] -> return acc | x :: xs -> loop f (accum acc (f x)) xs in let acc =3D create (f x) in loop f acc xs `accum` performs a mutation and returns the new destination in which to produce the rest of the list. This is destination-passing-style, and can also be understood as a manual TRMC optimization. It can also be read as a "snoc" operation (adding a new element to the tail of a list): it is possible to look at map_dst and think it is a pure implementation. This pure implementation is optimized using mutation under the hood, assuming (Mezzo-style) that the "map" function is the unique owner of the intermediate list, which can thus be mutated-in-place (even using strong mutation turning its type from a mutable value to an immutable one). Oleg explains that non-linear use of control operators (calling a captured (delimited) continuation several times) allows to observe this mutation, which thus destroys the probability computation of HANSEI. Another way to see this problem is that non-linear use of continuations breaks the unique-ownership invariant justifying the optimization. Yet it is possible to dynamically detect when we do not uniquely own the list anymore (a previous invocation of the current continuation has already mutated it), by filling the yet-to-be-mutated holes with a secret canary value (instead of []). An accumulator is "fresh" (uniquely owned) when its tail is (physically) equal to the canary value. If we notice a mutator is not fresh, we perform a deep copy of the list (from the head to the accumulator) and start accumulating again from this copy. Here is the new implementation of "map" using an Acc2 module implementing freshness-check and deep-copy (code at the end of this mail): let rec map_dst2 f =3D function | [] -> [] | x :: xs -> let open Acc2 in (* precondition: acc is fresh *) let rec loop f head acc =3D function | [] -> return head acc | x :: xs -> let next =3D f x in (* control effects during (f x) may destroy freshness *) if fresh acc then loop f head (accum acc next) xs else begin let head, acc =3D copy head (inj acc) in (* fresh after copy *) loop f head (accum acc next) xs end in let head =3D create (f x) in (* head is fresh *) loop f head head xs The code is slightly more verbose because the deep-copy iteration needs to be passed both the current accumulator and the head of the list (it wouldn't know where to copy from otherwise). Notice that the fast path (in absence of non-linear control) only has an additional freshness check. I benchmarked the two implementations and the difference is negligible -- in particular, it does not perform any additional allocation in the fast case. This version works fine with HANSEI. (I'm not claiming this implementation alone is what people should use for List.map. It does not behave particularly well on small lists, which is the most important thing for most use-cases. The only point is that the destination-passing-style can be made to work with non-linear control.) Finally, here is the "protected" Acc module implementing canary-check and deep copy: module Acc2 =3D struct let rec canary =3D Obj.magic () :: canary let copy head limit =3D let rec copy li limit tail =3D match li with | [] -> assert false | hd::tl -> let cell =3D { hd; tl =3D canary } in tail.tl <- inj cell; if li =3D=3D limit then cell else copy tl limit cell in let new_head =3D { hd =3D head.hd; tl =3D canary } in if inj head =3D=3D limit then (new_head, new_head) else begin let tail =3D copy head.tl limit new_head in new_head, tail end let create x =3D { hd =3D x; tl =3D canary } let fresh acc =3D acc.tl =3D=3D canary let accum acc x =3D let cell =3D { hd =3D x; tl =3D canary } in acc.tl <- inj cell; cell let return head acc =3D acc.tl <- []; inj head end Full source for this experiment is available here: https://www.gitorious.org/gasche-snippets/destination-passing-style-and-c= ontrol-effects/source/HEAD:map.ml On Wed, Oct 1, 2014 at 12:29 PM, wrote: > > Gabriel Scherer wrote >> My intuition would be that the mutation should not be observable in this >> case, as it is only done on "private" data that the List.map function >> allocate, owns, and which never escapes in its immutable form. > > Let me explain, on the example of the simplest Hansei function > val flip : float -> bool > that flips (a possibly unfair) coin; flip 0.5 flips the fair coin. > A good way to understand the meaning of this function is to take the > frequentist approach. There are two ways the function call (flip 0.5) > can return: it can return with true or with false. There are total two > choices, in one of them the function returns true, in another > false. So, the probability of returning true is 1/2. In general, the > complete distribution -- which is the meaning of the program > (flip 0.5) -- is as follows > > # exact_reify (fun () -> flip 0.5);; > - : bool Ptypes.pV =3D [(0.5, Ptypes.V true); (0.5, Ptypes.V false)] > > which is what Hansei computes. Let's take a more complex example, > closer to the topic: two independent coin flips > > # let model () =3D List.map flip [0.5; 0.5] > val model : unit -> bool list =3D > > # exact_reify model;; > - : bool list Ptypes.pV =3D > [(0.25, Ptypes.V [true; true]); (0.25, Ptypes.V [true; false]); > (0.25, Ptypes.V [false; true]); (0.25, Ptypes.V [false; false])] > > The result correctly tells that there are four possible choices. > > Let us consider the function map from the Batteries. After desugaring, > it has the following form > > type 'a mut_list =3D { > hd: 'a; > mutable tl: 'a list > } > > let map : ('a -> 'b) -> 'a list -> 'b list =3D fun f -> function > | [] -> [] > | h :: t -> > let rec loop dst =3D function > | [] -> () > | h :: t -> > let cell =3D {hd =3D f h; tl =3D []} in > dst.tl <- Obj.magic cell; > loop cell t > in > let r =3D {hd =3D f h; tl =3D []} in > loop r t; > Obj.magic r > > Using this function, we obtain > > # let model1 () =3D map flip [0.5; 0.5] > val model1 : unit -> bool list =3D > > # exact_reify model1 > - : bool list Ptypes.pV =3D > [(0.5, Ptypes.V [true; false]); (0.5, Ptypes.V [false; false])] > > Something went wrong, didn't it? Where are the other two possible > choices -- with true for the second flip -- for the list of two > independent coin flips? > > To understand the problem, let us recall the main principle stated > earlier: ``There are two ways the function call (flip 0.5) > can return: it can return with true or with false.'' That is, the > function call (flip 0.5) returns *twice*. Therefore, the call (f h) > in the let cell statement returns twice. Therefore, the assignment > dst.tl <- Obj.magic cell; > will be executed twice. And the second execution of that assignment > will be with a different cell, which will override and destroy the > result of the first assignment. That is how the "true" choices became > lost. > > >