From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gabriel.scherer@gmail.com>
X-Original-To: caml-list@sympa.inria.fr
Delivered-To: caml-list@sympa.inria.fr
Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104])
	by sympa.inria.fr (Postfix) with ESMTPS id 409EE7EFCD
	for <caml-list@sympa.inria.fr>; Wed, 29 Oct 2014 11:12:04 +0100 (CET)
Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender
  authenticity information available from domain of
  gabriel.scherer@gmail.com) identity=pra;
  client-ip=209.85.214.179;
  receiver=mail3-smtp-sop.national.inria.fr;
  envelope-from="gabriel.scherer@gmail.com";
  x-sender="gabriel.scherer@gmail.com";
  x-conformance=sidf_compatible
Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of
  gabriel.scherer@gmail.com designates 209.85.214.179 as
  permitted sender) identity=mailfrom;
  client-ip=209.85.214.179;
  receiver=mail3-smtp-sop.national.inria.fr;
  envelope-from="gabriel.scherer@gmail.com";
  x-sender="gabriel.scherer@gmail.com";
  x-conformance=sidf_compatible; x-record-type="v=spf1"
Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender
  authenticity information available from domain of
  postmaster@mail-ob0-f179.google.com) identity=helo;
  client-ip=209.85.214.179;
  receiver=mail3-smtp-sop.national.inria.fr;
  envelope-from="gabriel.scherer@gmail.com";
  x-sender="postmaster@mail-ob0-f179.google.com";
  x-conformance=sidf_compatible
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AvcBACC8UFTRVdazlGdsb2JhbABTCYNiWASCOkjLP4dJAoERBxYBAQEBAREBAQEBBwsLCRIwhAMBAQQMBhEEGQEbHQEDDAYDAgsNAgImAgIiAREBBQEcGSKICQEDEg2TbpA+boszgXKDEYhjChknDWeFUQEBAQEGAQEBAQEBFgEFDoEejUCBNQ0EBgEBHDMHgneBVAWTE4NChEGCUoExESuGPIwZGCmFLjwvAYEGCBeBJQEBAQ
X-IPAS-Result: AvcBACC8UFTRVdazlGdsb2JhbABTCYNiWASCOkjLP4dJAoERBxYBAQEBAREBAQEBBwsLCRIwhAMBAQQMBhEEGQEbHQEDDAYDAgsNAgImAgIiAREBBQEcGSKICQEDEg2TbpA+boszgXKDEYhjChknDWeFUQEBAQEGAQEBAQEBFgEFDoEejUCBNQ0EBgEBHDMHgneBVAWTE4NChEGCUoExESuGPIwZGCmFLjwvAYEGCBeBJQEBAQ
X-IronPort-AV: E=Sophos;i="5.04,809,1406584800"; 
   d="scan'208";a="85199251"
Received: from mail-ob0-f179.google.com ([209.85.214.179])
  by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 29 Oct 2014 11:12:02 +0100
Received: by mail-ob0-f179.google.com with SMTP id m8so2119936obr.10
        for <caml-list@inria.fr>; Wed, 29 Oct 2014 03:12:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=mime-version:in-reply-to:references:from:date:message-id:subject:to
         :cc:content-type:content-transfer-encoding;
        bh=5pEWBNm09CDsxKTnyOY21CZrTvcp2gr8n40T6JSZSAQ=;
        b=0GCW3Qe4WbbSxIH4Hi2v2JpJ/X7uECajf9Fr0QjjZoBbvGYG0HQIpW+QZOM3VoZvVe
         HbQQ/7CWmNdh5LhgT6jGATIL/+46OwoQTFXNG3WkalvRYZ+uqfvQFay764L17cZLXniR
         ylS2iqrOZbesFulYo0DTuxUBS4JBfQRkoyF1Wij5EBOKXIN6WQbajiqdxZfNT73/dJzm
         rNZhzS74QKJQ+z4XLnYeE6azGLktcVap+7bzkwCH1po1rq/aq1fTUOT8PiMGowJb9R+v
         ZMasz7MWaW1RfnXn0/EhWenf04PCtMJVk5xdQdk/EfD7Un5lCmC2jYczAyhVT96jUs1L
         nqBA==
X-Received: by 10.202.48.212 with SMTP id w203mr7476513oiw.30.1414577521280;
 Wed, 29 Oct 2014 03:12:01 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.76.105.196 with HTTP; Wed, 29 Oct 2014 03:11:21 -0700 (PDT)
In-Reply-To: <20141001102943.AA0C0C382A@www1.g3.pair.com>
References: <CAPFanBFyQApQX7rwkjy1PCyKqyyjuP0YauOJcnw58AFoEiyGZg@mail.gmail.com>
 <20141001102943.AA0C0C382A@www1.g3.pair.com>
From: Gabriel Scherer <gabriel.scherer@gmail.com>
Date: Wed, 29 Oct 2014 11:11:21 +0100
Message-ID: <CAPFanBHQgD+octf9Qd=U+iL-LHms7Oy0a0RtWJ+HPX0gFXa=fg@mail.gmail.com>
To: oleg@okmij.org
Cc: Goswin von Brederlow <goswin-v-b@web.de>, caml users <caml-list@inria.fr>, 
	=?UTF-8?B?RnLDqWTDqXJpYyBCb3Vy?= <frederic.bour@lakaban.net>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Caml-list] Why List.map does not be implemented

During a discussion of Tail-Recursion Modulo Cons with Fr=C3=A9d=C3=A9ric B=
our,
I had an amusing idea of how to fix that problem pointed out by Oleg.
The idea is that we can detect at mutation time whether non-linear
control effect happened, and in that case do a deep copy of the
partially-filled structure. It sounds slightly crazy but works
beautifully.

Here is the current implementation style of `map` in Batteries; it is
merely an abstraction over Oleg's code example above (the decision to
implement destination-passing style was not made by Batteries, but in
the former Extlib project, but the small abstraction layer is due to
Fran=C3=A7ois B=C3=A9renger as part of Batteries).

type 'a mut_list =3D {
  hd: 'a;
  mutable tl: 'a list
}
external inj : 'a mut_list -> 'a list =3D "%identity"

module Acc1 =3D struct
  let create x =3D { hd =3D x; tl =3D [] }

  let accum acc x =3D
    let cell =3D create x in
    acc.tl <- inj cell;
    cell

  let return head =3D inj head
end

let map_dst f =3D function
  | [] -> []
  | x :: xs ->
     let open Acc1 in
     let rec loop f acc =3D function
       | [] -> return acc
       | x :: xs ->
          loop f (accum acc (f x)) xs
     in
     let acc =3D create (f x) in
     loop f acc xs

`accum` performs a mutation and returns the new destination in which
to produce the rest of the list. This is destination-passing-style,
and can also be understood as a manual TRMC optimization. It can also
be read as a "snoc" operation (adding a new element to the tail of a
list): it is possible to look at map_dst and think it is a pure
implementation. This pure implementation is optimized using mutation
under the hood, assuming (Mezzo-style) that the "map" function is the
unique owner of the intermediate list, which can thus be
mutated-in-place (even using strong mutation turning its type from a
mutable value to an immutable one).

Oleg explains that non-linear use of control operators (calling a
captured (delimited) continuation several times) allows to observe
this mutation, which thus destroys the probability computation of
HANSEI. Another way to see this problem is that non-linear use of
continuations breaks the unique-ownership invariant justifying the
optimization.

Yet it is possible to dynamically detect when we do not uniquely own
the list anymore (a previous invocation of the current continuation
has already mutated it), by filling the yet-to-be-mutated holes with a
secret canary value (instead of []). An accumulator is "fresh"
(uniquely owned) when its tail is (physically) equal to the canary
value. If we notice a mutator is not fresh, we perform a deep copy of
the list (from the head to the accumulator) and start accumulating
again from this copy.

Here is the new implementation of "map" using an Acc2 module
implementing freshness-check and deep-copy (code at the end of this
mail):

let rec map_dst2 f =3D function
  | [] -> []
  | x :: xs ->
     let open Acc2 in
     (* precondition: acc is fresh *)
     let rec loop f head acc =3D function
       | [] -> return head acc
       | x :: xs ->
      let next =3D f x in
      (* control effects during (f x) may destroy freshness *)
      if fresh acc then loop f head (accum acc next) xs
      else begin
         let head, acc =3D copy head (inj acc) in
         (* fresh after copy *)
         loop f head (accum acc next) xs
      end
     in
     let head =3D create (f x) in
     (* head is fresh *)
     loop f head head xs

The code is slightly more verbose because the deep-copy iteration
needs to be passed both the current accumulator and the head of the
list (it wouldn't know where to copy from otherwise). Notice that the
fast path (in absence of non-linear control) only has an additional
freshness check. I benchmarked the two implementations and the
difference is negligible -- in particular, it does not perform any
additional allocation in the fast case.

This version works fine with HANSEI.

(I'm not claiming this implementation alone is what people should use
for List.map. It does not behave particularly well on small lists,
which is the most important thing for most use-cases. The only point
is that the destination-passing-style can be made to work with
non-linear control.)

Finally, here is the "protected" Acc module implementing canary-check
and deep copy:

module Acc2 =3D struct
  let rec canary =3D Obj.magic () :: canary

  let copy head limit =3D
    let rec copy li limit tail =3D
      match li with
        | [] -> assert false
        | hd::tl ->
           let cell =3D { hd; tl =3D canary } in
           tail.tl <- inj cell;
           if li =3D=3D limit then cell
           else copy tl limit cell
    in
    let new_head =3D { hd =3D head.hd; tl =3D canary } in
    if inj head =3D=3D limit then (new_head, new_head)
    else begin
      let tail =3D copy head.tl limit new_head in
      new_head, tail
    end

  let create x =3D { hd =3D x; tl =3D canary }

  let fresh acc =3D acc.tl =3D=3D canary

  let accum acc x =3D
    let cell =3D { hd =3D x; tl =3D canary } in
    acc.tl <- inj cell;
    cell

  let return head acc =3D
    acc.tl <- [];
    inj head
end

Full source for this experiment is available here:
  https://www.gitorious.org/gasche-snippets/destination-passing-style-and-c=
ontrol-effects/source/HEAD:map.ml

On Wed, Oct 1, 2014 at 12:29 PM,  <oleg@okmij.org> wrote:
>
> Gabriel Scherer wrote
>> My intuition would be that the mutation should not be observable in this
>> case, as it is only done on "private" data that the List.map function
>> allocate, owns, and which never escapes in its immutable form.
>
> Let me explain, on the example of the simplest Hansei function
>         val flip : float -> bool
> that flips (a possibly unfair) coin; flip 0.5 flips the fair coin.
> A good way to understand the meaning of this function is to take the
> frequentist approach. There are two ways the function call (flip 0.5)
> can return: it can return with true or with false. There are total two
> choices, in one of them the function returns true, in another
> false. So, the probability of returning true is 1/2. In general, the
> complete distribution -- which is the meaning of the program
> (flip 0.5) -- is as follows
>
> # exact_reify (fun () -> flip 0.5);;
> - : bool Ptypes.pV =3D [(0.5, Ptypes.V true); (0.5, Ptypes.V false)]
>
> which is what Hansei computes. Let's take a more complex example,
> closer to the topic: two independent coin flips
>
> # let model () =3D List.map flip [0.5; 0.5]
> val model : unit -> bool list =3D <fun>
>
> # exact_reify model;;
> - : bool list Ptypes.pV =3D
> [(0.25, Ptypes.V [true; true]); (0.25, Ptypes.V [true; false]);
>  (0.25, Ptypes.V [false; true]); (0.25, Ptypes.V [false; false])]
>
> The result correctly tells that there are four possible choices.
>
> Let us consider the function map from the Batteries. After desugaring,
> it has the following form
>
> type 'a mut_list =3D {
>     hd: 'a;
>     mutable tl: 'a list
>   }
>
> let map : ('a -> 'b) -> 'a list -> 'b list =3D fun f -> function
>   | [] -> []
>   | h :: t ->
>       let rec loop dst =3D function
>         | [] -> ()
>         | h :: t ->
>             let cell =3D {hd =3D f h; tl =3D []} in
>             dst.tl <- Obj.magic cell;
>             loop cell t
>       in
>       let r =3D {hd =3D f h; tl =3D []} in
>       loop r t;
>       Obj.magic r
>
> Using this function, we obtain
>
> # let model1 () =3D map flip [0.5; 0.5]
> val model1 : unit -> bool list =3D <fun>
>
> # exact_reify model1
> - : bool list Ptypes.pV =3D
> [(0.5, Ptypes.V [true; false]); (0.5, Ptypes.V [false; false])]
>
> Something went wrong, didn't it? Where are the other two possible
> choices -- with true for the second flip -- for the list of two
> independent coin flips?
>
> To understand the problem, let us recall the main principle stated
> earlier: ``There are two ways the function call (flip 0.5)
> can return: it can return with true or with false.'' That is, the
> function call (flip 0.5) returns *twice*. Therefore, the call (f h)
> in the let cell statement returns twice. Therefore, the assignment
>             dst.tl <- Obj.magic cell;
> will be executed twice. And the second execution of that assignment
> will be with a different cell, which will override and destroy the
> result of the first assignment. That is how the "true" choices became
> lost.
>
>
>