caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons
@ 2013-10-14 15:30 Enrico Tassi
  2013-10-16 14:03 ` Jacques-Henri Jourdan
  2013-10-22  9:38 ` [Caml-list] Marshalling: automatic discard of unmashalable data via oleg
  0 siblings, 2 replies; 6+ messages in thread
From: Enrico Tassi @ 2013-10-14 15:30 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 2057 bytes --]

Hello caml-list.  I'm new here, It's my fist post, and I want to share
an hack I came up with to systematically solve some marshalling problems
I have in the context of Coq development.  I hope the hack is of general
interest, maybe it is folklore, maybe not... 

The Coq architecture is switching to a multi process one, where slave
processes are fed a system state, a portion of the proof script (an
entire proof at the moment) and are expected to return the resulting
proof term.  Data is marshalled using the standard Marshal module.

The system state is made of pretty much anything, and is also user
extensible via plugins.  Nothing prevents someone to stick in there
values that can hardly be marshalled, like callbacks, file descriptors,
lazy_t, and the like.  Of course it is nice to be able to store
callbacks or lazy values in the system state, so forbidding all that is
not nice.

Luckily enough, I don't need this data in the slave processes.
Hence I could pre-process the system state and throw it away.
But the system state is big, and traversing it to prune out some bits is
likely to be expensive.  Moreover I'm lazy, I don't want to code that
pruning function, hence the following solution.  

Values that can't, or should not, be marshalled are stored in the system
state using a unique key, and given a key one can retrieve the
corresponding value.  These (key,value) pairs are ephemeron, if the key
is not reachable, the value (if it has no extra references) will be
eventually collected too.  When a key is marshalled, no error occurs
(keys are just doubly boxed integers).  When a key is unmarshalled it
becomes invalid, i.e. the associated value cannot be retrieved anymore.

In some sense it is like if marshalling was forgetting the part of
the marshalled value that is stored as a key.  All automatically, no
need to pre/post process the marshalled value and no extra memory
management hassle.

Thanks to Damien Doligez for his preliminary comments on the attached
files.

All sort of comments is welcome,
Cheers
-- 
Enrico Tassi

[-- Attachment #2: ephemeron.mli --]
[-- Type: text/plain, Size: 1638 bytes --]

(* Use case:
     You have a data structure that needs to be marshalled but it contains
     unmarshallable data (like a closure, or a file descriptor).  Actually
     you don't need this data to be preserved by marshalling, it just happens
     to be there.
     You could produced a trimmed down data structure, but then, once
     unmarshalled, you can't used the very same code to process it, unless you
     re-inject the trimmed down data structure into the standard one, using
     dummy values for the unmarshallable stuff.
     Similarly you could change your data structure turning all types [bad]
     into [bad option], then just before marshalling you set all values of type
     [bad option] to [None].  Still this pruning may be expensive and you have
     to code it.

   Desiderata:
     The marshalling operation automatically discards values that cannot be
     marshalled or cannot be properly unmarshalled.

   Proposed solution:
     Turn all occurrences of [bad] into [bad key] in your data structure.
     Use [crate bad_val] to obtain a unique key [k] for [bad_val], and store
     [k] in the data structure.  Use [get k] to obtain [bad_val].

     An ['a key] can always be marshalled.  When marshalled, a key loses its
     value.  The function [get] raises Not_found on unmarshalled keys.
     
     If a key is garbage collected, the corresponding value is garbage
     collected too (unless extra references to it exist).
     In short no memory management hassle, keys can just replace their
     corresponding value in the data structure.  *)

type 'a key

val create : 'a -> 'a key
val get : 'a key -> 'a

[-- Attachment #3: ephemeron.ml --]
[-- Type: text/plain, Size: 1484 bytes --]

type key_type = int

type boxed_key = key_type ref ref

let mk_key : unit -> boxed_key =
  (* We should take a random value here, is there a random function in OCaml?*) 
  let bid = ref 0 in
  (* According to OCaml Gc module documentation, Pervasives.ref is one of the
     few ways of getting a boxed value the compiler will never alias. *)
  fun () -> incr bid; Pervasives.ref (Pervasives.ref !bid)

(* A phantom type to preserve type safety *)
type 'a key = boxed_key

(* Comparing keys with == grants that if a key is unmarshalled (in the same
   process where it was created or in another one) it is not mistaken for
   an already existing one (unmarshal has no right to alias).  If the initial
   value of bid is taken at random, then one also avoids potential collisions *)
module HT = Hashtbl.Make(struct
  type t = key_type ref
  let equal k1 k2 = k1 == k2
  let hash id = !id
end)

(* A key is the (unique) value inside a boxed key, hence it does not
   keep its corresponding boxed key reachable (replacing key_type by boxed_key
   would make the key always reachable) *)
let values : Obj.t HT.t = HT.create 1001

let collect k = HT.remove values !k

(* The only reference to the boxed key is the one returned, when the user drops
   it the value eventually disappears from the values table above *)
let create (v : 'a) : 'a key =
  let k = mk_key () in
  HT.add values !k (Obj.repr v);
  Gc.finalise collect k;
  k

let get (k : 'a key) : 'a = Obj.obj (HT.find values !k)

[-- Attachment #4: test.ml --]
[-- Type: text/plain, Size: 839 bytes --]


module E = Ephemeron ;;

let test0 () =
  let f = E.create (fun () -> prerr_endline "OK") in
  try E.get f ()
  with Not_found -> prerr_endline "WTF"
;;

let test1 () =
  let f = E.create (fun () -> prerr_endline "OK") in
  Gc.full_major ();
  try E.get f ()
  with Not_found -> prerr_endline "WTF"
;;

let test2 () =
  let str =
    let f = E.create (fun () -> prerr_endline "WTF") in
    Marshal.to_string f [] in
  Gc.full_major ();
  let f : (unit -> unit) E.key = Marshal.from_string str 0 in
  try E.get f ()
  with Not_found -> prerr_endline "OK"
;;


let test3 () =
  let str =
    let f = E.create (fun () -> prerr_endline "WTF") in
    Marshal.to_string f [] in
  let f : (unit -> unit) E.key = Marshal.from_string str 0 in
  try E.get f ()
  with Not_found -> prerr_endline "OK"
;;

test0 ();;
test1 ();;
test2 ();;
test3 ();;

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons
  2013-10-14 15:30 [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons Enrico Tassi
@ 2013-10-16 14:03 ` Jacques-Henri Jourdan
  2013-10-16 14:42   ` Enrico Tassi
  2013-10-22  9:38 ` [Caml-list] Marshalling: automatic discard of unmashalable data via oleg
  1 sibling, 1 reply; 6+ messages in thread
From: Jacques-Henri Jourdan @ 2013-10-16 14:03 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 926 bytes --]

Hi,

Le 14/10/2013 17:30, Enrico Tassi a écrit :
> let collect k = HT.remove values !k
> 
> (* The only reference to the boxed key is the one returned, when the user drops
>    it the value eventually disappears from the values table above *)
> let create (v : 'a) : 'a key =
>   let k = mk_key () in
>   HT.add values !k (Obj.repr v);
>   Gc.finalise collect k;
>   k


In that piece of code, you should take care to the fact that the
finalizer can be called at /any/ moment.

In particular, it can be called while you are modifying your hash table.
If this situation happens, then you are modifying the hash table (by
removing elements) when it is in an unconsistent state.

Did you think about that ? Do I miss something ?

The authors of Why3 encountered similar problems. I suggest you to read
the Weakhtbl module, that gives a solution to this problem.

Regards,
-- 
Jacques-Henri Jourdan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 555 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons
  2013-10-16 14:03 ` Jacques-Henri Jourdan
@ 2013-10-16 14:42   ` Enrico Tassi
  2013-10-16 15:30     ` François Bobot
  0 siblings, 1 reply; 6+ messages in thread
From: Enrico Tassi @ 2013-10-16 14:42 UTC (permalink / raw)
  To: caml-list

On Wed, Oct 16, 2013 at 04:03:18PM +0200, Jacques-Henri Jourdan wrote:
> The authors of Why3 encountered similar problems. I suggest you to read
> the Weakhtbl module, that gives a solution to this problem.

Thanks for the hint!

Indeed they use an imperative list to enqueue/dequeue deletions
that are hence never performed by the finalizer itself.

Ciao
-- 
Enrico Tassi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons
  2013-10-16 14:42   ` Enrico Tassi
@ 2013-10-16 15:30     ` François Bobot
  2013-10-16 16:24       ` Alain Frisch
  0 siblings, 1 reply; 6+ messages in thread
From: François Bobot @ 2013-10-16 15:30 UTC (permalink / raw)
  To: caml-list

On 16/10/2013 16:42, Enrico Tassi wrote:
> On Wed, Oct 16, 2013 at 04:03:18PM +0200, Jacques-Henri Jourdan wrote:
>> The authors of Why3 encountered similar problems. I suggest you to read
>> the Weakhtbl module, that gives a solution to this problem.
>
> Thanks for the hint!
>
> Indeed they use an imperative list to enqueue/dequeue deletions
> that are hence never performed by the finalizer itself.
>

Another important points is that it allows concurrent access. The 
finalizer can run when the actual deletions is performed (add can run 
when iter_remove is run)


module ProdConsume :
sig
   type 'a t
   val create : unit -> 'a t
   val add : 'a -> 'a t -> unit
   val iter_remove : ('a -> unit) -> 'a t -> unit
end


-- 
François

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons
  2013-10-16 15:30     ` François Bobot
@ 2013-10-16 16:24       ` Alain Frisch
  0 siblings, 0 replies; 6+ messages in thread
From: Alain Frisch @ 2013-10-16 16:24 UTC (permalink / raw)
  To: François Bobot, caml-list

I can see that Damien created an "ephemeron" branch in the SVN.  Is it 
related to François' proposal presented in the 2011 OCaml User Conference ?

https://forge.ocamlcore.org/docman/view.php/77/134/memoization2011.pdf



Alain

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Marshalling: automatic discard of unmashalable data via
  2013-10-14 15:30 [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons Enrico Tassi
  2013-10-16 14:03 ` Jacques-Henri Jourdan
@ 2013-10-22  9:38 ` oleg
  1 sibling, 0 replies; 6+ messages in thread
From: oleg @ 2013-10-22  9:38 UTC (permalink / raw)
  To: enrico.tassi; +Cc: caml-list


> The system state is made of pretty much anything, and is also user
> extensible via plugins.  Nothing prevents someone to stick in there
> values that can hardly be marshalled, like callbacks, file descriptors,
> lazy_t, and the like.  Of course it is nice to be able to store
> callbacks or lazy values in the system state, so forbidding all that is
> not nice.

There is a similar problem when marshalling a captured delimited
continuation. The continuation captures a part of stack that points to
various closures in system libraries, which contain lots of
unserializable stuff. In addition, reference cells (t ref) can't be
meaningfully serialized as they are copied in the process of the
serialization, which breaks their semantics. 

The library delimcc proposes a solution. Unlike your situation, I do
care to properly serialize `bad' value and properly restore. The
unmarshalled delimited continuation must work properly. I cannot
afford not to care what the unmarshalled value will look like.

The solution, which involves pre-processing a value before marshalling
and post-processing after unmarshalling, is described at
        http://okmij.org/ftp/ML/ML.html#persistent-delim2cc
as well in Sec 8 of
        http://okmij.org/ftp/continuations/caml-shift-journal.pdf


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-10-22  9:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-14 15:30 [Caml-list] Marshalling: automatic discard of unmashalable data via ephemerons Enrico Tassi
2013-10-16 14:03 ` Jacques-Henri Jourdan
2013-10-16 14:42   ` Enrico Tassi
2013-10-16 15:30     ` François Bobot
2013-10-16 16:24       ` Alain Frisch
2013-10-22  9:38 ` [Caml-list] Marshalling: automatic discard of unmashalable data via oleg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).