caml-list - the Caml user's mailing list
* SafeUnmarshal: questions/problems/timings
@ 2006-08-29 23:39 Hendrik Tews
  2006-08-31 10:01 ` [Caml-list] " Grégoire Henry
  0 siblings, 1 reply; 4+ messages in thread
From: Hendrik Tews @ 2006-08-29 23:39 UTC (permalink / raw)
  To: caml-list

Dear all,

I used the SafeUnmarshal module
(http://www.pps.jussieu.fr/~henry/marshal/) to check whether
some marshaled OCaml data is compatible with its type (the data
is generated in a mixed C++/OCaml program). Here are my
experiences:

1. I made some measurements that suggest that unmarshalling
   has quadratic complexity in the data size; see

   http://www.cs.ru.nl/~tews/marshal-plot.eps
   http://www.cs.ru.nl/~tews/marshal-plot-detail.eps

   If my (simple-minded) estimates are correct, it would take
   more than 2 hours to check 4 MB of marshaled data (which I
   generate in less than 3 seconds).

   Is there any hope that the time complexity will improve?


2.  Objective Caml version 3.09.3+dev0+ty1

    # SafeUnmarshal.copy [^nativeint^] 4;;
    Segmentation fault


3. Would it be possible to put an English version of
   http://www.pps.jussieu.fr/~henry/marshal/docTy/Ty.html online?


4. Instead of type-safe unmarshal functions, I am more interested
   in checking whether some value that has been constructed outside
   OCaml is type correct. I would therefore suggest making
   Check.check available in some way. I am now using

     let check obj ty = Check.check (Obj.repr obj) (Ty.dump ty)

   with type 

     val check : 'a -> 'a tyrepr -> bool

   Am I right that the type parameter of tyrepr is a kind of
   phantom type that is mainly used to restrict the type of
   SafeUnmarshal.from_channel? Then I could also use 

     val check : 'a -> 'b tyrepr -> bool  ?

   It would be great if (as a debugging feature) this check could
   produce some sort of trace that helps to locate those parts
   that violate the given type.
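
   The two signatures and the requested trace can be illustrated with a
   small self-contained sketch. It does not use the real Ty or Check
   modules: the types ty and tyrepr, the constructors, and the functions
   find_violation, check_strict, check_loose and trace are all
   hypothetical stand-ins, and only ints, strings, pairs and lists are
   covered. The assumption is that the parameter of tyrepr is a pure
   phantom, so the walk over the runtime representation is the same
   whether or not the parameter is tied to the type of the value.

```ocaml
(* Toy type representation; a hypothetical stand-in, NOT the real Ty API. *)
type ty = Int | String | Pair of ty * ty | List of ty

(* 'a is a phantom parameter: it never occurs in the constructor argument. *)
type 'a tyrepr = Repr of ty

let int_repr : int tyrepr = Repr Int
let string_repr : string tyrepr = Repr String
let pair (Repr a : 'a tyrepr) (Repr b : 'b tyrepr) : ('a * 'b) tyrepr =
  Repr (Pair (a, b))
let list (Repr a : 'a tyrepr) : 'a list tyrepr = Repr (List a)

(* Walk the runtime representation. Returns None when the value fits the
   representation, or Some path naming the first offending position. *)
let rec find_violation (path : string) (o : Obj.t) (t : ty) : string option =
  match t with
  | Int -> if Obj.is_int o then None else Some path
  | String ->
    if Obj.is_block o && Obj.tag o = Obj.string_tag then None else Some path
  | Pair (a, b) ->
    if Obj.is_block o && Obj.tag o = 0 && Obj.size o = 2 then begin
      match find_violation (path ^ ".0") (Obj.field o 0) a with
      | Some _ as bad -> bad
      | None -> find_violation (path ^ ".1") (Obj.field o 1) b
    end
    else Some path
  | List a ->
    if Obj.is_int o then
      (* The empty list is the immediate value 0. *)
      (if (Obj.obj o : int) = 0 then None else Some path)
    else if Obj.tag o = 0 && Obj.size o = 2 then begin
      match find_violation (path ^ ".hd") (Obj.field o 0) a with
      | Some _ as bad -> bad
      | None -> find_violation (path ^ ".tl") (Obj.field o 1) t
    end
    else Some path

(* 'a -> 'a tyrepr -> bool: the phantom ties the value to the representation. *)
let check_strict (v : 'a) (Repr t : 'a tyrepr) : bool =
  find_violation "root" (Obj.repr v) t = None

(* 'a -> 'b tyrepr -> bool: same walk, no static link between the two. *)
let check_loose (v : 'a) (Repr t : 'b tyrepr) : bool =
  find_violation "root" (Obj.repr v) t = None

(* Debugging variant: report where the value stops matching. *)
let trace (v : 'a) (Repr t : 'b tyrepr) : string option =
  find_violation "root" (Obj.repr v) t
```

   In this sketch the strict form rejects a mismatched call at compile
   time, so only the loose form is useful for values whose static type
   is not trustworthy, which is the situation with data built outside
   OCaml.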

5. nativeint, int32, and int64 are not supported yet (I would
   suggest making the documentation a bit more precise on that
   point). As a fix I use (in Check.check_block):

    | Tnativeint ->
        tag = Obj.custom_tag && size = 2 &&
        ((Obj.field obj 0) == (Obj.field (Obj.repr Nativeint.zero) 0))
    | Tint32 ->
        tag = Obj.custom_tag && size = 2 &&
        ((Obj.field obj 0) == (Obj.field (Obj.repr Int32.zero) 0))
    | Tint64 ->
        tag = Obj.custom_tag && size = 3 &&
        ((Obj.field obj 0) == (Obj.field (Obj.repr Int64.zero) 0))

   Any comments? On a 64-bit architecture the required int64 size
   should be 2 instead of 3.

   I would strongly suggest replacing the catch-all cases

    | _ -> false

   in check.ml with some more precise code (it took me several
   hours of bug hunting until I found out that the above line
   made my unmarshal fail without even looking at the
   nativeints). 
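
   The word-size dependence mentioned above can be made explicit. The
   following is only a sketch outside of Check.check_block: the helper
   names boxed_size and same_custom_kind are hypothetical, and it
   assumes, as the fix above does, that a boxed integer is a custom
   block whose field 0 holds the custom-operations pointer followed by
   the payload.

```ocaml
(* Words in a boxed integer's custom block: one for the custom-operations
   pointer plus enough words to hold [bits] bits of payload.
   (Hypothetical helper, not part of SafeUnmarshal.) *)
let boxed_size ~bits =
  let word = Sys.word_size in
  1 + (bits + word - 1) / word

(* True if [o] is a custom block of the expected size that shares its
   custom-operations pointer (field 0) with [witness]. The two fields are
   only compared physically, never stored or inspected further. *)
let same_custom_kind ~size (witness : Obj.t) (o : Obj.t) : bool =
  Obj.is_block o
  && Obj.tag o = Obj.custom_tag
  && Obj.size o = size
  && Obj.field o 0 == Obj.field witness 0

let is_int32 =
  same_custom_kind ~size:(boxed_size ~bits:32) (Obj.repr Int32.zero)

(* 3 words on a 32-bit system, 2 words on a 64-bit system. *)
let is_int64 =
  same_custom_kind ~size:(boxed_size ~bits:64) (Obj.repr Int64.zero)

let is_nativeint =
  same_custom_kind ~size:(boxed_size ~bits:Sys.word_size)
    (Obj.repr Nativeint.zero)
```

   On a 64-bit system all three kinds have the same size, so the
   comparison of field 0 is what tells them apart.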


6. Thanks for SafeUnmarshal, it helped me a lot when checking the
   data created inside C++!


Bye,

Hendrik


* Re: [Caml-list] SafeUnmarshal: questions/problems/timings
@ 2006-09-14 13:07 Hendrik Tews
  0 siblings, 0 replies; 4+ messages in thread
From: Hendrik Tews @ 2006-09-14 13:07 UTC (permalink / raw)
  To: caml-list


Here is the promised follow-up with more details on the slow safe
unmarshalling. On
http://www.cs.ru.nl/~tews/nsUnicodeToTeXCMRt1.i.oast you can
download 281 KB of marshalled data. On my machine it takes 23
seconds to check with native code.

The data is of type 

  annotated translationUnit_type = annotated * annotated topForm_type list 

You can test it with the following piece of code:

open Cc_ast_gen_type
open Ast_annotation

let file = "/home/tews/src/elsa/elsa/in/big/nsUnicodeToTeXCMRt1.i.oast"
;;

try
  (* Read in binary mode; the result is discarded, only success matters. *)
  ignore
    (SafeUnmarshal.from_channel
       [^ annotated translationUnit_type ^]
       (open_in_bin file));
  print_endline "OK"
with
  | _ -> print_endline "FAIL"


Compile with

        ocamlopt.opt safeUnmarshal.cmxa ast_annotation.ml elsa_util.ml \
              ml_ctype.ml cc_ml_types.ml cc_ast_gen_type.ml justunmarshal.ml

The additional files are from Olmar, get them here:

http://www.sos.cs.ru.nl/cgi-bin/~tews/olmar/viewvc-patch.cgi/elsa/elsa/ast_annotation.ml?revision=olmar-release-2006-09-07
http://www.sos.cs.ru.nl/cgi-bin/~tews/olmar/viewvc-patch.cgi/elsa/elsa/elsa_util.ml?revision=olmar-release-2006-09-07
http://www.sos.cs.ru.nl/cgi-bin/~tews/olmar/viewvc-patch.cgi/elsa/elsa/ml_ctype.ml?revision=olmar-release-2006-09-07
http://www.sos.cs.ru.nl/cgi-bin/~tews/olmar/viewvc-patch.cgi/elsa/elsa/cc_ml_types.ml?revision=olmar-release-2006-09-07
http://www.sos.cs.ru.nl/cgi-bin/~tews/olmar/viewvc-patch.cgi/elsa/elsa/cc_ast_gen_type.ml?revision=olmar-release-2006-09-07

To produce graphs like http://www.cs.ru.nl/~tews/marshal-plot.eps
you need to download Olmar, compile it, and then run
- ./regrtest -ocaml  in subdir elsa
- ./regtest-oast | grep time >data in subdir asttools
- gnuplot plot

where the plot file is something like

reset

set grid
unset mouse
set terminal x11 persist
#set terminal postscript enhanced color

set xlabel "size (Bytes)"
set ylabel "time (s)"
set key left

# exponent of the fitted curve (slightly above quadratic)
z = 2.1

plot "data" using 6:3 title "SafeUnmarshal user time", \
     (0.000017088 * x) ** z
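
The fitted curve from the plot file can also be evaluated directly to see what it predicts for the sizes mentioned in this thread. This is only a sketch: the function name predicted_seconds is made up for illustration, and it assumes 1 KB = 1024 bytes and that the fit still holds beyond the measured range.

```ocaml
(* Fitted model taken from the gnuplot script:
   time (s) = (0.000017088 * size in bytes) ** 2.1 *)
let predicted_seconds (bytes : float) : float =
  (0.000017088 *. bytes) ** 2.1

let () =
  let report label bytes =
    Printf.printf "%-7s %10.0f bytes -> %9.1f s\n"
      label bytes (predicted_seconds bytes)
  in
  (* Assumption: 1 KB = 1024 bytes, 1 MB = 1024 KB. *)
  report "281 KB" (281. *. 1024.);
  report "4 MB" (4. *. 1024. *. 1024.)
```

Under these assumptions the model gives a few tens of seconds for the 281 KB file, in the same range as the 23 seconds measured, and somewhat more than two hours for 4 MB.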


Bye,

Hendrik



Thread overview: 4+ messages
2006-08-29 23:39 SafeUnmarshal: questions/problems/timings Hendrik Tews
2006-08-31 10:01 ` [Caml-list] " Grégoire Henry
2006-09-01  9:23   ` Hendrik Tews
2006-09-14 13:07 Hendrik Tews
