Message-ID: <17652.53276.522158.649480@tandem.cs.ru.nl>
Date: Wed, 30 Aug 2006 01:39:08 +0200
From: Hendrik Tews
To: caml-list@inria.fr
Subject: SafeUnmarshal: questions/problems/timings

Dear all,

I used the safeUnmarshal module
(http://www.pps.jussieu.fr/~henry/marshal/) in order to check if
some marshaled ocaml data is compatible with its type (the data is
generated in a mixed C++/ocaml program). Here are my experiences:

1.
   I made some measurements that suggest that the unmarshal has
   quadratic complexity in the data size, see

     http://www.cs.ru.nl/~tews/marshal-plot.eps
     http://www.cs.ru.nl/~tews/marshal-plot-detail.eps

   If my (simple-minded) estimations are correct, it would take more
   than 2 hours to check 4 MB of marshaled data (which I generate in
   less than 3 seconds). Is there any hope that the time complexity
   will improve?

2.         Objective Caml version 3.09.3+dev0+ty1

     # SafeUnmarshal.copy [^nativeint^] 4;;
     Segmentation fault

3. Would it be possible to put an English version of
   http://www.pps.jussieu.fr/~henry/marshal/docTy/Ty.html online?

4. Instead of type-safe unmarshal functions, I am more interested in
   checking if some value that has been constructed outside ocaml is
   type correct. Therefore I would suggest making Check.check
   available in some way. I am now using

     let check obj ty = Check.check (Obj.repr obj) (Ty.dump ty)

   with type

     val check : 'a -> 'a tyrepr -> bool

   Am I right that the type parameter of tyrepr is a kind of phantom
   type that is mainly used to restrict the type of
   SafeUnmarshal.from_channel? Then I could also use

     val check : 'a -> 'b tyrepr -> bool

   ?

   It would be great if (as a debugging feature) this check could
   produce some sort of trace that helps to locate those parts that
   violate the given type.

5. nativeint, int32, and int64 are not supported yet (I would
   suggest making the documentation a bit more precise on that
   point). As a fix I use (in Check.check_block):

     | Tnativeint ->
         tag = Obj.custom_tag && size = 2 &&
         ((Obj.field obj 0) == (Obj.field (Obj.repr Nativeint.zero) 0))
     | Tint32 ->
         tag = Obj.custom_tag && size = 2 &&
         ((Obj.field obj 0) == (Obj.field (Obj.repr Int32.zero) 0))
     | Tint64 ->
         tag = Obj.custom_tag && size = 3 &&
         ((Obj.field obj 0) == (Obj.field (Obj.repr Int64.zero) 0))

   Any comments? On a 64 bit architecture the int64 size should be
   required to be 2.
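   The word-size dependence mentioned above could be folded into the
   check itself. What follows is only a minimal sketch, not code from
   SafeUnmarshal: the helper names (payload_words, is_boxed, is_int32,
   is_int64, is_nativeint) are made up for illustration. It assumes,
   as the fix above does, that a boxed integer is a custom block whose
   field 0 is the custom-operations pointer, followed by the payload
   words.

```ocaml
(* Sketch only: all helper names are hypothetical, not SafeUnmarshal API. *)

(* Number of words needed for a payload of [bits] bits, rounded up. *)
let payload_words bits = (bits + Sys.word_size - 1) / Sys.word_size

(* [is_boxed witness bits obj] holds if [obj] is a custom block of the
   expected size whose field 0 (the custom-operations pointer) is
   physically equal to that of [witness]. The pointer is only compared,
   never dereferenced or stored. *)
let is_boxed (witness : Obj.t) bits (obj : Obj.t) =
  Obj.is_block obj
  && Obj.tag obj = Obj.custom_tag
  && Obj.size obj = 1 + payload_words bits
  && Obj.field obj 0 == Obj.field witness 0

let is_int32 obj = is_boxed (Obj.repr Int32.zero) 32 obj
let is_int64 obj = is_boxed (Obj.repr Int64.zero) 64 obj
let is_nativeint obj = is_boxed (Obj.repr Nativeint.zero) Sys.word_size obj
```

   With this, the expected int64 block size comes out as 3 words on a
   32 bit system and 2 words on a 64 bit system, without a separate
   case per architecture.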
   I would strongly suggest replacing the catch-all cases

     | _ -> false

   in check.ml with some more precise code (it took me several hours
   of bug hunting until I found out that the above line made my
   unmarshal fail without even looking at the nativeints).

6. Thanks for SafeUnmarshal, it helped me a lot when checking the
   data created inside C++!

Bye,

Hendrik
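
For the trace asked for in point 4, one possible shape is a checker
that returns the path to the first mismatch instead of a bare bool.
The sketch below is purely illustrative: the type ty, the function
check_trace and the path format are invented here and are not the
Ty/Check modules of SafeUnmarshal. It covers only ints, strings,
tuples and lists, and it assumes acyclic data.

```ocaml
(* Illustrative only: [ty] and [check_trace] are hypothetical and far
   simpler than the type representations used by SafeUnmarshal. *)
type ty =
  | Int
  | String
  | Tuple of ty list   (* tag-0 block, one field per component *)
  | List of ty         (* [] is the immediate 0, (::) a tag-0 block of size 2 *)

(* Returns [Ok ()] if [obj] has the shape described by [ty]; otherwise
   [Error path], where [path] lists the steps from the root down to the
   first offending subvalue. Assumes [obj] is acyclic. *)
let check_trace (ty : ty) (obj : Obj.t) : (unit, string list) result =
  let fail path = Error (List.rev path) in
  let rec go ty obj path =
    match ty with
    | Int -> if Obj.is_int obj then Ok () else fail path
    | String ->
      if Obj.is_block obj && Obj.tag obj = Obj.string_tag then Ok ()
      else fail path
    | Tuple tys ->
      if Obj.is_block obj && Obj.tag obj = 0
         && Obj.size obj = List.length tys
      then fields tys obj 0 path
      else fail path
    | List elt ->
      if Obj.is_int obj then
        (* the empty list is represented as the immediate 0 *)
        if (Obj.obj obj : int) = 0 then Ok () else fail path
      else if Obj.tag obj = 0 && Obj.size obj = 2 then
        match go elt (Obj.field obj 0) ("hd" :: path) with
        | Ok () -> go (List elt) (Obj.field obj 1) ("tl" :: path)
        | Error _ as e -> e
      else fail path
  and fields tys obj i path =
    match tys with
    | [] -> Ok ()
    | t :: rest ->
      (match go t (Obj.field obj i) (string_of_int i :: path) with
       | Ok () -> fields rest obj (i + 1) path
       | Error _ as e -> e)
  in
  go ty obj []
```

For example, check_trace (Tuple [Int; List Int]) (Obj.repr (1, ["x"]))
gives Error ["1"; "hd"], that is: field 1 of the tuple, then the head
of the list.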