Hello, > 1. I made some measurements that suggest that the > unmarshal has quadratic complexity in the data size, see > > http://www.cs.ru.nl/~tews/marshal-plot.eps > http://www.cs.ru.nl/~tews/marshal-plot-detail.eps > > If my (simple-minded) estimations are correct it would take > more than 2 hours to check 4 MB of marshaled data (which I > generate in less than 3 seconds). > > Is there any hope that the time complexity will improve? If your value contains no sharing (hence no cycle), the type checking of safe-unmarshalling should be linear in the size of the data. If there is some sharing (but still no cycle), and if we assume that looking for shared type information takes constant time, then type checking should still be linear. With cycles, things are a bit worse because of the computation of strongly connected components. Therefore, if your data contains shared parts but no cycle, then it should be close to linear. Currently, mapping shared blocks to their type is represented by an association list: accessing such type information could explain this quadratic behaviour. If you are in this situation, this could be the reason and the implementation needs a more clever way of retrieving types of shared blocks. Note that a simple workaround such as using the Map module is incorrect because a memory compaction may occur and break the order used for storing blocks. With cycles, the `height' of cycles (cycles inside cycles) comes as an extra complexity factor. But in order to produce a quadratic behaviour, you need data with a very particular shape. Could you please send us more information about the shape of your data (sharing or not, cycles or not, and wether sharing increase linearly or not with the size)? > 2. Objective Caml version 3.09.3+dev0+ty1 > > # SafeUnmarshal.copy [^nativeint^] 4;; > Segmentation fault See your fifth question :) > 3. Would it be possible to put an english version of > http://www.pps.jussieu.fr/~henry/marshal/docTy/Ty.html online? Certainly. I'm joining a translated ty.mli, I will put it online too. > 4. Instead of type-safe unmarshal functions, I am more interested > in checking if some value that has been constructed outside > ocaml is type correct. Therefore I would suggest to make > Check.check available in come way. I am using now > > let check obj ty = Check.check (Obj.repr obj) (Ty.dump ty) > > with type > > val check : 'a -> 'a tyrepr -> bool > > Am I right that the type parameter of tyrepr is a kind of > phantom type that is mainly used to restrict the type of > SafeUnmarshal.from_channel? Then I could also use > > val check : 'a -> 'b tyrepr -> bool ? Yes and yes. > It would be great if (as a debugging feature) this check could > produce some sort of trace that helps to locate those parts > that violate the given type. My hope for reducing the complexity is to make some "pre-processing" on the value when the unmarshaller reallocates it in the memory. In this case, the Check.check function can't operate on "standard" values, and should not be exported. Yet I don't know if the "preprocessor" would be tightly bound to the unmarshaller or have a chance to be exported too. We can keep exporting a quadratic check for "standard" values. > 5. nativeint, int32, and int64 are not supported yet (I would > suggest to make the documentation a bit more precise in that > point). As fix I use (in Check.check_block): > > | Tnativeint -> > tag = Obj.custom_tag && size = 2 && > ((Obj.field obj 0) == (Obj.field (Obj.repr Nativeint.zero) 0)) > | Tint32 -> > tag = Obj.custom_tag && size = 2 && > ((Obj.field obj 0) == (Obj.field (Obj.repr Int32.zero) 0)) > | Tint64 -> > tag = Obj.custom_tag && size = 3 && > ((Obj.field obj 0) == (Obj.field (Obj.repr Int64.zero) 0)) > > Any comments? On a 64 bit architecture the int64 size should be > required to be 2. let custom_int64_size = 1 + 64 / Sys.word_size;; I will add those cases. > I would strongly suggest to replace the catch all cases > > | _ -> false > > in check.ml with some more precise code (it took me several > hours of bug hunting until I found out that the above line > made my unmarshal fail without even looking at the > nativeints). Something like printing the type clash (looking for ty_xx, found ty_yy)? Or pretty-printing part of the value that have been checked before failure? > 6. Thanks for SafeUnmarshal, it helped me a lot when checking the > data created inside C++! This is quite an original use for unmarshalling functions :-) Many thanks for your reports and suggestions ! -- Grégoire Henry