If you're willing to sacrifice readability for speed and compactness,
you might want to consider Jane Street's bin-prot library as well...

Yaron Minsky

On May 31, 2008, at 12:54 PM, Luca de Alfaro wrote:

> Thanks for this insight... I imagined the lack of robustness of
> Marshaling, but without all the details you mentioned!... Actually,
> I DO desperately need speed, as I am processing TBs of Wikipedia
> data, but precisely because the datasets are so large, I cannot
> afford having to recompute / convert them often, and so I want a
> robust format. Furthermore, I think the bottleneck for me is anyway
> the speed of MySQL and the disk, not really the small amount of time
> that natively compiled OCaml would take for the conversion (I have
> to do more complex computation than converting a few lists and
> datatypes to ASCII anyway, unfortunately). Moreover, a plaintext
> format greatly helps debugging; it also helps that I can read the
> same data with other programming languages.
>
> Speaking of debugging, and said in passing, I cannot say enough how
> much I LOVE ocamldebug's ability to execute code backwards.
> It is such a revelation. You simply go to the error, then back off
> a bit to see how you got there. But this is a topic for another
> thread.
>
> Many thanks,
>
> Luca
>
>
> On Sat, May 31, 2008 at 2:38 AM, Berke Durak wrote:
> I second Luca's suggestion to use Sexplib. At the very least, use a
> plaintext format.
> Don't use Marshal for long-term storage of values. Avoid it if you
> can. Been there, done that.
> Why?
>
> (1) Not type-safe. Translation: your program *will segfault* and you
> won't know why.
> (2) Neither human-readable nor editable.
> (3) Not future-proof. What happens if you change your type
> definition? Your program will segfault. So you'll have to migrate
> your data. But how? You'll have to find the exact revision used to
> generate the binary data. Good luck with that. Did you put a
> revision number in your data? Are you sure it was up-to-date? Then
> you'll have to hand-write a converter that uses type declarations
> from the old and the new modules. I hope your dependencies are not
> too complex. Not fun *at all*.
>
> However, there are some situations where Marshal is appropriate:
>
> (1) Your data is not acyclic, contains closures, or needs sharing to
> be compact enough. Sexplib doesn't handle these.
> (2) The data won't live long anyway. As in: you're doing IPC between
> known versions of OCaml programs.
> (3) You desperately need speed. As in: you're processing 200GB of
> Wikipedia data. Then I can understand.
> --
> Berke Durak
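
To illustrate the "did you put a revision number in your data?" point, here is a minimal sketch of tagging Marshal output with a format version and checking that tag before unmarshalling. The `page` type, `format_version`, `save` and `load` are all hypothetical names invented for this example; none of them come from the thread or from any library. Only the standard library is used. Note that this does not make Marshal type-safe: it only catches mismatches that the writer remembered to label by bumping the version.

```ocaml
(* Hypothetical record type; stands in for whatever is being stored. *)
type page = { title : string; revisions : int list }

(* Hypothetical format version; bump it whenever [page] changes. *)
let format_version = 2

(* Write the version as a decimal first line, then the Marshal payload. *)
let save (p : page) : string =
  string_of_int format_version ^ "\n" ^ Marshal.to_string p []

(* Inspect the version line first; only unmarshal when it matches.
   Marshal.from_string itself performs no type check, which is why
   the guard has to run before it. *)
let load (s : string) : (page, string) result =
  match String.index_opt s '\n' with
  | None -> Error "missing version header"
  | Some i ->
    (match int_of_string_opt (String.sub s 0 i) with
     | Some v when v = format_version ->
       Ok (Marshal.from_string s (i + 1) : page)
     | Some v -> Error (Printf.sprintf "unsupported format version %d" v)
     | None -> Error "malformed version header")
```

With a guard like this, data written under an older version is rejected with an error message instead of being reinterpreted under the new type definition. Writing the converter from the old type to the new one is still manual work, as described above.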