From: Robert Fischer
Date: Sat, 31 May 2008 12:00:17 -0500
To: caml-list@inria.fr
Subject: Re: [Caml-list] picking / marshaling to strings in ocaml-revision-stable way
Message-ID: <48418421.5090601@fischerventure.com>
In-Reply-To: <28fa90930805310954w3089478bqfd8c3f821fff207e@mail.gmail.com>
References: <28fa90930805302343n18b98b17t72a22ea82539fc6f@mail.gmail.com> <20080531.174314.107716495.garrigue@math.nagoya-u.ac.jp> <28fa90930805310954w3089478bqfd8c3f821fff207e@mail.gmail.com>

How far is the Jane St S-exp library from producing JSON? I've not
actually looked at it, but that'd be super nifty in the interoperation
world.

~~ Robert.

Luca de Alfaro wrote:
> Thanks for this insight... I imagined the lack of robustness of Marshaling,
> but without all the details you mentioned!... actually, I DO desperately
> need speed, as I am processing TBs of Wikipedia data, but precisely because
> the datasets are so large, I cannot afford having to recompute / convert
> them often, and so I want a robust format. Furthermore, I think the
> bottleneck for me is anyway the speed of mysql and the disk, not really the
> small amount of time that natively compiled OCaml would take for the
> conversion (I anyway have to do more complex computation than converting a
> few lists and datatypes to ascii, unfortunately). Moreover, a plaintext
> format greatly helps debugging; it also helps that I can read the same data
> with other programming languages.
>
> Speaking of debugging, and said in passing, I cannot say enough how much I
> LOVE ocamldebug's ability to execute code backwards. It is such a
> revelation. You simply go to the error, then back off a bit to see how you
> got there. But, this is a topic for another thread.
>
> Many thanks,
>
> Luca
>
>
> On Sat, May 31, 2008 at 2:38 AM, Berke Durak wrote:
>
>> I second Luca's suggestion to use Sexplib. At the very least, use a
>> plaintext format.
>> Don't use Marshal for long-term storage of values. Avoid it if you
>> can. Been there, done that.
>> Why?
>>
>> (1) Not type-safe. Translation: your program *will segfault* and you
>> won't know why.
>> (2) Neither human-readable nor editable.
>> (3) Not future-proof. What happens if you change your type
>> definition? Your program
>> will segfault. So you'll have to migrate your data. But how? You'll
>> have to find
>> the exact revision used to generate the binary data. Good luck with
>> that. Did you put
>> a revision number in your data? Are you sure it was up-to-date? Then
>> you'll have to hand-write a converter that uses type declarations from
>> the old and the new modules.
>> I hope your dependencies are not too complex. Not fun *at all*.
>>
>> However, there are some situations where Marshal is appropriate:
>>
>> (1) Your data is not acyclic, contains closures, or needs sharing to
>> be compact enough. Sexplib doesn't handle these.
>> (2) The data won't live long anyway. As in: you're doing IPC between
>> known versions of OCaml programs.
>> (3) You desperately need speed. As in: you're processing 200GB of
>> Wikipedia data.
>> Then I can understand.
>> --
>> Berke Durak
>>
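
For what it's worth, here is a minimal sketch of the kind of versioned
plaintext format discussed above. It is not from the thread and it does not
use Sexplib; it relies only on Printf and Scanf from the standard library.
The record type `page`, the tag "page-v1", and the functions `to_line` and
`of_line` are all made up for illustration. The point is only that a version
tag plus a text encoding lets a reader reject data it does not understand,
rather than crash on it.

```ocaml
(* Hypothetical record type, standing in for real application data. *)
type page = { title : string; revisions : int list }

(* Hypothetical format tag; bump it whenever the layout changes. *)
let format_version = "page-v1"

(* Encode as one line: version tag, OCaml-quoted title, then the
   revision numbers separated by spaces. *)
let to_line (p : page) : string =
  Printf.sprintf "%s %S %s" format_version p.title
    (String.concat " " (List.map string_of_int p.revisions))

(* Decode a line. Bad or outdated input yields Error, not a crash. *)
let of_line (line : string) : (page, string) result =
  try
    Scanf.sscanf line "%s %S %[^\n]" (fun tag title rest ->
      if tag <> format_version then
        Error ("unknown format version: " ^ tag)
      else
        let revisions =
          String.split_on_char ' ' rest
          |> List.filter (fun s -> s <> "")
          |> List.map int_of_string
        in
        Ok { title; revisions })
  with
  | Scanf.Scan_failure msg | Failure msg -> Error msg
  | End_of_file -> Error "truncated input"
```

With something like this, `of_line (to_line p)` should give back `Ok p`, and
a line written under an older tag comes back as an `Error` that a converter
can dispatch on. A real project would more likely let Sexplib generate the
converters, as suggested above.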