caml-list - the Caml user's mailing list
From: "Luca de Alfaro" <luca@dealfaro.org>
To: "Berke Durak" <berke.durak@gmail.com>
Cc: "Jacques Garrigue" <garrigue@math.nagoya-u.ac.jp>, caml-list@inria.fr
Subject: Re: [Caml-list] picking / marshaling to strings in ocaml-revision-stable way
Date: Sat, 31 May 2008 09:54:52 -0700
Message-ID: <28fa90930805310954w3089478bqfd8c3f821fff207e@mail.gmail.com>
In-Reply-To: <b903a8570805310238u239c60c9sf85d2475e6635ad4@mail.gmail.com>

Thanks for this insight.  I suspected that Marshal was not robust, but I
did not know all the details you mentioned!  Actually, I DO desperately
need speed, as I am processing terabytes of Wikipedia data; but precisely
because the datasets are so large, I cannot afford to recompute or convert
them often, so I want a robust format.  Furthermore, I think the bottleneck
for me is the speed of MySQL and the disk anyway, not the small amount of
time that natively compiled OCaml would take for the conversion
(unfortunately, I have to do more complex computation than converting a
few lists and datatypes to ASCII anyway).  Moreover, a plaintext format
greatly helps debugging; it also helps that I can read the same data with
other programming languages.
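
To make the plaintext idea concrete, here is a minimal sketch using only
the standard library.  The record type and its field names are
hypothetical, chosen for illustration; a real setup would more likely use
Sexplib, as suggested in this thread.  Each record becomes one
tab-separated line, and String.escaped keeps tabs and newlines inside a
field from colliding with the separators.

```ocaml
(* Hypothetical record; the field names are illustrative only. *)
type revision = { id : int; author : string; words : string list }

(* One record per line: id, author, then the words, separated by tabs.
   String.escaped rewrites tabs, newlines and backslashes inside a field
   as backslash escapes, so a raw tab is always a separator. *)
let to_line (r : revision) : string =
  String.concat "\t"
    (string_of_int r.id
     :: String.escaped r.author
     :: List.map String.escaped r.words)

(* Parse a line written by to_line; return None on malformed input
   instead of crashing. *)
let of_line (line : string) : revision option =
  match String.split_on_char '\t' line with
  | id :: author :: words ->
      (try
         match int_of_string_opt id with
         | None -> None
         | Some id ->
             Some { id;
                    author = Scanf.unescaped author;
                    words = List.map Scanf.unescaped words }
       with Scanf.Scan_failure _ | Failure _ | End_of_file -> None)
  | _ -> None
```

Bad input yields None rather than a segfault, and the lines can be
inspected with ordinary text tools or read from another language.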

Speaking of debugging, and said in passing, I cannot say enough how much I
LOVE ocamldebug's ability to execute code backwards.  It is such a
revelation.  You simply go to the error, then back up a bit to see how you
got there.  But this is a topic for another thread.

Many thanks,

Luca


On Sat, May 31, 2008 at 2:38 AM, Berke Durak <berke.durak@gmail.com> wrote:

> I second Luca's suggestion to use Sexplib.  At the very least, use a
> plaintext format.
> Don't use Marshal for long-term storage of values.  Avoid it if you
> can.  Been there, done that.
> Why?
>
> (1) Not type-safe.  Translation: your program *will segfault* and you
> won't know why.
> (2) Not human-readable nor editable.
> (3) Not future-proof.  What happens if you change your type definition?
> Your program will segfault.  So you'll have to migrate your data.  But
> how?  You'll have to find the exact revision used to generate the binary
> data.  Good luck with that.  Did you put a revision number in your data?
> Are you sure it was up-to-date?  Then you'll have to hand-write a
> converter that uses type declarations from the old and the new modules.
> I hope your dependencies are not too complex.  Not fun *at all*.
>
> However, there are some situations where Marshal is appropriate:
>
> (1) Your data is not acyclic, contains closures, or needs sharing to
> be compact enough.  Sexplib doesn't handle these.
> (2) The data won't live long anyway.  As in: you're doing IPC between
> known versions of OCaml programs.
> (3) You desperately need speed.  As in: you're processing 200GB of
> Wikipedia data.
> Then I can understand.
> --
> Berke Durak
>
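
On the quoted point about putting a revision number in the data, here is
a minimal sketch of what that could look like where Marshal is still
used.  The tag string, the payload type and the function names are all
assumptions made for illustration.  The tag is stored as a plain-text
line ahead of the marshaled bytes, so it can be checked before anything
is unmarshaled.

```ocaml
(* Hypothetical format tag; bump it whenever the payload type changes. *)
let format_version = "revisions-v1"

(* Hypothetical payload type, for illustration only. *)
type payload = (int * string) list

(* Store the tag as a plain-text first line, then the marshaled bytes. *)
let save (v : payload) : string =
  format_version ^ "\n" ^ Marshal.to_string v []

(* Refuse data whose tag does not match the current one. *)
let load (s : string) : payload option =
  match String.index_opt s '\n' with
  | Some i when String.sub s 0 i = format_version ->
      (* The type annotation is a promise, not a check: Marshal does not
         verify that the bytes really encode a value of this type. *)
      Some (Marshal.from_string s (i + 1) : payload)
  | _ -> None
```

This only catches the mistakes that the tag encodes: if the type changes
and the tag is not bumped, the failure mode described above remains.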
