caml-list - the Caml user's mailing list
From: Basile Starynkevitch <basile.starynkevitch@inria.fr>
To: Sebastien Ferre <sbf@aber.ac.uk>, caml-list@inria.fr
Subject: Re: [Caml-list] swapping large data structures from/to files
Date: Thu, 8 Apr 2004 17:57:32 +0200
Message-ID: <20040408155732.GA24949@bourg.inria.fr>
In-Reply-To: <407566DF.3040402@aber.ac.uk>

On Thu, Apr 08, 2004 at 02:51:11PM +0000, Sebastien Ferre wrote:

> I am interested in handling data structures so large
> that they don't fit in main memory.

I am curious - what are your huge DAGs? (bio-informatics
applications??)

> I need 2 things:

> 1. Persistency of the data structure, preferably in
> a file (similarly to NDBM, say).

Did you look into Persil on my home page (see my sig)? It provides
persistence either in small segmented files (which works reasonably
for small data, since the whole file gets copied at the end of the
process) or with MySQL4. If you need it, I could add another persistent
store (but I think that using a transactional database with Persil is
much better for big persistent data).

The most important issue is: do you need some kind of transaction
mechanism?  I could write a better file-based persistent store iff
you don't need [nested] transactions (with commit & abort ability)!

You might also use Bigarrays, which can be mapped to files.
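
A minimal sketch of the Bigarray route, assuming a recent OCaml where
the mapping function is Unix.map_file (older releases exposed it as
Bigarray.Array1.map_file) and the unix library is linked. The helper
name with_mapped_int64s and the int64 element type are illustrative
assumptions, not anything from Persil:

```ocaml
(* Requires the unix library, e.g. `ocamlfind ocamlopt -package unix ...`. *)

(* Map the first [n] 64-bit integers of the file [path] into memory and
   pass the resulting Bigarray to [f]. The mapping is shared, so writes
   go to the file; a file that is too short is grown to fit. *)
let with_mapped_int64s path n f =
  let fd = Unix.openfile path [ Unix.O_RDWR; Unix.O_CREAT ] 0o600 in
  Fun.protect
    ~finally:(fun () -> Unix.close fd)
    (fun () ->
      let shared = true in
      let ga =
        Unix.map_file fd Bigarray.int64 Bigarray.c_layout shared [| n |]
      in
      f (Bigarray.array1_of_genarray ga))
```

The mapped data lives outside the OCaml heap, so paging it in and out
is left to the operating system; only flat numeric data fits this
scheme, so a DAG would have to be encoded as indices into such arrays.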

> 2. Customized swapping strategy of elements of the data
> structure, which should be more efficient than
> virtual memory.

I'm not sure I fully understand your point. Persil does give the
ability to unload & reload persistent values on (explicit) demand.

Are you willing to state explicitly in your application (through
appropriate calls) which values you won't need any more? Or do you
want the system to guess them by itself?
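
To make the "explicit calls" option concrete, here is a minimal sketch
using only the standard Marshal module. It is not Persil's API; the
slot type and the functions make, unload and get are hypothetical
names chosen for illustration:

```ocaml
(* A slot holds a value either in memory or in a file written by Marshal. *)
type 'a slot = { path : string; mutable cached : 'a option }

(* Create a slot whose value starts out in memory. *)
let make path v = { path; cached = Some v }

(* Explicit "I won't need this value for now": write it to its file
   and drop the in-memory copy so the GC can reclaim it. *)
let unload s =
  match s.cached with
  | None -> ()
  | Some v ->
    let oc = open_out_bin s.path in
    Fun.protect
      ~finally:(fun () -> close_out oc)
      (fun () -> Marshal.to_channel oc v []);
    s.cached <- None

(* Return the value, reading it back from the file if it was unloaded. *)
let get s =
  match s.cached with
  | Some v -> v
  | None ->
    let ic = open_in_bin s.path in
    let v : 'a =
      Fun.protect
        ~finally:(fun () -> close_in ic)
        (fun () -> Marshal.from_channel ic)
    in
    s.cached <- Some v;
    v
```

Note that Marshal writes everything reachable from the value, so in a
DAG stored this way each node should refer to its children by some key
(for instance a file name or an integer) rather than by direct pointer;
otherwise unloading one node would copy its whole sub-DAG.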

(For completeness: you can give hints to the VM system with the madvise
system call, but that won't work with OCaml, because values may be
moved by the GC.)

> 
> Typically, my data structure is a DAG, and I wish to
> keep in memory only a limited amount of nodes at a time.

Is schema evolution a concern for you? I.e., if you change the types
implementing your DAG, how do you deal with the huge persistent data
in that case? (Persil does not handle this issue, since it uses the
Marshal module, whose output is not checked against the types that
read it back.)
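
One common way to at least detect a schema change with Marshal is to
write a version tag in front of the payload. The sketch below assumes
every file is written by this save function; the names save and load
and the ~version strings are hypothetical, and nothing here is part of
Persil:

```ocaml
(* Write a version tag, then the value itself, as two marshalled items. *)
let save ~version path v =
  let oc = open_out_bin path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () ->
      Marshal.to_channel oc (version : string) [];
      Marshal.to_channel oc v [])

(* Read the tag first. Return [None] when the file was written under a
   different version, so the caller can run a migration instead of
   unmarshalling data at the wrong type. *)
let load ~version path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      let tag : string = Marshal.from_channel ic in
      if tag = version then Some (Marshal.from_channel ic) else None)
```

The tag only detects a mismatch; converting old data still requires
keeping the old type definitions around and writing the migration by
hand.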


> Hence the necessity for swapping. It is also important
> to have as much as possible in memory, and not merely
> accessing the file, for efficiency reasons.

> Has anything been done in this direction?
> The library Dbm is fine for me for the persistency,
> but it does not work on every platform :-(.
> (Would Dbm be difficult to rewrite in OCaml?)

I think that there are quite portable versions of Dbm (or BSD DB).

> Sébastien Ferré

(You can answer me in French if you wish; if you CC the list, let's
continue in English)


-- 
Basile STARYNKEVITCH -- basile dot starynkevitch at inria dot fr
Project cristal.inria.fr - phone +33 1 3963 5197 - mobile 6 8501 2359
http://cristal.inria.fr/~starynke --- all opinions are only mine 

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

