Re: [Caml-list] Memory Mapped Files and OCaml

caml-list - the Caml user's mailing list

From: Basile Starynkevitch <basile.starynkevitch@inria.fr>
To: Richard Cole <rcole@itee.uq.edu.au>, caml-list@inria.fr
Subject: Re: [Caml-list] Memory Mapped Files and OCaml
Date: Mon, 22 Mar 2004 10:40:11 +0100
Message-ID: <20040322094010.GA25354@bourg.inria.fr>
In-Reply-To: <405E7404.3050306@itee.uq.edu.au>

On Mon, Mar 22, 2004 at 03:05:08PM +1000, Richard Cole wrote:

> I wonder if anyone can give me some pointers. I'm interested in having 
> all memory used by my ocaml program memory mapped so that calculations 
> can be preserved from one run of an ocaml program to the next. [...]

There are two separate issues here:

1. The first is memory-mapped files, i.e. an interface to the mmap(2) &
   munmap(2) system calls. This is provided in the Bigarray module
   (see http://caml.inria.fr/ocaml/htmlman/manual043.html for more)
   through functions like Bigarray.Array1.map_file etc.

2. The second issue is persistent data. Did you look at Persil on
   http://cristal.inria.fr/~starynke/persil/ which should provide what
   you need? Persil uses the internal marshalling primitives for
   serialisation.

> With the idea being that all values are stored in the memory mapped 
> files so put_value and get_value are very fast. Serialization is ok for 
> small data structures, but for 50M data structures, for which only a 
> small part of the data structure is accessed, it is a pain.

If the huge data structures contain some (potentially shared)
persistent data, you don't need to serialise the data itself but only
some persistent "pointer" to it. Internally, a persistent value in
Persil is a phantom type of only two integers: the store number, and
the value rank within this store, so serialising such a persistent
value is quite quick. This is what Persil does. So I'll bet that even
if you have a gigabyte of data, if it is organised as a "chunk" of
many medium (or small) sized persistent values, you won't have to
serialise 50Mbytes at once. Of course you'll need to explicitly code
with persistent values, and get & set operations on them (and also
transactions on persistent stores).
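
To make the "persistent pointer" idea concrete, here is a minimal
sketch in OCaml. It is not Persil's actual API: the names handle,
mem_store, put and get are hypothetical, and the store is only an
in-memory hash table of marshalled strings. It just illustrates why
marshalling a handle (two integers) stays cheap even when the value it
designates is large.

```ocaml
(* Hypothetical sketch, not Persil's API.
   'a is a phantom parameter: it records the type of the designated
   value but does not appear in the representation (two integers). *)
type 'a handle = { store_id : int; rank : int }

(* Toy store: marshalled values indexed by their rank. *)
type mem_store = {
  id : int;
  cells : (int, string) Hashtbl.t;
  mutable next_rank : int;
}

let create_store id = { id; cells = Hashtbl.create 16; next_rank = 0 }

(* Store a value and return a small typed handle to it. *)
let put (s : mem_store) (v : 'a) : 'a handle =
  let rank = s.next_rank in
  s.next_rank <- rank + 1;
  Hashtbl.replace s.cells rank (Marshal.to_string v []);
  { store_id = s.id; rank }

(* Fetch the value designated by a handle. The result type is trusted
   from the phantom parameter; the stored bytes are not checked. *)
let get (s : mem_store) (h : 'a handle) : 'a =
  if h.store_id <> s.id then invalid_arg "get: handle from another store";
  Marshal.from_string (Hashtbl.find s.cells h.rank) 0

(* A large value, and the handle that stands for it. *)
let big = Array.make 100_000 42
let st = create_store 1
let h : int array handle = put st big

(* Size in bytes of the marshalled handle versus the marshalled value. *)
let handle_bytes = String.length (Marshal.to_string h [])
let value_bytes = String.length (Marshal.to_string big [])
```

A structure that refers to big through h marshals only the handle; the
array itself is read back with get only when it is actually needed.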

> Of course program termination can take a long time if there are many 
> dirty pages that need to be synchronised to disk. There may be some way 
> to tell unix to sync dirty pages while the program is running but 
> without thrashing (i.e. using all system resources).

If you mean calling the msync(2) system call, I think that there is
currently no Ocaml interface to it. The madvise(2) system call might
also help. For reads, Linux also provides a Linux-specific readahead(2)
call. None of these three calls (msync, madvise, readahead) is
interfaced to Ocaml, but coding a C wrapper to call them from Ocaml
should be easy.

In Persil, updating the persistent store (if it was not done before)
is done at exit (using the Pervasives.at_exit function), unless it was
not done in a transactional manner. So you shouldn't lose your data.
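
A minimal sketch of the flush-at-exit pattern, assuming a toy store
kept in a Hashtbl and written to a temporary file. None of the names
below come from Persil; they are hypothetical. In current OCaml,
at_exit lives in Stdlib (it was Pervasives in 2004), and the
unqualified name works with both.

```ocaml
(* Hypothetical toy store; the file location is an assumption. *)
let path = Filename.temp_file "store" ".bin"
let table : (string, int) Hashtbl.t = Hashtbl.create 16
let dirty = ref false

(* Every update marks the store as needing a write-back. *)
let set key v =
  Hashtbl.replace table key v;
  dirty := true

(* Write the bindings to disk only if something changed. *)
let flush_store () =
  if !dirty then begin
    let bindings = Hashtbl.fold (fun k v acc -> (k, v) :: acc) table [] in
    let oc = open_out_bin path in
    Marshal.to_channel oc bindings [];
    close_out oc;
    dirty := false
  end

(* Read the bindings back from disk. *)
let load () : (string * int) list =
  let ic = open_in_bin path in
  let bindings = (Marshal.from_channel ic : (string * int) list) in
  close_in ic;
  bindings

(* Safety net: whatever is still dirty is written at program exit. *)
let () = at_exit flush_store

(* An explicit flush acts as a checkpoint; the exit hook then has
   nothing left to do. *)
let () = set "answer" 42
let () = flush_store ()
```

An at_exit hook does not run if the process is killed by a signal, so
this alone gives no crash safety.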

> 
> Such a persistent store does suffer from a lack of safety. i.e. killing 
> the process or the machine going down could leave the store in an 
> inconsistent state. If safety is required there must be algorithms 
> around to provide it in conjunction with a memory mapped file, perhaps 
> via checkpointing. Does referential transparency help us here?

Persil does provide some transaction mechanism, provided that the
underlying store (e.g. MySQL4) provides it. Persil also provides a
"generic" persistent store machinery (using functors).
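
As an illustration of what a functor-based "generic" store with
optional transactions might look like, here is a small sketch. The
signature BACKEND, the functor Make_store and the in-memory backend
are all hypothetical names chosen for this example; they are not
Persil's real interface.

```ocaml
(* Hypothetical backend signature: raw strings indexed by integers,
   plus commit/rollback (which may be no-ops in a weaker backend). *)
module type BACKEND = sig
  type t
  val create : unit -> t
  val read : t -> int -> string option
  val write : t -> int -> string -> unit
  val commit : t -> unit
  val rollback : t -> unit
end

(* In-memory backend: writes are buffered until commit. *)
module Mem_backend : BACKEND = struct
  type t = {
    committed : (int, string) Hashtbl.t;
    pending : (int, string) Hashtbl.t;
  }
  let create () =
    { committed = Hashtbl.create 16; pending = Hashtbl.create 16 }
  let read t k =
    match Hashtbl.find_opt t.pending k with
    | Some _ as v -> v
    | None -> Hashtbl.find_opt t.committed k
  let write t k v = Hashtbl.replace t.pending k v
  let commit t =
    Hashtbl.iter (Hashtbl.replace t.committed) t.pending;
    Hashtbl.reset t.pending
  let rollback t = Hashtbl.reset t.pending
end

(* Generic store: marshals values and delegates storage to B. *)
module Make_store (B : BACKEND) = struct
  type t = B.t
  let create = B.create
  let set (s : t) (key : int) (v : 'a) : unit =
    B.write s key (Marshal.to_string v [])
  (* The result type is not checked against the stored bytes. *)
  let get (s : t) (key : int) : 'a option =
    match B.read s key with
    | None -> None
    | Some str -> Some (Marshal.from_string str 0)
  (* Commit if f returns normally, roll back if it raises. *)
  let transaction (s : t) (f : unit -> 'a) : 'a =
    match f () with
    | r -> B.commit s; r
    | exception e -> B.rollback s; raise e
end

module S = Make_store (Mem_backend)
let s = S.create ()
let () = S.transaction s (fun () -> S.set s 1 "kept")
let () =
  try S.transaction s (fun () -> S.set s 2 "lost"; failwith "boom")
  with Failure _ -> ()
```

The store is only as transactional as the backend passed to the
functor: a backend whose commit and rollback do nothing still fits the
signature, but then the second transaction above would not be undone.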

> One final question: Are most people using database backends for 
> persistence? Is it the case that most data structures that one would 
> want to create in Ocaml programs map fairly easily into B-tree 
> structures, i.e. are maps or multimaps from a keyed domain into some 
> structured domain.

I think it depends upon the application. The main reasons for using a
database include

A. concurrency, and more generally ACID properties and
transactions. Persil does provide a transactional interface, if and
only if the underlying persistent store has one. I'm not sure that
memory mapping is enough here! And writing an ACID system from scratch
is a huge amount of work.

B. compatibility with other applications. If a database is accessed by
your Ocaml program and also by existing Perl or Java software, you
have to find a least common denominator (which might be SQL).

C. (closely related to B) compatibility with existing data. Usually,
big data is already available in some RDBMS, and you have to
handle it through the existing infrastructure.

Persil may use MySQL4 for point A (ACIDity & transactions), but the
persistent data is marshalled into a string.

________________

There are still some difficult issues (a little related, but mostly
orthogonal)

* functional values: marshalling functions (i.e. internally closures) is
difficult, and currently possible only within a single program which
does not change. This means that you cannot communicate a closure from
one program to another (even if it is inside the same compilation unit,
though in that case the runtime might perhaps be adapted to handle
this specific case). I tend to believe that functional values are not
needed in practice in persistent stores, but Ocaml objects have
functions inside them (internally, in their class descriptor or vtable
equivalent).

* data schema evolution: suppose you serialise records like 
   type person = { name: string; age: int }
  and later on, you want to change it to 
   type person = { name: string; age: int; mutable friends: person list }

Currently, the marshalling machinery does not permit such an
evolution. You cannot store a huge number of persons of the first type
and read them as persons of the latter type (with an empty friends list).

* generating encoding & decoding functions from the concrete type
descriptions (and, in case of abstract data types, from their module
signatures and more...). If all your types are concrete, a syntactic
approach like IoXML at http://cristal.inria.fr/~ddr/IoXML/ should
help.
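
The first two issues above can be seen in a few lines of OCaml. This
is only a sketch. The modules V1 and V2, the one-byte version tag and
the upgrade function are hypothetical conventions (neither Persil nor
the Marshal module provides them); they show one possible manual
workaround for schema evolution, not a general solution.

```ocaml
(* 1. Functional values: Marshal refuses closures unless the Closures
   flag is given, and even then the bytes are only meant to be read
   back by the very same program. *)
let can_marshal flags v =
  match Marshal.to_string v flags with
  | _ -> true
  | exception Invalid_argument _ -> false

let succ_fn = fun x -> x + 1
let plain_ok = can_marshal [] succ_fn
let closures_ok = can_marshal [Marshal.Closures] succ_fn

(* 2. Schema evolution: the two versions of the person type. *)
module V1 = struct
  type person = { name : string; age : int }
end

module V2 = struct
  type person = { name : string; age : int; mutable friends : person list }
end

(* Explicit migration: old persons get an empty friends list. *)
let upgrade (p : V1.person) : V2.person =
  { V2.name = p.V1.name; age = p.V1.age; friends = [] }

(* Assumed convention: one leading byte holds the schema version,
   followed by the marshalled record. *)
let save_v1 (p : V1.person) : string = "\001" ^ Marshal.to_string p []
let save_v2 (p : V2.person) : string = "\002" ^ Marshal.to_string p []

(* Dispatch on the version byte, then unmarshal from offset 1. *)
let load (s : string) : V2.person =
  match s.[0] with
  | '\001' -> upgrade (Marshal.from_string s 1 : V1.person)
  | '\002' -> (Marshal.from_string s 1 : V2.person)
  | c -> failwith (Printf.sprintf "unknown schema version %d" (Char.code c))

(* Data written with the old type is read back as the new type. *)
let old_bytes = save_v1 { V1.name = "Ada"; age = 36 }
let ada = load old_bytes
```

The version tag has to be written from the very first release; data
marshalled without it cannot be told apart afterwards, which is the
difficulty described above.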

________________

Richard Cole, I am extremely interested in your application's needs
regarding persistency. Could you give me (or the list) more details,
first forgetting the file mapping issue (which IMHO is just an
[important] implementation detail)?

Regards.
-- 
Basile STARYNKEVITCH -- basile dot starynkevitch at inria dot fr
Project cristal.inria.fr - phone +33 1 3963 5197 - mobile 6 8501 2359
http://cristal.inria.fr/~starynke --- all opinions are only mine 
