[Caml-list] Memory Mapped Files and OCaml

caml-list - the Caml user's mailing list

* [Caml-list] Memory Mapped Files and OCaml
@ 2004-03-22  5:05 Richard Cole
  2004-03-22  9:40 ` Basile Starynkevitch
  2004-03-23  7:21 ` Morphed at little into Grid Computing " Vasili Galchin
  0 siblings, 2 replies; 3+ messages in thread
From: Richard Cole @ 2004-03-22  5:05 UTC (permalink / raw)
  To: caml-list

Hi,

I wonder if anyone can give me some pointers. I'm interested in having 
all memory used by my ocaml program memory mapped so that calculations 
can be preserved from one run of an ocaml program to the next. I'm 
thinking of something like:

let empty_list : (string * string) list = [] in
(* persistent lookup in the proposed store *)
let user_list = Store.get_value "user_list" empty_list in
let user : string = Cgi.get_value "user" in
let password : string = Cgi.get_value "password" in
  if exists user user_list then
    if auth user password user_list then
      print_account_details user
    else
      (* same action as on success; a login-failure page presumably belongs here *)
      print_account_details user
  else
    begin
      (* persistent update in the proposed store *)
      Store.put_value "user_list" (add_user user password user_list);
      print_account_details user
    end

With the idea being that all values are stored in the memory mapped 
files so put_value and get_value are very fast. Serialization is ok for 
small data structures, but for 50M data structures, for which only a 
small part of the data structure is accessed, it is a pain.

The idea is that put_value and get_value cause structures to survive to 
the uppermost scope level and so these are not garbage collected before 
program termination since they are still referenced. Everything else 
gets garbage collected and so doesn't clog up the persistent store.

Of course program termination can take a long time if there are many 
dirty pages that need to be synchronised to disk. There may be some way 
to tell unix to sync dirty pages while the program is running but 
without thrashing (i.e. using all system resources).

Such a persistent store does suffer from a lack of safety. i.e. killing 
the process or the machine going down could leave the store in an 
inconsistent state. If safety is required there must be algorithms 
around to provide it in conjunction with a memory mapped file, perhaps 
via checkpointing. Does referential transparency help us here?

One final question: Are most people using database backends for 
persistence? Is it the case that most data structures that one would 
want to create in Ocaml programs map fairly easily into B-tree 
structures, i.e. are maps or multimaps from a keyed domain into some 
structured domain?

best regards,

Richard.


* Re: [Caml-list] Memory Mapped Files and OCaml
  2004-03-22  5:05 [Caml-list] Memory Mapped Files and OCaml Richard Cole
@ 2004-03-22  9:40 ` Basile Starynkevitch
  2004-03-23  7:21 ` Morphed at little into Grid Computing " Vasili Galchin
  1 sibling, 0 replies; 3+ messages in thread
From: Basile Starynkevitch @ 2004-03-22  9:40 UTC (permalink / raw)
  To: Richard Cole, caml-list

On Mon, Mar 22, 2004 at 03:05:08PM +1000, Richard Cole wrote:

> I wonder if anyone can give me some pointers. I'm interested in having 
> all memory used by my ocaml program memory mapped so that calculations 
> can be preserved from one run of an ocaml program to the next. [...]

There are two separate issues here:

1. first is memory mapped files, i.e. an interface to the mmap(2) &
   munmap(2) system calls. This is provided in the Bigarray module
   (see http://caml.inria.fr/ocaml/htmlman/manual043.html for more)
   thru functions like Bigarray.Array1.map_file etc...

2. second issue is persistent data. Did you look at Persil on
   http://cristal.inria.fr/~starynke/persil/ which should provide what
   you need? Persil is using the internal marshalling primitives for
   serialisation.
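
A minimal sketch of point 1, under some assumptions: more recent OCaml releases expose the mapping call as Unix.map_file (in the unix library) rather than as Bigarray.Array1.map_file, and that is the function used here. The helper names with_counter_file and bump are made up for illustration. The file holds one 64-bit counter that survives from one run to the next.

```ocaml
(* Sketch only; needs the unix library, e.g.
   ocamlfind ocamlopt -package unix -linkpkg counter.ml *)

(* Map the first 8 bytes of [path] as a one-element int64 array and
   pass it to [f]. With shared = true the file is grown if it is too
   short, and writes to the array go to the file. *)
let with_counter_file path f =
  let fd = Unix.openfile path [Unix.O_RDWR; Unix.O_CREAT] 0o644 in
  Fun.protect
    ~finally:(fun () -> Unix.close fd)
    (fun () ->
      let ga =
        Unix.map_file fd Bigarray.int64 Bigarray.c_layout true [| 1 |]
      in
      f (Bigarray.array1_of_genarray ga))

(* Increment the persistent counter and return its new value. *)
let bump path =
  with_counter_file path (fun a ->
      let n = Int64.succ a.{0} in
      a.{0} <- n;
      n)
```

Calling bump on the same file in successive runs returns increasing values. Note that only flat numeric data can live in a Bigarray; ordinary OCaml heap values (lists, records, closures) cannot be placed in the mapping this way.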

> With the idea being that all values are stored in the memory mapped 
> files so put_value and get_value are very fast. Serialization is ok for 
> small data structures, but for 50M data structures, for which only a 
> small part of the data structure is accessed, it is a pain.

If the huge data structures contain some (potentially shared)
persistent data, you don't need to serialise the data itself but only
some persistent "pointer" to it. This is what Persil does: internally,
a persistent value in Persil is a phantom type over only two integers,
the store number and the value rank within this store, so serialising
such a persistent value is quite quick. So I'll bet that even if you
have a gigabyte of data, if it is organised as a "chunk" of many
medium (or small) sized persistent values, you won't have to serialise
50Mbytes at once. Of course you'll need to explicitly code with
persistent values, and get & set operations on them (and also
transactions on persistent stores).
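
To make the "persistent pointer" idea concrete, here is a small sketch. It is not Persil's actual API: the names pref, Store, put, get and set are hypothetical, and the store is just an in-memory table of marshalled strings standing in for a real backend. The point it illustrates is that a record holding a handle marshals to a few bytes, however large the value behind the handle is.

```ocaml
(* Hypothetical sketch, not Persil's real interface. *)

(* A handle is two integers; 'a is a phantom parameter recording the
   type of the value the handle designates. *)
type 'a pref = { store : int; rank : int }

module Store = struct
  type t = {
    id : int;
    slots : (int, string) Hashtbl.t;  (* rank -> marshalled value *)
    mutable next : int;
  }

  let create id = { id; slots = Hashtbl.create 16; next = 0 }

  (* Marshal [v] into a fresh slot and return a typed handle to it. *)
  let put (s : t) (v : 'a) : 'a pref =
    let rank = s.next in
    s.next <- rank + 1;
    Hashtbl.replace s.slots rank (Marshal.to_string v []);
    { store = s.id; rank }

  (* Unmarshal the value a handle designates. *)
  let get (s : t) (r : 'a pref) : 'a =
    assert (r.store = s.id);
    Marshal.from_string (Hashtbl.find s.slots r.rank) 0

  (* Overwrite the value behind an existing handle. *)
  let set (s : t) (r : 'a pref) (v : 'a) : unit =
    assert (r.store = s.id);
    Hashtbl.replace s.slots r.rank (Marshal.to_string v [])
end

let st = Store.create 0

(* A large value is stored once... *)
let big : int array pref = Store.put st (Array.make 100_000 7)

(* ...and other data refers to it through the small handle. *)
type user = { name : string; history : int array pref }

let u = { name = "richard"; history = big }

(* Marshalling [u] writes the name and two integers, not the array. *)
let u_bytes = Marshal.to_string u []
```

The price, as said above, is that the program has to call get and set explicitly instead of following ordinary pointers.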

> Of course program termination can take a long time if there are many 
> dirty pages that need to be synchronised to disk. There may be some way 
> to tell unix to sync dirty pages while the program is running but 
> without thrashing (i.e. using all system resources).

If you mean calling the msync(2) system call, I think that there is
currently no Ocaml interface to it. The madvise(2) system call might
also help. For reads, Linux also provides a Linux specific readahead(2)
call. All three calls (msync, madvise, readahead) are not interfaced
to Ocaml, but coding the C wrapper to call them from Ocaml should be
easy.

In Persil, updating the persistent store (if it was not done before)
is done at exit (using the Pervasives.at_exit function), unless it was
not done in a transactional manner. So you shouldn't lose your data.
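
The flush-at-exit pattern can be sketched as follows. This is an illustration of the general idea only, not Persil's code: the table cache, the dirty flag and the functions put, flush_to and flush_at_exit are all hypothetical names, and the "store" is a plain file written with Marshal.

```ocaml
(* Hypothetical sketch of "write the store back when the program exits". *)

let cache : (string, string) Hashtbl.t = Hashtbl.create 16

(* Set whenever the cache holds changes not yet written to disk. *)
let dirty = ref false

let put key value =
  Hashtbl.replace cache key value;
  dirty := true

(* Write the whole cache to [path] if it changed since the last flush. *)
let flush_to path =
  if !dirty then begin
    let oc = open_out_bin path in
    Fun.protect
      ~finally:(fun () -> close_out oc)
      (fun () -> Marshal.to_channel oc cache []);
    dirty := false
  end

(* Register the flush so that it also runs at normal program exit. *)
let flush_at_exit path = at_exit (fun () -> flush_to path)
```

A program would call flush_at_exit once at start-up. Functions registered with at_exit run on normal termination and on an uncaught exception, but not when the process is killed by a signal, so this alone does not address the safety question.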

> 
> Such a persistent store does suffer from a lack of safety. i.e. killing 
> the process or the machine going down could leave the store in an 
> inconsistent state. If safety is required there must be algorithms 
> around to provide it in conjunction with a memory mapped file, perhaps 
> via checkpointing. Does referential transparency help us here?

Persil does provide some transaction mechanism, provided that the
underlying store (eg MySQL4) provides it. Persil also provides a
"generic" persistent store machinery (using functors).

> One final question: Are most people using database backends for 
> persistence? Is it the case that most data structures that one would 
> want to create in Ocaml programs map fairly easily into B-tree 
> structures, i.e. are maps or multimaps from a keyed domain into some 
> structured domain.

I think it depends upon the application. The main reasons for using a
database include

A. concurrency, and more generally ACID properties and
transactions. Persil does provide a transactional interface, if and
only if the underlying persistent store has them. I'm not sure that
memory mapping is enough here! And writing an ACID system from scratch
is a huge amount of work.

B. compatibility with other applications. If a database is accessed by
your Ocaml program and also by an existing Perl or Java software, you
have to find a least common denominator.... (which might be SQL...)

C. (closely related to B) compatibility with existing data. Usually,
big data is already available in some RDBMS system, and you have to
handle it thru the existing infrastructure.

Persil may use MySQL4 for point A (ACIDity & transactions), but the
persistent data is marshalled into a string.
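
For a single program that does not need full ACID guarantees, the checkpointing idea from the original question can be approximated with plain files. The sketch below is an assumption-laden illustration, not something taken from Persil: save_checkpoint and load_checkpoint are hypothetical names, and the technique is the usual "write a temporary file, then rename it over the old one".

```ocaml
(* Hypothetical checkpoint helpers built on Marshal and Sys.rename. *)

(* Write [v] to a temporary file in the same directory as [path], then
   rename it into place. Readers see either the old checkpoint or the
   new one, never a half-written file. *)
let save_checkpoint path v =
  let dir = Filename.dirname path in
  let tmp = Filename.temp_file ~temp_dir:dir "ckpt" ".tmp" in
  let oc = open_out_bin tmp in
  (try
     Marshal.to_channel oc v [];
     close_out oc
   with e ->
     close_out_noerr oc;
     Sys.remove tmp;
     raise e);
  Sys.rename tmp path

(* Read the last checkpoint, or return [default] if there is none.
   The result type is not checked: the caller must annotate it with
   the type that was saved. *)
let load_checkpoint path default =
  if Sys.file_exists path then begin
    let ic = open_in_bin path in
    Fun.protect
      ~finally:(fun () -> close_in ic)
      (fun () -> Marshal.from_channel ic)
  end
  else default
```

This protects against a killed process leaving a truncated store. It does not by itself protect against a machine crash, because nothing here forces the data to disk before the rename, and it rewrites the whole value on every checkpoint, which is exactly the cost the original question wants to avoid for 50M structures.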

________________

There are still some difficult issues (a little related, but mostly
orthogonal)

* functional values: marshalling functions (ie internally closures) is
difficult, and only currently possible within a single program which
does not change. This means that you cannot communicate a closure from
one program to another (even if it is inside the same compilation unit
- but in that case, the runtime might perhaps be adapted to handle
this specific case). I tend to believe that functional values are not
needed in practice in persistent stores, but Ocaml objects have
functions inside them (internally, in their class descriptor or vtable
equivalent).

* data schema evolution: suppose you serialise records like 
   type person = { name: string; age: int }
  and later on, you want to change it to 
   type person = { name: string; age: int; mutable friends: person list }

Currently, the marshalling machinery does not permit such an
evolution. You cannot store a huge number of persons of the first type
and read them as persons of the latter type (with an empty friends list).

* generating encoding & decoding functions from the concrete type
descriptions (and, in case of abstract data types, from their module
signatures and more...). If all your types are concrete, a syntactic
approach like IoXML at http://cristal.inria.fr/~ddr/IoXML/ should
help.
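
One common workaround for the schema evolution problem, sketched here as an assumption rather than as anything the marshalling machinery provides: wrap every stored value in an explicit version tag and write the upgrade function by hand. The modules V1 and V2, the type stored and the functions encode, upgrade and decode are hypothetical names. The sketch only helps if the tag was written from the start; data already marshalled without a tag cannot be recognised afterwards.

```ocaml
(* Hypothetical versioned encoding for the person example. *)

module V1 = struct
  type person = { name : string; age : int }
end

module V2 = struct
  type person = { name : string; age : int; mutable friends : person list }
end

(* Every stored value carries its schema version. New versions must be
   appended at the end so that existing constructors keep their tags. *)
type stored =
  | Person_v1 of V1.person
  | Person_v2 of V2.person

let encode (s : stored) : string = Marshal.to_string s []

(* Hand-written migration: an old person gets an empty friends list. *)
let upgrade : stored -> V2.person = function
  | Person_v1 p -> { V2.name = p.V1.name; age = p.V1.age; friends = [] }
  | Person_v2 p -> p

(* Read a value of any known version and return the current one. *)
let decode (bytes : string) : V2.person =
  upgrade (Marshal.from_string bytes 0 : stored)
```

With this in place, persons written as Person_v1 by an old program are read back by a new program as V2.person values with no friends.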

________________

Richard Cole, I am extremely interested in your application's needs
regarding persistence. Could you give me (or the list) more details,
first forgetting the file mapping issue (which IMHO is just an
[important] implementation detail)?

Regards.
-- 
Basile STARYNKEVITCH -- basile dot starynkevitch at inria dot fr
Project cristal.inria.fr - phone +33 1 3963 5197 - mobile 6 8501 2359
http://cristal.inria.fr/~starynke --- all opinions are only mine 

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


* Morphed at little into Grid Computing Re: [Caml-list] Memory Mapped Files and OCaml
  2004-03-22  5:05 [Caml-list] Memory Mapped Files and OCaml Richard Cole
  2004-03-22  9:40 ` Basile Starynkevitch
@ 2004-03-23  7:21 ` Vasili Galchin
  1 sibling, 0 replies; 3+ messages in thread
From: Vasili Galchin @ 2004-03-23  7:21 UTC (permalink / raw)
  To: Richard Cole, caml-list; +Cc: vasiliocaml

Hello,

    Richard's question is IMO a very good one. I want to
go a step further, however. I remember two research
efforts in the past:

1) Shiva project - in Scheme, I believe, done at Bell
Labs (I am almost 100% sure there was an ACM TOPLAS paper
on this project)

2) The Tube - a PhD dissertation project done at
Cambridge U.  .... IMO ... very, very nice  (caveat it
is a prototype proof-of-concept), but IMO very cool
ideas ....
http://citeseer.ist.psu.edu/cache/papers/cs/5552/http:zSzzSzwww.cl.cam.ac.ukzSzuserszSzdah28zSzdah28ths.pdf/halls97applying.pdf/.


Shiva would call call-cc to capture a computation as a
continuation, serialize the continuation and then this
serialized continuation could be "faulted" across to
another computation node for execution. This allows
"mobile" computations (The Tube is very, very
similar!), i.e. that move to other computation nodes
say for fault tolerant reasons, load balancing, etc.
Of course, the fact that we can serialize implies (at
least to me) that we can have a notion of persistence
which Richard seems to be alluding to. yes?

    Frankly in the O'Reilly book when I read that an
entire "closure" could not be faulted across to
another computation node, I was very disappointed. 

    - I realize that this faulting doesn't come for
free in terms of network traffic.

    - on the other hand, in grid computing as it now
stands, there is (IMO) a serious lack of "mobile"
computations (http://www.globus.org)!
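
The closure limitation mentioned above can be observed directly with the standard Marshal module. This is a small sketch with hypothetical names (try_marshal, incr_fn): by default a functional value is refused, and the Marshal.Closures flag allows it only by recording a code address, which is why the result is meaningful only to the very same program.

```ocaml
(* Sketch: what Marshal does with a function. *)

(* Try to marshal [f]; report the error message instead of raising. *)
let try_marshal flags (f : int -> int) =
  match Marshal.to_string f flags with
  | s -> Ok s
  | exception Invalid_argument msg -> Error msg

let incr_fn = fun x -> x + 1

(* Without the Closures flag, marshalling a function is rejected. *)
let plain = try_marshal [] incr_fn

(* With the flag, the closure is written as a code pointer plus its
   environment. The bytes are only usable by the same program, so they
   cannot be shipped to a different program on another node. *)
let with_flag = try_marshal [ Marshal.Closures ] incr_fn
```

Here plain is an Error. Moving a computation to another node therefore needs either the identical executable on both sides or a representation of the computation that does not rely on code addresses.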

Regards, vasili 

