caml-list - the Caml user's mailing list
* [Caml-list] Memory mapped values
@ 2001-10-08 19:24 Don Syme
  2001-10-08 19:32 ` Basile STARYNKEVITCH
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Don Syme @ 2001-10-08 19:24 UTC (permalink / raw)
  To: caml-list


This is just a random idea...

Would it be possible in theory for "input_value" to work by
memory-mapping the file being read, rather than by immediately reading
the file?  The idea would be that the structured value would then only
actually be realised in physical memory as it is touched by execution
and the corresponding pages of the memory-mapped file dragged in by the
virtual memory mechanism.  (To be honest, I haven't actually checked if
this is how input_value currently works, though I'm certain it can't
be.)
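
For reference, here is a minimal sketch of the eager behaviour being
contrasted with memory mapping: output_value writes the whole value,
and input_value rebuilds all of it in the heap before returning.  The
names tree, save, load and roundtrip are illustrative only and do not
come from any library.

```ocaml
(* Round-trip a value through a file with output_value / input_value. *)
type tree = Leaf | Node of tree * int * tree

let save (path : string) (v : 'a) : unit =
  let oc = open_out_bin path in
  Fun.protect ~finally:(fun () -> close_out oc)
    (fun () -> output_value oc v)

(* input_value is not type-safe: the caller must annotate the result
   with the type that was actually written. *)
let load (path : string) : 'a =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic)
    (fun () -> input_value ic)

(* Writes a small tree to a temporary file and reads it back in full. *)
let roundtrip () : bool =
  let path = Filename.temp_file "value" ".bin" in
  let t = Node (Node (Leaf, 1, Leaf), 2, Leaf) in
  save path t;
  let (t' : tree) = load path in
  Sys.remove path;
  t = t'
```

Nothing here is lazy: by the time load returns, every block of the
value has been read from the file and allocated.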

This technique would certainly require some modification to the GC, and
I'm not even sure if the relocation of internal pointers in the data
structure could be made to work (do any memory mapping primitives
provide that functionality?).  But if it could work, then that could
make for one of the very best and easiest ways of persisting data
structures - easier than moving to a relational database, and directly
related to the programming model.  In addition, the layout of data
structures on disk could then be optimized to take into account the
access pattern at runtime.  With a page fault costing something on the
order of a million cycles these days, that could be very valuable...

Cheers,
Don

-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr



* [Caml-list] Memory mapped values
  2001-10-08 19:24 [Caml-list] Memory mapped values Don Syme
@ 2001-10-08 19:32 ` Basile STARYNKEVITCH
  2001-10-09  7:23 ` Fabrice Le Fessant
  2001-10-12 14:42 ` Xavier Leroy
  2 siblings, 0 replies; 4+ messages in thread
From: Basile STARYNKEVITCH @ 2001-10-08 19:32 UTC (permalink / raw)
  To: Don Syme; +Cc: caml-list

>>>>> "Don" == Don Syme <dsyme@microsoft.com> writes:
    Don> Would it be possible in theory for "input_value" to work by
    Don> memory-mapping the file being read, rather than by
    Don> immediately reading the file?  
[... interesting discussion skipped ....]

Perhaps, but I think that input_value is mostly useful for sequential
byte *streams* (not randomly accessed files) such as TCP/IP sockets.

I hope that input_value will still be able to work on sockets in
future versions of OCaml.

Regards to all
--
Basile STARYNKEVITCH           http://lesours.starynkevitch.net/
email: basile<at>starynkevitch<dot>net 
alias: basile<at>tunes<dot>org 
8, rue de la Faïencerie, 92340 Bourg La Reine, France



* Re: [Caml-list] Memory mapped values
  2001-10-08 19:24 [Caml-list] Memory mapped values Don Syme
  2001-10-08 19:32 ` Basile STARYNKEVITCH
@ 2001-10-09  7:23 ` Fabrice Le Fessant
  2001-10-12 14:42 ` Xavier Leroy
  2 siblings, 0 replies; 4+ messages in thread
From: Fabrice Le Fessant @ 2001-10-09  7:23 UTC (permalink / raw)
  To: Don Syme; +Cc: caml-list


In the CDK, you will find a very small library, "mmap", that is close to
what you are talking about. The idea is to output the values to the file
as if they were in memory, so that the file can be mapped directly into
memory and the values used directly by OCaml. The library has not yet
been tested much. Of course, these values cannot be collected by the
garbage collector, nor should they be mutable. However, there is a big
(as yet unsolved) problem with compaction. Another problem is the size of
the page bitmap used by the garbage collector, since the file might
be mapped very far from the main heap.

Regards,

- Fabrice



* Re: [Caml-list] Memory mapped values
  2001-10-08 19:24 [Caml-list] Memory mapped values Don Syme
  2001-10-08 19:32 ` Basile STARYNKEVITCH
  2001-10-09  7:23 ` Fabrice Le Fessant
@ 2001-10-12 14:42 ` Xavier Leroy
  2 siblings, 0 replies; 4+ messages in thread
From: Xavier Leroy @ 2001-10-12 14:42 UTC (permalink / raw)
  To: Don Syme; +Cc: caml-list

> Would it be possible in theory for "input_value" to work by
> memory-mapping the file being read, rather than by immediately reading
> the file?  The idea would be that the structured value would then only
> actually be realised in physical memory as it is touched by execution
> and the corresponding pages of the memory-mapped file dragged in by the
> virtual memory mechanism.  (To be honest, I haven't actually checked if
> this is how input_value currently works, though I'm certain it can't
> be.)

No, that's not how input_value currently works :-)

What you describe sounds feasible, with two caveats:

- You need a serialization format that is "isomorphic" to the memory
representation of the data, i.e. that occupies the same space.  The
original Caml Light implementation of serialization used such a
format: the on-disk representation was essentially produced by a
copying GC applied to the value being externed, and input_value would
just read it into the heap and replace offsets with pointers.

There were two problems with this approach.  One is 32/64 bit
interoperability, where you need to expand or shrink the data
accordingly during input_value; this is expensive and would prevent
direct access to a page as you describe.  The other is that this
serialization format wastes space, resulting in huge files that are
slow to read.  The "compact" format that OCaml uses (basically, a
prefix notation for the DAG of memory blocks composing the externed
value) is much more compact (by a factor of 10, roughly), and while it
takes more CPU time to do input_value, this is well offset by the
reduced file reading time.

- You need to relocate offsets into pointers when a page is first
accessed.  Under Unix, this could possibly be done by mapping the file
without read and write access, then catch the segmentation violation
that occurs when one of the pages is accessed, patch the pointers, and
change the page protections to read-write.  All this is highly
non-portable and quite slow, though.  (I think it's Appel and Li that
tried VM tricks to implement concurrent copying GC in the late 80s;
they found out later that the cost of changing page permissions is so
high under all Unix implementations they tested that the scheme was
impractical.) 
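
The second caveat can be illustrated with a toy model.  This is only a
sketch under loud assumptions: there is no real mmap, mprotect or
signal handling here, addresses are counted in words, and the
relocated flag array merely stands in for the page protection that the
operating system would enforce.  All names (word, image, make_image,
relocate_page, read, demo) are hypothetical.

```ocaml
(* Toy model only: no real mmap, mprotect or signal handling is involved. *)
type word =
  | Imm of int   (* immediate value, needs no relocation *)
  | Off of int   (* on-disk form: offset from the start of the image *)
  | Ptr of int   (* in-memory form: absolute address *)

type image = {
  base : int;                (* hypothetical address the file is mapped at *)
  page_words : int;          (* words per page *)
  words : word array;        (* contents of the mapping *)
  relocated : bool array;    (* per page: stands in for the page protection *)
  mutable faults : int;      (* number of simulated first-touch traps *)
}

let make_image ~base ~page_words (words : word array) : image =
  let pages = (Array.length words + page_words - 1) / page_words in
  { base; page_words; words = Array.copy words;
    relocated = Array.make pages false; faults = 0 }

(* What the fault handler would do: patch every offset on one page. *)
let relocate_page (img : image) (page : int) : unit =
  let first = page * img.page_words in
  let last = min (Array.length img.words) (first + img.page_words) - 1 in
  for i = first to last do
    match img.words.(i) with
    | Off o -> img.words.(i) <- Ptr (img.base + o)
    | Imm _ | Ptr _ -> ()
  done;
  img.relocated.(page) <- true

(* Every access goes through the check that the MMU would perform. *)
let read (img : image) (i : int) : word =
  let page = i / img.page_words in
  if not img.relocated.(page) then begin
    img.faults <- img.faults + 1;
    relocate_page img page
  end;
  img.words.(i)

(* Touch one word on page 0 and leave page 1 untouched. *)
let demo () : word * int * word =
  let img = make_image ~base:0x1000 ~page_words:4
      [| Imm 7; Off 5; Imm 0; Imm 0;  Imm 1; Imm 42; Off 0; Imm 0 |] in
  let w = read img 1 in
  (w, img.faults, img.words.(6))
```

In demo, reading word 1 triggers one simulated fault and relocates
page 0 only; the offset stored on page 1 is still in its on-disk form.
In the real scheme each such first touch would cost a trap plus a
change of page protection.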

Because of this cost issue, your scheme would be interesting only if
the program accesses a small fragment of the memory-mapped data.  If
you're going to use all of the data, reading it in one step is more
efficient (it saves the cost of trapping SEGV and changing page
protections).
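
On the space point made in the first caveat, the difference between
the heap layout and the marshalled format can be observed directly.
The sketch below assumes the usual representation of a list of ints
(one header word plus two fields per cons cell); the function names
are illustrative, and the actual ratio depends heavily on the data.

```ocaml
(* Bytes taken by [n] cons cells in the heap: each cell is one header
   word plus two fields. *)
let heap_bytes_of_int_list (n : int) : int =
  3 * n * (Sys.word_size / 8)

(* Bytes taken by a list in OCaml's marshalled format, header included. *)
let marshalled_bytes (l : int list) : int =
  String.length (Marshal.to_string l [])

(* Returns (heap size, marshalled size) for the list [0; 1; ...; n-1]. *)
let sizes (n : int) : int * int =
  let l = List.init n (fun i -> i) in
  (heap_bytes_of_int_list n, marshalled_bytes l)
```

For a list of a thousand small integers the marshalled form comes out
several times smaller than the heap form, which is the trade-off
described above: less to read from disk, more CPU work to rebuild.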

> But if it could work, then that could
> make for one of the very best and easiest ways of persisting data
> structures - easier than moving to a relational database, and directly
> related to the programming model.

I'm pretty ignorant about databases, but what you describe is still
vaguely reminiscent of some OO databases (ObjectStore, maybe?).  Two
issues remain to be addressed, though: how to modify the data structure
incrementally (modifying it in core and re-dumping it whole to disk
doesn't suffice), and how to deal with atomicity of updates...
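
To make the atomicity issue concrete, here is a sketch of the simplest
approach: dump the whole value to a temporary file, then rename it over
the old one.  It covers only whole-file replacement, so it does nothing
for incremental modification.  The name save_atomically and the ".tmp"
suffix are assumptions for illustration, and the standard library
offers no fsync, so durability after a crash is not guaranteed.

```ocaml
(* Replace [path] with a fresh dump of [v].  Readers see either the old
   file or the new one, never a partial write, provided the rename is
   atomic on the underlying file system (as it is on POSIX systems). *)
let save_atomically (path : string) (v : 'a) : unit =
  let tmp = path ^ ".tmp" in
  let oc = open_out_bin tmp in
  (try
     output_value oc v;
     close_out oc
   with e ->
     (* Clean up the partial temporary file before re-raising. *)
     close_out_noerr oc;
     (try Sys.remove tmp with Sys_error _ -> ());
     raise e);
  Sys.rename tmp path
```

The cost of each update is proportional to the size of the whole
value, which is exactly why re-dumping does not scale to large
persistent structures.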

Best wishes,

- Xavier Leroy


