caml-list - the Caml user's mailing list
* appending data to a mmap-ed file
@ 2010-12-16 11:31 Joel Reymont
  2010-12-16 12:38 ` [Caml-list] " Jesper Louis Andersen
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Joel Reymont @ 2010-12-16 11:31 UTC
  To: caml-list

I'm constantly appending to a file of stock quotes (ints, longs, doubles, etc.). I have this file mapped into memory with mmap. 

What's the most efficient way to make newly appended data available as part of the memory mapping?

Obligatory OCaml content: I'm trying to prototype a trading system in OCaml.
 
	Thanks, Joel

---
http://twitter.com/wagerlabs



* Re: [Caml-list] appending data to a mmap-ed file
  2010-12-16 11:31 appending data to a mmap-ed file Joel Reymont
@ 2010-12-16 12:38 ` Jesper Louis Andersen
  2010-12-16 13:13   ` Joel Reymont
  2010-12-16 12:57 ` Gerd Stolpmann
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Jesper Louis Andersen @ 2010-12-16 12:38 UTC
  To: Joel Reymont; +Cc: caml-list

On Thu, Dec 16, 2010 at 12:31, Joel Reymont <joelr1@gmail.com> wrote:
> I'm constantly appending to a file of stock quotes (ints, longs, doubles, etc.). I have this file mapped into memory with mmap.

OK, this helps clarify what you are trying to do (you asked almost
the same question on the Erlang mailing list, but the details of
getting a foothold for the same thing in Erlang are subtly different).

My approach would be simple, based on noting that you have two kinds
of data and one peculiar behaviour:
  * "Newly generated data"
  * "Old data for archeology"
  * Data are almost never deleted

So:
  * If the data is smaller than some threshold (preferably no more than
a couple of PAGE_SIZE pages), keep it in memory and serve it from
there. Simply have an OCaml array of bytes or something similar to
store the data in (my knowledge of OCaml's data representation is not
up to par at the moment, but arrange it so that the byte array has a
C representation underneath; I know that OCaml strings have this).
This is the newly generated data.
  * Once in a while, write(2) this string to the file on disk, then
reopen the mmap() (which, as a consequence, is now read-only; there
might be sharing tricks to play here should you go multi-process).
  * Lookup is handled by checking whether the data is archeology or
recent, and then making the appropriate lookup. Hide all of this
behind a module.
  * You can tune the threshold at which data is written to disk. Too
large, and you risk losing too much data on failure. Too small, and
the approach dies of syscall overhead.
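A minimal OCaml sketch of the batching scheme described in the list above, under stated assumptions: each quote is a fixed 16-byte record (an int64 timestamp followed by a float price), and `record_size`, `flush_threshold`, `append`, `flush_batch` and `get` are hypothetical names. To stay dependency-free, archived records are read back with ordinary positioned reads rather than through a read-only mapping; in the scheme as described, the archive would instead be re-mapped (for example with `Unix.map_file`) after each flush.

```ocaml
(* Sketch: recent quotes live in memory; older ones are appended to a file
   in batches. Assumed record layout: 8-byte little-endian timestamp
   followed by an 8-byte float price. *)

let record_size = 16

type store = {
  path : string;
  flush_threshold : int;   (* flush once this many records are buffered *)
  recent : Buffer.t;       (* newly generated records, not yet on disk *)
  mutable archived : int;  (* number of records already written to disk *)
}

let create ?(flush_threshold = 256) path =
  close_out (open_out_bin path);  (* start from an empty archive file *)
  { path; flush_threshold;
    recent = Buffer.create (flush_threshold * record_size);
    archived = 0 }

let buffered s = Buffer.length s.recent / record_size

(* Write the whole in-memory batch to the end of the file in one go. *)
let flush_batch s =
  if Buffer.length s.recent > 0 then begin
    let oc = open_out_gen [Open_wronly; Open_append; Open_binary] 0o644 s.path in
    Buffer.output_buffer oc s.recent;
    close_out oc;
    s.archived <- s.archived + buffered s;
    Buffer.clear s.recent
  end

let append s ~ts ~price =
  let b = Bytes.create record_size in
  Bytes.set_int64_le b 0 ts;
  Bytes.set_int64_le b 8 (Int64.bits_of_float price);
  Buffer.add_bytes s.recent b;
  if buffered s >= s.flush_threshold then flush_batch s

let decode b =
  (Bytes.get_int64_le b 0, Int64.float_of_bits (Bytes.get_int64_le b 8))

(* Lookup: archived records come from the file, recent ones from memory. *)
let get s i =
  if i < 0 || i >= s.archived + buffered s then invalid_arg "get";
  let b = Bytes.create record_size in
  if i >= s.archived then
    Buffer.blit s.recent ((i - s.archived) * record_size) b 0 record_size
  else begin
    let ic = open_in_bin s.path in
    seek_in ic (i * record_size);
    really_input ic b 0 record_size;
    close_in ic
  end;
  decode b
```

Indices below `archived` are served from the file and the rest from the in-memory `Buffer.t`, so a crash loses only the records still in the buffer (fewer than `flush_threshold`).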

You may have additional constraints, so spill them, please.


-- 
J.



* Re: [Caml-list] appending data to a mmap-ed file
  2010-12-16 11:31 appending data to a mmap-ed file Joel Reymont
  2010-12-16 12:38 ` [Caml-list] " Jesper Louis Andersen
@ 2010-12-16 12:57 ` Gerd Stolpmann
  2010-12-16 17:16 ` Richard W.M. Jones
  2010-12-18  9:56 ` Christophe Raffalli
  3 siblings, 0 replies; 10+ messages in thread
From: Gerd Stolpmann @ 2010-12-16 12:57 UTC
  To: Joel Reymont; +Cc: caml-list

On Thursday, 16.12.2010, at 11:31 +0000, Joel Reymont wrote:
> I'm constantly appending to a file of stock quotes (ints, longs, doubles, etc.). I have this file mapped into memory with mmap. 
> 
> What's the most efficient way to make newly appended data available as part of the memory mapping?

Generally, you can only unmap the old mapping and map the file again. I
don't know whether you have an OS preference: Linux also has mremap, and
it would at least be possible to create a binding for it. mremap is
non-portable, though.

If you are free to define the file format yourself, I'd suggest using
some kind of container format, so that you can extend the file in larger
chunks; the mapping then only needs to be re-established when you need
a new chunk.
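One way to read the container suggestion, sketched in OCaml: the file and its mapping grow in whole chunks, so the expensive remap happens only when an append crosses a chunk boundary. The `remap` callback is a hypothetical stand-in for extending the file and mapping it again (for example `Unix.ftruncate` followed by `Unix.map_file`), and the chunk size is an assumption to be tuned.

```ocaml
(* Sketch: track how much of the file is mapped versus actually used, and
   only "remap" (via a caller-supplied callback) when a reservation would
   run past the end of the current mapping. *)

type region = {
  chunk : int;            (* growth granularity in bytes *)
  mutable mapped : int;   (* bytes currently covered by the mapping *)
  mutable used : int;     (* bytes actually filled with data *)
  remap : int -> unit;    (* hypothetical: extend file + map [size] bytes *)
}

(* Round [n] up to the next multiple of [chunk]. *)
let round_up n chunk = (n + chunk - 1) / chunk * chunk

let create ~chunk ~remap = { chunk; mapped = 0; used = 0; remap }

(* Reserve [len] bytes at the end of the data; returns the offset to write
   at. The mapping is only re-established when a new chunk is needed. *)
let reserve r len =
  let off = r.used in
  let needed = off + len in
  if needed > r.mapped then begin
    let size = round_up needed r.chunk in
    r.remap size;
    r.mapped <- size
  end;
  r.used <- needed;
  off
```

With a 4096-byte chunk and 16-byte records, 1000 appends trigger four remaps rather than one per record; a larger chunk reduces this further at the cost of reserving more unused space.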

Gerd

> 
> Obligatory OCaml content: I'm trying to prototype a trading system in OCaml.
>  
> 	Thanks, Joel
> 
> ---
> http://twitter.com/wagerlabs
> 


-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------



* Re: [Caml-list] appending data to a mmap-ed file
  2010-12-16 12:38 ` [Caml-list] " Jesper Louis Andersen
@ 2010-12-16 13:13   ` Joel Reymont
  0 siblings, 0 replies; 10+ messages in thread
From: Joel Reymont @ 2010-12-16 13:13 UTC
  To: Jesper Louis Andersen; +Cc: caml-list

Jesper,

On Dec 16, 2010, at 12:38 PM, Jesper Louis Andersen wrote:

> Simply have an Ocaml array of bytes or something such to
> store data into (my Ocaml representation specific knowledge is not up
> to par at the moment, but arrange it such that the byte-array has
> C-representation underneath.

The data comes from a C++ thread running a market data feed. I was planning to manage the memory-mapped file in the same thread and only notify OCaml when new data is available, after extending the memory mapping. It seems to be the simplest approach to me. 

I also need to launch a C++ thread from OCaml and talk to it, but that's probably a subject for a new mailing list thread.

	Thanks, Joel

---
http://twitter.com/wagerlabs



* Re: [Caml-list] appending data to a mmap-ed file
  2010-12-16 11:31 appending data to a mmap-ed file Joel Reymont
  2010-12-16 12:38 ` [Caml-list] " Jesper Louis Andersen
  2010-12-16 12:57 ` Gerd Stolpmann
@ 2010-12-16 17:16 ` Richard W.M. Jones
  2010-12-17  0:36   ` Goswin von Brederlow
  2010-12-18  9:56 ` Christophe Raffalli
  3 siblings, 1 reply; 10+ messages in thread
From: Richard W.M. Jones @ 2010-12-16 17:16 UTC
  To: Joel Reymont; +Cc: caml-list

On Thu, Dec 16, 2010 at 11:31:16AM +0000, Joel Reymont wrote:
> I'm constantly appending to a file of stock quotes (ints, longs,
> doubles, etc.). I have this file mapped into memory with mmap.
>
> What's the most efficient way to make newly appended data available
> as part of the memory mapping?

Unfortunately it's hard to reliably extend an mmap'd area.  The reason
is not that you can't do it, but that you might overrun another memory
mapping after it, where that other mapping could be something
important like your program or a shared library.  The other mapping
might not even be present at the time you initially map your file, but
might appear as the result of an innocuous operation such as printing
a string or allocating memory.

Now you can, with a bunch of work, avoid this: parse /proc/self/maps,
select a suitable base address for your mapping, move the mapping if
it gets too large for the selected area or if another library is
mapped in above it, and so on. But this quickly gets very difficult.

I would suggest that a simpler way to solve your problem is to open
the data file and append to it.  If you need to reference the values,
keep them in in-memory structures.
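A sketch of this simpler approach, assuming a text file with one quote per line and a hash table holding the latest price per symbol (the `quote` and `journal` types and the function names are hypothetical): the file is only ever appended to through a buffered channel, and nothing is mapped.

```ocaml
(* Sketch: append every quote to a plain file, keep lookups in memory. *)

type quote = { symbol : string; price : float }

type journal = {
  oc : out_channel;                     (* append-only data file *)
  latest : (string, float) Hashtbl.t;   (* in-memory view for lookups *)
}

let open_journal path =
  let oc =
    open_out_gen [Open_wronly; Open_append; Open_creat; Open_binary] 0o644 path
  in
  { oc; latest = Hashtbl.create 64 }

(* One line per quote; %.17g prints enough digits to read the float back
   exactly. The in-memory table is updated alongside the file. *)
let record j q =
  Printf.fprintf j.oc "%s %.17g\n" q.symbol q.price;
  Hashtbl.replace j.latest q.symbol q.price

let latest j symbol = Hashtbl.find_opt j.latest symbol

let close_journal j = close_out j.oc
```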

Rich.

-- 
Richard Jones
Red Hat



* Re: [Caml-list] appending data to a mmap-ed file
  2010-12-16 17:16 ` Richard W.M. Jones
@ 2010-12-17  0:36   ` Goswin von Brederlow
  2010-12-17 14:48     ` Richard W.M. Jones
  0 siblings, 1 reply; 10+ messages in thread
From: Goswin von Brederlow @ 2010-12-17  0:36 UTC
  To: Richard W.M. Jones; +Cc: Joel Reymont, caml-list

"Richard W.M. Jones" <rich@annexia.org> writes:

> On Thu, Dec 16, 2010 at 11:31:16AM +0000, Joel Reymont wrote:
>> I'm constantly appending to a file of stock quotes (ints, longs,
>> doubles, etc.). I have this file mapped into memory with mmap.
>>
>> What's the most efficient way to make newly appended data available
>> as part of the memory mapping?
>
> Unfortunately it's hard to reliably extend an mmap'd area.  The reason
> is not that you can't do it, but that you might overrun another memory
> mapping after it, where that other mapping could be something
> important like your program or a shared library.  The other mapping
> might not even be present at the time you initially map your file, but
> might appear as the result of an innocuous operation such as printing
> a string or allocating memory.
>
> Now you can, with a bunch of work, avoid this: parse /proc/self/maps,
> select a suitable base address for your mapping, move the mapping if
> it gets too large for the selected area or if another library is
> mapped in above it, etc. but this quickly gets very difficult.
>
> I would suggest a simpler way to solve your problem is simply to open
> the data file and append to it.  If you need to reference the values,
> keep them in memory structures.
>
> Rich.

Or avoid the whole issue and make the file large enough to begin
with. Thanks to sparse files, you can create a huge file that uses only
one block on disk. You can then mmap it, and it will automatically use
more disk space as you fill in data.
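A minimal sketch of reserving such a file from OCaml with only the standard library: seeking past the end and writing a single byte leaves a hole, which most Unix filesystems store sparsely. `reserve_sparse` is a hypothetical helper name, and whether the hole actually stays unallocated depends on the filesystem.

```ocaml
(* Sketch: give [path] a logical size of [size] bytes by writing one zero
   byte at offset [size - 1]. Everything before it is a hole that reads
   back as zeros and, on most Unix filesystems, takes no disk blocks. *)
let reserve_sparse path size =
  if size <= 0 then invalid_arg "reserve_sparse";
  let oc = open_out_gen [Open_wronly; Open_creat; Open_binary] 0o644 path in
  seek_out oc (size - 1);
  output_char oc '\000';
  close_out oc
```

The reserved file can then be mapped once at its full size (for instance with `Unix.map_file`). Because the unfilled tail reads back as zero bytes, readers need some separate record of how much data is valid, such as a count kept in a header.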

MfG    Goswin



* Re: [Caml-list] appending data to a mmap-ed file
  2010-12-17  0:36   ` Goswin von Brederlow
@ 2010-12-17 14:48     ` Richard W.M. Jones
  2010-12-17 15:49       ` Joel Reymont
  2010-12-17 19:05       ` Goswin von Brederlow
  0 siblings, 2 replies; 10+ messages in thread
From: Richard W.M. Jones @ 2010-12-17 14:48 UTC
  Cc: caml-list

On Fri, Dec 17, 2010 at 01:36:35AM +0100, Goswin von Brederlow wrote:
> Or avoid the whole issue and make the file large enough to begin
> with. Thanks to sparse files you can create a huge file that only uses 1
> block on disk. Then you can mmap that and it will use up more disk space
> as you fill in data automatically.

Sure, if you have an upper limit.  Neither approach works well on
32-bit architectures, where contiguous free address space is really
limited.

Rich.

-- 
Richard Jones
Red Hat



* Re: [Caml-list] appending data to a mmap-ed file
  2010-12-17 14:48     ` Richard W.M. Jones
@ 2010-12-17 15:49       ` Joel Reymont
  2010-12-17 19:05       ` Goswin von Brederlow
  1 sibling, 0 replies; 10+ messages in thread
From: Joel Reymont @ 2010-12-17 15:49 UTC
  To: Richard W.M. Jones; +Cc: caml-list

I'm fine with focusing on 64-bit architectures. 

Sent from my iPhone

On 17/12/2010, at 14:48, "Richard W.M. Jones" <rich@annexia.org> wrote:

> On Fri, Dec 17, 2010 at 01:36:35AM +0100, Goswin von Brederlow wrote:
>> Or avoid the whole issue and make the file large enough to begin
>> with. Thanks to sparse files you can create a huge file that only uses 1
>> block on disk. Then you can mmap that and it will use up more disk space
>> as you fill in data automatically.
> 
> Sure, if you have an upper limit.  Neither works well on 32 bit
> architectures where you're really limited for contiguous free address
> space.
> 
> Rich.
> 
> -- 
> Richard Jones
> Red Hat
> 



* Re: [Caml-list] appending data to a mmap-ed file
  2010-12-17 14:48     ` Richard W.M. Jones
  2010-12-17 15:49       ` Joel Reymont
@ 2010-12-17 19:05       ` Goswin von Brederlow
  1 sibling, 0 replies; 10+ messages in thread
From: Goswin von Brederlow @ 2010-12-17 19:05 UTC
  To: Richard W.M. Jones; +Cc: caml-list

"Richard W.M. Jones" <rich@annexia.org> writes:

> On Fri, Dec 17, 2010 at 01:36:35AM +0100, Goswin von Brederlow wrote:
>> Or avoid the whole issue and make the file large enough to begin
>> with. Thanks to sparse files you can create a huge file that only uses 1
>> block on disk. Then you can mmap that and it will use up more disk space
>> as you fill in data automatically.
>
> Sure, if you have an upper limit.  Neither works well on 32 bit
> architectures where you're really limited for contiguous free address
> space.
>
> Rich.

That has nothing to do with appending to an mmap-ed file. You are
already in trouble if the file is too big to begin with.

If you need more than 1-3 GB mapped on 32-bit, then you need to map it
dynamically in chunks as needed, which gets a hell of a lot more
complex. Frankly, at that point I would just buy a 64-bit CPU or install
a 64-bit kernel on the one you have.
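A sketch of mapping "dynamically in chunks", with one fixed-size window of the file mapped at a time. `map_window` is a hypothetical callback standing in for mapping `window` bytes at a given file offset (for instance through the `?pos` argument of `Unix.map_file`); the window size is assumed to be a multiple of the page size, since mmap offsets must be page-aligned. Records straddling a window boundary are not handled, so fixed-size records whose size divides the window evenly are assumed.

```ocaml
(* Sketch: keep a single window of the file mapped and remap only when an
   access falls outside it. ['w] is whatever the mapping callback returns
   (in practice a Bigarray over the window). *)

type 'w windowed = {
  window : int;                         (* bytes per window *)
  map_window : int -> 'w;               (* file offset -> mapped window *)
  mutable current : (int * 'w) option;  (* start offset and its mapping *)
}

let create ~window ~map_window =
  if window <= 0 then invalid_arg "create";
  { window; map_window; current = None }

(* Return the mapping covering file offset [off] and the position of [off]
   inside that mapping, remapping only when the window changes. *)
let locate t off =
  if off < 0 then invalid_arg "locate";
  let start = off / t.window * t.window in
  let w =
    match t.current with
    | Some (s, w) when s = start -> w
    | _ ->
        let w = t.map_window start in
        t.current <- Some (start, w);
        w
  in
  (w, off - start)
```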

MfG
        Goswin



* Re: [Caml-list] appending data to a mmap-ed file
  2010-12-16 11:31 appending data to a mmap-ed file Joel Reymont
                   ` (2 preceding siblings ...)
  2010-12-16 17:16 ` Richard W.M. Jones
@ 2010-12-18  9:56 ` Christophe Raffalli
  3 siblings, 0 replies; 10+ messages in thread
From: Christophe Raffalli @ 2010-12-18  9:56 UTC
  To: caml-list



On 16/12/10 12:31, Joel Reymont wrote:
> I'm constantly appending to a file of stock quotes (ints, longs, doubles, etc.). I have this file mapped into memory with mmap. 
>
> What's the most efficient way to make newly appended data available as part of the memory mapping?
>
> Obligatory OCaml content: I'm trying to prototype a trading system in OCaml.
>  
> 	Thanks, Joel

Um: mremap? Even if it's not very portable...

It can fail, especially if you disallow address changes.

I would access all mmapped data through an indirection and let mremap
move the data. On systems without mremap, I would unmap and mmap again.
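A sketch of that indirection: callers hold a handle rather than the mapped array itself, so the mapping can be replaced or moved without invalidating them. Here `grow` allocates a larger Bigarray and copies the old contents, standing in for mremap or for unmap followed by mmap; the `handle` type and the function names are hypothetical.

```ocaml
(* Sketch: all access goes through [handle.data], which [grow] may replace
   with a bigger array at a different address. *)
open Bigarray

type handle = {
  mutable data : (float, float64_elt, c_layout) Array1.t;
}

let create n = { data = Array1.create float64 c_layout n }

(* Stand-in for a remap that may move the data: allocate a larger array,
   copy the existing prefix, and swap it into the handle. *)
let grow h n =
  let old = h.data in
  if n > Array1.dim old then begin
    let fresh = Array1.create float64 c_layout n in
    Array1.blit old (Array1.sub fresh 0 (Array1.dim old));
    h.data <- fresh
  end

(* Accessors dereference the handle on every call, so they always see the
   current array even after it has moved. *)
let get h i = Array1.get h.data i
let set h i v = Array1.set h.data i v
```

A closure such as `get h`, created before a `grow`, keeps working afterwards, because it captures the handle rather than the old array.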

By the way, if the file is large, you could mmap only part of it, on
demand.

Cheers,
Christophe
> ---
> http://twitter.com/wagerlabs
>


-- 
Christophe Raffalli
Universite de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex

tel: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------




Thread overview: 10+ messages
2010-12-16 11:31 appending data to a mmap-ed file Joel Reymont
2010-12-16 12:38 ` [Caml-list] " Jesper Louis Andersen
2010-12-16 13:13   ` Joel Reymont
2010-12-16 12:57 ` Gerd Stolpmann
2010-12-16 17:16 ` Richard W.M. Jones
2010-12-17  0:36   ` Goswin von Brederlow
2010-12-17 14:48     ` Richard W.M. Jones
2010-12-17 15:49       ` Joel Reymont
2010-12-17 19:05       ` Goswin von Brederlow
2010-12-18  9:56 ` Christophe Raffalli
