caml-list - the Caml user's mailing list
From: "David McClain" <dmcclain1@mindspring.com>
To: "caml" <caml-list@inria.fr>
Subject: Re: [Caml-list] Bigarray is a pig
Date: Sun, 25 Jul 2004 02:09:42 -0700
Message-ID: <000c01c47227$226c7240$0201a8c0@dylan>
In-Reply-To: <1090663185.15206.52.camel@qrnik>

Here is a case where even paying the price of a function call, no matter how
indirect, is well worth it...

I just finished an implementation of memory-mapped files for binary array
access to scientific datasets stored on disk. The implementation is in C++,
with pseudo-pointers that make the usage look natural. Every array access
has to check whether the position is currently mapped into memory. If it
is, the access simply returns that address for get or set. If it is not,
the dirty pages are synced to disk, another segment of the file is remapped
into memory, and the access continues as before.
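
In outline, the scheme looks something like the following OCaml sketch
(illustrative only, not the actual implementation: it leans on
Unix.map_file from modern OCaml rather than my C++ pseudo-pointers, and
the window record and helper names are invented for the example):

    let window_words = 1 lsl 20    (* 1M longwords (4 MB) per mapped window *)

    type window = {
      fd : Unix.file_descr;
      mutable base : int;          (* element index where the window starts *)
      mutable map  : (int32, Bigarray.int32_elt, Bigarray.c_layout)
                     Bigarray.Array1.t;
    }

    (* Remap so that element [i] falls inside the window.  The old
       mapping is unmapped when the GC finalizes it, at which point the
       kernel writes any dirty pages back to the file. *)
    let remap w i =
      let base = i - (i mod window_words) in
      let g = Unix.map_file w.fd ~pos:(Int64.of_int (base * 4))
                Bigarray.int32 Bigarray.c_layout true [| window_words |] in
      w.base <- base;
      w.map  <- Bigarray.array1_of_genarray g

    let get w i =
      if i < w.base || i >= w.base + window_words then remap w i;
      w.map.{i - w.base}

    let set w i v =
      if i < w.base || i >= w.base + window_words then remap w i;
      w.map.{i - w.base} <- v

    let open_window path =
      let fd = Unix.openfile path [ Unix.O_RDWR; Unix.O_CREAT ] 0o644 in
      let w = { fd; base = 0;
                map = Bigarray.Array1.create Bigarray.int32
                        Bigarray.c_layout 0 } in
      remap w 0;
      w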

I am in the process of writing an OCaml interface to this as we speak. But
is it worth doing? My tests show that for pure sequential writing, the
memory-mapped I/O is about 3.5 times as fast as buffered I/O with
fwrite(). That's nice... but for pseudo-random access, where I march
sequentially upward in memory and write each address along with the two
surrounding addresses at +/- 4 KB, which is similar to some scientific
array access patterns, the memory-mapped I/O is about 200 times as fast as
buffered I/O.
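
The randomized pattern, again as a sketch rather than the actual benchmark
code (the loop bounds are a guess at the details; it uses the windowed set
from above):

    (* Roughly the randomized pattern described: march upward, writing
       each longword together with its neighbours at +/- 4 KB, i.e.
       1024 int32 elements away.  24 MB of longwords is 6M elements. *)
    let walk w =
      let n   = 6 * 1024 * 1024 in   (* 24 MB, one longword at a time *)
      let off = 4096 / 4 in          (* +/- 4 KB, in longwords *)
      for i = off to n - off - 1 do
        set w i (Int32.of_int i);
        set w (i - off) 0l;
        set w (i + off) 0l
      done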

Of course, I chose that 4 KB offset deliberately, because it both matches
the page frame size, which causes some stumbling in the memory-mapped I/O,
and happens to be more or less the standard buffer size for fwrite. I'm
writing 24 MB of data overall, one longword at a time.

The effective throughput is about 72 MB/sec for sequential access and 100
MB/sec for randomized access, compared with 20 MB/sec sequential and 0.5
MB/sec randomized for fwrite-buffered I/O.
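
For contrast, a sketch of the buffered baseline: the same walk through an
ordinary OCaml out_channel standing in for the C fwrite() test. Every jump
forces a seek_out, which flushes the channel's buffer, which is presumably
where the 0.5 MB/sec comes from:

    let walk_buffered oc =
      let n    = 6 * 1024 * 1024 in
      let off  = 4096 / 4 in
      let word = Bytes.make 4 '\000' in
      let write_at pos =
        seek_out oc (pos * 4);       (* seeking flushes the channel buffer *)
        output_bytes oc word
      in
      for i = off to n - off - 1 do
        write_at i;
        write_at (i - off);
        write_at (i + off)
      done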

Disks are slow. File systems are slower yet. By letting the Mach kernel
handle the I/O directly on page faults, I end up squeezing a lot more
performance from the data handling system. This is still orders of magnitude
slower than my effective computation rate, and so the cost of all the bounds
checking and subroutine calling is lost in the noise.

[Tests were performed on a stock 1.25 GHz G4 eMac].

David McClain
Senior Corporate Scientist
Avisere, Inc.

+1.520.390.7738 (USA)
david.mcclain@avisere.com




