caml-list - the Caml user's mailing list
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Anil Madhavapeddy <anil@recoil.org>
Cc: rixed@happyleptic.org, OCaml Mailing List <caml-list@inria.fr>
Subject: Re: [Caml-list] IPv6 packet parsing
Date: Fri, 18 Oct 2013 15:52:07 +0200
Message-ID: <1382104327.3040.21.camel@e130>
In-Reply-To: <20131018122018.GJ25839@dark.recoil.org>


On Friday, 18.10.2013 at 13:20 +0100, Anil Madhavapeddy wrote:
> On Fri, Oct 18, 2013 at 02:16:12PM +0200, rixed@happyleptic.org wrote:
> > -[ Fri, Oct 18, 2013 at 12:59:55PM +0100, Anil Madhavapeddy ]----
> > > One feature I'd really like to see in Bitstring is support for Bigarray,
> > > since that avoids a copy into the OCaml heap and lets us do quite high
> > > performance parsing.  If I remember right, there was a patch on the
> > > Bitstring issue tracker, but it wasn't parameterised (so it's either
> > > Bitstring+string or Bitstring+bigarray, which isn't ideal).
> > 
> > Pardon my lack of familiarity with bigarrays, but I can't see what's the
> > difference between copying packets from pcap ring buffer into a bigarray
> > or into a string. Or do you mean using Bigarray.map_file on the whole
> > raw ring buffer and handle it without pcap help?

Without knowing the details: maybe no copy is required at all? The pcap
ring buffer could be wrapped directly as a Bigarray.
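
To illustrate what parsing in place could look like once a packet
buffer is available as a Bigarray, here is a minimal sketch that reads
a few fields of the fixed 40-byte IPv6 header directly from the buffer.
The names buffer, ipv6_header and parse_ipv6 are made up for this
example; this is not Bitstring's or pcap's API. The payload is returned
via Bigarray.Array1.sub, which gives a view onto the same memory rather
than a copy.

```ocaml
(* A byte buffer, e.g. one obtained by wrapping externally allocated memory. *)
type buffer =
  (int, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical record holding a few fields of the fixed IPv6 header. *)
type ipv6_header = {
  version : int;
  payload_length : int;
  next_header : int;
  hop_limit : int;
}

let ipv6_header_size = 40

(* Read a big-endian 16-bit value starting at offset [off]. *)
let get_uint16_be (buf : buffer) off =
  (buf.{off} lsl 8) lor buf.{off + 1}

(* Parse the fixed header in place. The returned payload is a view that
   shares memory with [buf]; no packet data is copied. *)
let parse_ipv6 (buf : buffer) =
  if Bigarray.Array1.dim buf < ipv6_header_size then None
  else
    let hdr = {
      version = buf.{0} lsr 4;
      payload_length = get_uint16_be buf 4;
      next_header = buf.{6};
      hop_limit = buf.{7};
    } in
    (* Clamp to the bytes actually present in case the packet is truncated. *)
    let avail = Bigarray.Array1.dim buf - ipv6_header_size in
    let len = min hdr.payload_length avail in
    Some (hdr, Bigarray.Array1.sub buf ipv6_header_size len)
```

Because the payload is only a view, writing through it also changes
the original buffer, which is exactly the sharing that avoids the copy.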

> We have a number of use-cases that run OCaml in kernel mode, directly
> operating on packets read from a network driver that's also written in
> OCaml.  Bigarrays are used as the mechanism for passing around externally
> allocated memory (i.e. network card buffers) directly, whereas inspecting
> them with a string-based Bitstring requires an expensive data copy.
> 
> See: http://anil.recoil.org/papers/2013-asplos-mirage.pdf
> or http://www.openmirage.org

For similar reasons, I also added some Bigarray functions to Ocamlnet:

http://projects.camlcity.org/projects/dl/ocamlnet-3.7.3/doc/html-main/Netsys_mem.html

If you look at the stub behind e.g. Unix.read, you'll see that the data
is first read into an internal unaligned buffer, and then copied to the
string buffer. This means usually two copies of the data: one from the
kernel buffer to the internal buffer, and one from there to the string.

If you use a Bigarray instead, the internal buffer becomes superfluous:
Bigarrays are malloc'ed memory and cannot be moved by the GC. Hence,
you can invoke the read() syscall directly with the Bigarray as buffer.
If you additionally ensure that the Bigarray is page-aligned, the kernel
can sometimes even avoid copying at all (though only some operating
systems seem to implement such a strategy, as changing the page mapping
or doing direct I/O can be more costly than copying).

Another advantage here is that you can freely choose the size of the
buffer (Unix.read et al. use a fixed-size 64K internal buffer).
You can also allocate the buffer in a shared memory area.

Ocamlnet now prefers Bigarrays as primary buffers where reasonable, and
where a speedup (or lower CPU consumption) can be expected. For example,
the HTTP client first reads data into a Bigarray, splits the header
there into lines (which are then normal strings again), and gathers the
data chunks from the HTTP body (which can be strings or Bigarrays, at
the user's choice).
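
As a rough illustration of that pattern (not Ocamlnet's actual code),
the sketch below scans a Bigarray holding the start of an HTTP message,
copies each header line into an ordinary string, and reports where the
body begins. The names char_buffer, string_of_sub, find_crlf and
split_header are invented for this example, and limit is assumed to be
the number of bytes read into the buffer so far.

```ocaml
type char_buffer =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Copy [len] bytes starting at [pos] out of the Bigarray into a string. *)
let string_of_sub (buf : char_buffer) pos len =
  String.init len (fun i -> buf.{pos + i})

(* Find the next CRLF at or after [pos], looking only at the first
   [limit] bytes of the buffer. *)
let rec find_crlf (buf : char_buffer) pos limit =
  if pos + 1 >= limit then None
  else if buf.{pos} = '\r' && buf.{pos + 1} = '\n' then Some pos
  else find_crlf buf (pos + 1) limit

(* Split the header into lines. Returns the lines and the offset of the
   body, or None if the blank line ending the header is not in the
   first [limit] bytes yet. *)
let split_header (buf : char_buffer) limit =
  let rec loop pos acc =
    match find_crlf buf pos limit with
    | None -> None
    | Some eol when eol = pos -> Some (List.rev acc, eol + 2)
    | Some eol ->
      loop (eol + 2) (string_of_sub buf pos (eol - pos) :: acc)
  in
  loop 0 []
```

Only the short header lines are copied into the OCaml heap here; the
body bytes stay in the Bigarray and could be handed on as a sub-view or
converted to strings, depending on what the caller wants.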

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------


