[Caml-list] Allocation profiling for x86-64 native code

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Mark Shinwell <mark@three-tuns.net>
To: caml-list@inria.fr
Subject: [Caml-list] Allocation profiling for x86-64 native code
Date: Fri, 13 Sep 2013 16:48:34 +0100	[thread overview]
Message-ID: <20130913154834.GA6566@three-tuns.net> (raw)

Large OCaml programs can experience performance degradation due to high
garbage collection loads (or possibly due to it being Friday 13th).
Understanding the memory usage of such programs can also be difficult.

To this end, I am pleased to release a version of OCaml 4.01 that
contains functionality for the memory profiling of native code programs,
for the x86-64 architecture.  Currently it is only fully working on Linux
platforms, but there should be a version for Mac OS X in the near future,
and the BSDs.

  opam remote add mshinwell git://github.com/mshinwell/opam-repo-dev
  opam update
  opam switch 4.01-allocation-profiling

The source is on GitHub:
  https://github.com/mshinwell/ocaml/tree/4.01-allocation-profiling

Using ocamlopt with -allocation-tracing and running in an environment
with the OCAMLRUNPARAM environment variable including the letter "T"
enables the use of the functionality in the new [Allocation_profiling]
standard library module [1].  You should also ensure that you have
the new ocamlmklocs script (installed to the same place as the
compiler binaries) on your PATH at compile time.

The runtime system for this compiler contains instrumentation that can
produce a global analysis showing the total number of words allocated
on the OCaml heaps by source location.  This works not only for blocks
allocated in OCaml code but also in C stubs.  Further, values are
instrumented---without space overhead---in order to be able to determine
from a snapshot of the heap which value was allocated where; and also
to provide a runtime API that can be queried from the instrumented
program itself.  Following Unix tradition, there is no shiny user
interface.  Scripts are provided to decode the data from the former
two analyses.  There is also a script that can draw a graph of the
heap quotiented by the equivalence relation that identifies two
blocks iff they were allocated at the same source location.

Programs compiled with allocation profiling will run slower than under
normal compilation, but this degradation should not be that
significant.  They will use a little more memory than normal, but not
much, and the amount of increase roughly speaking is about twice the
size of the machine code in your program.  (In particular there is
no overhead per value allocated.)

Source locations reported are very slightly approximated, but this
should not normally cause a problem.

Sometimes the source location that appears in the profile may not be
quite the function you're looking for (e.g. some allocation function
that's called from multiple places; the allocation function rather
than the callers might show up).  The system goes to some rudimentary
efforts to avoid this by looking back up the stack one level under
certain conditions, but if you get stuck, you can set a breakpoint in
gdb on the function identified in the profile and collect a backtrace
every time you pass it.  These can then be uniquified by a shell script
left as an exercise to the reader.  (This technique has been discussed
previously on this list.)

This is not yet a fully-polished system, but it has been used at Jane
Street on rather large OCaml programs with success.  The part that still
requires most work is the runtime API.  If you experience long compile
times then you can disable the runtime API support by editing the
ocamlmklocs script to write an empty file; fixing this is on the list.
(See the comment in stdlib/allocation_profiling.mli.)  This is on the
list to be fixed.

I would be interested to hear of reports of success or failure; or
feature requests.  One feature on the near-term list is being able to
measure how long a particular value has been in existence.

Have fun.

Mark

P.S. There is some related work going on at OCamlPro using similar
techniques.  These projects were developed independently, but we expect
to collaborate on getting some of this technology into the main
distribution.

[1] https://github.com/mshinwell/ocaml/blob/4.01-allocation-profiling/stdlib/allocation_profiling.mli

next             reply	other threads:[~2013-09-13 15:48 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-13 15:48 Mark Shinwell [this message]
2013-09-13 16:52 ` Gerd Stolpmann
2013-09-16  8:00   ` Mark Shinwell
2013-09-14 22:10 ` Jacques-Henri Jourdan
2013-09-16  8:42   ` Mark Shinwell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130913154834.GA6566@three-tuns.net \
    --to=mark@three-tuns.net \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).