caml-list - the Caml user's mailing list
From: Mark Shinwell <mshinwell@janestreet.com>
To: Jacques-Henri Jourdan <jacques-henri.jourdan@ens.fr>
Cc: "caml-list@inria.fr" <caml-list@inria.fr>
Subject: Re: [Caml-list] Allocation profiling for x86-64 native code
Date: Mon, 16 Sep 2013 09:42:10 +0100	[thread overview]
Message-ID: <CAM3Ki76rw0CJSAVXeHqig=i1Y8Tp3+gBX+QjK0Q2=Wh71CKEHQ@mail.gmail.com> (raw)
In-Reply-To: <5234DECC.4090808@ens.fr>

On 14 September 2013 23:10, Jacques-Henri Jourdan
<jacques-henri.jourdan@ens.fr> wrote:
> statistical profiling by
> annotating only a fraction of the allocated blocks.

I think it's certainly worth experimenting with different approaches.
I've tried to address your points below.
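For concreteness, the kind of Bernoulli sampling Jacques-Henri describes
might look something like the sketch below.  This is purely illustrative:
none of these names come from either implementation, and a real version
would live in the runtime, not in OCaml code.

```ocaml
(* Hypothetical sketch: annotate roughly [sample_rate] of allocations
   rather than all of them.  A real implementation would make this
   decision inside the allocator. *)

let sample_rate = 0.01  (* trace ~1% of allocations *)

let sampled = ref 0
let total = ref 0

(* Decide, per allocation, whether to record an annotation. *)
let should_sample () =
  incr total;
  if Random.float 1.0 < sample_rate then (incr sampled; true)
  else false

let () =
  Random.init 42;  (* fixed seed so the run is reproducible *)
  for _ = 1 to 100_000 do
    ignore (should_sample ())
  done;
  (* with 100_000 trials at 1%, expect roughly 1_000 samples *)
  Printf.printf "sampled %d of %d allocations\n" !sampled !total
```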

> Here are the advantages I thought of:
>
> 1- Lower execution time overhead. BTW, what is yours?

I don't have any exact figures to hand, and in fact, it will
potentially vary quite a lot depending on the amount of allocation.
I think the time overhead is maybe 20% at the moment for a large
allocation-heavy application.

This could be decreased somewhat---firstly by optimizing, and
secondly by maybe allowing the "global"
(cf. [Allocation_profiling.Global] in the stdlib) analysis to be
disabled.  (This analysis was actually the predecessor of the
value-annotating analysis.)  The remaining overhead of annotating
the values is small.

> 2- When annotating a block, we could decide to store more information.
> Typically, it could be very profitable to know (at least a part of) the
> current backtrace while allocating.

Agreed.  I'm hoping to do some work on capturing a partial
backtrace in that scenario.

> 3- We could also analyze the life of annotated objects more precisely
> without much in the way of performance constraints, because they are
> fewer. We could, for example, put watchpoints on them to know when they
> are accessed, or gather statistics about their lifetimes...

I think if using watchpoints you'd have to pick and choose what
you instrument fairly carefully, in any case, otherwise everything
will grind to a halt.  Lifetime statistics are likely to be supported
soon (this should be easy).

> 4- Your method for annotating blocks uses the 22 highest bits of the
> block headers to store bits 4..25 of the allocation point address.
> I can see several (minor) problems with doing that:
>    - The maximum size of a block is then limited to 32GB.

I think such blocks are unlikely to occur in practice.  I'd argue
that it's most likely a mistake to have such large allocations inside
the OCaml heap, too.
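For reference, the 32GB figure follows directly from the header layout.
On 64-bit targets the standard OCaml header packs an 8-bit tag, 2 bits of
GC colour, and a 54-bit size in words; reserving the top 22 bits for a
profiling id leaves 32 bits of size, i.e. 2^32 words of 8 bytes each:

```ocaml
(* Back-of-envelope check of the 32GB limit.  Standard 64-bit OCaml
   headers use 8 bits of tag, 2 bits of GC colour, and the remaining
   54 bits for the size in words.  Reserving the top 22 bits for a
   profiling id leaves 54 - 22 = 32 bits of size. *)

let header_bits = 64
let tag_bits = 8
let colour_bits = 2
let profiling_bits = 22

let size_bits = header_bits - tag_bits - colour_bits - profiling_bits

(* 2^32 words of 8 bytes each *)
let max_block_bytes = Int64.mul (Int64.shift_left 1L size_bits) 8L

let () =
  Printf.printf "size field: %d bits; max block: %Ld bytes (%Ld GiB)\n"
    size_bits max_block_bytes
    (Int64.div max_block_bytes (Int64.shift_left 1L 30))
```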

>    - That does mean that those 22 bits identify the allocation point,
> and I am not convinced that the probability of collision is negligible
> in the case of a large code base non-contiguously loaded in memory
> because of dynlink, for example.

I neglected to say that this is not expected to work with natdynlink
at the moment.  I think for x86-64 Linux the current assumption about
contiguous code is correct, at least using the normal linker scripts,
and the range is probably sufficient.

The main place where the approximation could be problematic, I think,
is where there are allocation points close together that can't quite
be distinguished.  In practice I'm not sure this is a problem, though.
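To put a rough number on the collision concern: if the truncated
addresses behaved like uniform random 22-bit ids (a simplifying
assumption; real code addresses are not uniform, and contiguous code
makes nearby sites distinct by construction), the usual birthday
approximation p ~ 1 - exp(-n(n-1)/2m) with m = 2^22 gives:

```ocaml
(* Birthday-paradox estimate: probability that at least two of [n]
   allocation points share a 22-bit identifier, assuming the truncated
   addresses behave like uniform random ids -- a simplifying assumption
   for a rough upper bound only. *)

let id_space = 2.0 ** 22.0  (* 4_194_304 possible ids *)

let collision_prob n =
  let n = float_of_int n in
  1.0 -. exp (-. n *. (n -. 1.0) /. (2.0 *. id_space))

let () =
  List.iter
    (fun n ->
       Printf.printf "%6d sites -> p(any collision) ~ %.3f\n"
         n (collision_prob n))
    [ 100; 1_000; 10_000 ]
```

Under that (pessimistic) model, 100 sites collide with probability
around 0.1%, 1,000 sites around 11%, and 10,000 sites almost surely.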

>    - This is not usable for x86-32 bits.

I'm not sure x86-32 is worthy of much attention any more (dare
I say it!) but 32-bit platforms more generally I think still are
of concern.  My plan for upstreaming this work includes a patch
to enable compiler hackers to adjust the layout of the block header
more easily (roughly speaking, removing hard-coded constants in
the code) and that may end up including an option to allocate more
than one header word per block.  This could then be used to solve
the 32-bit problem, as well as likely being a useful platform for
other experiments.

> With statistical profiling, we can afford to have a separate table of
> traced blocks that we would maintain at the end of each GC phase. This
> way, we don't actually "annotate" blocks, but rather annotate the
> corresponding table entry.

This seems like it might cause quite a lot of extra work, and
disturb cache behaviour, no?  (The current "global" analysis in my
system, mentioned above, will disturb the cache too, but I think
if that's turned off, the value annotation alone should not.)
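For what it's worth, a minimal sketch of the side-table idea might look
as follows, with blocks keyed by a stand-in for their address and the
table rebuilt when the GC moves things.  All names here are hypothetical
and the "addresses" are simulated; a real version would hook into the
runtime's GC.

```ocaml
(* Hypothetical side table of sampled blocks, keyed by (simulated)
   address, refreshed at the end of each GC phase when survivors
   may have moved. *)

type info = { alloc_site : int; alloc_time : float }

let traced : (int, info) Hashtbl.t = Hashtbl.create 64

(* Record a sampled block under its address. *)
let trace_block addr site =
  Hashtbl.replace traced addr
    { alloc_site = site; alloc_time = Sys.time () }

(* At the end of a GC phase, rebuild the table from
   (old address, new address) pairs; entries for blocks that
   did not survive are simply dropped. *)
let after_gc moved =
  let next = Hashtbl.create (Hashtbl.length traced) in
  List.iter
    (fun (old_addr, new_addr) ->
       match Hashtbl.find_opt traced old_addr with
       | Some info -> Hashtbl.replace next new_addr info
       | None -> ())
    moved;
  Hashtbl.reset traced;
  Hashtbl.iter (Hashtbl.replace traced) next

let () =
  trace_block 0x1000 1;
  trace_block 0x2000 2;
  (* block at 0x1000 survives and moves to 0x3000; 0x2000 dies *)
  after_gc [ (0x1000, 0x3000) ]
```

The cache question then comes down to how often this table is walked
relative to how often headers would otherwise be touched.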

Mark


Thread overview: 5+ messages
2013-09-13 15:48 Mark Shinwell
2013-09-13 16:52 ` Gerd Stolpmann
2013-09-16  8:00   ` Mark Shinwell
2013-09-14 22:10 ` Jacques-Henri Jourdan
2013-09-16  8:42   ` Mark Shinwell [this message]
