[Caml-list] Measuring GC latencies for OCaml program

caml-list - the Caml user's mailing list

* [Caml-list] Measuring GC latencies for OCaml program
@ 2016-05-30 19:48 Gabriel Scherer
  2016-05-31  1:13 ` Yaron Minsky
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Gabriel Scherer @ 2016-05-30 19:48 UTC (permalink / raw)
  To: caml users; +Cc: Damien Doligez

Dear caml-list,

You may be interested in the following blog post, in which I give
instructions to measure the worst-case latencies incurred by the GC:

  Measuring GC latencies in Haskell, OCaml, Racket
  http://prl.ccs.neu.edu/blog/2016/05/24/measuring-gc-latencies-in-haskell-ocaml-racket/

In summary, the commands to measure latencies look something like:

    # build the program with the instrumented runtime
    ocamlbuild -tag "runtime_variant(i)" myprog.native

    # run with instrumentation enabled
    OCAML_INSTR_FILE="ocaml.log" ./myprog.native

    # visualize results from the raw log
    # (OCAML_SOURCES is the path to a checkout of the OCaml sources)
    "$OCAML_SOURCES"/tools/ocaml-instr-graph ocaml.log
    "$OCAML_SOURCES"/tools/ocaml-instr-report ocaml.log
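
For readers who want something to try these commands on, here is a
minimal sketch of a test program. It is not the benchmark from the blog
post: the workload (a sliding window of messages kept in a Map) and the
names window_size, message_count, make_message, push_message and run are
assumptions chosen for illustration. It also times each step with
Sys.time, which is only a coarse in-program measurement and no
substitute for the instrumented runtime.

```ocaml
(* myprog.ml: hypothetical allocation-heavy workload for trying out the
   instrumented runtime. All names and sizes here are illustrative. *)
module IMap = Map.Make (struct
  type t = int
  let compare (a : int) b = compare a b
end)

let window_size = 20_000     (* messages kept alive at any time *)
let message_count = 100_000  (* total messages pushed *)

(* Allocate a fresh 1 KiB message so the GC has real work to do. *)
let make_message i = Bytes.make 1024 (Char.chr (i land 0xff))

(* Insert message [i] and drop the one that fell out of the window. *)
let push_message window i =
  let window = IMap.add i (make_message i) window in
  if i >= window_size then IMap.remove (i - window_size) window else window

(* Run the workload; return the final window and the slowest single step
   in seconds of CPU time, as a coarse stand-in for the worst pause. *)
let run () =
  let worst = ref 0.0 in
  let window = ref IMap.empty in
  for i = 0 to message_count - 1 do
    let t0 = Sys.time () in
    window := push_message !window i;
    let dt = Sys.time () -. t0 in
    if dt > !worst then worst := dt
  done;
  (!window, !worst)

let () =
  let window, worst = run () in
  Printf.printf "live messages: %d, slowest step: %.3f ms\n"
    (IMap.cardinal window) (worst *. 1e3)
```

Saved as myprog.ml, this can be built with the ocamlbuild command
above; it uses only the standard library.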

While the OCaml GC has had a good incremental mode with low latencies
for most workloads for a long time, the ability to instrument it to
actually measure latencies is still in its infancy: it is
a side-result of Damien Doligez's stint at Jane Street last year, and
4.03.0 is the first release in which this work is available.

A practical consequence of this youth is that the "user experience" of
actually performing these measurements is currently very bad. The GC
measurements are activated in an instrumented runtime variant (OCaml
supports having several variants of the runtime available, and
deciding which one to use for a specific program at link-time), which
is the right design choice, but right now this variant is not built by
default by the compiler distribution -- building it is
a configure-time option disabled by default. This means that, as
a user interested in doing the measurements, you have to compile an
alternative OCaml compiler.
Furthermore, processing the raw instrumented log requires tools that
are also in their infancy, and are currently included in the compiler
distribution sources but not installed -- so you have to have a source
checkout available to use them. In contrast, GHC's instrumentation is
enabled by just passing the option "+RTS -s" to the Haskell program of
interest; this is superbly convenient and something we should aim at.

I discussed with Damien whether we should enable building the
instrumented runtime by default (for example pass
the --with-instrumented-runtime option to the opam switches people are
using, and encourage distributions to use it in their packages as
well). Of course there is a cost/benefit trade-off: currently
virtually nobody is using this instrumentation, but enabling it by
default would increase the compilation time of the compiler
distribution for everyone. (On my machine it only adds 5 seconds to
total build time.)

I personally think that we should aim for a rock-solid experience for
profiling and instrumenting OCaml programs, enabled by default¹. It is
worth making the compiler take slightly longer to install for everyone
if that makes it vastly easier to measure GC pauses in our programs when
the need arises (even if that is not very often). But that is
a discussion that should be had before making any choice.

Regarding the log analysis tools, before finding out about Damien's
included-but-not-installed tools (a shell and an awk script, in the
finest Unix tradition) I built a quick&dirty OCaml script to do some
measurements, which can be found in the benchmark repository below. It
would not be much more work to grow this into a reusable library that
parses the current log format into structured data -- the
format is undocumented but the provided scripts in tools/ have enough
information to infer the structure. Such a script/library would, of
course, remain tightly coupled to the OCaml version, but I think it
could be useful to have it packaged for users to play with.

  https://gitlab.com/gasche/gc-latency-experiment/blob/master/parse_ocaml_log.ml
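
As a sketch of what the reporting side of such a library might look
like, the snippet below summarizes pause durations once they have been
extracted. It deliberately does not parse the log, since the format is
undocumented. The input representation (an array of durations in
nanoseconds) and the names summary, percentile and summarize are
assumptions made for illustration, and the sample data is made up.

```ocaml
(* Hypothetical reporting helper: summarize pause durations that have
   already been extracted from a log. Durations are in nanoseconds. *)
type summary = {
  count : int;
  max_ns : int;
  p50_ns : int;
  p99_ns : int;
}

(* Nearest-rank percentile of a sorted, non-empty array, [p] in (0, 100]. *)
let percentile sorted p =
  let n = Array.length sorted in
  let rank = int_of_float (ceil (p /. 100. *. float_of_int n)) in
  sorted.(max 0 (min (n - 1) (rank - 1)))

(* Returns [None] for an empty input, since no statistic is defined. *)
let summarize pauses =
  if Array.length pauses = 0 then None
  else begin
    let sorted = Array.copy pauses in
    Array.sort compare sorted;
    Some {
      count = Array.length sorted;
      max_ns = sorted.(Array.length sorted - 1);
      p50_ns = percentile sorted 50.;
      p99_ns = percentile sorted 99.;
    }
  end

let () =
  (* Made-up sample data, not real measurements. *)
  let pauses = [| 120_000; 95_000; 2_300_000; 110_000; 101_000 |] in
  match summarize pauses with
  | None -> print_endline "no pauses recorded"
  | Some s ->
    Printf.printf "n=%d max=%.2fms p50=%.2fms p99=%.2fms\n" s.count
      (float_of_int s.max_ns /. 1e6)
      (float_of_int s.p50_ns /. 1e6)
      (float_of_int s.p99_ns /. 1e6)
```

Reporting the maximum and a high percentile next to the median matters
here because worst-case latency is the quantity of interest, and an
average would hide it.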

¹: We cannot expect users to effectively write performant code if they
don't have the tool support for it. The fact that laziness in Haskell
makes it harder for users to reason about efficiency or memory usage
has made the availability of excellent performance tooling *necessary*,
where it is merely nice-to-have in OCaml. Rather ironically, Haskell
tooling is now much better than OCaml's in this area, to the point
that it can be easier to write efficient code in Haskell.

Four side-notes on profiling tools:

1. `perf record --call-graph=dwarf` works fine for ocamlopt binaries
  (no need for a frame-pointers switch), and this is documented:
    https://ocaml.org/learn/tutorials/performance_and_profiling.html#UsingperfonLinux

2. Thomas Leonard has done excellent work on domain-specific profiling
   tools for Mirage, and this is the kind of tool support that I think
   should be available to anyone out of the box.
     http://roscidus.com/blog/blog/2014/08/15/optimising-the-unikernel/
     http://roscidus.com/blog/blog/2014/10/27/visualising-an-asynchronous-monad/

3. There is currently more debate than anyone could wish for around
   a pull request of Mark Shinwell for runtime support for dynamic call
   graph construction and its use for memory profiling.
     https://github.com/ocaml/ocaml/pull/585

4. Providing a good user experience for performance or space profiling
   is a fundamentally harder problem than for GC pauses. It may
   require specially-compiled versions of the libraries used by your
   program, and thus a general policy/agreement across the
   ecosystem. Swapping in a different runtime at link-time is very easy.

* Re: [Caml-list] Measuring GC latencies for OCaml program
  2016-05-30 19:48 [Caml-list] Measuring GC latencies for OCaml program Gabriel Scherer
@ 2016-05-31  1:13 ` Yaron Minsky
  2016-05-31  5:39 ` Malcolm Matalka
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Yaron Minsky @ 2016-05-31  1:13 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: caml users, Damien Doligez

I'm very pleased to see Damien's profiling work picked up and used
(and nice to see that OCaml looks pretty good by comparison), and I
agree making it easily available by default is a great thing. OCaml
lets you be lazy in a few ways --- I know OCaml's excellent type
system caused me to be too lazy for too long about testing (though
I've long since repented), and OCaml's relatively predictable
performance made many of us similarly lazy about profiling.

y

* Re: [Caml-list] Measuring GC latencies for OCaml program
  2016-05-30 19:48 [Caml-list] Measuring GC latencies for OCaml program Gabriel Scherer
  2016-05-31  1:13 ` Yaron Minsky
@ 2016-05-31  5:39 ` Malcolm Matalka
  2016-06-10 20:35 ` Jon Harrop
  2016-09-14  2:51 ` pratikfegade
  3 siblings, 0 replies; 9+ messages in thread
From: Malcolm Matalka @ 2016-05-31  5:39 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: caml users, Damien Doligez

This post was a great read!  I can say, at least, that I am looking
forward to this instrumentation being easier to use in the future.  I'm
planning on writing some low-latency programs in OCaml and being able to
measure these things would be great.

* RE: [Caml-list] Measuring GC latencies for OCaml program
  2016-05-30 19:48 [Caml-list] Measuring GC latencies for OCaml program Gabriel Scherer
  2016-05-31  1:13 ` Yaron Minsky
  2016-05-31  5:39 ` Malcolm Matalka
@ 2016-06-10 20:35 ` Jon Harrop
  2016-06-10 21:34   ` Stanislav Artemkin
  2016-09-14  2:51 ` pratikfegade
  3 siblings, 1 reply; 9+ messages in thread
From: Jon Harrop @ 2016-06-10 20:35 UTC (permalink / raw)
  To: 'Gabriel Scherer', 'caml users'; +Cc: 'Damien Doligez'

Very interesting, thank you!

We just implemented a substantial client and server system for the finance sector with the "low" latency server written in OCaml. I have done this before in other languages and seen it done in many more languages. The OCaml solution is by far the most consistently fast one I have ever seen, orders of magnitude faster than the last C++ solution I saw. In particular, compared to Java and .NET, where we see substantial latencies from the GC at around 100ms, with OCaml there is no visible peak at high latency due to the GC at all. And this project was implemented to a very short deadline with no time for optimisation at all.

On a related note, we used Jane St.'s Core and Async libraries as well as Cohttp and found them all to be *phenomenally* efficient and robust.

In case anyone is interested, the only pain point I had was the development environment. I actually prototyped all my hard code in simplified F# in Visual Studio on Windows and then ported to OCaml. Emacs and Merlin crash and hang a lot for me: maybe 50 times per day. Hence my other post. :-)

In terms of the language, OCaml was very well suited to this task. Lots of purely functional data structures forming in-memory databases that can be queried in different ways and have many different versions of them stored in different places at different times. Perhaps the main language feature I missed from F# was (surprisingly!) reflection. My F# client code uses reflection to serialize and deserialize messages. With no reflection I couldn't do that in OCaml so I used reflection in F# to autogenerate the OCaml code.
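
To illustrate the point about keeping many versions of a purely functional data structure, here is a minimal sketch. It is not the code from the project described above: the module name Db, the functions set_price and above, and the sample tickers and prices are all made up for illustration.

```ocaml
(* Illustrative only: a tiny "in-memory database" as a persistent map.
   [Db], [set_price], [above] and the sample tickers are made up. *)
module Db = Map.Make (String)

(* Updating returns a new version; the old one stays valid and shares
   most of its structure with the new one. *)
let set_price db ticker price = Db.add ticker price db

(* One possible query: all entries priced strictly above [limit]. *)
let above db limit = Db.filter (fun _ price -> price > limit) db

(* Two versions of the same database, both usable at the same time. *)
let v1 = set_price (set_price Db.empty "ABC" 10.0) "XYZ" 20.0
let v2 = set_price v1 "ABC" 11.5

let () =
  Printf.printf "ABC in v1: %.1f, ABC in v2: %.1f\n"
    (Db.find "ABC" v1) (Db.find "ABC" v2);
  Db.iter (fun t p -> Printf.printf "above 15 in v2: %s at %.1f\n" t p)
    (above v2 15.0)
```

Because nothing is mutated, v1 can be handed to another part of the program and queried later without copying or locking, which is what makes storing versions "in different places at different times" cheap.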

Cheers,
Jon.

* Re: [Caml-list] Measuring GC latencies for OCaml program
  2016-06-10 20:35 ` Jon Harrop
@ 2016-06-10 21:34   ` Stanislav Artemkin
  2016-06-10 23:14     ` Yaron Minsky
  2016-06-11  8:53     ` Jon Harrop
  0 siblings, 2 replies; 9+ messages in thread
From: Stanislav Artemkin @ 2016-06-10 21:34 UTC (permalink / raw)
  To: jon; +Cc: Gabriel Scherer, caml users, Damien Doligez

Very interesting! It seems that was a completely broken C++ solution. I wish I
could use OCaml for my current project, but we still have to use C++ to get
microsecond latencies.

Do I understand correctly that OCaml is suitable for latencies of ~10ms and
worse?

Also, there is still an issue with multithreading. Did you use any existing
solution?

Thank you

* Re: [Caml-list] Measuring GC latencies for OCaml program
  2016-06-10 21:34   ` Stanislav Artemkin
@ 2016-06-10 23:14     ` Yaron Minsky
  2016-06-11  8:53     ` Jon Harrop
  1 sibling, 0 replies; 9+ messages in thread
From: Yaron Minsky @ 2016-06-10 23:14 UTC (permalink / raw)
  To: Stanislav Artemkin
  Cc: Jon Harrop, Gabriel Scherer, caml users, Damien Doligez

10ms and better, more like it.  You have to know what you're doing,
but you can do as well as C++ in OCaml for many workloads.  Just don't
allocate anything.

And even without any extreme style changes, you can do a hell of a lot
better than 10ms, depending on exactly what you're doing.
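
To make "don't allocate anything" a little more concrete, here is a
minimal sketch of how one might check whether a hot loop allocates. The
names sum_array and minor_words_during are made up for illustration.
It assumes OCaml 4.04 or later for Gc.minor_words, and the figures are
only meaningful for native code; in bytecode the measurement itself
allocates a few words.

```ocaml
(* Illustrative check that a hot loop does not allocate. [sum_array] and
   [minor_words_during] are made-up names. Needs OCaml >= 4.04 for
   [Gc.minor_words]; results are meaningful for native code (ocamlopt). *)

(* Sums an int array: ints are unboxed and the local reference does not
   escape, so the loop itself needs no heap allocation. *)
let sum_array (a : int array) =
  let total = ref 0 in
  for i = 0 to Array.length a - 1 do
    total := !total + a.(i)
  done;
  !total

(* Words allocated on the minor heap while running [f x]. *)
let minor_words_during f x =
  let before = Gc.minor_words () in
  let result = f x in
  let after = Gc.minor_words () in
  (result, after -. before)

let () =
  let a = Array.make 1000 1 in
  (* A loop over unboxed ints, compared with building a list, which
     allocates one cons cell (three words) per element. *)
  let s, w1 = minor_words_during sum_array a in
  let l, w2 = minor_words_during Array.to_list a in
  Printf.printf "sum=%d: %.0f words; list of %d elements: %.0f words\n"
    s w1 (List.length l) w2
```

Compiled with ocamlopt, the first figure should be zero or very close
to it, while the second grows with the size of the array. Code that
stays at zero never triggers a minor collection on its own, so it gives
the GC no reason to pause it.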

y

On Fri, Jun 10, 2016 at 5:34 PM, Stanislav Artemkin <artemkin@gmail.com> wrote:
> Very interesting! It seems it was completely broken C++ solution. Wish I
> could use OCaml for current project, but we still have to use C++ to get
> microsecond latencies.
>
> Do I correctly understand that OCaml is suitable for latencies ~10ms and
> worse?
>
> Also, there is still an issue with multithreading. Did you use any existing
> solution?
>
> Thank you
>
> On Sat, Jun 11, 2016 at 12:35 AM, Jon Harrop <jon@ffconsultancy.com> wrote:
>>
>>
>> Very interesting, thank you!
>>
>> We just implemented a substantial client and server system for the finance
>> sector with the "low" latency server written in OCaml. I have done this
>> before in other languages and seen it done in many more languages. The OCaml
>> is by far the consistently-fastest solution I have ever seen. Orders of
>> magnitude faster than the last C++ solution I saw. In particular, compared
>> to Java and .NET where we see substantial latencies from the GC at around
>> 100ms, with OCaml there is no visible peak at high latency due to the GC at
>> all. And this project was implemented to a very short deadline with no time
>> for optimisation at all.
>>
>> On a related note, we used Jane St.'s Core and Async libraries as well as
>> Cohttp and found them all to be *phenomenally* efficient and robust.
>>
>> In case anyone is interested, the only pain point I had was the
>> development environment. I actually prototyped all my hard code in
>> simplified F# in Visual Studio on Windows and then ported to OCaml. Emacs
>> and Merlin crash and hang a lot for me: maybe 50 times per day. Hence my
>> other post. :-)
>>
>> In terms of the language, OCaml was very well suited to this task. Lots of
>> purely functional data structures forming in-memory databases that can be
>> queried in different ways and have many different versions of them stored in
>> different places at different times. Perhaps the main language feature I
>> missed from F# was (surprisingly!) reflection. My F# client code uses
>> reflection to serialize and deserialize messages. With no reflection I
>> couldn't do that in OCaml so I used reflection in F# to autogenerate the
>> OCaml code.
>>
>> Cheers,
>> Jon.
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [Caml-list] Measuring GC latencies for OCaml program
  2016-06-10 21:34   ` Stanislav Artemkin
  2016-06-10 23:14     ` Yaron Minsky
@ 2016-06-11  8:53     ` Jon Harrop
  1 sibling, 0 replies; 9+ messages in thread
From: Jon Harrop @ 2016-06-11  8:53 UTC (permalink / raw)
  To: 'Stanislav Artemkin'; +Cc: 'caml users'

> It seems it was completely broken C++ solution

For this particular problem (which is essentially building a stock exchange) C++ is only OK for the initial core. As soon as you start adding features that interact with each other, a decent solution becomes intractable: you end up with 40+ developers writing millions of lines of code that deep-copy data structures to avoid memory management problems, share heavily contended locks, use thousands of OS threads (so you have to start tweaking the default stack size to fit them all into RAM), and so on. Everybody loses track of the big picture, and after 15 years of this you’ve got an unmaintainable code base with runaway costs.

> Do I correctly understand that OCaml is suitable for latencies ~10ms and worse?

I’m seeing 95% of messages handled in under 13 microseconds.
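
For readers who want to compute a figure of this kind from their own
measurements, here is a minimal sketch of a nearest-rank percentile over
recorded latencies. The function name and the sample values are invented
for illustration; this is not the code or the data behind the number
above.

```ocaml
(* Nearest-rank percentile: the smallest sample such that at least p% of
   all samples are less than or equal to it. *)
let percentile (p : float) (samples : float array) : float =
  let n = Array.length samples in
  if n = 0 then invalid_arg "percentile: empty sample set";
  if p <= 0.0 || p > 100.0 then invalid_arg "percentile: p must be in (0, 100]";
  (* Sort a copy so the caller's array is left untouched. *)
  let sorted = Array.copy samples in
  Array.sort compare sorted;
  let rank = int_of_float (ceil (p /. 100.0 *. float_of_int n)) in
  sorted.(max 0 (rank - 1))

let () =
  (* Made-up latencies in microseconds, for illustration only. *)
  let latencies_us =
    [| 4.0; 6.5; 9.0; 12.0; 3.2; 250.0; 7.7; 5.1; 8.8; 11.0 |] in
  Printf.printf "p50 = %.1f us\n" (percentile 50.0 latencies_us);
  Printf.printf "p95 = %.1f us\n" (percentile 95.0 latencies_us)
```

With so few samples a single outlier dominates the upper percentiles, so
a real measurement would need many more data points.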

> Also, there is still an issue with multithreading. Did you use any existing solution?

My OCaml solution is single threaded. Latency is great but throughput could be a lot better. In particular, serialization and deserialization to each client is embarrassingly parallel but done in series by OCaml.

Cheers,

Jon.

From: Stanislav Artemkin [mailto:artemkin@gmail.com] 
Sent: 10 June 2016 22:34
To: jon@ffconsultancy.com
Cc: Gabriel Scherer; caml users; Damien Doligez
Subject: Re: [Caml-list] Measuring GC latencies for OCaml program

Very interesting! It seems it was completely broken C++ solution. Wish I could use OCaml for current project, but we still have to use C++ to get microsecond latencies.

Do I correctly understand that OCaml is suitable for latencies ~10ms and worse?

Also, there is still an issue with multithreading. Did you use any existing solution?

Thank you

On Sat, Jun 11, 2016 at 12:35 AM, Jon Harrop <jon@ffconsultancy.com> wrote:

Very interesting, thank you!

We just implemented a substantial client and server system for the finance sector with the "low" latency server written in OCaml. I have done this before in other languages and seen it done in many more languages. The OCaml is by far the consistently-fastest solution I have ever seen. Orders of magnitude faster than the last C++ solution I saw. In particular, compared to Java and .NET where we see substantial latencies from the GC at around 100ms, with OCaml there is no visible peak at high latency due to the GC at all. And this project was implemented to a very short deadline with no time for optimisation at all.

On a related note, we used Jane St.'s Core and Async libraries as well as Cohttp and found them all to be *phenomenally* efficient and robust.

In case anyone is interested, the only pain point I had was the development environment. I actually prototyped all my hard code in simplified F# in Visual Studio on Windows and then ported to OCaml. Emacs and Merlin crash and hang a lot for me: maybe 50 times per day. Hence my other post. :-)

In terms of the language, OCaml was very well suited to this task. Lots of purely functional data structures forming in-memory databases that can be queried in different ways and have many different versions of them stored in different places at different times. Perhaps the main language feature I missed from F# was (surprisingly!) reflection. My F# client code uses reflection to serialize and deserialize messages. With no reflection I couldn't do that in OCaml so I used reflection in F# to autogenerate the OCaml code.

Cheers,
Jon.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Measuring GC latencies for OCaml program
  2016-05-30 19:48 [Caml-list] Measuring GC latencies for OCaml program Gabriel Scherer
                   ` (2 preceding siblings ...)
  2016-06-10 20:35 ` Jon Harrop
@ 2016-09-14  2:51 ` pratikfegade
  2016-09-14  8:38   ` Gabriel Scherer
  3 siblings, 1 reply; 9+ messages in thread
From: pratikfegade @ 2016-09-14  2:51 UTC (permalink / raw)
  To: Ocaml Aggregation List; +Cc: caml-list, damien.doligez, gabriel.scherer

Hi,

I am not sure if this is the right place to ask, but since this is the only reference to the instrumented runtime I have found, I am going to go ahead and ask.

I am trying to use the instrumented runtime as you described, but it does not seem to work: I cannot compile my program with it. I have compiled the compiler with the --with-instrumented-runtime flag (and I have ocamlruni in my path). I get the following warnings while trying to compile the way you suggested:

Configuration "true: quiet, runtime_variant(i)", line 1, characters 13-31:
Warning: tag "runtime_variant" does not expect a parameter, but is used with parameter "i"
Configuration "true: quiet, runtime_variant(i)", line 1, characters 13-31:
Warning: the tag "runtime_variant(i)" is not used in any flag declaration, so it will have no effect; it may be a typo. Otherwise use `mark_tag_used` in your myocamlbuild.ml to disable this warning.
Finished, 4 targets (0 cached) in 00:00:00.

I also tried to compile directly to native code using the command

ocamlopt -runtime-variant i main.ml

which does not give an error, so I assume the instrumented runtime exists on my system. However, I cannot find any log file, and running the executable prints nothing on stdout or stderr.

Am I missing something here?

Thanks,
Pratik Fegade.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Measuring GC latencies for OCaml program
  2016-09-14  2:51 ` pratikfegade
@ 2016-09-14  8:38   ` Gabriel Scherer
  0 siblings, 0 replies; 9+ messages in thread
From: Gabriel Scherer @ 2016-09-14  8:38 UTC (permalink / raw)
  To: pratikfegade; +Cc: Ocaml Aggregation List, caml users, Damien Doligez

I suspect a mistake in your OCaml environment configuration. The
runtime_variant(...) parametrized tag was contributed by whitequark for
OCaml 4.02.2, so it should be available in all OCaml versions that also
provide the instrumented runtime (that is, 4.03.0 and future versions).

Once you compile an executable with the instrumented runtime, you should
explicitly set the OCAML_INSTR_FILE="ocaml.log" variable if you want
logging to happen.
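
To check that logging works at all, a small allocation-heavy test
program can help. The sketch below is a made-up example (the churn
function and its parameters are hypothetical, and it is not the
benchmark from the blog post); the build and run commands in the comment
are the ones already quoted in this thread, with -o added to name the
executable.

```ocaml
(* main.ml: a deliberately allocation-heavy loop, so the GC has work to log.

   Build and run, assuming a compiler configured with
   --with-instrumented-runtime:

     ocamlopt -runtime-variant i main.ml -o main.native
     OCAML_INSTR_FILE="ocaml.log" ./main.native
*)

(* Keep a sliding window of freshly allocated strings alive, so that some
   data survives minor collections and is promoted to the major heap. *)
let churn ~window ~iterations =
  let live = Array.make window "" in
  for i = 0 to iterations - 1 do
    live.(i mod window) <- String.make 64 'x'
  done;
  (* Total size of the strings still reachable at the end. *)
  Array.fold_left (fun acc s -> acc + String.length s) 0 live

let () =
  let total = churn ~window:10_000 ~iterations:1_000_000 in
  Printf.printf "bytes still live: %d\n" total
```

If ocaml.log is still missing after running this with the variable set,
the executable was most likely not linked against the instrumented
runtime.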

On Wed, Sep 14, 2016 at 11:51 AM, <pratikfegade@gmail.com> wrote:

> Hi,
>
> I am not sure  if this is the right place to ask this but this being the
> only reference to the instrumented runtime I found, I am going to go ahead
> and ask.
>
> I am trying to use the instrumented runtime as you described but it does
> not seem to work. I cannot compile the program with the instrumented
> runtime. I have compiled the compiler with the --with-instrumented-runtime
> flag (I have ocamlruni) in my path. I get the following error while trying
> to compile the way you suggested:
>
> Configuration "true: quiet, runtime_variant(i)", line 1, characters 13-31:
> Warning: tag "runtime_variant" does not expect a parameter, but is used
> with parameter "i"
> Configuration "true: quiet, runtime_variant(i)", line 1, characters 13-31:
> Warning: the tag "runtime_variant(i)" is not used in any flag declaration,
> so it will have no effect; it may be a typo. Otherwise use `mark_tag_used`
> in your myocamlbuild.ml to disable this warning.
> Finished, 4 targets (0 cached) in 00:00:00.
>
> I also tried to compile directly to native code using the command
>
> ocamlopt -runtime-variant i main.ml
>
> which does not give an error which I assume to mean that the instrumented
> runtime exists on my system. I cannot however find any log file nor does
> running the executable give any information on stdout or stderr.
>
> Am I missing something here?
>
> Thanks,
> Pratik Fegade.
>
> On Monday, May 30, 2016 at 3:49:21 PM UTC-4, Gabriel Scherer wrote:
> > Dear caml-list,
> >
> > You may be interested in the following blog post, in which I give
> > instructions to measure the worst-case latencies incurred by the GC:
> >
> >   Measuring GC latencies in Haskell, OCaml, Racket
> >   http://prl.ccs.neu.edu/blog/2016/05/24/measuring-gc-latencies-in-haskell-ocaml-racket/
> >
> > In summary, the commands to measure latencies look something like:
> >
> >     % build the program with the instrumented runtime
> >     ocamlbuild -tag "runtime_variant(i)" myprog.native
> >
> >     % run with instrumentation enabled
> >     OCAML_INSTR_FILE="ocaml.log" ./main.native
> >
> >     % visualize results from the raw log
> >     $(OCAML_SOURCES)/tools/ocaml-instr-graph ocaml.log
> >     $(OCAML_SOURCES)/tools/ocaml-instr-report ocaml.log
> >
> > While the OCaml GC has had a good incremental mode with low latencies
> > for most workloads for a long time, the ability to instrument it to
> > actually measure latencies is still in its infancy: it is
> > a side-result of Damien Doligez's stint at Jane Street last year, and
> > 4.03.0 is the first release in which this work is available.
> >
> > A practical consequence of this youth is that the "user experience" of
> > actually performing these measurements is currently very bad. The GC
> > measurements are activated in an instrumented runtime variant (OCaml
> > supports having several variants of the runtime available, and
> > deciding which one to use for a specific program at link-time), which
> > is the right design choice, but right now this variant is not built by
> > default by the compiler distribution -- building it is
> > a configure-time option disabled by default. This means that, as
> > a user interested in doing the measurements, you have to compile an
> > alternative OCaml compiler.
> > Furthermore, processing the raw instrumented log requires tools that
> > are also in their infancy, and are currently included in the compiler
> > distribution sources but not installed -- so you have to have a source
> > checkout available to use them. In contrast, GHC's instrumentation is
> > enabled by just passing the option "+RTS -s" to the Haskell program of
> > interest; this is superbly convenient and something we should aim at.
> >
> > I discussed with Damien whether we should enable building the
> > instrumented runtime by default (for example pass
> > the --with-instrumented-runtime option to the opam switches people are
> > using, and encourage distributions to use it in their packages as
> > well). Of course there is a cost/benefit trade-off: currently
> > virtually nobody is using this instrumentation, but enabling it by
> > default would increase the compilation time of the compiler
> > distribution for everyone. (On my machine it only adds 5 seconds to
> > total build time.)
> >
> > I personally think that we should aim for a rock-solid experience for
> > profiling and instrumenting OCaml program enabled by default¹. It is
> > worth making it slightly longer for anyone to install the compiler if
> > we can make it vastly easier to measure GC pauses in our program when
> > the need arises (even if it's not very often). But that is
> > a discussion that should be had before making any choice.
> >
> > Regarding the log analysis tools, before finding out about Damien's
> > included-but-not installed tools (a shell and an awk script, in the
> > finest Unix tradition) I built a quick&dirty OCaml script to do some
> > measurements, which can be found in the benchmark repository below. It
> > would not be much more work to grow this in a reusable library to
> > extract the current log format into a structured data structure -- the
> > format is undocumented but the provided scripts in tools/ have enough
> > information to infer the structure. Such a script/library would, of
> > course, remain tightly coupled to the OCaml version, but I think it
> > could be useful to have it packaged for users to play with.
> >
> >   https://gitlab.com/gasche/gc-latency-experiment/blob/master/parse_ocaml_log.ml
> >
> > ¹: We cannot expect users to effectively write performant code if they
> > don't have the tool support for it. The fact that laziness in Haskell
> > makes it harder for users to reason about efficiency or memory usage
> > has made the availability of excellent performance tooling *necessary*,
> > where it is merely nice-to-have in OCaml. Rather ironically, Haskell
> > tooling is now much better than OCaml's in this area, to the point
> > that it can be easier to write efficient code in Haskell.
> >
> > Four side-notes on profiling tools:
> >
> > 1. `perf record --call-graph=dwarf` works fine for ocamlopt binaries
> >   (no need for a frame-pointers switch), and this is documented:
> >     https://ocaml.org/learn/tutorials/performance_and_profiling.html#UsingperfonLinux
> >
> > 2. Thomas Leonard has done excellent work on domain-specific profiling
> >    tools for Mirage, and this is the kind of tool support that I think
> >    should be available to anyone out of the box.
> >      http://roscidus.com/blog/blog/2014/08/15/optimising-the-unikernel/
> >      http://roscidus.com/blog/blog/2014/10/27/visualising-an-asynchronous-monad/
> >
> > 3. There is currently more debate than anyone could wish for around
> >    a pull request by Mark Shinwell that adds runtime support for dynamic
> >    call graph construction and uses it for memory profiling.
> >      https://github.com/ocaml/ocaml/pull/585
> >
> > 4. Providing a good user experience for performance or space profiling
> >    is a fundamentally harder problem than for GC pauses. It may
> >    require specially-compiled versions of the libraries used by your
> >    program, and thus a general policy/agreement across the
> >    ecosystem. By contrast, swapping in a different runtime at
> >    link-time is very easy.
> >
>
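
The quoted message suggests growing the log-parsing script into a reusable
library. As a rough sketch of what the analysis side of such a library might
look like, the OCaml snippet below summarizes a list of GC pause durations.
Everything in it is an assumption made for illustration: the names `summary`,
`percentile` and `summarize` are hypothetical, the durations are taken to be
integers in nanoseconds that have already been extracted, and the undocumented
log format itself is not modelled.

```ocaml
(* Hypothetical summary of GC pause durations, in nanoseconds.
   Assumption: the durations were already extracted from the
   instrumented-runtime log; parsing that log is out of scope here. *)
type summary = {
  count : int;   (* number of pauses *)
  worst : int;   (* longest pause *)
  median : int;  (* 50th percentile, nearest-rank *)
  p99 : int;     (* 99th percentile, nearest-rank *)
}

(* Nearest-rank percentile of a non-empty sorted array, for 0 < p <= 100. *)
let percentile sorted p =
  let n = Array.length sorted in
  let rank = (p * n + 99) / 100 in  (* integer ceiling of p * n / 100 *)
  sorted.(max 0 (rank - 1))

(* Returns None when no pause was recorded. *)
let summarize durations =
  match durations with
  | [] -> None
  | _ ->
    let sorted = Array.of_list durations in
    Array.sort compare sorted;
    let n = Array.length sorted in
    Some { count = n;
           worst = sorted.(n - 1);
           median = percentile sorted 50;
           p99 = percentile sorted 99 }

let () =
  (* Made-up sample values, not measurements. *)
  match summarize [120_000; 95_000; 2_300_000; 110_000; 101_000] with
  | None -> print_endline "no pauses recorded"
  | Some s ->
    Printf.printf "pauses: %d  worst: %d ns  median: %d ns  p99: %d ns\n"
      s.count s.worst s.median s.p99
```

Reporting the worst case alongside a high percentile keeps a single outlier
pause visible instead of letting an average hide it, which is the point of
measuring latencies rather than throughput.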


end of thread, other threads:[~2016-09-14  8:39 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-30 19:48 [Caml-list] Measuring GC latencies for OCaml program Gabriel Scherer
2016-05-31  1:13 ` Yaron Minsky
2016-05-31  5:39 ` Malcolm Matalka
2016-06-10 20:35 ` Jon Harrop
2016-06-10 21:34   ` Stanislav Artemkin
2016-06-10 23:14     ` Yaron Minsky
2016-06-11  8:53     ` Jon Harrop
2016-09-14  2:51 ` pratikfegade
2016-09-14  8:38   ` Gabriel Scherer
