From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id EAF937FD92 for ; Fri, 10 Jun 2016 23:34:21 +0200 (CEST) IronPort-PHdr: 9a23:3Iyj4Bf1uPdNN4h4v0ZAaDsNlGMj4u6mDksu8pMizoh2WeGdxc68Yh7h7PlgxGXEQZ/co6odzbGG4ua/CSdYvN6oizMrTt9lb1c9k8IYnggtUoauKHbQC7rUVRE8B9lIT1R//nu2YgB/Ecf6YEDO8DXptWZBUiv2OQc9HOnpAIma153xjLDjvcKDKF0VzBOGIppMbzyO5T3LsccXhYYwYo0Q8TDu5kVyRuJN2GlzLkiSlRuvru25/Zpk7jgC86l5r50IeezAcq85Vb1VCig9eyBwvZWz9EqLcQza5HwaemsYmR1OGBXB8Bj8VYa3uSz/5cRn3yzPBtH/S7EvXT28p45xVBLtiyYBf2ow6n3aj89xiopUpRugo1p0xIuCM9LdD+Z3Yq6IJYBSfmFGRMsEEnUZWo4= Authentication-Results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=artemkin@gmail.com; spf=Pass smtp.mailfrom=artemkin@gmail.com; spf=None smtp.helo=postmaster@mail-wm0-f53.google.com Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of artemkin@gmail.com) identity=pra; client-ip=74.125.82.53; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="artemkin@gmail.com"; x-sender="artemkin@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of artemkin@gmail.com designates 74.125.82.53 as permitted sender) identity=mailfrom; client-ip=74.125.82.53; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="artemkin@gmail.com"; x-sender="artemkin@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-wm0-f53.google.com) identity=helo; client-ip=74.125.82.53; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="artemkin@gmail.com"; x-sender="postmaster@mail-wm0-f53.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0D9AQDnMVtXdzVSfUpegnCBJH0GqgmFcoYlhQCBeiSFbwICAoEkBzgUAQEBAQEBAQERAQoLCwkfMYIwghUBAQEDARIRHQEbEgsBAwgEBgULBgQBAQ4aAwICIgERAQUBCgEDAQUIGRIJB4dzAQMPCA6fUoExPjGLO4FqglkFhz8KGScDClKDLAEBAQEBBQEBAQEBAQEBARYCBhCKZIQqQ4JUgloFhkUMiA2KAYYEiCSCN4xpjiwSHoEPDw+CMA0cgVA3MgGKBwEBAQ X-IPAS-Result: A0D9AQDnMVtXdzVSfUpegnCBJH0GqgmFcoYlhQCBeiSFbwICAoEkBzgUAQEBAQEBAQERAQoLCwkfMYIwghUBAQEDARIRHQEbEgsBAwgEBgULBgQBAQ4aAwICIgERAQUBCgEDAQUIGRIJB4dzAQMPCA6fUoExPjGLO4FqglkFhz8KGScDClKDLAEBAQEBBQEBAQEBAQEBARYCBhCKZIQqQ4JUgloFhkUMiA2KAYYEiCSCN4xpjiwSHoEPDw+CMA0cgVA3MgGKBwEBAQ X-IronPort-AV: E=Sophos;i="5.26,452,1459807200"; d="scan'208,217";a="180896326" Received: from mail-wm0-f53.google.com ([74.125.82.53]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/AES128-GCM-SHA256; 10 Jun 2016 23:34:19 +0200 Received: by mail-wm0-f53.google.com with SMTP id m124so7994371wme.1; Fri, 10 Jun 2016 14:34:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=cFdS7CFQjiKVcYdp4h8+kPaVgvbl552Eh2e2reuSzs0=; b=J0rSM9GZqdyVqqxqp+eNf/gPFtkH8Bx+Sn+XDCK9Zwp5ON17ojKEvI0qUFMycjkJo5 Dlv2IQ2Ze/39Cjkk1tIijaeN6vTOddLpdZwRu7tfLmDWyHEGlJsY5uMDuF/U0MRxpgbV BhdUATgFT3miQWP43/BCQqSHxqcsiuvNPsMt90TisWHoB241JqMGLPkdyJeXqr8ZcHj6 MVeYv0mIQ1z3MLSMXW/z25pIGaCUneI7niBIzav33AzvP9zzEHUPx2LPkc/BzO2/kvft GjHhwxGGfgWwLQFDijDkQSwgqZGGGslUcDjDHm62BYz0VVkgK+inVvv9Bu4JFlZzmgcm mFEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=cFdS7CFQjiKVcYdp4h8+kPaVgvbl552Eh2e2reuSzs0=; b=eCKm65iVG2sW4QmX2HmzPUN6EFwe271oMSrmRur5VXEZqgv7vd5VjylEGLK6MpAUsu iMRetoeI5I/zGXcC9MKBuTGzuc6U+XOo4ZbHGRSpyXZAsN6zV7XyX3U+MWp+JAbsit0F 9kyD1lOagRPBcH/96zpE5CmUiV98qJMu0gfRJY3YLkbwcYlEKWKfLcgIdvy947E15SaV uKrLIZyJJU/4yI2sA3Nvb6CA0TOTkSRHsi6uZzDaxuMTI174VOnxPrY2owisiXfnyhBl 87vJm1T+tv/sRdmmxO7YHHAE37jx/ED/SuSP+xdG7LwWsM8SBXOOX96QQiMJ/vOMB67W NxTw== X-Gm-Message-State: ALyK8tL2o9MA6RAHlI3uCQiQSTHQwO+4ATdOCu3SjR/Pxo+s3y41rim5D/3e5fhXcUtxGCcsINvieKDh6IteNg== X-Received: by 10.195.18.3 with SMTP id gi3mr4425488wjd.66.1465594459492; Fri, 10 Jun 2016 14:34:19 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.57.169 with HTTP; Fri, 10 Jun 2016 14:34:18 -0700 (PDT) In-Reply-To: <00c801d1c357$95598ca0$c00ca5e0$@ffconsultancy.com> References: <00c801d1c357$95598ca0$c00ca5e0$@ffconsultancy.com> From: Stanislav Artemkin Date: Sat, 11 Jun 2016 01:34:18 +0400 Message-ID: To: jon@ffconsultancy.com Cc: Gabriel Scherer , caml users , Damien Doligez Content-Type: multipart/alternative; boundary=001a11c29692249c360534f34dfb Subject: Re: [Caml-list] Measuring GC latencies for OCaml program --001a11c29692249c360534f34dfb Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Very interesting! It seems it was completely broken C++ solution. Wish I could use OCaml for current project, but we still have to use C++ to get microsecond latencies. Do I correctly understand that OCaml is suitable for latencies ~10ms and worse? Also, there is still an issue with multithreading. Did you use any existing solution? Thank you On Sat, Jun 11, 2016 at 12:35 AM, Jon Harrop wrote: > > Very interesting, thank you! > > We just implemented a substantial client and server system for the finance > sector with the "low" latency server written in OCaml. I have done this > before in other languages and seen it done in many more languages. The > OCaml is by far the consistently-fastest solution I have ever seen. Orders > of magnitude faster than the last C++ solution I saw. In particular, > compared to Java and .NET where we see substantial latencies from the GC = at > around 100ms, with OCaml there is no visible peak at high latency due to > the GC at all. And this project was implemented to a very short deadline > with no time for optimisation at all. > > On a related note, we used Jane St.'s Core and Async libraries as well as > Cohttp and found them all to be *phenomenally* efficient and robust. > > In case anyone is interested, the only pain point I had was the > development environment. I actually prototyped all my hard code in > simplified F# in Visual Studio on Windows and then ported to OCaml. Emacs > and Merlin crash and hang a lot for me: maybe 50 times per day. Hence my > other post. :-) > > In terms of the language, OCaml was very well suited to this task. Lots of > purely functional data structures forming in-memory databases that can be > queried in different ways and have many different versions of them stored > in different places at different times. Perhaps the main language feature= I > missed from F# was (surprisingly!) reflection. My F# client code uses > reflection to serialize and deserialize messages. With no reflection I > couldn't do that in OCaml so I used reflection in F# to autogenerate the > OCaml code. > > Cheers, > Jon. > > -----Original Message----- > From: caml-list-request@inria.fr [mailto:caml-list-request@inria.fr] On > Behalf Of Gabriel Scherer > Sent: 30 May 2016 20:48 > To: caml users > Cc: Damien Doligez > Subject: [Caml-list] Measuring GC latencies for OCaml program > > Dear caml-list, > > You may be interested in the following blog post, in which I give > instructions to measure the worst-case latencies incurred by the GC: > > Measuring GC latencies in Haskell, OCaml, Racket > > http://prl.ccs.neu.edu/blog/2016/05/24/measuring-gc-latencies-in-haskell-= ocaml-racket/ > > In summary, the commands to measure latencies look something like: > > % build the program with the instrumented runtime > ocamlbuild -tag "runtime_variant(i)" myprog.native > > % run with instrumentation enabled > OCAML_INSTR_FILE=3D"ocaml.log" ./main.native > > % visualize results from the raw log > $(OCAML_SOURCES)/tools/ocaml-instr-graph ocaml.log > $(OCAML_SOURCES)/tools/ocaml-instr-report ocaml.log > > While the OCaml GC has had a good incremental mode with low latencies for > most workloads for a long time, the ability to instrument it to actually > measure latencies is still in its infancy: it is a side-result of Damien > Doligez's stint at Jane Street last year, and > 4.03.0 is the first release in which this work is available. > > A practical consequence of this youth is that the "user experience" of > actually performing these measurements is currently very bad. The GC > measurements are activated in an instrumented runtime variant (OCaml > supports having several variants of the runtime available, and deciding > which one to use for a specific program at link-time), which is the right > design choice, but right now this variant is not built by default by the > compiler distribution -- building it is a configure-time option disabled = by > default. This means that, as a user interested in doing the measurements, > you have to compile an alternative OCaml compiler. > Furthermore, processing the raw instrumented log requires tool that are > also in their infancy, and are currently included in the compiler > distribution sources but not installed -- so you have to have a source > checkout available to use them. In contrast, GHC's instrumentation is > enabled by just passing the option "+RTS -s" to the Haskell program of > interest; this is superbly convenient and something we should aim at. > > I discussed with Damien whether we should enable building the instrumented > runtime by default (for example pass the --with-instrumented-runtime opti= on > to the opam switches people are using, and encourage distributions to use > it in their packages as well). Of course there is a cost/benefit trade-of= f: > currently virtually nobody is using this instrumentation, but enabling it > by default would increase the compilation time of the compiler distributi= on > for everyone. (On my machine it only adds 5 seconds to total build time.) > > I personally think that we should aim for a rock-solid experience for > profiling and instrumenting OCaml program enabled by default=C2=B9. It is= worth > making it slightly longer for anyone to install the compiler if we can ma= ke > it vastly easier to measure GC pauses in our program when the need arises > (even if it's not very often). But that is a discussion that should be had > before making any choice. > > Regarding the log analysis tools, before finding about Damien's > included-but-not installed tools (a shell and an awk script, in the finest > Unix tradition) I built a quick&dirty OCaml script to do some measurement= s, > which can be found in the benchmark repository below. It would not be much > more work to grow this in a reusable library to extract the current log > format into a structured data structure -- the format is undocumented but > the provided scripts in tools/ have enough information to infer the > structure. Such a script/library would, of course, remain tightly coupled > to the OCaml version, but I think it could be useful to have it packaged > for users to play with. > > > https://gitlab.com/gasche/gc-latency-experiment/blob/master/parse_ocaml_l= og.ml > > =C2=B9: We cannot expect users to effectively write performant code if th= ey > don't have the tool support for it. The fact that lazyness in Haskell mak= es > it harder for users to reason about efficiency or memory usage has made t= he > avaibility of excellent performance tooling *necessary*, where it is mere= ly > nice-to-have in OCaml. Rather ironically, Haskell tooling is now much > better than OCaml's in this area, to the point that it can be easier to > write efficient code in Haskell. > > Three side-notes on profiling tools: > > 1. `perf record --call-graph=3Ddwarf` works fine for ocamlopt binaries > (no need for a frame-pointers switch), and this is documented: > > https://ocaml.org/learn/tutorials/performance_and_profiling.html#Usingper= fonLinux > > 2. Thomas Leonard has done excellent work on domain-specific profiling > tools for Mirage, and this is the kind of tool support that I think > should be available to anyone out of the box. > http://roscidus.com/blog/blog/2014/08/15/optimising-the-unikernel/ > > http://roscidus.com/blog/blog/2014/10/27/visualising-an-asynchronous-mona= d/ > > 3. There is currently more debate than anyone could wish for around > a pull request of Mark Shinwell for runtime support for dynamic call > graph construction and its use for memory profiling. > https://github.com/ocaml/ocaml/pull/585 > > 4. Providing a good user experience for performance or space profiling > is a fundamentally harder problem than for GC pauses. It may > require specially-compiled versions of the libraries used by your > program, and thus a general policy/agreement across the > ecosystem. Swapping a different runtime at link-time is very easy. > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs=3D > > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > --001a11c29692249c360534f34dfb Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Very interesting! It seems = it was completely broken C++ solution. Wish I could use OCaml for current p= roject, but we still have to use C++ to get microsecond latencies.
Do I co= rrectly understand that OCaml is suitable for latencies ~10ms and worse?

Al= so, there is still an issue with multithreading. Did you use any existing s= olution?

Thank you

On Sat, Jun 11, 2016 at 12:35 AM, Jon Harrop <jon@ffconsu= ltancy.com> wrote:

Very interesting, thank you!

We just implemented a substantial client and server system for the finance = sector with the "low" latency server written in OCaml. I have don= e this before in other languages and seen it done in many more languages. T= he OCaml is by far the consistently-fastest solution I have ever seen. Orde= rs of magnitude faster than the last C++ solution I saw. In particular, com= pared to Java and .NET where we see substantial latencies from the GC at ar= ound 100ms, with OCaml there is no visible peak at high latency due to the = GC at all. And this project was implemented to a very short deadline with n= o time for optimisation at all.

On a related note, we used Jane St.'s Core and Async libraries as well = as Cohttp and found them all to be *phenomenally* efficient and robust.

In case anyone is interested, the only pain point I had was the development= environment. I actually prototyped all my hard code in simplified F# in Vi= sual Studio on Windows and then ported to OCaml. Emacs and Merlin crash and= hang a lot for me: maybe 50 times per day. Hence my other post. :-)

In terms of the language, OCaml was very well suited to this task. Lots of = purely functional data structures forming in-memory databases that can be q= ueried in different ways and have many different versions of them stored in= different places at different times. Perhaps the main language feature I m= issed from F# was (surprisingly!) reflection. My F# client code uses reflec= tion to serialize and deserialize messages. With no reflection I couldn'= ;t do that in OCaml so I used reflection in F# to autogenerate the OCaml co= de.

Cheers,
Jon.

-----Original Message-----
From: caml-list-request@inria= .fr [mailto:caml-list-req= uest@inria.fr] On Behalf Of Gabriel Scherer
Sent: 30 May 2016 20:48
To: caml users
Cc: Damien Doligez
Subject: [Caml-list] Measuring GC latencies for OCaml program

Dear caml-list,

You may be interested in the following blog post, in which I give instructi= ons to measure the worst-case latencies incurred by the GC:

=C2=A0 Measuring GC latencies in Haskell, OCaml, Racket
=C2=A0 http://= prl.ccs.neu.edu/blog/2016/05/24/measuring-gc-latencies-in-haskell-ocaml-rac= ket/

In summary, the commands to measure latencies look something like:

=C2=A0 =C2=A0 % build the program with the instrumented runtime
=C2=A0 =C2=A0 ocamlbuild -tag "runtime_variant(i)" myprog.native<= br>
=C2=A0 =C2=A0 % run with instrumentation enabled
=C2=A0 =C2=A0 OCAML_INSTR_FILE=3D"ocaml.log" ./main.native

=C2=A0 =C2=A0 % visualize results from the raw log
=C2=A0 =C2=A0 $(OCAML_SOURCES)/tools/ocaml-instr-graph ocaml.log
=C2=A0 =C2=A0 $(OCAML_SOURCES)/tools/ocaml-instr-report ocaml.log

While the OCaml GC has had a good incremental mode with low latencies for m= ost workloads for a long time, the ability to instrument it to actually mea= sure latencies is still in its infancy: it is a side-result of Damien Dolig= ez's stint at Jane Street last year, and
4.03.0 is the first release in which this work is available.

A practical consequence of this youth is that the "user experience&quo= t; of actually performing these measurements is currently very bad. The GC = measurements are activated in an instrumented runtime variant (OCaml suppor= ts having several variants of the runtime available, and deciding which one= to use for a specific program at link-time), which is the right design cho= ice, but right now this variant is not built by default by the compiler dis= tribution -- building it is a configure-time option disabled by default. Th= is means that, as a user interested in doing the measurements, you have to = compile an alternative OCaml compiler.
Furthermore, processing the raw instrumented log requires tool that are als= o in their infancy, and are currently included in the compiler distribution= sources but not installed -- so you have to have a source checkout availab= le to use them. In contrast, GHC's instrumentation is enabled by just p= assing the option "+RTS -s" to the Haskell program of interest; t= his is superbly convenient and something we should aim at.

I discussed with Damien whether we should enable building the instrumented = runtime by default (for example pass the --with-instrumented-runtime option= to the opam switches people are using, and encourage distributions to use = it in their packages as well). Of course there is a cost/benefit trade-off:= currently virtually nobody is using this instrumentation, but enabling it = by default would increase the compilation time of the compiler distribution= for everyone. (On my machine it only adds 5 seconds to total build time.)<= br>
I personally think that we should aim for a rock-solid experience for profi= ling and instrumenting OCaml program enabled by default=C2=B9. It is worth = making it slightly longer for anyone to install the compiler if we can make= it vastly easier to measure GC pauses in our program when the need arises = (even if it's not very often). But that is a discussion that should be = had before making any choice.

Regarding the log analysis tools, before finding about Damien's include= d-but-not installed tools (a shell and an awk script, in the finest Unix tr= adition) I built a quick&dirty OCaml script to do some measurements, wh= ich can be found in the benchmark repository below. It would not be much mo= re work to grow this in a reusable library to extract the current log forma= t into a structured data structure -- the format is undocumented but the pr= ovided scripts in tools/ have enough information to infer the structure. Su= ch a script/library would, of course, remain tightly coupled to the OCaml v= ersion, but I think it could be useful to have it packaged for users to pla= y with.

=C2=A0 https://gitlab.= com/gasche/gc-latency-experiment/blob/master/parse_ocaml_log.ml

=C2=B9: We cannot expect users to effectively write performant code if they= don't have the tool support for it. The fact that lazyness in Haskell = makes it harder for users to reason about efficiency or memory usage has ma= de the avaibility of excellent performance tooling *necessary*, where it is= merely nice-to-have in OCaml. Rather ironically, Haskell tooling is now mu= ch better than OCaml's in this area, to the point that it can be easier= to write efficient code in Haskell.

Three side-notes on profiling tools:

1. `perf record --call-graph=3Ddwarf` works fine for ocamlopt binaries
=C2=A0 (no need for a frame-pointers switch), and this is documented:
=C2=A0 =C2=A0 https= ://ocaml.org/learn/tutorials/performance_and_profiling.html#UsingperfonLinu= x

2. Thomas Leonard has done excellent work on domain-specific profiling
=C2=A0 =C2=A0tools for Mirage, and this is the kind of tool support that I = think
=C2=A0 =C2=A0should be available to anyone out of the box.
=C2=A0 =C2=A0 =C2=A0http://roscidu= s.com/blog/blog/2014/08/15/optimising-the-unikernel/
=C2=A0 =C2=A0 =C2=A0http:= //roscidus.com/blog/blog/2014/10/27/visualising-an-asynchronous-monad/<= br>
3. There is currently more debate than anyone could wish for around
=C2=A0 =C2=A0a pull request of Mark Shinwell for runtime support for dynami= c call
=C2=A0 =C2=A0graph construction and its use for memory profiling.
=C2=A0 =C2=A0 =C2=A0https://github.com/ocaml/ocaml/pull/585

4. Providing a good user experience for performance or space profiling
=C2=A0 =C2=A0is a fundamentally harder problem than for GC pauses. It may =C2=A0 =C2=A0require specially-compiled versions of the libraries used by y= our
=C2=A0 =C2=A0program, and thus a general policy/agreement across the
=C2=A0 =C2=A0ecosystem. Swapping a different runtime at link-time is very e= asy.

--
Caml-list mailing list.=C2=A0 Subscription management and archives:
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocam= l_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs=3D<= /a>

--001a11c29692249c360534f34dfb--