From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 9A9267EE49 for ; Fri, 13 Sep 2013 17:48:56 +0200 (CEST) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of mark@three-tuns.net) identity=pra; client-ip=93.93.131.93; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="mark@three-tuns.net"; x-sender="mark@three-tuns.net"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of mark@three-tuns.net) identity=mailfrom; client-ip=93.93.131.93; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="mark@three-tuns.net"; x-sender="mark@three-tuns.net"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mash.three-tuns.net) identity=helo; client-ip=93.93.131.93; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="mark@three-tuns.net"; x-sender="postmaster@mash.three-tuns.net"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AnQGAHszM1JdXYNd/2dsb2JhbABbgwc4DoMFvjiBFBZ0gmY4NA80BSgNiC4ImRWhCY4ngX+CcIEAA5Qeg1uBMJBFgyU7 X-IPAS-Result: AnQGAHszM1JdXYNd/2dsb2JhbABbgwc4DoMFvjiBFBZ0gmY4NA80BSgNiC4ImRWhCY4ngX+CcIEAA5Qeg1uBMJBFgyU7 X-IronPort-AV: E=Sophos;i="4.90,899,1371074400"; d="scan'208";a="32790478" Received: from mash.three-tuns.net ([93.93.131.93]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES128-SHA; 13 Sep 2013 17:48:40 +0200 Received: from mark by mash.three-tuns.net with local (Exim 4.80) (envelope-from ) id 1VKVbm-0005Db-H4 for caml-list@inria.fr; Fri, 13 Sep 2013 16:48:34 +0100 Date: Fri, 13 Sep 2013 16:48:34 +0100 From: Mark Shinwell To: caml-list@inria.fr Message-ID: <20130913154834.GA6566@three-tuns.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Validation-by: mark@three-tuns.net Subject: [Caml-list] Allocation profiling for x86-64 native code Large OCaml programs can experience performance degradation due to high garbage collection loads (or possibly due to it being Friday 13th). Understanding the memory usage of such programs can also be difficult. To this end, I am pleased to release a version of OCaml 4.01 that contains functionality for the memory profiling of native code programs, for the x86-64 architecture. Currently it is only fully working on Linux platforms, but there should be a version for Mac OS X in the near future, and the BSDs. opam remote add mshinwell git://github.com/mshinwell/opam-repo-dev opam update opam switch 4.01-allocation-profiling The source is on GitHub: https://github.com/mshinwell/ocaml/tree/4.01-allocation-profiling Using ocamlopt with -allocation-tracing and running in an environment with the OCAMLRUNPARAM environment variable including the letter "T" enables the use of the functionality in the new [Allocation_profiling] standard library module [1]. You should also ensure that you have the new ocamlmklocs script (installed to the same place as the compiler binaries) on your PATH at compile time. The runtime system for this compiler contains instrumentation that can produce a global analysis showing the total number of words allocated on the OCaml heaps by source location. This works not only for blocks allocated in OCaml code but also in C stubs. Further, values are instrumented---without space overhead---in order to be able to determine from a snapshot of the heap which value was allocated where; and also to provide a runtime API that can be queried from the instrumented program itself. Following Unix tradition, there is no shiny user interface. Scripts are provided to decode the data from the former two analyses. There is also a script that can draw a graph of the heap quotiented by the equivalence relation that identifies two blocks iff they were allocated at the same source location. Programs compiled with allocation profiling will run slower than under normal compilation, but this degradation should not be that significant. They will use a little more memory than normal, but not much, and the amount of increase roughly speaking is about twice the size of the machine code in your program. (In particular there is no overhead per value allocated.) Source locations reported are very slightly approximated, but this should not normally cause a problem. Sometimes the source location that appears in the profile may not be quite the function you're looking for (e.g. some allocation function that's called from multiple places; the allocation function rather than the callers might show up). The system goes to some rudimentary efforts to avoid this by looking back up the stack one level under certain conditions, but if you get stuck, you can set a breakpoint in gdb on the function identified in the profile and collect a backtrace every time you pass it. These can then be uniquified by a shell script left as an exercise to the reader. (This technique has been discussed previously on this list.) This is not yet a fully-polished system, but it has been used at Jane Street on rather large OCaml programs with success. The part that still requires most work is the runtime API. If you experience long compile times then you can disable the runtime API support by editing the ocamlmklocs script to write an empty file; fixing this is on the list. (See the comment in stdlib/allocation_profiling.mli.) This is on the list to be fixed. I would be interested to hear of reports of success or failure; or feature requests. One feature on the near-term list is being able to measure how long a particular value has been in existence. Have fun. Mark P.S. There is some related work going on at OCamlPro using similar techniques. These projects were developed independently, but we expect to collaborate on getting some of this technology into the main distribution. [1] https://github.com/mshinwell/ocaml/blob/4.01-allocation-profiling/stdlib/allocation_profiling.mli