From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 162357EE49 for ; Sun, 15 Sep 2013 00:10:25 +0200 (CEST) Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of jacques-henri.jourdan@ens.fr) identity=pra; client-ip=176.31.242.187; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="jacques-henri.jourdan@ens.fr"; x-sender="jacques-henri.jourdan@ens.fr"; x-conformance=sidf_compatible Received-SPF: Neutral (mail3-smtp-sop.national.inria.fr: domain of jacques-henri.jourdan@ens.fr does not assert whether or not 176.31.242.187 is permitted sender) identity=mailfrom; client-ip=176.31.242.187; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="jacques-henri.jourdan@ens.fr"; x-sender="jacques-henri.jourdan@ens.fr"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@ulminfo.fr) identity=helo; client-ip=176.31.242.187; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="jacques-henri.jourdan@ens.fr"; x-sender="postmaster@ulminfo.fr"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AmUBAE3eNFKwH/K7nGdsb2JhbABRCoM/wVCBHRYOAQEBAQEIFAk8giYBBXIGEQsSDxYPCQMCAQIBNw4TCAEBFYduCLlajikHBIFGhB4DkCaBL4JKhQuFAY5qgWY X-IPAS-Result: AmUBAE3eNFKwH/K7nGdsb2JhbABRCoM/wVCBHRYOAQEBAQEIFAk8giYBBXIGEQsSDxYPCQMCAQIBNw4TCAEBFYduCLlajikHBIFGhB4DkCaBL4JKhQuFAY5qgWY X-IronPort-AV: E=Sophos;i="4.90,905,1371074400"; d="asc'?scan'208";a="26865626" Received: from ulminfo.fr ([176.31.242.187]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/ADH-AES256-SHA; 15 Sep 2013 00:10:24 +0200 Received: from [192.168.1.20] (ALille-652-1-130-3.w83-198.abo.wanadoo.fr [83.198.81.3]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by ulminfo.fr (Postfix) with ESMTPSA id 1BD2E40D47 for ; Sun, 15 Sep 2013 00:10:21 +0200 (CEST) Message-ID: <5234DECC.4090808@ens.fr> Date: Sun, 15 Sep 2013 00:10:20 +0200 From: Jacques-Henri Jourdan User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8 MIME-Version: 1.0 To: caml-list@inria.fr References: <20130913154834.GA6566@three-tuns.net> In-Reply-To: <20130913154834.GA6566@three-tuns.net> X-Enigmail-Version: 1.5.2 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="HgEk2wJkkLHm6e0jbAt3dADkMcfC0vfrN" Subject: Re: [Caml-list] Allocation profiling for x86-64 native code This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --HgEk2wJkkLHm6e0jbAt3dADkMcfC0vfrN Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable This is a really interesting work ! Actually, I have had this project of making a memory profiler for Ocaml for a few months. My idea was to do some statistical profiling by annotating only a fraction of the allocated blocks. Here was the advantages I thought about : 1- Lower execution time overhead. BTW, what is yours ? 2- When annotating a block, we could decide to store more information. Typically, it could be very profitable to know (at least a part of) the current backtrace while allocating. 3- We could also analyze more precisely the life of annotated objects without much performance constrains, because they are fewer. We could for example put watchpoints on them to know when they accessed, or do statistics about there life time... 4- Your method for annotating blocks uses the 22 highest bits of the blocks headers to store the bits 4..25 of the allocation point address. I can see several (minor) problems of doing that - The maximum size of a block is then limited to 32GB. - That does mean that those 22 bits identify the allocation point, and I am not convinced that the probability of collision is negligible in the case of large code base (like code) non-contiguously loaded in memory because of dynlink, for example. - This is not usable for x86-32 bits. With statistical profiling, we can afford having a separate table of traced blocks, that we would maintain at the end of each GC phase. This way, we don't actually "annotate" blocks, but we rather annotate the corresponding table entry. 5- It is not necessary to walk the whole heap to understand some of its properties, but rather only the traced blocks. There is an easy and cheap way to do statistical profiling of memory allocation: we could decide that each allocation exceeding caml_young_limit should receive a special treatment in order to be traced. So, do you think this would be a good idea to implement ? Any other comments ? -- JH Jourdan Le 13/09/2013 17:48, Mark Shinwell a =E9crit : > Large OCaml programs can experience performance degradation due to high > garbage collection loads (or possibly due to it being Friday 13th). > Understanding the memory usage of such programs can also be difficult. >=20 > To this end, I am pleased to release a version of OCaml 4.01 that > contains functionality for the memory profiling of native code programs, > for the x86-64 architecture. Currently it is only fully working on Linux > platforms, but there should be a version for Mac OS X in the near future, > and the BSDs. >=20 > opam remote add mshinwell git://github.com/mshinwell/opam-repo-dev > opam update > opam switch 4.01-allocation-profiling >=20 > The source is on GitHub: > https://github.com/mshinwell/ocaml/tree/4.01-allocation-profiling >=20 > Using ocamlopt with -allocation-tracing and running in an environment > with the OCAMLRUNPARAM environment variable including the letter "T" > enables the use of the functionality in the new [Allocation_profiling] > standard library module [1]. You should also ensure that you have > the new ocamlmklocs script (installed to the same place as the > compiler binaries) on your PATH at compile time. >=20 > The runtime system for this compiler contains instrumentation that can > produce a global analysis showing the total number of words allocated > on the OCaml heaps by source location. This works not only for blocks > allocated in OCaml code but also in C stubs. Further, values are > instrumented---without space overhead---in order to be able to determine > from a snapshot of the heap which value was allocated where; and also > to provide a runtime API that can be queried from the instrumented > program itself. Following Unix tradition, there is no shiny user > interface. Scripts are provided to decode the data from the former > two analyses. There is also a script that can draw a graph of the > heap quotiented by the equivalence relation that identifies two > blocks iff they were allocated at the same source location. >=20 > Programs compiled with allocation profiling will run slower than under > normal compilation, but this degradation should not be that > significant. They will use a little more memory than normal, but not > much, and the amount of increase roughly speaking is about twice the > size of the machine code in your program. (In particular there is > no overhead per value allocated.) >=20 > Source locations reported are very slightly approximated, but this > should not normally cause a problem. >=20 > Sometimes the source location that appears in the profile may not be > quite the function you're looking for (e.g. some allocation function > that's called from multiple places; the allocation function rather > than the callers might show up). The system goes to some rudimentary > efforts to avoid this by looking back up the stack one level under > certain conditions, but if you get stuck, you can set a breakpoint in > gdb on the function identified in the profile and collect a backtrace > every time you pass it. These can then be uniquified by a shell script > left as an exercise to the reader. (This technique has been discussed > previously on this list.) >=20 > This is not yet a fully-polished system, but it has been used at Jane > Street on rather large OCaml programs with success. The part that still > requires most work is the runtime API. If you experience long compile > times then you can disable the runtime API support by editing the > ocamlmklocs script to write an empty file; fixing this is on the list. > (See the comment in stdlib/allocation_profiling.mli.) This is on the > list to be fixed. >=20 > I would be interested to hear of reports of success or failure; or > feature requests. One feature on the near-term list is being able to > measure how long a particular value has been in existence. >=20 > Have fun. >=20 > Mark >=20 > P.S. There is some related work going on at OCamlPro using similar > techniques. These projects were developed independently, but we expect > to collaborate on getting some of this technology into the main > distribution. >=20 > [1] https://github.com/mshinwell/ocaml/blob/4.01-allocation-profiling/std= lib/allocation_profiling.mli >=20 --HgEk2wJkkLHm6e0jbAt3dADkMcfC0vfrN Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJSNN7MAAoJEGHoGlEY1GjFfkkIALXhOHEVvNNPDAYuUwygcjkF 29ZW2sCzDy9Xtyeb3orm3T2V9e1wlUjVe3Xh2elhvdj0ckHNC4ro0Z1qH1oAgFZT P5IY2RnrJdMS8ikckHUcB1Zab0jscLDS759Byzt/v/A4uV6a+YCRa3mugyP4WEd/ f7ieskwWccGiCs9hJAbhW7XoBZik37YR67o67pURaMNEDFuyKv45gi8M+HrW1y6h o1BLfGnOGqzXN4WFE3usbIDWMSZx/SfXlXjNRCGIEA9DLfVnANOtOcIJ+JWywYNm s2X6uQawkCcdU8l4D+2Z5yKrKV1lezYt/7iB48Q6mXSXuMZfNBPqZ9Ab5MSiJF4= =1j6e -----END PGP SIGNATURE----- --HgEk2wJkkLHm6e0jbAt3dADkMcfC0vfrN--