From: Mark Shinwell
To: Jacques-Henri Jourdan
Cc: "caml-list@inria.fr"
Date: Mon, 16 Sep 2013 09:42:10 +0100
Subject: Re: [Caml-list] Allocation profiling for x86-64 native code
In-Reply-To: <5234DECC.4090808@ens.fr>
References: <20130913154834.GA6566@three-tuns.net> <5234DECC.4090808@ens.fr>

On 14 September 2013 23:10, Jacques-Henri Jourdan wrote:
> statistical profiling by
> annotating only a fraction of the allocated blocks.

I think it's certainly worth experimenting with different approaches.
I've tried to address your points below.

> Here were the advantages I thought about:
>
> 1- Lower execution time overhead. BTW, what is yours?

I don't have any exact figures to hand, and in fact it will
potentially vary quite a lot depending on the amount of allocation.
I think the time overhead is maybe 20% at the moment for a large
allocation-heavy application. This could be decreased somewhat:
firstly by optimizing, and secondly by maybe allowing the "global"
(cf. [Allocation_profiling.Global] in the stdlib) analysis to be
disabled. (This analysis was actually the predecessor of the
value-annotating analysis.) The remaining overhead of annotating
the values is small.

> 2- When annotating a block, we could decide to store more information.
> Typically, it could be very profitable to know (at least a part of) the
> current backtrace while allocating.

Agreed. I'm hoping to do some work on capturing a partial backtrace
in that scenario.

> 3- We could also analyze more precisely the life of annotated objects
> without much performance constraints, because they are fewer.
> We could for example put watchpoints on them to know when they are
> accessed, or do statistics about their lifetime...

I think if using watchpoints you'd have to pick and choose what you
instrument fairly carefully in any case; otherwise everything will
grind to a halt. Lifetime statistics are likely to be supported
soon (this should be easy).

> 4- Your method for annotating blocks uses the 22 highest bits of the
> blocks headers to store the bits 4..25 of the allocation point address.
> I can see several (minor) problems of doing that
> - The maximum size of a block is then limited to 32GB.

I think such blocks are unlikely to occur in practice. I'd argue that
it's most likely a mistake to have such large allocations inside the
OCaml heap, too.

> - That does mean that those 22 bits identify the allocation point,
> and I am not convinced that the probability of collision is negligible
> in the case of large code base (like code) non-contiguously loaded in
> memory because of dynlink, for example.

I neglected to say that this is not expected to work with natdynlink
at the moment. I think for x86-64 Linux the current assumption about
contiguous code is correct, at least using the normal linker scripts,
and the range is probably sufficient. The main place where the
approximation could be problematic, I think, is where there are
allocation points close together that can't quite be distinguished.
In practice I'm not sure this is a problem, though.

> - This is not usable for x86-32 bits.

I'm not sure x86-32 is worthy of much attention any more (dare I say
it!) but I think 32-bit platforms more generally are still of concern.
My plan for upstreaming this work includes a patch to enable compiler
hackers to adjust the layout of the block header more easily (roughly
speaking, removing hard-coded constants in the code) and that may end
up including an option to allocate more than one header word per block.
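As a rough check on the numbers in point 4, here is a minimal sketch
of the header arithmetic. It assumes the usual 64-bit OCaml header
layout (54 size bits, 2 colour bits, 8 tag bits) and that the
annotation takes the top 22 bits of the size field. The names below
are made up for illustration and are not taken from the patch.

```ocaml
(* Sketch only. Assumptions, not taken from the patch: the standard 64-bit
   header layout (54 size bits, 2 colour bits, 8 tag bits) and an annotation
   stored in the top 22 bits of the size field. All names are hypothetical. *)

let word_bytes = 8
let wosize_bits = 54        (* size field of a standard 64-bit header *)
let annotation_bits = 22    (* bits borrowed for the allocation point *)
let low_addr_bit = 4        (* the annotation holds address bits 4..25 *)

(* Bits left to describe the block size, in words. *)
let remaining_wosize_bits = wosize_bits - annotation_bits

(* Upper bound, in bytes, on the size of an annotated block. *)
let max_block_bytes = (1 lsl remaining_wosize_bits) * word_bytes

(* The 22-bit annotation derived from a code address. *)
let annotation_of_address addr =
  (addr lsr low_addr_bit) land ((1 lsl annotation_bits) - 1)

(* Addresses within one granule cannot be told apart. *)
let granule_bytes = 1 lsl low_addr_bit

(* Addresses a multiple of this many bytes apart alias each other. *)
let window_bytes = 1 lsl (low_addr_bit + annotation_bits)

let () =
  Printf.printf "size bits left: %d\n" remaining_wosize_bits;
  Printf.printf "max block: %d bytes\n" max_block_bytes;
  Printf.printf "granule: %d bytes, window: %d bytes\n"
    granule_bytes window_bytes
```

Under those assumptions the size field keeps 32 bits, which gives the
32GB limit quoted above. Two allocation points get the same annotation
if they fall in the same 16-byte granule, or if their granules are a
multiple of 64MB apart, which is where the contiguous-code assumption
matters.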
That patch could then be used to solve the 32-bit problem, as well as
likely being a useful platform for other experiments.

> With statistical profiling, we can afford having a separate table of
> traced blocks, that we would maintain at the end of each GC phase. This
> way, we don't actually "annotate" blocks, but we rather annotate the
> corresponding table entry.

This seems like it might cause quite a lot of extra work, and disturb
cache behaviour, no? (The current "global" analysis mentioned above
in my system will disturb the cache too, but I think if that's turned
off, just the value annotation should not.)

Mark