From: Michael Ekstrand
To: caml-list@yquem.inria.fr
Date: Thu, 11 Nov 2010 08:14:43 -0600
Subject: Re: [Caml-list] Average cost of the OCaml GC
Message-ID: <4CDBFA53.5020009@elehack.net>

On 11/11/2010 07:52 AM, Jianzhou Zhao wrote:
> On Thu, Nov 11, 2010 at 4:08 AM, Goswin von Brederlow wrote:
>> Jianzhou Zhao writes:
>>
>>> Hi,
>>>
>>> What is the average cost of the OCaml GC? I have a program that calls
>>> 'mark_slice' in 57% of the total execution time, and calls
>>> 'sweep_slice' in 21% of the total time, reported by Callgrind, which
>>> is a profiling tool in Valgrind.
>>> 57% and 21% are the 'self cost' ---
>>> the cost of the function itself ('Self Cost'), rather than the cost
>>> including all called functions ('Inclusive Cost'). I guess
>>> 'mark_slice' and 'sweep_slice' are functions from OCaml GC. Are
>>> these numbers normal?
>>
>> Those numbers sound rather high to me.

They sound high to me as well, but not unheard of; I sometimes measure a lot of time in the GC.

>>> My program calls both OCaml and C, which passes around C data types in
>>> between. I also doubt if I defined the interface in an 'inefficient'
>>> way that slows down the GC. Are there any rules in mind to make GC
>>> work more efficiently?
>>
>> You can tune some of the GC parameters to suit your use case.
>>
>> Do you allocate custom types from C? In caml_alloc_custom(ops, size,
>> used, max) the used and max do influence the GC how often to run.
>
> Yes. The code uses caml_alloc_custom to create a lot of small objects
> (less than 8 bytes) frequently. The used and max are set to be
> default, 0 and 1. The manual says
> http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc140
>
> /////////////////////
> If your finalized blocks contain no pointers to out-of-heap resources,
> or if the previous discussion made little sense to you, just take used
> = 0 and max = 1. But if you later find that the finalization functions
> are not called “often enough”, consider increasing the used / max
> ratio.
> //////////////////////
>
> Does this mean the default used and max let GC do finalization 'as
> slow as possible'? This does not seem to be the case if the costs 57%
> and 20% are too high.

Yes, as far as GC cycles triggered by "too much" custom data allocation are concerned: with used = 0, the used/max ratio contributes nothing extra to how often the GC runs. But a variety of other things can cause GC thrashing. One of them is the GC "space overhead" parameter, which controls how aggressively the GC reclaims memory (a lower value makes it work harder to keep the heap small). Another is your minor heap size: if your minor heap is too small, it can cause excess GC activity.
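A quick way to see the minor-heap effect is to count collections directly. The following is a minimal sketch using only the standard Gc module (Gc.get, Gc.set, Gc.quick_stat); the workload function and the two heap sizes are hypothetical choices for illustration, not measurements from this thread.

```ocaml
(* Minimal sketch: count minor collections for the same workload under a
   small and a large minor heap. [workload] is a hypothetical stand-in for
   real code; it only allocates short-lived lists. *)

(* Allocate [rounds] short-lived lists of [len] integers.
   Sys.opaque_identity stops the compiler from discarding the allocation. *)
let workload ~rounds ~len =
  for _i = 1 to rounds do
    ignore (Sys.opaque_identity (List.init len (fun i -> i)))
  done

(* Set the minor heap to [words] words, run [f], and return how many minor
   collections happened while [f] ran. Changing minor_heap_size triggers a
   minor collection, which happens before the first counter read. *)
let minor_collections_with ~words f =
  Gc.set { (Gc.get ()) with Gc.minor_heap_size = words };
  let before = (Gc.quick_stat ()).Gc.minor_collections in
  f ();
  let after = (Gc.quick_stat ()).Gc.minor_collections in
  after - before

let saved = Gc.get ()
let run () = workload ~rounds:10_000 ~len:1_000

(* 32k words vs. 1M words (256 KB vs. 8 MB on a 64-bit machine). *)
let small = minor_collections_with ~words:32_768 run
let large = minor_collections_with ~words:1_048_576 run

let () =
  Gc.set saved; (* restore the original GC settings *)
  Printf.printf "minor collections: %d (32k words), %d (1M words)\n" small large
```

A minor collection happens roughly each time the minor heap fills up, so the same amount of allocation should trigger far fewer collections with the larger heap.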
I documented the parameter tuning I have done to reduce GC cost on my blog[1], but here's a short summary:

* Increase the minor heap size. I usually use 1M or 4M words; my rule of thumb is that one "work unit", with its temporary storage requirements, should fit in the minor heap. This decreases the frequency of both minor collections and major slices.

* Increase space_overhead. I set it to 100 or 200 (the default is 80), as I typically run my large codes on machines with lots of spare RAM and can accept a space-speed tradeoff.

* Increase the heap increment (major_heap_increment). If your process will require lots of RAM, this lets it allocate that memory in bigger chunks, so the heap is grown less often and with less overhead.

I also use a patched Bigarray that lets me set the "max" parameter it passes to caml_alloc_custom, but if you are not using Bigarray, that shouldn't be affecting your program's performance. It's quite critical when allocating large bigarrays, though! Allocating custom blocks whose "used" size is near or above the "max" parameter is a sure-fire recipe for GC thrashing. It sounds like you're avoiding that pitfall, though.

So, the short short story: you're doing many of the right things (measuring, not letting custom allocations thrash the GC). Some more parameter tuning will hopefully help you decrease your GC overhead.

1. http://elehack.net/michael/blog/2010/06/ocaml-memory-tuning

- Michael
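The three tuning knobs in the summary list correspond to fields of the standard Gc.control record: minor_heap_size, space_overhead, and major_heap_increment. The following is a minimal sketch that applies values in the ranges mentioned (1M words, 200) plus a hypothetical 4M-word heap increment at program start-up; treat the numbers as starting points to measure against, not settings that suit every program.

```ocaml
(* Minimal sketch: apply GC tuning along the lines of the summary list.
   The specific values are illustrative; measure before and after. *)

(* Return a copy of [base] with the three tuned parameters changed. *)
let tuned_control (base : Gc.control) : Gc.control =
  { base with
    Gc.minor_heap_size = 1_048_576;      (* 1M words; 8 MB on a 64-bit machine *)
    Gc.space_overhead = 200;             (* trade memory for less major-GC work *)
    Gc.major_heap_increment = 4_194_304; (* hypothetical: grow in 4M-word chunks *)
  }

let () =
  Gc.set (tuned_control (Gc.get ()));
  let c = Gc.get () in
  Printf.printf "minor_heap_size = %d words, space_overhead = %d\n"
    c.Gc.minor_heap_size c.Gc.space_overhead
```

The same settings can also be supplied without recompiling, through the OCAMLRUNPARAM environment variable, e.g. OCAMLRUNPARAM='s=1M,o=200,i=4M' ./myprog, where myprog is a placeholder for your executable.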