Date: Mon, 19 Aug 2013 12:34:49 +0200
From: oliver
To: Anthony Tavener
Cc: "caml-list@inria.fr"
Message-ID: <20130819103448.GA2070@siouxsie>
References: <20130818204213.GA7482@siouxsie> <20130818205305.GA7841@siouxsie> <20130818224024.GA10193@siouxsie> <20130819002023.GA11063@siouxsie>
Subject: Re: [Caml-list] Early GC'ing

On Mon, Aug 19, 2013 at 12:43:56AM -0600, Anthony Tavener wrote:
> What I was hinting at with Gc.full_major (), is that if you still had a
> large amount of memory allocated after calling that, I think that means
> your program is still holding on to the values somewhere.
>
> In your loop, when you read in the data each time, is there any way
> something might leak? A hashtable holding reference to buffers? Or files
> left open?
[...]

I close the files after reading them, and I also select the data from each
file directly, before working on the next file.

To save time I called the Gc cleanup only after every 100 files; I can try
calling it immediately after each file, or after a smaller number of files.
Maybe that's why the Gc call did not have that much effect...

...but on average I would expect the needed size to become somewhat stable
after a while.

[...]
> Sorry I don't have much more help on this!
[...]

You already helped with the hint about the full-major cleanup.

I just tried cleaning up after every file. The mem usage is about 30% after
19 minutes of running time. So calling the Gc that often consumes a lot of
time...
Whether the decreased mem usage is an effect of the Gc, or whether the
program just needs longer to reach the 50% mem usage, I don't know so far.

So maybe I will just accept the mem usage. There will be some overhead for
storing the data either way (I'm not sure how much; possibly the library I
use is written in OOP style and therefore has some overhead of its own).

As the amount of data is much larger than usual, I think further
optimization is not necessary. I was looking for a quick fix. Exploring this
in detail would need more effort, and the results might not justify it.
(Or I will do it when I have more time.)

Thanks so far for your support.

Ciao,
   Oliver

P.S.: Hmm, at first look, cleaning up after every 10 files seems to be a
      good compromise between speed and mem usage. Maybe I can collect data
      from some more runs in a batched way to decide...
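
The batched cleanup described in the P.S. could be sketched roughly as follows. This is only an illustration, not the code from the thread: `process_all`, `process_file`, `live_words` and the default batch size of 10 are assumed names and values. Only `Gc.full_major` and `Gc.stat` come from the OCaml standard library. Printing the live word count after each forced collection is one way to tell whether the heap really stabilises or whether something is still holding on to the data.

```ocaml
(* Sketch only: [process_file] is a hypothetical placeholder for whatever
   reads one file and extracts the needed data from it. *)

(* Words of live data in the major heap, as reported by Gc.stat.
   Note that Gc.stat itself walks the whole heap, so it is not free. *)
let live_words () = (Gc.stat ()).Gc.live_words

(* Process [files] in order. After every [batch] files, force a full
   major collection and log the live heap size to stderr.
   Returns the number of collections that were forced. *)
let process_all ?(batch = 10) ~(process_file : 'a -> unit)
    (files : 'a list) : int =
  if batch <= 0 then invalid_arg "process_all: batch must be positive";
  let forced = ref 0 in
  List.iteri
    (fun i file ->
      process_file file;
      if (i + 1) mod batch = 0 then begin
        Gc.full_major ();
        incr forced;
        Printf.eprintf "after %d files: %d live words\n%!"
          (i + 1) (live_words ())
      end)
    files;
  !forced
```

Running the same input with, say, `~batch:1`, `~batch:10` and `~batch:100` and comparing running time against the logged live words would give the data needed to pick a batch size.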