From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <oliver@first.in-berlin.de>
X-Original-To: caml-list@sympa.inria.fr
Delivered-To: caml-list@sympa.inria.fr
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by sympa.inria.fr (Postfix) with ESMTPS id 15524820A1
	for <caml-list@sympa.inria.fr>; Mon, 19 Aug 2013 12:34:52 +0200 (CEST)
Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender
  authenticity information available from domain of
  oliver@first.in-berlin.de) identity=pra;
  client-ip=192.109.42.8;
  receiver=mail2-smtp-roc.national.inria.fr;
  envelope-from="oliver@first.in-berlin.de";
  x-sender="oliver@first.in-berlin.de";
  x-conformance=sidf_compatible
Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender
  authenticity information available from domain of
  oliver@first.in-berlin.de) identity=mailfrom;
  client-ip=192.109.42.8;
  receiver=mail2-smtp-roc.national.inria.fr;
  envelope-from="oliver@first.in-berlin.de";
  x-sender="oliver@first.in-berlin.de";
  x-conformance=sidf_compatible
Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender
  authenticity information available from domain of
  postmaster@einhorn.in-berlin.de) identity=helo;
  client-ip=192.109.42.8;
  receiver=mail2-smtp-roc.national.inria.fr;
  envelope-from="oliver@first.in-berlin.de";
  x-sender="postmaster@einhorn.in-berlin.de";
  x-conformance=sidf_compatible
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AnUBAHvzEVLAbSoIkWdsb2JhbABbhye2W4U9gSEWDgEBAQEJCwsHFAQkgiQBAQUjVhALCQ8CAgUhAgIPBRgxiCMEpD2QexaBE483B4JoM3cDjx+IRJR0
X-IPAS-Result: AnUBAHvzEVLAbSoIkWdsb2JhbABbhye2W4U9gSEWDgEBAQEJCwsHFAQkgiQBAQUjVhALCQ8CAgUhAgIPBRgxiCMEpD2QexaBE483B4JoM3cDjx+IRJR0
X-IronPort-AV: E=Sophos;i="4.89,912,1367964000"; 
   d="scan'208";a="29728968"
Received: from einhorn.in-berlin.de ([192.109.42.8])
  by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 19 Aug 2013 12:34:51 +0200
X-Envelope-From: oliver@first.in-berlin.de
Received: from first (e178029176.adsl.alicedsl.de [85.178.29.176])
	(authenticated bits=0)
	by einhorn.in-berlin.de (8.13.6/8.13.6/Debian-1) with ESMTP id r7JAYneL018072
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT);
	Mon, 19 Aug 2013 12:34:50 +0200
Received: by first (Postfix, from userid 1000)
	id 8C1DF1540135; Mon, 19 Aug 2013 12:34:49 +0200 (CEST)
Date: Mon, 19 Aug 2013 12:34:49 +0200
From: oliver <oliver@first.in-berlin.de>
To: Anthony Tavener <anthony.tavener@gmail.com>
Cc: "caml-list@inria.fr" <caml-list@inria.fr>
Message-ID: <20130819103448.GA2070@siouxsie>
References: <20130818204213.GA7482@siouxsie>
 <20130818205305.GA7841@siouxsie>
 <CAN=ouMQKr10L87_xWzR5j+YNH63Vzw+RiWciEEOhA9BQDzrUPw@mail.gmail.com>
 <20130818224024.GA10193@siouxsie>
 <20130819002023.GA11063@siouxsie>
 <CAN=ouMSFSEmrnbdkfEefa0faKKoTJQqTNdgzT+5tnLADDMaU3A@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <CAN=ouMSFSEmrnbdkfEefa0faKKoTJQqTNdgzT+5tnLADDMaU3A@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-Scanned-By: MIMEDefang_at_IN-Berlin_e.V. on 192.109.42.8
Subject: Re: [Caml-list] Early GC'ing

On Mon, Aug 19, 2013 at 12:43:56AM -0600, Anthony Tavener wrote:
> What I was hinting at with Gc.full_major (), is that if you still had a
> large amount of memory allocated after calling that, I think that means
> your program is still holding on to the values somewhere.
> 
> In your loop, when you read in the data each time, is there any way
> something might leak? A hashtable holding reference to buffers? Or files
> left open?
[...]

I close the files after reading them, and I select the data from each
file right away, before working on the next file.

To save time, I called the Gc cleanup only after every 100 files;
I can try calling it immediately after each file, or after
a smaller number of files.
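
A minimal sketch of what I mean by cleaning up after every N files.
The names process_in_batches, every and process are made up for this
example and are not from my real program; process stands for whatever
reads one file, selects the data and closes the file again:

```ocaml
(* Hypothetical helper: run [process] on each file name and force a
   full major collection after every [every] files. *)
let process_in_batches ~every ~(process : string -> unit) files =
  if every <= 0 then invalid_arg "process_in_batches: every must be positive";
  List.iteri
    (fun i file ->
       process file;
       (* i is 0-based, so i + 1 is the number of files handled so far *)
       if (i + 1) mod every = 0 then Gc.full_major ())
    files
```

With ~every:100 this corresponds to the first run, with ~every:1 to a
cleanup after each file.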

Maybe that's why the Gc call did not have that much effect...
...but on average I would expect the needed size to become somewhat stable
after a while.


[...]
> Sorry I don't have much more help on this!
[...]

You already helped with the hint about the full-major cleanup.

I just tried cleaning up after every file.
The memory usage is at about 30% after 19 minutes of running time.
So calling the Gc that often costs a lot of time...
Whether the lower memory usage is an effect of the Gc, or whether
the program just takes longer to reach the 50% memory usage,
I don't know so far.
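
One way to tell the two cases apart might be to log the heap size
around the cleanup. This is only a sketch using the standard Gc module;
the helper names mib_of_words and report_full_major are made up here,
and the numbers describe the OCaml heap only, not what top shows for
the whole process:

```ocaml
(* Convert a number of heap words to MiB; Sys.word_size is in bits. *)
let mib_of_words w =
  float_of_int w *. float_of_int (Sys.word_size / 8) /. (1024. *. 1024.)

(* Hypothetical helper: print the major heap size before and after a
   full major collection, plus the live data that survived it. *)
let report_full_major label =
  let before = (Gc.quick_stat ()).Gc.heap_words in
  Gc.full_major ();
  let after = Gc.stat () in
  Printf.eprintf "%s: heap %.1f MiB -> %.1f MiB, live %.1f MiB\n%!"
    label
    (mib_of_words before)
    (mib_of_words after.Gc.heap_words)
    (mib_of_words after.Gc.live_words)
```

If the live figure keeps growing from file to file even right after a
full major collection, the program is probably still holding on to the
values somewhere; if it stays flat, the growth is more likely heap that
is not currently in use.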

So maybe I will just accept the memory usage.
There will be some overhead for storing the data
either way (not sure how much; possibly the library I use
is object-oriented and therefore has some overhead of its own).


As the amount of data used here is much larger than usual,
I think further optimization is not necessary.
I was looking for a quick fix. Exploring this in detail
would need more effort, and the results would possibly
not justify it.
(Or I will do it when I have more time for it.)


Thanks so far for your support.

Ciao,
   Oliver

P.S.: Hmm, at first glance, cleaning up after every 10 files seems to be a good
      compromise between speed and memory usage.
      Maybe I can collect data from some more runs in a batched way to decide
      it...