caml-list - the Caml user's mailing list
From: oliver <oliver@first.in-berlin.de>
To: Anthony Tavener <anthony.tavener@gmail.com>
Cc: "caml-list@inria.fr" <caml-list@inria.fr>
Subject: Re: [Caml-list] Early GC'ing
Date: Mon, 19 Aug 2013 12:34:49 +0200
Message-ID: <20130819103448.GA2070@siouxsie>
In-Reply-To: <CAN=ouMSFSEmrnbdkfEefa0faKKoTJQqTNdgzT+5tnLADDMaU3A@mail.gmail.com>

On Mon, Aug 19, 2013 at 12:43:56AM -0600, Anthony Tavener wrote:
> What I was hinting at with Gc.full_major (), is that if you still had a
> large amount of memory allocated after calling that, I think that means
> your program is still holding on to the values somewhere.
> 
> In your loop, when you read in the data each time, is there any way
> something might leak? A hashtable holding reference to buffers? Or files
> left open?
[...]

I close the files after reading them and also extract the data from each
file directly, before working on the next file.

To save time, I called the Gc cleanup only after every 100 files;
I can try calling it immediately after each file, or after
a smaller number of files.
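
A minimal sketch of such a loop, with the interval as a parameter. The names
process_all, gc_every and process_file are hypothetical; process_file stands
in for whatever reads and handles one file in the real program.

```ocaml
(* Run [process_file] on every file and force a full major collection
   after every [gc_every] files. Returns how many collections were forced.
   [process_file] is a placeholder for the real per-file work. *)
let process_all ~gc_every ~process_file files =
  if gc_every <= 0 then invalid_arg "process_all: gc_every must be positive";
  let forced = ref 0 in
  List.iteri
    (fun i file ->
      process_file file;
      (* i is 0-based, so i + 1 is the number of files handled so far *)
      if (i + 1) mod gc_every = 0 then begin
        Gc.full_major ();
        incr forced
      end)
    files;
  !forced
```

With gc_every set to 100 this matches the first attempt; setting it to 1
forces a collection after each file.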

Maybe that's why the Gc call did not have much of an effect...
...but on average I would expect the required memory to become fairly stable
after a while.


[...]
> Sorry I don't have much more help on this!
[...]

You already helped with the hint about the full-major cleanup.

I just tried cleaning up after every file.
The mem usage is about 30% after 19 minutes of running time.
So running the Gc that often costs a lot of time...
Whether the lower mem usage is an effect of the Gc, or whether
the program just needs longer to reach the 50% mem usage,
I don't know so far.
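
One way to tell the two cases apart would be to look at the live data that
remains right after a full major collection, instead of the process memory
share. A rough sketch using the standard Gc and Sys modules; the helper names
live_words_after_full_major and timed are made up for illustration.

```ocaml
(* Words of live data in the major heap (headers included) right after a
   full major collection. If this keeps growing from file to file, the
   program still references the data somewhere. *)
let live_words_after_full_major () =
  Gc.full_major ();
  (Gc.stat ()).Gc.live_words

(* Run [f] and return its result together with the processor time
   (in seconds) it took, e.g. to see what a forced collection costs. *)
let timed f =
  let t0 = Sys.time () in
  let result = f () in
  (result, Sys.time () -. t0)
```

Logging the first value after each file (or each batch) shows whether the
retained data levels off; timing the call itself shows what each forced
collection costs.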

So maybe I will just accept the mem usage.
There will be some overhead for storing the data
either way (I am not sure how much; possibly the library I use
is object-oriented and therefore adds some overhead of its own).


As the amount of data used here is much larger than usual,
I think further optimization is not necessary.
I was looking for a quick fix. Exploring this in detail
would need more effort, and the results would possibly
not justify it.
(Or I will do it when I have more time for it.)


Thanks so far for your support.

Ciao,
   Oliver

P.S.: Hmm, at first glance, cleaning up after every 10 files seems to be a
      good compromise between speed and mem usage.
      Maybe I can collect data from some more runs in a batched way to decide
      it...
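
A possible shape for such a batched comparison, as a sketch only. The record
run_stats and the functions measure and compare_intervals are hypothetical
names, and process_file again stands in for the real per-file work.

```ocaml
(* Result of one run with a given cleanup interval. *)
type run_stats = {
  gc_every : int;          (* files between forced full major collections *)
  cpu_seconds : float;     (* processor time for the whole run *)
  peak_heap_words : int;   (* largest major-heap size seen after a file *)
}

(* Process all files with the given interval and record time and heap size. *)
let measure ~gc_every ~process_file files =
  if gc_every <= 0 then invalid_arg "measure: gc_every must be positive";
  let peak = ref 0 in
  let t0 = Sys.time () in
  List.iteri
    (fun i file ->
      process_file file;
      if (i + 1) mod gc_every = 0 then Gc.full_major ();
      (* quick_stat reads the counters without triggering a collection *)
      peak := max !peak (Gc.quick_stat ()).Gc.heap_words)
    files;
  { gc_every; cpu_seconds = Sys.time () -. t0; peak_heap_words = !peak }

(* One measurement per interval, in the order given. *)
let compare_intervals ~process_file files intervals =
  List.map (fun gc_every -> measure ~gc_every ~process_file files) intervals
```

Runs inside one process influence each other, because the heap does not
necessarily shrink between them; starting a separate process per interval
would probably give cleaner numbers.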
