caml-list - the Caml user's mailing list
* [Caml-list] Early GC'ing
@ 2013-08-18 20:42 oliver
  2013-08-18 20:53 ` oliver
  2013-08-19 11:51 ` Adrien Nader
  0 siblings, 2 replies; 10+ messages in thread
From: oliver @ 2013-08-18 20:42 UTC (permalink / raw)
  To: caml-list

Hello,

in a loop I read in a lot of files,
from which I just need to collect
a small portion of the data.

Right after reading and selecting what I needed,
I don't need any of the read file data.
Most of it can just be thrown away.

What GC functions need to be called to throw away
the garbage soon?

That's the first time I really need to trigger the GC functions
myself, because it's a lot of data, of which
I need only a small portion.

So far I normally needed most of the data I read in
and the defaults of the GC were pretty fine for my tasks and taste.

This time a lot of mem is used...
...so GC functionality seems to be the way to go.


Are there general rules of thumb for when to trigger the GC
for a given kind of memory handling (in my case: read data,
pick some parts of it and throw away the rest soon),
or is exploring the GC statistics the usual approach
(i.e. no rules of thumb)?


Ciao,
   Oliver


* Re: [Caml-list] Early GC'ing
  2013-08-18 20:42 [Caml-list] Early GC'ing oliver
@ 2013-08-18 20:53 ` oliver
  2013-08-18 21:14   ` Anthony Tavener
  2013-08-19 11:51 ` Adrien Nader
  1 sibling, 1 reply; 10+ messages in thread
From: oliver @ 2013-08-18 20:53 UTC (permalink / raw)
  To: caml-list

With the subset of the files I want to read,
I get these two GC statistics outputs (Gc.print_stat) at the beginning and
the end of the reading loop:


--------------------
minor_words: 176917
promoted_words: 28265
major_words: 83313
minor_collections: 5
major_collections: 1
heap_words: 126976
heap_chunks: 1
top_heap_words: 126976
live_words: 82608
live_blocks: 7325
free_words: 44368
free_blocks: 33
largest_free: 43663
fragments: 0
compactions: 0
--------------------
minor_words: 119944974
promoted_words: 18927217
major_words: 55119706
minor_collections: 3672
major_collections: 71
heap_words: 14221824
heap_chunks: 112
top_heap_words: 14221824
live_words: 10194963
live_blocks: 1184994
free_words: 4024031
free_blocks: 46685
largest_free: 65811
fragments: 2830
compactions: 6
--------------------


What does this tell you?
How can I clean up the memory?
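
As a rough reading of the second dump: the Gc statistics are counted in words, not bytes, so the size in bytes depends on the platform word size. The sketch below converts the heap_words and live_words figures quoted above into mebibytes. The helper name words_to_mib is introduced here for illustration only, and the word size is taken from whatever machine runs the snippet, since the thread does not say whether the original machine was 32-bit or 64-bit.

```ocaml
(* Convert a word count, as reported by Gc.stat or Gc.print_stat, into
   mebibytes. [word_bytes] defaults to the word size of the running
   platform; this is an assumption, the thread does not state the word
   size of the original machine. *)
let words_to_mib ?(word_bytes = Sys.word_size / 8) words =
  float_of_int (words * word_bytes) /. (1024. *. 1024.)

let () =
  (* Figures copied from the second statistics dump above. *)
  let heap_words = 14_221_824 and live_words = 10_194_963 in
  Printf.printf "heap: %.1f MiB, live: %.1f MiB, live/heap: %.0f%%\n"
    (words_to_mib heap_words)
    (words_to_mib live_words)
    (100. *. float_of_int live_words /. float_of_int heap_words)
```

The live/heap ratio is the more interesting number here: if most of the heap is still reported as live at the end of the loop, the program is still holding on to that data, and no amount of GC triggering will free it.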


Ciao,
   Oliver


* Re: [Caml-list] Early GC'ing
  2013-08-18 20:53 ` oliver
@ 2013-08-18 21:14   ` Anthony Tavener
  2013-08-18 22:40     ` oliver
  0 siblings, 1 reply; 10+ messages in thread
From: Anthony Tavener @ 2013-08-18 21:14 UTC (permalink / raw)
  To: oliver; +Cc: caml-list

I run tight loops (game engine -- 60fps with a lot of memory activity and
allocations each frame), and the GC works remarkably well at keeping things
sane. I did have a problem with runaway allocations once, and tracked it
down to a source of allocations which was effectively never
un-referenced... so a legitimate leak.

If you do a Gc.full_major (), is your memory returned? If not, then I think
that's evidence that there's still some handle on it -- be sure the
appropriate values have fallen out of scope and aren't referenced in some
other way!

On the other hand, if this is just the GC not cleaning up quick enough for
your case... I'm sorry I have no help for tuning the GC. The documentation
in gc.mli seemed pretty sensible when I looked at it (a while ago) though!
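
A minimal sketch of the check suggested above, using only the standard Gc module: record live_words, force a full major collection, and record live_words again. The function name live_words_around_full_major and the throwaway list are made up for illustration. How much the two numbers differ depends on the OCaml version and on when the last collection ran, so the interesting question is only whether the second number stays large.

```ocaml
(* Returns (live_before, live_after), both in words, around a full
   major collection. *)
let live_words_around_full_major () =
  let before = (Gc.stat ()).Gc.live_words in
  Gc.full_major ();
  let after = (Gc.stat ()).Gc.live_words in
  (before, after)

let () =
  (* Hypothetical stand-in for file data that is no longer referenced
     by the time the collection runs. *)
  let garbage = ref (List.init 100_000 string_of_int) in
  garbage := [];
  let before, after = live_words_around_full_major () in
  Printf.printf "live words: %d before, %d after Gc.full_major\n" before after
```

If the "after" figure is still close to the size of the data that was read, something is still referencing it (a table, a closure, a global list), and that is where to look.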




* Re: [Caml-list] Early GC'ing
  2013-08-18 21:14   ` Anthony Tavener
@ 2013-08-18 22:40     ` oliver
  2013-08-19  0:20       ` oliver
  0 siblings, 1 reply; 10+ messages in thread
From: oliver @ 2013-08-18 22:40 UTC (permalink / raw)
  To: Anthony Tavener; +Cc: caml-list

Hi,

thanks for your hints.


On Sun, Aug 18, 2013 at 03:14:26PM -0600, Anthony Tavener wrote:
> I run tight loops (game engine -- 60fps with a lot of memory activity and
> allocations each frame), and the GC works remarkably well at keeping things
> sane. I did have a problem with runaway allocations once, and tracked it
> down to a source of allocations which was effectively never
> un-referenced... so a legitimate leak.
> 
> If you do a Gc.full_major (), is your memory returned? If not, then I think
> that's evidence that there's still some handle on it -- be sure the
> appropriate values have fallen out of scope and aren't referenced in some
> other way!
[...]


It did not help much.

Not sure if it's a GC issue at all, or overhead of the data structures.

I used my full dataset now, and top showed memory usage above 50% (about 50 to 55%).
With 1.9 GB of free memory and a resulting file of 254 MB, that means roughly
1000 MB of memory used for 254 MB of output data.

Maybe that's just the normal overhead I have to accept. (?)

The input data was 1.4 GB.


(It seemed that the GC invocation made the program run faster. But I'm not sure
whether that's an artifact (caching by the kernel or so, or my imprecise measurement).
It's about 9 minutes vs. 10.5 minutes, but measured rather imprecisely with "top" and by eye.)


> 
> On the other hand, if this is just the GC not cleaning up quick enough for
> your case... I'm sorry I have no help for tuning the GC. The documentation
> in gc.mli seemed pretty sensible when I looked at it (a while ago) though!

I had no problem with fast cleaning; rather my limited mem on my machine
was an issue. Looks like I need a new / bigger machine...

The expected worst case of swapping did not occur at all.


Ciao,
   Oliver


* Re: [Caml-list] Early GC'ing
  2013-08-18 22:40     ` oliver
@ 2013-08-19  0:20       ` oliver
  2013-08-19  6:43         ` Anthony Tavener
  0 siblings, 1 reply; 10+ messages in thread
From: oliver @ 2013-08-19  0:20 UTC (permalink / raw)
  To: Anthony Tavener; +Cc: caml-list

...strangely the resulting data is different for each run...


...strange.




* Re: [Caml-list] Early GC'ing
  2013-08-19  0:20       ` oliver
@ 2013-08-19  6:43         ` Anthony Tavener
  2013-08-19 10:34           ` oliver
  2013-08-19 11:39           ` Mark Shinwell
  0 siblings, 2 replies; 10+ messages in thread
From: Anthony Tavener @ 2013-08-19  6:43 UTC (permalink / raw)
  To: oliver; +Cc: caml-list

What I was hinting at with Gc.full_major (), is that if you still had a
large amount of memory allocated after calling that, I think that means
your program is still holding on to the values somewhere.

In your loop, when you read in the data each time, is there any way
something might leak? A hashtable holding reference to buffers? Or files
left open?

I tried searching for tools which might help with memory leaks or
profiling, and got a recent presentation I hadn't seen:
http://oud.ocaml.org/2012/slides/oud2012-paper13-slides.pdf
Not that that is much help for you right now...!

And I found these tips on GC params:
http://elehack.net/writings/programming/ocaml-memory-tuning
But it doesn't seem like it's a tuning problem, if the memory is still
"live", especially after a full_major collection.

Sorry I don't have much more help on this!




* Re: [Caml-list] Early GC'ing
  2013-08-19  6:43         ` Anthony Tavener
@ 2013-08-19 10:34           ` oliver
  2013-08-19 11:39           ` Mark Shinwell
  1 sibling, 0 replies; 10+ messages in thread
From: oliver @ 2013-08-19 10:34 UTC (permalink / raw)
  To: Anthony Tavener; +Cc: caml-list

On Mon, Aug 19, 2013 at 12:43:56AM -0600, Anthony Tavener wrote:
> What I was hinting at with Gc.full_major (), is that if you still had a
> large amount of memory allocated after calling that, I think that means
> your program is still holding on to the values somewhere.
> 
> In your loop, when you read in the data each time, is there any way
> something might leak? A hashtable holding reference to buffers? Or files
> left open?
[...]

I close the files after reading them and also select the data from each
file directly, before working on the next file.

For time reasons I called the GC cleanup after every 100 files;
I can try calling it immediately after each file, or after
a smaller number of files.

Maybe that's why the GC call did not have that big an effect...
...but on average I would expect the needed size to be somewhat stable
after a while.


[...]
> Sorry I don't have much more help on this!
[...]

You already helped with the hint about the full-major cleanup.

I just tried cleaning up after every file.
The mem usage is about 30% after 19 minutes of running time.
So calling the GC that often costs a lot of time...
Whether the decreased mem usage is an effect of the GC, or whether
the program just needs longer to reach
the 50% mem usage, I don't know so far.

So, maybe I will just accept the mem usage.
There will be some overhead for storing the data
either way (not sure how much; possibly the library I use
is object-oriented and therefore simply has some overhead).


As the amount of data is much larger than usual,
I think further optimization is not necessary.
I was looking for a quick fix. Exploring in detail
would need more effort, and possibly the results would
not justify exploring in more depth.
(Or I will do it when I have more time for it.)


Thanks so far for your support.

Ciao,
   Oliver

P.S.: Hmhh, at first look, cleaning up after 10 files seems to be a good
      compromise of speed and mem-usage.
      Maybe I can collect data from some more runs in a batched way to decide
      it...
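
The batching described in this message can be sketched as follows, assuming the files are handled one after another in a list. The function name process_with_periodic_gc, the ~every parameter and the ~process callback are placeholders invented for this sketch; the real read-and-select step is not shown in the thread.

```ocaml
(* Apply [process] to every element of [files] and run a full major
   collection after each batch of [every] files. [process] is a
   hypothetical placeholder for the read-and-select step.
   Returns the number of collections that were triggered. *)
let process_with_periodic_gc ~every ~process files =
  if every <= 0 then invalid_arg "process_with_periodic_gc: every <= 0";
  let collections = ref 0 in
  List.iteri
    (fun i file ->
      process file;
      if (i + 1) mod every = 0 then begin
        Gc.full_major ();
        incr collections
      end)
    files;
  !collections
```

With ~every:1 this corresponds to cleaning up after each file, with ~every:100 to the first attempt; timing a few values in between is a cheap way to find the trade-off between speed and memory use on a given data set.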


* Re: [Caml-list] Early GC'ing
  2013-08-19  6:43         ` Anthony Tavener
  2013-08-19 10:34           ` oliver
@ 2013-08-19 11:39           ` Mark Shinwell
  1 sibling, 0 replies; 10+ messages in thread
From: Mark Shinwell @ 2013-08-19 11:39 UTC (permalink / raw)
  To: Anthony Tavener; +Cc: oliver, caml-list

On 19 August 2013 07:43, Anthony Tavener <anthony.tavener@gmail.com> wrote:
> What I was hinting at with Gc.full_major (), is that if you still had a
> large amount of memory allocated after calling that, I think that means your
> program is still holding on to the values somewhere.

I'm not sure that is the case.  As far as I know offhand,
[Gc.full_major] won't cause memory previously occupied by Caml heap
pages to be returned to the operating system (and so you'd still
see it allocated in top).  You would need to call [Gc.compact],
which is potentially much more expensive, to achieve that effect.
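
A small sketch of how one might observe this from inside the program, using only the standard Gc module: compare heap_words before and after Gc.compact. The function name heap_words_around_compact and the throwaway list are made up for illustration, and whether the heap actually shrinks depends on the OCaml version and runtime, so treat the printed numbers as something to inspect rather than a guaranteed result.

```ocaml
(* Returns (heap_words_before, heap_words_after) around a compaction.
   Gc.quick_stat is used because it does not walk the whole heap. *)
let heap_words_around_compact () =
  let before = (Gc.quick_stat ()).Gc.heap_words in
  Gc.compact ();
  let after = (Gc.quick_stat ()).Gc.heap_words in
  (before, after)

let () =
  (* Hypothetical workload: allocate many short strings, then drop them. *)
  let data = ref (List.init 200_000 string_of_int) in
  data := [];
  let before, after = heap_words_around_compact () in
  Printf.printf "heap words: %d before, %d after Gc.compact\n" before after
```

Since compaction is expensive, it makes more sense to call it rarely (for example once after a large batch) than after every file.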

Mark


* Re: [Caml-list] Early GC'ing
  2013-08-18 20:42 [Caml-list] Early GC'ing oliver
  2013-08-18 20:53 ` oliver
@ 2013-08-19 11:51 ` Adrien Nader
  2013-08-19 12:36   ` oliver
  1 sibling, 1 reply; 10+ messages in thread
From: Adrien Nader @ 2013-08-19 11:51 UTC (permalink / raw)
  To: oliver; +Cc: caml-list

Hi,

On Sun, Aug 18, 2013, oliver wrote:
> Hello,
> 
> in a loop I read in a lot of files,
> from which I just need to collect
> a small portion of the data.
> 
> Right after reading and selecting what I needed,
> I don't need any of the read file data.
> The most stuff can just be thrown away.
> 

Have you tried using Bigarray.Array1.map_file?
It should be much faster and allocate pretty much nothing.
(mmap is awesome)
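
A hedged sketch of the mmap approach: more recent OCaml releases expose this functionality as Unix.map_file, which is what the code below uses, so it needs the unix library at link time. The names map_file_bytes and count_newlines are invented for this example, and counting newlines merely stands in for whatever selection is actually done on the file contents.

```ocaml
(* Map a whole file read-only as a one-dimensional array of chars.
   The mapped data lives outside the OCaml heap, so scanning it does
   not allocate per byte. A dimension of -1 lets the runtime derive
   the size from the file length. *)
let map_file_bytes path =
  let fd = Unix.openfile path [ Unix.O_RDONLY ] 0 in
  Fun.protect
    ~finally:(fun () -> Unix.close fd)
    (fun () ->
      Unix.map_file fd Bigarray.char Bigarray.c_layout false [| -1 |]
      |> Bigarray.array1_of_genarray)

(* Example scan over the mapping: count the newline characters. *)
let count_newlines data =
  let n = ref 0 in
  for i = 0 to Bigarray.Array1.dim data - 1 do
    if Bigarray.Array1.get data i = '\n' then incr n
  done;
  !n
```

Only the small pieces that are actually kept need to be copied into ordinary OCaml values; the rest of the file never enters the garbage-collected heap.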

-- 
Adrien Nader


* Re: [Caml-list] Early GC'ing
  2013-08-19 11:51 ` Adrien Nader
@ 2013-08-19 12:36   ` oliver
  0 siblings, 0 replies; 10+ messages in thread
From: oliver @ 2013-08-19 12:36 UTC (permalink / raw)
  To: Adrien Nader; +Cc: caml-list

On Mon, Aug 19, 2013 at 01:51:38PM +0200, Adrien Nader wrote:
> Hi,
> 
> On Sun, Aug 18, 2013, oliver wrote:
> > Hello,
> > 
> > in a loop I read in a lot of files,
> > from which I just need to collect
> > a small portion of the data.
> > 
> > Right after reading and selecting what I needed,
> > I don't need any of the read file data.
> > The most stuff can just be thrown away.
> > 
> 
> Have you tried using Bigarray.Array1.map_file?
> It should be much faster and allocate pretty much nothing.
> (mmap is awesome)
[...]

I was not aware that mmap functionality exists in the standard OCaml distribution.
I would have expected such a binding in the Unix module, and because it's not there,
and it's not even mentioned that it exists somewhere else, I thought it was not available.

Thanks for the hint. :-)


Ciao,
   Oliver

