caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Memory management dominates running time
@ 2003-01-03 21:58 Jörgen Gustavsson
  2003-01-04  1:20 ` Norman Ramsey
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jörgen Gustavsson @ 2003-01-03 21:58 UTC (permalink / raw)
  To: caml-list


Hi,

I have a performance problem with one of my ocaml programs: it seems as if
the memory management asymptoticly dominates the running time. I.e., as
the program runs the fraction of the time that is spent on memory
management seems to go towards 100%. I believe that the problem is in
the function fl_allocate in the ocaml runtime.

I profiled my program with gprof on a moderately sized input which gave
the following top ten time consumers.

   %  cumulative    self              self    total
 time   seconds   seconds    calls  ms/call  ms/call name
 55.5      63.29    63.29                            fl_allocate [1]
 11.0      75.79    12.50                            mark_slice [2]
  5.7      82.32     6.53                            sweep_slice [3]
  4.0      86.91     4.59                            _brk_unlocked [4]
  2.5      89.80     2.89                            oldify_one [5]
  2.3      92.42     2.62                            oldify_mopup [6]
  1.6      94.23     1.81                            modify [7]
  1.5      95.90     1.67                            alloc_shr [8]
  0.9      96.91     1.01                            allocate_block [9]
  0.9      97.90     0.99                            compare_val [10]

Judging from their names all top ten functions except compare_val and
modify has to do with the memory management. In total roughly 85% of the
time is spent on memory management.

Unfortunately it gets worse when we increase the input size.

   %  cumulative    self              self    total
 time   seconds   seconds    calls  ms/call  ms/call name
 62.2     612.56   612.56                            fl_allocate [1]
  8.9     699.91    87.35                            _brk_unlocked [2]
  7.3     772.09    72.18                            fl_add_block [3]
  5.0     821.43    49.34                            mark_slice [4]
  3.7     857.90    36.47                            oldify_local_roots [5]
  2.6     883.93    26.03                            sweep_slice [6]
  1.2     895.87    11.94                            add_to_heap [7]
  1.0     905.92    10.05                            oldify_one [8]
  1.0     915.39     9.47                            oldify_mopup [9]
  0.6     921.78     6.39                            alloc_shr [10]

Now all top ten functions concern memory management and 95% of the time is
spent on it. If I increase the input size even further it seems as if it
gets even worse.

I am not an expert on garbage collection but the only explanation that
I can see for this behaviour is that the free list that fl_allocate
searches grows as the program runs. If there are lots of small blocks in
the begining of the free list then fl_allocate potentially has to search
very far when it allocates somewhat larger blocks.

This situation can occur, for example, if the program allocates many small
blocks in the begining which are then reclaimed by the garbage collector
and then later the program allocates mostly larger blocks.

Does this explanation make any sense?

One reason for why I doubt it, is that if it is correct then the ocaml
memory management can asymptoticly dominate the running time. Is it really
so?

Finally, what can I do about this problem?


Regards,
    Jörgen Gustavsson.




-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Memory management dominates running time
  2003-01-03 21:58 [Caml-list] Memory management dominates running time Jörgen Gustavsson
@ 2003-01-04  1:20 ` Norman Ramsey
  2003-01-05 11:35 ` Chet Murthy
  2003-01-08 14:20 ` Damien Doligez
  2 siblings, 0 replies; 6+ messages in thread
From: Norman Ramsey @ 2003-01-04  1:20 UTC (permalink / raw)
  To: Jörgen Gustavsson; +Cc: caml-list

For some time, I have been hoping for a heap profiler or other tool that
will help me debug my memory-performance problems in OCaml.  I had a student
who was interested but ran out of time at the end of last summer.
If anybody is working on such a tool, we would be pleased to help
if we can.


Norman
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Memory management dominates running time
  2003-01-03 21:58 [Caml-list] Memory management dominates running time Jörgen Gustavsson
  2003-01-04  1:20 ` Norman Ramsey
@ 2003-01-05 11:35 ` Chet Murthy
  2003-01-08 12:58   ` Jörgen Gustavsson
  2003-01-08 14:20 ` Damien Doligez
  2 siblings, 1 reply; 6+ messages in thread
From: Chet Murthy @ 2003-01-05 11:35 UTC (permalink / raw)
  To: Jörgen Gustavsson; +Cc: caml-list


By using gprof, and doing a little careful counting, you can 
often find out where egregious allocations are happening, and
eliminate them.

I can't give you good pointers on this, because, well, it's a bit of
an art, and takes a lot of experience.  But it _is_ doable, and I have
done it to nontrivial programs in Caml, as well as in Java, attaining
speedups of 2x, 3x, without much code-restructuring.

E.g., once, I took a Stream-based parser, and wrote a Stream module
which only supported "char Stream", and only for parsing.

By writing a few auxiliary functions in addition to this, I was able
to reduce the consing associated with stream-based parsing (the
constant creation of "Some _" blocks) significantly, and got a rather
large speedup.

I was able to see that this was important, basically by looking around
in the gprof profile and trying to follow where the allocation were
happening.

--chet--
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Memory management dominates running time
  2003-01-05 11:35 ` Chet Murthy
@ 2003-01-08 12:58   ` Jörgen Gustavsson
  0 siblings, 0 replies; 6+ messages in thread
From: Jörgen Gustavsson @ 2003-01-08 12:58 UTC (permalink / raw)
  To: Chet Murthy; +Cc: caml-list

On Sun, 5 Jan 2003, Chet Murthy wrote:

> By using gprof, and doing a little careful counting, you can
> often find out where egregious allocations are happening, and
> eliminate them.

I am afraid that is not my problem right now. I know where the allocations
are and they are necessary in the algorithm we are currently implementing.

The problem is (I believe) that the cost of each allocation is not atomic.
Instead each allocation can under certain circumstances become more and
more expensive as the program runs and eventually the program hardly
does anything but searching for free blocks of memory to allocate. (I
think this happens when data is moved from the minor heap to the major
heap during the minor garbage collection. The free memory in the major
heap is kept in a linked list which is searched linearly for a block of
the right size.) It could actually help to introduce a space leak because
it might lead to less fragmentation.

Thus, when you reason about the asymptotic run time complexity of your
ocaml programs you have to take the internals of the garbage collector
into account which I think is unfortunate. Is it impossible to avoid this
problem with a mark-sweep garbage collector? (With a simple two-space
copying collector the problem disappear.)

In any case we have found a temporary solution to the problem. By setting
max_overhead in the gc control record to 0 one can force heap compaction
to take place at the end of each major GC cycle. This makes the garbage
collector behave like a compacting collector and it becomes very slow
(90% of running time) but the asymptotic problem goes away so it is good
enough for prototyping.

Regards,
    Jörgen Gustavsson.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Memory management dominates running time
  2003-01-03 21:58 [Caml-list] Memory management dominates running time Jörgen Gustavsson
  2003-01-04  1:20 ` Norman Ramsey
  2003-01-05 11:35 ` Chet Murthy
@ 2003-01-08 14:20 ` Damien Doligez
  2003-01-08 22:23   ` [Caml-list] GlSurf 1.2 available Christophe Raffalli
  2 siblings, 1 reply; 6+ messages in thread
From: Damien Doligez @ 2003-01-08 14:20 UTC (permalink / raw)
  To: Jörgen Gustavsson; +Cc: caml-list

On Friday, January 3, 2003, at 10:58 PM, Jörgen Gustavsson wrote:

> I profiled my program with gprof on a moderately sized input which gave
> the following top ten time consumers.
>
>    %  cumulative    self              self    total
>  time   seconds   seconds    calls  ms/call  ms/call name
>  55.5      63.29    63.29                            fl_allocate [1]
>  11.0      75.79    12.50                            mark_slice [2]
>   5.7      82.32     6.53                            sweep_slice [3]
>   4.0      86.91     4.59                            _brk_unlocked [4]
>   2.5      89.80     2.89                            oldify_one [5]
>   2.3      92.42     2.62                            oldify_mopup [6]
>   1.6      94.23     1.81                            modify [7]
>   1.5      95.90     1.67                            alloc_shr [8]
>   0.9      96.91     1.01                            allocate_block [9]
>   0.9      97.90     0.99                            compare_val [10]

fl_allocate, alloc_shr, and allocate_block are O'Caml allocation 
functions
   for the major heap.
mark_slice and sweep_slice are the major collector.
brk_unlocked is a C library function, I'm surprised to see it use
   so much run time.
oldify_one and oldify_mopup are the minor collector.
modify is the write barrier that is called when an assignment is done.


> Unfortunately it gets worse when we increase the input size.

This behaviour is a bit unusual.


> I am not an expert on garbage collection but the only explanation that
> I can see for this behaviour is that the free list that fl_allocate
> searches grows as the program runs. If there are lots of small blocks 
> in
> the begining of the free list then fl_allocate potentially has to 
> search
> very far when it allocates somewhat larger blocks.
>
> This situation can occur, for example, if the program allocates many 
> small
> blocks in the begining which are then reclaimed by the garbage 
> collector
> and then later the program allocates mostly larger blocks.

This phenomenon is called "fragmentation" and O'Caml does a few things
to prevent fragmentation-related problems:

1. Whenever two free blocks are adjacent, they are merged into one
    bigger block.
2. The free list is not searched from the beginning every time, because
    that would lead to an accumulation of small blocks at the beginning
    of the list, even under normal conditions.  Instead, each search
    starts where the previous one stopped, so that no block is examined
    more often than any other.
3. The heap compaction algorithm is designed to remove all fragmentation
    from the free list by moving objects around to make sure that all
    free blocks are adjacent.
4. When fragmentation is too high, the typical behaviour is that the
    allocation requests cannot be satisfied from the free list, and
    the heap size has to be increased indefinitely.  The GC can detect
    when this is the case, and automatically trigger a heap compaction.

It seems that you have found a case where this strategy fails to detect
a situation of heavy fragmentation.


> One reason for why I doubt it, is that if it is correct then the ocaml
> memory management can asymptoticly dominate the running time. Is it 
> really
> so?

It looks surprising, but we don't have an analysis of the run-time
cost of memory management, and such an analysis would be quite hard
to do precisely.


> Finally, what can I do about this problem?

You can have good control of the GC parameters through the Gc module
of the standard library.  The things you should try are:
1. Increase the minor heap size.  This will let the minor GC deallocate
    more blocks, and thus reduce the load of the major GC.  For this
    parameter, the point of diminishing returns depends quite a lot
    on the allocation behaviour of your program.
2. Increase the space overhead.  The major GC has a trade-off between
    space and time.  If you increase the space overhead, your program
    will need more memory, but the major GC load will decrease.
    Reasonable values for this parameter are between 50 and 500.
3. Decrease max_overhead.  This will trigger more automatic heap
    compactions, thus reducing fragmentation.  Be aware, however,
    that heap compaction is costly and not incremental, so it will
    pause the execution of your program (for up to a few seconds).
    This can be a problem if your program is real-time or interactive.
4. Call Gc.compact from your program.  This is a good idea if you
    can easily identify in your program the point where it switches
    from allocating small blocks to allocating big blocks.

If none of the above helps, I would be interested in getting a copy
of your program, to see what is going on in more detail.  After all,
the behaviour that you are observing might simply be caused by a bug.

-- Damien

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Caml-list] GlSurf 1.2 available
  2003-01-08 14:20 ` Damien Doligez
@ 2003-01-08 22:23   ` Christophe Raffalli
  0 siblings, 0 replies; 6+ messages in thread
From: Christophe Raffalli @ 2003-01-08 22:23 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 1794 bytes --]

I am pleased to annouce that GlSurf 1.2 is available at
<http://www.lama.univ-savoie.fr/~raffalli/glsurf.html>

GlSurf is a program to draw surface (implicit surface)
using OpenGl.

There is now a "wishes list" on the web page ... so if
OCaml programmer wants to contribute ! This kind of programming
is fun :-)

--
Version 1.2:
	- bug when free variables are used in "surface".
	- new algorithm to extract triangles from cubes. Between two
	and three times less triangles with the same cubes !
	on test9:
	
	       |  version 1.1 | version 1.2 |
	-------------------------------------
	time   |  40.36       | 19.64       |
	nb tri | ~125000      | ~45000      |
	-------------------------------------

	- simplification of the algorithm (a lot of useless tests
	removed.
	- more parameters can be set from the script (see the
	documentation.
	- added "p" key to print the eye position.
	- fixed a display bug when changing the size parameter
	between two drawings.
	
Version 1.1:
	- Removed a last minute simplification which was in fact a
	bug producing wrong topology (visible in examples/test7);
	- Simplification of the test to divide cubes and the root_in_cube
	function (a little less triangles in general).
	- Fixed a bug in root_in_cube (the diagonals were missing).
	- Fixed typo in examples/test10.

Version 1.0:
	Initial version

-- 
Christophe Raffalli
Université de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex

tél: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------
IMPORTANT: this mail is signed using PGP/MIME
At least Enigmail/Mozilla, mutt or evolution
can check this signature
---------------------------------------------

[-- Attachment #2: Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2003-01-09  6:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-01-03 21:58 [Caml-list] Memory management dominates running time Jörgen Gustavsson
2003-01-04  1:20 ` Norman Ramsey
2003-01-05 11:35 ` Chet Murthy
2003-01-08 12:58   ` Jörgen Gustavsson
2003-01-08 14:20 ` Damien Doligez
2003-01-08 22:23   ` [Caml-list] GlSurf 1.2 available Christophe Raffalli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).