caml-list - the Caml user's mailing list
From: Damien Doligez <damien.doligez@inria.fr>
To: "Jörgen Gustavsson" <gustavss@cs.chalmers.se>
Cc: <caml-list@inria.fr>
Subject: Re: [Caml-list] Memory management dominates running time
Date: Wed, 8 Jan 2003 15:20:40 +0100	[thread overview]
Message-ID: <5D91C652-2314-11D7-B138-0003930FCE12@inria.fr> (raw)
In-Reply-To: <Pine.SOL.4.30.0301032109550.17168-100000@muppet20.cs.chalmers.se>

On Friday, January 3, 2003, at 10:58 PM, Jörgen Gustavsson wrote:

> I profiled my program with gprof on a moderately sized input which gave
> the following top ten time consumers.
>
>    %  cumulative    self              self    total
>  time   seconds   seconds    calls  ms/call  ms/call name
>  55.5      63.29    63.29                            fl_allocate [1]
>  11.0      75.79    12.50                            mark_slice [2]
>   5.7      82.32     6.53                            sweep_slice [3]
>   4.0      86.91     4.59                            _brk_unlocked [4]
>   2.5      89.80     2.89                            oldify_one [5]
>   2.3      92.42     2.62                            oldify_mopup [6]
>   1.6      94.23     1.81                            modify [7]
>   1.5      95.90     1.67                            alloc_shr [8]
>   0.9      96.91     1.01                            allocate_block [9]
>   0.9      97.90     0.99                            compare_val [10]

fl_allocate, alloc_shr, and allocate_block are O'Caml allocation
   functions for the major heap.
mark_slice and sweep_slice are the major collector.
brk_unlocked is a C library function; I'm surprised to see it use
   so much run time.
oldify_one and oldify_mopup are the minor collector.
modify is the write barrier that is called when an assignment is done.


> Unfortunately it gets worse when we increase the input size.

This behaviour is a bit unusual.


> I am not an expert on garbage collection, but the only explanation that
> I can see for this behaviour is that the free list that fl_allocate
> searches grows as the program runs.  If there are lots of small blocks
> at the beginning of the free list, then fl_allocate potentially has to
> search very far when it allocates somewhat larger blocks.
>
> This situation can occur, for example, if the program allocates many
> small blocks in the beginning, which are then reclaimed by the garbage
> collector, and then later the program allocates mostly larger blocks.

This phenomenon is called "fragmentation" and O'Caml does a few things
to prevent fragmentation-related problems:

1. Whenever two free blocks are adjacent, they are merged into one
    bigger block.
2. The free list is not searched from the beginning every time, because
    that would lead to an accumulation of small blocks at the beginning
    of the list, even under normal conditions.  Instead, each search
    starts where the previous one stopped, so that no block is examined
    more often than any other.
3. The heap compaction algorithm is designed to remove all fragmentation
    from the free list by moving objects around to make sure that all
    free blocks are adjacent.
4. When fragmentation is too high, the typical behaviour is that the
    allocation requests cannot be satisfied from the free list, and
    the heap size has to be increased indefinitely.  The GC can detect
    when this is the case, and automatically trigger a heap compaction.
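
For what it's worth, you can observe the state of the free list from
within your program through the Gc module of the standard library.  The
sketch below uses the stat record's free_words, free_blocks, and
largest_free fields (note that Gc.stat walks the whole heap to compute
them, so it is much more expensive than Gc.quick_stat):

```ocaml
(* Print a rough picture of free-list fragmentation.  A large number of
   free blocks together with a small average block size (or a small
   largest_free) suggests heavy fragmentation. *)
let report_fragmentation () =
  let s = Gc.stat () in
  Printf.printf "free words:   %d\n" s.Gc.free_words;
  Printf.printf "free blocks:  %d\n" s.Gc.free_blocks;
  Printf.printf "largest free: %d\n" s.Gc.largest_free;
  if s.Gc.free_blocks > 0 then
    Printf.printf "avg free block: %d words\n"
      (s.Gc.free_words / s.Gc.free_blocks)

let () = report_fragmentation ()
```

Calling this before and after the point where your allocation pattern
changes should show whether the free list really is degenerating into
many small blocks.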

It seems that you have found a case where this strategy fails to detect
a situation of heavy fragmentation.


> One reason why I doubt it is that, if it is correct, then the O'Caml
> memory management can asymptotically dominate the running time.  Is it
> really so?

It looks surprising, but we don't have an analysis of the run-time
cost of memory management, and such an analysis would be quite hard
to do precisely.


> Finally, what can I do about this problem?

You can have good control of the GC parameters through the Gc module
of the standard library.  The things you should try are:
1. Increase the minor heap size.  This will let the minor GC deallocate
    more blocks, and thus reduce the load of the major GC.  For this
    parameter, the point of diminishing returns depends quite a lot
    on the allocation behaviour of your program.
2. Increase the space overhead.  The major GC has a trade-off between
    space and time.  If you increase the space overhead, your program
    will need more memory, but the major GC load will decrease.
    Reasonable values for this parameter are between 50 and 500.
3. Decrease max_overhead.  This will trigger more automatic heap
    compactions, thus reducing fragmentation.  Be aware, however,
    that heap compaction is costly and not incremental, so it will
    pause the execution of your program (for up to a few seconds).
    This can be a problem if your program is real-time or interactive.
4. Call Gc.compact from your program.  This is a good idea if you
    can easily identify in your program the point where it switches
    from allocating small blocks to allocating big blocks.
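
In code, points 1-4 look something like the following sketch, using
Gc.get, Gc.set, and Gc.compact from the standard library.  The concrete
values are only illustrative starting points, not recommendations:

```ocaml
(* Tune the GC as suggested above.  Sizes are in words. *)
let () =
  let c = Gc.get () in
  Gc.set { c with
    Gc.minor_heap_size = 4 * 1024 * 1024; (* 1. bigger minor heap *)
    Gc.space_overhead  = 200;             (* 2. trade memory for less major-GC work *)
    Gc.max_overhead    = 100;             (* 3. compact automatically when the
                                                free list exceeds 100% of live data *)
  }

(* 4. If you can identify the point where the program switches from
      allocating small blocks to allocating big blocks, force a
      compaction there to defragment the free list in one go: *)
let switch_to_big_blocks () =
  Gc.compact ()
```

You can check the effect of these settings with Gc.print_stat or by
reading back the control record with Gc.get.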

If none of the above helps, I would be interested in getting a copy
of your program, to see what is going on in more detail.  After all,
the behaviour that you are observing might simply be caused by a bug.

-- Damien

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


Thread overview: 6+ messages
2003-01-03 21:58 Jörgen Gustavsson
2003-01-04  1:20 ` Norman Ramsey
2003-01-05 11:35 ` Chet Murthy
2003-01-08 12:58   ` Jörgen Gustavsson
2003-01-08 14:20 ` Damien Doligez [this message]
2003-01-08 22:23   ` [Caml-list] GlSurf 1.2 available Christophe Raffalli
