caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] mark_slice, sweep_slice, oldify
@ 2001-09-15  9:10 Marco Maggesi
  2001-09-18 14:22 ` Xavier Leroy
  0 siblings, 1 reply; 8+ messages in thread
From: Marco Maggesi @ 2001-09-15  9:10 UTC (permalink / raw)
  To: caml-list


Hi,

I am writing a library for computing basic algebraic operations on
polynomials in multiple indeterminates (my ideal goal would be to
implement the Buchberger algorithm for Groebner Basis).

With the aid of gprof I noticed that my program spends most of the time
of the computation in three functions `mark_slice', `sweep_slice' and
`oldify' (see below the output of gprof obtained after the computation
of some products of polynomials).

What does this exactly mean?  This is the generational garbage
collector, right?  May I low the cost of GC?

I do not want to bore you with the details of my program (I can make
the code available if someone wants), let me simply say that I tried
to represent polynomial as binary trees (both mutable and non mutable)
where each node is a term.  In any test I do the main CPU costs is
alwais in GC.  (I also tried to represent polynomials as sorted lists
of terms, but then I have a bad complexity in terms of number of
comparisons and products of terms).

I do not pretend a precise answer.  Simply tell me what I have to look
or study to understand which operation stress the GC more and which not.

Thanks, Marco


Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 25.25      2.24     2.24      908     2.47     2.47  mark_slice
 19.84      4.00     1.76      754     2.33     2.35  sweep_slice
 16.69      5.48     1.48   172210     0.01     0.01  oldify
  8.46      6.23     0.75  4075593     0.00     0.00  Monomial_mul_56
  4.85      6.66     0.43  6151035     0.00     0.00  alloc_shr
  4.85      7.09     0.43     2831     0.15     2.93  Tpoly_mul_term_iter_212
  4.51      7.49     0.40  1022415     0.00     0.00  Tpoly_mul_term_iter_216
  2.82      7.74     0.25  4075593     0.00     0.00  Ring_mul_88
  2.37      7.95     0.21  3444124     0.00     0.00  Monomial_lex_102
  2.03      8.13     0.18  6151535     0.00     0.00  fl_allocate
  1.92      8.30     0.17  6151035     0.00     0.00  allocate_block
  1.58      8.44     0.14      500     0.28     0.28  add_to_heap
  1.24      8.55     0.11  1360897     0.00     0.00  Ring_add_80
  1.01      8.64     0.09  8880614     0.00     0.00  caml_apply2
  1.01      8.73     0.09     1662     0.05     1.50  oldify_local_roots
  0.90      8.81     0.08     9333     0.01     0.06  Tpoly_add_104
  0.34      8.84     0.03     2831     0.01     0.13  Tpoly_mul_term_iter_191
  0.11      8.85     0.01   376392     0.00     0.00  fl_merge_block
  0.11      8.86     0.01   339844     0.00     0.00  Ring_is_zero_95
  0.11      8.87     0.01    12657     0.00     0.00  gc_message
  0.00      8.87     0.00   233068     0.00     0.00  Tpoly_make_node_92
  0.00      8.87     0.00     3324     0.00     0.75  empty_minor_heap
  0.00      8.87     0.00     3324     0.00     0.00  final_empty_young
  0.00      8.87     0.00     2926     0.00     0.00  darken
  0.00      8.87     0.00     2831     0.00     0.00  Tpoly_mul_term2_rlist_205
  0.00      8.87     0.00     1662     0.00     3.92  caml_call_gc
  0.00      8.87     0.00     1662     0.00     0.00  final_do_calls
  0.00      8.87     0.00     1662     0.00     0.00  final_do_young_roots
  0.00      8.87     0.00     1662     0.00     3.92  garbage_collection
  0.00      8.87     0.00     1662     0.00     2.42  major_collection_slice
  0.00      8.87     0.00     1662     0.00     3.92  minor_collection
  0.00      8.87     0.00      501     0.00     0.00  aligned_malloc
  0.00      8.87     0.00      501     0.00     0.00  round_heap_chunk_size
  0.00      8.87     0.00      500     0.00     0.00  alloc_for_heap
  0.00      8.87     0.00      500     0.00     0.28  expand_heap
  0.00      8.87     0.00      500     0.00     0.00  fl_add_block

[...]
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] mark_slice, sweep_slice, oldify
  2001-09-15  9:10 [Caml-list] mark_slice, sweep_slice, oldify Marco Maggesi
@ 2001-09-18 14:22 ` Xavier Leroy
  0 siblings, 0 replies; 8+ messages in thread
From: Xavier Leroy @ 2001-09-18 14:22 UTC (permalink / raw)
  To: Marco Maggesi; +Cc: caml-list

> With the aid of gprof I noticed that my program spends most of the time
> of the computation in three functions `mark_slice', `sweep_slice' and
> `oldify' (see below the output of gprof obtained after the computation
> of some products of polynomials).
> 
> What does this exactly mean?  This is the generational garbage
> collector, right?

Yes.  "oldify" is the minor copying collector, and "mark_slice" and
"sweep_slice" are the incremental mark-and-sweep major collector.

> May I low the cost of GC?

There is no general answer to this question.  As a rule of thumb,
allocating small, short-lived blocks (that die before the next minor
collection) is very cheap; longer-lived objects cost significantly more.

You could try to play with the GC settings (either at start-up via the
CAMLRUNPARAM environment variable, or programmatically via the Gc.get
and Gc.set functions).  E.g. if many of your blocks survive a minor
collection but die shortly thereafter, increasing the size of the
minor heap could help.  If your program runs for a short amount of
time, you could try increasing the "overhead" parameter to, say, 100,
to make the incremental major collector work less.

The GC statistics returned by Gc.stat could also be of interest,
although you need a bit of background in garbage collection to
understand what they really mean :-)

Sometimes, the program can be rewritten to eliminate allocation of
intermediate data structures.  This is known as "deforestation" in the
literature.  For a simple-minded example: if you often sum three
polynomials, instead of writing "poly_add x (poly_add y z)"
(which causes the intermediate polynomial y + z to be allocated),
write a "poly_add3" function that will sum three polynomials in one go.

Good luck,

- Xavier Leroy
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] mark_slice, sweep_slice, oldify
  2001-09-24 17:48 ` Marco Maggesi
@ 2001-09-27 16:59   ` Thorsten Ohl
  0 siblings, 0 replies; 8+ messages in thread
From: Thorsten Ohl @ 2001-09-27 16:59 UTC (permalink / raw)
  To: caml-list; +Cc: Marco Maggesi

Marco Maggesi writes:

> how to merge a certain number of ordered lists in a single orederd
> list.

Patricia trees allow very fast merging.

Theory:

   http://www.cs.columbia.edu/~cdo/papers.html#ml98maps

Released O'Caml implementation:

   http://www.lri.fr/~filliatr/software.en.html

Unreleased O'Caml code for computer algebra:

   ftp://heplix.ikp.physik.tu-darmstadt.de/pub/ohl/lotr/

Unfortunately, I currently have no time to work on it, but comments
are welcome.
-- 
Thorsten Ohl, Physics Department, TU Darmstadt -- ohl@hep.tu-darmstadt.de
http://heplix.ikp.physik.tu-darmstadt.de/~ohl/ [<=== PGP public key here]
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] mark_slice, sweep_slice, oldify
  2001-09-19 16:40 Damien Doligez
  2001-09-19 19:06 ` Chris Hecker
@ 2001-09-24 17:48 ` Marco Maggesi
  2001-09-27 16:59   ` Thorsten Ohl
  1 sibling, 1 reply; 8+ messages in thread
From: Marco Maggesi @ 2001-09-24 17:48 UTC (permalink / raw)
  To: caml-list


Damien and Xavier, thank you for your clear and concrete answers.  I
already had some benefit by playing with [Gc.set].  However, I think I
will need more time to do some tests and understand better what's
going on, but I do not have enough time at the moment.  I will send a
notice if I will obtain something interesting.

I realized that my problem can be described with a good approximation
as follows: how to merge a certain number of ordered lists in a single
orederd list.  Even better, I need to merge a matrix a_{i,j} of
elements which are ordered by rows and by columns
  a_{i+1,j} >= a_{i,j}   and   a_{1,j+1} >= a_{i,j}
for all i, j.

My approach is to use leftist trees to reduce the cost of merging
lists in pairs.  Probably, this is responsable for the allocation of
many intermediate structures to compute the result.  Probably I can do
better than that.  I think I have to look more depthly in the theory.
(Perhaps The Art of Computer Programs?)

Ciao, Marco
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] mark_slice, sweep_slice, oldify
  2001-09-19 19:06 ` Chris Hecker
  2001-09-22 18:15   ` John Max Skaller
@ 2001-09-22 18:24   ` John Max Skaller
  1 sibling, 0 replies; 8+ messages in thread
From: John Max Skaller @ 2001-09-22 18:24 UTC (permalink / raw)
  To: Chris Hecker; +Cc: Damien Doligez, caml-list

Chris Hecker wrote:
[GC/games]

BTW: the 'amount of time' spent collecting is almost certainly
irrelevant to a real time game: what's important is the
_distribution_ of the collection operation over time:
it's OK to spend half you time collecting, as long as its
the same amount of time every frame (rather, less than
the available time). So you probably want to force
minor collections often. To put this another way: 

	performance == worst case delay

(So you need to minimise the maximum computation
time between frames, not the total fraction of time
spent collecting).


-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au 
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
New generation programming language Felix  http://felix.sourceforge.net
Literate Programming tool Interscript     
http://Interscript.sourceforge.net
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] mark_slice, sweep_slice, oldify
  2001-09-19 19:06 ` Chris Hecker
@ 2001-09-22 18:15   ` John Max Skaller
  2001-09-22 18:24   ` John Max Skaller
  1 sibling, 0 replies; 8+ messages in thread
From: John Max Skaller @ 2001-09-22 18:15 UTC (permalink / raw)
  To: Chris Hecker; +Cc: caml-list

Chris Hecker wrote:

>  Being a relatively performance sensitive game programmer 
> writing a commercial game in ocaml, I have a constant lowlevel 
> anxiety that I'll wake up one day and the GC will be taking a 
> lot of time and I won't be able to fix it.

	Stop worrying.

	1. You can do what other games do: use fixed mutable
data stores. 

	2. The GC will be much faster than other storage
management techniques used by games.

	3. The GC can be tuned.

Just don't overdo the 'functional' programming. Use imperative
data structures where they're appropriate. 

	It's my guess that, unlike most games, your game might actually WORK. 
This would be a pleasant change from the beautiful, high quality
graphics
games around, with utterly broken game play engines. Hopefully,
using Ocaml will allow you to concentrate on design and balance.

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au 
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
New generation programming language Felix  http://felix.sourceforge.net
Literate Programming tool Interscript     
http://Interscript.sourceforge.net
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re:  [Caml-list] mark_slice, sweep_slice, oldify
  2001-09-19 16:40 Damien Doligez
@ 2001-09-19 19:06 ` Chris Hecker
  2001-09-22 18:15   ` John Max Skaller
  2001-09-22 18:24   ` John Max Skaller
  2001-09-24 17:48 ` Marco Maggesi
  1 sibling, 2 replies; 8+ messages in thread
From: Chris Hecker @ 2001-09-19 19:06 UTC (permalink / raw)
  To: Damien Doligez, caml-list, maggesi


At 06:40 PM 9/19/01 +0200, Damien Doligez wrote:

[good GC advice snipped]

>Please send me mail if the above hints are not enough to solve your
>problem.

If you guys end up having a private thread about this, please post a summary to the list, since I'm very interested and I'm sure others are too.  Being a relatively performance sensitive game programmer writing a commercial game in ocaml, I have a constant lowlevel anxiety that I'll wake up one day and the GC will be taking a lot of time and I won't be able to fix it.  So, any success stories and hints are good therapy.  :)

Chris


-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re:  [Caml-list] mark_slice, sweep_slice, oldify
@ 2001-09-19 16:40 Damien Doligez
  2001-09-19 19:06 ` Chris Hecker
  2001-09-24 17:48 ` Marco Maggesi
  0 siblings, 2 replies; 8+ messages in thread
From: Damien Doligez @ 2001-09-19 16:40 UTC (permalink / raw)
  To: caml-list, maggesi

>From: Marco Maggesi <maggesi@math.unifi.it>

>With the aid of gprof I noticed that my program spends most of the time
>of the computation in three functions `mark_slice', `sweep_slice' and
>`oldify' (see below the output of gprof obtained after the computation
>of some products of polynomials).

>What does this exactly mean?  This is the generational garbage
>collector, right?  May I low the cost of GC?

As Xavier said, "oldify" is the minor collector and "mark_slice" and
"sweep_slice" are the major collector.  Spending 17% of your time in
"oldify" is not shockingly bad.  Spending 45% in the major collector
is a lot higher than normal.


>I do not pretend a precise answer.  Simply tell me what I have to look
>or study to understand which operation stress the GC more and which not.

It is not really operations that stress the GC, but the behavior of
your programs with respect to data structures, so it is always a bit
hard to understand what is going on exactly.

The default GC parameters are a compromise between speed and memory
use, and there is no way to make them optimal for all programs.  If
you want more speed (in exchange for more memory "wasted"), there are
mainly two parameters that you can change: minor_heap_size and
space_overhead.

Increasing minor_heap_size will reduce the time spent by both the
minor GC and the major GC.  The default is 32768 words, and depending
on your machine and the behaviour of your program, it can be
reasonable to increase it to 250000, or even to a few millions.  It is
often (but not always) better to keep it small enough to fit in
the cache of your machine.

Increasing space_overhead will reduce the time spent in the major GC.
The default is 42 percent.  If your program uses 100 megabytes of data
structures, then the GC will (on average) use an additional 42
megabytes of memory (you can think of it as a buffer for allocations;
the larger the buffer, the less work the GC has to do).  It can be
reasonable to increase space_overhead to 100 or 200, again depending
on the amount of memory in your machine and the behaviour of your
program.

It may also be interesting to increase major_heap_increment if the
program uses a lot of memory (which seems to be true in your case).
By increasing it, you reduce the number of times that add_to_heap is
called.  The default value is 62k words.


To have a better idea of what's going on, you need to use the Gc
module of the standard library.  You should print the GC statistics at
the end of your program.  The most important figure is the ratio of
promoted_words to minor_words.  This should be as small as possible.
If it is more than 10%, you'll spend too much time in the GC (both
minor and major).  Increasing minor_heap_size often helps in this case.

Please send me mail if the above hints are not enough to solve your
problem.

-- Damien
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2001-09-27 17:00 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-09-15  9:10 [Caml-list] mark_slice, sweep_slice, oldify Marco Maggesi
2001-09-18 14:22 ` Xavier Leroy
2001-09-19 16:40 Damien Doligez
2001-09-19 19:06 ` Chris Hecker
2001-09-22 18:15   ` John Max Skaller
2001-09-22 18:24   ` John Max Skaller
2001-09-24 17:48 ` Marco Maggesi
2001-09-27 16:59   ` Thorsten Ohl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).