caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Garbage collection and caml_adjust_gc_speed
@ 2006-08-29  4:16 Christopher Kauffman
  2006-08-29 15:20 ` [Caml-list] " Damien Doligez
  0 siblings, 1 reply; 6+ messages in thread
From: Christopher Kauffman @ 2006-08-29  4:16 UTC (permalink / raw)
  To: caml-list

I am finishing up my first major research application written in OCaml. The code is a scientific 
application which does a moderate amount of floating point computations for which I have employed 
the Bigarray library and the Lacaml package.

I am attempting to tune the performance of the code and to that end, I have examined the native code 
performance using gprof (OCaml manual section 17.4). The first thing that struck me on analyzing the 
profile is that the function 'caml_adjust_gc_speed' is called a lot. The first few lines of the 
profile are:

Flat profile:

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls   s/call   s/call  name
  14.55      6.80     6.80  4841768     0.00     0.00  caml_adjust_gc_speed
  14.41     13.53     6.73                             bigarray_fill
   6.87     16.74     3.21                             lacaml_Dssqr_zero_stub
   6.30     19.68     2.94                             bigarray_offset
   4.30     21.69     2.01                             bigarray_slice
   4.03     23.57     1.88                             bigarray_get_N
   2.68     24.82     1.25     1612     0.00     0.00  sweep_slice
   2.63     26.05     1.23 24714254     0.00     0.00  caml_c_call
   2.10     27.03     0.98  4972572     0.00     0.00  caml_alloc_shr
...

The actual runtime of the program is about 18 seconds so the gprof cumulative time is off by quite a 
bit. What concerns me is the large overhead I seem to be getting from the first function, 
'caml_adjust_gc_speed', which I assume is related to the garbage collector. Over 4 million calls to 
this function seems a little much. I attempted to play with a garbage collection parameter, the 
value of control.space_overhead in the Gc module. According to the manual, this affects the major GC 
speed and increasing the value is supposed to cut down on the aggressiveness of the GC. Setting 
space_overhead to 99 did not change number of calls to 'caml_adjust_gc_speed'.

I'm looking for someone with a bit more knowledge of the garbage collection in OCaml to enlighten me 
on whether this overhead can be reduced or if it is an unavoidable side-effect of relying on the 
garbage collector. I'd be happy to provide more details on the code if this would be helpful.

Cheers,
Chris


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Garbage collection and caml_adjust_gc_speed
  2006-08-29  4:16 Garbage collection and caml_adjust_gc_speed Christopher Kauffman
@ 2006-08-29 15:20 ` Damien Doligez
  2006-08-29 18:37   ` Christopher Kauffman
  0 siblings, 1 reply; 6+ messages in thread
From: Damien Doligez @ 2006-08-29 15:20 UTC (permalink / raw)
  To: OCaml

On 2006-08-29, at 06:16, Christopher Kauffman wrote:

> I am finishing up my first major research application written in  
> OCaml. The code is a scientific application which does a moderate  
> amount of floating point computations for which I have employed the  
> Bigarray library and the Lacaml package.

[...]

> What concerns me is the large overhead I seem to be getting from  
> the first function, 'caml_adjust_gc_speed', which I assume is  
> related to the garbage collector. Over 4 million calls to this  
> function seems a little much. I attempted to play with a garbage  
> collection parameter, the value of control.space_overhead in the Gc  
> module. According to the manual, this affects the major GC speed  
> and increasing the value is supposed to cut down on the  
> aggressiveness of the GC. Setting space_overhead to 99 did not  
> change number of calls to 'caml_adjust_gc_speed'.

caml_adjust_gc_speed is called by caml_alloc_custom to tell it about
resources (in this case, memory) outside the heap that depend on the
allocated block.

None of the GC parameter can reduce this number.  I guess your program
allocated over 4 million bigarrays.

What I find surprising is that this simple function would dominate
the run time of your program.  I suspect some problem with gprof.
Could you give us more details about the computations performed by
your program?

-- Damien


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Garbage collection and caml_adjust_gc_speed
  2006-08-29 15:20 ` [Caml-list] " Damien Doligez
@ 2006-08-29 18:37   ` Christopher Kauffman
  2006-08-31 15:34     ` Christopher Kauffman
  0 siblings, 1 reply; 6+ messages in thread
From: Christopher Kauffman @ 2006-08-29 18:37 UTC (permalink / raw)
  To: OCaml

> What I find surprising is that this simple function would dominate the run
> time of your program.  I suspect some problem with gprof. Could you give us
> more details about the computations performed by your program?

The program is in the area of computational biology. It is intended to be used
to study approximations to protein folding. The main operations involve altering
  N by 3 matrices representing the coordinates of atoms of the protein. I have
used Bigarrays for the data structures for efficiency and the Lacaml package for
most of these operations (multiply, add, etc.) as it leverages BLAS and LAPACK
library calls which are machine-tuned matrix operations.

Lacaml uses fortran_layout double arrays as the matrices on which it operates.
The style of Lacaml is to use optional label arguments so that space may be
re-used. For instance the following creates two N by 3 matrices for
multiplication and a 3 by 3 matrix to store the results. The two large matrices
are multiplied and the result is stored in the 3 by 3 matrix:

open Lacaml.D (* Use double precision matrix elements *)
let n = 100 in
let a = Mat.random n 3 in (* Create n by 3 matrix - Bigarray *)
let b = Mat.random n 3 in (* Create n by 3 matrix - Bigarray *)
let corr = Mat.make 3 3 in (* Create 3 by 3 matrix - Bigarray *)
let new_corr = gemm ~c:corr ~transa:`T a b in (* matrix multiply *)
...

This snippet multiplies 'a' and 'b' and stores the result in 'corr'. The final
line creates a new name for 'corr', 'new_corr' which should point to the same
space as the original 'corr'. This allows a pseudo-functional style of writing
the program while avoiding the need to create many new matrices. As an
alternative, the final line could be changed to

ignore(gemm ~c:corr ~transa:`T acpy bcpy);

and the name 'corr' used later. For convenience I have followed the style of the
first version in my program. I suppose it is possible that creating the aliased
name could affect the garbage collector (it might look like the allocation of
another Bigarray so I will experiment with the second style.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Garbage collection and caml_adjust_gc_speed
  2006-08-29 18:37   ` Christopher Kauffman
@ 2006-08-31 15:34     ` Christopher Kauffman
  2006-08-31 21:08       ` [Caml-list] Garbage collection and caml_adjust_gc_speed - Problem in Bigarray Christopher Kauffman
  0 siblings, 1 reply; 6+ messages in thread
From: Christopher Kauffman @ 2006-08-31 15:34 UTC (permalink / raw)
  To: OCaml

Christopher Kauffman wrote:
>> What I find surprising is that this simple function would dominate the 
>> run
>> time of your program.  I suspect some problem with gprof. Could you 
>> give us
>> more details about the computations performed by your program?
> 
> I suppose it is possible that creating the 
> aliased
> name could affect the garbage collector (it might look like the 
> allocation of
> another Bigarray so I will experiment with the second style.

I altered some of the code in my program to remove aliases that were created by 
the style mentioned in my previous post. This did not seem to affect the number 
of times caml_adjust_gc_speed was called. I guess my questions will go 
unanswered until I find some time to dig into the inner workings of the garbage 
collector. Let me know if someone sees this in the future and has similar 
problems and/or finds a solution.

-Chris


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Garbage collection and caml_adjust_gc_speed - Problem in Bigarray
  2006-08-31 15:34     ` Christopher Kauffman
@ 2006-08-31 21:08       ` Christopher Kauffman
  2006-09-01 17:06         ` David MENTRE
  0 siblings, 1 reply; 6+ messages in thread
From: Christopher Kauffman @ 2006-08-31 21:08 UTC (permalink / raw)
  To: OCaml

After some very helpful advice from Shivkumar Chandrasekaran, I have located the 
source of the problem. In my code, a very frequent operation is to take a slice 
of a matrix or array (a column of a matrix representing the X-coordinates of 
some atoms for instance) and perform some operations on that slice. This is 
supported in the Bigarray library with the 'slice_left' function (for 
fortran_layout arrays) and in Lacaml with Mat.col function. Unfortunately, using 
this operation results in the following sequence of underlying C-function calls 
in ocaml-3.09.2/otherlibs/bigarray/bigarray_stubs.c:

bigarray_slice() -> alloc_bigarray() -> alloc_custom() -> caml_adjust_gc_speed()

So even though there is no allocation of new underlying data, taking a slice 
creates custom data to manage the new alias which triggers GC adjustment. I am 
certainly not familiar with the inner workings of the garbage collector, but 
this seems like undesirable behavior as much efficiency is lost for no apparent 
reason.

This now seems like an issue with the Bigarray library. Please advise me on the 
protocol for submitting this issue that it might be considered for correction in 
a future release.

Cheers,
Chris

PS - Thanks again to Shivkumar!


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Garbage collection and caml_adjust_gc_speed - Problem in Bigarray
  2006-08-31 21:08       ` [Caml-list] Garbage collection and caml_adjust_gc_speed - Problem in Bigarray Christopher Kauffman
@ 2006-09-01 17:06         ` David MENTRE
  0 siblings, 0 replies; 6+ messages in thread
From: David MENTRE @ 2006-09-01 17:06 UTC (permalink / raw)
  To: Christopher Kauffman; +Cc: OCaml

Christopher Kauffman <kauffman@cs.umn.edu> writes:

> This now seems like an issue with the Bigarray library. Please advise me
> on the protocol for submitting this issue that it might be considered
> for correction in a future release.

Submit a bug report with as much details as possible on OCaml bug
tracker: http://caml.inria.fr/mantis/main_page.php

Best wishes,
d.
-- 
GPG/PGP key: A3AD7A2A David MENTRE <dmentre@linux-france.org>
 5996 CC46 4612 9CA4 3562  D7AC 6C67 9E96 A3AD 7A2A


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2006-09-01 17:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-08-29  4:16 Garbage collection and caml_adjust_gc_speed Christopher Kauffman
2006-08-29 15:20 ` [Caml-list] " Damien Doligez
2006-08-29 18:37   ` Christopher Kauffman
2006-08-31 15:34     ` Christopher Kauffman
2006-08-31 21:08       ` [Caml-list] Garbage collection and caml_adjust_gc_speed - Problem in Bigarray Christopher Kauffman
2006-09-01 17:06         ` David MENTRE

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).