caml-list - the Caml user's mailing list
* [Caml-list] Slow GC problem
@ 2003-04-04 19:40 Shivkumar Chandrasekaran
  2003-04-03 21:07 ` Christophe Raffalli
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Shivkumar Chandrasekaran @ 2003-04-04 19:40 UTC (permalink / raw)
  To: caml-list

I have a gc efficiency problem for which I require some advice. I have 
read both the O'Reilly book and the manual on gc.

I am implementing a fast direct matrix solver for 2D PDEs, so it uses 
the Bigarray module a lot. I have two versions of my algorithm. One is 
an in-core algorithm and the other is the same solver, except that it 
is out-of-core (most of the matrices are stored in disk files). 
Unfortunately the out-of-core solver is *faster* than the in-core 
solver for the identical problem! I was expecting the out-of-core 
solver to be 10 times slower. I am concluding that gc is to blame. 
Below I give the gc stats just before and after the solver routine is 
called in the in-core solver:

                    "Just before"   "Just after"
minor_words:        46243376        139259767
promoted_words:     928267          2595523
major_words:        2883087         39489766
minor_collections:  1412            4591
major_collections:  18              52
heap_words:         2150400         1044480
heap_chunks:        35              17
top_heap_words:     2150400         5038080
live_words:         1842373         840037
live_blocks:        253926          116816
free_words:         307180          204440
free_blocks:        47368           17
largest_free:       10928           61440
fragments:          847             3
compactions:        0               2
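
For readers who want to collect the same kind of before/after figures, here is a minimal sketch of one way to do it with the standard Gc module. It is not the code used to produce the table above; the helper name with_gc_delta and the stand-in workload are hypothetical. Subtracting the two columns above, the solver call itself accounts for 3179 minor collections and 34 major collections, and the sketch prints that kind of delta directly.

```ocaml
(* Sketch only: snapshot GC counters around a call and print the deltas.
   [with_gc_delta] and the workload below are hypothetical names, not
   part of the solver discussed in this thread. *)
let with_gc_delta label f =
  let before = Gc.quick_stat () in
  let result = f () in
  let after = Gc.quick_stat () in
  Printf.printf
    "%s: minor_words +%.0f, promoted_words +%.0f, major_words +%.0f\n"
    label
    (after.Gc.minor_words -. before.Gc.minor_words)
    (after.Gc.promoted_words -. before.Gc.promoted_words)
    (after.Gc.major_words -. before.Gc.major_words);
  Printf.printf
    "%s: minor_collections +%d, major_collections +%d, compactions +%d\n"
    label
    (after.Gc.minor_collections - before.Gc.minor_collections)
    (after.Gc.major_collections - before.Gc.major_collections)
    (after.Gc.compactions - before.Gc.compactions);
  result

let () =
  (* Stand-in workload: build many short-lived lists. *)
  let total =
    with_gc_delta "workload" (fun () ->
      let acc = ref 0 in
      for i = 1 to 100_000 do
        acc := !acc + List.length [i; i + 1; i + 2]
      done;
      !acc)
  in
  Printf.printf "result = %d\n" total
```

Gc.quick_stat skips the heap traversal that Gc.stat performs, so it is cheap to call, but it does not compute fields such as live_words or fragments; use Gc.stat if those are needed.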

I tried changing some parameters using Gc.set but it did not make a 
significant difference. Does anybody see any obvious gc problems from 
the above data? Thanks,

--shiv--


PS: I wrote the out-of-core solver in just 3 days once the in-core 
solver was done, all in O'Caml. This would have taken much longer 
in Fortran/C. Thanks to the O'Caml team.
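
The message above does not say which parameters were changed with Gc.set, so the following is only a sketch of the general pattern. The two fields and the values chosen are illustrative assumptions, not tuned recommendations for this solver.

```ocaml
(* Sketch only: adjust GC parameters with Gc.set.
   The values below are illustrative assumptions. *)
let () =
  let old = Gc.get () in
  Gc.set
    { old with
      Gc.minor_heap_size = 262_144; (* size of the minor heap, in words *)
      Gc.space_overhead = 200       (* higher: less major GC work, more memory *)
    };
  let now = Gc.get () in
  Printf.printf "minor_heap_size: %d -> %d words\n"
    old.Gc.minor_heap_size now.Gc.minor_heap_size;
  Printf.printf "space_overhead: %d -> %d\n"
    old.Gc.space_overhead now.Gc.space_overhead
```

Reading the parameters back with Gc.get after the call is a cheap way to confirm that the new settings are the ones actually in effect.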

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


* Re: [Caml-list] Slow GC problem
@ 2003-04-14 16:37 Shivkumar Chandrasekaran
  0 siblings, 0 replies; 15+ messages in thread
From: Shivkumar Chandrasekaran @ 2003-04-14 16:37 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 441 bytes --]

I have attached the entire profile (such as is available on Mac OS X) 
to the bottom of this message. One warning though: I am only interested 
in profiling my solver (function 
SfTwoDsolver.DsfTwoDsolver.superfastLUofBTDSSS'). However, the time 
required to set up my problem is significant, so the profiler 
information may not be that helpful. Direct timing shows that on this 
run my solver required about 43 seconds to run.

--shiv--
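
The message does not say how the direct timing was taken. One common approach, sketched here under that caveat, is to wrap just the call of interest so that set-up time is excluded. The helper time_call and the workload are hypothetical stand-ins for the solver.

```ocaml
(* Sketch only: time a single call with Sys.time, which reports
   processor time in seconds. [time_call] and the workload are
   hypothetical stand-ins for the solver call. *)
let time_call f =
  let t0 = Sys.time () in
  let result = f () in
  let elapsed = Sys.time () -. t0 in
  (result, elapsed)

let () =
  let (sum, elapsed) =
    time_call (fun () ->
      let s = ref 0.0 in
      for i = 1 to 1_000_000 do
        s := !s +. sqrt (float_of_int i)
      done;
      !s)
  in
  Printf.printf "sum = %.3f, cpu time = %.3f s\n" sum elapsed
```

Sys.time measures processor time, not elapsed wall-clock time; for wall-clock figures, Unix.gettimeofday from the Unix library can be substituted in the same pattern.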



[-- Attachment #2: a_profile --]
[-- Type: application/octet-stream, Size: 50515 bytes --]




call graph profile:
          The sum of self and descendents is the major sort
          for this listing.

          function entries:

index     the index of the function in the call graph
          listing, as an aid to locating it (see below).

%time     the percentage of the total time of the program
          accounted for by this function and its
          descendents.

self      the number of seconds spent in this function
          itself.

descendents
          the number of seconds spent in the descendents of
          this function on behalf of this function.

called    the number of times this function is called (other
          than recursive calls).

self      the number of times this function calls itself
          recursively.

name      the name of the function, with an indication of
          its membership in a cycle, if any.

index     the index of the function in the call graph
          listing, as an aid to locating it.



          parent listings:

self*     the number of seconds of this function's self time
          which is due to calls from this parent.

descendents*
          the number of seconds of this function's
          descendent time which is due to calls from this
          parent.

called**  the number of times this function is called by
          this parent.  This is the numerator of the
          fraction which divides up the function's time to
          its parents.

total*    the number of times this function was called by
          all of its parents.  This is the denominator of
          the propagation fraction.

parents   the name of this parent, with an indication of the
          parent's membership in a cycle, if any.

index     the index of this parent in the call graph
          listing, as an aid in locating it.



          children listings:

self*     the number of seconds of this child's self time
          which is due to being called by this function.

descendent*
          the number of seconds of this child's descendent's
          time which is due to being called by this
          function.

called**  the number of times this child is called by this
          function.  This is the numerator of the
          propagation fraction for this child.

total*    the number of times this child is called by all
          functions.  This is the denominator of the
          propagation fraction.

children  the name of this child, and an indication of its
          membership in a cycle, if any.

index     the index of this child in the call graph listing,
          as an aid to locating it.



          * these fields are omitted for parents (or
          children) in the same cycle as the function.  If
          the function (or child) is a member of a cycle,
          the propagated times and propagation denominator
          represent the self time and descendent time of the
          cycle as a whole.

          ** static-only parents and children are indicated
          by a call count of 0.



          cycle listings:
          the cycle as a whole is listed with the same
          fields as a function entry.  Below it are listed
          the members of the cycle, and their contributions
          to the time and call counts of the cycle.

granularity: each sample hit covers 4 byte(s) for 0.09% of 11.47 seconds

                                  called/total       parents 
index  %time    self descendents  called+self    name    	index
                                  called/total       children

                                                     <spontaneous>
[1]     13.9    1.60        0.00                 _local_dger_ [1]

-----------------------------------------------

                                                     <spontaneous>
[2]      7.8    0.90        0.00                 _region_for_ptr_no_lock [2]

-----------------------------------------------

                                                     <spontaneous>
[3]      4.6    0.53        0.00                 _bigarray_set_aux [3]

-----------------------------------------------

                                                     <spontaneous>
[4]      3.7    0.42        0.00                 _bigarray_offset [4]

-----------------------------------------------

                                                     <spontaneous>
[5]      3.7    0.42        0.00                 _gemvT4x16 [5]

-----------------------------------------------

                                                     <spontaneous>
[6]      3.6    0.41        0.00                 _mark_slice [6]

-----------------------------------------------

                                                     <spontaneous>
[7]      3.1    0.36        0.00                 _DLACPY [7]

-----------------------------------------------

                                                     <spontaneous>
[8]      3.1    0.28        0.07                 _caml_c_call [8]
                0.02        0.00  328470/328470      _camlidl_lapack_cblas_dgemm [83]
                0.00        0.02   68298/68298       _camlidl_lapack_d_transp [84]
                0.01        0.00  273772/273772      _camlidl_lapack_dlacpy_ [101]
                0.01        0.00   48704/48704       _camlidl_lapack_dgeqlf_ [102]
                0.01        0.00   19963/19963       _camlidl_lapack_dtrtrs_ [103]
                0.00        0.00  204939/204939      _camlidl_lapack_cblas_dscal [2632]
                0.00        0.00   87830/87830       _camlidl_lapack_dormql_ [2633]
                0.00        0.00   56730/56730       _camlidl_lapack_dormlq_ [2634]
                0.00        0.00   19152/19152       _camlidl_lapack_dgesvd_ [2635]
                0.00        0.00    9600/9600        _camlidl_lapack_dgelqf_ [2636]

-----------------------------------------------

                                                     <spontaneous>
[9]      2.6    0.30        0.00                 _sweep_slice [9]

-----------------------------------------------

                                                     <spontaneous>
[10]     2.3    0.26        0.00                 _adjust_gc_speed [10]

-----------------------------------------------

                                                     <spontaneous>
[11]     2.2    0.25        0.00                 _szone_malloc [11]

-----------------------------------------------

                                                     <spontaneous>
[12]     1.8    0.21        0.00                 _DLASR [12]

-----------------------------------------------

                                                     <spontaneous>
[13]     1.8    0.21        0.00                 _Nla__fromInt_1391 [13]

-----------------------------------------------

                                                     <spontaneous>
[14]     1.7    0.20        0.00                 _fl_allocate [14]

-----------------------------------------------

                                                     <spontaneous>
[15]     1.7    0.19        0.00                 _bigarray_set_2 [15]

-----------------------------------------------

                                                     <spontaneous>
[16]     1.6    0.18        0.00                 _ATL_dJIK0x0x0NN0x0x0_aX_bX [16]

-----------------------------------------------

                                                     <spontaneous>
[17]     1.6    0.18        0.00                 _bigarray_get_N [17]

-----------------------------------------------

                                                     <spontaneous>
[18]     1.4    0.16        0.00                 _copy_double [18]

-----------------------------------------------

                                                     <spontaneous>
[19]     1.1    0.13        0.00                 _ATL_ddot_xp1yp1aXbX [19]

-----------------------------------------------

                                                     <spontaneous>
[20]     1.1    0.13        0.00                 _DBDSQR [20]

-----------------------------------------------

                                                     <spontaneous>
[21]     1.0    0.12        0.00                 _ATL_dJIK0x0x0NN5x1x16_aX_bX [21]

-----------------------------------------------

                                                     <spontaneous>
[22]     1.0    0.12        0.00                 _ATL_dgemv [22]

-----------------------------------------------

                                                     <spontaneous>
[23]     1.0    0.12        0.00                 _bigarray_finalize [23]

-----------------------------------------------

                                                     <spontaneous>
[24]     1.0    0.11        0.00                 _ATL_dgezero [24]

-----------------------------------------------

                                                     <spontaneous>
[25]     0.9    0.10        0.00                 _ATL_dger1_a1_x1_yX [25]

-----------------------------------------------

                                                     <spontaneous>
[26]     0.9    0.10        0.00                 _ATL_dtrsmKLLNN [26]

-----------------------------------------------

                                                     <spontaneous>
[27]     0.9    0.10        0.00                 _Nla__iter2_486 [27]

-----------------------------------------------

                                                     <spontaneous>
[28]     0.9    0.10        0.00                 _sqrt [28]

-----------------------------------------------

                                                     <spontaneous>
[29]     0.8    0.09        0.00                 _ATL_dJIK0x0x0NT0x0x0_aX_bX [29]

-----------------------------------------------

                                                     <spontaneous>
[30]     0.8    0.09        0.00                 _bigarray_sub [30]

-----------------------------------------------

                                                     <spontaneous>
[31]     0.8    0.09        0.00                 _szone_size [31]

-----------------------------------------------

                                                     <spontaneous>
[32]     0.6    0.07        0.00                 _alloc_bigarray [32]

-----------------------------------------------

                                                     <spontaneous>
[33]     0.6    0.07        0.00                 _bigarray_dim [33]

-----------------------------------------------

                                                     <spontaneous>
[34]     0.6    0.07        0.00                 _bigarray_update_proxy [34]

-----------------------------------------------

                                                     <spontaneous>
[35]     0.6    0.07        0.00                 _cblas_dgemv [35]

-----------------------------------------------

                                                     <spontaneous>
[36]     0.6    0.07        0.00                 _dlamch_ [36]

-----------------------------------------------

                                                     <spontaneous>
[37]     0.6    0.07        0.00                 _dlartg_ [37]

-----------------------------------------------

                                                     <spontaneous>
[38]     0.6    0.07        0.00                 _free_list_remove_ptr [38]

-----------------------------------------------

                                                     <spontaneous>
[39]     0.5    0.06        0.00                 _ATL_dNCmmJIK [39]

-----------------------------------------------

                                                     <spontaneous>
[40]     0.5    0.06        0.00                 _ILAENV [40]

-----------------------------------------------

                                                     <spontaneous>
[41]     0.5    0.06        0.00                 _Std_exit__code_end [41]

-----------------------------------------------

                                                     <spontaneous>
[42]     0.5    0.06        0.00                 _bigarray_reshape [42]

-----------------------------------------------

                                                     <spontaneous>
[43]     0.5    0.06        0.00                 _dlarfg_ [43]

-----------------------------------------------

                                                     <spontaneous>
[44]     0.4    0.05        0.00                 _ATL_dJIK0x0x0NN1x4x16_aX_bX [44]

-----------------------------------------------

                                                     <spontaneous>
[45]     0.4    0.05        0.00                 _ATL_dJIK0x0x0TN0x0x0_aX_bX [45]

-----------------------------------------------

                                                     <spontaneous>
[46]     0.4    0.05        0.00                 _ATL_dscal_xp1yp0aXbX [46]

-----------------------------------------------

                                                     <spontaneous>
[47]     0.4    0.05        0.00                 _Nla__iDU_1800 [47]

-----------------------------------------------

                                                     <spontaneous>
[48]     0.4    0.05        0.00                 _Nla__normMax_1913 [48]

-----------------------------------------------

                                                     <spontaneous>
[49]     0.4    0.05        0.00                 _SSQr [49]

-----------------------------------------------

                                                     <spontaneous>
[50]     0.4    0.05        0.00                 _bigarray_get_2 [50]

-----------------------------------------------

                                                     <spontaneous>
[51]     0.4    0.05        0.00                 _fl_merge_block [51]

-----------------------------------------------

                                                     <spontaneous>
[52]     0.4    0.05        0.00                 _frexp [52]

-----------------------------------------------

                                                     <spontaneous>
[53]     0.4    0.05        0.00                 _lsame_ [53]

-----------------------------------------------

                                                     <spontaneous>
[54]     0.3    0.04        0.00                 _DLARF [54]

-----------------------------------------------

                                                     <spontaneous>
[55]     0.3    0.04        0.00                 _alloc_custom [55]

-----------------------------------------------

                                                     <spontaneous>
[56]     0.3    0.04        0.00                 _alloc_shr [56]

-----------------------------------------------

                                                     <spontaneous>
[57]     0.3    0.04        0.00                 _bigarray_fill [57]

-----------------------------------------------

                                                     <spontaneous>
[58]     0.3    0.04        0.00                 _caml_apply2 [58]

-----------------------------------------------

                                                     <spontaneous>
[59]     0.3    0.04        0.00                 _f2c_dgemv [59]

-----------------------------------------------

                                                     <spontaneous>
[60]     0.3    0.04        0.00                 _malloc_zone_free [60]

-----------------------------------------------

                                                     <spontaneous>
[61]     0.3    0.03        0.00                 _ATL_dJIK0x0x0NT5x1x12_aX_bX [61]

-----------------------------------------------

                                                     <spontaneous>
[62]     0.3    0.03        0.00                 _ATL_dcopy_xp0yp0aXbX [62]

-----------------------------------------------

                                                     <spontaneous>
[63]     0.3    0.03        0.00                 _ATL_dcpsc_xp0yp0aXbX [63]

-----------------------------------------------

                                                     <spontaneous>
[64]     0.3    0.03        0.00                 _ATL_ddot_xp0yp0aXbX [64]

-----------------------------------------------

                                                     <spontaneous>
[65]     0.3    0.03        0.00                 _ATL_dger [65]

-----------------------------------------------

                                                     <spontaneous>
[66]     0.3    0.03        0.00                 _ATL_dptgemm [66]

-----------------------------------------------

                                                     <spontaneous>
[67]     0.3    0.03        0.00                 _Ltv__sfsolve_1040 [67]

-----------------------------------------------

                                                     <spontaneous>
[68]     0.3    0.03        0.00                 _Ltv__superfastMul_456 [68]

-----------------------------------------------

                                                     <spontaneous>
[69]     0.3    0.03        0.00                 _Nla__extractRange_1487 [69]

-----------------------------------------------

                                                     <spontaneous>
[70]     0.3    0.03        0.00                 _allocate_block [70]

-----------------------------------------------

                                                     <spontaneous>
[71]     0.3    0.03        0.00                 _check_urgent_gc [71]

-----------------------------------------------

                                                     <spontaneous>
[72]     0.3    0.03        0.00                 _compare_val [72]

-----------------------------------------------

                                                     <spontaneous>
[73]     0.3    0.03        0.00                 _dorm2l_ [73]

-----------------------------------------------

                                                     <spontaneous>
[74]     0.3    0.03        0.00                 _free_list_add_ptr [74]

-----------------------------------------------

                                                     <spontaneous>
[75]     0.3    0.03        0.00                 _gemv8x4 [75]

-----------------------------------------------

                                                     <spontaneous>
[76]     0.3    0.03        0.00                 _gemvT_Nsmall [76]

-----------------------------------------------

                                                     <spontaneous>
[77]     0.3    0.03        0.00                 _ger_Nle4 [77]

-----------------------------------------------

                                                     <spontaneous>
[78]     0.3    0.03        0.00                 _malloc [78]

-----------------------------------------------

                                                     <spontaneous>
[79]     0.3    0.03        0.00                 _malloc_zone_malloc [79]

-----------------------------------------------

                                                     <spontaneous>
[80]     0.3    0.03        0.00                 _oldify_one [80]

-----------------------------------------------

                                                     <spontaneous>
[81]     0.2    0.02        0.00                 restFP [81]

-----------------------------------------------

                                                     <spontaneous>
[82]     0.2    0.02        0.00                 saveFP [82]

-----------------------------------------------

                0.02        0.00  328470/328470      _caml_c_call [8]
[83]     0.2    0.02        0.00  328470         _camlidl_lapack_cblas_dgemm [83]

-----------------------------------------------

                0.00        0.02   68298/68298       _caml_c_call [8]
[84]     0.2    0.00        0.02   68298         _camlidl_lapack_d_transp [84]
                0.02        0.00   68298/68298       _d_transp [85]

-----------------------------------------------

                0.02        0.00   68298/68298       _camlidl_lapack_d_transp [84]
[85]     0.2    0.02        0.00   68298         _d_transp [85]

-----------------------------------------------

                                                     <spontaneous>
[86]     0.2    0.02        0.00                 _ATL_dGEMM2TN [86]

-----------------------------------------------

                                                     <spontaneous>
[87]     0.2    0.02        0.00                 _ATL_ddot [87]

-----------------------------------------------

                                                     <spontaneous>
[88]     0.2    0.02        0.00                 _Bigarray__dim1_152 [88]

-----------------------------------------------

                                                     <spontaneous>
[89]     0.2    0.02        0.00                 _DTRTRS [89]

-----------------------------------------------

                                                     <spontaneous>
[90]     0.2    0.02        0.00                 _Nla__matrix2x2_1693 [90]

-----------------------------------------------

                                                     <spontaneous>
[91]     0.2    0.02        0.00                 _Nla__ql_2076 [91]

-----------------------------------------------

                                                     <spontaneous>
[92]     0.2    0.02        0.00                 _Nla__svd_2301 [92]

-----------------------------------------------

                                                     <spontaneous>
[93]     0.2    0.02        0.00                 _Nla__zeros_1459 [93]

-----------------------------------------------

                                                     <spontaneous>
[94]     0.2    0.02        0.00                 _Pervasives__min_48 [94]

-----------------------------------------------

                                                     <spontaneous>
[95]     0.2    0.02        0.00                 _cblas_dnrm2 [95]

-----------------------------------------------

                                                     <spontaneous>
[96]     0.2    0.02        0.00                 _dgesvd_ [96]

-----------------------------------------------

                                                     <spontaneous>
[97]     0.2    0.02        0.00                 _dlange_ [97]

-----------------------------------------------

                                                     <spontaneous>
[98]     0.2    0.02        0.00                 _f2c_dger [98]

-----------------------------------------------

                                                     <spontaneous>
[99]     0.2    0.02        0.00                 _lessequal [99]

-----------------------------------------------

                                                     <spontaneous>
[100]    0.2    0.02        0.00                 _szone_free [100]

-----------------------------------------------

                0.01        0.00  273772/273772      _caml_c_call [8]
[101]    0.1    0.01        0.00  273772         _camlidl_lapack_dlacpy_ [101]

-----------------------------------------------

                0.01        0.00   48704/48704       _caml_c_call [8]
[102]    0.1    0.01        0.00   48704         _camlidl_lapack_dgeqlf_ [102]

-----------------------------------------------

                0.01        0.00   19963/19963       _caml_c_call [8]
[103]    0.1    0.01        0.00   19963         _camlidl_lapack_dtrtrs_ [103]

-----------------------------------------------

                                                     <spontaneous>
[104]    0.1    0.01        0.00                 _ATL_apply_tree [104]

-----------------------------------------------

                                                     <spontaneous>
[105]    0.1    0.01        0.00                 _ATL_dGEMM2NN [105]

-----------------------------------------------

                                                     <spontaneous>
[106]    0.1    0.01        0.00                 _ATL_dJIK0x0x0NN1x1x16_aX_bX [106]

-----------------------------------------------

                                                     <spontaneous>
[107]    0.1    0.01        0.00                 _ATL_dJIK0x0x0NT1x4x12_aX_bX [107]

-----------------------------------------------

                                                     <spontaneous>
[108]    0.1    0.01        0.00                 _ATL_dJIK0x0x0TN5x1x12_aX_bX [108]

-----------------------------------------------

                                                     <spontaneous>
[109]    0.1    0.01        0.00                 _ATL_dcpsc [109]

-----------------------------------------------

                                                     <spontaneous>
[110]    0.1    0.01        0.00                 _ATL_dptgemm_nt [110]

-----------------------------------------------

                                                     <spontaneous>
[111]    0.1    0.01        0.00                 _ATL_dpttrsm_nt [111]

-----------------------------------------------

                                                     <spontaneous>
[112]    0.1    0.01        0.00                 _ATL_dscal [112]

-----------------------------------------------

                                                     <spontaneous>
[113]    0.1    0.01        0.00                 _ATL_join_tree [113]

-----------------------------------------------

                                                     <spontaneous>
[114]    0.1    0.01        0.00                 _Bigarray__reshape_1_255 [114]

-----------------------------------------------

                                                     <spontaneous>
[115]    0.1    0.01        0.00                 _DGEBD2 [115]

-----------------------------------------------

                                                     <spontaneous>
[116]    0.1    0.01        0.00                 _DLAPY2 [116]

-----------------------------------------------

                                                     <spontaneous>
[117]    0.1    0.01        0.00                 _List__rev_append_74 [117]

-----------------------------------------------

                                                     <spontaneous>
[118]    0.1    0.01        0.00                 _Ltv__fastSub_895 [118]

-----------------------------------------------

                                                     <spontaneous>
[119]    0.1    0.01        0.00                 _Nla__fun2mat_1529 [119]

-----------------------------------------------

                                                     <spontaneous>
[120]    0.1    0.01        0.00                 _Nla__getArrayFromPool_1427 [120]

-----------------------------------------------

                                                     <spontaneous>
[121]    0.1    0.01        0.00                 _Nla__lq_2052 [121]

-----------------------------------------------

                                                     <spontaneous>
[122]    0.1    0.01        0.00                 _Nla__noOfCols_1415 [122]

-----------------------------------------------

                                                     <spontaneous>
[123]    0.1    0.01        0.00                 _Nla__partition2x1_1574 [123]

-----------------------------------------------

                                                     <spontaneous>
[124]    0.1    0.01        0.00                 _Nla__partitionInfx1_1620 [124]

-----------------------------------------------

                                                     <spontaneous>
[125]    0.1    0.01        0.00                 _Nla__rowScale_2366 [125]

-----------------------------------------------

                                                     <spontaneous>
[126]    0.1    0.01        0.00                 _Nla__setToL_1771 [126]

-----------------------------------------------

                                                     <spontaneous>
[127]    0.1    0.01        0.00                 _Nla__transp_1890 [127]

-----------------------------------------------

                                                     <spontaneous>
[128]    0.1    0.01        0.00                 _SSQ [128]

-----------------------------------------------

                                                     <spontaneous>
[129]    0.1    0.01        0.00                 _bigarray_num_elts [129]

-----------------------------------------------

                                                     <spontaneous>
[130]    0.1    0.01        0.00                 _caml_apply12 [130]

-----------------------------------------------

                                                     <spontaneous>
[131]    0.1    0.01        0.00                 _caml_apply14 [131]

-----------------------------------------------

                                                     <spontaneous>
[132]    0.1    0.01        0.00                 _caml_apply9 [132]

-----------------------------------------------

                                                     <spontaneous>
[133]    0.1    0.01        0.00                 _caml_curry3_1 [133]

-----------------------------------------------

                                                     <spontaneous>
[134]    0.1    0.01        0.00                 _cblas_dgemm [134]

-----------------------------------------------

                                                     <spontaneous>
[135]    0.1    0.01        0.00                 _d_sign [135]

-----------------------------------------------

                                                     <spontaneous>
[136]    0.1    0.01        0.00                 _dgeql2_ [136]

-----------------------------------------------

                                                     <spontaneous>
[137]    0.1    0.01        0.00                 _dorgl2_ [137]

-----------------------------------------------

                                                     <spontaneous>
[138]    0.1    0.01        0.00                 _dorml2_ [138]

-----------------------------------------------

                                                     <spontaneous>
[139]    0.1    0.01        0.00                 _dormlq_ [139]

-----------------------------------------------

                                                     <spontaneous>
[140]    0.1    0.01        0.00                 _free [140]

-----------------------------------------------

                                                     <spontaneous>
[141]    0.1    0.01        0.00                 _gemvMlt8 [141]

-----------------------------------------------

                                                     <spontaneous>
[142]    0.1    0.01        0.00                 _ger_Mle8 [142]

-----------------------------------------------

                                                     <spontaneous>
[143]    0.1    0.01        0.00                 _minor_collection [143]

-----------------------------------------------

                                                     <spontaneous>
[144]    0.1    0.01        0.00                 _pthread_attr_setdetachstate [144]

-----------------------------------------------

                                                     <spontaneous>
[145]    0.1    0.01        0.00                 _scalbn [145]

-----------------------------------------------

                                                     <spontaneous>
[146]    0.1    0.01        0.00                 _stat_alloc [146]

-----------------------------------------------

                0.00        0.00  204939/204939      _caml_c_call [8]
[2632]   0.0    0.00        0.00  204939         _camlidl_lapack_cblas_dscal [2632]

-----------------------------------------------

                0.00        0.00   87830/87830       _caml_c_call [8]
[2633]   0.0    0.00        0.00   87830         _camlidl_lapack_dormql_ [2633]

-----------------------------------------------

                0.00        0.00   56730/56730       _caml_c_call [8]
[2634]   0.0    0.00        0.00   56730         _camlidl_lapack_dormlq_ [2634]

-----------------------------------------------

                0.00        0.00   19152/19152       _caml_c_call [8]
[2635]   0.0    0.00        0.00   19152         _camlidl_lapack_dgesvd_ [2635]

-----------------------------------------------

                0.00        0.00    9600/9600        _caml_c_call [8]
[2636]   0.0    0.00        0.00    9600         _camlidl_lapack_dgelqf_ [2636]

-----------------------------------------------




flat profile:

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.
 
 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
	   else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this 
	   function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
	   the function in the gprof listing. If the index is
	   in parenthesis it shows where it would appear in
	   the gprof listing if it were to be printed.

granularity: each sample hit covers 4 byte(s) for 0.09% of 11.47 seconds

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 13.9       1.60     1.60                             _local_dger_ [1]
  7.8       2.50     0.90                             _region_for_ptr_no_lock [2]
  4.6       3.03     0.53                             _bigarray_set_aux [3]
  3.7       3.45     0.42                             _bigarray_offset [4]
  3.7       3.87     0.42                             _gemvT4x16 [5]
  3.6       4.28     0.41                             _mark_slice [6]
  3.1       4.64     0.36                             _DLACPY [7]
  2.6       4.94     0.30                             _sweep_slice [9]
  2.4       5.22     0.28                             _caml_c_call [8]
  2.3       5.48     0.26                             _adjust_gc_speed [10]
  2.2       5.73     0.25                             _szone_malloc [11]
  1.8       5.94     0.21                             _DLASR [12]
  1.8       6.15     0.21                             _Nla__fromInt_1391 [13]
  1.7       6.35     0.20                             _fl_allocate [14]
  1.7       6.54     0.19                             _bigarray_set_2 [15]
  1.6       6.72     0.18                             _ATL_dJIK0x0x0NN0x0x0_aX_bX [16]
  1.6       6.90     0.18                             _bigarray_get_N [17]
  1.4       7.06     0.16                             _copy_double [18]
  1.1       7.19     0.13                             _ATL_ddot_xp1yp1aXbX [19]
  1.1       7.32     0.13                             _DBDSQR [20]
  1.0       7.44     0.12                             _ATL_dJIK0x0x0NN5x1x16_aX_bX [21]
  1.0       7.56     0.12                             _ATL_dgemv [22]
  1.0       7.68     0.12                             _bigarray_finalize [23]
  1.0       7.79     0.11                             _ATL_dgezero [24]
  0.9       7.89     0.10                             _ATL_dger1_a1_x1_yX [25]
  0.9       7.99     0.10                             _ATL_dtrsmKLLNN [26]
  0.9       8.09     0.10                             _Nla__iter2_486 [27]
  0.9       8.19     0.10                             _sqrt [28]
  0.8       8.28     0.09                             _ATL_dJIK0x0x0NT0x0x0_aX_bX [29]
  0.8       8.37     0.09                             _bigarray_sub [30]
  0.8       8.46     0.09                             _szone_size [31]
  0.6       8.53     0.07                             _alloc_bigarray [32]
  0.6       8.60     0.07                             _bigarray_dim [33]
  0.6       8.67     0.07                             _bigarray_update_proxy [34]
  0.6       8.74     0.07                             _cblas_dgemv [35]
  0.6       8.81     0.07                             _dlamch_ [36]
  0.6       8.88     0.07                             _dlartg_ [37]
  0.6       8.95     0.07                             _free_list_remove_ptr [38]
  0.5       9.01     0.06                             _ATL_dNCmmJIK [39]
  0.5       9.07     0.06                             _ILAENV [40]
  0.5       9.13     0.06                             _Std_exit__code_end [41]
  0.5       9.19     0.06                             _bigarray_reshape [42]
  0.5       9.25     0.06                             _dlarfg_ [43]
  0.4       9.30     0.05                             _ATL_dJIK0x0x0NN1x4x16_aX_bX [44]
  0.4       9.35     0.05                             _ATL_dJIK0x0x0TN0x0x0_aX_bX [45]
  0.4       9.40     0.05                             _ATL_dscal_xp1yp0aXbX [46]
  0.4       9.45     0.05                             _Nla__iDU_1800 [47]
  0.4       9.50     0.05                             _Nla__normMax_1913 [48]
  0.4       9.55     0.05                             _SSQr [49]
  0.4       9.60     0.05                             _bigarray_get_2 [50]
  0.4       9.65     0.05                             _fl_merge_block [51]
  0.4       9.70     0.05                             _frexp [52]
  0.4       9.75     0.05                             _lsame_ [53]
  0.3       9.79     0.04                             _DLARF [54]
  0.3       9.83     0.04                             _alloc_custom [55]
  0.3       9.87     0.04                             _alloc_shr [56]
  0.3       9.91     0.04                             _bigarray_fill [57]
  0.3       9.95     0.04                             _caml_apply2 [58]
  0.3       9.99     0.04                             _f2c_dgemv [59]
  0.3      10.03     0.04                             _malloc_zone_free [60]
  0.3      10.06     0.03                             _ATL_dJIK0x0x0NT5x1x12_aX_bX [61]
  0.3      10.09     0.03                             _ATL_dcopy_xp0yp0aXbX [62]
  0.3      10.12     0.03                             _ATL_dcpsc_xp0yp0aXbX [63]
  0.3      10.15     0.03                             _ATL_ddot_xp0yp0aXbX [64]
  0.3      10.18     0.03                             _ATL_dger [65]
  0.3      10.21     0.03                             _ATL_dptgemm [66]
  0.3      10.24     0.03                             _Ltv__sfsolve_1040 [67]
  0.3      10.27     0.03                             _Ltv__superfastMul_456 [68]
  0.3      10.30     0.03                             _Nla__extractRange_1487 [69]
  0.3      10.33     0.03                             _allocate_block [70]
  0.3      10.36     0.03                             _check_urgent_gc [71]
  0.3      10.39     0.03                             _compare_val [72]
  0.3      10.42     0.03                             _dorm2l_ [73]
  0.3      10.45     0.03                             _free_list_add_ptr [74]
  0.3      10.48     0.03                             _gemv8x4 [75]
  0.3      10.51     0.03                             _gemvT_Nsmall [76]
  0.3      10.54     0.03                             _ger_Nle4 [77]
  0.3      10.57     0.03                             _malloc [78]
  0.3      10.60     0.03                             _malloc_zone_malloc [79]
  0.3      10.63     0.03                             _oldify_one [80]
  0.2      10.65     0.02   328470     0.00     0.00  _camlidl_lapack_cblas_dgemm [83]
  0.2      10.67     0.02    68298     0.00     0.00  _d_transp [85]
  0.2      10.69     0.02                             _ATL_dGEMM2TN [86]
  0.2      10.71     0.02                             _ATL_ddot [87]
  0.2      10.73     0.02                             _Bigarray__dim1_152 [88]
  0.2      10.75     0.02                             _DTRTRS [89]
  0.2      10.77     0.02                             _Nla__matrix2x2_1693 [90]
  0.2      10.79     0.02                             _Nla__ql_2076 [91]
  0.2      10.81     0.02                             _Nla__svd_2301 [92]
  0.2      10.83     0.02                             _Nla__zeros_1459 [93]
  0.2      10.85     0.02                             _Pervasives__min_48 [94]
  0.2      10.87     0.02                             _cblas_dnrm2 [95]
  0.2      10.89     0.02                             _dgesvd_ [96]
  0.2      10.91     0.02                             _dlange_ [97]
  0.2      10.93     0.02                             _f2c_dger [98]
  0.2      10.95     0.02                             _lessequal [99]
  0.2      10.97     0.02                             _szone_free [100]
  0.2      10.99     0.02                             restFP [81]
  0.2      11.01     0.02                             saveFP [82]
  0.1      11.02     0.01   273772     0.00     0.00  _camlidl_lapack_dlacpy_ [101]
  0.1      11.03     0.01    48704     0.00     0.00  _camlidl_lapack_dgeqlf_ [102]
  0.1      11.04     0.01    19963     0.00     0.00  _camlidl_lapack_dtrtrs_ [103]
  0.1      11.05     0.01                             _ATL_apply_tree [104]
  0.1      11.06     0.01                             _ATL_dGEMM2NN [105]
  0.1      11.07     0.01                             _ATL_dJIK0x0x0NN1x1x16_aX_bX [106]
  0.1      11.08     0.01                             _ATL_dJIK0x0x0NT1x4x12_aX_bX [107]
  0.1      11.09     0.01                             _ATL_dJIK0x0x0TN5x1x12_aX_bX [108]
  0.1      11.10     0.01                             _ATL_dcpsc [109]
  0.1      11.11     0.01                             _ATL_dptgemm_nt [110]
  0.1      11.12     0.01                             _ATL_dpttrsm_nt [111]
  0.1      11.13     0.01                             _ATL_dscal [112]
  0.1      11.14     0.01                             _ATL_join_tree [113]
  0.1      11.15     0.01                             _Bigarray__reshape_1_255 [114]
  0.1      11.16     0.01                             _DGEBD2 [115]
  0.1      11.17     0.01                             _DLAPY2 [116]
  0.1      11.18     0.01                             _List__rev_append_74 [117]
  0.1      11.19     0.01                             _Ltv__fastSub_895 [118]
  0.1      11.20     0.01                             _Nla__fun2mat_1529 [119]
  0.1      11.21     0.01                             _Nla__getArrayFromPool_1427 [120]
  0.1      11.22     0.01                             _Nla__lq_2052 [121]
  0.1      11.23     0.01                             _Nla__noOfCols_1415 [122]
  0.1      11.24     0.01                             _Nla__partition2x1_1574 [123]
  0.1      11.25     0.01                             _Nla__partitionInfx1_1620 [124]
  0.1      11.26     0.01                             _Nla__rowScale_2366 [125]
  0.1      11.27     0.01                             _Nla__setToL_1771 [126]
  0.1      11.28     0.01                             _Nla__transp_1890 [127]
  0.1      11.29     0.01                             _SSQ [128]
  0.1      11.30     0.01                             _bigarray_num_elts [129]
  0.1      11.31     0.01                             _caml_apply12 [130]
  0.1      11.32     0.01                             _caml_apply14 [131]
  0.1      11.33     0.01                             _caml_apply9 [132]
  0.1      11.34     0.01                             _caml_curry3_1 [133]
  0.1      11.35     0.01                             _cblas_dgemm [134]
  0.1      11.36     0.01                             _d_sign [135]
  0.1      11.37     0.01                             _dgeql2_ [136]
  0.1      11.38     0.01                             _dorgl2_ [137]
  0.1      11.39     0.01                             _dorml2_ [138]
  0.1      11.40     0.01                             _dormlq_ [139]
  0.1      11.41     0.01                             _free [140]
  0.1      11.42     0.01                             _gemvMlt8 [141]
  0.1      11.43     0.01                             _ger_Mle8 [142]
  0.1      11.44     0.01                             _minor_collection [143]
  0.1      11.45     0.01                             _pthread_attr_setdetachstate [144]
  0.1      11.46     0.01                             _scalbn [145]
  0.1      11.47     0.01                             _stat_alloc [146]
  0.0      11.47     0.00   204939     0.00     0.00  _camlidl_lapack_cblas_dscal [2632]
  0.0      11.47     0.00    87830     0.00     0.00  _camlidl_lapack_dormql_ [2633]
  0.0      11.47     0.00    68298     0.00     0.00  _camlidl_lapack_d_transp [84]
  0.0      11.47     0.00    56730     0.00     0.00  _camlidl_lapack_dormlq_ [2634]
  0.0      11.47     0.00    19152     0.00     0.00  _camlidl_lapack_dgesvd_ [2635]
  0.0      11.47     0.00     9600     0.00     0.00  _camlidl_lapack_dgelqf_ [2636]
Index by function name

 [104] _ATL_apply_tree      [90] _Nla__matrix2x2_169  [71] _check_urgent_gc   
 [105] _ATL_dGEMM2NN       [122] _Nla__noOfCols_1415  [72] _compare_val       
  [86] _ATL_dGEMM2TN        [48] _Nla__normMax_1913   [18] _copy_double       
  [16] _ATL_dJIK0x0x0NN0x0 [123] _Nla__partition2x1_ [135] _d_sign            
 [106] _ATL_dJIK0x0x0NN1x1 [124] _Nla__partitionInfx  [85] _d_transp          
  [44] _ATL_dJIK0x0x0NN1x4  [91] _Nla__ql_2076       [136] _dgeql2_           
  [21] _ATL_dJIK0x0x0NN5x1 [125] _Nla__rowScale_2366  [96] _dgesvd_           
  [29] _ATL_dJIK0x0x0NT0x0 [126] _Nla__setToL_1771    [36] _dlamch_           
 [107] _ATL_dJIK0x0x0NT1x4  [92] _Nla__svd_2301       [97] _dlange_           
  [61] _ATL_dJIK0x0x0NT5x1 [127] _Nla__transp_1890    [43] _dlarfg_           
  [45] _ATL_dJIK0x0x0TN0x0  [93] _Nla__zeros_1459     [37] _dlartg_           
 [108] _ATL_dJIK0x0x0TN5x1  [94] _Pervasives__min_48 [137] _dorgl2_           
  [39] _ATL_dNCmmJIK       [128] _SSQ                 [73] _dorm2l_           
  [62] _ATL_dcopy_xp0yp0aX  [49] _SSQr               [138] _dorml2_           
 [109] _ATL_dcpsc           [41] _Std_exit__code_end [139] _dormlq_           
  [63] _ATL_dcpsc_xp0yp0aX  [10] _adjust_gc_speed     [59] _f2c_dgemv         
  [87] _ATL_ddot            [32] _alloc_bigarray      [98] _f2c_dger          
  [64] _ATL_ddot_xp0yp0aXb  [55] _alloc_custom        [14] _fl_allocate       
  [19] _ATL_ddot_xp1yp1aXb  [56] _alloc_shr           [51] _fl_merge_block    
  [22] _ATL_dgemv           [70] _allocate_block     [140] _free              
  [65] _ATL_dger            [33] _bigarray_dim        [74] _free_list_add_ptr 
  [25] _ATL_dger1_a1_x1_yX  [57] _bigarray_fill       [38] _free_list_remove_p
  [24] _ATL_dgezero         [23] _bigarray_finalize   [52] _frexp             
  [66] _ATL_dptgemm         [50] _bigarray_get_2      [75] _gemv8x4           
 [110] _ATL_dptgemm_nt      [17] _bigarray_get_N     [141] _gemvMlt8          
 [111] _ATL_dpttrsm_nt     [129] _bigarray_num_elts    [5] _gemvT4x16         
 [112] _ATL_dscal            [4] _bigarray_offset     [76] _gemvT_Nsmall      
  [46] _ATL_dscal_xp1yp0aX  [42] _bigarray_reshape   [142] _ger_Mle8          
  [26] _ATL_dtrsmKLLNN      [15] _bigarray_set_2      [77] _ger_Nle4          
 [113] _ATL_join_tree        [3] _bigarray_set_aux    [99] _lessequal         
  [88] _Bigarray__dim1_152  [30] _bigarray_sub         [1] _local_dger_       
 [114] _Bigarray__reshape_  [34] _bigarray_update_pr  [53] _lsame_            
  [20] _DBDSQR             [130] _caml_apply12        [78] _malloc            
 [115] _DGEBD2             [131] _caml_apply14        [60] _malloc_zone_free  
   [7] _DLACPY              [58] _caml_apply2         [79] _malloc_zone_malloc
 [116] _DLAPY2             [132] _caml_apply9          [6] _mark_slice        
  [54] _DLARF                [8] _caml_c_call        [143] _minor_collection  
  [12] _DLASR              [133] _caml_curry3_1       [80] _oldify_one        
  [89] _DTRTRS              [83] _camlidl_lapack_cbl [144] _pthread_attr_setde
  [40] _ILAENV            [2632] _camlidl_lapack_cbl   [2] _region_for_ptr_no_
 [117] _List__rev_append_7  [84] _camlidl_lapack_d_t [145] _scalbn            
 [118] _Ltv__fastSub_895  [2636] _camlidl_lapack_dge  [28] _sqrt              
  [67] _Ltv__sfsolve_1040  [102] _camlidl_lapack_dge [146] _stat_alloc        
  [68] _Ltv__superfastMul_[2635] _camlidl_lapack_dge   [9] _sweep_slice       
  [69] _Nla__extractRange_ [101] _camlidl_lapack_dla [100] _szone_free        
  [13] _Nla__fromInt_1391 [2634] _camlidl_lapack_dor  [11] _szone_malloc      
 [119] _Nla__fun2mat_1529 [2633] _camlidl_lapack_dor  [31] _szone_size        
 [120] _Nla__getArrayFromP [103] _camlidl_lapack_dtr  [81] restFP             
  [47] _Nla__iDU_1800      [134] _cblas_dgemm         [82] saveFP             
  [27] _Nla__iter2_486      [35] _cblas_dgemv       
 [121] _Nla__lq_2052        [95] _cblas_dnrm2       

[-- Attachment #3: Type: text/plain, Size: 1794 bytes --]






> Subject: Re: [Caml-list] Slow GC problem
> From: Damien Doligez <Damien.Doligez@inria.fr>
> Date: Tue, 8 Apr 2003 12:23:46 +0200
> Cc: caml-list@inria.fr
> In-Reply-To: <3C821F52-66D5-11D7-A265-000393942C76@ece.ucsb.edu>
>
> > I have a gc efficiency problem for which I require some advice. I have
> > read both the O'Reilly book and the manual on gc.
> [...]
> > Below I give the gc stats just before and after the solver routine is
> > called in the in-core solver:
> >
> >                       "Just before"   "Just after"
> > minor_words:          46243376        139259767
> > promoted_words:       928267          2595523
> > major_words:          2883087         39489766
> > minor_collections:    1412            4591
> > major_collections:    18              52
> > heap_words:           2150400         1044480
> > heap_chunks:          35              17
> > top_heap_words:       2150400         5038080
> > live_words:           1842373         840037
> > live_blocks:          253926          116816
> > free_words:           307180          204440
> > free_blocks:          47368           17
> > largest_free:         10928           61440
> > fragments:            847             3
> > compactions:          0               2
>
> As others have said, this is not really enough information to tell
> what is going on.  What we can say from the above is:
>
> 1. You are allocating lots and lots of data structures in the major
>     heap (maybe finalized bigarray descriptors)
> 2. The compactor was called twice, which may indicate that you have
>     a fragmentation problem.
> 3. The compactor was called near the end of the solver routine,
>     which must have erased most of the evidence...
>
> -- Damien 
--shiv--
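
Point 1 of the quoted reply can be checked against the posted figures. The Gc module documents major_words as including promoted words, so the growth in major_words minus the growth in promoted_words estimates what was allocated directly in the major heap. The following is a minimal sketch, not code from this thread; the helper names direct_major_words and with_gc_delta are hypothetical, and the workload in the demo is a stand-in.

```ocaml
(* Words allocated directly in the major heap between two snapshots.
   Assumes major_words counts promoted words too, as documented in Gc. *)
let direct_major_words (before : Gc.stat) (after : Gc.stat) =
  (after.Gc.major_words -. before.Gc.major_words)
  -. (after.Gc.promoted_words -. before.Gc.promoted_words)

(* Hypothetical helper: run [f ()] and print how the GC counters moved. *)
let with_gc_delta label f =
  let before = Gc.quick_stat () in
  let result = f () in
  let after = Gc.quick_stat () in
  Printf.printf
    "%s: minor GCs %d, major GCs %d, compactions %d, direct major words %.0f\n"
    label
    (after.Gc.minor_collections - before.Gc.minor_collections)
    (after.Gc.major_collections - before.Gc.major_collections)
    (after.Gc.compactions - before.Gc.compactions)
    (direct_major_words before after);
  result

let () =
  (* The figures posted in the thread, patched into a snapshot record. *)
  let base = Gc.quick_stat () in
  let before =
    { base with Gc.major_words = 2883087.; promoted_words = 928267. } in
  let after =
    { base with Gc.major_words = 39489766.; promoted_words = 2595523. } in
  Printf.printf "direct major-heap words: %.0f\n"
    (direct_major_words before after);
  (* Example use around a stand-in workload. *)
  let n =
    with_gc_delta "demo" (fun () -> List.length (List.init 100_000 succ)) in
  Printf.printf "result: %d\n" n
```

With the posted numbers the first line printed is 34939423: of the roughly 36.6 million words by which major_words grew, only about 1.7 million were promotions from the minor heap. That is consistent with the observation that the solver allocates heavily in the major heap.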

* Re: [Caml-list] Slow GC problem
@ 2003-10-23  8:17 Chris Hecker
  0 siblings, 0 replies; 15+ messages in thread
From: Chris Hecker @ 2003-10-23  8:17 UTC (permalink / raw)
  To: Shivkumar Chandrasekaran, caml-list


>What if I modified bigarray_stubs.c to use the malloc and free calls of
>the Boehm gc (6.1-4) garbage collector? My reasoning is that malloc is
>performing poorly due to fragmentation, and switching to a gc'd version
>might help out.
>Before I try this I would like some feedback from the list on the
>soundness of this idea.

I don't mean to be a nag, but did you profile your application yet?  A very
wise programmer once said, "Assume Nothing".

Chris
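
The gprof flat profile attached earlier in this thread gives one way to answer that question. As a minimal sketch, the self seconds can be grouped by subsystem and compared with the 11.47 s total. The numbers are copied from the flat profile; which symbols count as OCaml GC and free-list code and which as the system malloc is an assumption made for this illustration, not something stated in the thread.

```ocaml
(* Self seconds copied from the flat profile; the grouping is an assumption. *)
let total = 11.47

(* Symbols assumed to belong to the OCaml collector and its free list. *)
let ocaml_gc = [
  "_mark_slice", 0.41; "_sweep_slice", 0.30; "_adjust_gc_speed", 0.26;
  "_fl_allocate", 0.20; "_fl_merge_block", 0.05; "_alloc_shr", 0.04;
  "_allocate_block", 0.03; "_check_urgent_gc", 0.03; "_oldify_one", 0.03;
  "_minor_collection", 0.01;
]

(* Symbols assumed to belong to the system malloc/free implementation. *)
let system_malloc = [
  "_region_for_ptr_no_lock", 0.90; "_szone_malloc", 0.25;
  "_szone_size", 0.09; "_free_list_remove_ptr", 0.07;
  "_malloc_zone_free", 0.04; "_free_list_add_ptr", 0.03; "_malloc", 0.03;
  "_malloc_zone_malloc", 0.03; "_szone_free", 0.02; "_free", 0.01;
]

(* Total self time of a group, in seconds. *)
let sum group = List.fold_left (fun acc (_, s) -> acc +. s) 0. group

(* Share of the whole run, in percent. *)
let share group = 100. *. sum group /. total

let () =
  Printf.printf "OCaml GC and free list: %.2f s (%.1f%%)\n"
    (sum ocaml_gc) (share ocaml_gc);
  Printf.printf "system malloc/free:     %.2f s (%.1f%%)\n"
    (sum system_malloc) (share system_malloc)
```

Under this grouping the two buckets come to about 1.36 s and 1.47 s, each roughly an eighth of the sampled time. Most entries in the profile carry only one to three samples' worth of time, so the split is indicative at best.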


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



end of thread, other threads:[~2003-10-23  8:18 UTC | newest]

Thread overview: 15+ messages
2003-04-04 19:40 [Caml-list] Slow GC problem Shivkumar Chandrasekaran
2003-04-03 21:07 ` Christophe Raffalli
2003-04-07 17:53 ` Shivkumar Chandrasekaran
2003-04-07 19:08   ` Chris Hecker
2003-04-08  7:15     ` David Monniaux
2003-04-08 10:28   ` Damien Doligez
2003-04-08 23:03     ` Shivkumar Chandrasekaran
2003-04-08 10:23 ` Damien Doligez
2003-04-10 21:21   ` Shivkumar Chandrasekaran
2003-04-10 21:51     ` Brian Hurt
2003-04-11  7:10     ` Chris Hecker
2003-04-11  7:58       ` Christophe Raffalli
2003-04-11 16:35         ` Shivkumar Chandrasekaran
2003-04-14 16:37 Shivkumar Chandrasekaran
2003-10-23  8:17 Chris Hecker
