caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Profiling a function execution
@ 2003-11-13 22:34 Daniel Bünzli
  2003-11-13 23:53 ` Brian Hurt
  2003-11-25 18:05 ` Xavier Leroy
  0 siblings, 2 replies; 11+ messages in thread
From: Daniel Bünzli @ 2003-11-13 22:34 UTC (permalink / raw)
  To: caml-list

Hello,

To compare different implementations of a function I would like to 
profile its execution in time and heap memory usage (at least orders of 
magnitude). To do so, I use the code below, relying on the Gc and Unix 
libraries.

I have the following questions :

1) When I start profiling should I prefer a Gc.compact to Gc.full_major 
?

2) Unix.times seems to have a low resolution, which means that my 
timings are often 0.0 (unless I execute the function a lot of times). I 
don't want to use Unix.gettimeofday because this prevents me to make 
the difference between user and system time. Has anybody bindings to 
the getrusage function or another idea ?

3) What is the accuracy of these results ? E.g. I read in the 
documentation of the Gc module that the field minor_words is only an 
approximation in programs compiled to native code. Is it also true for 
the other fields ? How much can I trust the figures I get ?

4) Is it possible to know at runtime whether we are running native code 
or interpreted bytecode ?

Thanks for your answers,

Daniel


-- profile.ml --

type t = { minor_bytes : float;
	   promoted_bytes : float;
	   major_bytes : float;
	   allocated_bytes : float;
	
	   minor_collections : float;
	   major_collections : float;
	
	   user_time : float;
	   system_time : float }

(* Bytes per words *)
let bpw = float_of_int (Sys.word_size / 8)

(* Heap allocation overhead due to profiling *)
let heap_overhead =
   let s = Gc.stat() in
   ignore(Unix.times());
   ignore(Unix.times());
   let s' = Gc.stat() in
   ((s'.Gc.minor_words +. s'.Gc.major_words -. s'.Gc.promoted_words) -.
   (s.Gc.minor_words +. s.Gc.major_words -. s.Gc.promoted_words)) *. bpw

let execution_mean (f : 'a -> 'b) (arg : 'a) n =
   Gc.full_major ();
   let s = Gc.stat () in
   let t = Unix.times () in
   for i = 1 to n do
     ignore(f arg)
   done;
   let t' = Unix.times () in
   let s' = Gc.stat () in
   let mi, pro, ma =
     ((s'.Gc.minor_words -. s.Gc.minor_words) *. bpw) -. heap_overhead,
     (s'.Gc.promoted_words -. s.Gc.promoted_words) *. bpw,
     (s'.Gc.major_words -. s.Gc.major_words) *. bpw in
   let n' = float_of_int n in
   { minor_bytes = mi /. n';
     promoted_bytes = pro /. n';
     major_bytes = ma /. n';
     allocated_bytes = (mi +. ma -. pro) /. n';

     minor_collections =
       (float_of_int (s'.Gc.minor_collections - s.Gc.minor_collections)) 
/. n';
     major_collections =
       (float_of_int (s'.Gc.major_collections - s.Gc.major_collections)) 
/. n';

     user_time = (t'.Unix.tms_utime -. t.Unix.tms_utime) /. n';
     system_time = (t'.Unix.tms_stime -. t.Unix.tms_stime) /. n'
   }

let execution (f : 'a -> 'b) (arg : 'a) = execution_mean f arg 10_000

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Profiling a function execution
  2003-11-13 22:34 [Caml-list] Profiling a function execution Daniel Bünzli
@ 2003-11-13 23:53 ` Brian Hurt
  2003-11-14 12:23   ` Ville-Pertti Keinonen
  2003-11-14 12:33   ` Richard Jones
  2003-11-25 18:05 ` Xavier Leroy
  1 sibling, 2 replies; 11+ messages in thread
From: Brian Hurt @ 2003-11-13 23:53 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: Ocaml Mailing List

On Thu, 13 Nov 2003, Daniel Bünzli wrote:

> 2) Unix.times seems to have a low resolution, which means that my 
> timings are often 0.0 (unless I execute the function a lot of times). I 
> don't want to use Unix.gettimeofday because this prevents me to make 
> the difference between user and system time. Has anybody bindings to 
> the getrusage function or another idea ?

Clocks on computers have a 10ms quantiziation- i.e., they measure 
everything as multiples of 10 milliseconds.  All measurements are accurate 
only up to about +/- 10 milliseconds.  I.e. if times reports that 
something takes 50 milliseconds, it probably takes more than 40 but less 
than 60 milliseconds.

My recommendation if you want to time something that takes less than 
seconds to do, is to do it lots of times.  You want the amount of time 
between two calls to times to be at least seconds, if not tens of seconds 
or minutes.  You then divide the total amount of time taken by the number 
of times the operation was executed, to get the time per execution.  

Is there a wrapper to the clock function kicking around?

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Profiling a function execution
  2003-11-13 23:53 ` Brian Hurt
@ 2003-11-14 12:23   ` Ville-Pertti Keinonen
  2003-11-14 12:33   ` Richard Jones
  1 sibling, 0 replies; 11+ messages in thread
From: Ville-Pertti Keinonen @ 2003-11-14 12:23 UTC (permalink / raw)
  To: Brian Hurt; +Cc: Daniel Bünzli, Ocaml Mailing List

On Thu, Nov 13, 2003 at 05:53:01PM -0600, Brian Hurt wrote:

> Clocks on computers have a 10ms quantiziation- i.e., they measure 
> everything as multiples of 10 milliseconds.  All measurements are accurate 

No they don't, hardware timers usually run at at more than a MHz.  Most
operating systems provide interfaces with limited precision (this usually
varies between 100 and 1000 Hz), but on systems that actually read the
clock and don't just count interrupts, getrusage is much more accurate
than times.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Profiling a function execution
  2003-11-13 23:53 ` Brian Hurt
  2003-11-14 12:23   ` Ville-Pertti Keinonen
@ 2003-11-14 12:33   ` Richard Jones
  2003-11-15 11:11     ` Wolfgang Lux
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Jones @ 2003-11-14 12:33 UTC (permalink / raw)
  Cc: Ocaml Mailing List

If you're using a Pentium-class machine, then a nice trick is to wrap
the RDTSC assembly instruction and call that. It returns number of
clock cycles elapsed (a 64 bit number IIRC). Here's some code to get
you started:

#ifdef linux
#include <sys/bitypes.h>
#endif

extern inline u_int64_t
rdtsc (void)
{
  u_int32_t hi, lo;
#ifdef linux
  __asm__ ("rdtsc" : "=d" (hi),  "=a" (lo));
#else
  __asm__ (".byte 0x0f,0x31" : "=d" (hi),  "=a" (lo));
#endif
  return ((u_int64_t) hi) << 32 | lo;
}

Rich.

-- 
Richard Jones. http://www.annexia.org/ http://freshmeat.net/users/rwmj
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
"My karma ran over your dogma"

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Profiling a function execution
  2003-11-14 12:33   ` Richard Jones
@ 2003-11-15 11:11     ` Wolfgang Lux
  2003-11-15 12:21       ` David MENTRE
  2003-11-15 12:54       ` Richard Jones
  0 siblings, 2 replies; 11+ messages in thread
From: Wolfgang Lux @ 2003-11-15 11:11 UTC (permalink / raw)
  To: Richard Jones; +Cc: Ocaml Mailing List

Richard Jones wrote:

> If you're using a Pentium-class machine, then a nice trick is to wrap
> the RDTSC assembly instruction and call that. It returns number of
> clock cycles elapsed (a 64 bit number IIRC). Here's some code to get
> you started:

This looks like complete nonsense to me. This will neither distinguish
user and system mode execution times. Even worse, clock cycles will
not have any meaning if the operating system chooses to perform a
task switch while the profiled function is executed. It is for such
reasons that Unix systems have an getrusage call.

Regards
Wolfgang

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Profiling a function execution
  2003-11-15 11:11     ` Wolfgang Lux
@ 2003-11-15 12:21       ` David MENTRE
  2003-11-15 12:54       ` Richard Jones
  1 sibling, 0 replies; 11+ messages in thread
From: David MENTRE @ 2003-11-15 12:21 UTC (permalink / raw)
  To: Wolfgang Lux; +Cc: Richard Jones, Ocaml Mailing List

Hello,

Wolfgang Lux <wlux@uni-muenster.de> writes:

> This looks like complete nonsense to me. This will neither distinguish
> user and system mode execution times. Even worse, clock cycles will
> not have any meaning if the operating system chooses to perform a
> task switch while the profiled function is executed.

You're right. And the better approach might be to use profiling
architectures like OProfile (http://oprofile.sourceforge.net/news/).

Yours,
d.
-- 
 David Mentré <dmentre@linux-france.org>
   http://www.linux-france.org/~dmentre/david-mentre-public-key.asc
 GnuPG key fingerprint: A7CD 7357 3EC4 1163 745B  7FD3 FB3E AD7C 2A18 BE9E

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Profiling a function execution
  2003-11-15 11:11     ` Wolfgang Lux
  2003-11-15 12:21       ` David MENTRE
@ 2003-11-15 12:54       ` Richard Jones
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Jones @ 2003-11-15 12:54 UTC (permalink / raw)
  Cc: Ocaml Mailing List

On Sat, Nov 15, 2003 at 12:11:17PM +0100, Wolfgang Lux wrote:
> Richard Jones wrote:
> 
> >If you're using a Pentium-class machine, then a nice trick is to wrap
> >the RDTSC assembly instruction and call that. It returns number of
> >clock cycles elapsed (a 64 bit number IIRC). Here's some code to get
> >you started:
> 
> This looks like complete nonsense to me. This will neither distinguish
> user and system mode execution times. Even worse, clock cycles will
> not have any meaning if the operating system chooses to perform a
> task switch while the profiled function is executed.

Obviously. It was only a suggestion after all ...

Rich.

-- 
Richard Jones. http://www.annexia.org/ http://freshmeat.net/users/rwmj
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
'There is a joke about American engineers and French engineers. The
American team brings a prototype to the French team. The French team's
response is: "Well, it works fine in practice; but how will it hold up
in theory?"'

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Profiling a function execution
  2003-11-13 22:34 [Caml-list] Profiling a function execution Daniel Bünzli
  2003-11-13 23:53 ` Brian Hurt
@ 2003-11-25 18:05 ` Xavier Leroy
  2003-11-25 21:54   ` Daniel Bünzli
                     ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Xavier Leroy @ 2003-11-25 18:05 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: caml-list

Just to complement Damien's answers:

> 2) Unix.times seems to have a low resolution, which means that my 
> timings are often 0.0 (unless I execute the function a lot of times). I 
> don't want to use Unix.gettimeofday because this prevents me to make 
> the difference between user and system time. Has anybody bindings to 
> the getrusage function or another idea ?

At least under Linux, getrusage() doesn't provide more accuracy than
times(), since the kernel maintains user and system time as an
integral number of clock ticks.  I haven't looked at other Unix
kernels, but I suspect that the additional precision made possible by
the getrusage() syscall is simply not exploited.  

(Besides, times() is part of the POSIX standard while getrusage() is not,
meaning less portability.)

> 4) Is it possible to know at runtime whether we are running native code 
> or interpreted bytecode ?

Some clever hacks involving external C functions were given on this list.
I'm not sure I would endorse them :-)  But your question begs another
question: why would you need to do distinguish native code from
bytecode?  We're working hard on removing the last discrepancies
between the two compilers...

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Profiling a function execution
  2003-11-25 18:05 ` Xavier Leroy
@ 2003-11-25 21:54   ` Daniel Bünzli
  2003-11-25 22:38   ` Kim Nguyen
  2003-11-26 12:23   ` Ville-Pertti Keinonen
  2 siblings, 0 replies; 11+ messages in thread
From: Daniel Bünzli @ 2003-11-25 21:54 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Damien Doligez, caml-list

Le 25 nov. 03, à 19:05, Xavier Leroy a écrit :

> I haven't looked at other Unix
> kernels, but I suspect that the additional precision made possible by
> the getrusage() syscall is simply not exploited.
I have done tests using the C code appended to this email.

Indeed with the Darwin Kernel 7.0.0 (MacOS X), you don't get extra 
precision with the figures returned by getrusage(). I was mistaken by 
the potential of the data structure described in the man page (there's 
no info about the actual granularity there).

You can remove my wish (id=1937) for getrusage bindings from your 
database.

> your question begs another
> question: why would you need to do distinguish native code from
> bytecode?
I just wanted a quick way to distinguish, on output, my bytecode 
profiles from native ones. This can of course be handled very easily 
with a preprocessor, I was just wondering if I had missed a flag in the 
style of Sys.interactive.

>   We're working hard on removing the last discrepancies
> between the two compilers...
One could argue that since there exists some discrepancies such a flag 
should exist. However I doubt its usefulness in common practice (which 
is why I didn't file a feature wish for that one). Furthermore, if it 
doesn't exist, removing the last discrepancies becomes a must...

Thanks for your answers,

Daniel


--- test.c ---
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <sys/types.h>
#include <sys/times.h>
#include <sys/time.h>
#include <sys/resource.h>

#define NUM_OPS ULONG_MAX / 100

void idle (void)
{
   unsigned long i;
   int op;

   for (i = 0; i < NUM_OPS; i++) op = 1+1;
}

void diff_times (void)
{
   struct tms t1, t2;

   times(&t1);
   idle();
   times(&t2);
   printf("times: %fs user %fs sys\n",
	 (double)(t2.tms_utime-t1.tms_utime) / CLK_TCK,
	 (double)(t2.tms_stime-t1.tms_stime) / CLK_TCK);
}

void diff_getrusage(void)
{
   struct rusage t1, t2;

   getrusage(RUSAGE_SELF, &t1);
   idle();
   getrusage(RUSAGE_SELF, &t2);
   printf("getrusage: %fs user %fs sys\n",
	 (double)t2.ru_utime.tv_sec + (double)t2.ru_utime.tv_usec / 1e6 -
	 ((double)t1.ru_utime.tv_sec + (double)t1.ru_utime.tv_usec / 1e6),
	 (double)t2.ru_stime.tv_sec + (double)t2.ru_stime.tv_usec / 1e6 -
	 ((double)t1.ru_stime.tv_sec + (double)t1.ru_stime.tv_usec / 1e6));
	
}

int main(int argc, char **argv)
{
   diff_times();
   diff_getrusage();
   return 0;
}
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Profiling a function execution
  2003-11-25 18:05 ` Xavier Leroy
  2003-11-25 21:54   ` Daniel Bünzli
@ 2003-11-25 22:38   ` Kim Nguyen
  2003-11-26 12:23   ` Ville-Pertti Keinonen
  2 siblings, 0 replies; 11+ messages in thread
From: Kim Nguyen @ 2003-11-25 22:38 UTC (permalink / raw)
  To: caml-list; +Cc: Xavier Leroy

[-- Attachment #1: Type: text/plain, Size: 665 bytes --]

On Tue, 25 Nov 2003 19:05:53 +0100
Xavier Leroy <xavier.leroy@inria.fr> wrote:

Hello,

> Some clever hacks involving external C functions were given on this list.
> I'm not sure I would endorse them :-)

Is this for a "political" reason (this 'side effect' of the compiler should not be
exposed to the casual user or can this hack lead to an unsafety of some kind ?
(in code generation or something else ?)

> We're working hard on removing the last discrepancies
> between the two compilers...

Does this include a Dynlink module for native code ? ( or is such a module
incompatible with the design of the compiler or with type safety ? )

Regards,

Kim Nguyen.


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Profiling a function execution
  2003-11-25 18:05 ` Xavier Leroy
  2003-11-25 21:54   ` Daniel Bünzli
  2003-11-25 22:38   ` Kim Nguyen
@ 2003-11-26 12:23   ` Ville-Pertti Keinonen
  2 siblings, 0 replies; 11+ messages in thread
From: Ville-Pertti Keinonen @ 2003-11-26 12:23 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Daniel Bünzli, caml-list

On Tue, Nov 25, 2003 at 07:05:53PM +0100, Xavier Leroy wrote:

> At least under Linux, getrusage() doesn't provide more accuracy than
> times(), since the kernel maintains user and system time as an
> integral number of clock ticks.  I haven't looked at other Unix
> kernels, but I suspect that the additional precision made possible by
> the getrusage() syscall is simply not exploited.  

4.4 BSD derivatives (except Darwin/MacOS X).

O^HD^HTru64.

> (Besides, times() is part of the POSIX standard while getrusage() is not,
> meaning less portability.)

You can always use something like #ifdef HAS_GETRUSAGE

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2003-11-26 12:23 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-11-13 22:34 [Caml-list] Profiling a function execution Daniel Bünzli
2003-11-13 23:53 ` Brian Hurt
2003-11-14 12:23   ` Ville-Pertti Keinonen
2003-11-14 12:33   ` Richard Jones
2003-11-15 11:11     ` Wolfgang Lux
2003-11-15 12:21       ` David MENTRE
2003-11-15 12:54       ` Richard Jones
2003-11-25 18:05 ` Xavier Leroy
2003-11-25 21:54   ` Daniel Bünzli
2003-11-25 22:38   ` Kim Nguyen
2003-11-26 12:23   ` Ville-Pertti Keinonen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).