caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 15:15 Memory allocation nano-benchmark Christian Szegedy
@ 2005-02-10 14:47 ` Frédéric Gava
  2005-02-10 15:19   ` skaller
  2005-02-10 14:56 ` Jon Harrop
  2005-02-10 14:59 ` John Prevost
  2 siblings, 1 reply; 31+ messages in thread
From: Frédéric Gava @ 2005-02-10 14:47 UTC (permalink / raw)
  To: caml-list

Hi,

I think it is due to tablesize^3 applications of functions that are not
present in the C code. But the number of line codes makes me prefer the
ocaml one.

Regards,
Frédéric Gava

> Let us look at another example where ocaml not really shines:
>
> mem.ml:
>
> let test tablesize =
>    let table =
>      Array.init tablesize (fun i ->
>        Array.init tablesize (fun j ->
>          Array.init tablesize (fun k ->
>            ((i+1)*(j+1)*(k+1))))) in ()
>
> let _ = test (int_of_string Sys.argv.(1))
>
>
> mem.c:
>
> void test(int tablesize)
> {
>
>     int i;
>     int ***table = (int ***)malloc(tablesize *sizeof(int **));
>
>     for( i=0; i<tablesize ; i++ )
>     {
>        int j;
>        table[i] = (int **)malloc(tablesize * sizeof(int *));
>
>        for ( j=0 ; j < tablesize ; j++ )
>        {
>           int k;
>           table[i][j] = (int *)malloc(tablesize * sizeof(int));
>
>           for ( k =0 ; k < tablesize ; k++ )
>           {
>              table[i][j][k] = (i+1)*(j+1)*(k+1);
>           }
>        }
>     }
> }
>
>
> int main(int argc,char *argv[])
> {
>     int i;
>
>     if( argc < 2 ) { return -1; }
>
>     test(atoi(argv[1]));
>
>     return 0;
> }
>
> On an old sub-Ghz Pentium laptop:
>
> ocamlopt -unsafe -inline 40 mem.ml
> time a.out 500
> real    0m4.229s
> user    0m3.870s
> sys     0m0.360s
>
>
> gcc -O3 -fomit-frame-pointer mem.c
> time a.out 300
> real    0m0.935s
> user    0m0.420s
> sys     0m0.500s
>
>
>
> On a new Opteron box (OK, it's cheating, becasuse OCamls ints
> are 64 bits, but there is still no disk-cache involved since
> it has 64gigs :)):
>
> gcc -O3 -fomit-frame-pointer mem.c
> time ./a.out 500
> real    0m0.803s
> user    0m0.224s
> sys     0m0.575s
>
>
> ocamlopt -unsafe -inline 40 mem.ml
> time ./a.out 500
> real    0m8.651s
> user    0m7.270s
> sys     0m1.357s
>
>
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 15:15 Memory allocation nano-benchmark Christian Szegedy
  2005-02-10 14:47 ` [Caml-list] " Frédéric Gava
@ 2005-02-10 14:56 ` Jon Harrop
  2005-02-10 15:32   ` Ville-Pertti Keinonen
  2005-02-10 14:59 ` John Prevost
  2 siblings, 1 reply; 31+ messages in thread
From: Jon Harrop @ 2005-02-10 14:56 UTC (permalink / raw)
  To: caml-list

On Thursday 10 February 2005 15:15, Christian Szegedy wrote:
> Let us look at another example where ocaml not really shines:
> ...

I brought this up a while ago. IIRC, performance is poor here because of:

1. Run-time polymorphism in Array.init.
2. Blitting the array twice (once for initialisation, once for filling).

I believe the former could be fixed by running source through a whole-program 
preprocessor to remove all polymorphism.

The latter could be addressed by extending the functionality of the GC to 
permit extensible arrays. However, this is tricky, error-prone work on a 
safety-critical part of the compiler.

PS: Note that this is not a test of "memory allocation" as you state but, 
rather, it is mainly a test of the cost of run-time polymorphism. Remove the 
polymorphism for a better test of memory allocation.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 15:15 Memory allocation nano-benchmark Christian Szegedy
  2005-02-10 14:47 ` [Caml-list] " Frédéric Gava
  2005-02-10 14:56 ` Jon Harrop
@ 2005-02-10 14:59 ` John Prevost
  2005-02-10 16:50   ` Marwan Burelle
  2 siblings, 1 reply; 31+ messages in thread
From: John Prevost @ 2005-02-10 14:59 UTC (permalink / raw)
  To: caml-list

On Thu, 10 Feb 2005 16:15:10 +0100, Christian Szegedy
<szegedy@or.uni-bonn.de> wrote:
> Let us look at another example where ocaml not really shines:

Er.  Or perhaps we should not?  I could not imagine writing anything
even vaguely similar to these examples in either C or in O'Caml.

Not to mention the serious problem with evaluating memory allocation
overhead by comparing programs that allocate massive amounts of memory
but never use or release any of it.

In a program that allocates many small short-lived chunks of memory, I
suspect you will find that *in practice*, O'Caml performs better than
C.

In a program that allocates one very large chunk of memory, I suspect
you will find that both C and O'Caml do a lot better when... you
allocated as one very large chunk of memory (or, if need be, a *tiny*
number of large chunks) instead of as many small chunks of memory.

John.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Memory allocation nano-benchmark.
@ 2005-02-10 15:15 Christian Szegedy
  2005-02-10 14:47 ` [Caml-list] " Frédéric Gava
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Christian Szegedy @ 2005-02-10 15:15 UTC (permalink / raw)
  To: caml-list

Let us look at another example where ocaml not really shines:

mem.ml:

let test tablesize =
   let table =
     Array.init tablesize (fun i ->
       Array.init tablesize (fun j ->
         Array.init tablesize (fun k ->
           ((i+1)*(j+1)*(k+1))))) in ()

let _ = test (int_of_string Sys.argv.(1))


mem.c:

void test(int tablesize)
{

    int i;
    int ***table = (int ***)malloc(tablesize *sizeof(int **));

    for( i=0; i<tablesize ; i++ )
    {
       int j;
       table[i] = (int **)malloc(tablesize * sizeof(int *));

       for ( j=0 ; j < tablesize ; j++ )
       {
          int k;
          table[i][j] = (int *)malloc(tablesize * sizeof(int));

          for ( k =0 ; k < tablesize ; k++ )
          {
             table[i][j][k] = (i+1)*(j+1)*(k+1);
          }
       }
    }
}


int main(int argc,char *argv[])
{
    int i;

    if( argc < 2 ) { return -1; }

    test(atoi(argv[1]));

    return 0;
}

On an old sub-Ghz Pentium laptop:

ocamlopt -unsafe -inline 40 mem.ml
time a.out 500
real    0m4.229s
user    0m3.870s
sys     0m0.360s


gcc -O3 -fomit-frame-pointer mem.c
time a.out 300
real    0m0.935s
user    0m0.420s
sys     0m0.500s



On a new Opteron box (OK, it's cheating, becasuse OCamls ints
are 64 bits, but there is still no disk-cache involved since
it has 64gigs :)):

gcc -O3 -fomit-frame-pointer mem.c
time ./a.out 500
real    0m0.803s
user    0m0.224s
sys     0m0.575s


ocamlopt -unsafe -inline 40 mem.ml
time ./a.out 500
real    0m8.651s
user    0m7.270s
sys     0m1.357s




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 14:47 ` [Caml-list] " Frédéric Gava
@ 2005-02-10 15:19   ` skaller
  2005-02-10 16:36     ` Frédéric Gava
  0 siblings, 1 reply; 31+ messages in thread
From: skaller @ 2005-02-10 15:19 UTC (permalink / raw)
  To: Frédéric Gava; +Cc: caml-list

On Fri, 2005-02-11 at 01:47, Frédéric Gava wrote:
> Hi,
> 
> I think it is due to tablesize^3 applications of functions that are not
> present in the C code. 

[skaller@pelican] ~>time ./a.out 250 
real    0m3.303s
user    0m2.850s
sys     0m0.350s

Using:

let test tablesize =
  let table =
    Array.init tablesize (fun i ->
    Array.init tablesize (fun j ->
    Array.create tablesize 0))
  in
    for i = 0 to tablesize - 1 do
    for j = 0 to tablesize - 1 do
    for k = 0 to tablesize - 1 do
    table.(i).(j).(k) <- (i+1)*(j+1)*(k+1)
    done done done
;;
                
let _ = test (int_of_string Sys.argv.(1))

[skaller@pelican] ~>time ./xmem 250
real    0m3.327s
user    0m2.760s
sys     0m0.300s

[skaller@pelican] ~>time ./cmem 250 
real    0m0.706s
user    0m0.150s
sys     0m0.450s

[I can't run it with >250 without the disk going nuts]

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 14:56 ` Jon Harrop
@ 2005-02-10 15:32   ` Ville-Pertti Keinonen
  0 siblings, 0 replies; 31+ messages in thread
From: Ville-Pertti Keinonen @ 2005-02-10 15:32 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Thu, 2005-02-10 at 14:56 +0000, Jon Harrop wrote:

> 1. Run-time polymorphism in Array.init.
 ...
> I believe the former could be fixed by running source through a whole-program 
> preprocessor to remove all polymorphism.

It's not just the polymorphism, but also the creation of the closure.
Better inlining could address these, without changing the compilation
model quite that much.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 15:19   ` skaller
@ 2005-02-10 16:36     ` Frédéric Gava
  2005-02-10 17:56       ` Frédéric Gava
  2005-02-11  0:55       ` skaller
  0 siblings, 2 replies; 31+ messages in thread
From: Frédéric Gava @ 2005-02-10 16:36 UTC (permalink / raw)
  Cc: caml-list

Hi,

> let test tablesize =
>   let table =
>     Array.init tablesize (fun i ->
>     Array.init tablesize (fun j ->
>     Array.create tablesize 0))
>   in
>     for i = 0 to tablesize - 1 do
>     for j = 0 to tablesize - 1 do
>     for k = 0 to tablesize - 1 do
>     table.(i).(j).(k) <- (i+1)*(j+1)*(k+1)
>     done done done
Here, you have tablesize^2 applications and time to create the closure(s),
so it is not a good example. Peraps test the following code (I do not have
test it)
  let tmp1=Array.create tablesize 0 in
  let tmp2 = Array.create tablesize (Array.copy tmp1) in
  let table = Array.create tablesize (Array.copy tmp2) in
      for i = 0 to tablesize - 1 do
      for j = 0 to tablesize - 1 do
      for k = 0 to tablesize - 1 do
       table.(i).(j).(k) <- (i+1)*(j+1)*(k+1)
     done done done
(copy are used to not have the same arrays ) I think (not sure) that used
"Array.create" instead of  "Array.init" allows ours to not have the problem
of buildings the closures and peraps give a faster code (for this cases). To
test ;-)

> [skaller@pelican] ~>time ./xmem 250
> real    0m3.327s
> user    0m2.760s
> sys     0m0.300s
I do not understand what is "xmem". A super hero ;-) ?

Cheers,
Frédéric Gava



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 14:59 ` John Prevost
@ 2005-02-10 16:50   ` Marwan Burelle
  2005-02-10 19:20     ` Christian Szegedy
  2005-02-11  1:04     ` skaller
  0 siblings, 2 replies; 31+ messages in thread
From: Marwan Burelle @ 2005-02-10 16:50 UTC (permalink / raw)
  To: John Prevost; +Cc: caml-list

On Thu, 10 Feb 2005 09:59:53 -0500, John Prevost <j.prevost@gmail.com> wrote:
> In a program that allocates one very large chunk of memory, I suspect
> you will find that both C and O'Caml do a lot better when... you
> allocated as one very large chunk of memory (or, if need be, a *tiny*
> number of large chunks) instead of as many small chunks of memory.

It also depends on malloc, on Linux it sometimes works
"optimisticaly", that is, it won't realy allocate memory unless you
use it (leading to some strange out of memory error, since it can
return a non-Null pointer even if memory isn't available.) So, a C
program with a lot of malloc and no usage of the memory allocated
could be faster than it realy is ...

My 2cc.

-- 
Burelle Marwan,
Equipe Bases de Donnees - LRI
http://www.cduce.org


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 16:36     ` Frédéric Gava
@ 2005-02-10 17:56       ` Frédéric Gava
  2005-02-10 19:56         ` Christian Szegedy
  2005-02-11  0:55       ` skaller
  1 sibling, 1 reply; 31+ messages in thread
From: Frédéric Gava @ 2005-02-10 17:56 UTC (permalink / raw)
  Cc: caml-list

Peraps test the following code (I do not have test it)
>   let tmp1=Array.create tablesize 0 in
>   let tmp2 = Array.create tablesize (Array.copy tmp1) in
>   let table = Array.create tablesize (Array.copy tmp2) in
>       for i = 0 to tablesize - 1 do
>       for j = 0 to tablesize - 1 do
>       for k = 0 to tablesize - 1 do
>        table.(i).(j).(k) <- (i+1)*(j+1)*(k+1)
>      done done done
> (copy are used to not have the same arrays )
Oups !!!!
The "Array.copy"  is here a bug because we would do not have copies of the
arrays. A better code would be:
 let table = Array.create tablesize [| [| |] |] in
  (* modify the array table to not have arrays*)
  for i=0 to tablesize - 1 do
      table.(i) <-   (  let tmp=Array.create tablesize [||]  in
                             for j=0 to tablesize -1 do
                              tmp.(j) <- Array.create tablesize 0
                           done;
                           tmp)
 done;
(* create the good values *)
       for i = 0 to tablesize - 1 do
       for j = 0 to tablesize - 1 do
       for k = 0 to tablesize - 1 do
        table.(i).(j).(k) <- (i+1)*(j+1)*(k+1)
      done done done

I do not have test this code (I do not used my computer) but it does not
create closures.

Regards,
Frédéric Gava



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 16:50   ` Marwan Burelle
@ 2005-02-10 19:20     ` Christian Szegedy
  2005-02-10 19:40       ` Jon Harrop
  2005-02-11 11:26       ` Oliver Bandel
  2005-02-11  1:04     ` skaller
  1 sibling, 2 replies; 31+ messages in thread
From: Christian Szegedy @ 2005-02-10 19:20 UTC (permalink / raw)
  To: caml-list

Marwan Burelle wrote:

>On Thu, 10 Feb 2005 09:59:53 -0500, John Prevost <j.prevost@gmail.com> wrote:
>  
>
>>In a program that allocates one very large chunk of memory, I suspect
>>you will find that both C and O'Caml do a lot better when... you
>>allocated as one very large chunk of memory (or, if need be, a *tiny*
>>number of large chunks) instead of as many small chunks of memory.
>>    
>>
>
>It also depends on malloc, on Linux it sometimes works
>"optimisticaly", that is, it won't realy allocate memory unless you
>use it (leading to some strange out of memory error, since it can
>return a non-Null pointer even if memory isn't available.) So, a C
>program with a lot of malloc and no usage of the memory allocated
>could be faster than it realy is ...
>
>My 2cc.
>
Actually, I have *filled* the arrays, as it may be clear from
the code. This example was extracted from a program which
massively shuffles around the content of this 3-dimensional grid.
(Both work fine and yield identical output.)

To my astonishment, the OCaml was a bit faster than C when
working on the grid, but the speed of allocation was nowhere
near to that of the C version.

This was a surprise to me, since I thought that OCaml is quite
competitive in this regard.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 19:20     ` Christian Szegedy
@ 2005-02-10 19:40       ` Jon Harrop
  2005-02-11 11:26       ` Oliver Bandel
  1 sibling, 0 replies; 31+ messages in thread
From: Jon Harrop @ 2005-02-10 19:40 UTC (permalink / raw)
  To: caml-list

On Thursday 10 February 2005 19:20, Christian Szegedy wrote:
> ... This example was extracted from a program which
> massively shuffles around the content of this 3-dimensional grid.
> (Both work fine and yield identical output.)

Perhaps a non-uniform data structure would be more suitable? OCaml would 
really shine at this...

> To my astonishment, the OCaml was a bit faster than C when
> working on the grid, but the speed of allocation was nowhere
> near to that of the C version.

Yes, there is a lot of overhead when doing this in ocaml.

> This was a surprise to me, since I thought that OCaml is quite
> competitive in this regard.

No, I've also found this with a wavelet transform too. The time taken to do 
the transform was within a few percent but the time taken to initialise was 
double for ocaml compared to C, IIRC.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 17:56       ` Frédéric Gava
@ 2005-02-10 19:56         ` Christian Szegedy
  2005-02-10 23:58           ` Frédéric Gava
  2005-02-11  9:22           ` Frédéric Gava
  0 siblings, 2 replies; 31+ messages in thread
From: Christian Szegedy @ 2005-02-10 19:56 UTC (permalink / raw)
  To: caml-list

Frédéric Gava wrote:

>Peraps test the following code (I do not have test it)
>  
>
OK:

memnew.ml

let test tablesize =
  let table = Array.create tablesize [| [| |] |] in
  for i=0 to tablesize - 1 do
     table.(i) <-   (  let tmp=Array.create tablesize [||]  in
    for j=0 to tablesize -1 do
        tmp.(j) <- Array.create tablesize 0
    done;
    tmp)
  done;

  for i = 0 to tablesize - 1 do
    for j = 0 to tablesize - 1 do
      for k = 0 to tablesize - 1 do
        table.(i).(j).(k) <- (i+1)*(j+1)*(k+1)
      done done done

let _ = test (int_of_string Sys.argv.(1))


The results (on an AMD 2000XP):

chris@gentoo:~/test$ ocamlopt -unsafe -inline 40 memnew.ml
chris@gentoo:~/test$ a.out 300
chris@gentoo:~/test$ time a.out 300
real    0m1.560s
user    0m1.391s
sys     0m0.105s

chris@gentoo:~/test$ ocamlopt -unsafe -inline 40 mem.ml
chris@gentoo:~/test$ time a.out 300
real    0m2.106s
user    0m1.904s
sys     0m0.106s

chris@gentoo:~/test$ gcc -O3 -fomit-frame-pointer mem.c
chris@gentoo:~/test$ time a.out 300
real    0m0.380s
user    0m0.283s
sys     0m0.072s


So, yours is slightly faster, but both are in the same ballpark
and the C version is at least 4X faster.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 19:56         ` Christian Szegedy
@ 2005-02-10 23:58           ` Frédéric Gava
  2005-02-11  9:22           ` Frédéric Gava
  1 sibling, 0 replies; 31+ messages in thread
From: Frédéric Gava @ 2005-02-10 23:58 UTC (permalink / raw)
  To: caml-list

Hi,

There is a function in the stdlid for have a beter code:

let test tablesize =
  let table = Array.create tablesize [| [| |] |] in
  for i=0 to tablesize - 1 do
     table.(i) <-   Array.make_matrix tablesize tablesize 0
  done;
...

smaller code no ? ;-)

> The results (on an AMD 2000XP):
>
> chris@gentoo:~/test$ ocamlopt -unsafe -inline 40 memnew.ml
> chris@gentoo:~/test$ a.out 300
> chris@gentoo:~/test$ time a.out 300
> real    0m1.560s
> user    0m1.391s
> sys     0m0.105s
>
> chris@gentoo:~/test$ ocamlopt -unsafe -inline 40 mem.ml
> chris@gentoo:~/test$ time a.out 300
> real    0m2.106s
> user    0m1.904s
> sys     0m0.106s
>
> chris@gentoo:~/test$ gcc -O3 -fomit-frame-pointer mem.c
> chris@gentoo:~/test$ time a.out 300
> real    0m0.380s
> user    0m0.283s
> sys     0m0.072s
>
> So, yours is slightly faster, but both are in the same ballpark
> and the C version is at least 4X faster.

:-(

Peraps due to the fact that "int" are boxed. Try to give "options you give
to gcc" to ocamlopt
to link optimized code. Else I do not know how to optimize this code "in a
pure ocaml way" (i.e without trick with C code). Is caml guru, wizards or
ocaml's creators have an idea ?

Regards,
Frédéric Gava



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 16:36     ` Frédéric Gava
  2005-02-10 17:56       ` Frédéric Gava
@ 2005-02-11  0:55       ` skaller
  1 sibling, 0 replies; 31+ messages in thread
From: skaller @ 2005-02-11  0:55 UTC (permalink / raw)
  To: Frédéric Gava; +Cc: caml-list

On Fri, 2005-02-11 at 03:36, Frédéric Gava wrote:
> Hi,
> 
> > let test tablesize =
> >   let table =
> >     Array.init tablesize (fun i ->
> >     Array.init tablesize (fun j ->
> >     Array.create tablesize 0))
> >   in
> >     for i = 0 to tablesize - 1 do
> >     for j = 0 to tablesize - 1 do
> >     for k = 0 to tablesize - 1 do
> >     table.(i).(j).(k) <- (i+1)*(j+1)*(k+1)
> >     done done done
> Here, you have tablesize^2 applications and time to create the closure(s),
> so it is not a good example.

Why? o(2) is lots smaller than o(3). In my test case,
a factor of 250 should show up in the timings.

I also assumed, naively, Ocaml would use invariant
code motion to optimise the triple indexing.


> > [skaller@pelican] ~>time ./xmem 250
> > real    0m3.327s
> > user    0m2.760s
> > sys     0m0.300s
> I do not understand what is "xmem". A super hero ;-) ?

LOL! It's just the name of the program. The original one
was 'mem' the modified one I called 'xmem'.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 16:50   ` Marwan Burelle
  2005-02-10 19:20     ` Christian Szegedy
@ 2005-02-11  1:04     ` skaller
  2005-02-11 11:28       ` Oliver Bandel
  1 sibling, 1 reply; 31+ messages in thread
From: skaller @ 2005-02-11  1:04 UTC (permalink / raw)
  To: Marwan Burelle; +Cc: John Prevost, caml-list

On Fri, 2005-02-11 at 03:50, Marwan Burelle wrote:
> On Thu, 10 Feb 2005 09:59:53 -0500, John Prevost <j.prevost@gmail.com> wrote:
> > In a program that allocates one very large chunk of memory, 

> It also depends on malloc, on Linux it sometimes works
> "optimisticaly", that is, it won't realy allocate memory unless you
> use it

This behaviour is not allowed by the ISO C Standard.
This case was actually discussed by WG14.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 19:56         ` Christian Szegedy
  2005-02-10 23:58           ` Frédéric Gava
@ 2005-02-11  9:22           ` Frédéric Gava
  2005-02-11 13:04             ` skaller
  1 sibling, 1 reply; 31+ messages in thread
From: Frédéric Gava @ 2005-02-11  9:22 UTC (permalink / raw)
  To: Christian Szegedy, caml-list

Hi,

> So, yours is slightly faster, but both are in the same ballpark
> and the C version is at least 4X faster.

The FAQ of OCaml says (http://caml.inria.fr/ocaml/numerical.html)

|       for j = 0 to N-1 do
|        for i = 0 to N-1 do
|            c.(i).(j) <- a.(i).(j) + b.(i).(j)
|          done
|        done
|but write:
|        for i = 0 to N-1 do
|          let row_a = a.(i) and row_b = b.(i) and row_c = c.(i) in
|          for j = 0 to N-1 do
|            row_c.(j) <- row_a.(j) + row_b.(j)
|          done
|       done


So for your code
>  for i = 0 to tablesize - 1 do
>    for j = 0 to tablesize - 1 do
>      for k = 0 to tablesize - 1 do
>        table.(i).(j).(k) <- (i+1)*(j+1)*(k+1)
>      done done done

You could write;
for i=0 to tablesize -1 do
 let row1 = table.(i) in
 for j=0 to tablesize -1 do
  let row2 = row1.(j) do
  for k=0 to tablesize -1 do
   row2.(k) <- (i+1)*(j+1)*(k+1)
 done done done

and peraps you will have a faster code. Try with "make_matrice" and without
and tell me please, I am interested by this subject.

Regards,
Frédéric Gava



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-10 19:20     ` Christian Szegedy
  2005-02-10 19:40       ` Jon Harrop
@ 2005-02-11 11:26       ` Oliver Bandel
  2005-02-12 13:42         ` Christian Szegedy
  1 sibling, 1 reply; 31+ messages in thread
From: Oliver Bandel @ 2005-02-11 11:26 UTC (permalink / raw)
  To: caml-list

On Thu, Feb 10, 2005 at 08:20:03PM +0100, Christian Szegedy wrote:
[...]
> Actually, I have *filled* the arrays, as it may be clear from
> the code. This example was extracted from a program which
> massively shuffles around the content of this 3-dimensional grid.
> (Both work fine and yield identical output.)
> 
> To my astonishment, the OCaml was a bit faster than C when
> working on the grid, but the speed of allocation was nowhere
> near to that of the C version.

What does this mean?
Is only the work on the grid faster, or is the program
all in all (mem allocation AND working on the grid)
faster in OCaml or in C?

What does count more?

Being faster in subatsks, or getting the whole computation
done in a shorter time?


Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-11  1:04     ` skaller
@ 2005-02-11 11:28       ` Oliver Bandel
  2005-02-12  0:01         ` Guillaume
  2005-02-12  0:36         ` skaller
  0 siblings, 2 replies; 31+ messages in thread
From: Oliver Bandel @ 2005-02-11 11:28 UTC (permalink / raw)
  To: caml-list

On Fri, Feb 11, 2005 at 12:04:20PM +1100, skaller wrote:
> On Fri, 2005-02-11 at 03:50, Marwan Burelle wrote:
> > On Thu, 10 Feb 2005 09:59:53 -0500, John Prevost <j.prevost@gmail.com> wrote:
> > > In a program that allocates one very large chunk of memory, 
> 
> > It also depends on malloc, on Linux it sometimes works
> > "optimisticaly", that is, it won't realy allocate memory unless you
> > use it

You can call calloc() instead of malloc, so you automatically
use the memory.

Would be interesting to have a bench on this...



> 
> This behaviour is not allowed by the ISO C Standard.

Yes, that's true.


> This case was actually discussed by WG14.

What is the WG14?

Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-11  9:22           ` Frédéric Gava
@ 2005-02-11 13:04             ` skaller
  2005-02-11 13:33               ` skaller
  2005-02-11 21:07               ` Oliver Bandel
  0 siblings, 2 replies; 31+ messages in thread
From: skaller @ 2005-02-11 13:04 UTC (permalink / raw)
  To: Frédéric Gava; +Cc: Christian Szegedy, caml-list

On Fri, 2005-02-11 at 20:22, Frédéric Gava wrote:

> 
> You could write;
> for i=0 to tablesize -1 do
>  let row1 = table.(i) in
>  for j=0 to tablesize -1 do
>   let row2 = row1.(j) do
>   for k=0 to tablesize -1 do
>    row2.(k) <- (i+1)*(j+1)*(k+1)
>  done done done
> 
> and peraps you will have a faster code.

I found no difference, here are two runs:

[skaller@pelican] ~>time ./zmem 250
 
real    0m3.110s
user    0m2.820s
sys     0m0.240s

[skaller@pelican] ~>time ./zmem 250
 
real    0m27.732s
user    0m2.750s
sys     0m0.340s

The huge 'real' time there is VM paging.

Perhaps Xavier will bless us with a comment as to
whether invariant code motion optimisation is actually
done in 

        table.(i).(j).(k) <- (i+1)*(j+1)*(k+1)

Using bigarray (c_layout):
 
real    0m27.948s
user    0m0.770s
sys     0m0.500s

.. 4 times faster.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-11 13:04             ` skaller
@ 2005-02-11 13:33               ` skaller
  2005-02-11 21:07               ` Oliver Bandel
  1 sibling, 0 replies; 31+ messages in thread
From: skaller @ 2005-02-11 13:33 UTC (permalink / raw)
  To: Frédéric Gava; +Cc: Christian Szegedy, caml-list

On Sat, 2005-02-12 at 00:04, skaller wrote:

> 
> Using bigarray (c_layout):
>  
> real    0m27.948s
> user    0m0.770s
> sys     0m0.500s
> 
> .. 4 times faster.

I should add I checked the allocation time at around 0.07
for the bigarray.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-11 13:04             ` skaller
  2005-02-11 13:33               ` skaller
@ 2005-02-11 21:07               ` Oliver Bandel
  2005-02-12  0:44                 ` skaller
  1 sibling, 1 reply; 31+ messages in thread
From: Oliver Bandel @ 2005-02-11 21:07 UTC (permalink / raw)
  To: caml-list

On Sat, Feb 12, 2005 at 12:04:32AM +1100, skaller wrote:
> On Fri, 2005-02-11 at 20:22, Frédéric Gava wrote:
> 
> > 
> > You could write;
> > for i=0 to tablesize -1 do
> >  let row1 = table.(i) in
> >  for j=0 to tablesize -1 do
> >   let row2 = row1.(j) do
> >   for k=0 to tablesize -1 do
> >    row2.(k) <- (i+1)*(j+1)*(k+1)
> >  done done done
> > 
> > and peraps you will have a faster code.
> 
> I found no difference, here are two runs:
> 
> [skaller@pelican] ~>time ./zmem 250
>  
> real    0m3.110s
> user    0m2.820s
> sys     0m0.240s
> 
> [skaller@pelican] ~>time ./zmem 250
>  
> real    0m27.732s
> user    0m2.750s
> sys     0m0.340s


Two runs, no difference?

Are you sure zmem and zmem are the same?


IMHO 

> real    0m3.110s
> user    0m2.820s
> sys     0m0.240s

and

> real    0m27.732s
> user    0m2.750s
> sys     0m0.340s

differ...




> 
> The huge 'real' time there is VM paging.
> 
> Perhaps Xavier will bless us with a comment as to
> whether invariant code motion optimisation is actually
> done in 
> 
>         table.(i).(j).(k) <- (i+1)*(j+1)*(k+1)
> 
> Using bigarray (c_layout):
>  
> real    0m27.948s
> user    0m0.770s
> sys     0m0.500s
> 
> .. 4 times faster.


?

What is faster than what?!

Is zmem zmem?
What is zmem?

Ciao,
   Oliver


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-11 11:28       ` Oliver Bandel
@ 2005-02-12  0:01         ` Guillaume
  2005-02-12  0:36         ` skaller
  1 sibling, 0 replies; 31+ messages in thread
From: Guillaume @ 2005-02-12  0:01 UTC (permalink / raw)
  To: caml-list

On Fri, Feb 11, 2005 at 12:28:57PM +0100, Oliver Bandel wrote :
> On Fri, Feb 11, 2005 at 12:04:20PM +1100, skaller wrote:
> > On Fri, 2005-02-11 at 03:50, Marwan Burelle wrote:
> > > On Thu, 10 Feb 2005 09:59:53 -0500, John Prevost <j.prevost@gmail.com> wrote:
> > > > In a program that allocates one very large chunk of memory, 
> > 
> > > It also depends on malloc, on Linux it sometimes works
> > > "optimisticaly", that is, it won't realy allocate memory unless you
> > > use it
> 
> You can call calloc() instead of malloc, so you automatically
> use the memory.

IIRC, calloc() does not always write the memory it allocates. If it
allocates one little chunk of memory, it uses sbrk() + memset().
But on large ones, it uses mmap(), which do not write memory until a
pagefault.

> 
> Ciao,
>    Oliver

Guillaume Leconte

-- 
You're not a beautiful and unique snowflake.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-11 11:28       ` Oliver Bandel
  2005-02-12  0:01         ` Guillaume
@ 2005-02-12  0:36         ` skaller
  1 sibling, 0 replies; 31+ messages in thread
From: skaller @ 2005-02-12  0:36 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Fri, 2005-02-11 at 22:28, Oliver Bandel wrote:

> > This case was actually discussed by WG14.
> 
> What is the WG14?

ISO/IEC SC22 WG14 -- group responsible for drafting
ISO C Standard. WG=Working Group. SC=Subcommittee.
SC22=programming languages group.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-11 21:07               ` Oliver Bandel
@ 2005-02-12  0:44                 ` skaller
  2005-02-15 14:17                   ` Frédéric Gava
  0 siblings, 1 reply; 31+ messages in thread
From: skaller @ 2005-02-12  0:44 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Sat, 2005-02-12 at 08:07, Oliver Bandel wrote:

> > [skaller@pelican] ~>time ./zmem 250
> >  
> > real    0m3.110s
> > user    0m2.820s
> > sys     0m0.240s
> > 
> > [skaller@pelican] ~>time ./zmem 250
> >  
> > real    0m27.732s
> > user    0m2.750s
> > sys     0m0.340s
> 
> 
> Two runs, no difference?

> Are you sure zmem and zmem are the same?

Yes, its the same binary.

> 
> IMHO 
> 
> > real    0m3.110s
> > user    0m2.820s
> > sys     0m0.240s
> 
> and
> 
> > real    0m27.732s
> > user    0m2.750s
> > sys     0m0.340s
> 
> differ...

Not significantly in user time. Remember this
is not a serious measurement.

> >         table.(i).(j).(k) <- (i+1)*(j+1)*(k+1)
> > 
> > Using bigarray (c_layout):
> >  
> > real    0m27.948s
> > user    0m0.770s
> > sys     0m0.500s
> > 
> > .. 4 times faster.
> 
> ?
> 
> What is faster than what?!

The code using ordinary arrays runs in 2.8 seconds,
using bigarray 0.7 seconds. 4 x 0.7 = 2.8.

bigarray is 4 times faster to write than three level
ordinary array.


-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-11 11:26       ` Oliver Bandel
@ 2005-02-12 13:42         ` Christian Szegedy
  0 siblings, 0 replies; 31+ messages in thread
From: Christian Szegedy @ 2005-02-12 13:42 UTC (permalink / raw)
  To: caml-list

Oliver Bandel wrote:

>What does this mean?
>Is only the work on the grid faster, or is the program
>all in all (mem allocation AND working on the grid)
>faster in OCaml or in C?
>  
>
Only the work on the grid is about as fast as in C.
The differences were negligible.

The overall difference in the runtime was due to the
allocation overhead in the OCaml code.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-12  0:44                 ` skaller
@ 2005-02-15 14:17                   ` Frédéric Gava
  2005-02-15 19:19                     ` Christian Szegedy
  2005-02-15 20:51                     ` Jon Harrop
  0 siblings, 2 replies; 31+ messages in thread
From: Frédéric Gava @ 2005-02-15 14:17 UTC (permalink / raw)
  To: Christian Szegedy; +Cc: caml-list

Hi,

stupid question, do you use the "-unsafe" option of ocamlopt. It is better
for arrays...

Regards,
Frédéric Gava
> The code using ordinary arrays runs in 2.8 seconds,
> using bigarray 0.7 seconds. 4 x 0.7 = 2.8.
>
> bigarray is 4 times faster to write than three level
> ordinary array.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-15 14:17                   ` Frédéric Gava
@ 2005-02-15 19:19                     ` Christian Szegedy
  2005-02-15 20:51                     ` Jon Harrop
  1 sibling, 0 replies; 31+ messages in thread
From: Christian Szegedy @ 2005-02-15 19:19 UTC (permalink / raw)
  To: caml-list

Frédéric Gava wrote:

>Hi,
>
>stupid question, do you use the "-unsafe" option of ocamlopt. It is better
>for arrays...
>
I used -unsafe. (You could have seen it in my previous postings:
the compilation line was there.)


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-15 14:17                   ` Frédéric Gava
  2005-02-15 19:19                     ` Christian Szegedy
@ 2005-02-15 20:51                     ` Jon Harrop
  2005-02-16  8:19                       ` Ville-Pertti Keinonen
  1 sibling, 1 reply; 31+ messages in thread
From: Jon Harrop @ 2005-02-15 20:51 UTC (permalink / raw)
  To: caml-list

On Tuesday 15 February 2005 14:17, Frédéric Gava wrote:
> stupid question, do you use the "-unsafe" option of ocamlopt. It is better
> for arrays...

Having seen a physics student write a C++ program which invalidly read from 
a[n], resulting in him drawing scientific conclusions from physically- 
realistic but non-deterministically- and unquantifiably-erroneous results, I 
strongly advise people to take the extra ~10% performance hit and use array 
bounds checking all the time.

Indeed, I'm in the "remove -unsafe" camp. Even if OCaml only hoisted bounds 
checks in the simplest of cases, I think there would be a strong case for 
removing this option.

> > The code using ordinary arrays runs in 2.8 seconds,
> > using bigarray 0.7 seconds. 4 x 0.7 = 2.8.
> >
> > bigarray is 4 times faster to write than three level
> > ordinary array.

The current array implementation is not great for large array-based 
computations but I'm not sure that this is such a failing because I don't 
think people should be coding large array-based computations in OCaml. Either 
use a dedicated numerical library for handling large matrices/tensors or use 
a more appropriate data structure (typically a tree) to solve "the bigger" 
problem. The asymptotic complexity of such array based computations is rarely 
good and, consequently, real-time performance is often unnecessarily poor.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-15 20:51                     ` Jon Harrop
@ 2005-02-16  8:19                       ` Ville-Pertti Keinonen
  2005-02-16  9:54                         ` Jon Harrop
  0 siblings, 1 reply; 31+ messages in thread
From: Ville-Pertti Keinonen @ 2005-02-16  8:19 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Tue, 2005-02-15 at 20:51 +0000, Jon Harrop wrote:

> Indeed, I'm in the "remove -unsafe" camp. Even if OCaml only hoisted bounds 
> checks in the simplest of cases, I think there would be a strong case for 
> removing this option.

As far as I can tell OCaml *never* eliminates or hoists bounds checks
(or any other repetitive operation), even in the simplest of cases.  It
does explicitly use unsafe operations in the standard library, though.

ocamlopt doesn't really perform a lot of optimizations.  The most
significant ones (inlining, constant folding/value propagation, direct
calls) appear to be done in a single pass (asmcomp/closure.ml).



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-16  8:19                       ` Ville-Pertti Keinonen
@ 2005-02-16  9:54                         ` Jon Harrop
  2005-02-16 10:56                           ` Ville-Pertti Keinonen
  0 siblings, 1 reply; 31+ messages in thread
From: Jon Harrop @ 2005-02-16  9:54 UTC (permalink / raw)
  To: caml-list

On Wednesday 16 February 2005 08:19, Ville-Pertti Keinonen wrote:
> On Tue, 2005-02-15 at 20:51 +0000, Jon Harrop wrote:
> > Indeed, I'm in the "remove -unsafe" camp. Even if OCaml only hoisted
> > bounds checks in the simplest of cases, I think there would be a strong
> > case for removing this option.
>
> As far as I can tell OCaml *never* eliminates or hoists bounds checks
> (or any other repetitive operation), even in the simplest of cases.

In most cases, I'm happy with that as the programmer can hoist invariants 
manually. As Xavier has said before, ocamlopt is designed to compile well- 
written code. :-)

In the case of bounds checking though, only the compiler can do the hoisting.

Still, the last time I wrote array-based code it was to demonstrate how much 
faster tree-based code is. ;-)

> It does explicitly use unsafe operations in the standard library, though.

I was going to say that this is all the more reason to have a full complement 
of map/iter/folds in Array and String. Actually, that's not really true as 
the cost of polymorphism in most of these functions (except String.iter) will 
dwarf the advantage of removing bounds checking.

> ocamlopt doesn't really perform a lot of optimizations.  The most
> significant ones (inlining, constant folding/value propagation, direct
> calls) appear to be done in a single pass (asmcomp/closure.ml).

Yes, the performance is astonishingly good considering how elegant the 
compiler is. I think it relies entirely on the CPU to do dynamic instruction 
rescheduling for some CPUs (not ARM, of course ;-).

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Memory allocation nano-benchmark.
  2005-02-16  9:54                         ` Jon Harrop
@ 2005-02-16 10:56                           ` Ville-Pertti Keinonen
  0 siblings, 0 replies; 31+ messages in thread
From: Ville-Pertti Keinonen @ 2005-02-16 10:56 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Wed, 2005-02-16 at 09:54 +0000, Jon Harrop wrote:

> Yes, the performance is astonishingly good considering how elegant the 
> compiler is. I think it relies entirely on the CPU to do dynamic instruction 

Elegant?  My idea of elegant would be something that's nicely abstracted
into intermediate representations and passes, introducing runtime
dependencies as late as possible, making things straightforward to
change etc.  My impression of OCaml is that it's an interesting
combination of "simplistic" and "hairy".  To be fair, the actual machine
code generation parts seem fairly nice.



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2005-02-16 10:56 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-02-10 15:15 Memory allocation nano-benchmark Christian Szegedy
2005-02-10 14:47 ` [Caml-list] " Frédéric Gava
2005-02-10 15:19   ` skaller
2005-02-10 16:36     ` Frédéric Gava
2005-02-10 17:56       ` Frédéric Gava
2005-02-10 19:56         ` Christian Szegedy
2005-02-10 23:58           ` Frédéric Gava
2005-02-11  9:22           ` Frédéric Gava
2005-02-11 13:04             ` skaller
2005-02-11 13:33               ` skaller
2005-02-11 21:07               ` Oliver Bandel
2005-02-12  0:44                 ` skaller
2005-02-15 14:17                   ` Frédéric Gava
2005-02-15 19:19                     ` Christian Szegedy
2005-02-15 20:51                     ` Jon Harrop
2005-02-16  8:19                       ` Ville-Pertti Keinonen
2005-02-16  9:54                         ` Jon Harrop
2005-02-16 10:56                           ` Ville-Pertti Keinonen
2005-02-11  0:55       ` skaller
2005-02-10 14:56 ` Jon Harrop
2005-02-10 15:32   ` Ville-Pertti Keinonen
2005-02-10 14:59 ` John Prevost
2005-02-10 16:50   ` Marwan Burelle
2005-02-10 19:20     ` Christian Szegedy
2005-02-10 19:40       ` Jon Harrop
2005-02-11 11:26       ` Oliver Bandel
2005-02-12 13:42         ` Christian Szegedy
2005-02-11  1:04     ` skaller
2005-02-11 11:28       ` Oliver Bandel
2005-02-12  0:01         ` Guillaume
2005-02-12  0:36         ` skaller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).