Optimizing Array.blit

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* Optimizing Array.blit
@ 2007-04-04 22:19 David Baelde
  2007-04-05 17:56 ` [Caml-list] " Tom
  2007-04-10  0:13 ` David Baelde
  0 siblings, 2 replies; 7+ messages in thread
From: David Baelde @ 2007-04-04 22:19 UTC (permalink / raw)
  To: OCaml Mailing List

Hi list,

I'm developing an app (compiled with ocamlopt) which does intensive
Array.blits. I quickly realized that these blits took an absurd amount
of time compared to other computations. Then I figured out that the
code in array.ml couldn't be as efficient as a memcpy (and saw
http://caml.inria.fr/mantis/view.php?id=2787).

I tried to write a blit function in C, specialized for float arrays.
I'm not very much into the internals of OCaml, but I came up with the
following working code:

CAMLprim value caml_float_array_blit(value _src, value _src_off,
                                     value _dst, value _dst_off, value _len) {
  int src_off = Int_val(_src_off) ;
  int dst_off = Int_val(_dst_off) ;
  int len = Int_val(_len) ;
  memcpy(&Double_field(_dst,dst_off),&Double_field(_src,src_off),sizeof(double)*len)
;
  return Val_unit ;
}

I have no idea whether or not this is valid in general, portable, etc.

Also, should I try to get a good C function or are BigArrays the
ultimate efficient solution ? I'm reluctant to change all my code just
for blits..

Thanks for any advice on that problem.
-- 
David

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Optimizing Array.blit
  2007-04-04 22:19 Optimizing Array.blit David Baelde
@ 2007-04-05 17:56 ` Tom
  2007-04-05 21:54   ` Florian Weimer
  2007-04-10  0:13 ` David Baelde
  1 sibling, 1 reply; 7+ messages in thread
From: Tom @ 2007-04-05 17:56 UTC (permalink / raw)
  To: david.baelde; +Cc: OCaml Mailing List

[-- Attachment #1: Type: text/plain, Size: 426 bytes --]

>From http://www.cplusplus.com/memcpy:
      the arrays pointed by both the *destination* and *source* parameters
should not overlap (for overlapping memory blocks,
memmove<http://www.cplusplus.com/memmove>is a safer approach).

However, Array.blit allows overlapping of the chunks that are to be copied.
So, probably it is slower because it is checking for the overlapping regions
and dealing with them appropriately.

- Tom

[-- Attachment #2: Type: text/html, Size: 543 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Optimizing Array.blit
  2007-04-05 17:56 ` [Caml-list] " Tom
@ 2007-04-05 21:54   ` Florian Weimer
  2007-04-06  7:00     ` David Baelde
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2007-04-05 21:54 UTC (permalink / raw)
  To: Tom; +Cc: david.baelde, OCaml Mailing List

* Tom:

> However, Array.blit allows overlapping of the chunks that are to be copied.
> So, probably it is slower because it is checking for the overlapping regions
> and dealing with them appropriately.

I guess it's because it has to update the GC flags.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Optimizing Array.blit
  2007-04-05 21:54   ` Florian Weimer
@ 2007-04-06  7:00     ` David Baelde
  0 siblings, 0 replies; 7+ messages in thread
From: David Baelde @ 2007-04-06  7:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Tom, OCaml Mailing List

Hi,

According to Alain (Frisch I guess), in the bug report that I pointed,
Array.blit causes checks for unboxed floats for every check, and also
has to update GC flags. I don't master the second point but he seems
to suggest that it can be optimized.

In the case of non-overlapping unboxed float arrays, I believe a
simple function similar to mine should work. But I'm not very
confident in mine yet.

Cheers.
-- 
David

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Optimizing Array.blit
  2007-04-04 22:19 Optimizing Array.blit David Baelde
  2007-04-05 17:56 ` [Caml-list] " Tom
@ 2007-04-10  0:13 ` David Baelde
  2007-04-10  1:26   ` [Caml-list] " Jacques Garrigue
  1 sibling, 1 reply; 7+ messages in thread
From: David Baelde @ 2007-04-10  0:13 UTC (permalink / raw)
  To: OCaml Mailing List

Hi again,

To answer the question by myself, I have ran a few tests. It turned
out that specializing Array.blit provides a significant boost, but
doing such an ugly hack was dangerous and unnecessary. I now use the
following standard code:

CAMLprim value caml_float_array_blit(value _src, value _src_off,
                                     value _dst, value _dst_off, value _len) {
  int src_off = Int_val(_src_off) ;
  int dst_off = Int_val(_dst_off) ;
  int len = Int_val(_len) ;
  int i ;
  for (i=0 ; i<len ; i++)
    Store_double_field(_dst,dst_off+i,Double_field(_src,src_off+i)) ;
  return Val_unit ;
}

I got the following timings in seconds for a program doing intensive
blits on float arrays: 0.17 for the ugly hack, 0.19 for the clean C
function, 1.04 for the standard Array.blit.
-- 
David


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Re: Optimizing Array.blit
  2007-04-10  0:13 ` David Baelde
@ 2007-04-10  1:26   ` Jacques Garrigue
  2007-04-10 10:08     ` David Baelde
  0 siblings, 1 reply; 7+ messages in thread
From: Jacques Garrigue @ 2007-04-10  1:26 UTC (permalink / raw)
  To: david.baelde, david.baelde; +Cc: caml-list

From: "David Baelde" <david.baelde@gmail.com>
> To answer the question by myself, I have ran a few tests. It turned
> out that specializing Array.blit provides a significant boost, but
> doing such an ugly hack was dangerous and unnecessary. I now use the
> following standard code:
> 
> CAMLprim value caml_float_array_blit(value _src, value _src_off,
>                                      value _dst, value _dst_off, value _len) {
>   int src_off = Int_val(_src_off) ;
>   int dst_off = Int_val(_dst_off) ;
>   int len = Int_val(_len) ;
>   int i ;
>   for (i=0 ; i<len ; i++)
>     Store_double_field(_dst,dst_off+i,Double_field(_src,src_off+i)) ;
>   return Val_unit ;
> }
> 
> I got the following timings in seconds for a program doing intensive
> blits on float arrays: 0.17 for the ugly hack, 0.19 for the clean C
> function, 1.04 for the standard Array.blit.

Did you try taking the version from the standard library, and
annotating it with (a1 : float array)?
No ugly hack here: everything is 100% safe.
Personally, I get a 4 fold improvement just doing that.
Sometimes polymorphism is costly (particularly with arrays.)

Jacques Garrigue


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Re: Optimizing Array.blit
  2007-04-10  1:26   ` [Caml-list] " Jacques Garrigue
@ 2007-04-10 10:08     ` David Baelde
  0 siblings, 0 replies; 7+ messages in thread
From: David Baelde @ 2007-04-10 10:08 UTC (permalink / raw)
  To: Jacques Garrigue; +Cc: caml-list

Hi,

Thanks a lot for that advice, I was stupid not to try that before. I
got a 4 fold improvement too on my PIV desktop machine, which brings
the specialized ML code about as fast as the C code. This does not
change when I pass -O3 to gcc.

However, I ran the test on a dedibox, which has a VIA processor. The
machine is loaded so it's difficult to conclude anything, but I get
quite always the same surprising figures: without -O3 I get 0.57 for
the C code, 0.91 for the generic ML and 0.25 for the specialized ML;
with -O3 I get 0.12 for the C code, same for the others.

Optimizing is boring...
-- 
David

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-04-10 10:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-04-04 22:19 Optimizing Array.blit David Baelde
2007-04-05 17:56 ` [Caml-list] " Tom
2007-04-05 21:54   ` Florian Weimer
2007-04-06  7:00     ` David Baelde
2007-04-10  0:13 ` David Baelde
2007-04-10  1:26   ` [Caml-list] " Jacques Garrigue
2007-04-10 10:08     ` David Baelde

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).