[Caml-list] OCaml Speed for Block Convolutions

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* [Caml-list] OCaml Speed for Block Convolutions
@ 2001-06-01 18:38 David McClain
  2001-06-01 22:51 ` Tom _
  0 siblings, 1 reply; 26+ messages in thread
From: David McClain @ 2001-06-01 18:38 UTC (permalink / raw)
  To: caml-list

Hi,

I just wanted to let everyone know that I have been doing some OCaml
modeling of signal processing chains. One of these is block convolution
processing using overlap-add of huge signal streams (GByte in length).

While this was only a modeling attempt I was amazed to find that the code is
able to keep up in realtime with stereo processing at 16 KHz sample rates on
a cheapo 500 MHz Pentium III running Win/98SE 256 MB RAM, with lots of disk
activity.

The speed is so good here that I am reluctant to recode this whole thing
into C for what I believe will be only a 20-30% speedup. The OCaml array
handling mechanism is very fast indeed.

I have run into problems with address space limitations given a 32-bit
architecture and a roughly 21 bit limit on maximum index of normal
(double-precision) arrays. When are we going to have readily available
64-bit architectures!?

- D.McClain, Sr. Scientist, Raytheon Systems Co., Tucson, AZ

-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-01 18:38 [Caml-list] OCaml Speed for Block Convolutions David McClain
@ 2001-06-01 22:51 ` Tom _
  2001-06-02  0:10   ` Stefan Monnier
  0 siblings, 1 reply; 26+ messages in thread
From: Tom _ @ 2001-06-01 22:51 UTC (permalink / raw)
  To: David McClain, caml-list

> When are we going to have
> readily available 64-bit architectures!?

We already do: Alphas are pretty affordable, and
you can get a 64bit SPARC for about $1000, even
running Linux if you like.

I think widespread adoption of 64bit machines will
make a huge difference for polymorphic and dynamic
languages, however.  32 bits is kind of tight for
numbers+type bits or pointers+type bits, but
in 64 bits, you can put both numbers and type bits
without feeling squeezed.  That should eliminate
the need for a lot of static analysis and runtime
tricks.

Tom.

__________________________________________________
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail - only $35 
a year!  http://personal.mail.yahoo.com/
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-01 22:51 ` Tom _
@ 2001-06-02  0:10   ` Stefan Monnier
  2001-06-04 10:12     ` Jacques Garrigue
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Monnier @ 2001-06-02  0:10 UTC (permalink / raw)
  To: caml-list

>>>>> "Tom" == Tom  <tom7ca@yahoo.com> writes:
> I think widespread adoption of 64bit machines will
> make a huge difference for polymorphic and dynamic
> languages, however.  32 bits is kind of tight for

But boxing will force everything to 64bit, thus the "double memory use"
will be slightly more noticeable with those languages than with C.


	Stefan
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-02  0:10   ` Stefan Monnier
@ 2001-06-04 10:12     ` Jacques Garrigue
  0 siblings, 0 replies; 26+ messages in thread
From: Jacques Garrigue @ 2001-06-04 10:12 UTC (permalink / raw)
  To: monnier+lists.caml/news/; +Cc: caml-list

From: "Stefan Monnier" <monnier+lists.caml/news/@rum.cs.yale.edu>
> >>>>> "Tom" == Tom  <tom7ca@yahoo.com> writes:
> > I think widespread adoption of 64bit machines will
> > make a huge difference for polymorphic and dynamic
> > languages, however.  32 bits is kind of tight for
> 
> But boxing will force everything to 64bit, thus the "double memory use"
> will be slightly more noticeable with those languages than with C.

I'm not sure this would matter that much.
Even in C, people are going to play safe, and oversize everything...
Just considering code size, alpha was already about twice as big as
other architectures. But it might be better to compare Sparc/32 vs
Sparc/64.

Anyway, if the real problem is about data size (and not code size),
you can still use Bigarray for your raw data, and have the same sizes
as in C. With the extra advantage that your 32-bit integers have all
their bits when converted to 63, and for 64-bit integers you generally
do not care about the topmost one. So I would expect Tom's statement
to be right: 64-bit is really a plus for functional programming in
general, and ocaml in particular.

Cheers,

Jacques Garrigue
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-06 18:35       ` William Chesters
  2001-06-06 18:40         ` Patrick M Doane
  2001-06-07  1:50         ` Hugo Herbelin
@ 2001-06-07 18:20         ` Tom _
  2 siblings, 0 replies; 26+ messages in thread
From: Tom _ @ 2001-06-07 18:20 UTC (permalink / raw)
  To: William Chesters, caml-list

--- William Chesters <williamc@paneris.org> wrote:
> Hugo Herbelin writes:
>  > Assume more generally that you can modify any
> local variable as in the
>  > (standard) following example:
>  > 
>  > let fact (mutable n) =
>  >   let mutable r = 1 in
>  >   while n > 0 do
>  >      r <- r * n;
>  >      n <- n - 1
>  >   done;
>  >   r
> 
> This doesn't actually make life much easier for the
> compiler.  On 32-bit machines [see other thread!], 
> `r' must be a reference (in the 
> C++ sense) to a heap object---64-bit float, plus
> header.

I don't see why. The compiler has full type 
information.  All it may have to do is box up
a return value.

In any case, whether or not the compiler in the
current implementation can or cannot do good type
inference and may or may not be forced to box
a floating point number, the two constructs mean
something different to the programmer, and they
behave differently.  In particular "mutable x"
can never be passed around as a reference, while
"x = ref ..." can.  If not anything else, that
prevents the programmer from inadvertently inhibiting
a compiler optimization by passing around a reference.

Besides, the syntax is probably quite a bit more 
natural to imperative programmers anyway.

Tom.

__________________________________________________
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail - only $35 
a year!  http://personal.mail.yahoo.com/
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-06 22:29         ` Chris Hecker
@ 2001-06-07  7:42           ` William Chesters
  0 siblings, 0 replies; 26+ messages in thread
From: William Chesters @ 2001-06-07  7:42 UTC (permalink / raw)
  To: caml-list

Chris Hecker writes:
 > 
 > >My feeling has always been that the overhead of initialising a float
 > >array, per element, is considerably less than the cost of one floating
 > >point operation; so if you ever intend to actually use the array for
 > >anything, the inefficiency is negligible.
 > 
 > This is not true on modern processors, if I understand what you're saying.  Touching memory that's not in the cache is orders of magnitude slower than flops.  Even the best case, with the memory in the cache, touching memory is the same speed or slower than doing a flop on most CPUs.

Er, yes, you are right of course and "I didn't mean to say that" ;)
but rather: whatever you subsequently do to the float array, be it
arithmetic or churning it through the cache, is likely to be
more expensive per element than initialisation.

Sorry,
William

[head befuddled by day spent trying to trick C++ compiler into coding up
float array ops right, or that's my excuse!]
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-06 18:35       ` William Chesters
  2001-06-06 18:40         ` Patrick M Doane
@ 2001-06-07  1:50         ` Hugo Herbelin
  2001-06-07 18:20         ` Tom _
  2 siblings, 0 replies; 26+ messages in thread
From: Hugo Herbelin @ 2001-06-07  1:50 UTC (permalink / raw)
  To: William Chesters; +Cc: caml-list


William Chesters wrote:
> Hugo Herbelin writes:
>  > Assume more generally that you can modify any local variable as in the
>  > (standard) following example:
>  > 
>  > let fact (mutable n) =
>  >   let mutable r = 1 in
>  >   while n > 0 do
>  >      r <- r * n;
>  >      n <- n - 1
>  >   done;
>  >   r
> 
> This doesn't actually make life much easier for the compiler.  On
> 32-bit machines [see other thread!], `r' must be a reference (in the
> C++ sense) to a heap object---64-bit float, plus header.  In general
> it is not safe to overwrite the float value in the heap, if it's
> possible that other variables have been assigned to it. (unless floats
> are assigned by value not by reference, but in that case you get heap
> allocation at the time of assignment ...).  

  To be boxed or not is not a property of the language but of the
implementation. This means that the semantics for such a "x <- e"
for boxed values (may be floats) can only be the same as for unboxed
values, that is "assignment by value".

  This means that "x <- e", compared to "x.contents <- e", wouldn't
avoid the heap allocation inherent to the boxing of floats, but would
just avoid the initial allocation of an extra ref block and the
subsequent indirections to set/get the value (assuming the cell is not
already cached in a register what perhaps the compiler is able to do).

                                                  Hugo


-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-06 20:13       ` William Chesters
@ 2001-06-06 22:29         ` Chris Hecker
  2001-06-07  7:42           ` William Chesters
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Hecker @ 2001-06-06 22:29 UTC (permalink / raw)
  To: William Chesters, caml-list


>My feeling has always been that the overhead of initialising a float
>array, per element, is considerably less than the cost of one floating
>point operation; so if you ever intend to actually use the array for
>anything, the inefficiency is negligible.

This is not true on modern processors, if I understand what you're saying.  Touching memory that's not in the cache is orders of magnitude slower than flops.  Even the best case, with the memory in the cache, touching memory is the same speed or slower than doing a flop on most CPUs.

Chris


-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-04 22:34     ` Markus Mottl
@ 2001-06-06 20:13       ` William Chesters
  2001-06-06 22:29         ` Chris Hecker
  0 siblings, 1 reply; 26+ messages in thread
From: William Chesters @ 2001-06-06 20:13 UTC (permalink / raw)
  To: caml-list

Markus Mottl writes:
 > On Mon, 04 Jun 2001, David McClain wrote:
 > > let dst = Array.unsafe_create nel in
 > 
 > Are there any intentions to add "unsafe_create" for float arrays to the
 > standard library? It would be nice to have a function that allocates such
 > arrays without initializing them - they are not scanned by the GC, anyway.

My feeling has always been that the overhead of initialising a float
array, per element, is considerably less than the cost of one floating
point operation; so if you ever intend to actually use the array for
anything, the inefficiency is negligible.

On the other hand if we have unsafe_set than unsafe_create is hardly a
huge additional bloat, more a kind of natural counterpart.

-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-06 18:35       ` William Chesters
@ 2001-06-06 18:40         ` Patrick M Doane
  2001-06-07  1:50         ` Hugo Herbelin
  2001-06-07 18:20         ` Tom _
  2 siblings, 0 replies; 26+ messages in thread
From: Patrick M Doane @ 2001-06-06 18:40 UTC (permalink / raw)
  To: William Chesters; +Cc: caml-list

On Wed, 6 Jun 2001, William Chesters wrote:

> Hugo Herbelin writes:
>  > Assume more generally that you can modify any local variable as in the
>  > (standard) following example:
>  > 
>  > let fact (mutable n) =
>  >   let mutable r = 1 in
>  >   while n > 0 do
>  >      r <- r * n;
>  >      n <- n - 1
>  >   done;
>  >   r
> 
> This doesn't actually make life much easier for the compiler.

Even if there's no benefit to the compiler, I think there is a benefit to
the programmer. I would much rather write imperative code with that
syntax.

Patrick

-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-06  2:03     ` Hugo Herbelin
  2001-06-06  4:04       ` Charles Martin
@ 2001-06-06 18:35       ` William Chesters
  2001-06-06 18:40         ` Patrick M Doane
                           ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: William Chesters @ 2001-06-06 18:35 UTC (permalink / raw)
  To: caml-list

Hugo Herbelin writes:
 > Assume more generally that you can modify any local variable as in the
 > (standard) following example:
 > 
 > let fact (mutable n) =
 >   let mutable r = 1 in
 >   while n > 0 do
 >      r <- r * n;
 >      n <- n - 1
 >   done;
 >   r

This doesn't actually make life much easier for the compiler.  On
32-bit machines [see other thread!], `r' must be a reference (in the
C++ sense) to a heap object---64-bit float, plus header.  In general
it is not safe to overwrite the float value in the heap, if it's
possible that other variables have been assigned to it (unless floats
are assigned by value not by reference, but in that case you get heap
allocation at the time of assignment ...).  The analysis necessary for
determining that no such aliasing references can exist, and indeed
that the value can simply be kept in a float register/on the FPU
stack, is already there in the compiler and applies perfectly well to
mutable references.

-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-06  4:04       ` Charles Martin
@ 2001-06-06 18:25         ` William Chesters
  0 siblings, 0 replies; 26+ messages in thread
From: William Chesters @ 2001-06-06 18:25 UTC (permalink / raw)
  To: caml-list

Charles Martin writes:
 > 
 > >> >           a := !a +. Array.unsafe_get xs i
 > >>     a <- a +. Array.unsafe_get xs i
 > 
 > At the risk of being off topic, I am wondering why most every code
 > scrap that appears on this list uses
 > [Array|String].unsafe_[get|set], rather than the "safe" versions.
 > Are off-by-one errors and the like so rare?  Doesn't adding
 > "-unsafe" as a compiler option to your final build have the same
 > effect?  If not, that would be worth knowing.

it's just that when one posts performance-related snippets, it saves
everyone's time if it's immediately clear that bounds checks are not
an issue
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-05  7:22     ` Chris Hecker
@ 2001-06-06  6:27       ` David McClain
  0 siblings, 0 replies; 26+ messages in thread
From: David McClain @ 2001-06-06  6:27 UTC (permalink / raw)
  To: caml-list

> I think I'm missing something.  The ocaml loops allocate memory for temps,
but the C loops reuse storage?  Did you do a version for ocaml that's got
the same level of optimizations as the C version (not allocating, etc.)?

I didn't write an OCaml program to solve this specific problem. Instead, I
made use of my NML (numeric modeling language) built on top of OCaml. Hence,
being a general purpose language, it had to make very conservative choices
to protect the user. Whenever new data are generated, they are placed into
freshly allocated arrays.

Had I written code to specifically address this one problem it would
probably have performed better. I was quite impressed with the performance I
was getting from my NML which used this very general purpose OCaml kernel. I
almost stopped at that point. But nagging questions, eager users, and a
growing list of block convolution jobs convinced me to see what I could do
in C, while still living primarily in the NML world.

I didn't see any need to rewrite all the file I/O handling, the
unwind-protect mechanisms, etc, etc, all in C when I have perfectly good
versions of this stuff in NML thanks to OCaml. So I simply targeted the
innermost loop where I knew much of the overhead was in fresh array creation
at every intermediate step. I am doing block convolutions on GByte sized
datasets, which is far more than NML was ever intended to support well. NML
is a modeling and prototyping language.

But adding C code is as easy as adding C code to any OCaml program and
recompiling the augmented NML system. Hence targeting only the inner loops
was very simple (in principle). In practice, even after nearly 30 years of
heavy C/C++ experience I still cannot write even relatively simple programs
without planting bugs in them. To OCaml's credit, I can easily write OCaml
code that executes properly the first time out. No debugging. I wish that
were the case for C as well.

So to end this long winded explanation, had I written OCaml code
specifically for the purpose of block convolutions I would have expected a
performance around 70% of that demonstrated by C, not the factor of 4 that I
had with NML. I could have written to use mutable arrays in that case,
reaping much of the same advantages that I got when I rewrote the inner
loops in C.

- DM

----- Original Message -----
From: "Chris Hecker" <checker@d6.com>
To: "David McClain" <dmcclain1@mindspring.com>; <caml-list@inria.fr>
Sent: Tuesday, June 05, 2001 12:22 AM
Subject: Re: [Caml-list] OCaml Speed for Block Convolutions

>
> >I think the constant creation of new arrays to hold intermediate results
> >(i.e., immutable data) is costing too much. Many of the intermediate
values
> >could be overwritten in place and save a lot of time.
> >... of course... don't ask how long it took to write all this stuff in
the
> >inner loops of the convolution in C, nor how long it took to debug that
> >stuff.... And immutable data definitely helps to get the code correct.
OCaml
> >still rules!
>
> I think I'm missing something.  The ocaml loops allocate memory for temps,
but the C loops reuse storage?  Did you do a version for ocaml that's got
the same level of optimizations as the C version (not allocating, etc.)?
>
> If the code's not to big, could you post it?
>
> Chris
>
>

-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-06  2:03     ` Hugo Herbelin
@ 2001-06-06  4:04       ` Charles Martin
  2001-06-06 18:25         ` William Chesters
  2001-06-06 18:35       ` William Chesters
  1 sibling, 1 reply; 26+ messages in thread
From: Charles Martin @ 2001-06-06  4:04 UTC (permalink / raw)
  To: caml-list

>> >           a := !a +. Array.unsafe_get xs i
>>     a <- a +. Array.unsafe_get xs i

At the risk of being off topic, I am wondering why most every code scrap that appears on this list uses [Array|String].unsafe_[get|set], rather than the "safe" versions.  Are off-by-one errors and the like so rare?  Doesn't adding "-unsafe" as a compiler option to your final build have the same effect?  If not, that would be worth knowing.

-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-05 10:48   ` Tom _
@ 2001-06-06  2:03     ` Hugo Herbelin
  2001-06-06  4:04       ` Charles Martin
  2001-06-06 18:35       ` William Chesters
  0 siblings, 2 replies; 26+ messages in thread
From: Hugo Herbelin @ 2001-06-06  2:03 UTC (permalink / raw)
  To: Tom _; +Cc: caml-list

> >         let a = ref 0. in
> >         for i = 0 to n-1 do
> >           a := !a +. Array.unsafe_get xs i
> >         done
> ...
> Why not have a construct that expresses what
> the programmer really wants:
> 
>   let mutable a = 0. in
>   for i = 0 to n-1 do
>     a <- a +. Array.unsafe_get xs i
>   done
> 
> Tom

I was precisely asking myself the same question, initially motivated
by explaining the imperative constructs of ML to programmers coming
from the imperative side. I mean how to explain that you find in ML a
regular for-loop, a regular while-loop but you need to use pointers to
translate assignments as in the code at top above which really only
corresponds to the following C code:

int i;
double *a = (double *) malloc(sizeof(double)); *a = 0;
for (i=0;i<=n-1;i++) *a = *a + xs[i];

Assume more generally that you can modify any local variable as in the
(standard) following example:

let fact (mutable n) =
  let mutable r = 1 in
  while n > 0 do
     r <- r * n;
     n <- n - 1
  done;
  r

At first, this is in contradiction with the substitution property of
purely functional languages, like "let x = 1 in ... x ..." should
behave the same as "... 1 ...", but it then should be enough to
explain that a while-loop binds over all the variables modified in its
body, as the functional interpretation of the while-loop shows it

let fact n =
  let r = 1 in
  let rec loop (r,n as effects) =
     if n > 0 then
       let r = r * n in
       let n = n - 1 in
       loop (r,n)
     else effects in
  let (r,n) = loop (r,n) in
  r

and the substitution property is preserved.

The compilation model of ocaml is already able to handle this (at
least the bytecode, in fact to realize the incrementation of the
for-loops counter) and the typing would just be monomorphic as for 
any mutable. May there be any problem I don't see? Is there a risk
to allow more imperative-style algorithms? Could it really lead to
a sensible benefit in efficiency for some typical algorithms?

At least, it seems there is some limitations with higher-order
functions.

> let a = ref 0 in
> Hashtbl.iter (fun k d -> a := !a + d) my_hash

Following the same principe as before, an alternative would be

let a = mutable 0 in
Hashtbl.iter (fun k d -> a <- a + d) my_hash

explaining that the substitution property is preserved by the fact
that a function too binds over the variables it modifies, even if in
this case, the functional translation needs to add an extra arg to
iter.

But since only "immediate" variables of a function are not copied (all
the variables declared by and since the closer surrounding "fun"), the
last option will not work: when building the closure, a new copy of
the value of a is put in the closure environment and further updatings
will not affect the initial cell.

                                                  Hugo
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-05  2:52     ` Brian Rogoff
@ 2001-06-05 15:02       ` Stefan Monnier
  0 siblings, 0 replies; 26+ messages in thread
From: Stefan Monnier @ 2001-06-05 15:02 UTC (permalink / raw)
  To: caml-list

>>>>> "Brian" == Brian Rogoff <bpr@best.com> writes:
> regardless. Coming back to the subject of this thread, in an ideal world, 
> there wouldn't be a need to distinguish between Bigarray and Array, you'd 
> just use polymorphic arrays and the compiler would be smart enough to 
> always get unboxed and tag free element representations. Supposedly Stalin 

That's also what SML/NJ tries to do.  But short of doing whole program
analysis and/or profiling feedback, it's difficult to generate as good
code as in the Bigarray case.


	Stefan
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-04 19:51 ` William Chesters
                     ` (2 preceding siblings ...)
  2001-06-04 22:14   ` Tom _
@ 2001-06-05 10:48   ` Tom _
  2001-06-06  2:03     ` Hugo Herbelin
  3 siblings, 1 reply; 26+ messages in thread
From: Tom _ @ 2001-06-05 10:48 UTC (permalink / raw)
  To: caml-list

>         let a = ref 0. in
>         for i = 0 to n-1 do
>           a := !a +. Array.unsafe_get xs i
>         done

I think it's natural for this to cause storage
allocation.  After all, "ref 0." is an object that
can be passed around.  The compiler may be able
to infer that it can allocate it on the stack, but
that may be hard in general.

Why not have a construct that expresses what
the programmer really wants:

  let mutable a = 0. in
  for i = 0 to n-1 do
    a <- a +. Array.unsafe_get xs i
  done

Tom

PS: sorry if you have seen this posting before; I
think I keep hitting the wrong button in my mailer
and end up not responding to the list


__________________________________________________
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail - only $35 
a year!  http://personal.mail.yahoo.com/
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-04 20:15   ` David McClain
  2001-06-04 22:34     ` Markus Mottl
@ 2001-06-05  7:22     ` Chris Hecker
  2001-06-06  6:27       ` David McClain
  1 sibling, 1 reply; 26+ messages in thread
From: Chris Hecker @ 2001-06-05  7:22 UTC (permalink / raw)
  To: David McClain, caml-list


>I think the constant creation of new arrays to hold intermediate results
>(i.e., immutable data) is costing too much. Many of the intermediate values
>could be overwritten in place and save a lot of time.
>... of course... don't ask how long it took to write all this stuff in the
>inner loops of the convolution in C, nor how long it took to debug that
>stuff.... And immutable data definitely helps to get the code correct. OCaml
>still rules!

I think I'm missing something.  The ocaml loops allocate memory for temps, but the C loops reuse storage?  Did you do a version for ocaml that's got the same level of optimizations as the C version (not allocating, etc.)?

If the code's not to big, could you post it?

Chris


-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-04 22:14   ` Tom _
  2001-06-04 22:57     ` Chris Hecker
@ 2001-06-05  2:52     ` Brian Rogoff
  2001-06-05 15:02       ` Stefan Monnier
  1 sibling, 1 reply; 26+ messages in thread
From: Brian Rogoff @ 2001-06-05  2:52 UTC (permalink / raw)
  To: Tom _; +Cc: William Chesters, caml-list

On Mon, 4 Jun 2001, Tom _ wrote:
> > E.g.
> > 
> >         let a = ref 0. in
> >         for i = 0 to n-1 do
> >           a := !a +. Array.unsafe_get xs i
> >         done
> 
> Incidentally, rather than trying to come up
> with other workaround for imperative variables,
> maybe it would be better to just make the
> straightforward functional implementation fast:
> 
>   let loop i total =
>     if i=n then total else
>     loop (i+1) (total + Array.unsafe_get xs i)
>   in loop 0 0.0
> 
> In fact, in a compiler that already has TRO
> and type inference, shouldn't this be entirely
> equivalent to an imperative loop?

Yes, but I think the general problem that William Chesters alludes to (the
overhead of polymorphic variables) is an interesting one to solve
regardless. Coming back to the subject of this thread, in an ideal world, 
there wouldn't be a need to distinguish between Bigarray and Array, you'd 
just use polymorphic arrays and the compiler would be smart enough to 
always get unboxed and tag free element representations. Supposedly Stalin 
comes close (I've only played with it on *very* small programs) so maybe 
one day we'll get something similar. And maybe David McClain wouldn't have
to write his inner loops in C or Fortran. What a wonderful world that
would be...

> Another approach would be to adopt a "functional"
> loop syntax as found in languages like SISAL.  

Or SAC. But OCaml arrays (and strings) aren't functional, and the only
Clean and efficient way I know to deal with that is uniqueness type
information. Functional arrays (trees or version arrays) are not efficient
enough to replace imperative arrays when you have single threaded usage; 
definitely not the right data structure for signal processing or numerical 
linear algebra. OTOH (changing the subject) it might be nice to have
built in version arrays and functional strings for other reasons.

-- Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-04 22:14   ` Tom _
@ 2001-06-04 22:57     ` Chris Hecker
  2001-06-05  2:52     ` Brian Rogoff
  1 sibling, 0 replies; 26+ messages in thread
From: Chris Hecker @ 2001-06-04 22:57 UTC (permalink / raw)
  To: Tom _, William Chesters, caml-list

>maybe it would be better to just make the
>straightforward functional implementation fast:
>  let loop i total =
>    if i=n then total else
>    loop (i+1) (total + Array.unsafe_get xs i)
>  in loop 0 0.0

I was planning on asking about this after I timed it relative to a nonrecursive loop.  I noticed a while back that Markus' code for the Language Shootout (for example, http://www.bagley.org/~doug/shootout/bench/matrix/) does tail recursion rather than looping, so I was curious about whether it was superior.

However, this doesn't solve all the problems where this crops up:

let a = ref 0 in
Hashtbl.iter (fun k d -> a := !a + d) my_hash

or 

let hash_to_list h =
        let l = ref [] in
        Hashtbl.iter (fun k d -> l := d :: !l) h

or whatever.  Sometimes these can be handled by fold type functions, but not always (Hashtbl has no fold, for example).

Sometimes you need an imperative variable around, and it'd be nice if it was optimal in the easy cases.  Since function calls are so cheap these days on branch predicting CPUs, these functional map/iter/fold type things can still be high performance.

I haven't done any performance timings yet on these types of operations, though, so a huge grain of salt is attached to this email.

Chris

-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-04 20:15   ` David McClain
@ 2001-06-04 22:34     ` Markus Mottl
  2001-06-06 20:13       ` William Chesters
  2001-06-05  7:22     ` Chris Hecker
  1 sibling, 1 reply; 26+ messages in thread
From: Markus Mottl @ 2001-06-04 22:34 UTC (permalink / raw)
  To: David McClain; +Cc: caml-list

On Mon, 04 Jun 2001, David McClain wrote:
> let dst = Array.unsafe_create nel in

Are there any intentions to add "unsafe_create" for float arrays to the
standard library? It would be nice to have a function that allocates such
arrays without initializing them - they are not scanned by the GC, anyway.

Regards,
Markus Mottl

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-04 19:51 ` William Chesters
  2001-06-04 20:05   ` Chris Hecker
  2001-06-04 20:15   ` David McClain
@ 2001-06-04 22:14   ` Tom _
  2001-06-04 22:57     ` Chris Hecker
  2001-06-05  2:52     ` Brian Rogoff
  2001-06-05 10:48   ` Tom _
  3 siblings, 2 replies; 26+ messages in thread
From: Tom _ @ 2001-06-04 22:14 UTC (permalink / raw)
  To: William Chesters, caml-list

> E.g.
> 
>         let a = ref 0. in
>         for i = 0 to n-1 do
>           a := !a +. Array.unsafe_get xs i
>         done

Incidentally, rather than trying to come up
with other workaround for imperative variables,
maybe it would be better to just make the
straightforward functional implementation fast:

  let loop i total =
    if i=n then total else
    loop (i+1) (total + Array.unsafe_get xs i)
  in loop 0 0.0

In fact, in a compiler that already has TRO
and type inference, shouldn't this be entirely
equivalent to an imperative loop?

Another approach would be to adopt a "functional"
loop syntax as found in languages like SISAL.  

Tom.


__________________________________________________
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail - only $35 
a year!  http://personal.mail.yahoo.com/
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-04 19:51 ` William Chesters
  2001-06-04 20:05   ` Chris Hecker
@ 2001-06-04 20:15   ` David McClain
  2001-06-04 22:34     ` Markus Mottl
  2001-06-05  7:22     ` Chris Hecker
  2001-06-04 22:14   ` Tom _
  2001-06-05 10:48   ` Tom _
  3 siblings, 2 replies; 26+ messages in thread
From: David McClain @ 2001-06-04 20:15 UTC (permalink / raw)
  To: caml-list

> I can easily imagine that, with two caveats: tuples passed directly to
> functions do seem to get elided, while on the other hand apparently
> atomic float accumulators can cause more garbage than you might think.

Actually, wherever I produce new data, I usually allocate an entirely new
array to hold the results. Then I use unsafe accesses like the following for
summing two arrays:

let dst = Array.unsafe_create nel in
for ix = 0 to pred nel do
    Array.unsafe_set dst ix ((Array.unsafe_get src1 ix) +. (Array.unsafe_get
src2 ix))
done;

I think the constant creation of new arrays to hold intermediate results
(i.e., immutable data) is costing too much. Many of the intermediate values
could be overwritten in place and save a lot of time.

... of course... don't ask how long it took to write all this stuff in the
inner loops of the convolution in C, nor how long it took to debug that
stuff.... And immutable data definitely helps to get the code correct. OCaml
still rules!

- DM

-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] OCaml Speed for Block Convolutions
  2001-06-04 19:51 ` William Chesters
@ 2001-06-04 20:05   ` Chris Hecker
  2001-06-04 20:15   ` David McClain
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Chris Hecker @ 2001-06-04 20:05 UTC (permalink / raw)
  To: William Chesters, caml-list

>        let a = ref 0. in
>makes garbage---`a' isn't unboxed---while
>       type floatref = { mutable it: float }
>doesn't.

Does this bother anybody else besides me?  Since ref is kind of built-in, is there any way to special case its use on unboxed types and optimize it without making us rewrite lots of code?  There's plenty of other special cased stuff in the compiler, so hey, it's a party!  :)

The other place I get bitten by ref being kind of built-in, kind of not, is in pattern matching:

match a_ref with 
{ contents = 0.0 } -> yuck

Yes, I can match !a, but that's not always convenient (for example, I want a function to take a ref, but name its value:

let f ((x,y) as xy) z = fine
let f ({contents = (x,y)} as xyr) z = a mess

Maybe I'm still too imperative and use ref too much, but I run into this all the time.

Chris

-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Caml-list] OCaml Speed for Block Convolutions
  2001-06-04 13:25 David McClain
@ 2001-06-04 19:51 ` William Chesters
  2001-06-04 20:05   ` Chris Hecker
                     ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: William Chesters @ 2001-06-04 19:51 UTC (permalink / raw)
  To: caml-list

David McClain writes:
 > ....well, I hate to say this... but I did recode the innermost loop in C and
 > it now runs more than 5 times faster... Straight recoding into C bought 4x,
 > and using better math brought that up to 5x.
 > 
 > I think the big thing here is that the OCaml code was producing huge amounts
 > of garbage, despite preallocated buffers with which all the processing was
 > reading and writing data. The ancillary closures and tuple args were just
 > eating my shirt...

I can easily imagine that, with two caveats: tuples passed directly to
functions do seem to get elided, while on the other hand apparently
atomic float accumulators can cause more garbage than you might think.
E.g.

        let a = ref 0. in
        for i = 0 to n-1 do
          a := !a +. Array.unsafe_get xs i
        done

makes garbage---`a' isn't unboxed---while

        type floatref = { mutable it: float }

        let a = { it = 0. } in
        for i = 0 to n-1 do
          a := !a +. Array.unsafe_get xs i
        done

doesn't.  The effect on the visible quality of the assembler for the
inner loop is dramatic, and so is the speed improvement ... basically
using polymorphic data structures is a bad idea.  I wonder if this is
a limitation you have run up against?

Some years ago I made a library using camlp4 for supporting tensor
notation, e.g.

        tens x[I] = a[I J] b[J]

using both automatically generated C and vanilla caml.  When I
recently noticed the above point about not using polymorphic
references, I found there was rather little difference in performane
between the C and ocaml versions.
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Caml-list] OCaml Speed for Block Convolutions
@ 2001-06-04 13:25 David McClain
  2001-06-04 19:51 ` William Chesters
  0 siblings, 1 reply; 26+ messages in thread
From: David McClain @ 2001-06-04 13:25 UTC (permalink / raw)
  To: caml-list

....well, I hate to say this... but I did recode the innermost loop in C and
it now runs more than 5 times faster... Straight recoding into C bought 4x,
and using better math brought that up to 5x.

I think the big thing here is that the OCaml code was producing huge amounts
of garbage, despite preallocated buffers with which all the processing was
reading and writing data. The ancillary closures and tuple args were just
eating my shirt...

- DM

-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2001-06-07 18:20 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-06-01 18:38 [Caml-list] OCaml Speed for Block Convolutions David McClain
2001-06-01 22:51 ` Tom _
2001-06-02  0:10   ` Stefan Monnier
2001-06-04 10:12     ` Jacques Garrigue
2001-06-04 13:25 David McClain
2001-06-04 19:51 ` William Chesters
2001-06-04 20:05   ` Chris Hecker
2001-06-04 20:15   ` David McClain
2001-06-04 22:34     ` Markus Mottl
2001-06-06 20:13       ` William Chesters
2001-06-06 22:29         ` Chris Hecker
2001-06-07  7:42           ` William Chesters
2001-06-05  7:22     ` Chris Hecker
2001-06-06  6:27       ` David McClain
2001-06-04 22:14   ` Tom _
2001-06-04 22:57     ` Chris Hecker
2001-06-05  2:52     ` Brian Rogoff
2001-06-05 15:02       ` Stefan Monnier
2001-06-05 10:48   ` Tom _
2001-06-06  2:03     ` Hugo Herbelin
2001-06-06  4:04       ` Charles Martin
2001-06-06 18:25         ` William Chesters
2001-06-06 18:35       ` William Chesters
2001-06-06 18:40         ` Patrick M Doane
2001-06-07  1:50         ` Hugo Herbelin
2001-06-07 18:20         ` Tom _

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).