caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* HLVM stuff
@ 2009-09-27 17:43 David McClain
  2009-09-27 19:25 ` [Caml-list] " Jon Harrop
  0 siblings, 1 reply; 7+ messages in thread
From: David McClain @ 2009-09-27 17:43 UTC (permalink / raw)
  To: caml-list

... as a specific example, I remember dealing with stacks of images  
taken from an infrared imaging sensor -- perhaps 256 images, each of  
which is, say 512 x 512 pixels. I needed to obtain median pixel values  
from the stack and produce one "median image", as well as perform  
thresholding, masking, perhaps some dilation and erosion. And, also  
get the image 2-D FFT image.

So there we are dealing with an aggregate of 2^26 8-bit pixel values.  
Surely, the majority of time is spent processing pixels, and not  
dispatching over some polymorphic type...

Dr. David McClain
dbm@refined-audiometrics.com




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] HLVM stuff
  2009-09-27 17:43 HLVM stuff David McClain
@ 2009-09-27 19:25 ` Jon Harrop
  2009-09-27 21:58   ` David McClain
  0 siblings, 1 reply; 7+ messages in thread
From: Jon Harrop @ 2009-09-27 19:25 UTC (permalink / raw)
  To: caml-list

On Sunday 27 September 2009 18:43:24 David McClain wrote:
> ... as a specific example, I remember dealing with stacks of images
> taken from an infrared imaging sensor -- perhaps 256 images, each of
> which is, say 512 x 512 pixels. I needed to obtain median pixel values
> from the stack and produce one "median image", as well as perform
> thresholding, masking, perhaps some dilation and erosion. And, also
> get the image 2-D FFT image.
>
> So there we are dealing with an aggregate of 2^26 8-bit pixel values.
> Surely, the majority of time is spent processing pixels, and not
> dispatching over some polymorphic type...

No. That is actually an excellent example of exactly what I was talking about.

The solution most obvious to me would be something like:

  let median_image (images: _ [,] []) =
    let n = images.Length
    assert(n > 0)
    let h, w = images.[0].GetLength 0, images.[0].GetLength 1
    Array2D.parallelInit n h w (fun x y ->
      kthSmallest (n/2) (fun i -> images.[i].[y, x]))

where the "kthSmallest" and "Array2D.parallelInit" functions are both 
polymorphic. The former handles implicit sequences of any comparable type and 
the latter handles 2D arrays of any element type. This use of polymorphic 
functions would be slow in OCaml: you would have to inline and type 
specialize those functions by hand to get the best performance OCaml has to 
offer. OCaml's representation and handling of ints is inefficient so your 
performance will always be dire. Furthermore, OCaml cannot leverage 
multicores efficiently because you're returning a non-trivial result that 
would have to be marshalled in O(n) time.

In F#, you can mark the "kthSmallest" function "inline" to completely remove 
the overhead of the higher-order function and its closure argument. The 
parallelInit function uses the "n" to estimate a good initial batching for 
the parallel tasks and uses exponential backoff to avoid spending too much 
time spawning tasks. The result would be near optimal performance on my 8 
core with minimal effort.

OCaml is a hugely productive high-level language but many abstractions do 
incur severe performance penalties in OCaml and polymorphism is one of them. 
This is highly relevant to technical computing where the ability to write 
efficient code with minimal effort is extremely valuable.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] HLVM stuff
  2009-09-27 19:25 ` [Caml-list] " Jon Harrop
@ 2009-09-27 21:58   ` David McClain
  2009-09-27 23:14     ` Jon Harrop
  0 siblings, 1 reply; 7+ messages in thread
From: David McClain @ 2009-09-27 21:58 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list


On Sep 27, 2009, at 12:25 PM, Jon Harrop wrote:

> where the "kthSmallest" and "Array2D.parallelInit" functions are both
> polymorphic. The former handles implicit sequences of any comparable  
> type and
> the latter handles 2D arrays of any element type. This use of  
> polymorphic


But facing a situation with 2^26 pixels to process, I would never do  
that. I would write a type-specific function to apply. Why dispatch of  
every pixel of the aggregate, when I could dispatch once at the top,  
to decide what kind of homogeneous array...

Dr. David McClain
dbm@refined-audiometrics.com




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] HLVM stuff
  2009-09-27 21:58   ` David McClain
@ 2009-09-27 23:14     ` Jon Harrop
  2009-09-28  0:35       ` David McClain
  0 siblings, 1 reply; 7+ messages in thread
From: Jon Harrop @ 2009-09-27 23:14 UTC (permalink / raw)
  To: caml-list

On Sunday 27 September 2009 22:58:59 David McClain wrote:
> On Sep 27, 2009, at 12:25 PM, Jon Harrop wrote:
> > where the "kthSmallest" and "Array2D.parallelInit" functions are both
> > polymorphic. The former handles implicit sequences of any comparable
> > type and
> > the latter handles 2D arrays of any element type. This use of
> > polymorphic
>
> But facing a situation with 2^26 pixels to process, I would never do
> that.

Here is a better one-line F# solution:

  images |> Array2D.map (fun xs -> Array.sortInPlaceWith compare xs; xs.[m/2])

This solves your problem from the REPL in 0.34s. Moreover, you can easily 
parallelize it in F#:

  Parallel.For(0, n, fun y ->
    for x=0 to n-1 do
      Array.sortInPlaceWith compare images.[y, x])
  images |> Array2D.map (fun xs -> xs.[m/2])

On this 8-core box, the time taken is reduced to 0.039s (finally a superlinear 
speedup on my Intel box, yay!).

Here is the OCaml equivalent:

  Array.map (Array.map (fun gs -> Array.sort compare gs; gs.(m/2))) images

This solves your problem non-interactively in 32s, which is 821x slower than 
F#.

This huge performance discrepancy is a direct result of the elegant solution 
using polymorphic functions. HLVM's solution to polymorphism solves this 
problem, offering polymorphism with no performance degradation whatsoever.

> I would write a type-specific function to apply.

Why waste your time doing by hand what the compiler can do for you?

> Why dispatch of every pixel of the aggregate, when I could dispatch once at
> the top, to decide what kind of homogeneous array...

Why dispatch at all when a JIT compiler would already know all of the types 
involved and could partially specialize your code for them?

FWIW, a completed HLVM would solve this problem extremely efficiently despite 
having a naive garbage collector because the entire program only does a 
single allocation. This is not at all uncommon in technical computing and is 
exactly the characteristic I was referring to: these solutions leverage 
features of the OCaml language like higher-order functions, currying and 
partial application but they have completely different performance 
requirements to those of Coq. In the context of technical computing, the 
benefits of shared-memory parallelism far outweigh those of efficient 
single-threaded allocation and collection of small values.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] HLVM stuff
  2009-09-27 23:14     ` Jon Harrop
@ 2009-09-28  0:35       ` David McClain
  2009-09-28  1:25         ` Jon Harrop
  2009-10-13 22:18         ` Jon Harrop
  0 siblings, 2 replies; 7+ messages in thread
From: David McClain @ 2009-09-28  0:35 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list


On Sep 27, 2009, at 16:14 PM, Jon Harrop wrote:

> Here is a better one-line F# solution:
>
>  images |> Array2D.map (fun xs -> Array.sortInPlaceWith compare xs;  
> xs.[m/2])
>
> This solves your problem from the REPL in 0.34s. Moreover, you can  
> easily
> parallelize it in F#:
>
>  Parallel.For(0, n, fun y ->
>    for x=0 to n-1 do
>      Array.sortInPlaceWith compare images.[y, x])
>  images |> Array2D.map (fun xs -> xs.[m/2])
>
> On this 8-core box, the time taken is reduced to 0.039s (finally a  
> superlinear
> speedup on my Intel box, yay!).


Yes, this is beginning to sound very interesting... So now that you  
have F#, which I understand to be some derivative of OCaml, why do you  
need HLVM? Is F# using the LLVM? or is it executing natively compiled  
code?

 From what I have garnered today in a quick scan of JIT docs, it  
appears that JIT cannot compete yet with native code. But if the  
timings you stated are for some kind of JIT against byte-codes, I am  
very impressed.

Thanks for that...

Dr. David McClain
dbm@refined-audiometrics.com




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] HLVM stuff
  2009-09-28  0:35       ` David McClain
@ 2009-09-28  1:25         ` Jon Harrop
  2009-10-13 22:18         ` Jon Harrop
  1 sibling, 0 replies; 7+ messages in thread
From: Jon Harrop @ 2009-09-28  1:25 UTC (permalink / raw)
  To: caml-list

On Monday 28 September 2009 01:35:32 David McClain wrote:
> Yes, this is beginning to sound very interesting... So now that you
> have F#, which I understand to be some derivative of OCaml,

F# is superficially similar to OCaml, most notably its OCaml-like syntax, but 
there are some quite major differences:

http://www.strangelights.com/fsharp/wiki/default.aspx/FSharpWiki/FSharpAndOCaml.html

> why do you need HLVM?

Good question. I saw these important advantages realized in F# by Microsoft 
and wanted to bring those benefits to the OCaml/Linux world. There is 
no "need" to do so unless you refuse to use Windows and I am happily using 
Windows now. Moreover, the libraries available under Linux are dire in 
comparison to .NET. Hence I am no longer really motivated to work on HLVM. F# 
is a lot more fun and a lot more profitable. :-)

> Is F# using the LLVM?

No. F# is Microsoft's new programming language for .NET.

> or is it executing natively compiled code?

Yes. The F# compiler generates .NET assemblies containing CIL (Common 
Intermediate Language) that the CLR (Common Language Run-time) then JIT 
compiles the CIL to native code:

  http://en.wikipedia.org/wiki/File:CLR_diag.svg

This is true of the interactive F# REPL as well as compiled binaries.

>  From what I have garnered today in a quick scan of JIT docs, it
> appears that JIT cannot compete yet with native code. But if the
> timings you stated are for some kind of JIT against byte-codes, I am
> very impressed.

The timings I posted show JIT-compiled F# solving your problem orders of 
magnitude faster than native-code compiled with ocamlopt. OCaml's interpreted 
bytecode is even slower than its compiled native code, of course. I don't 
know how fast other native-code compiled languages like C, C++ and Fortran 
are in comparison except that some of my numerical F# code outperform's 
Intel's vendor-tuned Fortran running on Intel hardware.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] HLVM stuff
  2009-09-28  0:35       ` David McClain
  2009-09-28  1:25         ` Jon Harrop
@ 2009-10-13 22:18         ` Jon Harrop
  1 sibling, 0 replies; 7+ messages in thread
From: Jon Harrop @ 2009-10-13 22:18 UTC (permalink / raw)
  To: caml-list

On Monday 28 September 2009 01:35:32 David McClain wrote:
> Yes, this is beginning to sound very interesting...

FWIW, I wrote this up as a blog post here:

  http://flyingfrogblog.blogspot.com/2009/09/f-vs-ocaml.html

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-10-13 22:17 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-27 17:43 HLVM stuff David McClain
2009-09-27 19:25 ` [Caml-list] " Jon Harrop
2009-09-27 21:58   ` David McClain
2009-09-27 23:14     ` Jon Harrop
2009-09-28  0:35       ` David McClain
2009-09-28  1:25         ` Jon Harrop
2009-10-13 22:18         ` Jon Harrop

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).