caml-list - the Caml user's mailing list
* Odd performance result with HLVM
@ 2009-02-28  1:12 Jon Harrop
  2009-02-28 20:18 ` [Caml-list] " Kuba Ober
  0 siblings, 1 reply; 15+ messages in thread
From: Jon Harrop @ 2009-02-28  1:12 UTC (permalink / raw)
  To: caml-list


I am developing a high-level virtual machine built upon LLVM and written in 
OCaml that uses JIT compilation to execute OCaml-like code at break-neck 
speeds.

I just stumbled upon a weird performance result: compiling my VM with ocamlc 
and ocamlopt produces very different benchmark results for the performance of 
the generated code (which should be identical). For example, my float-based 
Fibonacci function becomes 70% slower if I use ocamlopt and some other 
float-based functions also see big performance drops.

What is ocamlopt doing that might explain this? I assume it is fiddling with 
the floating point state somehow...
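For concreteness, a minimal sketch of the kind of float-based Fibonacci I mean 
(not my exact benchmark) that can be built with both compilers:

```ocaml
(* Sketch of a float-based Fibonacci benchmark; the concrete function and
   timings are illustrative. Build with both "ocamlc" and "ocamlopt" and
   compare the reported times. *)
let rec fib x =
  if x < 2.0 then x else fib (x -. 2.0) +. fib (x -. 1.0)

let () =
  let t = Sys.time () in
  let r = fib 30.0 in
  Printf.printf "fib 30.0 = %.0f in %.3fs\n" r (Sys.time () -. t)
```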

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Odd performance result with HLVM
  2009-02-28  1:12 Odd performance result with HLVM Jon Harrop
@ 2009-02-28 20:18 ` Kuba Ober
  2009-02-28 21:21   ` Richard Jones
  2009-02-28 21:52   ` Jon Harrop
  0 siblings, 2 replies; 15+ messages in thread
From: Kuba Ober @ 2009-02-28 20:18 UTC (permalink / raw)
  To: caml-list

> I am developing a high-level virtual machine built upon LLVM and written in
> OCaml that uses JIT compilation to execute OCaml-like code at break-neck
> speeds.
>
> I just stumbled upon a weird performance result: compiling my VM with ocamlc
> and ocamlopt produces very different benchmark results for the performance of
> the generated code (which should be identical). For example, my float-based
> Fibonacci function becomes 70% slower if I use ocamlopt and some other
> float-based functions also see big performance drops.
>
> What is ocamlopt doing that might explain this? I assume it is fiddling with
> the floating point state somehow...

You didn't let us in on how it really works. You said "high-level virtual
machine built upon LLVM and written in OCaml". LLVM means too many things to
be able to decipher what you mean, and your statement is too general.

I'm assuming, but that's forced, so don't shoot if I make an asinus out of
myself ;)

So, it's a VM and it runs native jit-ted code like, say, .NET would. So
presumably you have some OCaml code that then invokes the JIT and some native
functions to dispatch to the jit-ted code? Do you interpret any bytecode, or
always compile? Do you even have to have any OCaml code running in the
process where the jit-ted code runs?

I presume you use the LLVM infrastructure to do the jit-ting, but without
knowing exactly what code runs in the process space of the application, it's
hard to tell what's going on.

There's no floating point state to change that would slow things up that
much. At least I have never seen anything like that. Maybe the FP exceptions
are being fired somehow? You can always set a breakpoint in the exception
handler(s).

Cheers, Kuba



* Re: [Caml-list] Odd performance result with HLVM
  2009-02-28 20:18 ` [Caml-list] " Kuba Ober
@ 2009-02-28 21:21   ` Richard Jones
  2009-02-28 21:59     ` Jon Harrop
  2009-02-28 21:52   ` Jon Harrop
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Jones @ 2009-02-28 21:21 UTC (permalink / raw)
  To: Kuba Ober; +Cc: caml-list

On Sat, Feb 28, 2009 at 03:18:40PM -0500, Kuba Ober wrote:
> You didn't let us in on how it really works. You said "high-level virtual
> machine built upon LLVM and written in OCaml". LLVM means too many things
> to be able to decipher what you mean, and your statement is too general.

I'm still waiting for the open source project ...

Rich.

-- 
Richard Jones
Red Hat



* Re: [Caml-list] Odd performance result with HLVM
  2009-02-28 20:18 ` [Caml-list] " Kuba Ober
  2009-02-28 21:21   ` Richard Jones
@ 2009-02-28 21:52   ` Jon Harrop
  2009-03-02  3:20     ` Jon Harrop
  2009-03-02 14:28     ` Florent Ouchet
  1 sibling, 2 replies; 15+ messages in thread
From: Jon Harrop @ 2009-02-28 21:52 UTC (permalink / raw)
  To: caml-list

On Saturday 28 February 2009 20:18:40 Kuba Ober wrote:
> You didn't let us in on how it really works. You said "high-level virtual
> machine built upon LLVM and written in OCaml". LLVM means too many things
> to be able to decipher what you mean, and your statement is too general.

I'm referring to my HLVM project that I described here:

http://flyingfrogblog.blogspot.com/2008/12/building-better-future-high-level.html

That is built upon the Low-Level Virtual Machine here:

http://llvm.org

Basically, my idea is to build a better VM for OCaml-like languages. My work 
is still at a very early stage of development but the results are already 
promising: faster than ocamlopt on all of my benchmarks and several times 
faster on some numerical benchmarks.

There are really two major advantages over the current ocamlopt design and 
both stem from the use of JIT compilation:

. Run-time types allow per-type functions like generic pretty printers and 
comparison.

. Monomorphisation during JIT compilation completely removes the performance 
cost of polymorphism, e.g. floats, tuples and records are never boxed.
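To illustrate the second point in plain OCaml (a sketch, not HLVM code):
ocamlopt compiles a single generic version of a polymorphic fold, so floats
flow through it boxed, whereas a float-specialised copy, which is what
monomorphisation during JIT compilation produces automatically, avoids the
boxing:

```ocaml
(* Generic fold: under ocamlopt there is one native version, shared by all
   element types, so floats passed through it are boxed. *)
let rec fold f acc = function
  | [] -> acc
  | x :: xs -> fold f (f acc x) xs

(* A hand-monomorphised float copy, standing in for what the JIT would
   generate per instantiation: floats stay unboxed. *)
let rec fold_float (f : float -> float -> float) (acc : float) = function
  | [] -> acc
  | x :: xs -> fold_float f (f acc x) xs

let () =
  assert (fold (+.) 0.0 [1.0; 2.0; 3.0] = 6.0);
  assert (fold_float (+.) 0.0 [1.0; 2.0; 3.0] = 6.0)
```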

Algebraic datatypes were the latest addition and I expected them to be very 
slow because I am calling thread-safe malloc every time one is created. 
However, a simple benchmark that creates and folds over a huge list actually 
runs faster on my HLVM than it does in OCaml.

> I'm assuming, but that's forced, so don't shoot if I make an asinus out of
> myself ;)
>
> So, it's a VM and it runs native jit-ted code like, say, .NET would. So
> presumably you have some OCaml code that then invokes the JIT and some
> native functions to dispatch to the jit-ted code?

Yes.

> Do you interpret any bytecode, or always compile?

Always compile.

> Do you even have to have any OCaml code running in the process where the
> jit-ted code runs?

I believe the OCaml bindings to LLVM JIT compile a function, get a pointer to 
it and call into that function from the current thread. So there is no 
process separation.

> I presume you use the LLVM infrastructure to do the jit-ting, but without
> knowing exactly what code runs in the process space of the application,
> it's hard to tell what's going on.

You just gave me another idea: use LLVM to compile the same IR code to a 
standalone executable and benchmark that. I'll get back to you...

> There's no floating point state to change that would slow things up that
> much. At least I have never seen anything like that.

That's what I thought.

> Maybe the FP exceptions are being fired somehow?

Maybe something like that. I have no idea.

This definitely only affects float performance so it is not a difference in 
the efficiency of calling into JIT compiled code from OCaml bytecode or 
native code.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re: [Caml-list] Odd performance result with HLVM
  2009-02-28 21:21   ` Richard Jones
@ 2009-02-28 21:59     ` Jon Harrop
  2009-03-02  0:38       ` Jon Harrop
  0 siblings, 1 reply; 15+ messages in thread
From: Jon Harrop @ 2009-02-28 21:59 UTC (permalink / raw)
  To: caml-list

On Saturday 28 February 2009 21:21:18 Richard Jones wrote:
> On Sat, Feb 28, 2009 at 03:18:40PM -0500, Kuba Ober wrote:
> > You didn't let us in on how it really works. You said "high-level virtual
> > machine built upon LLVM and written in OCaml". LLVM means too many things
> > to be able to decipher what you mean, and your statement is too general.
>
> I'm still waiting for the open source project ...

I'm working on it. I now have:

. unit, bool, int, float.
. tuples.
. arrays.
. algebraic datatypes.
. function pointers.
. tail calls.
. generic printing.
. catching stack overflows.
. C FFI.
. JIT compilation to high performance native code.

I need to implement closures and a GC before I release anything. Then I'll add 
parametric polymorphism, exceptions and a front end.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re: [Caml-list] Odd performance result with HLVM
  2009-02-28 21:59     ` Jon Harrop
@ 2009-03-02  0:38       ` Jon Harrop
  0 siblings, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2009-03-02  0:38 UTC (permalink / raw)
  To: caml-list

On Saturday 28 February 2009 21:59:29 Jon Harrop wrote:
> On Saturday 28 February 2009 21:21:18 Richard Jones wrote:
> > On Sat, Feb 28, 2009 at 03:18:40PM -0500, Kuba Ober wrote:
> > > You didn't let us in on how it really works. You said "high-level
> > > virtual machine built upon LLVM and written in OCaml". LLVM means too
> > > many things to be able to decipher what you mean, and your statement
> > > is too general.
> >
> > I'm still waiting for the open source project ...
>
> I'm working on it. I now have:
>
> . unit, bool, int, float.
> . tuples.
> . arrays.
> . algebraic datatypes.
> . function pointers.
> . tail calls.
> . generic printing.
> . catching stack overflows.
> . C FFI.
> . JIT compilation to high performance native code.
>
> I need to implement closures...

Now that I come to try it, I don't think I do need to implement closures. 
Check out the following sample:

let curry : t list =
  let ty_closure(ty1, ty2) =
    `Struct[`Function([`Reference; ty1], ty2); `Reference] in
  let apply(f, x) =
    Apply(GetValue(f, 0), [GetValue(f, 1); x]) in
  let ty_ret = `Struct[`Int; `Float] in
  [`Function("f_uncurried", ["x", `Int; "y", `Float], ty_ret,
	     Struct[Var "x"; Var "y"]);

   `Type("env_int", ["Int", `Int]);

   `Function("f_apply_2", ["env", `Reference; "y", `Float], ty_ret,
	     Apply(Var "f_uncurried", [Cast(Var "env", "Int"); Var "y"]));

   `Function("f_apply_1", ["x", `Int], ty_closure(`Float, ty_ret),
	     Struct[Var "f_apply_2"; Construct("Int", Var "x")]);

   `Expr(Let("g", Apply(Var "f_apply_1", [Int 3]),
	     Struct[apply(Var "g", Float 2.3);
		    apply(Var "g", Float 3.4)]), `Struct[ty_ret; ty_ret])]

Unions are represented by the type "`Reference" and constructed 
with "Construct". Their type can be tested with "IsType" and their argument 
can be extracted with "Cast".

The above code is an intermediate representation of the curried OCaml 
function:

  let f (x: int) (y: float) = (x, y)

and the example application of a closure created by partially applying the 
first argument "x":

  let g = f 3 in (g 2.3, g 3.4)

The raw representation is an uncurried function "f_uncurried". That should 
make recursive closures a lot faster than they are in OCaml.

The type constructor "Int" is used to represent a closure environment that 
conveys an int. Such types can be generated on-the-fly whenever the compiler 
sees code that would generate a closure that required an environment of this 
type.

The "f_apply_2" function extracts "x" from the environment and applies "x" 
and "y" to "f".

The "f_apply_1" function partially applies "x" to "f", creating a closure 
represented by a struct containing a pointer to "f_apply_2" and the boxed 
environment containing "x".

The final expression partially applies "f" to "x=3" and then applies the 
resulting closure twice.

Running this program in HLVM produces:

Writing bitcode
Evaluating
- : <type> = ((3, 2.3), (3, 3.4))
Took 0.027330s

Note that the structs are pretty printed as tuples using generic printers. You 
can call the internal Print function on any value of any type and it will be 
pretty printed.
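For comparison, the same closure-conversion scheme can be written directly in
OCaml (a sketch; the names mirror the IR above, but this is ordinary OCaml,
not HLVM code):

```ocaml
(* A closure is a struct of an uncurried worker plus its boxed environment. *)
type env = Int of int

let f_uncurried x y = (x, y)

(* Extracts "x" from the environment and applies "x" and "y" to the worker. *)
let f_apply_2 (Int x) y = f_uncurried x y

(* Partially applies "x": pairs the worker with the boxed environment. *)
let f_apply_1 x = (f_apply_2, Int x)

let apply (f, env) y = f env y

let () =
  let g = f_apply_1 3 in
  assert (apply g 2.3 = (3, 2.3));
  assert (apply g 3.4 = (3, 3.4))
```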

A front-end for a functional language can, therefore, already handle 
first-class functions with the current HLVM by representing all functional 
values as a struct containing a function pointer and a boxed environment. 
Moreover, HLVM could include an optimization pass to remove redundant closure 
constructions and deconstructions. The nice aspect of this is that it reduces 
complexity and makes it much easier to write a GC: only algebraic datatypes 
are non-trivial.

So I'll probably just reimplement my GC and then publish HLVM as open source 
software on the OCaml Forge.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re: [Caml-list] Odd performance result with HLVM
  2009-02-28 21:52   ` Jon Harrop
@ 2009-03-02  3:20     ` Jon Harrop
  2009-03-02 14:28     ` Florent Ouchet
  1 sibling, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2009-03-02  3:20 UTC (permalink / raw)
  To: caml-list

On Saturday 28 February 2009 21:52:18 Jon Harrop wrote:
> However, a simple benchmark that creates and folds over a huge list
> actually runs faster on my HLVM than it does in OCaml.

I just finished implementing a list-based solution to the n-queens problem and 
that does run 5-14x faster in OCaml than on my HLVM:

n=8
OCaml: 0.012s
HLVM:  0.190s

n=9
OCaml: 0.088s
HLVM:  0.591s

n=10
OCaml: 0.740s
HLVM:  3.713s

n=11
OCaml: 6.68s
HLVM:  Out of memory

HLVM ran out of memory in the last case because I have not yet reintegrated 
the GC.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re: [Caml-list] Odd performance result with HLVM
  2009-02-28 21:52   ` Jon Harrop
  2009-03-02  3:20     ` Jon Harrop
@ 2009-03-02 14:28     ` Florent Ouchet
  2009-03-02 16:18       ` Jon Harrop
  2009-03-02 20:09       ` Kuba Ober
  1 sibling, 2 replies; 15+ messages in thread
From: Florent Ouchet @ 2009-03-02 14:28 UTC (permalink / raw)
  To: caml-list

Jon Harrop wrote:
> There are really two major advantages over the current ocamlopt design and 
> both stem from the use of JIT compilation:
>
> . Run-time types allow per-type functions like generic pretty printers and 
> comparison.
>
> . Monomorphisation during JIT compilation completely removes the performance 
> cost of polymorphism, e.g. floats, tuples and records are never boxed.

Do you mean that each polymorphic function is compiled into a different
native piece of code each time it is called with different parameter
types? How does the JIT'ed code size compare to ocamlopt'ed code size?

- Florent




* Re: [Caml-list] Odd performance result with HLVM
  2009-03-02 14:28     ` Florent Ouchet
@ 2009-03-02 16:18       ` Jon Harrop
  2009-03-02 20:09       ` Kuba Ober
  1 sibling, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2009-03-02 16:18 UTC (permalink / raw)
  To: caml-list

On Monday 02 March 2009 14:28:23 Florent Ouchet wrote:
> Jon Harrop wrote:
> > There are really two major advantages over the current ocamlopt design
> > and both stem from the use of JIT compilation:
> >
> > . Run-time types allow per-type functions like generic pretty printers
> > and comparison.
> >
> > . Monomorphisation during JIT compilation completely removes the
> > performance cost of polymorphism, e.g. floats, tuples and records are
> > never boxed.
>
> Do you mean that each polymorphic function is compiled into a different
> native piece of code each time it is called with different parameter
> types?

Yes.

> How does the JIT'ed code size compare to ocamlopt'ed code size? 

No idea. Without a front end I have only compiled the smallest pieces of test 
code so far, just to make sure that the functionality works.

.NET does the same thing and it offers substantial performance improvements 
over OCaml for polymorphic code.

Note that there is no reason to distinguish between reference types for they 
can all be treated equivalently with respect to instantiating polymorphic 
code. My type system is as follows:

type t =
    [ `Unit
    | `Bool
    | `Int
    | `Float
    | `Struct of t list
    | `Array of t
    | `Function of t list * t
    | `Reference ]
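A hypothetical instantiation cache keyed on this type makes the point: every
reference type hits the same specialisation, while unboxed types each get
their own native copy (illustrative OCaml only, not HLVM internals):

```ocaml
type t =
  [ `Unit | `Bool | `Int | `Float
  | `Struct of t list | `Array of t
  | `Function of t list * t | `Reference ]

(* Map from (function name, argument types) to a specialisation id. *)
let cache : (string * t list, int) Hashtbl.t = Hashtbl.create 16
let counter = ref 0

let instantiate name (args : t list) =
  match Hashtbl.find_opt cache (name, args) with
  | Some id -> id
  | None ->
      incr counter;  (* stands in for JIT-compiling one native copy *)
      Hashtbl.add cache (name, args) !counter;
      !counter

let () =
  (* any reference type (list, array, datatype) shares one instantiation *)
  assert (instantiate "id" [`Reference] = instantiate "id" [`Reference]);
  (* unboxed types are distinguished, so each gets its own copy *)
  assert (instantiate "id" [`Float] <> instantiate "id" [`Int])
```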

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re: [Caml-list] Odd performance result with HLVM
  2009-03-02 14:28     ` Florent Ouchet
  2009-03-02 16:18       ` Jon Harrop
@ 2009-03-02 20:09       ` Kuba Ober
  2009-03-04 16:17         ` Mikkel Fahnøe Jørgensen
  1 sibling, 1 reply; 15+ messages in thread
From: Kuba Ober @ 2009-03-02 20:09 UTC (permalink / raw)
  To: caml-list


> Jon Harrop wrote:
>> There are really two major advantages over the current ocamlopt design
>> and both stem from the use of JIT compilation:
>>
>> . Run-time types allow per-type functions like generic pretty printers
>> and comparison.
>>
>> . Monomorphisation during JIT compilation completely removes the
>> performance cost of polymorphism, e.g. floats, tuples and records are
>> never boxed.
>
> Do you mean that each polymorphic function is compiled into a different
> native piece of code each time it is called with different parameter
> types? How does the JIT'ed code size compare to ocamlopt'ed code size?

Having done it, although not in a JIT but in your plain-old whole-project
compiler, for my use cases the code size actually shrinks. The functions
usually end up inlined and sometimes reduce to a few machine instructions.
Most of the runtime library is written using polymorphic functions. Case in
point: all sorts of string-processing functions which can take as arguments
either strings stored in RAM or stored in ROM, and those data types are very
much orthogonal on my platform. An invocation of a tail-recursive "strlen"
reduces to about as many bytes of code as it'd take to push the arguments on
the stack and call a non-polymorphic version of itself.
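In OCaml terms, the idea is something like this (a sketch; the `get` function
stands in for a read from either address space, since the real RAM/ROM
platform types are not shown here):

```ocaml
(* A strlen polymorphic over storage: "get" abstracts reading byte i from
   either RAM or ROM (hypothetical stand-in for the platform types). *)
let strlen (get : int -> char) : int =
  let rec go i = if get i = '\000' then i else go (i + 1) in
  go 0

(* One possible "storage": an in-memory string with a NUL sentinel. *)
let of_string s i = if i < String.length s then s.[i] else '\000'

let () = assert (strlen (of_string "hello") = 5)
```

A whole-project compiler can then inline a specialised copy of `strlen` per
storage kind, which is why the polymorphism costs nothing in code size.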

That's how I initially got a statically typed LISP to compile for "tiny"
8-bit microcontrollers without using all of the whopping 1kb of RAM and 16kb
of program flash on a Z8F162x device.

Right now I'm hacking away to get rid of the last traces of LISPiness and to
get the project fully working in OCaml, using ML-like syntax for user code. I
like it much better than LISP's.

I have also found that by doing whole-project compilation with aggressive
constant propagation and compile-time execution of functions that depend only
on known constants, I could get rid of about 85% of the LISP macros in my
code. The other macros ended up being rewritten to just invoke
ct_eval : string -> function, which is a compile-time eval function. It's
just like LISP macros, but since in the ML family code isn't data, it was
easier to just generate strings and feed them into the compiler, rather than
expose all of the AST machinery to "userland".

Cheers, Kuba


* Re: [Caml-list] Odd performance result with HLVM
  2009-03-02 20:09       ` Kuba Ober
@ 2009-03-04 16:17         ` Mikkel Fahnøe Jørgensen
  2009-03-04 16:30           ` Kuba Ober
  2009-03-04 19:05           ` Jon Harrop
  0 siblings, 2 replies; 15+ messages in thread
From: Mikkel Fahnøe Jørgensen @ 2009-03-04 16:17 UTC (permalink / raw)
  To: Kuba Ober; +Cc: caml-list

When looking at the benchmark game and other benchmarks I have seen, I
noticed that Haskell is almost as fast as OCaml and sometimes faster.
Some Lisp implementations are also pretty fast.

However, when you look at memory consumption OCaml uses considerably
less memory, except for languages in the C family.

I suspect that many real-world performance scenarios, such as heavily loaded
web servers and complex simulations, depend very much on memory consumption.
This is both because of GC overhead and because the memory pipeline slows
down as more cache levels become involved.

So in the case of a new JIT solution for OCaml, I believe it is important to
observe this aspect as well.

Mikkel

2009/3/2 Kuba Ober <ober.14@osu.edu>:
>
>> Jon Harrop wrote:
>>>
>>> There are really two major advantages over the current ocamlopt design
>>> and both stem from the use of JIT compilation:
>>>
>>> . Run-time types allow per-type functions like generic pretty printers
>>> and comparison.
>>>
>>> . Monomorphisation during JIT compilation completely removes the
>>> performance cost of polymorphism, e.g. floats, tuples and records are never
>>> boxed.
>>
>> Do you mean that each polymorphic function is compiled into a different
>> native piece of code each time it is called with different parameter
>> types? How does the JIT'ed code size compare to ocamlopt'ed code size?
>
> Having done it, although not in a JIT but in your plain-old whole-project
> compiler,
> for my use cases the code size actually shrinks. The functions usually end
> up inlined
> and sometimes reduce to a few machine instructions. Most of the runtime
> library is written
> using polymorphic functions. Case in point: all sorts of string-processing
> functions which
> can take as arguments either strings stored in RAM or stored in ROM, and
> those data types
> are very much orthogonal on my platform. An invocation of a tail-recursive
> "strlen" reduces to about as many bytes of code than it'd take to push the
> arguments on
> the stack and call a non-polymorphic version of itself.
>
> That's how I initially got a statically typed LISP to compile for "tiny" 8
> bit microcontrollers
> without using all of the whopping 1kb of RAM and 16kb of program flash on a
> Z8F162x
> device.
>
> Right now I'm hacking away to get rid of last traces of LISPiness and to get
> the project fully
> working in OCaml, using ML-like syntax for user code. I like it much better
> than LISP's.
>
> I have also found that by doing whole-project compilation with aggressive
> constant propagation
> and compile-time execution of functions that depend only on known constants,
> I could get
> rid of about 85% of LISP macros in my code. The other macros ended up being
> rewritten
> to just invoke ct_eval: string -> function, which is a compile-time eval
> function.
> It's just like LISP macros, but since in ML family code isn't data, it was
> easier to just
> generate strings and feed them into compiler, rather than expose all of the
> AST machinery
> to "userland".
>
> Cheers, Kuba
>



* Re: [Caml-list] Odd performance result with HLVM
  2009-03-04 16:17         ` Mikkel Fahnøe Jørgensen
@ 2009-03-04 16:30           ` Kuba Ober
  2009-03-04 18:15             ` Mikkel Fahnøe Jørgensen
  2009-03-04 19:05           ` Jon Harrop
  1 sibling, 1 reply; 15+ messages in thread
From: Kuba Ober @ 2009-03-04 16:30 UTC (permalink / raw)
  To: caml-list


On Mar 4, 2009, at 11:17 AM, Mikkel Fahnøe Jørgensen wrote:

> When looking at the benchmark game and other benchmarks I have seen, I
> noticed that Haskell is almost as fast as OCaml and sometimes faster.
> Some Lisp implementations are also pretty fast.
>
> However, when you look at memory consumption OCaml uses considerably
> less memory, except for languages in the C family.
>
> I suspect that many real world performance scenarios, such as heavily
> loaded web servers and complex simulations, depend very much on memory
> consumption. This is both because of GC overhead and because of the
> slower memory pipeline the more cache levels are involved.
>
> So in case of a new JIT solution for OCaml, I believe it is important
> to observe this aspect as well.

I believe it is also important not to dynamically allocate memory for no good
reason.

All of my realtime numerical code uses statically allocated memory with
overlaying based on the execution flow of basic blocks. That has zero runtime
overhead: the produced machine code has fixed addresses for data (not all
data, of course).

It reduces to whether a "basic block" can be re-entered from its future
(downstream) or not. If it can, you have to use the stack or heap. If it
can't, then you can do static allocation. The potential cost is if a given
function is entered from many points: the overlaying then has to take into
account all possible ways the code is reached. This can be mitigated by
generating more than one copy of the function. It makes sense when you have
some free code ROM but your RAM is almost full.

This of course can only be done when you do whole-project compilation. If you
compile "modules" separately, you have to fall back on doing it in the
linker, where all you have is the function call graph and the available
granularity is much worse, at a bigger RAM overhead. The code ROM overhead is
then zero, since the linker can hardly generate copies of functions; at the
point where you copy functions you may as well do other optimizations, so the
linker is way too late to do that efficiently.

There's no reason not to use those techniques in code that runs on "large"
platforms. It'd at least artificially boost some benchmark results ;)

Cheers, Kuba


* Re: [Caml-list] Odd performance result with HLVM
  2009-03-04 16:30           ` Kuba Ober
@ 2009-03-04 18:15             ` Mikkel Fahnøe Jørgensen
  2009-03-04 18:32               ` Jon Harrop
  0 siblings, 1 reply; 15+ messages in thread
From: Mikkel Fahnøe Jørgensen @ 2009-03-04 18:15 UTC (permalink / raw)
  To: caml-list

oh, and another thing:

I'd like the VM to be small enough to link into an executable like the
OCaml runtime so you don't have some version and dispatch nightmare.

Mikkel



* Re: [Caml-list] Odd performance result with HLVM
  2009-03-04 18:15             ` Mikkel Fahnøe Jørgensen
@ 2009-03-04 18:32               ` Jon Harrop
  0 siblings, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2009-03-04 18:32 UTC (permalink / raw)
  To: caml-list

On Wednesday 04 March 2009 18:15:50 Mikkel Fahnøe Jørgensen wrote:
> oh, and another thing:
>
> I'd like the VM to be small enough to link into an executable like the
> OCaml runtime so you don't have some version and dispatch nightmare.

You can generate executables but it takes 10s to compile HLVM to OCaml 
bytecode and the result is a 60Mb file so I doubt it will satisfy "small".

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re: [Caml-list] Odd performance result with HLVM
  2009-03-04 16:17         ` Mikkel Fahnøe Jørgensen
  2009-03-04 16:30           ` Kuba Ober
@ 2009-03-04 19:05           ` Jon Harrop
  1 sibling, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2009-03-04 19:05 UTC (permalink / raw)
  To: caml-list

On Wednesday 04 March 2009 16:17:55 Mikkel Fahnøe Jørgensen wrote:
> When looking at the benchmark game and other benchmarks I have seen, I
> noticed that Haskell is almost as fast as OCaml and sometimes faster.
> Some Lisp implementations are also pretty fast.

From my ray tracer language comparison: my OCaml is ~50% faster than the
Haskell written by Lennart Augustsson and the Lisp written by Juho Snellman,
both of whom have extensive experience writing optimizing compilers for those
languages, whereas I did not:

  http://www.ffconsultancy.com/languages/ray_tracer/results.html

Moreover, I received dozens of implementations in Haskell and Lisp and these 
were the only vaguely competitive ones: most programmers are not able to 
write fast code in Haskell or Lisp primarily because their performance is so 
wildly unpredictable.

The Burrows-Wheeler block sorting data compression algorithm implemented in 
Haskell and discussed extensively for weeks in the context of performance is 
a good example of this: they never got within 10,000x the performance of C. 
There are many other examples where nobody was able to get within orders of 
magnitude of the performance of common languages.

GHC does have rudimentary support for parallelism and that makes it much
easier to leverage 2-6 cores in Haskell compared to OCaml. However, that is
merely a deficiency in the current OCaml implementation and is something that
can be addressed. Moreover, the current Haskell implementation scales very
poorly and is easily maxed out even on today's 8-core computers. For example,
on a recent Mandelbrot benchmark from comp.lang.functional the Haskell is
faster for 1-6 cores but stops seeing improvements beyond that, whereas OCaml
with process forking continues to see improvements up to all 8 cores and is
faster overall on this machine as a consequence.
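For reference, the process-forking approach is simple to sketch in OCaml with
the Unix library (illustrative only, not the actual benchmark code):

```ocaml
(* Fork one worker per job, marshal each float result back over a pipe,
   and sum the results in the parent. Requires linking the unix library. *)
let parallel_map_sum (f : 'a -> float) (jobs : 'a list) : float =
  let workers =
    List.map
      (fun job ->
        let r, w = Unix.pipe () in
        match Unix.fork () with
        | 0 ->
            (* child: compute, send the result to the parent, exit *)
            Unix.close r;
            let oc = Unix.out_channel_of_descr w in
            output_value oc (f job);
            close_out oc;
            exit 0
        | pid ->
            (* parent: keep only the read end *)
            Unix.close w;
            (pid, r))
      jobs
  in
  List.fold_left
    (fun acc (pid, r) ->
      let ic = Unix.in_channel_of_descr r in
      let v : float = input_value ic in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      acc +. v)
    0.0 workers

let () =
  let total = parallel_map_sum (fun n -> float_of_int (n * n)) [1; 2; 3; 4] in
  Printf.printf "sum of squares: %g\n" total
```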

Although efficient concurrent garbage collectors are hard to write, parallel 
ones like the one in GHC are comparatively easy to write and still very 
useful.

> However, when you look at memory consumption OCaml uses considerably
> less memory, except for languages in the C family.
>
> I suspect that many real world performance scenarios, such as heavily
> loaded web servers and complex simulations, depend very much on memory
> consumption. This is both because of GC overhead and because of the
> slower memory pipeline the more cache levels are involved.
>
> So in case of a new JIT solution for OCaml, I believe it is important
> to observe this aspect as well.

OCaml's memory efficiency is certainly extremely good and it may be
theoretically possible to preserve that in a new implementation that supports
parallelism. That is absolutely not the goal of my work though: I only intend
to get the simplest possible parallel GC working because I am interested
primarily in high-performance numerics, string processing and visualization
and not web servers.

However, I will endeavour to make the implementation as extensible as possible 
so that other people can create drop-in replacements that provide this kind 
of functionality. Improving upon my GC should be quite easy for anyone versed 
in the subject. Interestingly, my GC is written entirely in my own 
intermediate representation.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



end of thread, other threads:[~2009-03-04 19:00 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-28  1:12 Odd performance result with HLVM Jon Harrop
2009-02-28 20:18 ` [Caml-list] " Kuba Ober
2009-02-28 21:21   ` Richard Jones
2009-02-28 21:59     ` Jon Harrop
2009-03-02  0:38       ` Jon Harrop
2009-02-28 21:52   ` Jon Harrop
2009-03-02  3:20     ` Jon Harrop
2009-03-02 14:28     ` Florent Ouchet
2009-03-02 16:18       ` Jon Harrop
2009-03-02 20:09       ` Kuba Ober
2009-03-04 16:17         ` Mikkel Fahnøe Jørgensen
2009-03-04 16:30           ` Kuba Ober
2009-03-04 18:15             ` Mikkel Fahnøe Jørgensen
2009-03-04 18:32               ` Jon Harrop
2009-03-04 19:05           ` Jon Harrop
