From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 7441EBC57 for ; Tue, 16 Nov 2010 18:07:34 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Aj8EAAJJ4kzRVdS0k2dsb2JhbACULjKFdQGIAQgVAQEBAQkJCgkRAx+lWolighiFFS6IWQEBAwWFRgSBXIh8iSo X-IronPort-AV: E=Sophos;i="4.59,206,1288566000"; d="scan'208";a="66308026" Received: from mail-px0-f180.google.com ([209.85.212.180]) by mail3-smtp-sop.national.inria.fr with ESMTP; 16 Nov 2010 18:07:32 +0100 Received: by pxi8 with SMTP id 8so202416pxi.39 for ; Tue, 16 Nov 2010 09:07:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:mime-version:sender:received :in-reply-to:references:from:date:x-google-sender-auth:message-id :subject:to:cc:content-type; bh=/08nILijIcxRt1akRYTJPesnNHqMpY5hB5UaENwPo7E=; b=ZvvRkljYElODtvkIOkWi4u5tP1GzqKrKjqH8oetMFSBmsJl8tkQm90se0BQZSogEUb AhAMRcrhjih+23qBspfQcyVq8CC3USVP+4z7aN2oUjuVYR2GRuGrjBrIc7PkTq/D7XH0 acBV193E6z3ckafwj5srP8PjgTjQzWcLoIQuU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; b=LOpgpw/i3QcBXme5STs1UVYn4DFWtCf5V6gH07bdWAw2sQbKgbx/v5EHh+8d+g2+dx zl2m7HhIg+g9XteDpd1wjT3YBBoYFEdzdN1SM7MUgTPUIgh3nzdig+lV2mrB9npEC/ph j+fz/9vFbaTrFFfLv+OlIMPYeVlPFIlCbzwxI= Received: by 10.229.110.8 with SMTP id l8mr6369983qcp.144.1289927250128; Tue, 16 Nov 2010 09:07:30 -0800 (PST) MIME-Version: 1.0 Sender: gabriel.scherer@gmail.com Received: by 10.229.253.135 with HTTP; Tue, 16 Nov 2010 09:07:06 -0800 (PST) In-Reply-To: References: From: bluestorm Date: Tue, 16 Nov 2010 18:07:06 +0100 X-Google-Sender-Auth: 1ORSvqd2zvrd5x1y8NONw5wvOa0 Message-ID: Subject: Re: [Caml-list] OCamlJit 2.0 To: Benedikt Meurer Cc: caml-list@yquem.inria.fr Content-Type: multipart/alternative; boundary=0016367f9c1287adaf04952e971c X-Spam: no; 0.00; basile:01 bytecode:01 suboptimal:01 bytecode:01 compilation:01 ocamlrun:01 ocamlc:01 -help:01 compilation:01 ocamlc:01 byte-code:01 speedups:01 speedups:01 ocaml:01 compiler:01 --0016367f9c1287adaf04952e971c Content-Type: text/plain; charset=ISO-8859-1 To those of you who are lazy but still curious, I just read the report, and here are the answers to the question I had: 1. Is that project related to Basile Starynkevitch's venerable OCamlJIT ? Yes, OcamlJIT was apparently a major inspiration for this work. The overall design is similar, and in particular the objective of absolute compatibility with the existing bytecode (even when it may hurts performances) was retained. Unfortunately, due to bitrotting and different architectures (OcamlJIT x86 vs. OcamlJIT2 x86_64), there is no direct performance comparison, though the reports seems to indicate similar to better performances. Several points were also improved (better (but still suboptimal) register usage, clever hacks to speed up mapping between bytecode and native code adresses...). 2. Does it use LLVM ? No, it doesn't, but it's justified. Apparently, an early prototype showed than LLVM compilation/generation overhead was too high. A (debatable) design goal of OcamlJIT2.0 is to be absolutely faster than ocamlrun, even on *very* short-running programs (`ocamlc -help`). Here is a direct quote from the report : > In a simple OCamlJit2 prototype based on LLVM, we have measured significant compilation overhead; short running programs (like ocamlc applied to a small > to medium sized *.ml file) were three to four times slower than running the same > program with the byte-code interpreter. Similar results were observed by other projects using > LLVM for Just-In-Time compilation. For example, the Mono project10 reported > impressive speedups for long running, computationally intensive applications, but at the cost of > increased com- pilation time and memory usage. OCamlJIT2.0 uses its own macro to generate x86_64 assembly directly (apparently using a dedicated library wasn't worth it). A drawback of this strategy is its inherent non-portability. 3. Does it perform well ? As explained in the original post, the speedups against bytecode are around 2-6x. It's quite honourable, and the results are very consistent: all benchmarked programs running for more than 1 second have speedups between 2.59 and 5.34. The authors seems to think other optimizations are possible, and expect significant performance improvements in the future. The goal of being "on par with the ocaml native compiler" is (prudently) mentioned. On the benchmark, ocamlopt was itself 2 to 6 times faster than OcamlJIT2.0. 4. Various remarks: Various clever hacks are described in the report, which bring noticeable performance improvements but probably make the implementation more complex. I was also interested by the following remark --- maybe a bit provocative: Both JVM and CLR depend on profile guided optimizations for good > performance, because their byte-code is slower than the OCaml byte-code. This is mostly > because of the additional overhead that comes with exception handling, multi-threading, > class loading and object synchronization. But it is also due to the fact that both the JVM > byte-code [27] and the Common Intermediate Language (CIL) [10] instruction sets are not as > optimized as the OCaml byte-code. On Tue, Nov 16, 2010 at 3:52 PM, Benedikt Meurer < benedikt.meurer@googlemail.com> wrote: > Hello everybody, > > OCamlJit 2.0 is a new Just-In-Time engine for Objective Caml 3.12.0 on > desktop processors (x86/x86-64). It translates the OCaml byte-code used by > the interpreter (ocamlrun and ocaml) to x86/x86-64 native code on-demand and > runs the generated native code instead of interpreting the byte-code. It is > designed to run with minimal compilation overhead (translating only what is > being executed, avoiding costly code generation and optimization > techniques), while being 100% compatible with the byte-code runtime > (including serialization and hashing of closures, etc.). > > OCamlJit 2.0 was specifically designed for desktop processors and is not > really portable to anything else in its current shape, because the target > audience are people using the interactive top-level and the byte-code > interpreter for rapid prototyping/development (which is unlikely to happen > on anything else but x86/x86-64). The implementation currently requires a > system that adheres to the SysV ABI, which includes Linux, BSD, OS X, but > excludes Win32/Win64 (patches/ideas are welcome). It was tested on Linux/x86 > (Debian), Linux/amd64 (CentOS) and Mac OS X 10.6 (64bit). The x86 > implementation requires SSE2 capable processors (otherwise it falls back to > the byte-code interpreter), so it won't speedup your OCaml programs running > on 486 CPUs. :-) > > OCamlJit 2.0 runs most benchmarks at 2-6 times faster than the byte-code > interpreter. The interactive top-level benefits twice when used with the JIT > engine: (a) the compiler stages are JIT compiled and (b) the generated > byte-code is JIT compiled. A tech report describing a slightly earlier > prototype and including performance measures of OCamlJit 2.0 on Mac OS X > (64bit) is available at: > > http://arxiv.org/abs/1011.1783 > > The source code is available from the Git repository (master branch) at: > > http://gitorious.org/ocamljit2/ocamljit2 > > Installation is similar to installation of Objective Caml, just run > > ./configure -prefix /path/to/ocamljit2 [options] > > followed by > > make world opt > make install > > This will install a fully working Objective Caml 3.12.0 to > /path/to/ocamlji2, where /path/to/ocamljit2/bin/ocamlrun and > /path/to/ocamljit2/lib/libcamlrun_shared.so include the JIT engine in > addition to the byte-code interpreter (fallback to the byte-code interpreter > is necessary for debugging with ocamldebug). The configure script prints a > line indicating whether the JIT engine is enabled or not (if not, it'll be > just a regular OCaml 3.12 installation). > > Comments are welcome. > > Enjoy! > > Benedikt Meurer > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > --0016367f9c1287adaf04952e971c Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
To those of you who are lazy but still curious, = I just read the report, and here are the answers to the question I had:

1. Is tha= t project related to Basile Starynkevitch's venerable OCamlJIT ?

Yes, OcamlJ= IT was apparently a major inspiration for this work. The overall design is = similar, and in particular the objective of absolute compatibility with the= existing bytecode (even when it may hurts performances) was retained. Unfo= rtunately, due to bitrotting and different architectures (OcamlJIT x86 vs. = OcamlJIT2 x86_64), there is no direct performance comparison, though the re= ports seems to indicate similar to better performances. Several points were= also improved (better (but still suboptimal) register usage, clever hacks = to speed up mapping between bytecode and native code adresses...).

2. Does it = use LLVM ?

No, it doesn't, but it's justified. Apparently, an early proto= type showed than LLVM compilation/generation overhead was too high. A (deba= table) design goal of OcamlJIT2.0 is to be absolutely faster than ocamlrun,= even on *very* short-running programs (`ocamlc -help`). Here is a direct q= uote from the report :
In a simple OCamlJit2 prototype based on LLVM, we have measured significant=
compilation overhead; short running programs (like ocamlc applied to a smal= l to medium
sized *.ml file) were three to four times slower than running the same prog= ram with the
byte-code interpreter. Similar results were observed by other projects usin= g LLVM for
Just-In-Time compilation. For example, the Mono project10 reported impressi= ve speedups
for long running, computationally intensive applications, but at the cost o= f increased com-
pilation time and memory usage.

OCamlJIT2.0= uses its own macro to generate x86_64 assembly directly (apparently using = a dedicated library wasn't worth it). A drawback of this strategy is it= s inherent non-portability.


3. Does it p= erform well ?

As explained in the original post, the speedups against bytecode ar= e around 2-6x. It's quite honourable, and the results are very consiste= nt: all benchmarked programs running for more than 1 second have speedups b= etween 2.59 and 5.34.

The authors= seems to think other optimizations are possible, and expect significant pe= rformance improvements in the future. The goal of being "on par with t= he ocaml native compiler" is (prudently) mentioned. On the benchmark, = ocamlopt was itself 2 to 6 times faster than OcamlJIT2.0.


<= div class=3D"gmail_quote">4. Various remarks:

Various clever hacks are described = in the report, which bring noticeable performance improvements but probably= make the implementation more complex.
I was also interested by the following remark --= - maybe a bit provocative:

Both JVM and CLR depend on profile guided optimizations for good performanc= e,
because their byte-code is slower than the OCaml byte-code. This is mostly = because of
the additional overhead that comes with exception handling, multi-threading= , class loading
and object synchronization. But it is also due to the fact that both the JV= M byte-code [27]
and the Common Intermediate Language (CIL) [10] instruction sets are not as= optimized
as the OCaml byte-code.



On Tue, Nov 16, 2010 at 3:52 PM, Benedikt Meurer = <benedikt.meurer@googl= email.com> wrote:
Hello everybody,

OCamlJit 2.0 is a new Just-In-Time engine for Objective Caml 3.12.0 on desk= top processors (x86/x86-64). It translates the OCaml byte-code used by the = interpreter (ocamlrun and ocaml) to x86/x86-64 native code on-demand and ru= ns the generated native code instead of interpreting the byte-code. It is d= esigned to run with minimal compilation overhead (translating only what is = being executed, avoiding costly code generation and optimization techniques= ), while being 100% compatible with the byte-code runtime (including serial= ization and hashing of closures, etc.).

OCamlJit 2.0 was specifically designed for desktop processors and is not re= ally portable to anything else in its current shape, because the target aud= ience are people using the interactive top-level and the byte-code interpre= ter for rapid prototyping/development (which is unlikely to happen on anyth= ing else but x86/x86-64). The implementation currently requires a system th= at adheres to the SysV ABI, which includes Linux, BSD, OS X, but excludes W= in32/Win64 (patches/ideas are welcome). It was tested on Linux/x86 (Debian)= , Linux/amd64 (CentOS) and Mac OS X 10.6 (64bit). The x86 implementation re= quires SSE2 capable processors (otherwise it falls back to the byte-code in= terpreter), so it won't speedup your OCaml programs running on 486 CPUs= . :-)

OCamlJit 2.0 runs most benchmarks at 2-6 times faster than the byte-code in= terpreter. The interactive top-level benefits twice when used with the JIT = engine: (a) the compiler stages are JIT compiled and (b) the generated byte= -code is JIT compiled. A tech report describing a slightly earlier prototyp= e and including performance measures of OCamlJit 2.0 on Mac OS X (64bit) is= available at:

=A0http://arxi= v.org/abs/1011.1783

The source code is available from the Git repository (master branch) at:
=A0h= ttp://gitorious.org/ocamljit2/ocamljit2

Installation is similar to installation of Objective Caml, just run

=A0./configure -prefix /path/to/ocamljit2 [options]

followed by

=A0make world opt
=A0make install

This will install a fully working Objective Caml 3.12.0 to /path/to/ocamlji= 2, where /path/to/ocamljit2/bin/ocamlrun and /path/to/ocamljit2/lib/libcaml= run_shared.so include the JIT engine in addition to the byte-code interpret= er (fallback to the byte-code interpreter is necessary for debugging with o= camldebug). The configure script prints a line indicating whether the JIT e= ngine is enabled or not (if not, it'll be just a regular OCaml 3.12 ins= tallation).

Comments are welcome.

Enjoy!

Benedikt Meurer

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.in= ria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

--0016367f9c1287adaf04952e971c--