From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <benedikt.meurer@googlemail.com>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104])
	by yquem.inria.fr (Postfix) with ESMTP id 0BC4BBC57
	for <caml-list@yquem.inria.fr>; Tue, 30 Nov 2010 23:48:23 +0100 (CET)
Received: from mail-ew0-f52.google.com ([209.85.215.52])
  by mail3-smtp-sop.national.inria.fr with ESMTP; 30 Nov 2010 23:48:22 +0100
Received: by ewy23 with SMTP id 23so2962771ewy.39
        for <caml-list@yquem.inria.fr>; Tue, 30 Nov 2010 14:48:22 -0800 (PST)
Received: by 10.14.127.2 with SMTP id c2mr6615912eei.3.1291157302243;
        Tue, 30 Nov 2010 14:48:22 -0800 (PST)
Received: from bespin.kosmos.all (ip-95-223-170-32.unitymediagroup.de [95.223.170.32])
        by mx.google.com with ESMTPS id q58sm6821658eeh.21.2010.11.30.14.48.21
        (version=SSLv3 cipher=RC4-MD5);
        Tue, 30 Nov 2010 14:48:21 -0800 (PST)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Apple Message framework v1081)
Subject: Re: [Caml-list] OCamlJIT2 vs. OCamlJIT
From: Benedikt Meurer <benedikt.meurer@googlemail.com>
In-Reply-To: <0a8b01cb90da$da5e6240$8f1b26c0$@com>
Date: Tue, 30 Nov 2010 23:48:19 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <5E2DA3F1-7998-4F62-B617-7B6451D1001D@googlemail.com>
References: <3DCEA910-1382-47E5-876B-059178F8F82E@googlemail.com>	<20101130124803.7952fca1@deb0>	<D6C274D9-8A00-4D6F-936A-58206CA5D358@googlemail.com>	<AANLkTi=Wu1SXVgQ+X0NECNx3oUsD=xEy-9Zq6T3ncJ+W@mail.gmail.com> <B8C055A1-2A67-4C7E-BF67-4A639619C4EB@wanadoo.fr> <0a8b01cb90da$da5e6240$8f1b26c0$@com>
To: caml-list@yquem.inria.fr
X-Mailer: Apple Mail (2.1081)


On Nov 30, 2010, at 23:06 , Jon Harrop wrote:

> Because benchmarks like my HLVM ones have proven that LLVM can generate
> *much* faster code than ocamlopt does.
>
> For example, Fibonacci function over floats in HLVM with optimization passes
> disabled and compilation time included in the measurement:
>
> # let rec fib (x: float) : float =
>    if x < 1.5 then x else fib(x - 1.0) + fib(x - 2.0);;
> # fib 40.0;;
> - : `Float = 1.02334e+08
> Live: 0
> 2.48074s total; 0s suspend; 0s mark; 0s sweep
>
> And ocamlopt without compilation time:
>
> $ cat >fib.ml
> let rec fib x = if x < 1.5 then x else fib(x -. 1.0) +. fib(x -. 2.0);;
> fib 40.0;;
> $ ocamlopt fib.ml -o fib
> $ time ./fib
>
> real    0m7.811s
> user    0m7.808s
> sys     0m0.000s
>
> Note that HLVM's *REPL* is over 3x faster than ocamlopt-compiled OCaml.

This has nothing to do with LLVM; it is simply because your code does
not box the float parameters and results. The following piece of C code
is most probably even faster than your code, so what?

double fib(double x) { if (x < 1.5) return x; else return fib(x-1) + fib(x-2); }

So this is about data representation, not code generation (using LLVM
with boxed floats would result in the same or lower performance). HLVM
ignores complex stuff like polymorphism, modules, etc. (at least last
time I checked), which makes code generation almost trivial. The
literature includes various solutions for implementing things like ML
polymorphism: tagged integers/boxed floats/objects is just one solution,
not necessarily the best. But examples that simply ignore the complex
stuff, and therefore deliver better performance, don't really help to
make progress.

A possible solution to improve ocamlopt's performance in this case would
be to compile the recursive fib calls so that the parameter and result
are passed in a floating point register (not boxed), and to provide a
wrapper for calls from outside, which unboxes the parameter, invokes the
actual function code, and boxes the result. This should be doable at the
Ulambda/Cmm level, so it is actually quite portable and completely
independent of the low-level code generation (which is where LLVM comes
into play). That way ocamlopt code would be as fast as the C code for
this example.
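
A minimal sketch of that worker/wrapper split, written in C rather than
at the Ulambda/Cmm level. The names boxed_float, box, fib_unboxed and
fib are hypothetical, and a malloc'ed cell is only a stand-in for a
boxed OCaml float (the real runtime allocates on its own heap):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a boxed float: a heap cell holding a double. */
typedef struct { double value; } boxed_float;

/* Allocate a fresh cell for d (assumption: malloc models the allocator). */
boxed_float *box(double d) {
    boxed_float *b = malloc(sizeof *b);
    assert(b != NULL);
    b->value = d;
    return b;
}

/* Worker: the recursive calls pass and return raw doubles, so nothing
   is allocated inside the recursion. */
double fib_unboxed(double x) {
    return x < 1.5 ? x : fib_unboxed(x - 1.0) + fib_unboxed(x - 2.0);
}

/* Wrapper for calls from outside: unbox the parameter, invoke the
   worker, and box the result exactly once. */
boxed_float *fib(const boxed_float *x) {
    return box(fib_unboxed(x->value));
}
```

An outside caller only ever sees fib and the boxed representation; the
cost of boxing is paid once per external call instead of once per
recursive call.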

> I have microbenchmarks where LLVM generates code over 10x faster than
> ocamlopt (specifically, floating point code on x86) and larger numerical
> programs that also wipe the floor with OCaml.

ocamlopt's x86 floating point backend is easy to beat, as demonstrated
in the original post. Even a simple byte-code JIT engine (which still
boxes floating point results, etc.) is able to beat it.

Your benchmarks do not prove that LLVM generates faster code than
ocamlopt. Instead, you proved that OCaml's floating point representation
comes at a cost for number crunching applications (which is obvious).
Use the same data representation with LLVM (or C) and you'll notice that
the performance is the same as (or, most likely, worse than) ocamlopt's.
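
To make "the same data representation" concrete, here is a rough C
sketch in which every argument and result of fib lives in a heap cell.
The names float_cell, new_cell and fib_boxed are hypothetical, and
malloc/free is only an assumed stand-in for OCaml's allocator and GC,
so this illustrates the shape of the work, not its exact cost:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical boxed float: one heap cell per value. */
typedef struct { double value; } float_cell;

/* Allocate a cell holding d (assumption: malloc models allocation). */
float_cell *new_cell(double d) {
    float_cell *c = malloc(sizeof *c);
    assert(c != NULL);
    c->value = d;
    return c;
}

/* Fully boxed fib: each argument and each result is a fresh cell, so
   every recursive step allocates and dereferences memory. */
float_cell *fib_boxed(const float_cell *x) {
    if (x->value < 1.5)
        return new_cell(x->value);
    float_cell *a = new_cell(x->value - 1.0);   /* boxed argument x-1 */
    float_cell *b = new_cell(x->value - 2.0);   /* boxed argument x-2 */
    float_cell *ra = fib_boxed(a);
    float_cell *rb = fib_boxed(b);
    float_cell *r = new_cell(ra->value + rb->value);
    free(a); free(b); free(ra); free(rb);       /* stand-in for the GC */
    return r;
}
```

Compared with the plain double version, the arithmetic is identical;
only the representation of the values differs.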

> LLVM is also much better documented than ocamlopt's internals.

Definitely, but LLVM replaces just the low-level stuff of ocamlopt (most
probably starting at the Cmm level), so a lot of (undocumented) ocamlopt
internals remain.

> Cheers,
> Jon.

regards,
Benedikt