From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <benedikt.meurer@googlemail.com>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82])
	by yquem.inria.fr (Postfix) with ESMTP id AEC6CBBAF
	for <caml-list@yquem.inria.fr>; Sun,  5 Dec 2010 17:37:35 +0100 (CET)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApYBAHtO+0zRVdY0gWdsb2JhbACjJwgVAQELCQoHEwQeo0uMAgEFjQoBBIIUgzU
X-IronPort-AV: E=Sophos;i="4.59,302,1288566000"; 
   d="scan'208";a="91557598"
Received: from mail-bw0-f52.google.com ([209.85.214.52])
  by mail1-smtp-roc.national.inria.fr with ESMTP; 05 Dec 2010 17:37:35 +0100
Received: by bwz4 with SMTP id 4so9744480bwz.39
        for <caml-list@yquem.inria.fr>; Sun, 05 Dec 2010 08:37:34 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=googlemail.com; s=gamma;
        h=domainkey-signature:received:received:content-type:mime-version
         :subject:from:in-reply-to:date:content-transfer-encoding:message-id
         :references:to:x-mailer;
        bh=RmcrcHP4KAaVpWZhHmzRvi7AxFYiIu9ctf53gIkIoKE=;
        b=BSQm2E/Q6oOtAlosbJiD7mUfC2IZOzE00HyuJqHHwyvmi1R0rHxv+v7dQxRXAteGpn
         NvFgbD8O73k1zMXGxPMZyx1JabdL3Qstjef/AXwyi9chFpYz9k6aLcj/bdk9VtPcITpT
         8oeR94NDLYQrt6QXiMMWYreZ9Bm6PWZZW9GKQ=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=googlemail.com; s=gamma;
        h=content-type:mime-version:subject:from:in-reply-to:date
         :content-transfer-encoding:message-id:references:to:x-mailer;
        b=hV1ttCnZcfxyUINdXj/pqqImBYVNTTVHF1HYiOZe7EE9cJm+3vZ7OvwGjkaX4d5Ff/
         vDWVdnwAIjgOSPMU0bvpsMJKVduYC70x0CXU6yphYd5nUhm3KG9XQkJp0b9BJocqD8mi
         Gki+0KhzPxEPNY7qL2VUxWW9Mn6qsuBhAxR+k=
Received: by 10.204.112.143 with SMTP id w15mr4669412bkp.214.1291567054743;
        Sun, 05 Dec 2010 08:37:34 -0800 (PST)
Received: from bespin.kosmos.all (ip-95-223-170-32.unitymediagroup.de [95.223.170.32])
        by mx.google.com with ESMTPS id b17sm1999651bku.20.2010.12.05.08.37.33
        (version=SSLv3 cipher=RC4-MD5);
        Sun, 05 Dec 2010 08:37:33 -0800 (PST)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Apple Message framework v1081)
Subject: Re: ocamlopt LLVM support (Was: [Caml-list] OCamlJIT2 vs. OCamlJIT)
From: Benedikt Meurer <benedikt.meurer@googlemail.com>
In-Reply-To: <0cae01cb9325$a8ddc3d0$fa994b70$@com>
Date: Sun, 5 Dec 2010 17:37:32 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <CC528935-50A4-4A94-849B-9DAD160D46A3@googlemail.com>
References: <3DCEA910-1382-47E5-876B-059178F8F82E@googlemail.com>	<20101130124803.7952fca1@deb0>	<D6C274D9-8A00-4D6F-936A-58206CA5D358@googlemail.com>	<AANLkTi=Wu1SXVgQ+X0NECNx3oUsD=xEy-9Zq6T3ncJ+W@mail.gmail.com>	<B8C055A1-2A67-4C7E-BF67-4A639619C4EB@wanadoo.fr>	<0a8b01cb90da$da5e6240$8f1b26c0$@com>	<5E2DA3F1-7998-4F62-B617-7B6451D1001D@googlemail.com>	<0b3b01cb9161$a81c8e10$f855aa30$@com>	<B965CE04-4059-421F-A8CD-EB178BF32D40@googlemail.com>	<0b9301cb91a3$8f42fd60$adc8f820$@com> <AB3E2F68-0E5B-4B4F-A4F8-3C67BD6F598E@googlemail.com> <0cae01cb9325$a8ddc3d0$fa994b70$@com>
To: caml-list@yquem.inria.fr
X-Mailer: Apple Mail (2.1081)
X-Spam: no; 0.00; ocamlopt:01 ocaml:01 ocaml:01 compiler:01 bytecode:01 coq:01 pointers:01 locality:01 marshalling:01 ffi:01 bindings:01 recursive:01 trivial:01 model:01 integers:01 


On Dec 3, 2010, at 21:07 , Jon Harrop wrote:

> Benedikt wrote:
>> So, let's take that LLVM thing to a somewhat more serious level =
(which
>> means no more arguing with toy projects that simply ignore the
>> difficult stuff).
>=20
> You can learn a lot from prototypes.

Indeed, but you need stay on topic. HLVM does not prototype the use of =
LLVM within OCaml, because you're using a quite different data =
representation.

Using a different data representation within OCaml would indeed be =
possible, but one has to ask what it's worth? You'd get better floating =
point performance (most likely) and possibly also gain from a few LLVM =
optimization passes. But on the other hand:

1. You will have to rewrite not only the compiler, the standard library =
and the bytecode interpreter (already a massive amount of work), but you =
also loose compatibility with almost every existing OCaml library =
binding, since the native function interface will be different. That's a =
hell of a lot of work for many people.

2. The current data representation is already close to optimal for =
symbolic computation, so there's is not much to gain here. The most =
complex applications use OCaml for symbolic computation (i.e. Coq), so =
they will not benefit (if at all, see 3.).

3. The current garbage collector is mostly straight-forward because of =
the data representation. No need to carry around/load type information, =
you just need the following bits of information: Is it a block in the =
heap? Does it contain pointers? If so how many? All this information is =
immediately available with the current data representation (also =
improves cache locality of the GC loops). Even if the current GC is =
replaced with something more recent (which I'm sure is worth it), the =
new GC will also be easy to implement because of the data =
representation.

4. Marshalling/unmarshalling will also be touched (and probably become =
more complex), loosing even more compatibility (less of an issue than =
the FFI, but nevertheless something to take care of).

So, it is really worth to spend years on a new data representation =
(including fixing all C bindings, etc.)? Just to get better floating =
point performance? Integer performance will be almost the same, avoiding =
the shift/tag bit just replaces an "addq r1, r2; subq $1, r2" with "addq =
r1, r2"; even doing this thousands of times will not cause a noticeable =
difference.

I already mentioned a few simple ways to improve floating point =
performance w/o the need to throw away most of the code base. I.e. just =
compile (tail-)recursive floating point functions in a way that avoids =
the boxing for the (tail-)calls. If this seems too complex, you could =
use an even simpler trick: Just compile every floating point function =
twice, one "generic version", which accepts "value"s (for use in =
closures, on parameter position, etc.), and one "float version" which =
takes non-boxed doubles. Calling the appropriate version is trivial, =
just pass the type information down to the Cmm level. That way =
everything will continue to work, no need to mess up everything. And you =
can also apply your LLVM optimizations.

>> 3. A better LLVM. If we do it right, other's implementing functional
>> languages using LLVM will not be faced with the same problems again.
>> Dunno if that's possible.
>=20
> What exactly are you thinking of that goes beyond what the other =
functional
> languages that already target LLVM have already had to implement?

Two main areas for now: The GC interface and the exception handling. =
LLVM's exception support is really limited; the GC support is better and =
more generic. I don't know how to implement the OCaml exception model =
within LLVM w/o adding a lot of special case stuff to LLVM itself =
(mentioned in my original post); if there would be a way to easily =
extend LLVM with special exception models, other projects could plug in =
their models.

For the GC interface. =46rom what I know right now, the "ocaml" GC =
plugin generates the frametables and the special symbols. But several =
details are left to LLVM, which don't seem to be handled properly: For =
example, there does not seem to be a way to tell LLVM how to =
spill/reload; in case of OCaml the stack cells marked as gcroots must =
not contain "untagged integers". I'm still trying to figure out what are =
the exact lowering mechanisms involved, so it may already be possible - =
we shall see.

The other thing I noticed so far: LLVM does not seem to eliminate stores =
to gcroots (probably required for Java, dunno). In case of ocamlopt, =
this would generate a lot of unnecessary stores. ocamlopt avoids this =
because it has knowledge not available to LLVM, i.e. when calling a =
native C floating point primitive (sin, cos, ...), ocamlopt knows that =
i.e. %r12 is preserved and the primitive does not allocate or raise so =
no need to spill the value to the stack cell, LLVM does not know this =
(and there's no way to tell, yet) so it has to spill and reload all =
(possible) gcroots.

You can think of various similar examples, where ocamlopt generates =
better code, simply because it "knows" OCaml (and even tho it lacks =
complex data/control analysis), whereas LLVM is generic (not necessarily =
a bad thing) but lacks ways to "learn" OCaml (or other language specific =
stuff). Adding more ways to tell LLVM about certain constraints will =
also be beneficial to other projects (not only functional programming =
languages...).

> Cheers,
> Jon.

Benedikt=