Subject: Re: ocamlopt LLVM support (Was: [Caml-list] OCamlJIT2 vs. OCamlJIT)
From: Benedikt Meurer
Date: Sun, 5 Dec 2010 22:21:37 +0100
To: caml-list@yquem.inria.fr

On Dec 5, 2010, at 21:12 , Jon Harrop wrote:

> Benedikt wrote:
>> On Dec 3, 2010, at 21:07 , Jon Harrop wrote:
>>> Benedikt wrote:
>>>> So, let's take that LLVM thing to a somewhat more serious level (which
>>>> means no more arguing with toy projects that simply ignore the
>>>> difficult stuff).
>>>
>>> You can learn a lot from prototypes.
>>
>> Indeed, but you need to stay on topic. HLVM does not prototype the use of
>> LLVM within OCaml, because you're using a quite different data
>> representation.
>
> My point was that HLVM's data representation is far more flexible than
> ocamlopt's, so you can use it to measure the effect of changing data
> representations on the efficiency of the code LLVM generates more easily
> than with ocamlopt.
>
> In particular, I am saying that (from my own measurements) LLVM does not
> cope with data representations like ocamlopt's at all well, specifically
> when you box and cast away type information. My guess is that retargeting
> ocamlopt to LLVM IR (which is a considerable undertaking due to the amount
> of hacking required on LLVM itself) would result in more performance
> degradation than improvement. Moreover, that degradation is most likely to
> occur on code like Coq, because ocamlopt has been so heavily optimized for
> that one application domain, and consequently I doubt your hard work would
> ever be accepted into mainstream OCaml.
>
> Ultimately, LLVM was built specifically to exploit a typed intermediate
> representation, whereas ocamlopt removes type information very early.

From what I've seen so far, LLVM makes little use of that type information;
in particular, integer and pointer types are treated nearly the same, so
LLVM won't benefit from it. Any other data is still distinguished at the
Cmm level (i.e. doubles, strings, ...), which is necessary for the register
allocation and instruction selection phases in ocamlopt. You could easily
change the OCaml compiler's intermediate representations (lambda, ulambda,
cmm) to include additional type information, so that is not the problem.
But as said, it won't help a lot with LLVM.
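For readers who have not looked at the runtime, here is a minimal C sketch
of the uniform value representation being discussed; it is modeled on the
macros in the OCaml runtime's caml/mlvalues.h, but simplified for
illustration (it is not the actual header):

    /* One machine word per OCaml value: either a tagged immediate integer
       (low bit set) or a pointer to a heap block (low bit clear). */
    #include <stdint.h>
    #include <stdio.h>

    typedef intptr_t value;

    #define Val_long(x)  (((value)(x) << 1) | 1)  /* n is stored as 2n+1   */
    #define Long_val(v)  ((v) >> 1)
    #define Is_long(v)   (((v) & 1) != 0)         /* immediate integer     */
    #define Is_block(v)  (((v) & 1) == 0)         /* pointer into the heap */

    int main(void)
    {
      value a = Val_long(21), b = Val_long(2);
      /* (2a+1) + (2b+1) - 1 == 2(a+b)+1: tagged addition costs one extra
         subtraction, the "subq $1" discussed further down in this thread. */
      value sum = a + b - 1;
      printf("%ld\n", (long)Long_val(sum));       /* prints 23 */
      return 0;
    }

This single uniform word is what a hypothetical ocamlopt-to-LLVM backend
would have to describe to LLVM, which is the "box and cast away type
information" situation mentioned above.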
As for the GHC work you mentioned: they also start at the Cmm level (a
slightly different C--, but comparable to ocamlopt's), with mostly the same
amount of type information available. And as you said, they did quite well
in some cases.

>> Using a different data representation within OCaml would indeed be
>> possible, but one has to ask what it's worth? You'd get better floating
>> point performance (most likely)
>
> And faster tuples, ints, chars, complex numbers, low-dimensional
> vectors/matrices, hash tables and so on. More types (e.g. int16 and
> float32). Even arbitrary-precision arithmetic can benefit by keeping small
> numbers unboxed when possible. Bigarrays disappear. The FFI gets simpler,
> easier to use and more powerful (e.g. you can generate AoS layouts for
> OpenGL). The benefits are enormous in the general case but that is beside
> the point here. Moreover, this would never be accepted either because it
> would degrade the performance of applications like Coq. If it is done at
> all, such work must be kept separate from the OCaml compilers.

ocamlopt already supports float32 and int16, though they are not exposed to
the upper layers (I don't know why, but then I also don't know why one
would need them). Keeping Int32/Int64 values unboxed would be possible,
similar to what is done with doubles; maybe there is simply no need to do
so (honestly, I've never needed Int32/Int64 so far; int/Nativeint is
usually what you want, and those are optimized).

>> 3. The current garbage collector is mostly straight-forward because of
>> the data representation. No need to carry around/load type information,
>
> Note that conveying run-time type information also facilitates useful
> features like generic printing.

Generic printing is currently implemented using the run-time type
information (the three bits of information mentioned below).

>> you just need the following bits of information: Is it a block in the
>> heap? Does it contain pointers? If so, how many? All this information is
>> immediately available with the current data representation (also
>> improves cache locality of the GC loops).
>
> That information is also immediately available with HLVM's design. Each
> value contains a pointer to its run-time type. Each run-time type contains
> a pointer to a "visit" function for the GC that returns the references in
> any value of that type.

This is indeed possible, yes, and it is what most Java VMs implement: each
object ships a type-info pointer. OCaml avoids the type-info pointer by
using clever header field encoding. A matter of taste, probably.
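To make the "clever header field encoding" concrete: the questions above
("does it contain pointers, and how many?") are answered from the one-word
header that precedes every heap block. The following is a simplified C
sketch modeled on the 64-bit layout in caml/mlvalues.h; the macro names
mirror the real ones, but the sketch is illustrative, not the actual
runtime code:

    /* Block header, 64-bit layout: [ wosize | 2-bit GC color | 8-bit tag ] */
    #include <stdint.h>
    #include <stdio.h>

    typedef uintptr_t header_t;

    #define No_scan_tag   251           /* strings, float arrays, custom blocks, ... */
    #define Tag_hd(hd)    ((unsigned)((hd) & 0xFF))
    #define Color_hd(hd)  ((unsigned)(((hd) >> 8) & 3))
    #define Wosize_hd(hd) ((hd) >> 10)  /* number of fields, in words */

    /* Everything the GC loops need, with no per-object type-info pointer. */
    static void describe(header_t hd)
    {
      printf("size = %lu words, tag = %u: %s\n",
             (unsigned long)Wosize_hd(hd), Tag_hd(hd),
             Tag_hd(hd) < No_scan_tag ? "fields may be pointers, scan them"
                                      : "no pointers inside, skip");
    }

    int main(void)
    {
      describe(((header_t)3 << 10) | 0);    /* e.g. a 3-field tuple/record */
      describe(((header_t)4 << 10) | 252);  /* e.g. a string of 4 words    */
      return 0;
    }

Whether a word is a block at all is decided by the tag bit shown in the
earlier sketch, so the header plus the tag bit cover exactly the three
pieces of information listed above.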
>> So, is it really worth spending years on a new data representation
>> (including fixing all C bindings, etc.)? Just to get better floating
>> point performance? Integer performance will be almost the same:
>> avoiding the shift/tag bit just replaces an "addq r1, r2; subq $1, r2"
>> with "addq r1, r2"; even doing this thousands of times will not cause a
>> noticeable difference.
>
> On the contrary, there are many examples of huge performance gains other
> than floating point. The int-based random number generator from the
> SciMark2 benchmark is 6x faster with LLVM than with OCaml. Generic hash
> tables are 17x faster in F# than Java because of value types. And so on.

I doubt that this is really related to the use of "value types", and I also
doubt that the integer performance improvements are related to the missing
"subq" (that must have been an unusual CPU, then). The Java performance is
probably related to the GC in most JVMs, which is not as well optimized for
high-speed allocation and compaction as the GCs found in the runtimes of
most functional programming languages. But this is just guessing; we'd need
some facts to be sure of the cause. I suggest you present some
proof-of-concept code in C or LLVM, simply using malloc() for memory
allocation, comparing a straight-forward implementation to a "value type"
implementation (with everything else the same). If there is still a 17x
improvement, I promise to rewrite the whole OCaml system during the next
three months to support value types.

> Cheers,
> Jon.

greets,
Benedikt
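For reference, here is a minimal C sketch of the kind of comparison
proposed above: the same computation over a boxed layout (an array of
pointers to individually malloc'd cells, roughly what ocamlopt does for
tuples of floats) and a flat "value type" layout (one contiguous array of
structs). It is a hypothetical starting point, not code from this thread;
timing, warm-up and allocator effects are left to whoever runs it:

    #include <stdlib.h>
    #include <stdio.h>

    #define N 1000000

    typedef struct { double re, im; } complex_t;

    int main(void)
    {
      /* boxed: one malloc per element, an extra indirection on every access */
      complex_t **boxed = malloc(N * sizeof *boxed);
      /* "value type": a single contiguous allocation, no indirection */
      complex_t *flat = malloc(N * sizeof *flat);

      for (size_t i = 0; i < N; i++) {
        boxed[i] = malloc(sizeof *boxed[i]);
        boxed[i]->re = flat[i].re = (double)i;
        boxed[i]->im = flat[i].im = 1.0;
      }

      /* identical computation over both layouts; time the two loops */
      double s1 = 0.0, s2 = 0.0;
      for (size_t i = 0; i < N; i++) s1 += boxed[i]->re * boxed[i]->im;
      for (size_t i = 0; i < N; i++) s2 += flat[i].re * flat[i].im;
      printf("%f %f\n", s1, s2);

      for (size_t i = 0; i < N; i++) free(boxed[i]);
      free(boxed);
      free(flat);
      return 0;
    }

Whatever difference such a test shows between the two loops is attributable
to the layout alone rather than to the GC, which is the separation of
causes argued for above.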