From: Jon Harrop
To: 'Benedikt Meurer', caml-list@yquem.inria.fr
Subject: RE: ocamlopt LLVM support (Was: [Caml-list] OCamlJIT2 vs. OCamlJIT)
Date: Sun, 5 Dec 2010 20:12:31 -0000
Organization: Flying Frog Consultancy

Benedikt wrote:

> On Dec 3, 2010, at 21:07, Jon Harrop wrote:
> > Benedikt wrote:
> >> So, let's take that LLVM thing to a somewhat more serious level (which
> >> means no more arguing with toy projects that simply ignore the
> >> difficult stuff).
> >
> > You can learn a lot from prototypes.
>
> Indeed, but you need to stay on topic. HLVM does not prototype the use of
> LLVM within OCaml, because you're using a quite different data
> representation.
My point was that HLVM's data representation is far more flexible than ocamlopt's, so it is much easier to use HLVM than ocamlopt to measure how a change of data representation affects the efficiency of the code LLVM generates. In particular, my own measurements indicate that LLVM does not cope at all well with data representations like ocamlopt's, specifically when values are boxed and their type information is cast away. My guess is that retargeting ocamlopt to LLVM IR (a considerable undertaking, due to the amount of hacking required on LLVM itself) would result in more performance degradation than improvement. Moreover, that degradation is most likely to show up on code like Coq, because ocamlopt has been so heavily optimized for that one application domain, so I doubt your hard work would ever be accepted into mainstream OCaml. Ultimately, LLVM was built specifically to exploit a typed intermediate representation, whereas ocamlopt removes type information very early.

> Using a different data representation within OCaml would indeed be
> possible, but one has to ask what it is worth. You'd get better floating
> point performance (most likely)

And faster tuples, ints, chars, complex numbers, low-dimensional vectors/matrices, hash tables and so on. More types (e.g. int16 and float32). Even arbitrary-precision arithmetic can benefit by keeping small numbers unboxed when possible. Bigarrays disappear. The FFI gets simpler, easier to use and more powerful (e.g. you can generate AoS layouts for OpenGL). The benefits are enormous in the general case, but that is beside the point here. Moreover, this would never be accepted either, because it would degrade the performance of applications like Coq. If it is done at all, such work must be kept separate from the OCaml compilers.

> and possibly also gain from a few LLVM optimization passes.

I found that LLVM's optimization passes are of little value in this context. You want to do the optimization passes on the OCaml side (where it is vastly easier, if nothing else) and focus on emitting high-quality LLVM IR for LLVM's code generator to work with.

> But on the other hand:
>
> 1. You will have to rewrite not only the compiler, the standard library
> and the bytecode interpreter (already a massive amount of work), but
> you also lose compatibility with almost every existing OCaml library
> binding, since the native function interface will be different. That's
> a hell of a lot of work for many people.

Yes. That would be revolution rather than evolution. I cannot see any easy way around it, and I also advise against it.

> 2. The current data representation is already close to optimal for
> symbolic computation, so there is not much to gain here. The most
> complex applications use OCaml for symbolic computation (e.g. Coq), so
> they will not benefit (if at all, see 3.).

Yes.

> 3. The current garbage collector is mostly straightforward because of
> the data representation. No need to carry around/load type information,

Note that conveying run-time type information also facilitates useful features like generic printing.

> you just need the following bits of information: Is it a block in the
> heap? Does it contain pointers? If so, how many? All this information is
> immediately available with the current data representation (which also
> improves the cache locality of the GC loops).

That information is also immediately available with HLVM's design. Each value contains a pointer to its run-time type, and each run-time type contains a pointer to a "visit" function for the GC that returns the references in any value of that type.
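To make that concrete, here is a minimal OCaml sketch of the shape of that design (the type and field names are mine, invented for illustration; they are not HLVM's actual definitions):

  (* Toy sketch only: every heap value carries a pointer to its run-time
     type, and every run-time type carries a "visit" function that
     enumerates the GC references inside a value of that type. *)
  type value = { rtti : rtti; payload : payload }

  and payload =
    | Int  of int                  (* scalar: contains no references     *)
    | Pair of value * value        (* aggregate: contains two references *)

  and rtti = {
    name  : string;                (* also enables generic printing      *)
    visit : value -> value list;   (* references the GC must trace       *)
  }

  let int_rtti  = { name = "int";  visit = (fun _ -> []) }

  let pair_rtti =
    { name = "pair";
      visit = (fun v ->
        match v.payload with Pair (a, b) -> [a; b] | _ -> []) }

The collector needs no per-word tag bits in such a scheme: given a root, it follows the value's run-time type to its visit function to find the references to trace, and the same run-time type is what a generic printer would dispatch on.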
> Even if the current GC is replaced with something more recent (which I'm
> sure is worth it), the new GC will also be easy to implement because of
> the data representation.

I'm not sure that would be any easier than with HLVM's design.

> 4. Marshalling/unmarshalling will also be touched (and probably become
> more complex), losing even more compatibility (less of an issue than
> the FFI, but nevertheless something to take care of).

Yes. I'd be amazed if anyone cared about that, though.

> So, is it really worth spending years on a new data representation
> (including fixing all C bindings, etc.) just to get better floating
> point performance? Integer performance will be almost the same:
> avoiding the shift/tag bit just replaces an "addq r1, r2; subq $1, r2"
> with "addq r1, r2", and even doing this thousands of times will not
> cause a noticeable difference.

On the contrary, there are many examples of huge performance gains other than floating point. The int-based random number generator from the SciMark2 benchmark is 6x faster with LLVM than with OCaml. Generic hash tables are 17x faster in F# than in Java because of value types. And so on.

> I already mentioned a few simple ways to improve floating point
> performance without throwing away most of the code base, e.g. compile
> (tail-)recursive floating point functions in a way that avoids the
> boxing for the (tail-)calls. If this seems too complex, you could use
> an even simpler trick: just compile every floating point function
> twice, one "generic" version, which accepts "value"s (for use in
> closures, in parameter position, etc.), and one "float" version, which
> takes unboxed doubles. Calling the appropriate version is trivial: just
> pass the type information down to the Cmm level. That way everything
> will continue to work, no need to mess up everything. And you can also
> apply your LLVM optimizations.

Yes.
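As a rough source-level illustration of that two-versions idea (the function names and the boxed stand-in type are mine; in the real compiler both versions would be generated automatically and the unboxed one selected at call sites where the float types are visible):

  (* Sketch only: [boxed] stands in for ocamlopt's universal value
     representation of a heap-allocated float. *)
  type boxed = Boxed of float

  (* "Float" version: raw doubles in registers, no allocation per
     (tail-)call. *)
  let rec iterate_unboxed n (x : float) : float =
    if n = 0 then x else iterate_unboxed (n - 1) (x *. 0.5 +. 1.0)

  (* "Generic" version: accepts and returns the boxed representation, so it
     can still be stored in closures, passed in parameter position, and so
     on. It just unwraps, runs the fast loop, and rewraps. *)
  let iterate_generic n (Boxed x) : boxed =
    Boxed (iterate_unboxed n x)

Direct calls whose float types are known down at the Cmm level would go to the unboxed version; everything else keeps using the generic one, so the data representation and the FFI are left untouched.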
> >> 3. A better LLVM. If we do it right, others implementing functional
> >> languages using LLVM will not be faced with the same problems again.
> >> Dunno if that's possible.
> >
> > What exactly are you thinking of that goes beyond what the other
> > functional languages that already target LLVM have already had to
> > implement?
>
> Two main areas for now: the GC interface and the exception handling.
> LLVM's exception support is really limited; the GC support is better
> and more generic. I don't know how to implement the OCaml exception
> model within LLVM without adding a lot of special-case stuff to LLVM
> itself (mentioned in my original post); if there were a way to easily
> extend LLVM with special exception models, other projects could plug
> in their models.
>
> For the GC interface: from what I know right now, the "ocaml" GC plugin
> generates the frametables and the special symbols. But several details
> are left to LLVM which don't seem to be handled properly. For example,
> there does not seem to be a way to tell LLVM how to spill/reload; in
> the case of OCaml, the stack cells marked as gcroots must not contain
> "untagged integers". I'm still trying to figure out the exact lowering
> mechanisms involved, so it may already be possible - we shall see.
>
> The other thing I noticed so far: LLVM does not seem to eliminate
> stores to gcroots (probably required for Java, dunno). In the case of
> ocamlopt, this would generate a lot of unnecessary stores. ocamlopt
> avoids this because it has knowledge not available to LLVM: when
> calling a native C floating point primitive (sin, cos, ...), ocamlopt
> knows that, say, %r12 is preserved and that the primitive does not
> allocate or raise, so there is no need to spill the value to its stack
> cell. LLVM does not know this (and there is no way to tell it, yet), so
> it has to spill and reload all (possible) gcroots.
>
> You can think of various similar examples where ocamlopt generates
> better code simply because it "knows" OCaml (even though it lacks
> complex data/control analysis), whereas LLVM is generic (not
> necessarily a bad thing) but lacks ways to "learn" OCaml (or other
> language-specific stuff). Adding more ways to tell LLVM about certain
> constraints will also be beneficial to other projects (not only
> functional programming languages...).

Yes. I suspect the effort required to bring such features up to par with ocamlopt will be pretty huge as well, though.

Cheers,
Jon.