Re: Value types (Was: [Caml-list] ocamlopt LLVM support)

caml-list - the Caml user's mailing list

From: Benedikt Meurer <benedikt.meurer@googlemail.com>
To: caml-list@yquem.inria.fr
Subject: Re: Value types (Was: [Caml-list] ocamlopt LLVM support)
Date: Sun, 12 Dec 2010 20:09:00 +0100
Message-ID: <9264BEE6-DBAE-4523-93AC-4560615D2AC5@googlemail.com>
In-Reply-To: <20101212175524.73a8e285@deb0>

[-- Attachment #1: Type: text/plain, Size: 5708 bytes --]


On Dec 12, 2010, at 16:55 , Török Edwin wrote:

> On Sun, 12 Dec 2010 14:54:14 -0000
> "Jon Harrop" <jon@ffconsultancy.com> wrote:
> 
>> The Haskell guys got their best performance improvement moving to
>> LLVM from the hailstone benchmark so it is interesting to examine
>> this case as well. I just added support for 64-bit ints to HLVM to
>> implement that benchmark and my code is:
>> 
>> Here’s the equivalent OCaml code:
>> 
>>  let rec collatzLen(c, n) : int =
>>    if n = 1L then c else
>>      collatzLen (c+1, if Int64.rem n 2L = 0L then Int64.div n 2L else
> 
> OK, but boxing itself has nothing to do with the performance degradation
> here. It is the lack of compiler optimizations on the Int64 type. This
> could be solved by implementing compiler optimizations for it (or
> manually optimizing some integer arithmetic that is slow).
> 
> Let's run the code under a profiler, or look at the assembly (I used
> 'perf record' and 'perf report'). 2 'idiv' instructions turn up as top
> offenders in the profile.
> 
> Problem #1: Int64.rem n 2 -> another idiv instruction
> 
> A C compiler would optimize this to an 'and' instruction.
> Change that to 'Int64.logand n 1L = 0L'.
> 
> Problem #2: Int64.div n 2 -> idiv instruction. 
> 
> A C compiler would optimize this to a right shift. Changing that to 'Int64.shift_right n 1' speeds
> up the code.

This is easy to fix in ocamlopt (see attached patch ocamlopt-natint.patch) by applying the same optimizations already used for constant ints to constant natints (Int64 is Nativeint on 64-bit targets). Note, however, that "mod 2" is not really "and 1", and "div 2" is not equivalent to "lsr 1" either; those equivalences hold only for unsigned arithmetic (it doesn't matter in your example, tho).
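To make the signed/unsigned caveat concrete, here is a small sketch using only the standard Int64 module. The helper names is_even_rem and is_even_and are made up for this illustration. It shows that the values differ for a negative operand, while the "= 0L" parity test from the example comes out the same either way.

```ocaml
(* Parity tests: these two agree for every int64, negative or not. *)
let is_even_rem (n : int64) = Int64.rem n 2L = 0L
let is_even_and (n : int64) = Int64.logand n 1L = 0L

let () =
  let n = -7L in
  (* Remainder keeps the sign of the dividend; the bit mask does not. *)
  Printf.printf "rem: %Ld  logand: %Ld\n" (Int64.rem n 2L) (Int64.logand n 1L);
  (* Division truncates toward zero; the arithmetic shift rounds down,
     and the logical shift treats the bits as an unsigned number. *)
  Printf.printf "div: %Ld  asr: %Ld  lsr: %Ld\n"
    (Int64.div n 2L) (Int64.shift_right n 1) (Int64.shift_right_logical n 1);
  Printf.printf "parity agrees: %b\n" (is_even_rem n = is_even_and n)
```

For n = -7 this prints -1 versus 1 for the remainder forms and -3, -4 and 9223372036854775804 for the three division forms, which is why a compiler has to emit a sign correction when it strength-reduces signed div/mod by a power of two.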

I don't see the point of optimizing for x86-32 (nor would I spend my time optimizing anything for VAX these days), but it would be possible to add appropriate cases for Int64 on x86 as well (regalloc would be the most difficult part here, since it requires support for register pairs to perform 64-bit arithmetic).

>> 1. Unboxing can give huge performance improvements on serial code,
> 
> s/Unboxing/arithmetic optimizations/
> Please find an example where the performance benefit is due to
> unboxing, and not due to arithmetic optimizations performed on the
> unboxed code.

The boxing involved is relevant, but boxing in general is not the issue. In this special case, the "let nlen, n = if..." code requires heap allocation because of the way the pattern is compiled. This could be fixed by moving the condition out of the code and using two ifs to select n and nlen separately (this doesn't speed things up much). Fixing the pattern compiler to handle these cases might be interesting for general benefit.
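The code containing "let nlen, n = if..." is not quoted in this message, so the following is only an assumed shape of it, with made-up function names len_pair and len_split. It sketches the rewrite described above: the first version selects both values through one conditional that builds a pair, the second hoists the condition and selects each value with its own if.

```ocaml
(* Assumed shape: one conditional yields a pair that is destructured at once.
   Per the message, ocamlopt at the time heap-allocated this pair. *)
let rec len_pair (len : int) (n : int64) : int =
  if n = 1L then len
  else begin
    let nlen, n' =
      if Int64.logand n 1L = 0L
      then (len + 1, Int64.shift_right n 1)
      else (len + 2, Int64.shift_right (Int64.add (Int64.mul 3L n) 1L) 1)
    in
    len_pair nlen n'
  end

(* Same computation with the condition hoisted and two separate ifs,
   so no intermediate pair is needed. *)
let rec len_split (len : int) (n : int64) : int =
  if n = 1L then len
  else begin
    let even = Int64.logand n 1L = 0L in
    let nlen = if even then len + 1 else len + 2 in
    let n' =
      if even then Int64.shift_right n 1
      else Int64.shift_right (Int64.add (Int64.mul 3L n) 1L) 1
    in
    len_split nlen n'
  end

let () =
  Printf.printf "%d %d\n" (len_pair 0 27L) (len_split 0 27L)
```

Both functions compute the same length; whether the pair in the first one is actually allocated depends on the compiler version, so this is best checked by inspecting the generated assembly (ocamlopt -S) rather than taken on faith.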

I have mentioned this several times already, but here we go again: unboxing optimizations may indeed prove useful if applied wisely. cmmgen.ml is of special interest here: its unboxing optimizations are more or less special cases, and they could be extended to cover interesting cases such as moving boxing out of if-then-else in return position.

But (here comes the special "Harrop note") this has absolutely nothing to do with LLVM (and of course really, really, really nothing to do with HLVM). Using a different data representation for the heap requires a nearly complete rewrite of the OCaml system (you would probably need to start at the Typing level); if one wants to do this, enjoy and come back with code. But even then, data representation issues will have to be considered long before it comes to actual code generation (if you are serious, you'll have to think about the runtime first prior to talking about code generation for a non-existing runtime), so even then it has nothing to do with LLVM (or C-- or C or whatever IR you can think of).

Combining alloc's across if-then-else constructs further reduces code size in your example (and probably other programs as well), see attached file ocamlopt-comballoc-ifthenelse.patch. It's quick&dirty, but it should illustrate the idea.
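As a rough illustration of the idea only, here is a toy model. It is not the attached patch and does not use ocamlopt's Mach representation: the instr type and the functions straight_size, to_bumps, hoist and alloc_sites are all invented for this sketch. When both arms of a branch are straight-line code that allocate the same number of words, a single allocation is emitted before the branch and the arms merely take their share of it.

```ocaml
(* Toy instruction set, invented for this sketch. *)
type instr =
  | Alloc of int                   (* heap check, then reserve n words *)
  | Bump of int                    (* take n words from an earlier Alloc *)
  | Op of string                   (* any non-allocating work *)
  | If of instr list * instr list  (* two-way branch *)

(* Words allocated by a straight-line arm; None if it contains a branch. *)
let rec straight_size = function
  | [] -> Some 0
  | Alloc n :: rest | Bump n :: rest ->
      Option.map (fun s -> n + s) (straight_size rest)
  | Op _ :: rest -> straight_size rest
  | If _ :: _ -> None

(* Turn the allocations of a straight-line arm into check-free bumps. *)
let to_bumps = List.map (function Alloc n -> Bump n | i -> i)

(* Hoist one Alloc in front of a branch whose arms allocate equally. *)
let rec hoist = function
  | [] -> []
  | If (a, b) :: rest ->
      (match straight_size a, straight_size b with
       | Some sa, Some sb when sa = sb && sa > 0 ->
           Alloc sa :: If (to_bumps a, to_bumps b) :: hoist rest
       | _ -> If (hoist a, hoist b) :: hoist rest)
  | i :: rest -> i :: hoist rest

(* Number of static allocation sites, a stand-in for code size. *)
let rec alloc_sites instrs =
  List.fold_left
    (fun acc i ->
       match i with
       | Alloc _ -> acc + 1
       | If (a, b) -> acc + alloc_sites a + alloc_sites b
       | Bump _ | Op _ -> acc)
    0 instrs

let prog = [ Op "test"; If ([ Alloc 3; Op "store a" ], [ Alloc 3; Op "store b" ]) ]

let () =
  Printf.printf "%d -> %d\n" (alloc_sites prog) (alloc_sites (hoist prog))
```

The real pass has more to worry about than this toy does: calls inside an arm can trigger a GC, which is what the extra boolean returned by combine in the patch tracks.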

>> let alone parallel code. The optimized HLVM is running 32× faster
>> than the OCaml here.
>> 
>> 2. LLVM makes it easy to JIT fast code from OCaml. HLVM is using it
>> to beat GCC-compiled C code here.
>> 
> 
> One advantage of using LLVM is that it would notice arithmetic
> optimizations like this and perform it itself (even if you use the
> boxed representation).

In the case of x86-32, it won't, simply because LLVM will be presented with the calls to caml_int32_* functions. You'd need to change the Cmm code instead (changing the low-level stuff is straightforward, as demonstrated). For 64-bit targets, see attached patch.

This doesn't mean that LLVM wouldn't be useful (in fact, I've just started an LLVM backend for ocamlopt). But it is important to note that LLVM is not the solution to everything. As the name implies, it's "low level"; it does a few "higher level" optimizations for C, but these are special cases (and somewhat ugly if you take the time to read the code). It won't magically make OCaml faster, just as it didn't make Haskell faster by itself.

I could go on by quoting common "Harrop jokes" like "you need types in the low-level IR", etc. trying to tell him that this is simply wrong; but after reading through the Caml/LISP mailing list archives (thanks for the pointers by several readers), I'm pretty much convinced that Jon simply decided to end his war against LISP just to start a new one against ocamlopt.

If anyone is serious about ocamlopt with LLVM, feel free to contact me (tho, my time is limited atm).

> Best regards,
> --Edwin

greets,
Benedikt


[-- Attachment #2: ocamlopt-comballoc-ifthenelse.patch --]
[-- Type: application/octet-stream, Size: 6109 bytes --]

diff --git a/asmcomp/comballoc.ml b/asmcomp/comballoc.ml
index 5a862b1..82c01f9 100644
--- a/asmcomp/comballoc.ml
+++ b/asmcomp/comballoc.ml
@@ -27,64 +27,78 @@ let allocated_size = function
     No_alloc -> 0
   | Pending_alloc(reg, ofs) -> ofs
 
+let instr_cons_alloc sz a r n =
+  if sz != 0
+  then instr_cons (Iop(Ialloc sz)) a r n
+  else n
+
 let rec combine i allocstate =
   match i.desc with
     Iend | Ireturn | Iexit _ | Iraise ->
-      (i, allocated_size allocstate)
+      (i, allocated_size allocstate, true)
   | Iop(Ialloc sz) ->
       begin match allocstate with
         No_alloc ->
-          let (newnext, newsz) =
+          let (newnext, newsz, _) =
             combine i.next (Pending_alloc(i.res.(0), sz)) in
-          (instr_cons (Iop(Ialloc newsz)) i.arg i.res newnext, 0)
+          (instr_cons_alloc newsz i.arg i.res newnext, 0, false)
       | Pending_alloc(reg, ofs) ->
           if ofs + sz < Config.max_young_wosize then begin
-            let (newnext, newsz) =
+            let (newnext, newsz, safe) =
               combine i.next (Pending_alloc(reg, ofs + sz)) in
-            (instr_cons (Iop(Iintop_imm(Iadd, ofs))) [| reg |] i.res newnext,
-             newsz)
+            if sz != 0 && ofs != 0 then
+              (instr_cons (Iop(Iintop_imm(Iadd, ofs))) [|reg|] i.res newnext, newsz, safe)
+            else if sz != 0 then
+              (instr_cons (Iop Imove) [|reg|] i.res newnext, newsz, safe)
+            else
+              (newnext, newsz, safe)
           end else begin
-            let (newnext, newsz) =
+            let (newnext, newsz, _) =
               combine i.next (Pending_alloc(i.res.(0), sz)) in
-            (instr_cons (Iop(Ialloc newsz)) i.arg i.res newnext, ofs)
+            (instr_cons_alloc newsz i.arg i.res newnext, ofs, false)
           end
       end
   | Iop(Icall_ind | Icall_imm _ | Iextcall _ |
         Itailcall_ind | Itailcall_imm _) ->
       let newnext = combine_restart i.next in
-      (instr_cons_debug i.desc i.arg i.res i.dbg newnext,
-       allocated_size allocstate)
+      (instr_cons_debug i.desc i.arg i.res i.dbg newnext, allocated_size allocstate, false)
   | Iop op ->
-      let (newnext, sz) = combine i.next allocstate in
-      (instr_cons_debug i.desc i.arg i.res i.dbg newnext, sz)
+      let (newnext, sz, safe) = combine i.next allocstate in
+      (instr_cons_debug i.desc i.arg i.res i.dbg newnext, sz, safe)
   | Iifthenelse(test, ifso, ifnot) ->
-      let newifso = combine_restart ifso in
-      let newifnot = combine_restart ifnot in
-      let newnext = combine_restart i.next in
-      (instr_cons (Iifthenelse(test, newifso, newifnot)) i.arg i.res newnext,
-       allocated_size allocstate)
+      begin match allocstate, combine ifso allocstate, combine ifnot allocstate with
+        Pending_alloc(reg, ofs), (newifso, szifso, true), (newifnot, szifnot, true) when szifso = szifnot ->
+          let (newnext, sznext, safe) = combine i.next (Pending_alloc(reg, ofs + szifso)) in
+          (instr_cons (Iifthenelse(test, newifso, newifnot)) i.arg i.res newnext,
+           sznext,
+           safe)
+      | _, _, _ ->
+          let newifso = combine_restart ifso in
+          let newifnot = combine_restart ifnot in
+          let newnext = combine_restart i.next in
+          (instr_cons (Iifthenelse(test, newifso, newifnot)) i.arg i.res newnext,
+           allocated_size allocstate, false)
+      end
   | Iswitch(table, cases) ->
       let newcases = Array.map combine_restart cases in
       let newnext = combine_restart i.next in
-      (instr_cons (Iswitch(table, newcases)) i.arg i.res newnext,
-       allocated_size allocstate)
+      (instr_cons (Iswitch(table, newcases)) i.arg i.res newnext, allocated_size allocstate, false)
   | Iloop(body) ->
       let newbody = combine_restart body in
-      (instr_cons (Iloop(newbody)) i.arg i.res i.next,
-       allocated_size allocstate)
+      (instr_cons (Iloop(newbody)) i.arg i.res i.next, allocated_size allocstate, false)
   | Icatch(io, body, handler) ->
-      let (newbody, sz) = combine body allocstate in
+      let (newbody, sz, _) = combine body allocstate in
       let newhandler = combine_restart handler in
       let newnext = combine_restart i.next in
-      (instr_cons (Icatch(io, newbody, newhandler)) i.arg i.res newnext, sz)
+      (instr_cons (Icatch(io, newbody, newhandler)) i.arg i.res newnext, sz, false)
   | Itrywith(body, handler) ->
-      let (newbody, sz) = combine body allocstate in
+      let (newbody, sz, _) = combine body allocstate in
       let newhandler = combine_restart handler in
       let newnext = combine_restart i.next in
-      (instr_cons (Itrywith(newbody, newhandler)) i.arg i.res newnext, sz)
+      (instr_cons (Itrywith(newbody, newhandler)) i.arg i.res newnext, sz, false)
 
 and combine_restart i =
-  let (newi, _) = combine i No_alloc in newi
+  let (newi, _, _) = combine i No_alloc in newi
 
 let fundecl f =
   {f with fun_body = combine_restart f.fun_body}
diff --git a/asmcomp/selectgen.ml b/asmcomp/selectgen.ml
index 7daa239..4221f95 100644
--- a/asmcomp/selectgen.ml
+++ b/asmcomp/selectgen.ml
@@ -562,6 +562,9 @@ method emit_expr env exp =
           let (rif, sif) = self#emit_sequence env eif in
           let (relse, selse) = self#emit_sequence env eelse in
           let r = join rif sif relse selse in
+          (* Dummy Ialloc 0 for comballoc.ml *)
+          let ra = self#regs_for typ_addr in
+          self#insert (Iop(Ialloc 0)) [||] ra;
           self#insert (Iifthenelse(cond, sif#extract, selse#extract))
                       rarg [||];
           r
@@ -790,6 +793,9 @@ method emit_tail env exp =
       begin match self#emit_expr env earg with
         None -> ()
       | Some rarg ->
+          (* Dummy Ialloc 0 for comballoc.ml *)
+          let ra = self#regs_for typ_addr in
+          self#insert (Iop(Ialloc 0)) [||] ra;
           self#insert (Iifthenelse(cond, self#emit_tail_sequence env eif,
                                          self#emit_tail_sequence env eelse))
                       rarg [||]

[-- Attachment #3: ocamlopt-natint.patch --]
[-- Type: application/octet-stream, Size: 7218 bytes --]

diff --git a/asmcomp/amd64/selection.ml b/asmcomp/amd64/selection.ml
index 4921e51..bce8104 100644
--- a/asmcomp/amd64/selection.ml
+++ b/asmcomp/amd64/selection.ml
@@ -168,6 +168,9 @@ method! select_operation op args =
         [arg1; Cconst_int n] when self#is_immediate n
                                && n = 1 lsl (Misc.log2 n) ->
           (Iintop_imm(Idiv, n), [arg1])
+        | [arg1; Cconst_natint n] when self#is_immediate_natint n
+                               && n = Nativeint.shift_left 1n (Misc.log2 (Nativeint.to_int n)) ->
+          (Iintop_imm(Idiv, Nativeint.to_int n), [arg1])
       | _ -> (Iintop Idiv, args)
       end
   | Cmodi ->
@@ -175,6 +178,9 @@ method! select_operation op args =
         [arg1; Cconst_int n] when self#is_immediate n
                                && n = 1 lsl (Misc.log2 n) ->
           (Iintop_imm(Imod, n), [arg1])
+        | [arg1; Cconst_natint n] when self#is_immediate_natint n
+                               && n = Nativeint.shift_left 1n (Misc.log2 (Nativeint.to_int n)) ->
+          (Iintop_imm(Imod, Nativeint.to_int n), [arg1])
       | _ -> (Iintop Imod, args)
       end
   (* Recognize float arithmetic with memory. *)
diff --git a/asmcomp/selectgen.ml b/asmcomp/selectgen.ml
index 50f949a..7daa239 100644
--- a/asmcomp/selectgen.ml
+++ b/asmcomp/selectgen.ml
@@ -201,6 +201,8 @@ method is_simple_expr = function
 
 method virtual is_immediate : int -> bool
 
+method virtual is_immediate_natint : nativeint -> bool
+
 (* Selection of addressing modes *)
 
 method virtual select_addressing :
@@ -238,11 +240,21 @@ method select_operation op args =
       if n = 1 lsl l
       then (Iintop_imm(Ilsl, l), [arg1])
       else self#select_arith_comm Imul args
+  | (Cmuli, [arg1; Cconst_natint n]) ->
+      let l = Misc.log2 (Nativeint.to_int n) in
+      if n = Nativeint.shift_left 1n l
+      then (Iintop_imm(Ilsl, l), [arg1])
+      else self#select_arith_comm Imul args
   | (Cmuli, [Cconst_int n; arg1]) ->
       let l = Misc.log2 n in
       if n = 1 lsl l
       then (Iintop_imm(Ilsl, l), [arg1])
       else self#select_arith_comm Imul args
+  | (Cmuli, [Cconst_natint n; arg1]) ->
+      let l = Misc.log2 (Nativeint.to_int n) in
+      if n = Nativeint.shift_left 1n l
+      then (Iintop_imm(Ilsl, l), [arg1])
+      else self#select_arith_comm Imul args
   | (Cmuli, _) -> self#select_arith_comm Imul args
   | (Cdivi, _) -> self#select_arith Idiv args
   | (Cmodi, _) -> self#select_arith_comm Imod args
@@ -270,38 +282,60 @@ method select_operation op args =
 method private select_arith_comm op = function
     [arg; Cconst_int n] when self#is_immediate n ->
       (Iintop_imm(op, n), [arg])
+  | [arg; Cconst_natint n] when self#is_immediate_natint n ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | [arg; Cconst_pointer n] when self#is_immediate n ->
       (Iintop_imm(op, n), [arg])
+  | [arg; Cconst_natpointer n] when self#is_immediate_natint n ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | [Cconst_int n; arg] when self#is_immediate n ->
       (Iintop_imm(op, n), [arg])
+  | [Cconst_natint n; arg] when self#is_immediate_natint n ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | [Cconst_pointer n; arg] when self#is_immediate n ->
       (Iintop_imm(op, n), [arg])
+  | [Cconst_natpointer n; arg] when self#is_immediate_natint n ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | args ->
       (Iintop op, args)
 
 method private select_arith op = function
     [arg; Cconst_int n] when self#is_immediate n ->
       (Iintop_imm(op, n), [arg])
+  | [arg; Cconst_natint n] when self#is_immediate_natint n ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | [arg; Cconst_pointer n] when self#is_immediate n ->
       (Iintop_imm(op, n), [arg])
+  | [arg; Cconst_natpointer n] when self#is_immediate_natint n ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | args ->
       (Iintop op, args)
 
 method private select_shift op = function
     [arg; Cconst_int n] when n >= 0 && n < Arch.size_int * 8 ->
       (Iintop_imm(op, n), [arg])
+  | [arg; Cconst_natint n] when n >= 0n && n < Nativeint.of_int (Arch.size_int * 8) ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | args ->
       (Iintop op, args)
 
 method private select_arith_comp cmp = function
     [arg; Cconst_int n] when self#is_immediate n ->
       (Iintop_imm(Icomp cmp, n), [arg])
+  | [arg; Cconst_natint n] when self#is_immediate_natint n ->
+      (Iintop_imm(Icomp cmp, Nativeint.to_int n), [arg])
   | [arg; Cconst_pointer n] when self#is_immediate n ->
       (Iintop_imm(Icomp cmp, n), [arg])
+  | [arg; Cconst_natpointer n] when self#is_immediate_natint n ->
+      (Iintop_imm(Icomp cmp, Nativeint.to_int n), [arg])
   | [Cconst_int n; arg] when self#is_immediate n ->
       (Iintop_imm(Icomp(swap_intcomp cmp), n), [arg])
+  | [Cconst_natint n; arg] when self#is_immediate_natint n ->
+      (Iintop_imm(Icomp(swap_intcomp cmp), Nativeint.to_int n), [arg])
   | [Cconst_pointer n; arg] when self#is_immediate n ->
       (Iintop_imm(Icomp(swap_intcomp cmp), n), [arg])
+  | [Cconst_natpointer n; arg] when self#is_immediate_natint n ->
+      (Iintop_imm(Icomp(swap_intcomp cmp), Nativeint.to_int n), [arg])
   | args ->
       (Iintop(Icomp cmp), args)
 
@@ -310,12 +344,20 @@ method private select_arith_comp cmp = function
 method select_condition = function
     Cop(Ccmpi cmp, [arg1; Cconst_int n]) when self#is_immediate n ->
       (Iinttest_imm(Isigned cmp, n), arg1)
+  | Cop(Ccmpi cmp, [arg1; Cconst_natint n]) when self#is_immediate_natint n ->
+      (Iinttest_imm(Isigned cmp, Nativeint.to_int n), arg1)
   | Cop(Ccmpi cmp, [Cconst_int n; arg2]) when self#is_immediate n ->
       (Iinttest_imm(Isigned(swap_comparison cmp), n), arg2)
+  | Cop(Ccmpi cmp, [Cconst_natint n; arg2]) when self#is_immediate_natint n ->
+      (Iinttest_imm(Isigned(swap_comparison cmp), Nativeint.to_int n), arg2)
   | Cop(Ccmpi cmp, [arg1; Cconst_pointer n]) when self#is_immediate n ->
       (Iinttest_imm(Isigned cmp, n), arg1)
+  | Cop(Ccmpi cmp, [arg1; Cconst_natpointer n]) when self#is_immediate_natint n ->
+      (Iinttest_imm(Isigned cmp, Nativeint.to_int n), arg1)
   | Cop(Ccmpi cmp, [Cconst_pointer n; arg2]) when self#is_immediate n ->
       (Iinttest_imm(Isigned(swap_comparison cmp), n), arg2)
+  | Cop(Ccmpi cmp, [Cconst_natpointer n; arg2]) when self#is_immediate_natint n ->
+      (Iinttest_imm(Isigned(swap_comparison cmp), Nativeint.to_int n), arg2)
   | Cop(Ccmpi cmp, args) ->
       (Iinttest(Isigned cmp), Ctuple args)
   | Cop(Ccmpa cmp, [arg1; Cconst_pointer n]) when self#is_immediate n ->
diff --git a/asmcomp/selectgen.mli b/asmcomp/selectgen.mli
index 7c30f9f..8966ad1 100644
--- a/asmcomp/selectgen.mli
+++ b/asmcomp/selectgen.mli
@@ -23,6 +23,7 @@ class virtual selector_generic : object
   (* The following methods must or can be overridden by the processor
      description *)
   method virtual is_immediate : int -> bool
+  method virtual is_immediate_natint : nativeint -> bool
     (* Must be defined to indicate whether a constant is a suitable
        immediate operand to arithmetic instructions *)
   method virtual select_addressing :
