From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <benedikt.meurer@googlemail.com>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82])
	by yquem.inria.fr (Postfix) with ESMTP id AB378BC57
	for <caml-list@yquem.inria.fr>; Sun, 12 Dec 2010 20:09:06 +0100 (CET)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ArcAAGOsBE3RVaEtimdsb2JhbACkAQgVAQEBCgkMBw8GJaYdjAgBBYxzAQSFSg
X-IronPort-AV: E=Sophos;i="4.59,333,1288566000"; 
   d="scan'208";a="92064972"
Received: from mail-fx0-f45.google.com ([209.85.161.45])
  by mail1-smtp-roc.national.inria.fr with ESMTP; 12 Dec 2010 20:09:06 +0100
Received: by fxm12 with SMTP id 12so5588611fxm.4
        for <caml-list@yquem.inria.fr>; Sun, 12 Dec 2010 11:09:05 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=googlemail.com; s=gamma;
        h=domainkey-signature:received:received:from:mime-version
         :content-type:subject:date:in-reply-to:to:references:message-id
         :x-mailer;
        bh=x+JRSiUQv/k9VI4YA36nRcHCXYTB4Gl3hvNXHeYa2zw=;
        b=I62cxxb2dgYqrJe5sM0arxksSQEgTvDemw/rZDBK+i8IdphEcgB/1hODdHkwva29Wx
         yGyz+Tp23Lrhl5q3v7S5H/ejFkuR/dL+aLNDoWXwfPLU27y8van2TGvdRC4BtOBYROOh
         C8AICx38ZpMYlwXBYluvKZlZ9qUyUFr11KcqA=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=googlemail.com; s=gamma;
        h=from:mime-version:content-type:subject:date:in-reply-to:to
         :references:message-id:x-mailer;
        b=tL/e7z6WFYKhS1bHygo+9JqMHwS6f0987FzrY/IDRxURWDaP+H/M3CdX2JDxN0pSGR
         OX/glxP/9KUAY1YcZUQ0x9fWCtEs/SDYKRK1Hz5aha2tnRzpj66HchmcIw0aoQ8aHGbV
         J3hdW+0GzNHwwrBGhLspQCH8w9/Exr7Ru/6KU=
Received: by 10.223.125.68 with SMTP id x4mr3411510far.65.1292180943706;
        Sun, 12 Dec 2010 11:09:03 -0800 (PST)
Received: from bespin.kosmos.all (ip-95-223-170-32.unitymediagroup.de [95.223.170.32])
        by mx.google.com with ESMTPS id n3sm1461696faa.5.2010.12.12.11.09.01
        (version=SSLv3 cipher=RC4-MD5);
        Sun, 12 Dec 2010 11:09:02 -0800 (PST)
From: Benedikt Meurer <benedikt.meurer@googlemail.com>
Mime-Version: 1.0 (Apple Message framework v1081)
Content-Type: multipart/mixed; boundary=Apple-Mail-247-341128255
Subject: Re: Value types (Was: [Caml-list] ocamlopt LLVM support)
Date: Sun, 12 Dec 2010 20:09:00 +0100
In-Reply-To: <20101212175524.73a8e285@deb0>
To: caml-list@yquem.inria.fr
References: <036001cb9a0c$725acef0$57106cd0$@com> <20101212175524.73a8e285@deb0>
Message-Id: <9264BEE6-DBAE-4523-93AC-4560615D2AC5@googlemail.com>
X-Mailer: Apple Mail (2.1081)
X-Spam: no; 0.00; ocamlopt:01 haskell:01 ocaml:01 compiler:01 compiler:01 'int:01 'int:01 ocamlopt:01 int's:01 nativeint:01 unboxing:01 unboxing:01 unboxed:01 cmmgen:01 ocaml:01 
X-Attachments: type="application/octet-stream" name="ocamlopt-comballoc-ifthenelse.patch" name="ocamlopt-comballoc-ifthenelse.patch" type="application/octet-stream" name="ocamlopt-natint.patch" name="ocamlopt-natint.patch" 


--Apple-Mail-247-341128255
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=windows-1252


On Dec 12, 2010, at 16:55 , T=F6r=F6k Edwin wrote:

> On Sun, 12 Dec 2010 14:54:14 -0000
> "Jon Harrop" <jon@ffconsultancy.com> wrote:
>=20
>> The Haskell guys got their best performance improvement moving to
>> LLVM from the hailstone benchmark so it is interesting to examine
>> this case as well. I just added support for 64-bit ints to HLVM to
>> implement that benchmark and my code is:
>>=20
>> Here=92s the equivalent OCaml code:
>>=20
>>  let rec collatzLen(c, n) : int =3D
>>    if n =3D 1L then c else
>>      collatzLen (c+1, if Int64.rem n 2L =3D 0L then Int64.div n 2L =
else
>=20
> OK, but boxing itself has nothing to do with the performance degration
> here. It is the lack of compiler optimizations on the Int64 type. This
> could be solved by implementing compiler optimizations for it (or
> manually optimizing some integer arithmetic that is slow).
>=20
> Lets run the code under a profiler, or look at the assembly (I used
> 'perf record' and 'perf report'). 2 'idiv' instructions turn up as top
> offenders in the profile.
>=20
> Problem #1: Int64.rem n 2 -> another idiv instruction
>=20
> A C compiler would optimize this to an 'and' instruction.
> Change that to 'Int64.logand n 1L =3D 0L'/
>=20
> Problem #2: Int64.div n 2 -> idiv instruction.=20
>=20
> A C compiler would optimize this to a right shift. Changing that to =
'Int64.shift_right n 1' speeds
> up the code.

This is easy to fix in ocamlopt (see attached patch =
ocamlopt-natint.patch), by applying the same optimizations already used =
for constant int's to constant natint's (Int64 is Nativeint on 64bit). =
Note however, that "mod 2" is not really "and 1", neither is "div 2" =
equivalent to "lsr 1"; that would be the case for unsigned arithmetic =
(doesn't matter in your example, tho).

I don't see the point of optimizing for x86-32 (neither would I spend my =
time optimizing anything for VAX these days), but it would be possible =
to add appropriate cases for Int64 on x86 as well (regalloc would be =
most difficult here, since that requires support for register pairs to =
perform 64bit arithmetic).

>> 1. Unboxing can give huge performance improvements on serial code,
>=20
> s/Unboxing/arithmetic optimizations/
> Please find an example where the performance benefit is due to
> unboxing, and not due to arithmetic optimizations performed on the
> unboxed code.

The boxing involved is relevant, but boxing in general is not the issue. =
In this special case, the "let nlen, n =3D if..." code requires heap =
allocation, because of the way the pattern is compiled. This could be =
fixed by moving the condition out of the code and using two if's to =
select n/nlen separately (doesn't speed up that much). Fixing the =
pattern compiler to handle these cases might be interesting for general =
benefit.

I already mentioned this multiple times, but here we go again: Unboxing =
optimizations may indeed prove useful if applied wisely (cmmgen.ml is of =
special interest here, the unboxing optimizations are more or less =
special cases; that could be extended to include interesting cases like =
moving boxing out of if-then-else in return position, etc).

But (here comes the special "Harrop note") this has absolutely nothing =
to do with LLVM (and of course really, really, really nothing to do with =
HLVM). Using a different data representation for the heap requires a =
nearly complete rewrite of the OCaml system (you would probably need to =
start at the Typing level); if one wants to do this, enjoy and come back =
with code. But even then, data representation issues will have to be =
considered long before it comes to actual code generation (if you are =
serious, you'll have to think about the runtime first prior to talking =
about code generation for a non-existing runtime), so even then it has =
nothing to do with LLVM (or C-- or C or whatever IR you can think of).

Combining alloc's across if-then-else constructs further reduces code =
size in your example (and probably other programs as well), see attached =
file ocamlopt-comballoc-ifthenelse.patch. It's quick&dirty, but it =
should illustrate the idea.

>> let alone parallel code. The optimized HLVM is running 32=D7 faster
>> than the OCaml here.
>>=20
>> 2. LLVM makes it easy to JIT fast code from OCaml. HLVM is using it
>> to beat GCC-compiled C code here.
>>=20
>=20
> One advantage of using LLVM is that it would notice arithmetic
> optimizations like this and perform it itself (even if you use the
> boxed representation).

In case of x86-32, it won't, simply because LLVM will be presented with =
the calls to caml_int32_* functions. You'd need to change the Cmm code =
instead (changing the low-level stuff is straight-forward as =
demonstrated). For 64bit targets, see attached patch.

This doesn't mean that LLVM wouldn't be useful (in fact, I've just =
started an LLVM backend for ocamlopt). But it is important to note that =
LLVM is not the solution to everything. As the name implies, it's "low =
level", it does a few "higher level" optimizations for C, but these are =
special cases (and somewhat ugly if you take the time to read the code). =
It won't make a faster OCaml magically, just like it didn't make a =
faster Haskell by itself.

I could go on by quoting common "Harrop jokes" like "you need types in =
the low-level IR", etc. trying to tell him that this is simply wrong; =
but after reading through the Caml/LISP mailing list archives (thanks =
for the pointers by several readers), I'm pretty much convinced that Jon =
simply decided to end his war against LISP just to start a new one =
against ocamlopt.

If anyone is serious about ocamlopt with LLVM, feel free to contact me =
(tho, my time is limited atm).

> Best regards,
> --Edwin

greets,
Benedikt


--Apple-Mail-247-341128255
Content-Disposition: attachment;
	filename=ocamlopt-comballoc-ifthenelse.patch
Content-Type: application/octet-stream;
	name="ocamlopt-comballoc-ifthenelse.patch"
Content-Transfer-Encoding: 7bit

diff --git a/asmcomp/comballoc.ml b/asmcomp/comballoc.ml
index 5a862b1..82c01f9 100644
--- a/asmcomp/comballoc.ml
+++ b/asmcomp/comballoc.ml
@@ -27,64 +27,78 @@ let allocated_size = function
     No_alloc -> 0
   | Pending_alloc(reg, ofs) -> ofs
 
+let instr_cons_alloc sz a r n =
+  if sz != 0
+  then instr_cons (Iop(Ialloc sz)) a r n
+  else n
+
 let rec combine i allocstate =
   match i.desc with
     Iend | Ireturn | Iexit _ | Iraise ->
-      (i, allocated_size allocstate)
+      (i, allocated_size allocstate, true)
   | Iop(Ialloc sz) ->
       begin match allocstate with
         No_alloc ->
-          let (newnext, newsz) =
+          let (newnext, newsz, _) =
             combine i.next (Pending_alloc(i.res.(0), sz)) in
-          (instr_cons (Iop(Ialloc newsz)) i.arg i.res newnext, 0)
+          (instr_cons_alloc newsz i.arg i.res newnext, 0, false)
       | Pending_alloc(reg, ofs) ->
           if ofs + sz < Config.max_young_wosize then begin
-            let (newnext, newsz) =
+            let (newnext, newsz, safe) =
               combine i.next (Pending_alloc(reg, ofs + sz)) in
-            (instr_cons (Iop(Iintop_imm(Iadd, ofs))) [| reg |] i.res newnext,
-             newsz)
+            if sz != 0 && ofs != 0 then
+              (instr_cons (Iop(Iintop_imm(Iadd, ofs))) [|reg|] i.res newnext, newsz, safe)
+            else if sz != 0 then
+              (instr_cons (Iop Imove) [|reg|] i.res newnext, newsz, safe)
+            else
+              (newnext, newsz, safe)
           end else begin
-            let (newnext, newsz) =
+            let (newnext, newsz, _) =
               combine i.next (Pending_alloc(i.res.(0), sz)) in
-            (instr_cons (Iop(Ialloc newsz)) i.arg i.res newnext, ofs)
+            (instr_cons_alloc newsz i.arg i.res newnext, ofs, false)
           end
       end
   | Iop(Icall_ind | Icall_imm _ | Iextcall _ |
         Itailcall_ind | Itailcall_imm _) ->
       let newnext = combine_restart i.next in
-      (instr_cons_debug i.desc i.arg i.res i.dbg newnext,
-       allocated_size allocstate)
+      (instr_cons_debug i.desc i.arg i.res i.dbg newnext, allocated_size allocstate, false)
   | Iop op ->
-      let (newnext, sz) = combine i.next allocstate in
-      (instr_cons_debug i.desc i.arg i.res i.dbg newnext, sz)
+      let (newnext, sz, safe) = combine i.next allocstate in
+      (instr_cons_debug i.desc i.arg i.res i.dbg newnext, sz, safe)
   | Iifthenelse(test, ifso, ifnot) ->
-      let newifso = combine_restart ifso in
-      let newifnot = combine_restart ifnot in
-      let newnext = combine_restart i.next in
-      (instr_cons (Iifthenelse(test, newifso, newifnot)) i.arg i.res newnext,
-       allocated_size allocstate)
+      begin match allocstate, combine ifso allocstate, combine ifnot allocstate with
+        Pending_alloc(reg, ofs), (newifso, szifso, true), (newifnot, szifnot, true) when szifso = szifnot ->
+          let (newnext, sznext, safe) = combine i.next (Pending_alloc(reg, ofs + szifso)) in
+          (instr_cons (Iifthenelse(test, newifso, newifnot)) i.arg i.res newnext,
+           sznext,
+           safe)
+      | _, _, _ ->
+          let newifso = combine_restart ifso in
+          let newifnot = combine_restart ifnot in
+          let newnext = combine_restart i.next in
+          (instr_cons (Iifthenelse(test, newifso, newifnot)) i.arg i.res newnext,
+           allocated_size allocstate, false)
+      end
   | Iswitch(table, cases) ->
       let newcases = Array.map combine_restart cases in
       let newnext = combine_restart i.next in
-      (instr_cons (Iswitch(table, newcases)) i.arg i.res newnext,
-       allocated_size allocstate)
+      (instr_cons (Iswitch(table, newcases)) i.arg i.res newnext, allocated_size allocstate, false)
   | Iloop(body) ->
       let newbody = combine_restart body in
-      (instr_cons (Iloop(newbody)) i.arg i.res i.next,
-       allocated_size allocstate)
+      (instr_cons (Iloop(newbody)) i.arg i.res i.next, allocated_size allocstate, false)
   | Icatch(io, body, handler) ->
-      let (newbody, sz) = combine body allocstate in
+      let (newbody, sz, _) = combine body allocstate in
       let newhandler = combine_restart handler in
       let newnext = combine_restart i.next in
-      (instr_cons (Icatch(io, newbody, newhandler)) i.arg i.res newnext, sz)
+      (instr_cons (Icatch(io, newbody, newhandler)) i.arg i.res newnext, sz, false)
   | Itrywith(body, handler) ->
-      let (newbody, sz) = combine body allocstate in
+      let (newbody, sz, _) = combine body allocstate in
       let newhandler = combine_restart handler in
       let newnext = combine_restart i.next in
-      (instr_cons (Itrywith(newbody, newhandler)) i.arg i.res newnext, sz)
+      (instr_cons (Itrywith(newbody, newhandler)) i.arg i.res newnext, sz, false)
 
 and combine_restart i =
-  let (newi, _) = combine i No_alloc in newi
+  let (newi, _, _) = combine i No_alloc in newi
 
 let fundecl f =
   {f with fun_body = combine_restart f.fun_body}
diff --git a/asmcomp/selectgen.ml b/asmcomp/selectgen.ml
index 7daa239..4221f95 100644
--- a/asmcomp/selectgen.ml
+++ b/asmcomp/selectgen.ml
@@ -562,6 +562,9 @@ method emit_expr env exp =
           let (rif, sif) = self#emit_sequence env eif in
           let (relse, selse) = self#emit_sequence env eelse in
           let r = join rif sif relse selse in
+          (* Dummy Ialloc 0 for comballoc.ml *)
+          let ra = self#regs_for typ_addr in
+          self#insert (Iop(Ialloc 0)) [||] ra;
           self#insert (Iifthenelse(cond, sif#extract, selse#extract))
                       rarg [||];
           r
@@ -790,6 +793,9 @@ method emit_tail env exp =
       begin match self#emit_expr env earg with
         None -> ()
       | Some rarg ->
+          (* Dummy Ialloc 0 for comballoc.ml *)
+          let ra = self#regs_for typ_addr in
+          self#insert (Iop(Ialloc 0)) [||] ra;
           self#insert (Iifthenelse(cond, self#emit_tail_sequence env eif,
                                          self#emit_tail_sequence env eelse))
                       rarg [||]

--Apple-Mail-247-341128255
Content-Disposition: attachment;
	filename=ocamlopt-natint.patch
Content-Type: application/octet-stream;
	name="ocamlopt-natint.patch"
Content-Transfer-Encoding: 7bit

diff --git a/asmcomp/amd64/selection.ml b/asmcomp/amd64/selection.ml
index 4921e51..bce8104 100644
--- a/asmcomp/amd64/selection.ml
+++ b/asmcomp/amd64/selection.ml
@@ -168,6 +168,9 @@ method! select_operation op args =
         [arg1; Cconst_int n] when self#is_immediate n
                                && n = 1 lsl (Misc.log2 n) ->
           (Iintop_imm(Idiv, n), [arg1])
+        | [arg1; Cconst_natint n] when self#is_immediate_natint n
+                               && n = Nativeint.shift_left 1n (Misc.log2 (Nativeint.to_int n)) ->
+          (Iintop_imm(Idiv, Nativeint.to_int n), [arg1])
       | _ -> (Iintop Idiv, args)
       end
   | Cmodi ->
@@ -175,6 +178,9 @@ method! select_operation op args =
         [arg1; Cconst_int n] when self#is_immediate n
                                && n = 1 lsl (Misc.log2 n) ->
           (Iintop_imm(Imod, n), [arg1])
+        | [arg1; Cconst_natint n] when self#is_immediate_natint n
+                               && n = Nativeint.shift_left 1n (Misc.log2 (Nativeint.to_int n)) ->
+          (Iintop_imm(Imod, Nativeint.to_int n), [arg1])
       | _ -> (Iintop Imod, args)
       end
   (* Recognize float arithmetic with memory. *)
diff --git a/asmcomp/selectgen.ml b/asmcomp/selectgen.ml
index 50f949a..7daa239 100644
--- a/asmcomp/selectgen.ml
+++ b/asmcomp/selectgen.ml
@@ -201,6 +201,8 @@ method is_simple_expr = function
 
 method virtual is_immediate : int -> bool
 
+method virtual is_immediate_natint : nativeint -> bool
+
 (* Selection of addressing modes *)
 
 method virtual select_addressing :
@@ -238,11 +240,21 @@ method select_operation op args =
       if n = 1 lsl l
       then (Iintop_imm(Ilsl, l), [arg1])
       else self#select_arith_comm Imul args
+  | (Cmuli, [arg1; Cconst_natint n]) ->
+      let l = Misc.log2 (Nativeint.to_int n) in
+      if n = Nativeint.shift_left 1n l
+      then (Iintop_imm(Ilsl, l), [arg1])
+      else self#select_arith_comm Imul args
   | (Cmuli, [Cconst_int n; arg1]) ->
       let l = Misc.log2 n in
       if n = 1 lsl l
       then (Iintop_imm(Ilsl, l), [arg1])
       else self#select_arith_comm Imul args
+  | (Cmuli, [Cconst_natint n; arg1]) ->
+      let l = Misc.log2 (Nativeint.to_int n) in
+      if n = Nativeint.shift_left 1n l
+      then (Iintop_imm(Ilsl, l), [arg1])
+      else self#select_arith_comm Imul args
   | (Cmuli, _) -> self#select_arith_comm Imul args
   | (Cdivi, _) -> self#select_arith Idiv args
   | (Cmodi, _) -> self#select_arith_comm Imod args
@@ -270,38 +282,60 @@ method select_operation op args =
 method private select_arith_comm op = function
     [arg; Cconst_int n] when self#is_immediate n ->
       (Iintop_imm(op, n), [arg])
+  | [arg; Cconst_natint n] when self#is_immediate_natint n ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | [arg; Cconst_pointer n] when self#is_immediate n ->
       (Iintop_imm(op, n), [arg])
+  | [arg; Cconst_natpointer n] when self#is_immediate_natint n ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | [Cconst_int n; arg] when self#is_immediate n ->
       (Iintop_imm(op, n), [arg])
+  | [Cconst_natint n; arg] when self#is_immediate_natint n ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | [Cconst_pointer n; arg] when self#is_immediate n ->
       (Iintop_imm(op, n), [arg])
+  | [Cconst_natpointer n; arg] when self#is_immediate_natint n ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | args ->
       (Iintop op, args)
 
 method private select_arith op = function
     [arg; Cconst_int n] when self#is_immediate n ->
       (Iintop_imm(op, n), [arg])
+  | [arg; Cconst_natint n] when self#is_immediate_natint n ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | [arg; Cconst_pointer n] when self#is_immediate n ->
       (Iintop_imm(op, n), [arg])
+  | [arg; Cconst_natpointer n] when self#is_immediate_natint n ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | args ->
       (Iintop op, args)
 
 method private select_shift op = function
     [arg; Cconst_int n] when n >= 0 && n < Arch.size_int * 8 ->
       (Iintop_imm(op, n), [arg])
+  | [arg; Cconst_natint n] when n >= 0n && n < Nativeint.of_int (Arch.size_int * 8) ->
+      (Iintop_imm(op, Nativeint.to_int n), [arg])
   | args ->
       (Iintop op, args)
 
 method private select_arith_comp cmp = function
     [arg; Cconst_int n] when self#is_immediate n ->
       (Iintop_imm(Icomp cmp, n), [arg])
+  | [arg; Cconst_natint n] when self#is_immediate_natint n ->
+      (Iintop_imm(Icomp cmp, Nativeint.to_int n), [arg])
   | [arg; Cconst_pointer n] when self#is_immediate n ->
       (Iintop_imm(Icomp cmp, n), [arg])
+  | [arg; Cconst_natpointer n] when self#is_immediate_natint n ->
+      (Iintop_imm(Icomp cmp, Nativeint.to_int n), [arg])
   | [Cconst_int n; arg] when self#is_immediate n ->
       (Iintop_imm(Icomp(swap_intcomp cmp), n), [arg])
+  | [Cconst_natint n; arg] when self#is_immediate_natint n ->
+      (Iintop_imm(Icomp(swap_intcomp cmp), Nativeint.to_int n), [arg])
   | [Cconst_pointer n; arg] when self#is_immediate n ->
       (Iintop_imm(Icomp(swap_intcomp cmp), n), [arg])
+  | [Cconst_natpointer n; arg] when self#is_immediate_natint n ->
+      (Iintop_imm(Icomp(swap_intcomp cmp), Nativeint.to_int n), [arg])
   | args ->
       (Iintop(Icomp cmp), args)
 
@@ -310,12 +344,20 @@ method private select_arith_comp cmp = function
 method select_condition = function
     Cop(Ccmpi cmp, [arg1; Cconst_int n]) when self#is_immediate n ->
       (Iinttest_imm(Isigned cmp, n), arg1)
+  | Cop(Ccmpi cmp, [arg1; Cconst_natint n]) when self#is_immediate_natint n ->
+      (Iinttest_imm(Isigned cmp, Nativeint.to_int n), arg1)
   | Cop(Ccmpi cmp, [Cconst_int n; arg2]) when self#is_immediate n ->
       (Iinttest_imm(Isigned(swap_comparison cmp), n), arg2)
+  | Cop(Ccmpi cmp, [Cconst_natint n; arg2]) when self#is_immediate_natint n ->
+      (Iinttest_imm(Isigned(swap_comparison cmp), Nativeint.to_int n), arg2)
   | Cop(Ccmpi cmp, [arg1; Cconst_pointer n]) when self#is_immediate n ->
       (Iinttest_imm(Isigned cmp, n), arg1)
+  | Cop(Ccmpi cmp, [arg1; Cconst_natpointer n]) when self#is_immediate_natint n ->
+      (Iinttest_imm(Isigned cmp, Nativeint.to_int n), arg1)
   | Cop(Ccmpi cmp, [Cconst_pointer n; arg2]) when self#is_immediate n ->
       (Iinttest_imm(Isigned(swap_comparison cmp), n), arg2)
+  | Cop(Ccmpi cmp, [Cconst_natpointer n; arg2]) when self#is_immediate_natint n ->
+      (Iinttest_imm(Isigned(swap_comparison cmp), Nativeint.to_int n), arg2)
   | Cop(Ccmpi cmp, args) ->
       (Iinttest(Isigned cmp), Ctuple args)
   | Cop(Ccmpa cmp, [arg1; Cconst_pointer n]) when self#is_immediate n ->
diff --git a/asmcomp/selectgen.mli b/asmcomp/selectgen.mli
index 7c30f9f..8966ad1 100644
--- a/asmcomp/selectgen.mli
+++ b/asmcomp/selectgen.mli
@@ -23,6 +23,7 @@ class virtual selector_generic : object
   (* The following methods must or can be overridden by the processor
      description *)
   method virtual is_immediate : int -> bool
+  method virtual is_immediate_natint : nativeint -> bool
     (* Must be defined to indicate whether a constant is a suitable
        immediate operand to arithmetic instructions *)
   method virtual select_addressing :

--Apple-Mail-247-341128255--