caml-list - the Caml user's mailing list
* [Caml-list] ARM code generator problem
@ 2012-08-10 21:41 Jeffrey Scofield
  2012-08-11  8:00 ` Benedikt Meurer
  0 siblings, 1 reply; 9+ messages in thread
From: Jeffrey Scofield @ 2012-08-10 21:41 UTC
  To: Caml List; +Cc: Jeffrey Scofield

Greetings,

While working on porting OCaml 4.00.0 to iOS, I ran across
what looks like a problem in the ARM code generation.

If you look at asmcomp/arm/emit.mlp you see lots of places where
s14 is used as a scratch register.  The one that showed up in my
code is the code sequence for float_of_int:

    | Lop(Ifloatofint) ->
        `       fmsr    s14, {emit_reg i.arg.(0)}\n`;
        `       fsitod  {emit_reg i.res.(0)}, s14\n`; 2

Note that the emitted code always uses s14 (unconditionally).  This
suggests that s14 should be set aside as a scratch register.

However, s14 is also an alias for the low order part of d7.  If you look
at asmcomp/arm/proc.ml you'll see that d7 is used as a general purpose
register.

The result is that a value in d7 is sometimes destroyed by a use
of s14 as a scratch register.  In my code it was a call to float_of_int
that destroyed a float value being kept in d7.

I'm wondering if there's any wisdom on the list about this problem.
I don't see anything about it on Mantis.

For my own project, I think I can solve this simply by leaving d7 out of
the list of general registers in proc.ml.  However, this might be a bit
drastic.  Maybe there is a more subtle and wise solution.

You can read about OCaml4-on-iOS progress in my sporadic blog:

    http://psellos.com/2012/07/2012.07.ocamlxarm-ocaml4-1.html

I can provide my OCaml code and the generated ARM code if it will help
show the problem.  I haven't (yet) tried to whittle it down to a small
case.

Regards,

Jeffrey



* Re: [Caml-list] ARM code generator problem
  2012-08-10 21:41 [Caml-list] ARM code generator problem Jeffrey Scofield
@ 2012-08-11  8:00 ` Benedikt Meurer
  2012-08-11  8:13   ` Benedikt Meurer
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Benedikt Meurer @ 2012-08-11  8:00 UTC
  To: Jeffrey Scofield; +Cc: Caml List


On Aug 10, 2012, at 23:41 , Jeffrey Scofield wrote:

> Greetings,

Hey Jeffrey,

> While working on porting OCaml 4.00.0 to iOS, I ran across
> what looks like a problem in the ARM code generation.
> 
> If you look at asmcomp/arm/emit.mlp you see lots of places where
> s14 is used as a scratch register.  The one that showed up in my
> code is the code sequence for float_of_int:
> 
>    | Lop(Ifloatofint) ->
>        `       fmsr    s14, {emit_reg i.arg.(0)}\n`;
>        `       fsitod  {emit_reg i.res.(0)}, s14\n`; 2
> 
> Note that the emitted code always uses s14 (unconditionally).  This
> suggests that s14 should be set aside as a scratch register.
> 
> However, s14 is also an alias for the low order part of d7.  If you look
> at asmcomp/arm/proc.ml you'll see that d7 is used as a general purpose
> register.
> 
> The result is that a value in d7 is sometimes destroyed by a use
> of s14 as a scratch register.  In my code it was a call to float_of_int
> that destroyed a float value being kept in d7.

If you look at destroyed_at_oper in asmcomp/arm/proc.ml, you'll see that d7 (s14+s15) is marked as destroyed for those operations where it is used as a scratch register.
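
As a rough illustration of what such a table expresses, here is a simplified, self-contained model in OCaml. The types `reg` and `op` and the helper `survives` are hypothetical and far smaller than the real definitions in asmcomp/arm/proc.ml; the set of operations shown is an assumption. Only the idea comes from the discussion: d7 is listed as clobbered by the conversions that use s14 as scratch.

```ocaml
(* Hypothetical, simplified model; not the actual compiler code. *)
type reg = D of int            (* double-precision VFP register dN *)

type op =
  | Float_of_int               (* assumed to use s14 as scratch *)
  | Int_of_float               (* assumed to use s14 as scratch *)
  | Add_float                  (* assumed to need no scratch register *)

(* Registers clobbered as a side effect of each operation. *)
let destroyed_at_oper = function
  | Float_of_int | Int_of_float -> [ D 7 ]
  | Add_float -> []

(* A value held in [r] survives [o] only if [o] does not destroy [r]. *)
let survives r o = not (List.mem r (destroyed_at_oper o))
```

Under this model, any pass that keeps a live value in d7 across a `Float_of_int` would have to consult `destroyed_at_oper` first; a pass that skips the check could produce the clobbering described above.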

If possible, it would probably also make sense to merge some of the iOS-related code into the upstream ARM backend, in case you are interested.

> Regards,
> Jeffrey

greets,
Benedikt


* Re: [Caml-list] ARM code generator problem
  2012-08-11  8:00 ` Benedikt Meurer
@ 2012-08-11  8:13   ` Benedikt Meurer
  2012-08-11  8:57     ` Jeffrey Scofield
  2012-08-11  8:52   ` [Caml-list] " Jeffrey Scofield
  2012-08-13 19:21   ` [Caml-list] " Jeffrey Scofield
  2 siblings, 1 reply; 9+ messages in thread
From: Benedikt Meurer @ 2012-08-11  8:13 UTC
  To: Jeffrey Scofield; +Cc: Caml List


On Aug 11, 2012, at 10:00 , Benedikt Meurer wrote:

> If possible, it would probably also make sense to merge some of the iOS-related code into the upstream ARM backend, in case you are interested.

Looking through the arm-as-to-ios script you published, I could merge most of the label, symbol addressing and jump table related code. BTW, your script isn't going to work for large compilation units, because the range of the LDR instruction is limited and you always allocate the pool at the end of the file.
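
To make the range limit concrete, here is a minimal sketch in OCaml. It assumes the ARM-mode encoding of a pc-relative load, `ldr rX, [pc, #imm]`, where the immediate is 12 bits (so the target must lie within 4095 bytes of pc) and pc reads as the instruction address plus 8. The function name `ldr_reachable` and its parameters are hypothetical; Thumb encodings have different limits.

```ocaml
(* Assumption: ARM-mode pc-relative LDR with a 12-bit immediate offset.
   In ARM mode, pc reads as the address of the instruction plus 8. *)
let ldr_reachable ~insn_addr ~literal_addr =
  let pc = insn_addr + 8 in
  abs (literal_addr - pc) <= 4095

let () =
  (* A pool placed at the very end of a large file is out of reach
     for loads near the beginning of that file. *)
  Printf.printf "near: %b  far: %b\n"
    (ldr_reachable ~insn_addr:0 ~literal_addr:4000)
    (ldr_reachable ~insn_addr:0 ~literal_addr:100_000)
```

Under these assumptions, once a compilation unit grows past a few kilobytes of code, early loads can no longer address a pool emitted only at the end, which is why pools are usually emitted at intervals.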

greets,
Benedikt


* [Caml-list] ARM code generator problem
  2012-08-11  8:00 ` Benedikt Meurer
  2012-08-11  8:13   ` Benedikt Meurer
@ 2012-08-11  8:52   ` Jeffrey Scofield
  2012-08-13 19:21   ` [Caml-list] " Jeffrey Scofield
  2 siblings, 0 replies; 9+ messages in thread
From: Jeffrey Scofield @ 2012-08-11  8:52 UTC
  To: Benedikt Meurer; +Cc: Jeffrey Scofield, Caml List

Benedikt,

> If you look at destroyed_at_oper in asmcomp/arm/proc.ml, you'll see that
> d7 (s14+s15) is marked as destroyed for those operations where it is
> used as a scratch register.

I definitely see d7 being overwritten in the way I described, and I
don't think I've changed these parts of the code.  Most of the work
was in reformatting the output for the iOS assembler.  There are
some smallish changes to the linkage for calling functions like
sin() and cos().

I'll look to see how destroyed_at_oper is working, maybe it will
explain things.

I made a pretty small file (35 lines or so) that shows the problem.
Unfortunately, I don't have access to a Linux/ARM machine, so I can't
easily try it on an unmodified version of OCaml 4.00.0.  If I still
think there's a problem after figuring out destroyed_at_oper, I'll send
you a description in private mail.

> If possible, it would probably also make sense to merge some of the
> iOS-related code into the upstream ARM backend, in case you are interested.

I'd definitely be interested, once I get things working reasonably well.

Thanks for the help.

Jeffrey



* [Caml-list] ARM code generator problem
  2012-08-11  8:13   ` Benedikt Meurer
@ 2012-08-11  8:57     ` Jeffrey Scofield
  2012-08-11  9:48       ` [Caml-list] " Benedikt Meurer
  0 siblings, 1 reply; 9+ messages in thread
From: Jeffrey Scofield @ 2012-08-11  8:57 UTC
  To: Benedikt Meurer; +Cc: Jeffrey Scofield, Caml List

Benedikt,

> Looking through the arm-as-to-ios script you published, I could merge
> most of the label, symbol addressing and jump table related code. BTW
> your script isn't going to work for large compilation units, because
> the range of the LDR instruction is limited and you always allocate the
> pool at the end of the file.

Since I only use the script to process arm.S, I didn't work *too* hard
at making it work for everything.  But I thought it might be useful
to other people as a starting point, or as a catalog of the changes
I had to make.

If you're not too put off by the ugliness of the compatibility changes,
I'd be very happy to merge the code.

Thanks for looking at my work.

Best regards,

Jeffrey



* [Caml-list] Re: ARM code generator problem
  2012-08-11  8:57     ` Jeffrey Scofield
@ 2012-08-11  9:48       ` Benedikt Meurer
  0 siblings, 0 replies; 9+ messages in thread
From: Benedikt Meurer @ 2012-08-11  9:48 UTC
  To: Jeffrey Scofield; +Cc: Caml List


On Aug 11, 2012, at 10:57 , Jeffrey Scofield wrote:

> Benedikt,

Hey Jeffrey,

>> Looking through the arm-as-to-ios script you published, I could merge
>> most of the label, symbol addressing and jump table related code. BTW
>> your script isn't going to work for large compilation units, because
>> the range of the LDR instruction is limited and you always allocate the
>> pool at the end of the file.
> 
> Since I only use the script to process arm.S, I didn't work *too* hard
> at making it work for everything.  But I thought it might be useful
> to other people as a starting point, or as a catalog of the changes
> I had to make.
> 
> If you're not too put off by the ugliness of the compatibility changes,
> I'd be very happy to merge the code.

I started work on merging your code, see the diff here:

https://github.com/bmeurer/ocaml-arm/compare/bm/ios

That handles most of the basic stuff. There are still some open issues, e.g. what about .arch / .machine? Is the armv6 vs. armv7 distinction an ABI difference?

You can install Debian armel within qemu to easily test the Linux ARM stuff. Preinstalled Debian/squeeze images are available from http://people.debian.org/~aurel32/qemu/armel/

> Best regards,
> Jeffrey

Benedikt



* [Caml-list] Re: ARM code generator problem
  2012-08-11  8:00 ` Benedikt Meurer
  2012-08-11  8:13   ` Benedikt Meurer
  2012-08-11  8:52   ` [Caml-list] " Jeffrey Scofield
@ 2012-08-13 19:21   ` Jeffrey Scofield
  2012-08-14  7:11     ` Benedikt Meurer
  2 siblings, 1 reply; 9+ messages in thread
From: Jeffrey Scofield @ 2012-08-13 19:21 UTC
  To: Benedikt Meurer; +Cc: Jeffrey Scofield, Caml List

OCamlers, Benedikt:

>> The result is that a value in d7 is sometimes destroyed by a use of s14
>> as a scratch register.  In my code it was a call to float_of_int that
>> destroyed a float value being kept in d7.
> 
> If you look at destroyed_at_oper in asmcomp/arm/proc.ml, you'll see that
> d7 (s14+s15) is marked as destroyed for those operations where it is
> used as a scratch register.

I was able to reproduce this behavior with the stock OCaml 4.00.0 compiler,
so I really do think there's a problem.

I whittled my code down to just a few lines.  Here it is:

    let rate_pos scounts : float =
        let m_MIN = -999.0 
        in let max1s = Array.make 14 m_MIN
        in let max2s = Array.make_matrix 14 14 m_MIN
        in let try_build (k1: int) (m: float) : unit =
            let denom = 12
            in let try1b (sawk1, xct) k =
                let () =
                    if max2s.(k1).(k) > m then
                        let adjm = if m <= m_MIN then 0.0 else m
                        in let numer =
                            if k = k1 then 48
                            else if sawk1 then 36
                            else 24
                        in let f = float_of_int numer /. float_of_int denom
                        in let () =
                            if max1s.(k1) <= m_MIN then max1s.(k1) <- 0.0
                        in
                            max1s.(k1) <-
                                max1s.(k1) +. (max2s.(k1).(k) -. adjm) *. f
                in
                    if k = k1 then
                        (true, xct)
                    else
                        (sawk1, xct + scounts.(k))
            in
                ignore (List.fold_left try1b (false, 0) [])
        in let () = Array.iteri try_build max1s
        in
            0.0

(This is a heavily hacked up piece of an evaluation function for a card
game app.)

Here is my OCaml command line (running on Linux/ARM inside Qemu, as you
suggested--it works!):

    $ ocamlopt -ffpu vfpv3 -c -S rate.ml

I'm using vfpv3 because that's what I use for my iOS port.  The system
type is linux_eabihf, which is what you need to get vfpv3 support.

The section that seems to misbehave is these three lines:

    in let f = float_of_int numer /. float_of_int denom
    in let () =
        if max1s.(k1) <= m_MIN then max1s.(k1) <- 0.0

Here is the assembly code with added annotations:

        ldr     r12, [r2, #16]    @ r12 <- m_MIN block
        mov     r0, r7, asr #1    
        ldr     r7, [r2, #20]
        movs    r6, #0xc          @ r6 <- denom
        fmsr    s14, r6           
        fsitod  d10, s14          @ d10 <- float_of_int denom
        ldr     r6, [r7, #-4]
        fldd    d7, [r12, #0]     @ d7 <- m_MIN
        ldr     r12, [r2, #28]
        fmsr    s14, r0           @ *** d7 is destroyed here ***
        fsitod  d9, s14           @ d9 <- float_of_int numer
        cmp     r12, r6, lsr #10
        bcs     .L111
        add     r6, r7, r12, lsl #2
        fldd    d6, [r6, #-4]     @ d6 <- max1s.(k1)
        fdivd   d8, d9, d10
        fcmpd   d6, d7            @ *** This comparison fails ***
        fmstat
        bhi     .L104
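
The effect of the second `fmsr s14, r0` can be checked with a small bit-level model in OCaml. This is only a sketch: it treats d7 as a 64-bit pattern whose low 32 bits are s14, and the helper `write_s14` is hypothetical. The value 48 stands in for r0, which holds one of the possible values of numer.

```ocaml
(* Hypothetical model: d7 is a 64-bit pattern, s14 is its low 32 bits.
   Writing s14 keeps the high half (s15) and replaces the low half. *)
let write_s14 (d7 : int64) (w : int32) : int64 =
  let high = Int64.shift_left (Int64.shift_right_logical d7 32) 32 in
  let low = Int64.logand (Int64.of_int32 w) 0xFFFFFFFFL in
  Int64.logor high low

let () =
  let m_min = -999.0 in
  let d7 = Int64.bits_of_float m_min in        (* fldd d7, [r12, #0] *)
  let d7' = write_s14 d7 48l in                (* fmsr s14, r0       *)
  let clobbered = Int64.float_of_bits d7' in
  Printf.printf "d7 before: %.17g\nd7 after:  %.17g\n" m_min clobbered;
  (* max1s.(k1) still holds -999.0, but the register copy of m_MIN
     has changed, so the test "max1s.(k1) <= m_MIN" goes wrong. *)
  Printf.printf "-999.0 <= d7 after: %b\n" (m_min <= clobbered)
```

In this model only the low-order mantissa bits change, so the clobbered value is very close to -999.0 but no longer equal to it, which is enough to flip the `<=` comparison.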

I built the OCaml 4.00.0 compiler from sources inside Qemu.  The
line for configure was just this:

    $ ./configure --host armv5tejl-unknown-linux-gnueabihf

After that, I just built as usual.

If you agree that this is a problem, I can create a Mantis
bug report for it (if you like).

Best regards,

Jeffrey



* [Caml-list] Re: ARM code generator problem
  2012-08-13 19:21   ` [Caml-list] " Jeffrey Scofield
@ 2012-08-14  7:11     ` Benedikt Meurer
  2012-08-17  4:26       ` Jeffrey Scofield
  0 siblings, 1 reply; 9+ messages in thread
From: Benedikt Meurer @ 2012-08-14  7:11 UTC
  To: Jeffrey Scofield; +Cc: Benedikt Meurer, Caml List


On Aug 13, 2012, at 21:21 , Jeffrey Scofield wrote:

> OCamlers, Benedikt:

Hey Jeffrey,

>>> The result is that a value in d7 is sometimes destroyed by a use of s14
>>> as a scratch register.  In my code it was a call to float_of_int that
>>> destroyed a float value being kept in d7.
>> 
>> If you look at destroyed_at_oper in asmcomp/arm/proc.ml, you'll see that
>> d7 (s14+s15) is marked as destroyed for those operations where it is
>> used as a scratch register.
> 
> I was able to reproduce this behavior with the stock OCaml 4.00.0 compiler,
> so I really do think there's a problem.
> 
> I whittled my code down to just a few lines.  Here it is:
> 
>    let rate_pos scounts : float =
>        let m_MIN = -999.0 
>        in let max1s = Array.make 14 m_MIN
>        in let max2s = Array.make_matrix 14 14 m_MIN
>        in let try_build (k1: int) (m: float) : unit =
>            let denom = 12
>            in let try1b (sawk1, xct) k =
>                let () =
>                    if max2s.(k1).(k) > m then
>                        let adjm = if m <= m_MIN then 0.0 else m
>                        in let numer =
>                            if k = k1 then 48
>                            else if sawk1 then 36
>                            else 24
>                        in let f = float_of_int numer /. float_of_int denom
>                        in let () =
>                            if max1s.(k1) <= m_MIN then max1s.(k1) <- 0.0
>                        in
>                            max1s.(k1) <-
>                                max1s.(k1) +. (max2s.(k1).(k) -. adjm) *. f
>                in
>                    if k = k1 then
>                        (true, xct)
>                    else
>                        (sawk1, xct + scounts.(k))
>            in
>                ignore (List.fold_left try1b (false, 0) [])
>        in let () = Array.iteri try_build max1s
>        in
>            0.0
> 
> (This is a heavily hacked up piece of an evaluation function for a card
> game app.)
> 
> Here is my OCaml command line (running on Linux/ARM inside Qemu, as you
> suggested--it works!):
> 
> $ ocamlopt -ffpu vfpv3 -c -S rate.ml
> 
> I'm using vfpv3 because that's what I use for my iOS port.  The system
> type is linux_eabihf, which is what you need to get vfpv3 support.
> 
> The section that seems to misbehave is these three lines:
> 
>    in let f = float_of_int numer /. float_of_int denom
>    in let () =
>        if max1s.(k1) <= m_MIN then max1s.(k1) <- 0.0
> 
> Here is the assembly code with added annotations:
> 
>        ldr     r12, [r2, #16]    @ r12 <- m_MIN block
>        mov     r0, r7, asr #1    
>        ldr     r7, [r2, #20]
>        movs    r6, #0xc          @ r6 <- denom
>        fmsr    s14, r6           
>        fsitod  d10, s14          @ d10 <- float_of_int denom
>        ldr     r6, [r7, #-4]
>        fldd    d7, [r12, #0]     @ d7 <- m_MIN
>        ldr     r12, [r2, #28]
>        fmsr    s14, r0           @ *** d7 is destroyed here ***
>        fsitod  d9, s14           @ d9 <- float_of_int numer
>        cmp     r12, r6, lsr #10
>        bcs     .L111
>        add     r6, r7, r12, lsl #2
>        fldd    d6, [r6, #-4]     @ d6 <- max1s.(k1)
>        fdivd   d8, d9, d10
>        fcmpd   d6, d7            @ *** This comparison fails ***
>        fmstat
>        bhi     .L104
> 
> I built the OCaml 4.00.0 compiler from sources inside Qemu.  The
> line for configure was just this:
> 
>    $ ./configure --host armv5tejl-unknown-linux-gnueabihf
> 
> After that, I just built as usual.
> 
> If you agree that this is a problem, I can create a Mantis
> bug report for it (if you like).

Yep, that's indeed a bug. Somehow ocamlopt seems to believe that the Ifloatofint instruction preserves d7, even though d7 is marked as destroyed for this operation.

> Best regards,
> Jeffrey

greets,
Benedikt


* [Caml-list] Re: ARM code generator problem
  2012-08-14  7:11     ` Benedikt Meurer
@ 2012-08-17  4:26       ` Jeffrey Scofield
  0 siblings, 0 replies; 9+ messages in thread
From: Jeffrey Scofield @ 2012-08-17  4:26 UTC
  To: Benedikt Meurer; +Cc: Jeffrey Scofield, Caml List

Benedikt and OCamlers,

On Aug 14, 2012, at 12:11 AM, Benedikt Meurer wrote:

> Yep, that's indeed a bug. Somehow ocamlopt seems to believe that the
> Ifloatofint instruction preserves d7, even though d7 is marked as
> destroyed for this operation.

I created a Mantis issue for this problem, 5731:

    http://caml.inria.fr/mantis/view.php?id=5731

Regards,

Jeffrey



end of thread, other threads:[~2012-08-17  4:26 UTC | newest]

Thread overview: 9+ messages
2012-08-10 21:41 [Caml-list] ARM code generator problem Jeffrey Scofield
2012-08-11  8:00 ` Benedikt Meurer
2012-08-11  8:13   ` Benedikt Meurer
2012-08-11  8:57     ` Jeffrey Scofield
2012-08-11  9:48       ` [Caml-list] " Benedikt Meurer
2012-08-11  8:52   ` [Caml-list] " Jeffrey Scofield
2012-08-13 19:21   ` [Caml-list] " Jeffrey Scofield
2012-08-14  7:11     ` Benedikt Meurer
2012-08-17  4:26       ` Jeffrey Scofield
