caml-list - the Caml user's mailing list
* Native compilation for today's MIPS
@ 2009-08-10 17:34 rixed
  2009-08-10 23:08 ` [Caml-list] " rixed
  2009-08-21 20:20 ` rixed
  0 siblings, 2 replies; 8+ messages in thread
From: rixed @ 2009-08-10 17:34 UTC
  To: caml-list

Hello.

I'm trying to make the OCaml native compiler work on a Loongson2F processor
with a GNU/Linux system.

So far, I have managed to work around many ABI-related issues (I want the n32 ABI,
because from the configure script it seems closest to what the old MIPS
assembly emitter expects, and because "the Internet" thinks it's faster than o32).

So, after some minor changes I got ocamlopt and ocamlopt.opt, but the
make opt.opt command fails while compiling camlp4 (or sometimes the debugger,
depending on compilation flags) :

../ocamlopt.opt -nostdlib  -c -g -I camlp4 -I stdlib -o camlp4/Camlp4_import.cmx camlp4/Camlp4_import.ml
Fatal error: exception Invalid_argument("index out of bounds")

Whatever OCAMLRUNPARAM settings I try, I get no backtrace. So to
figure out where this is coming from I tried gdb, but without much luck :

This GDB was configured as "mips64el-unknown-linux-gnu"...
(gdb) b caml_array_bound_error
Breakpoint 1 at 0x1016d708: file fail.c, line 192.
(gdb) r
Starting program: ocamlopt.opt -nostdlib -I ../stdlib -c -g -I camlp4 -I stdlib -o camlp4/Camlp4_import.cmx camlp4/Camlp4_import.ml

Breakpoint 1, caml_array_bound_error () at fail.c:192
192       if (! array_bound_error_bucket_inited) {
(gdb) bt
#0  caml_array_bound_error () at fail.c:192
#1  0x1016cff8 in caml_c_call () at mips.s:192
Backtrace stopped: frame did not save the PC
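
For reference, the exception in question is the one an out-of-range array access raises, and the breakpoint above sits on the runtime function that raises it. A minimal sketch of a standalone reproducer follows; the array, the index and the function name are made up for illustration, and Sys.opaque_identity is only there to keep the compiler from folding the index away:

```ocaml
(* Hypothetical reproducer: force a failing bounds check and report it. *)
let out_of_bounds_message () =
  let a = [| 1; 2; 3 |] in
  (* opaque_identity hides the constant index from the optimizer *)
  let i = Sys.opaque_identity 3 in
  try
    ignore a.(i);
    None
  with Invalid_argument msg -> Some msg

let () =
  Printexc.record_backtrace true;
  match out_of_bounds_message () with
  | Some msg -> Printf.printf "caught: Invalid_argument(%S)\n" msg
  | None -> print_endline "no exception (compiled with -unsafe ?)"
```

Compiled with the compiler under test, a program of this kind gives a much smaller binary to step through in gdb than ocamlopt.opt itself.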

Now I'm running out of ideas.
I have cleared all gcc warnings about ABI mismatch but I suspect something
is still wrong in this area. Being new to both MIPS and OCaml does not help
either.

So I humbly ask for any pointers or ideas about what to look for.

Also, I have to say these flawed ocamlopt and ocamlopt.opt compilers can
actually compile my own poor production of ML programs, which then appear
to run normally, so I'm lacking ML programs of intermediate "difficulty"
to experiment with. Is there a test suite somewhere I could use to test the
compiler ? Should I test some language construct in particular ?
What's your opinion ?




* Re: [Caml-list] Native compilation for today's MIPS
  2009-08-10 17:34 Native compilation for today's MIPS rixed
@ 2009-08-10 23:08 ` rixed
  2009-08-11 10:11   ` rixed
  2009-08-21 20:20 ` rixed
  1 sibling, 1 reply; 8+ messages in thread
From: rixed @ 2009-08-10 23:08 UTC
  To: caml-list

I managed to spot a problem with this small piece of code, which is
the last function of unison's fileinfo module :

let unchanged fspath path info =
  (* The call to [Util.time] must be before the call to [get] *)
  let t0 = Util.time () in
  let info' = get true fspath path in
  let dataUnchanged =
    Props.same_time info.desc info'.desc
      &&
    stamp info = stamp info'
      &&
    if Props.time info'.desc = t0 then begin
	 	prerr_endline ("infotime="^(string_of_float (Props.time info'.desc))^" et t0="^(string_of_float t0)); (* THIS POOR STYLE LINE IS FROM ME *)
      Unix.sleep 1;
      false
    end else
      true
  in ...

As is, the generated code does not work as intended : although
Props.time info'.desc and t0 are two distinct floats, the = test
behaves as if the values were equal, and this function loops when it
shouldn't.

The corresponding code is :

1004b608 <camlFileinfo__unchanged_109>:
1004b608:	27bdffe0 	addiu	sp,sp,-32
1004b60c:	afbf001c 	sw	ra,28(sp)
1004b610:	afbc0018 	sw	gp,24(sp)
1004b614:	3c180011 	lui	t8,0x11
1004b618:	27187eb8 	addiu	t8,t8,32440
1004b61c:	0338e02d 	daddu	gp,t9,t8

1004b620 <$180>:
1004b620:	afa80000 	sw	a4,0(sp)
1004b624:	afa90004 	sw	a5,4(sp)
1004b628:	afaa0008 	sw	a6,8(sp)
1004b62c:	0c01c5a5 	jal	10071694 <camlUtil__time_308>
1004b630:	24080001 	li	a4,1

1004b634 <$181>:
1004b634:	afa2000c 	sw	v0,12(sp)
1004b638:	24080003 	li	a4,3
1004b63c:	8fa90000 	lw	a5,0(sp)
1004b640:	0c012c9e 	jal	1004b278 <camlFileinfo__get_78>
1004b644:	8faa0004 	lw	a6,4(sp)

1004b648 <$182>:
1004b648:	afa20004 	sw	v0,4(sp)
1004b64c:	8c52000c 	lw	s2,12(v0)
1004b650:	8fae0008 	lw	t2,8(sp)
1004b654:	8dd1000c 	lw	s1,12(t2)
1004b658:	3c101015 	lui	s0,0x1015
1004b65c:	8e10f0a4 	lw	s0,-3932(s0)
1004b660:	8e0a002c 	lw	a6,44(s0)
1004b664:	8e49000c 	lw	a5,12(s2)
1004b668:	0c013889 	jal	1004e224 <camlProps__same_368>
1004b66c:	8e28000c 	lw	a4,12(s1)

1004b670 <$183>:
1004b670:	24010001 	li	at,1
1004b674:	10410051 	beq	v0,at,1004b7bc <$173>
1004b678:	00000000 	nop
1004b67c:	0c012d44 	jal	1004b510 <camlFileinfo__stamp_105>
1004b680:	8fa80004 	lw	a4,4(sp)

1004b684 <$184>:
1004b684:	afa20000 	sw	v0,0(sp)
1004b688:	0c012d44 	jal	1004b510 <camlFileinfo__stamp_105>
1004b68c:	8fa80008 	lw	a4,8(sp)

1004b690 <$185>:
1004b690:	0040202d 	move	a0,v0
1004b694:	8fa50000 	lw	a1,0(sp)
1004b698:	3c18100c 	lui	t8,0x100c
1004b69c:	0c033a96 	jal	100cea58 <caml_c_call>
1004b6a0:	271808a4 	addiu	t8,t8,2212

1004b6a4 <$186>:
1004b6a4:	24010001 	li	at,1
1004b6a8:	10410041 	beq	v0,at,1004b7b0 <$174>
1004b6ac:	00000000 	nop
1004b6b0:	8faa0004 	lw	a6,4(sp)
1004b6b4:	8d48000c 	lw	a4,12(a6)
1004b6b8:	8d07000c 	lw	a3,12(a4)
1004b6bc:	90e6fffc 	lbu	a2,-4(a3)
1004b6c0:	10c00007 	beqz	a2,1004b6e0 <$179>
1004b6c4:	00000000 	nop
1004b6c8:	8ce50000 	lw	a1,0(a3)
1004b6cc:	68b80000 	ldl	t8,0(a1)
1004b6d0:	6cb80007 	ldr	t8,7(a1)
1004b6d4:	44b80800 	dmtc1	t8,$f1
1004b6d8:	10000005 	b	1004b6f0 <$178>
1004b6dc:	00000000 	nop

1004b6e0 <$179>:
1004b6e0:	8ce40000 	lw	a0,0(a3)
1004b6e4:	68980000 	ldl	t8,0(a0)
1004b6e8:	6c980007 	ldr	t8,7(a0)
1004b6ec:	44b80800 	dmtc1	t8,$f1

1004b6f0 <$178>:
1004b6f0:	8fa8000c 	lw	a4,12(sp)
1004b6f4:	69180000 	ldl	t8,0(a4)
1004b6f8:	6d180007 	ldr	t8,7(a4)
1004b6fc:	44b80000 	dmtc1	t8,$f0
1004b700:	00000000 	nop
1004b704:	46200832 	c.eq.d	$f1,$f0
1004b708:	00000000 	nop
1004b70c:	45000025 	bc1f	1004b7a4 <$175>
1004b710:	00000000 	nop
1004b714:	0c020d9d 	jal	10083674 <camlPervasives__string_of_float_164>
1004b718:	00000000 	nop

1004b71c <$187>:
1004b71c:	0040482d 	move	a5,v0
1004b720:	3c081015 	lui	a4,0x1015
1004b724:	0c020cde 	jal	10083378 <camlPervasives__$5e_136>
1004b728:	2508ef1c 	addiu	a4,a4,-4324

1004b72c <$188>:
1004b72c:	afa20000 	sw	v0,0(sp)
1004b730:	8fa80004 	lw	a4,4(sp)
1004b734:	8d14000c 	lw	s4,12(a4)
1004b738:	8e93000c 	lw	s3,12(s4)
1004b73c:	9272fffc 	lbu	s2,-4(s3)
1004b740:	12400004 	beqz	s2,1004b754 <$177>
1004b744:	00000000 	nop
1004b748:	8e710000 	lw	s1,0(s3)
1004b74c:	10000003 	b	1004b75c <$176>
1004b750:	0220402d 	move	a4,s1

1004b754 <$177>:
1004b754:	8e700000 	lw	s0,0(s3)
1004b758:	0200402d 	move	a4,s0

1004b75c <$176>:
1004b75c:	0c020d9d 	jal	10083674 <camlPervasives__string_of_float_164>
1004b760:	00000000 	nop

1004b764 <$189>:
1004b764:	0040402d 	move	a4,v0
1004b768:	0c020cde 	jal	10083378 <camlPervasives__$5e_136>
1004b76c:	8fa90000 	lw	a5,0(sp)

1004b770 <$190>:
1004b770:	3c081015 	lui	a4,0x1015
1004b774:	2508ef0c 	addiu	a4,a4,-4340
1004b778:	0c020cde 	jal	10083378 <camlPervasives__$5e_136>
1004b77c:	0040482d 	move	a5,v0

1004b780 <$191>:
1004b780:	0c021067 	jal	1008419c <camlPervasives__prerr_endline_309>
1004b784:	0040402d 	move	a4,v0

1004b788 <$192>:
1004b788:	24040003 	li	a0,3
1004b78c:	3c18100b 	lui	t8,0x100b
1004b790:	0c033a96 	jal	100cea58 <caml_c_call>
1004b794:	27186e10 	addiu	t8,t8,28176

1004b798 <$193>:
1004b798:	240b0001 	li	a7,1
1004b79c:	10000009 	b	1004b7c4 <$172>
1004b7a0:	afab0000 	sw	a7,0(sp)

1004b7a4 <$175>:
1004b7a4:	240b0003 	li	a7,3
1004b7a8:	10000006 	b	1004b7c4 <$172>
1004b7ac:	afab0000 	sw	a7,0(sp)


This last caml_c_call is for the Unix.sleep, if I'm not mistaken.
Notice how floating point registers $f0 and $f1 are used and compared
for equality, jumping, if they are not equal, to <$175> (which
sets a7 to true and leaves this part of the code). I'm googling for
MIPS instructions while reading so I might be wrong, but the test
looks OK to me. So certainly f0 and/or f1 are wrong, despite the correct
values being displayed by prerr_endline.

Now, let's change a little how the test is done :

let unchanged fspath path info =
  (* The call to [Util.time] must be before the call to [get] *)
  let t0 = Util.time () in
  let info' = get true fspath path in
  let are_the_same a b = a = b in  (* I ADDED THIS MASTERPIECE *)
  let dataUnchanged =
    Props.same_time info.desc info'.desc
      &&
    stamp info = stamp info'
      &&
    if are_the_same (Props.time info'.desc) t0 then begin
	 	prerr_endline ("infotime="^(string_of_float (Props.time info'.desc))^" et t0="^(string_of_float t0));
      Unix.sleep 1;
      false
    end else
      true
  in

With the addition of this gratuitous complication of the code, unison
works (at least it no longer loops forever here, and it does copy some files).
The generated code is here and, abracadabra!, all the floating point code is
gone :


1004b644 <camlFileinfo__unchanged_109>:
1004b644:	27bdffe0 	addiu	sp,sp,-32
1004b648:	afbf001c 	sw	ra,28(sp)
1004b64c:	afbc0018 	sw	gp,24(sp)
1004b650:	3c180011 	lui	t8,0x11
1004b654:	27187e8c 	addiu	t8,t8,32396
1004b658:	0338e02d 	daddu	gp,t9,t8

1004b65c <$182>:
1004b65c:	afa80000 	sw	a4,0(sp)
1004b660:	afa90004 	sw	a5,4(sp)
1004b664:	afaa0008 	sw	a6,8(sp)
1004b668:	0c01c5ad 	jal	100716b4 <camlUtil__time_308>
1004b66c:	24080001 	li	a4,1

1004b670 <$183>:
1004b670:	afa2000c 	sw	v0,12(sp)
1004b674:	24080003 	li	a4,3
1004b678:	8fa90000 	lw	a5,0(sp)
1004b67c:	0c012cad 	jal	1004b2b4 <camlFileinfo__get_78>
1004b680:	8faa0004 	lw	a6,4(sp)

# LOOK AT THESE TWO NEW INSTRUCTIONS HERE (lui and addiu)
# TO CHANGE v1. WHAT IS IN v1 ?
1004b684 <$184>:
1004b684:	afa20004 	sw	v0,4(sp)
1004b688:	3c031015 	lui	v1,0x1015
1004b68c:	2463ecc4 	addiu	v1,v1,-4924
1004b690:	8c55000c 	lw	s5,12(v0)
1004b694:	8faf0008 	lw	t3,8(sp)
1004b698:	8df4000c 	lw	s4,12(t3)
1004b69c:	3c131015 	lui	s3,0x1015
1004b6a0:	8e73f0b4 	lw	s3,-3916(s3)
1004b6a4:	8e6a002c 	lw	a6,44(s3)
1004b6a8:	8ea9000c 	lw	a5,12(s5)
1004b6ac:	0c013891 	jal	1004e244 <camlProps__same_368>
1004b6b0:	8e88000c 	lw	a4,12(s4)

1004b6b4 <$185>:
1004b6b4:	24010001 	li	at,1
1004b6b8:	10410048 	beq	v0,at,1004b7dc <$175>
1004b6bc:	00000000 	nop
1004b6c0:	0c012d53 	jal	1004b54c <camlFileinfo__stamp_105>
1004b6c4:	8fa80004 	lw	a4,4(sp)

1004b6c8 <$186>:
1004b6c8:	afa20000 	sw	v0,0(sp)
1004b6cc:	0c012d53 	jal	1004b54c <camlFileinfo__stamp_105>
1004b6d0:	8fa80008 	lw	a4,8(sp)

1004b6d4 <$187>:
1004b6d4:	0040202d 	move	a0,v0
1004b6d8:	8fa50000 	lw	a1,0(sp)
1004b6dc:	3c18100c 	lui	t8,0x100c
1004b6e0:	0c033a9e 	jal	100cea78 <caml_c_call>
1004b6e4:	271808c4 	addiu	t8,t8,2244

1004b6e8 <$188>:
1004b6e8:	24010001 	li	at,1
1004b6ec:	10410038 	beq	v0,at,1004b7d0 <$176>
1004b6f0:	00000000 	nop
1004b6f4:	8fab0004 	lw	a7,4(sp)
1004b6f8:	8d6b000c 	lw	a7,12(a7)
1004b6fc:	8d6a000c 	lw	a6,12(a7)
1004b700:	9149fffc 	lbu	a5,-4(a6)
1004b704:	11200003 	beqz	a5,1004b714 <$181>
1004b708:	00000000 	nop
1004b70c:	10000002 	b	1004b718 <$180>
1004b710:	8d440000 	lw	a0,0(a6)

1004b714 <$181>:
1004b714:	8d440000 	lw	a0,0(a6)

# HERE IT'S COMPLETELY DIFFERENT FROM ABOVE
# THE FP CODE IS GONE ??
1004b718 <$180>:
1004b718:	8fa5000c 	lw	a1,12(sp)
1004b71c:	3c18100c 	lui	t8,0x100c
1004b720:	0c033a9e 	jal	100cea78 <caml_c_call>
1004b724:	271808c4 	addiu	t8,t8,2244

1004b728 <$189>:
1004b728:	24010001 	li	at,1
1004b72c:	10410025 	beq	v0,at,1004b7c4 <$177>
1004b730:	00000000 	nop
1004b734:	0c020da5 	jal	10083694 <camlPervasives__string_of_float_164>
1004b738:	8fa8000c 	lw	a4,12(sp)

1004b73c <$190>:
1004b73c:	0040482d 	move	a5,v0
1004b740:	3c081015 	lui	a4,0x1015
1004b744:	0c020ce6 	jal	10083398 <camlPervasives__$5e_136>
1004b748:	2508ef2c 	addiu	a4,a4,-4308

1004b74c <$191>:
1004b74c:	afa20000 	sw	v0,0(sp)
1004b750:	8fa80004 	lw	a4,4(sp)
1004b754:	8d15000c 	lw	s5,12(a4)
1004b758:	8eb4000c 	lw	s4,12(s5)
1004b75c:	9293fffc 	lbu	s3,-4(s4)
1004b760:	12600004 	beqz	s3,1004b774 <$179>
1004b764:	00000000 	nop
1004b768:	8e920000 	lw	s2,0(s4)
1004b76c:	10000003 	b	1004b77c <$178>
1004b770:	0240402d 	move	a4,s2

1004b774 <$179>:
1004b774:	8e910000 	lw	s1,0(s4)
1004b778:	0220402d 	move	a4,s1

1004b77c <$178>:
1004b77c:	0c020da5 	jal	10083694 <camlPervasives__string_of_float_164>
1004b780:	00000000 	nop

1004b784 <$192>:
1004b784:	0040402d 	move	a4,v0
1004b788:	0c020ce6 	jal	10083398 <camlPervasives__$5e_136>
1004b78c:	8fa90000 	lw	a5,0(sp)

1004b790 <$193>:
1004b790:	3c081015 	lui	a4,0x1015
1004b794:	2508ef1c 	addiu	a4,a4,-4324
1004b798:	0c020ce6 	jal	10083398 <camlPervasives__$5e_136>
1004b79c:	0040482d 	move	a5,v0

1004b7a0 <$194>:
1004b7a0:	0c02106f 	jal	100841bc <camlPervasives__prerr_endline_309>
1004b7a4:	0040402d 	move	a4,v0

1004b7a8 <$195>:
1004b7a8:	24040003 	li	a0,3
1004b7ac:	3c18100b 	lui	t8,0x100b
1004b7b0:	0c033a9e 	jal	100cea78 <caml_c_call>
1004b7b4:	27186e30 	addiu	t8,t8,28208

1004b7b8 <$196>:
1004b7b8:	240b0001 	li	a7,1
1004b7bc:	10000009 	b	1004b7e4 <$174>
1004b7c0:	afab0000 	sw	a7,0(sp)

1004b7c4 <$177>:
1004b7c4:	240b0003 	li	a7,3
1004b7c8:	10000006 	b	1004b7e4 <$174>
1004b7cc:	afab0000 	sw	a7,0(sp)


So what's happening there ? Why no more FP registers ?
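
As for what the flagged lui/addiu pair puts in v1: the two instructions build a 32-bit constant in two halves, lui setting the upper 16 bits and addiu adding a sign-extended 16-bit immediate. A small sketch of that arithmetic, with the function names (sext16, lui_addiu) invented here and the raw immediates taken from the instruction encodings in the listing:

```ocaml
(* Sign-extend a 16-bit immediate, as addiu does. *)
let sext16 imm = if imm land 0x8000 <> 0 then imm - 0x10000 else imm

(* Value left in the register after "lui r,hi" followed by "addiu r,r,imm". *)
let lui_addiu hi imm = (hi lsl 16) + sext16 imm

let () =
  (* lui v1,0x1015 ; addiu v1,v1,-4924   (encoded immediate 0xecc4) *)
  Printf.printf "v1 = 0x%x\n" (lui_addiu 0x1015 0xecc4);
  (* lui t8,0x100c ; addiu t8,t8,2244    (target handed to caml_c_call) *)
  Printf.printf "t8 = 0x%x\n" (lui_addiu 0x100c 0x08c4)
```

So v1 ends up holding 0x1014ecc4, an address just below 0x10150000; what lives at that address cannot be told from the listing alone.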

I'm still in the dark but I wanted to share these questions with the
list in the hope that a MIPS guru would see this, recognize at first
sight the stupid mistake I made and point me in the right direction.

If all the gurus are on holiday, well, it's fun anyway :)
Sorry to waste the bandwidth.



* Re: [Caml-list] Native compilation for today's MIPS
  2009-08-10 23:08 ` [Caml-list] " rixed
@ 2009-08-11 10:11   ` rixed
  2009-08-11 16:38     ` rixed
  2009-08-11 18:06     ` rixed
  0 siblings, 2 replies; 8+ messages in thread
From: rixed @ 2009-08-11 10:11 UTC
  To: caml-list

We are in August and there is nothing on TV. So why not read a little
MIPS assembly instead ?

I commented the part of unison's code that was looping forever, and while
doing so I think I started to understand what's wrong with it.

Remember this code from fileinfo.ml :

let unchanged fspath path info =
  (* The call to [Util.time] must be before the call to [get] *)
  let t0 = Util.time () in
  let info' = get true fspath path in
  let dataUnchanged =
    Props.same_time info.desc info'.desc
      &&
    stamp info = stamp info'
      &&
    if Props.time info'.desc = t0 then begin
	 	prerr_endline ("infotime="^(string_of_float (Props.time info'.desc))^" et t0="^(string_of_float t0));
      Unix.sleep 1;
      false
    end else
      true
  in
  ...

Here is the commented assembly (best viewed in 16:9 aspect ratio :-)
If you have some knowledge of MIPS assembly, please check that I'm
correct. Also, if you have some knowledge of OCaml internals feel free
to fill in the missing bits of information. Or if you are just curious
about what OCaml code looks like, the MIPS Wikipedia page provides
almost all the information required to understand the following code.
The code shows some very good surprises, and some less so.


; The three input params are in a4, a5 and a6.
; I was expecting OCaml to follow the regular conventions (use a0-a3 instead)
; but apparently that's not the case. There seems to be something in there
; already.
1004b608 <camlFileinfo__unchanged_109>:
1004b608:	27bdffe0 	addiu	sp,sp,-32	; sp -= 32 (make some room for our temp storage)
1004b60c:	afbf001c 	sw	ra,28(sp)		; save return address in [sp+28]
1004b610:	afbc0018 	sw	gp,24(sp)		; save global pointer (?) in [sp+24]
1004b614:	3c180011 	lui	t8,0x11		; t8 = 0x110000
1004b618:	27187eb8 	addiu	t8,t8,32440	; t8 += 32440 -> t8 = 0x117eb8
1004b61c:	0338e02d 	daddu	gp,t9,t8		; gp = t9 + t8 (what's in t9 ? something like caml_young_ptr ?)

1004b620:	afa80000 	sw	a4,0(sp)			; our params are : a4 = fspath, a5 = path, a6 = info, all boxed. Save them.
1004b624:	afa90004 	sw	a5,4(sp)			; onto [sp+0], [sp+4] and [sp+8]
1004b628:	afaa0008 	sw	a6,8(sp)

1004b62c:	0c01c5a5 	jal	10071694 <camlUtil__time_308>	; call time()
1004b630:	24080001 	li	a4,1				; Funny Mips trick #1 : this sits in the branch delay slot, so it is executed before the jump takes effect, and gives the only argument (which is unit)
1004b634:	afa2000c 	sw	v0,12(sp)	; save returned time in [sp+12] ( = t0, boxed float according to signature)

1004b638:	24080003 	li	a4,3			; a4 = 3 (true in Ocaml language)
1004b63c:	8fa90000 	lw	a5,0(sp)		; a5 = fspath
1004b640:	0c012c9e 	jal	1004b278 <camlFileinfo__get_78>	; called with true (a4), fspath, path
1004b644:	8faa0004 	lw	a6,4(sp)		; remember trick#1, here is path.
1004b648:	afa20004 	sw	v0,4(sp)		; save return value in [sp+4] (info')
; Notice here how we reused the location of path, which is no longer used. Wow!

; We need here to have a look at info and info' type :
; info type = { typ : typ; inode : int; ctime : float; desc : Props.t; osX : Osx.info}
; where
; Props.t type is =  { perm : Perm.t; uid : Uid.t; gid : Gid.t; time : Time.t; typeCreator : TypeCreator.t; length : Uutil.Filesize.t }
1004b64c:	8c52000c 	lw	s2,12(v0)	; s2 = [v0+12] ie info'.desc
1004b650:	8fae0008 	lw	t2,8(sp)		; t2 = [sp+8] ie info (our third param)
1004b654:	8dd1000c 	lw	s1,12(t2)	; s1 = [info+12] ie info.desc
1004b658:	3c101015 	lui	s0,0x1015	; s0 = 0x10150000 (upper half of the address of some external symbol ?)
1004b65c:	8e10f0a4 	lw	s0,-3932(s0)	; s0 = something 3932 bytes before it
1004b660:	8e0a002c 	lw	a6,44(s0)	; a6 = this_thing.eleventh_slot ? ie a way to reach module Unix.time (see below) ?
1004b664:	8e49000c 	lw	a5,12(s2)	; a5 = info'.desc[12] ie info'.desc.time
1004b668:	0c013889 	jal	1004e224 <camlProps__same_368>	; call Props.same, which does not exist according to props.ml
1004b66c:	8e28000c 	lw	a4,12(s1)	; (mips trick#1 again) a4 = info.desc[12] ie info.desc.time
; But we have in props.ml this one :
; let same_time p p' = Time.same p.time p'.time
; Looks like we have inter-module inlining here.
; How is it possible ? Perhaps a "same" function is added to the Props module with an additional parameter pointing to the Time module,
; and this info is available somehow in props.cmi ?

1004b670:	24010001 	li	at,1	; a1 = true
1004b674:	10410051 	beq	v0,at,1004b7bc <$173>	; cmp previous return value with true, and if so goto $173 where we left
1004b678:	00000000 	nop	; this one is executed in both cases, so nop is safe
1004b67c:	0c012d44 	jal	1004b510 <camlFileinfo__stamp_105>	; call stamp on ...
1004b680:	8fa80004 	lw	a4,4(sp)	; ... [sp+4] aka info'
1004b684:	afa20000 	sw	v0,0(sp)		; and save in [sp]
1004b688:	0c012d44 	jal	1004b510 <camlFileinfo__stamp_105>	; and call once again stamp with ...
1004b68c:	8fa80008 	lw	a4,8(sp)	; ... [sp+8] aka info
1004b690:	0040202d 	move	a0,v0	; a0 = return value

1004b694:	8fa50000 	lw	a1,0(sp)	; restore from stack stamp of info'
1004b698:	3c18100c 	lui	t8,0x100c	; t8 = 0x100c0000 + 2212 (see below) = 0x100c08a4 -> C comparison function certainly.
1004b69c:	0c033a96 	jal	100cea58 <caml_c_call>	; This will call the function at 0x100c08a4, respecting C calling conventions.
; register t8 is an alias for $24 and the mips.s comment about caml_c_call says : Function to call is in $24.
1004b6a0:	271808a4 	addiu	t8,t8,2212

1004b6a4:	24010001 	li	at,1	; true
1004b6a8:	10410041 	beq	v0,at,1004b7b0 <$174>	; if both stamps are eq, goto $174 which is equivalent to $147.
1004b6ac:	00000000 	nop	; done in both alternatives

; Now this is getting harder. We are seeing here an inlined version of Props.time p = Time.extract p.time,
; which is itself inlined. We have :
; Time.extract t = match t with Synced v -> v | NotSynced v -> v
; with this type = Synced of float | NotSynced of float
; I'm still puzzled by the amount of information this code has about the abstract type Props.t. Weren't modules
; supposed to be compiled separately ?
1004b6b0:	8faa0004 	lw	a6,4(sp)	; a6 = [sp+4] aka info'
1004b6b4:	8d48000c 	lw	a4,12(a6)	; a4 = info'.desc
1004b6b8:	8d07000c 	lw	a3,12(a4)	; a3 = info'.desc.time
1004b6bc:	90e6fffc 	lbu	a2,-4(a3)	; a2 = the byte at a3-4, ie the tag of info'.desc.time (correct since we are little endian)
1004b6c0:	10c00007 	beqz	a2,1004b6e0 <$179>	; if it's 0 (Synced), then goto $179
1004b6c4:	00000000 	nop
1004b6c8:	8ce50000 	lw	a1,0(a3)	; a1 = the NotSynced float (which is boxed)
1004b6cc:	68b80000 	ldl	t8,0(a1)	; Mips strange but clever way to load a possibly unaligned doubleword into t8.
1004b6d0:	6cb80007 	ldr	t8,7(a1)	; Contrary to what an Intel coder might think, we are keeping same byte order and the 7 here should not be a 8 :-)
1004b6d4:	44b80800 	dmtc1	t8,$f1	; now on a 64bits host this loads $f1 with this doubleword.
1004b6d8:	10000005 	b	1004b6f0 <$178>	; and we are done
1004b6dc:	00000000 	nop

1004b6e0 <$179>:	; when info'.desc.time is Synced... just do the same :-)
; I suppose this duplication of code could have been avoided with another ML syntax ?
1004b6e0:	8ce40000 	lw	a0,0(a3)
1004b6e4:	68980000 	ldl	t8,0(a0)
1004b6e8:	6c980007 	ldr	t8,7(a0)
1004b6ec:	44b80800 	dmtc1	t8,$f1

; Now that we have (Props.time info'.desc) in $f1, fetch t0
1004b6f0 <$178>:
1004b6f0:	8fa8000c 	lw	a4,12(sp)	; a4 = [sp+12] (t0)
1004b6f4:	69180000 	ldl	t8,0(a4)	; load the double word float into t8
1004b6f8:	6d180007 	ldr	t8,7(a4)
1004b6fc:	44b80000 	dmtc1	t8,$f0	; and copy it to FPU's $f0
1004b700:	00000000 	nop				; the spec says : "For MIPS III, the contents of FPR fs are undefined for the instruction immediately following DMTC1.". So be it.
1004b704:	46200832 	c.eq.d	$f1,$f0	; compare those doubleword FP registers (remember, when running these two times are not equal).
1004b708:	00000000 	nop	; again, MIPS needs some rest here before accessing the result
1004b70c:	45000025 	bc1f	1004b7a4 <$175>	; if the previous test was false (thus, if $f1!=$f0) goto $175. This seems correct.
1004b710:	00000000 	nop
1004b714:	0c020d9d 	jal	10083674 <camlPervasives__string_of_float_164>	; and yet, we are entering here !
1004b718:	00000000 	nop

.. This part of this function's code is irrelevant to the problem ...

; The end when dataUnchanged is true. We could have reused $174 or $173.
; So each time you write a && b && c you end up with 3 different true(s) ?
1004b7a4 <$175>:
1004b7a4:	240b0003 	li	a7,3
1004b7a8:	10000006 	b	1004b7c4 <$172>
1004b7ac:	afab0000 	sw	a7,0(sp)

; The end when dataUnchanged is true. We could have reused $173.
1004b7b0 <$174>:
1004b7b0:	240b0001 	li	a7,1
1004b7b4:	10000003 	b	1004b7c4 <$172>
1004b7b8:	afab0000 	sw	a7,0(sp)

; The end when dataUnchanged is true
1004b7bc <$173>:
1004b7bc:	240b0001 	li	a7,1
1004b7c0:	afab0000 	sw	a7,0(sp)

etc.
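
The tag test in the listing (lbu a2,-4(a3) followed by beqz) matches how OCaml lays out a variant with two non-constant constructors: each value is a block whose header carries the constructor number as its tag, and whose field 0 points to a boxed float. A sketch using the Obj module, with the type copied from the comment in the listing and everything else standalone (tag_of is a made-up helper):

```ocaml
type time = Synced of float | NotSynced of float

let extract t = match t with Synced v -> v | NotSynced v -> v

(* Constructor number stored in the block header. *)
let tag_of (t : time) = Obj.tag (Obj.repr t)

let () =
  Printf.printf "Synced tag = %d, NotSynced tag = %d\n"
    (tag_of (Synced 1.0)) (tag_of (NotSynced 1.0));
  (* Field 0 is itself a pointer to a boxed double. *)
  assert (Obj.tag (Obj.field (Obj.repr (Synced 1.0)) 0) = Obj.double_tag);
  assert (extract (NotSynced 2.5) = 2.5)
```

Both branches of extract read field 0 of the block, which is why the two arms of the generated code are identical apart from register names.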

Interesting indeed.
But what about this floating point comparison ?
I've read somewhere that a MIPS3 FPU can mimic a 32-bit
MIPS1 FPU, where 64-bit doubles were stored in a pair of FPU registers.
In this mode, writing a double to or reading one from an odd register is undefined.

Maybe someone knows if Loongson CPUs are configured this way on Linux ?



* Re: [Caml-list] Native compilation for today's MIPS
  2009-08-11 10:11   ` rixed
@ 2009-08-11 16:38     ` rixed
  2009-08-11 18:06     ` rixed
  1 sibling, 0 replies; 8+ messages in thread
From: rixed @ 2009-08-11 16:38 UTC
  To: rixed; +Cc: caml-list

First, correcting myself, since I managed to confuse true and false :

-[ Tue, Aug 11, 2009 at 12:11:00PM +0200, rixed@happyleptic.org ]----
> 1004b670:	24010001 	li	at,1	; a1 = true

No, 1 is false. True would be 3. Thus :

> 1004b674:	10410051 	beq	v0,at,1004b7bc <$173>	; cmp previous return value with true, and if so goto $173 where we left

Of course, we break out of this sequence of "&&"s if the test is false.
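
The 1 and 3 here come from OCaml's tagged representation of immediate values: an integer n is stored as the machine word 2n+1, and false, true and () are represented as the integers 0, 1 and 0. A short sketch (the helper names tagged and as_int are made up; Obj.magic is used only to peek at the representation of immediates):

```ocaml
(* Machine word used for the immediate integer n. *)
let tagged n = 2 * n + 1

(* Only meaningful on immediate values such as bool and unit. *)
let as_int (x : 'a) : int = Obj.magic x

let () =
  Printf.printf "false -> %d, true -> %d, () -> %d\n"
    (tagged (as_int false)) (tagged (as_int true)) (tagged (as_int ()))
```

This prints 1, 3 and 1, which is why "li a4,1" can pass unit and "li at,1" is a comparison against false.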

> 1004b6a4:	24010001 	li	at,1	; true
> 1004b6a8:	10410041 	beq	v0,at,1004b7b0 <$174>	; if both stamps are eq, goto $174 which is equivalent to $147.

Same remark as above.

> ; The end when dataUnchanged is true. We could have reused $174 or $173.
> ; So each time you write a && b && c you end up with 3 different true(s) ?

So here at the end, dataUnchanged is false.

But this does not change much to the problem.

So, after this little training, deciphering the second code (the one without FPU code,
which happens to work) is a piece of cake. Basically, instead of the FP register comparison
we have this :


1004b718 <$180>:
1004b718:   8fa5000c    lw a1,12(sp)   ; a1 = [sp+12], aka t0
1004b71c:   3c18100c    lui   t8,0x100c   ; t8 = 0x100c0000+2244 -> same C function as before (equality test)
1004b720:   0c033a9e    jal   100cea78 <caml_c_call>  ; call it...
1004b724:   271808c4    addiu t8,t8,2244

I.e. the added function "are_the_same a b = a = b" had the (well known) consequence of replacing
the inline equality test by the generic C version.

So, the bug lies in the inline equality function for float values.
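
The difference between the two versions can be reproduced in isolation. When the operands are statically known to be floats, ocamlopt compiles = to an inline floating point comparison; when the function is polymorphic, it typically goes through the generic comparison in the C runtime instead. A minimal sketch (the function names and the sample timestamps are invented for illustration) that checks the two paths against each other on a given build:

```ocaml
(* The type annotation lets ocamlopt use the inline float comparison. *)
let float_eq (a : float) (b : float) = a = b

(* Polymorphic, so the generic comparison from the runtime is used. *)
let poly_eq a b = a = b

(* True when both paths agree on a pair of values. *)
let paths_agree a b = float_eq a b = poly_eq a b

let () =
  let t0 = Sys.opaque_identity 1249925640.0
  and t1 = Sys.opaque_identity 1249925641.0 in
  Printf.printf "inline: %b, generic: %b\n" (float_eq t0 t1) (poly_eq t0 t1);
  assert (paths_agree t0 t1);
  assert (paths_agree t0 t0)
```

On a sound backend both values print as false for distinct timestamps; a build where the first assertion fails has the same disagreement as the unison code.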

Stay tuned !

:-)



* Re: [Caml-list] Native compilation for today's MIPS
  2009-08-11 10:11   ` rixed
  2009-08-11 16:38     ` rixed
@ 2009-08-11 18:06     ` rixed
  2009-08-12 12:12       ` David MENTRE
  1 sibling, 1 reply; 8+ messages in thread
From: rixed @ 2009-08-11 18:06 UTC
  To: caml-list

> 1004b6cc:	68b80000 	ldl	t8,0(a1)	; Mips strange but clever way to load a possibly unaligned doubleword into t8.
> 1004b6d0:	6cb80007 	ldr	t8,7(a1)	

Too clever for me apparently.
Of course on a little endian CPU that must be :

ldl t8,7(a1)
ldr t8,0(a1)

This problem is gone.
Now camlp4 is compiled, and unison seems to work.

I'm going to try it on all the OCaml programs I can find to stress it a
little bit.
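
To see why the operand order matters, here is a small model of ldl/ldr on a little-endian machine. It is a sketch written from the instruction descriptions in the MIPS64 manual, not something taken from the OCaml sources; registers are int64 values, memory is a Bytes buffer, and all the function names are made up:

```ocaml
(* Register byte 0 is the least significant one. *)
let mem_byte mem a = Int64.of_int (Char.code (Bytes.get mem a))

(* Replace byte [i] of [reg] with [b]. *)
let set_byte reg i b =
  let shift = 8 * i in
  let mask = Int64.shift_left 0xFFL shift in
  Int64.logor (Int64.logand reg (Int64.lognot mask)) (Int64.shift_left b shift)

(* ldl: bytes from the aligned doubleword boundary up to [ea] fill the
   most significant end of the register; the rest is left untouched. *)
let ldl_le mem reg ea =
  let k = ea land 7 in
  let base = ea - k in
  let r = ref reg in
  for j = 0 to k do
    r := set_byte !r (7 - k + j) (mem_byte mem (base + j))
  done;
  !r

(* ldr: bytes from [ea] up to the end of the aligned doubleword fill the
   least significant end of the register; the rest is left untouched. *)
let ldr_le mem reg ea =
  let k = ea land 7 in
  let r = ref reg in
  for j = 0 to 7 - k do
    r := set_byte !r j (mem_byte mem (ea + j))
  done;
  !r

(* Sequence emitted by the big-endian backend: ldl 0(a) ; ldr 7(a). *)
let load_as_emitted mem reg a = ldr_le mem (ldl_le mem reg a) (a + 7)

(* Corrected little-endian sequence: ldl 7(a) ; ldr 0(a). *)
let load_fixed mem reg a = ldr_le mem (ldl_le mem reg (a + 7)) a

let () =
  let mem = Bytes.make 24 '\000' in
  let x = 1249925640.0 and y = 1249925641.0 in
  Bytes.set_int64_le mem 4 (Int64.bits_of_float x);   (* 4-byte aligned only *)
  Bytes.set_int64_le mem 16 (Int64.bits_of_float y);  (* 8-byte aligned *)
  let stale = 0x1122334455667788L in
  assert (Int64.float_of_bits (load_fixed mem stale 4) = x);
  assert (Int64.float_of_bits (load_fixed mem stale 16) = y);
  Printf.printf "as emitted, aligned: 0x%016Lx\n" (load_as_emitted mem stale 16)
```

In this model the sequence as emitted, applied to an 8-byte-aligned address, fetches only the first and last bytes of the double and keeps six stale bytes of the register (the last line prints 0x0022334455667741). Two nearby whole-second timestamps share those two bytes, so they would compare equal; whether that is exactly what happened on the Loongson is a guess.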

:-)



* Re: [Caml-list] Native compilation for today's MIPS
  2009-08-11 18:06     ` rixed
@ 2009-08-12 12:12       ` David MENTRE
  2009-08-12 13:46         ` rixed
  0 siblings, 1 reply; 8+ messages in thread
From: David MENTRE @ 2009-08-12 12:12 UTC
  To: rixed; +Cc: caml-list

Hello,

2009/8/11  <rixed@happyleptic.org>:
>> 1004b6cc:     68b80000        ldl     t8,0(a1)        ; Mips strange but clever way to load a possibly unaligned doubleword into t8.
>> 1004b6d0:     6cb80007        ldr     t8,7(a1)
>
> Too clever for me apparently.
> Of course on a little endian CPU that must be :
>
> ldl t8,7(a1)
> ldr t8,0(a1)
>
> This problem is gone.
> Now camlp4 is compiled, and unison seems to work.

Is this a problem you fixed in the compiler?

Yours,
d.



* Re: [Caml-list] Native compilation for today's MIPS
  2009-08-12 12:12       ` David MENTRE
@ 2009-08-12 13:46         ` rixed
  0 siblings, 0 replies; 8+ messages in thread
From: rixed @ 2009-08-12 13:46 UTC
  To: caml-list

-[ Wed, Aug 12, 2009 at 02:12:26PM +0200, David MENTRE ]----
> Hello,
> 
> > ldl t8,7(a1)
> > ldr t8,0(a1)
> >
> Is this a problem you fixed in the compiler?

Well, it's not really a bug since the MIPS backend was tailored for big-endian machines.
But if one wants a MIPS compiler for little-endian machines too, this is something that
must be changed in mips/emit.ml (this, and a couple of other things).

For now I'm just experimenting. I will build an actual patch for interested people
when/if I become confident with the result.

I just learnt that Debian is willing to create a mips64el distribution in addition to
their venerable mips and mipsel versions. The OCaml native compiler would be nice to
have there. Anyway, I have many other things to debug (like dynamic sharing)  :)



* Re: [Caml-list] Native compilation for today's MIPS
  2009-08-10 17:34 Native compilation for today's MIPS rixed
  2009-08-10 23:08 ` [Caml-list] " rixed
@ 2009-08-21 20:20 ` rixed
  1 sibling, 0 replies; 8+ messages in thread
From: rixed @ 2009-08-21 20:20 UTC
  To: caml-list

Done !

At least, now coq compiles and seems to run OK.

I encountered a strange bug, though, and would like some advice about
my fix. Also, this bug does not seem related to my particular
architecture and, as far as I can tell, would hit any MIPS.

On various occasions the coqtop program jumped to the wrong place.
This was due to the code emitted for some tail calls :

    | Lop(Itailcall_imm s) ->
        if s = !function_name then begin
          `	b	{emit_label !tailrec_entry_point}\n`
        end else begin
          let n = frame_size() in
          if !contains_calls then
            `	lw	$31, {emit_int(n - 4)}($sp)\n`;
          if !uses_gp then
            `	lw	$gp, {emit_int(n - 8)}($sp)\n`;
          if n > 0 then
            `	addu	$sp, $sp, {emit_int n}\n`;
          `	la	$25, {emit_symbol s}\n`;
          liveregs i live_25;
          `	j	$25\n`
        end

Now when !uses_gp is true, the gp register is restored.
Only then is the address of symbol s fetched into register $25 and
jumped to. The problem is : the pseudo-instruction 'la' may expand
into code that uses gp to reach the global offset table and read the
address from there. The assembler will then assume that gp is the
one that was set up at the beginning of the function, not the one
that has just been restored from the stack. Maybe for small programs the value
of gp is always the same (I only vaguely understand this global pointer
thing) but it is not the case for coq.

So I merely moved the 'la' instruction upward, before the 'lw $gp' from the stack.
And it works, apparently.
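
The reordering described above can be sketched as a standalone model. This is not the real emit code (which uses the backquote templates shown earlier); it is a plain OCaml function, with an invented name and an invented example symbol, that returns the instruction lines in the order the change produces:

```ocaml
(* Standalone model of the instruction order after the change; it returns
   the lines instead of printing them. Not the real emit code. *)
let tailcall_sequence ~frame_size:n ~contains_calls ~uses_gp ~symbol =
  List.concat
    [ (* resolve the target while $gp is still this function's own *)
      [ Printf.sprintf "la\t$25, %s" symbol ];
      (if contains_calls then [ Printf.sprintf "lw\t$31, %d($sp)" (n - 4) ] else []);
      (if uses_gp then [ Printf.sprintf "lw\t$gp, %d($sp)" (n - 8) ] else []);
      (if n > 0 then [ Printf.sprintf "addu\t$sp, $sp, %d" n ] else []);
      [ "j\t$25" ] ]

let () =
  tailcall_sequence ~frame_size:32 ~contains_calls:true ~uses_gp:true
    ~symbol:"camlExample__f_42"
  |> List.iter print_endline
```

The only point of the model is the ordering: 'la' comes first, so any gp-relative expansion of it still sees the gp value set up in the prologue, and the caller's gp is restored afterwards.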

What do you think ?


