caml-list - the Caml user's mailing list
* [Caml-list] OCaml SPARC asm compiler implementation question
From: John Carr @ 2002-06-03 16:45 UTC
  To: caml-list


ocaml 3.04 uses registers %l6 and %l7 for allocation in compiled SPARC
code.  %l6 is the address of the last object allocated.  %l7 is a
pointer to a word holding the lower limit of the youngest generation,
young_limit in minor_gc.c.

The alloc instruction sequence looks like:
	ld	[%l7],%g1	! load young_limit
	sub	%l6,N,%l6	! decrement alloc pointer
	cmp	%l6,%g1		! bounds check
	bgeu	label		! branch if allocation fits (pointer still >= limit)
	...

I have three questions:

Why does %l7 hold the address of young_limit instead of its value?
The alloc instruction sequence would be reduced by one instruction and
1-3 cycles (depending on chip) if the load could be omitted.  The MIPS
implementation does not have this indirection.


Looking at this code made me wonder, what fraction of operations in a
typical ocaml program are allocations?  _Garbage Collection_ quotes
statistics for several functional languages, including a version of ML,
but I don't think it mentions ocaml.


Are many people using ocaml on UltraSPARC processors?  The code is
optimized for older SPARCs and substantial improvement is possible
for UltraSPARC.  (For example, the integer load-double and store-double
instructions are faster than a pair of load or store instructions on
older SPARCs, are substantially slower on UltraSPARC, and trap to
emulation on the Hal SPARC-64.)  More on this in a future message.


    --John Carr (jfc@mit.edu)
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



* Re: [Caml-list] OCaml SPARC asm compiler implementation question
From: Xavier Leroy @ 2002-06-04  9:58 UTC
  To: John Carr; +Cc: caml-list

> Why does %l7 hold the address of young_limit instead of its value?
> The alloc instruction sequence would be reduced by one instruction and
> 1-3 cycles (depending on chip) if the load could be omitted.  The MIPS
> implementation does not have this indirection.

The reason is related to signal handling.  Outside of blocking I/O
primitives, OCaml processes signals by polling: the signal handler
sets a flag, and this flag is periodically polled.  To avoid adding
polling instructions to ocamlopt-generated code, the signal handler
also simulates a "minor heap is full" condition by setting young_limit
above young_ptr; the next allocation then calls the garbage collector,
which tests the signal flag and acts accordingly.  
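
To make the mechanism above concrete, here is a minimal C sketch.  It
is not the actual runtime code: the heap is reduced to an integer
offset that grows downward, and the names HEAP_WORDS, on_signal,
collect_and_poll and alloc_words are hypothetical.  The handler only
sets a flag and raises young_limit; the flag is noticed when the next
allocation fails its bounds check.

```c
#include <signal.h>

enum { HEAP_WORDS = 256 };               /* hypothetical minor heap size */

static int young_ptr = HEAP_WORDS;       /* allocation pointer, grows downward */
static volatile sig_atomic_t young_limit = 0;    /* lower bound of the heap */
static volatile sig_atomic_t signal_pending = 0; /* flag set by the handler */
static int minor_collections = 0;
static int signals_handled = 0;

/* Signal handler: record the signal and simulate "minor heap is full". */
static void on_signal(int sig) {
    (void)sig;
    signal_pending = 1;
    young_limit = HEAP_WORDS;   /* any later allocation now fails the check */
}

/* Stand-in for the collector entry point: empty the heap, restore the
   real limit, then test the signal flag. */
static void collect_and_poll(void) {
    young_ptr = HEAP_WORDS;
    young_limit = 0;
    minor_collections++;
    if (signal_pending) {
        signal_pending = 0;
        signals_handled++;      /* a real runtime would run the OCaml handler */
    }
}

/* Allocation: the only "polling" is the ordinary bounds check. */
static int alloc_words(int n) {
    young_ptr -= n;
    if (young_ptr < young_limit) {
        collect_and_poll();
        young_ptr -= n;
    }
    return young_ptr;
}
```

The fast path contains no instruction dedicated to signals, which is
the point of the trick: the cost is paid only when a signal actually
arrived or the heap is really full.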

Consequently, young_limit can be cached in a register only if the
signal handler is able to modify this register in addition to the
global variable young_limit; otherwise, the next allocation would not
see the "minor heap is full" condition.

Modifying a register from a signal handler is highly OS-specific: the
kernel saves the registers on the stack before calling the signal handler,
and restores the registers from the stack area before returning to the
interrupted code, so it's really the saved register area on the stack
that must be modified.  Most kernels give access to this saved
register area via additional arguments to the signal handler.
However, for the Sparc, at least one of the supported operating systems
(Solaris, SunOS 4, Linux, BSD) does not provide this facility (I can't
remember which one(s)).
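
For illustration, this is roughly what "additional arguments to the
signal handler" looks like with the POSIX sigaction interface and the
SA_SIGINFO flag.  This is an assumption about a present-day POSIX
system, not a description of what the 2002-era kernels listed above
offered; the names on_signal_ctx, install_ctx_handler and saw_context
are hypothetical.  The sketch only checks that a context pointer is
delivered and deliberately does not touch the saved registers, since
their layout is OS- and CPU-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t saw_context = 0;

/* Three-argument handler.  On systems that support it, `context`
   points to a ucontext_t describing the interrupted code, including
   the saved registers the kernel will restore on return. */
static void on_signal_ctx(int sig, siginfo_t *info, void *context) {
    (void)sig;
    (void)info;
    saw_context = (context != NULL);
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_ctx_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal_ctx;
    sa.sa_flags = SA_SIGINFO;   /* request the extra arguments */
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

Caching the value of young_limit in a register would require the
handler to patch that register inside the saved context, which is
exactly the non-portable step.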

Hence the Sparc code generator uses plan B: the address of young_limit
(and not its value) is cached in register %l7, and the current value
of young_limit is reloaded at each allocation.  This way, changes to
young_limit are seen at the next allocation.
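
The difference between the two schemes can be sketched in C as
follows.  This is a toy model under stated assumptions, not the code
generator's output: the structs stand in for machine registers, the
names global_young_limit, regs_plan_a, regs_plan_b, alloc_plan_a and
alloc_plan_b are hypothetical, and signed comparison is used where the
SPARC sequence uses an unsigned one.

```c
/* Hypothetical stand-in for the runtime's young_limit variable. */
static volatile long global_young_limit = 0;

/* Plan A: the limit's value is cached, as if kept in a register. */
struct regs_plan_a { long alloc_ptr; long limit_value; };

/* Plan B: only the limit's address is cached (the role of %l7). */
struct regs_plan_b { long alloc_ptr; volatile long *limit_addr; };

/* Both functions return 1 on the fast path and 0 when the collector
   would have to be called. */
static int alloc_plan_a(struct regs_plan_a *r, long n) {
    r->alloc_ptr -= n;
    return r->alloc_ptr >= r->limit_value;  /* never sees later updates */
}

static int alloc_plan_b(struct regs_plan_b *r, long n) {
    long limit = *r->limit_addr;   /* ld   [%l7],%g1  */
    r->alloc_ptr -= n;             /* sub  %l6,N,%l6  */
    return r->alloc_ptr >= limit;  /* cmp  %l6,%g1 ; bgeu */
}
```

If a signal handler raises global_young_limit between two allocations,
alloc_plan_a still takes the fast path while alloc_plan_b falls through
to the collector, at the price of one extra load per allocation.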

(Other ports of ocamlopt use plan B for similar reasons: ARM, HPPA.)

> Looking at this code made me wonder, what fraction of operations in a
> typical ocaml program are allocations?  _Garbage Collection_ quotes
> statistics for several functional languages, including a version of ML,
> but I don't think it mentions ocaml.

I have no hard data to provide.

> Are many people using ocaml on UltraSPARC processors?  The code is
> optimized for older SPARCs and substantial improvement is possible
> for UltraSPARC.  (For example, the integer load-double and store-double
> instructions are faster than a pair of load or store instructions on
> older SPARCs, are substantially slower on UltraSPARC, and trap to
> emulation on the Hal SPARC-64.)

It is true that I do not have access to an UltraSPARC machine.
Thanks for the info on the integer ldd and std instructions.
Fortunately, they are not used in ocamlopt-generated code (for reasons
related to alignment constraints), only in glue code that is not time
critical.

- Xavier Leroy

