caml-list - the Caml user's mailing list
* RE: questions about costs of nativeint vs int
@ 2001-01-12 19:53 Brent Fulgham
  0 siblings, 0 replies; 8+ messages in thread
From: Brent Fulgham @ 2001-01-12 19:53 UTC (permalink / raw)
  To: Dave Berry; +Cc: caml-list

> Do you have any plans to implement unboxed native integers?
>

There's an interesting paper on the Caml site discussing the
performance issues of boxed/unboxed values.  From a performance
standpoint, I believe it indicated that boxed values were the better
tradeoff between performance and flexibility, though the C
integration issues you mention below were not (IIRC) discussed.
 
> I would really have liked to see this happen, because it 
> would have made integration with C both easier and more efficient.
> 

Couldn't a wrapper generator of some kind make this less
problematic?

For instance, the MzScheme implementation provides various C
Macros and other tools to get at the boxed values:

int scheme_get_int_val(Scheme_Object* v);

And some predicates to decide what kind of object you are dealing
with:

int SCHEME_INTP(Scheme_Object* v);	// Returns true/false.

SWIG can be used to auto-generate interface code using these
routines to provide a C/Scheme bridge.

This should be possible in OCaml as well.

-Brent




* RE: questions about costs of nativeint vs int
@ 2001-01-17 13:47 Dave Berry
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Berry @ 2001-01-17 13:47 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Norman Ramsey, caml-list

I was assuming a different approach to GC, using a new header format rather
than associating static GC descriptors with every call site and heap block.
I think this should be no slower than normal tagged GC.

Let me describe the MLWorks tagging scheme, because it's got some ideas that
might be useful to other implementors, even without extending it to support
untagged integers.

Unlike most ML compilers, which use 31-bit integers and assume that heap
objects are word-aligned, MLWorks used 30-bit integers and assumed that heap
objects were double-word-aligned.  The use of 30-bit integers stemmed from
the tagged integer operations on SPARCs (now deprecated).  Combined with
aligning heap objects on double words, this gave 8 run-time tags:

Integers: Even(000), Odd(100)
Headers:  Heap block(010), Unused(110)
Pointers:  Normal(001), Ref(011), Pair(101), Unused(111)

Aligning heap objects on double-words required padding even-length objects
by an extra word.  But using a special pointer for pairs meant that we could
drop header words for pairs completely, which saved more space than was
introduced by padding longer structures.

I would be interested to learn whether any of these ideas are useful to
other compilers.

Given this framework, my idea for supporting untagged integers was to ensure
that all untagged elements of a record were placed at the end of the record.
The unused header tag (110) would mark the beginning of this section, and
would encode the number of untagged words to skip.  So if a record did not
include any untagged elements, the GC cost would be unaffected.

I was also planning to use the unused pointer tag (111) to encode lists of
untagged integers.  A 111 tag would both point to a pair and indicate that
the word immediately after the pointer was untagged.

Both approaches require that the compiler knows the actual layout of every
type, with coercions inserted for some polymorphic operations.  I accept
your point that the complexity and cost of doing this may outweigh the
gains.  Certainly it's complicated if you pursue a full typeful approach to
compilation.  For functors, I would probably go for compiling functors to
lambda code, and only generating machine code at functor application time.
This would have other optimisation advantages (at least, it would have done
for MLWorks, which didn't have any meaningful cross-module optimisation).
But obviously that's a major change to any system.

Dave.


-----Original Message-----
From: Xavier Leroy [mailto:Xavier.Leroy@inria.fr]
Sent: Sunday, January 14, 2001 11:13
To: Dave Berry
Cc: Norman Ramsey; caml-list@inria.fr
Subject: Re: questions about costs of nativeint vs int


> Do you have any plans to implement unboxed native integers?

Ocaml performs some amount of local (intra-function) unboxing on large
integers (nativeint, int32, int64) as well as on floats.  Version 3.00
missed some opportunities for local unboxing of large integers, but
the next release will improve this, bringing integer unboxing to the
level of float unboxing.  However, large integers as well as
floats remain boxed across functions and inside data structures
(with a few special cases for arrays and big arrays).

But you were probably asking about having native integers unboxed
everywhere in the program.  I did that in my Gallium experimental
compiler in 1993-1994, using the type-based and coercion-based
approach presented in my POPL 92 paper.  Gallium had 32-bit unboxed
integers, 64-bit unboxed floats, and unboxed tuples, in variables as
well as inside data structures.

Later, I gave up on this technology for several reasons:

1- The approach is very hard to extend to structures and functors.
The FLINT and TIL teams later succeeded in making type-based data
representations work in the presence of functors, using run-time type
inspection instead of (or in addition to) coercions.  Still, I think this
approach is too complex for its own good, and entails significant
performance hits on polymorphic or functorized code.

2- GC becomes too slow.  It is not hard to write a GC that can handle
mixtures of boxed and unboxed data (without tag bits for the latter):
just associate static GC descriptors to every call site and to every
heap block.  Gallium did it, no big deal.  But the cost of interpreting
these descriptors at every GC is quite high -- much higher than
testing primary/secondary tags on data like the OCaml GC does.

As a consequence, programs that do a lot of symbolic manipulations,
which spend a lot of time in GC, run slower, while they don't benefit
from unboxed integers and floats since they don't use integers and
floats much.  I found this unacceptable: symbolic computation is ML's
bread-and-butter, and should remain as fast as possible.

3- Local unboxing (of the sort I mentioned earlier) achieves
decent performance on floating-point and integer intensive code --
almost as good as Gallium-style unboxing, and without any of the
GC slowdowns.  (See my TIC'97 paper for some measurements.)

> I would really have liked to see this happen, because it would have made
> integration with C both easier and more efficient.

Yes, but at so much extra cost (in GC speed and in overall complexity)
that I don't think it's worth it.

Cheers,

- Xavier

PS.  All papers mentioned are available at
http://pauillac.inria.fr/~xleroy/leroy.html




* Re: questions about costs of nativeint vs int
  2001-01-14 11:13 ` Xavier Leroy
@ 2001-01-15  2:09   ` John Prevost
  0 siblings, 0 replies; 8+ messages in thread
From: John Prevost @ 2001-01-15  2:09 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Dave Berry, Norman Ramsey, caml-list

>>>>> "xl" == Xavier Leroy <Xavier.Leroy@inria.fr> writes:

    xl> Ocaml performs some amount of local (intra-function) unboxing
    xl> on large integers (nativeint, int32, int64) as well as on
    xl> floats.  Version 3.00 missed some opportunities for local
    xl> unboxing of large integers, but the next release will improve
    xl> this, bringing integer unboxing to the level of float
    xl> unboxing.  However, large integers as well as floats remain
    xl> boxed across functions and inside data structures (with a few
    xl> special cases for arrays and big arrays).

You say that this doesn't happen across function-call boundaries.  Am
I right in assuming that function-call boundaries that are smoothed
away by inlining are an exception to this?

What about tail-recursion boundaries?

John.




* Re: questions about costs of nativeint vs int
  2001-01-12 12:39 Dave Berry
@ 2001-01-14 11:13 ` Xavier Leroy
  2001-01-15  2:09   ` John Prevost
  0 siblings, 1 reply; 8+ messages in thread
From: Xavier Leroy @ 2001-01-14 11:13 UTC (permalink / raw)
  To: Dave Berry; +Cc: Norman Ramsey, caml-list

> Do you have any plans to implement unboxed native integers?

Ocaml performs some amount of local (intra-function) unboxing on large
integers (nativeint, int32, int64) as well as on floats.  Version 3.00
missed some opportunities for local unboxing of large integers, but
the next release will improve this, bringing integer unboxing to the
level of float unboxing.  However, large integers as well as
floats remain boxed across functions and inside data structures
(with a few special cases for arrays and big arrays).

But you were probably asking about having native integers unboxed
everywhere in the program.  I did that in my Gallium experimental
compiler in 1993-1994, using the type-based and coercion-based
approach presented in my POPL 92 paper.  Gallium had 32-bit unboxed
integers, 64-bit unboxed floats, and unboxed tuples, in variables as
well as inside data structures.

Later, I gave up on this technology for several reasons:

1- The approach is very hard to extend to structures and functors.
The FLINT and TIL teams later succeeded in making type-based data
representations work in the presence of functors, using run-time type
inspection instead of (or in addition to) coercions.  Still, I think this
approach is too complex for its own good, and entails significant
performance hits on polymorphic or functorized code.

2- GC becomes too slow.  It is not hard to write a GC that can handle
mixtures of boxed and unboxed data (without tag bits for the latter):
just associate static GC descriptors to every call site and to every
heap block.  Gallium did it, no big deal.  But the cost of interpreting
these descriptors at every GC is quite high -- much higher than
testing primary/secondary tags on data like the OCaml GC does.

As a consequence, programs that do a lot of symbolic manipulations,
which spend a lot of time in GC, run slower, while they don't benefit
from unboxed integers and floats since they don't use integers and
floats much.  I found this unacceptable: symbolic computation is ML's
bread-and-butter, and should remain as fast as possible.

3- Local unboxing (of the sort I mentioned earlier) achieves
decent performance on floating-point and integer intensive code --
almost as good as Gallium-style unboxing, and without any of the
GC slowdowns.  (See my TIC'97 paper for some measurements.)

> I would really have liked to see this happen, because it would have made
> integration with C both easier and more efficient.

Yes, but at so much extra cost (in GC speed and in overall complexity)
that I don't think it's worth it.

Cheers,

- Xavier

PS.  All papers mentioned are available at
http://pauillac.inria.fr/~xleroy/leroy.html




* RE: questions about costs of nativeint vs int
@ 2001-01-12 12:39 Dave Berry
  2001-01-14 11:13 ` Xavier Leroy
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Berry @ 2001-01-12 12:39 UTC (permalink / raw)
  To: Xavier Leroy, Norman Ramsey; +Cc: caml-list

Do you have any plans to implement unboxed native integers?

We had such a plan during the development of MLWorks, but never implemented
it.  The register allocation code recognised a set of GC-able registers and
a set of non-GC registers.  (Since we never implemented unboxed native ints,
the set of non-GC registers was always empty in practice.)  We were planning to
change the heap layout so that the header word would encode the number of
untagged words.  The GC scan routine would skip the unboxed words, and
continue scanning the tagged words.  Obviously the compiler would have to
change to emit the correct data structures.

There were various complications.  One arose from our omission of header
words on pairs (which saved a large amount of space), but a little trickery
handled this.  I think there were some potential complications to make sure
the compiler always had enough information about data layout, but that
should be easier in OCaml than SML (with the possible exception of objects,
which I've never implemented).

I would really have liked to see this happen, because it would have made
integration with C both easier and more efficient.

Dave.

-----Original Message-----
From: Xavier Leroy [mailto:Xavier.Leroy@inria.fr]
Sent: Thursday, January 11, 2001 18:34
To: Norman Ramsey
Cc: caml-list@inria.fr
Subject: Re: questions about costs of nativeint vs int


Dear Norman,

> We're designing interfaces for tools that will sometimes have to
> manipulate 32-bit integers, but will often be able to make do with the
> limited precision provided by the int type.  I would love to get some
> information about the relative cost of int vs nativeint in both
> parameter passing and datatype representation.

Values of type "nativeint", "int32" and "int64" are boxed
(heap-allocated and handled through a pointer), while values of type
"int" are unboxed.  So, in terms of space, we have:
        nativeint       1 word for the pointer + 2 words for the heap block
        int32           same
        int64           same on 64-bit machines, add 1 word for a 32-bitter
        int             1 word for the data
In terms of speed, operations on boxed integers are more expensive due
to the extra loads (on operands) and heap allocations (on results).
ocamlopt can avoid loads and allocations that cancel each other in
simple cases, but the fact remains that nativeint arithmetic is more
expensive than int arithmetic.

> For example, what are the relative costs (e.g., memory requirements)
> of these two definitions?
> 
>   type loc = Cell of char * int * width
>            | Slice of ...
> 
>   type loc' = Cell of char * nativeint * width
>             | Slice of ...

A "Cell" requires two more memory words in the latter case.

> As another example, what are the costs of calling these functions:
> 
>   type t
>   val add  : rd:int       -> rs1:int       -> rs2:int       -> t
>   val add' : rd:nativeint -> rs1:nativeint -> rs2:nativeint -> t

Assuming the nativeints are already available, the cost is exactly the
same: a pointer is passed instead of an unboxed integer.  However,
computing the nativeint arguments in the first place can be more expensive
if arithmetic operations are involved.

> To extend the question, what happens if I use int32 or int64 in place
> of nativeint?

No major differences.  On a 32-bit processor, int64 requires one more
memory word than the other two.  And of course 64-bit arithmetic is
slower than 32-bit arithmetic on a 32-bit processor.

Actually, "nativeint" is isomorphic to "int32" on a 32-bit platform
and to "int64" on a 64-bit platform.  The reason for having all three
types is that some applications require 32-bit integers (no more, no
less, regardless of the platform), some require 64-bit integers, and
some just want to get at the natural word size for the architecture.

> Finally, a procedural question: should I be posting these sorts of
> questions to comp.lang.ml instead of sending them to
> caml-list@inria.fr?

For technical questions about the Caml implementation, I think you get
a better chance of a reply here on caml-list.

All the best,

- Xavier




* Re: questions about costs of nativeint vs int
  2001-01-11 18:34 ` Xavier Leroy
@ 2001-01-11 22:17   ` Norman Ramsey
  0 siblings, 0 replies; 8+ messages in thread
From: Norman Ramsey @ 2001-01-11 22:17 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

 > Values of type "nativeint", "int32" and "int64" are boxed
 > (heap-allocated and handled through a pointer), while values of type
 > "int" are unboxed...
 >
 > >   type loc = Cell of char * int * width
 > >            | Slice of ...
 > > 
 > >   type loc' = Cell of char * nativeint * width
 > >             | Slice of ...
 > 
 > A "Cell" requires two more memory words in the latter case.

Thanks for this very helpful reply.

I think for simplicity, we may go with nativeint now.  Another
question: are there any tools that would enable us to track, e.g.,
how much memory is being occupied by instances of "Cell", or even just
how many "Cells" might be live at a given moment?
That way we could evaluate the potential savings in moving to int for
some variants...


Norman




* Re: questions about costs of nativeint vs int
  2001-01-10 18:25 Norman Ramsey
@ 2001-01-11 18:34 ` Xavier Leroy
  2001-01-11 22:17   ` Norman Ramsey
  0 siblings, 1 reply; 8+ messages in thread
From: Xavier Leroy @ 2001-01-11 18:34 UTC (permalink / raw)
  To: Norman Ramsey; +Cc: caml-list

Dear Norman,

> We're designing interfaces for tools that will sometimes have to
> manipulate 32-bit integers, but will often be able to make do with the
> limited precision provided by the int type.  I would love to get some
> information about the relative cost of int vs nativeint in both
> parameter passing and datatype representation.

Values of type "nativeint", "int32" and "int64" are boxed
(heap-allocated and handled through a pointer), while values of type
"int" are unboxed.  So, in terms of space, we have:
        nativeint       1 word for the pointer + 2 words for the heap block
        int32           same
        int64           same on 64-bit machines, add 1 word for a 32-bitter
        int             1 word for the data
In terms of speed, operations on boxed integers are more expensive due
to the extra loads (on operands) and heap allocations (on results).
ocamlopt can avoid loads and allocations that cancel each other in
simple cases, but the fact remains that nativeint arithmetic is more
expensive than int arithmetic.

> For example, what are the relative costs (e.g., memory requirements)
> of these two definitions?
> 
>   type loc = Cell of char * int * width
>            | Slice of ...
> 
>   type loc' = Cell of char * nativeint * width
>             | Slice of ...

A "Cell" requires two more memory words in the latter case.

> As another example, what are the costs of calling these functions:
> 
>   type t
>   val add  : rd:int       -> rs1:int       -> rs2:int       -> t
>   val add' : rd:nativeint -> rs1:nativeint -> rs2:nativeint -> t

Assuming the nativeints are already available, the cost is exactly the
same: a pointer is passed instead of an unboxed integer.  However,
computing the nativeint arguments in the first place can be more expensive
if arithmetic operations are involved.

> To extend the question, what happens if I use int32 or int64 in place
> of nativeint?

No major differences.  On a 32-bit processor, int64 requires one more
memory word than the other two.  And of course 64-bit arithmetic is
slower than 32-bit arithmetic on a 32-bit processor.

Actually, "nativeint" is isomorphic to "int32" on a 32-bit platform
and to "int64" on a 64-bit platform.  The reason for having all three
types is that some applications require 32-bit integers (no more, no
less, regardless of the platform), some require 64-bit integers, and
some just want to get at the natural word size for the architecture.

> Finally, a procedural question: should I be posting these sorts of
> questions to comp.lang.ml instead of sending them to
> caml-list@inria.fr?

For technical questions about the Caml implementation, I think you get
a better chance of a reply here on caml-list.

All the best,

- Xavier




* questions about costs of nativeint vs int
@ 2001-01-10 18:25 Norman Ramsey
  2001-01-11 18:34 ` Xavier Leroy
  0 siblings, 1 reply; 8+ messages in thread
From: Norman Ramsey @ 2001-01-10 18:25 UTC (permalink / raw)
  To: caml-list

We're designing interfaces for tools that will sometimes have to
manipulate 32-bit integers, but will often be able to make do with the
limited precision provided by the int type.  I would love to get some
information about the relative cost of int vs nativeint in both
parameter passing and datatype representation.

For example, what are the relative costs (e.g., memory requirements)
of these two definitions?

  type loc = Cell of char * int * width
           | Slice of ...

  type loc' = Cell of char * nativeint * width
            | Slice of ...

As another example, what are the costs of calling these functions:

  type t
  val add  : rd:int       -> rs1:int       -> rs2:int       -> t
  val add' : rd:nativeint -> rs1:nativeint -> rs2:nativeint -> t

To extend the question, what happens if I use int32 or int64 in place
of nativeint?


Finally, a procedural question: should I be posting these sorts of
questions to comp.lang.ml instead of sending them to
caml-list@inria.fr?


Thanks in advance for any help!


Norman





Thread overview: 8+ messages
2001-01-12 19:53 questions about costs of nativeint vs int Brent Fulgham
  -- strict thread matches above, loose matches on Subject: below --
2001-01-17 13:47 Dave Berry
2001-01-12 12:39 Dave Berry
2001-01-14 11:13 ` Xavier Leroy
2001-01-15  2:09   ` John Prevost
2001-01-10 18:25 Norman Ramsey
2001-01-11 18:34 ` Xavier Leroy
2001-01-11 22:17   ` Norman Ramsey
