From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id QAA01221 for caml-red; Sat, 20 Jan 2001 16:18:09 +0100 (MET) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id OAA03657 for ; Wed, 17 Jan 2001 14:43:39 +0100 (MET) Received: from mrwall.kal.com (mrwall.kal.com [194.193.14.236]) by nez-perce.inria.fr (8.11.1/8.10.0) with SMTP id f0HDhcL14959; Wed, 17 Jan 2001 14:43:38 +0100 (MET) Received: from mrwall.kal.com [194.193.14.236] (HELO localhost) by mrwall.kal.com (AltaVista Mail V2.0J/2.0J BL25J listener) id 0000_0045_3a65_a1d8_17ee; Wed, 17 Jan 2001 13:44:56 +0000 Received: from somewhere by smtpxd Message-ID: <3145774E67D8D111BE6E00C0DF418B663B84E6@nt.kal.com> From: Dave Berry To: Xavier Leroy Cc: Norman Ramsey , caml-list@inria.fr Subject: RE: questions about costs of nativeint vs int Date: Wed, 17 Jan 2001 13:47:35 -0000 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.0.1460.8) Content-Type: text/plain; charset="iso-8859-1" Sender: weis@pauillac.inria.fr I was assuming a different approach to GC, using a new header format rather than ssociating static GC descriptors to every call site and heap block. I think this should be no slower than normal tagged GC. Let me describe the MLWorks tagging scheme, because it's got some ideas that might be useful to other implementors, even without extending it to support untagged integers. Unlike most ML compilers, which use 31-bit integers and assume that heap objects are word-aligned, MLWorks used 30-bit integers and assumed that heap objects were double-word-aligned. The use of 30-bit integers stemmed from the tagged integer operations on SPARCs (now deprecated). Combined with aligning heap objects on double words, this gave 8 run-time tags: Integers: Even(000), Odd(100) Headers: Heap block(010), Unused(110) Pointers: Normal(001), Ref(011), Pair(101), Unused(111) Aligning heap objects on double-words required padding even-length objects by an extra word. But using a special pointer for pairs meant that we could drop header words for pairs completely, which saved more space than was introduced by padding longer structures. I would be interested to learn whether any of these ideas are useful to other compilers. Given this framework, my idea for supporting untagged integers was to ensure that all untagged elements of a record were placed at the end of the record. The unused header tag (110) would mark the beginning of this section, and would encode the number of untagged words to skip. So if a record did not include any untagged elements, the GC cost would be unaffected. I was also planning to use the unused pointer tag (111) to encode lists of untagged integers. A 111 tag would both point to a pair and indicate that the word immediately after the pointer was untagged. Both approaches require that the compiler knows the actual layout of every type, with coercions inserted for some polymorphic operations. I accept your point that the complexity and cost of doing this may outweigh the gains. Certainly it's complicated if you pursue a full typeful approach to compilation. For functors, I would probably go for compiling functors to lambda code, and only generating machine code at functor application time. This would have other optimisation advantages (at least, it would have done for MLWorks, which didn't have any meaningful cross-module optimisation). But obviously that's a major change to any system. Dave. -----Original Message----- From: Xavier Leroy [mailto:Xavier.Leroy@inria.fr] Sent: Sunday, January 14, 2001 11:13 To: Dave Berry Cc: Norman Ramsey; caml-list@inria.fr Subject: Re: questions about costs of nativeint vs int > Do you have any plans to implement unboxed native integers? Ocaml performs some amount of local (intra-function) unboxing on large integers (nativeint, int32, int64) as well as on floats. Version 3.00 missed some opportunities for local unboxing of large integers, but the next release will improve this, bringing integer unboxing to the level of float unboxing. However, large integers as well as floats remain boxed across functions and inside data structures (with a few special cases for arrays and big arrays). But you were probably asking about having native integers unboxed everywhere in the program. I did that in my Gallium experimental compiler in 1993-1994, using the type-based and coercion-based approach presented in my POPL 92 paper. Gallium had 32-bit unboxed integers, 64-bit unboxed floats, and unboxed tuples, in variables as well as inside data structures. Later, I gave up on this technology for several reasons: 1- The approach is very hard to extend to structures and functors. The FLINT and TIL teams later succeeded in making type-based data representations work in the presence of functors, using run-time type inspection instead (or in addition to) coercions. Still, I think this approach is too complex for its own good, and entails significant performance hits on polymorphic or functorized code. 2- GC becomes too slow. It is not hard to write a GC that can handle mixtures of boxed and unboxed data (without tag bits for the latter): just associate static GC descriptors to every call site and to every heap block. Gallium did it, no big deal. But the cost of interpreting these descriptors at every GC is quite high -- much higher than testing primary/secondary tags on data like the OCaml GC does. As a consequence, programs that do a lot of symbolic manipulations, which spend a lot of time in GC, run slower, while they don't benefit from unboxed integers and floats since they don't use integers and floats much. I found this unacceptable: symbolic computation is ML's bread-and-butter, and should remain as fast as possible. 3- Local unboxing (of the sort I mentioned earlier) achieves decent performances on floating-point and integer intensive code -- almost as good as Gallium-style unboxing, and without any of the GC slowdowns. (See my TIC'97 paper for some measures.) > I would really have liked to see this happen, because it would have made > integration with C both easier and more efficient. Yes, but at so many extra costs (in GC speed and in overall complexity) that I don't think it's worth it. Cheers, - Xavier PS. All papers mentioned are available at http://pauillac.inria.fr/~xleroy/leroy.html