From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id VAA02245 for caml-red; Sun, 14 Jan 2001 21:38:48 +0100 (MET) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id MAA31489 for ; Sun, 14 Jan 2001 12:13:38 +0100 (MET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id f0EBDT514620; Sun, 14 Jan 2001 12:13:29 +0100 (MET) Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id MAA01610; Sun, 14 Jan 2001 12:13:29 +0100 (MET) Date: Sun, 14 Jan 2001 12:13:29 +0100 From: Xavier Leroy To: Dave Berry Cc: Norman Ramsey , caml-list@inria.fr Subject: Re: questions about costs of nativeint vs int Message-ID: <20010114121329.A562@pauillac.inria.fr> References: <3145774E67D8D111BE6E00C0DF418B663AF87C@nt.kal.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <3145774E67D8D111BE6E00C0DF418B663AF87C@nt.kal.com>; from dave@kal.com on Fri, Jan 12, 2001 at 12:39:06PM -0000 Sender: weis@pauillac.inria.fr > Do you have any plans to implement unboxed native integers? Ocaml performs some amount of local (intra-function) unboxing on large integers (nativeint, int32, int64) as well as on floats. Version 3.00 missed some opportunities for local unboxing of large integers, but the next release will improve this, bringing integer unboxing to the level of float unboxing. However, large integers as well as floats remain boxed across functions and inside data structures (with a few special cases for arrays and big arrays). But you were probably asking about having native integers unboxed everywhere in the program. I did that in my Gallium experimental compiler in 1993-1994, using the type-based and coercion-based approach presented in my POPL 92 paper. Gallium had 32-bit unboxed integers, 64-bit unboxed floats, and unboxed tuples, in variables as well as inside data structures. Later, I gave up on this technology for several reasons: 1- The approach is very hard to extend to structures and functors. The FLINT and TIL teams later succeeded in making type-based data representations work in the presence of functors, using run-time type inspection instead (or in addition to) coercions. Still, I think this approach is too complex for its own good, and entails significant performance hits on polymorphic or functorized code. 2- GC becomes too slow. It is not hard to write a GC that can handle mixtures of boxed and unboxed data (without tag bits for the latter): just associate static GC descriptors to every call site and to every heap block. Gallium did it, no big deal. But the cost of interpreting these descriptors at every GC is quite high -- much higher than testing primary/secondary tags on data like the OCaml GC does. As a consequence, programs that do a lot of symbolic manipulations, which spend a lot of time in GC, run slower, while they don't benefit from unboxed integers and floats since they don't use integers and floats much. I found this unacceptable: symbolic computation is ML's bread-and-butter, and should remain as fast as possible. 3- Local unboxing (of the sort I mentioned earlier) achieves decent performances on floating-point and integer intensive code -- almost as good as Gallium-style unboxing, and without any of the GC slowdowns. (See my TIC'97 paper for some measures.) > I would really have liked to see this happen, because it would have made > integration with C both easier and more efficient. Yes, but at so many extra costs (in GC speed and in overall complexity) that I don't think it's worth it. Cheers, - Xavier PS. All papers mentioned are available at http://pauillac.inria.fr/~xleroy/leroy.html