Date: Wed, 14 Feb 2001 20:44:52 +0100
From: Xavier Leroy
To: Don Syme
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] RE: OCaml on CLR/JVM?
Message-ID: <20010214204452.C28371@pauillac.inria.fr>
In-Reply-To: <0C682B70CE37BC4EADED9D375809768A48FD3A@red-msg-04.redmond.corp.microsoft.com>; from dsyme@microsoft.com on Tue, Feb 13, 2001 at 11:31:10PM -0800

> If you're compiling to bytecode you can
> also ensure more compatibilities of representations, e.g. make sure ML
> int64's are exactly representationally equivalent to C's int64s. Note if
> you don't compile to a bytecode then you even have to marshal integers
> across the interop boundary in Caml, though this could be automated.

The point I'd like to argue is that compiling to bytecode doesn't
guarantee that Caml's int64's (or any other data structure) are
representationally equivalent to C's int64s.
What you're describing is the favorable case where the source language
can be compiled to bytecode with a "natural mapping" (little or no
encoding at all) of the data types and data structures. This is indeed
a boon for interoperability: data doesn't need to be marshalled across
language boundaries, objects can be physically shared, funny things
like cross-language class inheritance become possible, etc.

But: this can be impossible to achieve if the source language doesn't
fit the assumptions of the target virtual machine. This is what
happens with Caml and .NET (or the JVM, for that matter). Parametric
polymorphism is the best-known issue (.NET and the JVM support only
object-based subtyping polymorphism, meaning that a method cannot
operate both on objects and on integers or floats). Your work on
building parametric polymorphism on top of .NET (an amazing feat of
engineering, by the way) shows that this can be overcome.

But what about other nasty features of Caml? E.g. functors, row
polymorphism on objects, the Caml class system? None of these map
naturally to the .NET intermediate language. I estimate that a
"natural mapping" of these features would require two or three more
"tours de force" of the kind you did with parametric polymorphism...
(I can make these statements with some confidence, since at INRIA
Bruno Pagano spent one whole year, with significant input from Luc
Maranget and me, struggling with these issues.)

So, when the source language isn't quite what the VM designers had in
mind, the "natural mapping" doesn't work and one has to revert to
encodings of data structures. For instance, integers and floats may
have to be boxed (wrapped inside an object) most of the time.
Source-level objects may have to be mapped to VM objects that manage
their own vtable of methods themselves, bypassing that of the VM.
Source-level classes map to even more complicated encodings. (I'm
describing what Bruno Pagano did here.) Etc.
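
To make the kind of encoding described above a bit more concrete, here
is a rough OCaml model of it. This is only a sketch under stated
assumptions, not Bruno Pagano's actual scheme: the names vm_value,
vm_object, send, int_of_vm, vm_of_int and counter are all hypothetical,
and the "VM" is simulated by an ordinary OCaml datatype in which
everything is an object.

```ocaml
(* Hypothetical model of a VM that only knows about objects.
   None of these names come from a real compiler or runtime. *)
type vm_value =
  | Boxed_int of int      (* an integer wrapped inside an object *)
  | Boxed_float of float  (* a float wrapped inside an object *)
  | Obj of vm_object      (* an encoded source-level object *)

(* An encoded object carries its own method table instead of relying
   on the vtable of the VM. *)
and vm_object = {
  fields : vm_value array;
  methods : (string * (vm_object -> vm_value list -> vm_value)) list;
}

(* Method dispatch is done by the encoding itself, not by the VM. *)
let send (obj : vm_object) (name : string) (args : vm_value list) : vm_value =
  match List.assoc_opt name obj.methods with
  | Some m -> m obj args
  | None -> failwith ("method not found: " ^ name)

(* Conversions that become necessary wherever a native integer is
   expected, for instance at a language boundary. *)
let int_of_vm : vm_value -> int = function
  | Boxed_int n -> n
  | _ -> invalid_arg "int_of_vm: not a boxed integer"

let vm_of_int (n : int) : vm_value = Boxed_int n

(* A small counter object whose single field holds a boxed integer. *)
let counter (start : int) : vm_object = {
  fields = [| vm_of_int start |];
  methods = [
    ("get", (fun self _ -> self.fields.(0)));
    ("incr", (fun self _ ->
       self.fields.(0) <- vm_of_int (int_of_vm self.fields.(0) + 1);
       self.fields.(0)));
  ];
}

(* Usage: every arithmetic step unboxes and reboxes the integer. *)
let () =
  let c = counter 41 in
  let v = send c "incr" [] in
  Printf.printf "%d\n" (int_of_vm v)  (* prints 42 *)
```

Even in this toy model, a caller from another language cannot use the
counter as an ordinary VM object: it has to go through send and the
int_of_vm / vm_of_int conversions.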
All this can be made to work, and Bruno managed to do it. But it
totally misses the point of interoperability: since your data
structures are not represented "like everyone else's", conversions are
again required at language boundaries, thus losing most of the
benefits of the .NET approach: no cross-language sharing of objects or
inheritance; need to generate the conversion code at boundary points;
etc.

And if one has to convert the data structures at boundaries anyway,
why bother generating VM code? (And putting up with all the issues
that Bruno had to deal with.) Why not just keep the existing Caml
implementation, which already manages its own things quite well, thank
you, and call into the JVM or the .NET machine (via a foreign-function
interface) for cross-language calls? Conversion stub code still needs
to be generated, but is not much more complex than if we were already
in the VM, and the other problems just go away!

At the very least, I think one should understand the conversion issues
*before* embarking on generating VM code. This is the opposite of what
we did with Bruno :-) The foreign-function interface approach lets one
test the conversion issues first.

> While at a certain level I like Xavier's approach, i.e. maintaining two
> runtimes, garbage collectors etc., I have troubles seeing it scaling to the
> multi-language component programming envisioned as part of .NET approach
> (and indeed currently in practice with C#, C++, VB.NET and other .NET
> languages). Two GC's are already trouble enough (performance might suck as
> they will both be tuned to fill up the cache), but if you have components
> from 10 languages in one process? 10 GCs competing for attention?

Why not? When you have 10 functions in your program, they are already
competing for the attention of the cache. From a hardware standpoint,
10 GCs are just 10 functions :-)

More seriously, I agree that there is certainly a performance penalty
on cross-language calls.
But there is a performance gain on intra-language calls: the Caml code
can run at full speed, without paying the price of the data encodings
into someone else's ideas of "primitive" data structures. And most of
the libraries we're interested in do quite a lot of work on most calls
(think GUIs, Fortran numerical libraries, or database interfaces).
A serious performance investigation of the two approaches would be
very interesting indeed.

> Maybe it
> can be made to work, but there's a certain conceptual clarity in just
> accepting that a GC should form part of the computing infrastructure, and
> share that service. These are the aspects of the .NET approach that I find
> quite compelling.

And these are the aspects that frighten me :-) I think GC's are "haute
couture": tailor-made clothing, hand-fitted on the body of the
customer... err, I meant, the source language. .NET is
"prêt-à-porter": mass-produced, one-size-fits-all clothes. It might
fit many customers, but probably not all, or not all well.

(At this point of the discussion, readers might find it useful to
picture a regular Saharan camel wearing a pair of Levi's and a Gap
baggy sweatshirt, and trying to get its four hooves into a pair of
brand-new Nikes :-)

> As an aside, I think it would be an interesting question to say "OK, let's
> take it for granted that the end purpose of our language is to produce
> components whose interface is expressed in terms of the Java or .NET type
> systems, but which retains as many of the features and conceptual simplicity
> of OCaml and ML as possible." I'm not sure exactly what you'd end up with,
> but whatever it was it could be the language to take over from C# and/or
> Java (if that's what you're interested in...) But without really taking
> Java/.NET component building seriously right from the start I feel you're
> always just going to end up with a bit of a hack - an interesting, usable
> hack perhaps, but not a really _good_ language.

This is one direction for research; the MLj folks, for instance,
started to do something like this, lifting Java classes and objects as
straightforwardly as possible into SML.

However, it is also a bit frightening, as evidenced by Fabrice Le
Fessant's inflammatory reply, because it means that all languages will
start to look more and more alike: object-oriented, class-based,
single inheritance with multiple interfaces -- Java clones, in short.

Actually, we already see some evidence of this in the .NET world: C#
is C++ mixed with Java; the new Visual Basic is the old Visual Basic
with a solid dash of Java in it; etc. My mental picture of .NET is a
vortex with Java at the center and all .NET languages spiraling around
it and getting closer and closer to the center...

One could argue that it is actually beneficial to programmers that all
languages share a basic model, thus interoperating well; then each
language can have some added value in other areas. But I don't find
this appealing from a language designer's point of view. I feel we
researchers in this area have a duty to promote more diversity. But
cynics could say that we think so because we feel our jobs are
threatened :-)

I think I should stop here and not even mention the "Windows only"
controversy around .NET. Dang! I did it! Briefly, then: the Caml
party line is that we've supported Unix and Windows (and a few others)
for almost 10 years now, and are committed to continuing to do so in
the future. If Microsoft shows a similar commitment for .NET, that is
great; if not, that is a real show-stopper.

Apologies for this long rant. Cheers,

- Xavier Leroy
-------------------
To unsubscribe, mail caml-list-request@inria.fr. Archives: http://caml.inria.fr