From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id JAA20664 for caml-red; Wed, 10 Jan 2001 09:31:53 +0100 (MET) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id TAA13169 for ; Tue, 9 Jan 2001 19:02:59 +0100 (MET) Received: from mrwall.kal.com (mrwall.kal.com [194.193.14.236]) by nez-perce.inria.fr (8.11.1/8.10.0) with SMTP id f09I2wL24402; Tue, 9 Jan 2001 19:02:58 +0100 (MET) Received: from mrwall.kal.com [194.193.14.236] (HELO localhost) by mrwall.kal.com (AltaVista Mail V2.0J/2.0J BL25J listener) id 0000_0045_3a5b_52bb_9015; Tue, 09 Jan 2001 18:04:43 +0000 Received: from somewhere by smtpxd Message-ID: <3145774E67D8D111BE6E00C0DF418B663AA6FB@nt.kal.com> From: Dave Berry To: Xavier Leroy , Charles Martin Cc: caml-list@inria.fr Subject: RE: Module hierarchies Date: Tue, 9 Jan 2001 18:06:36 -0000 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.0.1460.8) Content-Type: text/plain; charset="iso-8859-1" Sender: weis@pauillac.inria.fr I think you have to step back and ask what you're trying to achieve. In particular, there are two related issues. One is nested namespaces, which is obviously desirable. The second is units of delivery, by which I mean DLLs, Components, Libraries, Executables, object files, or groups of these. As a generic term, I'll call these "deliverables". (I'd like to use "packages", but this would cause confusion with Java and Ada -- neither of which actually package the results of compiling a so-called "package"). I'll use an example with which I'm more familiar: the SML Basis Library. This contains several modules, some of which are grouped in various ways. One such grouping is the IO modules; these are designed so that you can use the rest of the library without having to use the IO modules, e.g. in case you have your own IO subsystem. So we want to be able to load a subset of the full library. Another grouping is the OS modules. These are nested under a main OS module, so you have OS.Path, OS.FileSys, etc. Above this it might be useful to have a Basis namespace, to distinguish Basis.List from anybody else's List structure. (The actual Basis Library doesn't specify such a module). To add further complication, some entries in the Basis Library are optional. E.g. an implementation can have any non-zero number of Int modules, for arbitrary . So you can't write a signature that describes the Basis library! So, how should this be packaged, and how should it be referenced? You don't necessarily want to distribute the library as a single DLL, because you might want to use only some parts of it. But you still want a nested namespace. So it's a bad idea to require that a namespace refers to a single compiled object. This is effectively what SML does -- a namespace is a structure, and if one part of a structure is available then all parts are (although an implementation might do something clever with delayed loading). You do want some way of distributing the library as a small number of files. Distributing large numbers of object files is a pain, especially if they have to be uninstalled in certain places. Archive files (ar, jar) are one solution. You might also have a library that you want to distribute as a DLL. Then you need to express dependencies. I think it's useful to be able to do this at the library level -- e.g. to say that application A depends on Basis.IO, and application B depends on Basis.* . You don't want to have to list each file separately. Possibly this can be done by command-line arguments, e.g. compile -I Basis\IO ? So there are lots of issues to consider. At Harlequin we wrote several project management systems for MLWorks, and a couple more for Dylan. They all involved difficult design decisions. My current belief is that you need a separate notion of library, being a namespace that corresponds to a deliverable, with some notion of dependencies at the library level, as well as dependencies between the modules in a library. Dave. -----Original Message----- From: Xavier Leroy [mailto:Xavier.Leroy@inria.fr] Sent: Monday, January 08, 2001 10:24 To: Charles Martin Cc: caml-list@inria.fr Subject: Re: Module hierarchies > An alternative is to adopt the Java convention, in which a module such a > engine/graphics/texture/manager.ml > is automagically mapped to Engine.Graphics.Texture.Manager. Yes, this has been suggested already on this list. Problem number one is, as you said: > The difficulty > now is what to do about the file/module > engine/graphics/texture.ml <=> Engine.Graphics.Texture > It seems to me the easiest solution is to assume that a directory/file > layout has the semantics of a single file in which the modules are > catenated in depth-first order. This is one solution, but this ordering for submodules is somehow arbitrary. More pragmatically, it seems very hard (in the current implementation) to maintain the correspondence between a directory and a structure with sub-modules corresponding to the directory elements. An alternative solution (suggested by Judicaël Courant some time ago) would be to have a new command that groups together several separately-compiled modules into one module having the original modules as sub-structures. E.g. ocamlnewmagiccommand -o lib.cmo a.cmo b.cmo c.cmo would generate lib.cmo and lib.cmi files equivalent to the following source code for lib.ml: module A = struct (* contents of a.ml *) end module B = struct (* contents of b.ml *) end module C = struct (* contents of c.ml *) end In other terms, while the current OCaml library archive files (.cma files) generated by "ocamlc -a" are "flat" and introduce no additional structuring, the new command would do both library archiving and introducing of a layer of structuring. Of course, the order of .cmo files on the command line would determine the order of the sub-modules, thus relieving the compiler from guessing this order. (As an aside, it is interesting to note that the Linux kernel sources -- a large source tree indeed -- uses "ld -r" in subdirectories to group together the object files for each subdirectory in one easy to manipulate .o file. This is kind of the same idea, except that of course C's namespace is flat, so no additional structuring is introduced.) I still have no idea how hard it is to implement Judicaël's scheme, though. Coming back to the general problem of structuring a large OCaml project, my experience with the OCaml compiler itself is that the solution based on a flat module namespace + subdirectories to partition the files + a big Makefile at the top works out quite well for projects of about 100 KLOC, and could scale up some more, although perhaps not to 1 MLOC. In particular, one big Makefile is a lot easier to maintain than a zillion tiny recursive Makefiles. When comparing with Java, you have to keep in mind that Java source files are smaller and more numerous than Caml source files, since the latter can contain several classes as well as submodules. (Not to mention that a 10-line OCaml datatype declaration is roughly equivalent to 11 Java classes, each in its own file...) So, the need to break up a Java project into several packages appears earlier than the need to break up a Caml project into several directories. Still, I'd be very interested to know how others "do it" with large OCaml projects. Happy new year, - Xavier Leroy