caml-list - the Caml user's mailing list
* RE: Module hierarchies
@ 2001-01-09 16:46 Dave Berry
  2001-01-10  9:40 ` Markus Mottl
From: Dave Berry @ 2001-01-09 16:46 UTC
  To: Michael Hicks, caml-list

That's a really good article about make.  Not just for recursive make, but
also for general advice on speeding up any makefile.  I use #include and .d
files, but I wasn't aware of the difference between := and =, and I didn't
realise that make did so much lazy string processing.  This is a case where
eager evaluation wins hands down!
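To make the := versus = point concrete, here is a toy OCaml model of GNU make's two variable flavours, not make itself. A recursively expanded variable (VAR = ...) behaves like a thunk that is re-evaluated on every reference. A simply expanded variable (VAR := ...) is evaluated once, when it is defined. The counter and file list are made up for illustration.

```ocaml
(* Toy model of GNU make's two variable flavours: we count how often a
   variable's right-hand side is expanded. *)

let expansions = ref 0

(* Stand-in for an expensive right-hand side such as $(shell ...). *)
let expensive_expansion () =
  incr expansions;
  "a.ml b.ml c.ml"

(* VAR = ... : recursively expanded, re-evaluated on every reference. *)
let recursive () = expensive_expansion ()

(* Reference a "=" variable n times; return how many expansions happened. *)
let count_recursive n =
  expansions := 0;
  for _i = 1 to n do ignore (recursive ()) done;
  !expansions

(* VAR := ... : simply expanded, evaluated once at definition time. *)
let count_simple n =
  expansions := 0;
  let simple = expensive_expansion () in
  for _i = 1 to n do ignore (String.length simple) done;
  !expansions

let () =
  Printf.printf "'=' : %d expansions\n" (count_recursive 100);
  Printf.printf "':=': %d expansion\n" (count_simple 100)
```

With 100 references this prints 100 expansions for the "=" variable and 1 for the ":=" variable, which is the cost Dave describes when a makefile leans on lazy expansion.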

I don't agree with the final section though.  It's trying to make the
"single makefile" solution fit all projects.  I don't think this holds for
the case where sub-projects are producing separate deliverables, such as
DLLs, ActiveX controls or JavaBeans.  In such cases you want to minimise (or
eliminate) dependencies from one component to the internals of another, and
separate makefiles look the best way to go.

Dave. 


-----Original Message-----
From: Michael Hicks [mailto:mwh@dsl.cis.upenn.edu]
Sent: Sunday, January 07, 2001 16:08
To: Charles Martin; caml-list@inria.fr
Subject: RE: Module hierarchies

See "Recursive Make Considered Harmful"
(http://www.pcug.org.au/~millerp/rmch/recu-make-cons-harm.html) for more on
this, and good suggestions for structuring large projects.



* RE: Module hierarchies
@ 2001-01-11 12:53 Dave Berry
From: Dave Berry @ 2001-01-11 12:53 UTC
  To: gerd, Dave Berry, Xavier Leroy, Charles Martin; +Cc: caml-list

The namespace aspect is not so straightforward when you have parameterised
modules.  Suppose we adopted a Java-like global namespace, with subspaces
for each organisation, then for each project, etc.  What would it mean to
parameterise a namespace on another?  You don't necessarily want an entry in
the namespace hierarchy to refer to a complete semantic entity, but of
course this is at the heart of ML module systems.
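In OCaml terms, a module "parameterised on another" is a functor. The following sketch uses illustrative names and assumes OCaml 4.08 or later for the standard Int module. It shows why such a module does not map neatly onto a single entry in a global namespace: every application of the functor yields a distinct structure.

```ocaml
(* What the parameter module must provide. *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* A functor: a module parameterised on another module. Each application
   produces a fresh structure rather than one fixed namespace entry. *)
module MakeSortedList (Ord : ORDERED) = struct
  type elt = Ord.t
  type t = elt list

  let empty : t = []

  (* Insert x, keeping the list sorted according to Ord.compare. *)
  let rec add x = function
    | [] -> [x]
    | y :: rest as l ->
        if Ord.compare x y <= 0 then x :: l else y :: add x rest

  let to_list (l : t) : elt list = l
end

(* Two instances of the "same" parameterised entry. *)
module IntList = MakeSortedList (Int)
module StrList = MakeSortedList (String)
```

For example, `IntList.(to_list (add 3 (add 1 (add 2 empty))))` evaluates to `[1; 2; 3]`.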

Dave.

-----Original Message-----
From: Gerd Stolpmann [mailto:gerd@gerd-stolpmann.de]
Sent: Wednesday, January 10, 2001 20:12
To: Dave Berry; Xavier Leroy; Charles Martin
Cc: caml-list@inria.fr
Subject: RE: Module hierarchies

Your example shows that there are several aspects:

- How to statically refer to a module at compile time = namespace aspect
- How to separate the code base into separately linkable/loadable parts
  = library aspect
- How to use/initialize the code, i.e. there may be fragments of libraries
  which can be used independently of the rest of the library

In O'Caml, we have currently:

- A library (always in archive format) consists of several toplevel
  modules
- When linking the executable, only the actually referred modules of the
  libraries are included and initialized

It is very simple to use libraries, but it might be difficult to produce
them, because unfortunately the toplevel modules of the library are the
compilation units when the library is created. For example, if I want to
make a library with, say, IO.File, IO.Net, and IO.Base, I have only one
compilation unit IO, although it is desirable to have three such units which
are linked together later.
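A minimal sketch of the situation Gerd describes, with hypothetical contents for each sub-module. To get the paths IO.File, IO.Net and IO.Base, everything has to live in one file, io.ml, so the three parts form a single compilation unit. The sketch wraps them in `module IO = struct ... end` to stay self-contained; in the real io.ml that wrapper would not appear, because the file itself is module IO.

```ocaml
(* Stand-in for io.ml: File, Net and Base are one compilation unit, so
   they are compiled, linked and initialised together. *)
module IO = struct
  module Base = struct
    (* Hypothetical channel type shared by the other sub-modules. *)
    type channel = { name : string }
    let make name = { name }
  end

  module File = struct
    let open_file path = Base.make ("file:" ^ path)
  end

  module Net = struct
    let connect host port = Base.make (Printf.sprintf "tcp:%s:%d" host port)
  end
end
```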

>[...]
>So, how should this be packaged, and how should it be referenced?  You don't
>necessarily want to distribute the library as a single DLL, because you
>might want to use only some parts of it.  But you still want a nested
>namespace.  So it's a bad idea to require that a namespace refers to a
>single compiled object.   This is effectively what SML does -- a namespace
>is a structure, and if one part of a structure is available then all parts
>are (although an implementation might do something clever with delayed
>loading).

DLLs are more problematic than static libraries when only parts of the
libraries are referred to, because it is more difficult to initialize a DLL
only partly. (But I can still deliver the library as single DLL if the
operating system has a mechanism that loads only the needed parts.)

>You do want some way of distributing the library as a small number of files.
>Distributing large numbers of object files is a pain, especially if they
>have to be uninstalled in certain places.  Archive files (ar, jar) are one
>solution.  You might also have a library that you want to distribute as a
>DLL.
>
>Then you need to express dependencies.  I think it's useful to be able to do
>this at the library level -- e.g. to say that application A depends on
>Basis.IO, and application B depends on Basis.* .  You don't want to have to
>list each file separately.  Possibly this can be done by command-line
>arguments, e.g. compile -I Basis\IO ?

This is what I implemented in findlib, my package manager for O'Caml. It
focuses on the delivery problems of libraries: Where is the library
installed? Which archives belong to a library? Which dependencies exist?
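Answering "which dependencies exist?" at the library level means computing the transitive closure of each package's requirements and ordering the result so that every package comes after the ones it requires. The sketch below is not findlib's actual code. It is a plain depth-first topological sort over a made-up package table, showing the kind of computation involved.

```ocaml
(* Hypothetical package database: package name -> packages it requires. *)
let requires = [
  "basis.core", [];
  "basis.io",   ["basis.core"];
  "basis.os",   ["basis.core"];
  "basis",      ["basis.io"; "basis.os"];
  "app_a",      ["basis.io"];
]

exception Cycle of string

type mark = Visiting | Done

(* Depth-first topological sort: returns pkgs plus all their transitive
   requirements, each package listed after everything it requires. *)
let link_order pkgs =
  let marks = Hashtbl.create 16 in
  let order = ref [] in
  let rec visit p =
    match Hashtbl.find_opt marks p with
    | Some Done -> ()
    | Some Visiting -> raise (Cycle p)   (* p requires itself, indirectly *)
    | None ->
        Hashtbl.replace marks p Visiting;
        let deps = try List.assoc p requires with Not_found -> [] in
        List.iter visit deps;
        Hashtbl.replace marks p Done;
        order := p :: !order
  in
  List.iter visit pkgs;
  List.rev !order
```

Here `link_order ["app_a"]` yields `["basis.core"; "basis.io"; "app_a"]`, so application A pulls in only the IO part of the hypothetical Basis library.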

However, findlib is restricted because it only manages the details of the
command line arguments of the O'Caml compiler. So it can't detect namespace
conflicts, nor can it rearrange namespaces.

The findlib URL: 
http://www.ocaml-programming.de/packages/documentation/findlib/

>So there are lots of issues to consider.  At Harlequin we wrote several
>project management systems for MLWorks, and a couple more for Dylan.  They
>all involved difficult design decisions.   My current belief is that you
>need a separate notion of library, being a namespace that corresponds to a
>deliverable, with some notion of dependencies at the library level, as well
>as dependencies between the modules in a library.

I think that namespaces and libraries are simply independent notions. The
current O'Caml implementation does this right when libraries are used; i.e.
you can form a library from arbitrary modules. What is missing is the
possibility to reorganize namespaces when libraries are created.

Of course, DLLs are missing, too. 

Gerd
-- 
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 100             
64293 Darmstadt     EMail:   gerd@gerd-stolpmann.de
Germany                     
----------------------------------------------------------------------------



* RE: Module hierarchies
@ 2001-01-09 18:06 Dave Berry
  2001-01-10 20:12 ` Gerd Stolpmann
From: Dave Berry @ 2001-01-09 18:06 UTC
  To: Xavier Leroy, Charles Martin; +Cc: caml-list

I think you have to step back and ask what you're trying to achieve.  In
particular, there are two related issues.  One is nested namespaces, which
is obviously desirable. The second is units of delivery, by which I mean
DLLs, Components, Libraries, Executables, object files, or groups of these.
As a generic term, I'll call these "deliverables".  (I'd like to use
"packages", but this would cause confusion with Java and Ada -- neither of
which actually package the results of compiling a so-called "package").

I'll use an example with which I'm more familiar: the SML Basis Library.
This contains several modules, some of which are grouped in various ways.
One such grouping is the IO modules; these are designed so that you can use
the rest of the library without having to use the IO modules, e.g. in case
you have your own IO subsystem.  So we want to be able to load a subset of
the full library.

Another grouping is the OS modules.  These are nested under a main OS
module, so you have OS.Path, OS.FileSys, etc. 

Above this it might be useful to have a Basis namespace, to distinguish
Basis.List from anybody else's List structure.  (The actual Basis Library
doesn't specify such a module).
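As an illustration in OCaml (assuming OCaml 4.07 or later for the `Stdlib` module), an outer Basis module gives exactly this separation, and nested sub-modules give the OS.Path / OS.FileSys grouping. All names and function bodies below are hypothetical.

```ocaml
(* A hypothetical Basis namespace: wrapping the modules in an outer module
   keeps Basis.List distinct from the standard library's List. *)
module Basis = struct
  module List = struct
    (* SML-flavoured helpers, for illustration only. *)
    let null = function [] -> true | _ :: _ -> false
    let length l = Stdlib.List.length l
  end

  module OS = struct
    module Path = struct
      let concat dir file = dir ^ "/" ^ file
    end
    module FileSys = struct
      let exists path = Sys.file_exists path
    end
  end
end

let () =
  assert (Basis.List.null []);
  assert (List.length [1; 2; 3] = 3);   (* plain List is still the stdlib one *)
  print_endline (Basis.OS.Path.concat "usr" "lib")
```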

To add further complication, some entries in the Basis Library are optional.
E.g. an implementation can have any non-zero number of Int<N> modules, for
arbitrary <N>.  So you can't write a signature that describes the Basis
library!
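OCaml's structural signature matching offers a partial workaround, sketched below with hypothetical module names. A signature can describe only the mandatory core, and any structure with extra components still matches it. The cost is that the optional modules are invisible to code that sees the library only through that signature.

```ocaml
(* What every integer module must offer (just enough for this sketch). *)
module type INT = sig
  type t
  val of_int : int -> t
  val to_string : t -> string
end

(* Describes only the mandatory part of a hypothetical basis library. *)
module type BASIS_CORE = sig
  module Int : INT
end

(* One implementation that also happens to provide an optional Int32. *)
module Impl = struct
  module Int = struct
    type t = int
    let of_int x = x
    let to_string = string_of_int
  end
  module Int32 = struct
    type t = int32
    let of_int = Stdlib.Int32.of_int
    let to_string = Stdlib.Int32.to_string
  end
end

(* Impl matches BASIS_CORE despite the extra component; Int32 is simply
   hidden when Impl is viewed through the signature. *)
module Core : BASIS_CORE = Impl
```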

So, how should this be packaged, and how should it be referenced?  You don't
necessarily want to distribute the library as a single DLL, because you
might want to use only some parts of it.  But you still want a nested
namespace.  So it's a bad idea to require that a namespace refers to a
single compiled object.   This is effectively what SML does -- a namespace
is a structure, and if one part of a structure is available then all parts
are (although an implementation might do something clever with delayed
loading).

You do want some way of distributing the library as a small number of files.
Distributing large numbers of object files is a pain, especially if they
have to be uninstalled in certain places.  Archive files (ar, jar) are one
solution.  You might also have a library that you want to distribute as a
DLL. 

Then you need to express dependencies.  I think it's useful to be able to do
this at the library level -- e.g. to say that application A depends on
Basis.IO, and application B depends on Basis.* .  You don't want to have to
list each file separately.  Possibly this can be done by command-line
arguments, e.g. compile -I Basis\IO ?

So there are lots of issues to consider.  At Harlequin we wrote several
project management systems for MLWorks, and a couple more for Dylan.  They
all involved difficult design decisions.   My current belief is that you
need a separate notion of library, being a namespace that corresponds to a
deliverable, with some notion of dependencies at the library level, as well
as dependencies between the modules in a library.

Dave.


-----Original Message-----
From: Xavier Leroy [mailto:Xavier.Leroy@inria.fr]
Sent: Monday, January 08, 2001 10:24
To: Charles Martin
Cc: caml-list@inria.fr
Subject: Re: Module hierarchies


> An alternative is to adopt the Java convention, in which a module such as
>      engine/graphics/texture/manager.ml
> is automagically mapped to Engine.Graphics.Texture.Manager.

Yes, this has been suggested already on this list.  Problem number one
is, as you said:

> The difficulty
> now is what to do about the file/module
>      engine/graphics/texture.ml <=> Engine.Graphics.Texture
> It seems to me the easiest solution is to assume that a directory/file
> layout has the semantics of a single file in which the modules are
> catenated in depth-first order.

This is one solution, but this ordering for submodules is somewhat
arbitrary.  More pragmatically, it seems very hard (in the current
implementation) to maintain the correspondence between a directory and
a structure with sub-modules corresponding to the directory elements.

An alternative solution (suggested by Judicaël Courant some time ago)
would be to have a new command that groups together several
separately-compiled modules into one module having the original
modules as sub-structures.  E.g.

        ocamlnewmagiccommand -o lib.cmo a.cmo b.cmo c.cmo

would generate lib.cmo and lib.cmi files equivalent to the following
source code for lib.ml:

        module A = struct (* contents of a.ml *) end
        module B = struct (* contents of b.ml *) end
        module C = struct (* contents of c.ml *) end

In other words, while the current OCaml library archive files (.cma files)
generated by "ocamlc -a" are "flat" and introduce no additional
structuring, the new command would both archive the library and
introduce a layer of structuring.

Of course, the order of .cmo files on the command line would determine
the order of the sub-modules, thus relieving the compiler from
guessing this order.
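Written out by hand, the lib.cmo that Judicaël's hypothetical command would produce from `a.cmo b.cmo c.cmo` corresponds to something like the following. The sub-module contents are made up. Because the sub-modules appear in command-line order, B may use A, and C may use A and B, but A cannot refer to B or C.

```ocaml
(* Hand-written equivalent of grouping a.cmo, b.cmo and c.cmo (in that
   order) under one module Lib. *)
module Lib = struct
  module A = struct
    let greeting = "hello"
  end
  module B = struct
    (* B follows A, so it may refer to A. *)
    let shout = String.uppercase_ascii A.greeting
  end
  module C = struct
    (* C comes last and may use both A and B. *)
    let both = A.greeting ^ " / " ^ B.shout
  end
end
```

Here `Lib.C.both` evaluates to `"hello / HELLO"`.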

(As an aside, it is interesting to note that the Linux kernel sources
-- a large source tree indeed -- uses "ld -r" in subdirectories to group
together the object files for each subdirectory in one easy to
manipulate .o file.  This is kind of the same idea, except that of
course C's namespace is flat, so no additional structuring is introduced.)

I still have no idea how hard it is to implement Judicaël's scheme,
though.

Coming back to the general problem of structuring a large OCaml
project, my experience with the OCaml compiler itself is that the
solution based on a flat module namespace + subdirectories to
partition the files + a big Makefile at the top works out quite well
for projects of about 100 KLOC, and could scale up some more, although
perhaps not to 1 MLOC.  In particular, one big Makefile is a lot
easier to maintain than a zillion tiny recursive Makefiles.

When comparing with Java, you have to keep in mind that Java source
files are smaller and more numerous than Caml source files, since the
latter can contain several classes as well as submodules.  (Not to
mention that a 10-line OCaml datatype declaration is roughly
equivalent to 11 Java classes, each in its own file...)  So, the need
to break up a Java project into several packages appears earlier than
the need to break up a Caml project into several directories.
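For readers who have not seen the comparison before, here is a small illustrative example. The following single OCaml declaration with four constructors would, in a typical Java design of the time, become an abstract base class plus four subclasses, five classes in five files.

```ocaml
(* A small expression language: one datatype declaration. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr
  | Neg of expr

(* Evaluation is one function with one case per constructor. *)
let rec eval = function
  | Num n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b
  | Neg e -> - (eval e)
```

For instance, `eval (Add (Num 2, Mul (Num 3, Neg (Num 4))))` returns `-10`.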

Still, I'd be very interested to know how others "do it" with large
OCaml projects.

Happy new year,

- Xavier Leroy



* Re: Module hierarchies
@ 2001-01-09 17:34 Daniel Ortmann
From: Daniel Ortmann @ 2001-01-09 17:34 UTC
  To: John Max Skaller; +Cc: Michael Hicks, Charles Martin, caml-list



>>> I am wondering how a large OCaml project might be structured,
>>> specifically in terms of directories and files.

> My solution is simple enough: I use a literate programming tool
> (interscript, see my sig below) to _generate_ all the Ocaml files in a
> scratch directory.

Unfortunately, since we emacs cyborgs have already been assimilated, we
must interface through a literate-caml-mode.  Where?

:-(

> So the structure of the program is defined in terms of the LP source
> files, which need to be related to the generated files.  (It usually
> is though :-)

> The only problem I have with this is that ocamllex/yacc do not respect
> #line directives.

> --
> John (Max) Skaller, mailto:skaller@maxtal.com.au
> 10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
> checkout Vyper http://Vyper.sourceforge.net
> download Interscript http://Interscript.sourceforge.net

--
Daniel "3rd of 5" Ortmann, IBM Circuit Technology, Rochester, MN




* Module hierarchies
@ 2001-01-06 19:32 Charles Martin
  2001-01-07 10:10 ` Mattias Waldau
                   ` (3 more replies)
From: Charles Martin @ 2001-01-06 19:32 UTC
  To: caml-list

I am wondering how a large OCaml project might be structured, specifically
in terms of directories and files.  I see that the OCaml compiler is
structured as a one-level collection of directories, with the Makefile
drawing them all together.  There's no hierarchy here; the directory
structure is used to partition major components of the system.

I'm used to thinking about breaking large projects down into smaller and
smaller hierarchical subcomponents.  I think the Java mapping between
package structure and directory structure helps, and I wonder whether
something similar could be useful for OCaml?

I can see a natural breakdown for large projects into nested module
hierarchies.  For example, a graphics engine might have modules such as:

     Engine.Graphics.Texture.Manager...

The only way to have such a structure currently would be to put your entire
engine in a single file, which is unrealistic for the programmers and
probably the compiler:

     $ cat engine.ml
     module Graphics = struct ...
       module Texture = struct ...
         module Manager = struct ...
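Filled in, the single-file layout looks like the sketch below. The texture-cache contents are hypothetical. For self-containment it is wrapped in `module Engine = struct ... end`; in a real engine.ml that wrapper would be the file itself.

```ocaml
(* Stand-in for a single engine.ml giving the path
   Engine.Graphics.Texture.Manager. *)
module Engine = struct
  module Graphics = struct
    module Texture = struct
      type t = { id : int; name : string }

      module Manager = struct
        let table : (string, t) Hashtbl.t = Hashtbl.create 16
        let next_id = ref 0

        (* Return the texture registered under name, creating it if needed. *)
        let load name =
          match Hashtbl.find_opt table name with
          | Some tex -> tex
          | None ->
              let tex = { id = !next_id; name } in
              incr next_id;
              Hashtbl.add table name tex;
              tex
      end
    end
  end
end
```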

A previous message implies that there will be an "#include" directive in
the next OCaml release.  This would allow you to break up what the compiler
sees as a single file into multiple files for the programmers:

     $ cat engine.ml
     #include "engine/graphics.ml"
     $ cat engine/graphics.ml
     #include "graphics/texture.ml"
     $ cat engine/graphics/texture.ml
     #include "texture/manager.ml"

This would allow you to have a directory and file structure that mirrored
the module structure.  It has the drawback that the compiler still sees
only a single file, which for a large project might doom compile times.

An alternative is to adopt the Java convention, in which a module such as

     engine/graphics/texture/manager.ml

is automagically mapped to Engine.Graphics.Texture.Manager.  The difficulty
now is what to do about the file/module

     engine/graphics/texture.ml <=> Engine.Graphics.Texture

It seems to me the easiest solution is to assume that a directory/file
layout has the semantics of a single file in which the modules are
catenated in depth-first order.  For example, the directory structure:

     a/b/c.ml
     a/b/d.ml
     a/b.ml
     a/e/f.ml
     a/e.ml
     a.ml

would have the semantics of a single file (also named a.ml):

     module B = struct
       module C = struct
         (* contents of c.ml *)
         end
       module D = struct
         (* contents of d.ml *)
         end
       (* contents of b.ml *)
     end
     module E = struct
       module F = struct
         (* contents of f.ml *)
         end
       (* contents of e.ml *)
     end
     (* contents of a.ml *)

Because of the depth-first ordering, the compilation of any individual
module does not depend upon the contents of a module placed higher in the
hierarchy.  Thus, they can be separately compiled.
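The following sketch applies the proposed depth-first semantics to part of the layout above (a/b/c.ml, a/b/d.ml, a/b.ml, a.ml). It assumes siblings are ordered as listed, and all contents are made up. Each child sees only the siblings that precede it, and each parent's own contents come last, so they may use all of its children. The outer `module A` wrapper stands in for the file a.ml.

```ocaml
module A = struct
  module B = struct
    module C = struct            (* a/b/c.ml *)
      let base = 10
    end
    module D = struct            (* a/b/d.ml: may use C, not b.ml's code *)
      let doubled = C.base * 2
    end
    (* contents of a/b.ml: may use C and D *)
    let total = C.base + D.doubled
  end
  (* contents of a.ml: may use B and everything inside it *)
  let report = Printf.sprintf "total = %d" B.total
end
```

Nothing in C or D mentions `total`, which is why c.ml and d.ml could be compiled before b.ml.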

I think this would be helpful for structuring large projects.  It would
also make it easier to incorporate third-party utilities, since we could
then adopt Java-style naming conventions: the distribution modules would
all be wrapped inside Ocaml, and everything else would use the inverted
domain name convention:

     Ocaml.Pervasives        Pervasives library, comes with the distribution
     Ocaml.Bigarray          Bigarray library, comes with the distribution
     Fr.Inria.Caml.Com       CamlIDL run time library for COM components
     Fr.Ens.Frisch.Getopt    Alain Frisch's parsing of command line args
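Under this convention, a client could still avoid spelling out long paths by using a module alias. In the sketch below the nesting follows the table above, but the `partition` function is made up and is not the real Getopt library's API.

```ocaml
(* Hypothetical inverted-domain-name nesting for a third-party library. *)
module Fr = struct
  module Ens = struct
    module Frisch = struct
      module Getopt = struct
        (* Made-up helper: split arguments into flags (leading '-') and
           other arguments, preserving order. *)
        let partition args =
          List.partition (fun a -> String.length a > 0 && a.[0] = '-') args
      end
    end
  end
end

(* A client shortens the path with a module alias. *)
module Getopt = Fr.Ens.Frisch.Getopt
```

With this, `Getopt.partition ["-v"; "file"; "-o"]` returns `(["-v"; "-o"], ["file"])`.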

I'm interested in any feedback on this idea.

Charles



Thread overview: 13+ messages
2001-01-09 16:46 Module hierarchies Dave Berry
2001-01-10  9:40 ` Markus Mottl
  -- strict thread matches above, loose matches on Subject: below --
2001-01-11 12:53 Dave Berry
2001-01-09 18:06 Dave Berry
2001-01-10 20:12 ` Gerd Stolpmann
2001-01-09 17:34 Daniel Ortmann
2001-01-06 19:32 Charles Martin
2001-01-07 10:10 ` Mattias Waldau
2001-01-07 16:07 ` Michael Hicks
2001-01-09  8:03   ` John Max Skaller
2001-01-07 20:37 ` Vitaly Lugovsky
2001-01-08 10:24 ` Xavier Leroy
2001-01-08 14:01   ` Judicael Courant
