Module hierarchy revisited

caml-list - the Caml user's mailing list

* Module hierarchy revisited
@ 1999-12-04  6:45 John Prevost
  1999-12-06 23:19 ` Gerd Stolpmann
  0 siblings, 1 reply; 4+ messages in thread
From: John Prevost @ 1999-12-04  6:45 UTC
  To: caml-list

Idea for namespace management, followed by a question or two, and
explanation of why.

I just came up with what seems like a reasonable way to package my
modules hierarchically, so as to avoid namespace collisions.

The idea:

For each "package" of stuff, the various modules (individual object
files) have short names, like "Foo", "Bar", and "Baz".  The danger, of
course, is that packages from other sources will also use names like
Foo, Bar, and Baz, precisely because they're so short.

My current working solution is to add as the last object file
something like this:

===mypackage.ml===
module Foo = Foo
module Bar = Bar
module Baz = Baz
==================

Why am I doing this?  Well, with Gerd Stolpmann's findlib, there's a
convenient way to link against a bunch of "packages" by name.  This is
nice, because you don't have to manage ordering and -Iing of all of
those files and directories by hand.  The difficulty is controlling
which modules come first.  Say we have two packages:

package1 => package1.cma (foo.cmo frubble.cmo)
package2 => package2.cma (foo.cmo bar.cmo zotz.cmo)

If I do:

ocamlc package1.cma package2.cma

I lose access to package1's Foo module in any files I use after this.
If I do it in the other order, I lose access to package2's Foo module.
I can work around this by putting things that want foo from
package1.cma before the reference to package2.cma, but this can be a
pain.  I could also insert a file between the two which binds
package1's Foo to something else:

===foobinder.ml===
module Zot = Foo
==================

Now I can refer to Foo from either package later on (which is
important if I want to use both in one file, or I have dependencies
which don't allow a simple ordering of my modules and the libraries.)
Even more, it's a pain to have to think about this stuff.  I'd rather
just be able to know it won't happen.
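The shadowing problem and the foobinder.ml workaround can be imitated
inside a single file.  This is only a sketch: the nested module
After_package2 is a made-up stand-in for "everything linked after
package2.cma", and the origin values are invented for illustration.
Real link-order shadowing happens between compilation units, not
inside one.

```ocaml
(* Hypothetical stand-in for package1's foo.cmo. *)
module Foo = struct
  let origin = "package1"
end

(* Plays the role of foobinder.ml: give package1's Foo a second name
   while it is still the visible Foo. *)
module Zot = Foo

(* A nested scope stands in for code linked after package2.cma. *)
module After_package2 = struct
  (* Hypothetical stand-in for package2's foo.cmo; it shadows the
     outer Foo for the rest of this scope. *)
  module Foo = struct
    let origin = "package2"
  end

  let visible_foo = Foo.origin   (* package2's Foo *)
  let rebound_foo = Zot.origin   (* package1's Foo, via the alias *)
end

let () =
  Printf.printf "Foo -> %s, Zot -> %s\n"
    After_package2.visible_foo After_package2.rebound_foo
```

The alias has to be created before the second Foo comes into scope,
which is exactly why the binder file must sit between the two
libraries on the command line.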

What adding the extra module at the end of the library does for me as
a library author is arrange for an "automatic" binding like this to
take place.  Mypackage.Foo, Mypackage.Bar, and Mypackage.Baz will always
be the modules from Mypackage unless Mypackage is shadowed.  And the
namespace of packages tends to be nicer and cleaner than the namespace
of individual modules in those packages.  (Say, Text.Parser for
low-level Unicode parsers vs XML.Parser for a module that does XML
parsing.)  One could extend this further by having super-packages
which provide a namespace for a number of other packages:

module Apollo =
  struct
    module XML =
      struct
        module Parser = ...
        ...
      end
    module Text =
      struct
        module Parser = ...
        ...
      end
  end

or

module Apollo =
  struct
    module XML = XML
    module Text = Text
  end
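For anyone who wants to try the second form, here is a self-contained
sketch.  The bodies of XML.Parser and Text.Parser (the describe
functions) are invented placeholders; in practice XML and Text would
be separately compiled packages rather than modules in the same file.

```ocaml
(* Hypothetical stand-ins for the separately compiled packages. *)
module XML = struct
  module Parser = struct
    let describe () = "XML parser"
  end
end

module Text = struct
  module Parser = struct
    let describe () = "low-level Unicode text parser"
  end
end

(* The super-package defines nothing new; it only re-exports the
   existing modules under one common name. *)
module Apollo = struct
  module XML = XML
  module Text = Text
end

let () =
  print_endline (Apollo.XML.Parser.describe ());
  print_endline (Apollo.Text.Parser.describe ())
```

Because the right-hand sides are resolved before the new bindings take
effect, "module XML = XML" inside Apollo refers to the outer XML.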

So the question I have is whether people think that organizing things
in this manner is a Good Thing, and if people have opinions on whether
there's a Right Way to go about doing this and choosing names for
things.  As an example, I have a package I call "text" which has a
text.cma containing a module Text which points at the other modules by
name.  But the name "Text" is pretty broad, and could easily collide
with other people's modules, even when both packages would be useful.

A second question is whether anyone has recommendations for hiding the
"other" bindings of modules (i.e. I don't want Iso_10646 to appear in
the top-level namespace, I only want Text to appear, containing
Text.Iso_10646) to keep people from referring to the modules in less
safe ways.

I'm thinking about this because I'd like to put some modules out there
for people to use, and the community-driven standards in the world of
Perl, for example, allow huge numbers of modules from all over to be
mixed and matched at will.  O'Caml stuff, on the other hand, tends to
be much more willy-nilly, making me think of the world of C libraries,
where people are much more likely to write their own library to do
something than to use someone else's, just because hooking things
together and finding libraries and the like is so painful.

findlib provides some nice features along these lines (though I think
it'd be nice if some of this functionality were folded into the
standard ocaml distribution, to encourage people to use it), but
without a discipline (community-driven, of course) for managing the
published module namespace, I don't think library development is
likely to grow like it has in Perl and Java-land--even with more
people developing.

In short, it's nice that ocaml has a compilation model that allows
more traditional models of building software than SML's CM package.
I'm much more comfortable with Makefiles and separate compilation than
with CM.  But there *are* amenities that would make library creation,
publication, and use much more common.

jmp


* Re: Module hierarchy revisited
  1999-12-04  6:45 Module hierarchy revisited John Prevost
@ 1999-12-06 23:19 ` Gerd Stolpmann
  1999-12-09  9:09   ` Sven LUTHER
  0 siblings, 1 reply; 4+ messages in thread
From: Gerd Stolpmann @ 1999-12-06 23:19 UTC
  To: John Prevost; +Cc: caml-list

On Sat, 04 Dec 1999, John Prevost wrote:
>I just came up with what seems like a reasonable way to package my
>modules hierarchically, so as to avoid namespace collisions.

I used to give modules of a package common prefixes, e.g. Mypackage_foo,
Mypackage_bar, Mypackage_baz. This is not too inconvenient because I often
program in an object-oriented way, and thus the most frequent names are method
names, which need not be qualified.
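A small sketch of why the prefix hurts less in object-oriented code.
The counter class and its methods are invented for illustration; only
the Mypackage_foo naming pattern comes from the convention described
above.

```ocaml
(* Hypothetical module following the prefix convention. *)
module Mypackage_foo = struct
  class counter = object
    val mutable n = 0
    method incr = n <- n + 1
    method get = n
  end
end

(* The long prefix is needed once, when the object is created... *)
let c = new Mypackage_foo.counter

(* ...while method calls carry no module qualifier at all. *)
let () = c#incr; c#incr
let total = c#get
```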

But I agree: There is a problem.

>The idea:
>
>For each "package" of stuff, the various modules (individual object
>files) have short names, like "Foo", "Bar", and "Baz".  The danger, of
>course, is that other packages from other sources will have names like
>Foo Bar and Baz, because they're so short.
>
>My current working solution is to add as the last object file
>something like this:
>
>===mypackage.ml===
>module Foo = Foo
>module Bar = Bar
>module Baz = Baz
>==================

An interesting idea, but I think it is only a workaround. Since you refer to
Perl, I can imagine what you really want: defining toplevel modules in
subordinate namespaces. Currently, a toplevel module such as foo.ml is
implicitly wrapped in a module definition:

module Foo = struct (* everything in foo.ml *) end

This could be improved by allowing several files, now called
"mypackage.foo.ml", "mypackage.bar.ml", "mypackage.baz.ml", to be implicitly
expanded as in

module Mypackage =
  struct
     module Foo = struct (* everything in foo.ml *) end
     module Bar = struct (* everything in bar.ml *) end
     module Baz = struct (* everything in baz.ml *) end
  end

From outside, you MUST access the members of the modules by the full path
Mypackage.Foo.some_symbol, which ensures that it is always clear which
module is actually referred to. For convenience, it is possible to open
the namespace: open Mypackage - or even open Mypackage.Foo. From inside, you
can always refer to the modules by their simple names (e.g. Foo.some_symbol).
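Written out by hand in today's OCaml, the intended access rules look
roughly like this. It is a sketch only: the proposal would generate
the Mypackage wrapper implicitly from the file names, and the members
greeting and shout are made up for the example.

```ocaml
(* What the compiler would build implicitly from mypackage.foo.ml and
   mypackage.bar.ml, spelled out explicitly here. *)
module Mypackage = struct
  module Foo = struct
    let greeting = "hello from Foo"
  end

  module Bar = struct
    (* Inside the namespace, a sibling is reachable by its short name. *)
    let shout () = String.uppercase_ascii Foo.greeting
  end
end

(* From outside, the full path is required... *)
let full = Mypackage.Bar.shout ()

(* ...unless the namespace is opened, here only locally. *)
let opened = let open Mypackage in Bar.shout ()

let () = print_endline full
```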

There are of course some open questions:
1) What happens if there is also mypackage.ml?
2) What is the order of the modules? 
3) How is the namespace management integrated into the compilation process?

Perhaps this could work as follows:

- Add a -namespace option to ocamlc telling it that the toplevel module is
  located inside another module.

  E.g. ocamlc -namespace Mypackage -c foo.ml
  This generates an object Mypackage.Foo, and sets a flag that Mypackage is
  "mergeable".

  This could also be implicitly done by using file names with dots, e.g. 
  "mypackage.foo.ml".

  When ocamlc searches module interfaces, the dot notation is respected.

- The rest is done by the linker. The linker can now merge namespaces which are
  flagged as mergeable. This simply means that it is allowed that there are
  archive objects  with names "Mypackage.Foo", and so on, inside the archive,
  but that it is forbidden that a real module "Mypackage" exists at the same
  time (the logic: Either there are several mergeable modules with the same
  name, or there is a single non-mergeable module). 

- If such an archive is accessed, an archive object with name "Mypackage.Foo" is
  treated as if there were a module "Mypackage" containing the module "Foo".

- Mypackage is an implicit module, only intended to serve as namespace. Because
  of this it does not have an explicit signature; the signature is only known
  after all members have been compiled. This is not a big problem, but an
  additional restriction is necessary:

  module M = Mypackage

  This can be read as renaming Mypackage into M. Because Mypackage does not
  have an explicit signature, M does not have one either. M's signature is
  not allowed to become public (part of an interface).

  I think there is no other way of referring to the signature of implicit
  modules. 

>What adding the extra module at the end of the library does for me as
>a library author is arrange for an "automatic" binding like this to
>take place.  Mypackage.Foo Mypackage.Bar and Mypackage.Baz will always
>be the modules from Mypackage unless Mypackage is shadowed.  And the
>namespace of packages tends to be nicer and cleaner than the namespace
>of individual modules in those packages.  (Say, Text.Parser for
>low-level Unicode parsers vs XML.Parser for a module that does XML
>parsing.)  One could extend this further by having super-packages
>which provide namespace to a number of other packages:
>
>module Apollo =
>  struct
>    module XML =
>      struct
>        module Parser = ...
>        ...
>      end
>    module Text =
>      struct
>        module Parser = ...
>        ...
>      end
>  end
>
>or
>
>module Apollo =
>  struct
>    module XML = XML
>    module Text = Text
>  end
>

We can go one step further. Currently we have only relative module paths, more
exactly, relative to one of the parent modules. I think it would be nice to
also have an absolute path:

Let Universe be a reserved module name, denoting the *single* toplevel
(namespace) module. If I define a module M outside any other module, it
implicitly becomes a member of Universe. As Universe is reserved, no other
module may be called Universe. Then I can refer to every module in any
circumstances by beginning the module path with Universe (e.g.
Universe.M.N...)

>So the question I have is whether people think that organizing things
>in this manner is a Good Thing, and if people have opinions on whether
>there's a Right Way to go about doing this and choosing names for
>things.  As an example, I have a package I call "text" which has a
>text.cma containing a module Text which points at the other modules by
>name.  But the name "Text" is pretty broad, and could collide easily
>with other people, even when both packages would be useful.

I think package names should be less generic. For example, a short identifier
for the project, or the author's initials could be one part of the name, as in
jp_text. This makes it much more unlikely that name clashes occur.

>A second question is whether anyone has recommendations for hiding the
>"other" bindings of modules (i.e. I don't want Iso_10646 to appear in
>the top-level namespace, I only want Text to appear, containing
>Text.Iso_10646) to keep people from referring to the modules in less
>safe ways.

See above.

>
>I'm thinking about this because I'd like to put some modules out there
>for people to use, and the community-driven standards in the world of
>Perl, for example, allow huge numbers of modules from all over to be
>mixed and matched at will.  O'Caml stuff, on the other hand, tends to
>be much more willy-nilly, making me think of the world of C libraries,
>where people are much more likely to write their own library to do
>something than to use someone else's, just because hooking things
>together and finding libraries and the like is so painful.

I agree, and that was my motivation to write findlib.

>findlib provides some nice features along these lines (though I think
>it'd be nice if some of this functionality were folded into the
>standard ocaml distribution, to encourage people to use it), but
>without a discipline (community-driven, of course) for managing the
>published module namespace, I don't think library development is
>likely to grow like it has in Perl and Java-land--even with more
>people developing.

Yes, findlib would be better off as part of the ocaml distribution. It
currently has a very liberal license (allowing almost everything), so
integrating it is no problem. Of course, I would like to see a notice that I
contributed it to the distribution.

A simple way would be that I put it into the "usercontribs" tree of the CVS
repository, and that this tree is distributed, too. Note that there is already
software in the distribution not written by INRIA, namely the GNU regex library.

Gerd
--
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 100             
64293 Darmstadt     EMail:   Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany                     
----------------------------------------------------------------------------


* Re: Module hierarchy revisited
  1999-12-06 23:19 ` Gerd Stolpmann
@ 1999-12-09  9:09   ` Sven LUTHER
  1999-12-09 22:17     ` Christian Lindig
  0 siblings, 1 reply; 4+ messages in thread
From: Sven LUTHER @ 1999-12-09  9:09 UTC
  To: Gerd Stolpmann; +Cc: John Prevost, caml-list

On Tue, Dec 07, 1999 at 12:19:31AM +0100, Gerd Stolpmann wrote:
> On Sat, 04 Dec 1999, John Prevost wrote:
> >I just came up with what seems like a reasonable way to package my
> >modules hierarchically, so as to avoid namespace collisions.
> 
> I used to give modules of a package common prefixes, e.g. Mypackage_foo,
> Mypackage_bar, Mypackage_baz. This is not too inconvenient because I often
> program in an object-oriented way, and thus the most frequent names are method
> names, which need not be qualified.
> 
> But I agree: There is a problem.

... skipped ...

Another place where this would be very useful is the following:

Currently there are two installation schemes for ocaml packages:

 * Some install their stuff in a subdirectory of /usr/.../lib/ocaml, and you
   have to tell the system that you are using this directory. One example of
   this is the ocamltk package, I think.

 * Others simply put their stuff in /usr/.../lib/ocaml. This is a problem,
   because it can produce name clashes, but modules put there are directly
   accessible without further work.

One advantage of the first approach was that you could easily remove
everything belonging to a package by removing its subdirectory. This was a
concern in the past, but nowadays, with proper packaging support, it is no
longer a problem.

Also, the first way of doing things can still produce a name clash, if the
same module exists both in the ocamllib directory and in the package.

With directories-as-modules support we could use the cleaner first approach
while also avoiding any name clashes.

This will become more and more of a concern as ocaml support grows larger.

Any chance of seeing something like this in an upcoming release?

Anyone willing to write a patch to test this (now that ocaml is free software)?

Friendly,

Sven LUTHER


* Re: Module hierarchy revisited
  1999-12-09  9:09   ` Sven LUTHER
@ 1999-12-09 22:17     ` Christian Lindig
  0 siblings, 0 replies; 4+ messages in thread
From: Christian Lindig @ 1999-12-09 22:17 UTC
  To: caml-list

> With directory as modules support we use the cleaner first approach,
> as well as avoiding any name clashes.
> 
> This will become more and more a concern as ocaml support grows
> larger.

I would like to support the proposed mapping of module hierarchies to
directory hierarchies.  It allows for a nice integration of much of
the existing Caml code (like pcre, Chris Okasaki's data structures,
XLib to name just a few) into a contrib hierarchy.

Let's learn from the Perl, Java, and Python communities, which managed
to build very large code bases for their respective systems.
Maintenance of such a code base could (and probably would) be
independent from the OCaml compiler and its standard library once the
top level modules are fixed.

To summarize: I'd like to see such a feature in OCaml 3.

-- Christian

-- 
 Christian Lindig       Gaertner Datensysteme GbR, Braunschweig,  Germany 
                        http://www.gaertner.de/~lindig lindig@gaertner.de
                        phone: +49 531 233 55 55   fax: +49 531 233 55 56 

