[Caml-list] Re: functors and modules (was "What is an applicative functor?")

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Lucas Dixon <lucas.dixon@gmail.com>
To: Andreas Rossberg <rossberg@mpi-sws.org>
Cc: caml-list@inria.fr
Subject: [Caml-list] Re: functors and modules (was "What is an applicative functor?")
Date: Sat, 30 Apr 2011 18:19:09 -0400	[thread overview]
Message-ID: <BANLkTin+4HiUE=0QU_8hLAWHhrOQzr572A@mail.gmail.com> (raw)
In-Reply-To: <1CD0C5CF-4339-4365-A05C-58A12EE7AE2D@mpi-sws.org>

On 29 April 2011 09:09, Andreas Rossberg <rossberg@mpi-sws.org> wrote:
>> 2. the resulting signature of a Functor should be able to refer to the
>> input signatures. This would allow multiple inheritance in a natural
>> way:   functor S2(S1) : S1 + sig val f : S1.t -> int end;
>
> I don't understand. Is S1 a signature or structure here?

I'm sorry! that was a terribly formatted, late night confusion of a
sketch of my thoughts, lets see if I can clean up this poor
presentations...

Here's a better presentation of the idea:
http://informallyabstract.blogspot.com/2011/04/typed-signature-variables-for.html

> In the former case,
> how is this different from:
>
>  functor S2(X : S1) : sig include S1 val f : X.t -> int end
>
> In the latter case, how is it different from saying (in OCaml 3.12):
>
>  module S2(S1 : SIG) : sig include module type of S1 val f : S1.t -> int end
>
> Granted, it would be much nicer with mixin composition. ;)

I'd like to treat S1 as a signature variable... see blog post linked above.

>> 3. implicit signature dependencies:
>> let S1 : SIG1 ... in
>> functor F1(S2) = ...
>> functor F2(S3) = ...
>> end;
>> =
>> functor F1(S1 and S2) = ...
>> functor F2(S1 and S3) = ...
>
> Again, I don't understand what universes your Si are in, nor what you mean
> by "implicit signature dependency" or the "and" keyword. Assuming the Si are
> module identifiers, then isn't that just a nested functor?
>
>  functor F (S1 : SIG1) =
>  struct
>    functor F1 (S2 : SIG2) = ...
>    functor F2 (S3 : SIG3) = ...
>  end
>
> which in this case is isomorphic to the use of currying:
>
>    functor F1 (S1 : SIG1) (S2 : SIG2) = ...
>    functor F2 (S1 : SIG1) (S3 : SIG3) = ...
>
> All this can be done in OCaml and in several SML implementations.

again, sorry for confusion. Yes it's a syntax like:

functor F (S1 : SIG1) =
struct
  functor F1 (S2 : SIG2) = ...
  functor F2 (S3 : SIG3) = ...
end

which is useful, but, as you note, doesn't change semantics at all.
The thing is that I'd like to be able to include files within to to
push for more flexibility in the way we consider context. e.g.

functor F (S1 : SIG1) =
struct
  use "submodule1.ML";
  use "submodule2.ML";
end;

A little thought is needed here, but it seems reasonable... I think. :)

>> 4. sharing of structures should allow be fine-grained modification,
>> and should have sensible syntax (it's not a symmetric operation, so
>> don't use a symmetric syntax: use <= instead of =):
>> e.g..
>> sharing (S1 except S1.t) <= S2
>
> Sharing constraints a la SML are an anachronism. You simply want transparent
> (aka manifest) module specifications. E.g. in OCaml (or similarly in some
> SML dialects):
>
>  sig
>    ...
>    module S1 = S2
>    ...
>  end
>
> or
>
>  S with module S1 = S2
>
> Not sure what you intend with the "except", though.

I actually think that sharing is a good idea, but it has confused
syntax. Categorically, I think the idea is to provide a syntax for
pushouts, which is a naturally compositional construction; exactly
what I think one wants for module system. When you depend on two
things, and they can have different kinds of overlaps, then you should
be able to say *how* they overlap. This is what sharing is doing, and
I think it's the right idea...

Maybe if we sketch out an example it becomes clearer...

Lets imagine defining a module that depends on two kinds of mapping
structures, both instances of MAP, as defined by something like this:

signature MAP =
sig
  type key
  type value
  type map
  ...
and

e.g. you have:
functor  new_module(
  Map1 : MAP
  Map2 : MAP
) : ...
= struct ...

then there are lots of ways which these maps may be related.
perhaps Map1.key = Map2.key, perhaps Map2.key = Map1.value, etc. I
think that the dependencies of module should be a well-defined
signature, and the question is how to express this elegantly. The idea
being that we want to be able to check at compile time if the
composition of modules will work. I want a lightweight syntax for
this, and tools support also (e.g. signature inference for modules, so
you don't have to type it out). I do think you want explicit syntax
for it, because this lets different people implement stuff
independently. It makes code, and code development compositional...
this is a nice theory, but do you also think it's a good idea? I've
sometimes wondered about the need for all this explicitness, generally
I do think it's good, as long as you can find short, easy syntax....

anyway, the idea of "except" is just to extend the expressive language
of signatures to sharing.  It's just a syntactic mechanism to express
exactly what is shared. Except lets you removing some stuff from a
structure that you want to express is shared...

e.g. writing
  sharing (Map1 except map) <= Map2
is the same as
  sharing Map1.key = Map2.key
  sharing Map1.value = Map2.value
i.e. the keys and values are the same, but the data structure
implementing the maps is different. The idea is that set-like
expressions are a convenient way to say exactly what is shared.

>> Also when types are ground, if they are the same, then it shouldn't
>> complain.
>
> It was merely a backwards
> compatibility device for SML'90 code.

Indeed.

> Yes, don't use structure sharing in SML. Ever.

haha, but I do! :)

but seriously, I like being able to say how my module inputs are
related. I like SML functors and sharing. I just don't like the
impoverished syntax I have to write it with.

Maybe it's all been sorted out already, and we're just disagreeing on
what to call it. I remember reading some of your module papers and
thinking most of the problems have been solved, and we should just
implement it in ML. So hopefully I'm not just (re)sketching stuff
you've already done...

>>> No, I meant it literally. One common idiom with functors is to re-
>>> export all parameter structures in the result, in order to have a self-
>>> contained result signature (and ease sharing later on). When you
>>> continue doing that up the entire dependency graph, then signatures
>>> can grow exponentially (in the depth of the dependency chain).
>>
>> While the number of paths to the sub-structures is exponential, the
>> number of actual signatures is not. If the implementation does a
>> reasonable job at only storing one copy when they are shared, then I
>> don't think this is a problem.
>
> The size of a signature includes nested ones. You still have to do all the
> signature matching on the full thing, which is structural and involves
> varying instantiations, so you cannot generally "cache" it.

OK, I think I see what you mean; lets take a slightly concrete example:

MAP = {  ... }
SIG2A = { Map : SIG1 ... }
SIG2B = { Map : SIG1 ... }

SIG3A = { S2a : SIG2A;  S2b : SIG2B; sharing S2a.Map = S2b.Map; ... }
SIG3B = { S2a : SIG2A;  S2b : SIG2B; sharing S2a.Mao = S2b.Map; ... }

SIG4 = { S3a : SIG3A; S3b : SIG3B; sharing S3a.S2a.Map = S3b.S2a.Map; ... }

Note that SIG4 has 4 paths to MAP, they must all have same types.

So now if we let F5(S4 : SIG4) be a functor taking an instance of a
SIG4, then we need to check all four paths to Map are implementing the
same MAP. Hence the number of checks is exponential.

Caching can help in terms of avoiding detailed checks for each case,
but there are still an exponentially number of cases. I think I agree
with this.

But! I think a better language for expressing signatures can easily fix this.
e.g. we should be able to write:

MAP = {  ... }
SIG2A = { Map : SIG1 ... }
SIG2B = { Map : SIG1 ... }
SIG3A = {
  Map : MAP;
  S2a : SIG2A where S2a.Map := Map;
  S2b : SIG2B where S2a.Map := Map;
  ... }

Having structure/signature level substitutions seems to naturally fix
this: you now check a single Map at the top level, and modify all
types in S2a to refer to those in the parent structure. This also
cleans up the many occurrences of the same structure: I'm assuming
that S2a.Map no longer exists after you write S2a.Map := ..., instead
just have the single top-level Map. Maybe ":=" is not the best
syntax... but I think you get the idea.

>> I think it is a problem if people don't specify their dependencies
>> correctly, and the language should make it easy. Providing
>> sub-structures to hold everything is not a good idea. So in some sense
>> I agree: we need the right syntax for expressing what our
>> dependencies.
>
> The alternative to nesting the full structures is just flattening out their
> relevant type members. That makes the overall signature smaller, and is
> indeed the style used in most ML functor code today. But now it is just the
> number of direct type members that grows exponentially (at least in the
> worst case), which tends to be even more annoying to write down (although I
> could imagine you often can use some idiom of packaging them up in a single
> substructure that you can then include, but that still is kind of
> unpleasant).

I think you can have an explicit, but still convenient syntax for
structure sharing/substitutions... what do you think of the suggestion
above?

>> I think you want a reasonable language for specifying dependencies,
>> and actually I think functors, with some tweaks to the signature
>> language, would be pretty good. I can't see any better than saying
>> what you need and what you provide, which is essentially all that a
>> functor does.
>
> Well, there is a difference between referring to a concrete module you need,
> just by name (a definite reference) or abstracting over the concrete module,
> by giving its complete interface specification (an indefinite reference).
> I'm just saying you want both, scoping and functorisation, depending on
> whether the modules are in the same "package" or not. Always having to give
> the full signature spec is just way too inconvenient.

But when you specify a signature, you don't have to specify everything
it has, only a super-set of what you use, I can't see how you could
get away with less and maintain checking of module compositions...
actually I'm not sure I'm understanding this point, maybe you can
expand on it a bit?

>>> I guess the underlying question is what constitutes a module? I think
>>> you have to distinguish modules (as individual abstractions) from
>>> components/libraries/packages/whatever you like to call them. The
>>> latter can consist of many modules.
>>
>> I don't see why they need to be separate... I think structures can be
>> individual modules as well as packages.
>
> A package is typically too large to be written in one source file. So you
> want to split it into several modules without having to functorise
> everything by everything. OTOH, you actually want to functorise the whole
> thing in the end, and that is something you cannot do with the language as
> is. (OCaml provides some extra-linguistic means to do the former, via the
> compiler's -pack option, but nothing for the latter.)

Yes, I agree. File structure should be flexible, and at the same time,
reflect the structure of the code. I think the deeper problem here is
that we treat file-systems as trees, when what you want is a graph.
File systems are of course really DAGs, so maybe every directory
should be a module. And within every module directory, you have a
"deps" directory with sym-links to modules it depends on. Now you'll
want some tools for visualizing and developing this, but that's not
hard... Now you can have the module structure of your code broken up
into files and/or directories, as you feel is appropriate... or
perhaps we could take the idea of being able to write "use" within a
functor definition?

best,
lucas

next prev parent reply	other threads:[~2011-04-30 22:19 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-28  5:06 [Caml-list] " Lucas Dixon
2011-04-29 13:09 ` [Caml-list] " Andreas Rossberg
2011-04-30 22:19   ` Lucas Dixon [this message]
2011-05-03 18:47     ` Andreas Rossberg
2011-05-11 13:11       ` Lucas Dixon
2011-05-15  9:05         ` Andreas Rossberg
2011-05-18  4:45           ` Lucas Dixon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='BANLkTin+4HiUE=0QU_8hLAWHhrOQzr572A@mail.gmail.com' \
    --to=lucas.dixon@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=rossberg@mpi-sws.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).