[Caml-list] why is building ocaml hard?

caml-list - the Caml user's mailing list

* [Caml-list] why is building ocaml hard?
@ 2016-07-10  4:16 Martin DeMello
  2016-07-10 11:03 ` Gerd Stolpmann
  0 siblings, 1 reply; 11+ messages in thread
From: Martin DeMello @ 2016-07-10  4:16 UTC (permalink / raw)
  To: caml-list


My consistent experience with OCaml has been that the build systems are
fiddly and hard to work with, but I've never seen a discussion of why this
is so (as opposed to problems with specific build tools). Supposing you
were to start from scratch and develop a new build system in a bottom-up
manner, starting with a set of libraries and utilities and working your way
up to a framework or DSL, what would the difficult steps be?

martin


* Re: [Caml-list] why is building ocaml hard?
  2016-07-10  4:16 [Caml-list] why is building ocaml hard? Martin DeMello
@ 2016-07-10 11:03 ` Gerd Stolpmann
  2016-07-10 11:33   ` [Caml-list] Is there an efficient precise ocamldep - Was: " Gerd Stolpmann
                     ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Gerd Stolpmann @ 2016-07-10 11:03 UTC (permalink / raw)
  To: Martin DeMello; +Cc: caml-list


On Saturday, 2016-07-09, at 21:16 -0700, Martin DeMello wrote:
> My consistent experience with OCaml has been that the build systems
> are fiddly and hard to work with, but I've never seen a discussion of
> why this is so (as opposed to problems with specific build tools).
> Supposing you were to start from scratch and develop a new build
> system in a bottom up manner, starting with a set of libraries and
> utilities and working your way up to a framework or dsl, what would
> the difficult steps be?

Well, I don't think you can improve it that much (except... see below).
Since I've taken over omake and studied it carefully, I think I have
good insights into the problems.

In short, the big difficulty of OCaml is the strict build topology. You
need to build a module before the caller of the module. Most build-tool
related failures come from that. Note that many other languages have
more relaxed build topologies, or work around the problem by doing
2-pass builds (i.e. you first pre-compile and extract interfaces for the
whole project, and do the actual code generation in the second pass).

Let's take a closer look at why extracting the dependencies is relatively
error-prone. The tool in question is ocamldep. It is fairly dumb insofar
as it only parses the source code and then looks at all module-related
constructs (open, include, module, etc.). Because it never looks into
already compiled interfaces and also proceeds file by file, it may
sometimes emit wrong dependency information. For example, when there is

open M1
open M2

at the beginning of a file, ocamldep doesn't know whether M2 is another
top-level module or a submodule of M1. ocamldep normally errs on the
side of generating too many dependencies, and then tries to correct this
by accepting only those deps that correspond to existing files. In this
example, a dependency on M2 is emitted when there is a file M2.ml. Note
that this is wrong when M2 is actually a submodule of M1 AND the file
M2.ml exists.
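
A minimal sketch of this ambiguity, with everything in one file for
illustration. The nested modules stand in for hypothetical files m1.ml,
m2.ml and a client file; the names and the `origin` value are assumptions
made up for the example. The compiler resolves `open M2` by scope, which a
purely syntactic scan cannot see:

```ocaml
(* Stand-in for a hypothetical m1.ml that happens to contain a submodule M2. *)
module M1 = struct
  module M2 = struct
    let origin = "M1.M2"
  end
end

(* Stand-in for a hypothetical, unrelated m2.ml in the same project. *)
module M2 = struct
  let origin = "toplevel M2"
end

(* Stand-in for the client file from the example above. *)
module Client = struct
  open M1
  open M2 (* resolved as M1.M2, which shadows the top-level M2 *)

  let which = origin
end

let () = print_endline Client.which
```

Here the client really depends only on M1, yet a file-existence check would
also report a dependency on M2 because m2.ml exists.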

So how to fix this? In my opinion there are two solutions. One is a
more intelligent ocamldep (e.g. one that reads in non-local cmi files,
uses that information, and also tries to interpret all project ml files
at once rather than file by file - btw, did anybody check whether there
is an algorithm that precisely solves the problem?). The other is to mark
toplevel modules in the syntax of OCaml (e.g. you'd have to write
"open ^M2" if M2 is a toplevel module).

Besides ocamldep, there are other aspects that affect the dependency
analysis. E.g. omake distinguishes between project-local and other
dependencies: you need to set the OCAMLINCLUDES variable to add other
directories to the local part of the analysis, whereas the non-local
deps are nowadays handled with ocamlfind. First, this distinction is not
really clear to every user, and second, there are some difficulties in
processing when both concepts overlap (e.g. you also want project-local
dependency expansion with ocamlfind).

Note that the dependency analysis recently became even harder because of
flambda. Now cmx files play a bigger role. In particular, a cmx file can
refer to another cmx file that isn't a direct dependency. In other
words, there is a second kind of dependency that is added by the code
generator. Current build tools cannot record these dependencies yet.

Gerd

-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------


* [Caml-list] Is there an efficient precise ocamldep - Was: why is building ocaml hard?
  2016-07-10 11:03 ` Gerd Stolpmann
@ 2016-07-10 11:33   ` Gerd Stolpmann
  2016-07-10 11:51     ` Petter A. Urkedal
  2016-07-10 22:41   ` [Caml-list] " Tom Ridge
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Gerd Stolpmann @ 2016-07-10 11:33 UTC (permalink / raw)
  To: Martin DeMello; +Cc: caml-list


Rephrasing this question: suppose we wanted to develop an improved version
of ocamldep that (a) reads in cmi files from non-project-local
directories, (b) processes all files of the project as a whole, and
(c) outputs dependencies precisely. Is there an algorithm that is better
than

- for all permutations p of the project files:
  - env := <modules defined in the non-project-local cmi files>
  - for all files f of p:
    - localenv := []
    - AST := parse f
    - interpret the module calculus of the AST, taking env
      (the toplevel modules) and localenv (the local module
      scope) into account, with these details:
       * if there is an unknown module identifier this is an
         error, and we go on with the next permutation
       * any definition of a module modifies localenv
       * the "open" directive is interpreted strictly using
         env and localenv
    - if no error occurred so far:
      - env := env + (f -> localenv)
        (i.e. add the module corresponding to the file to env)
- if there is a permutation p that doesn't run into an error,
  re-run the dependency analyzer for p and output the deps
  between the (now unambiguously identified) toplevel modules
  (the keys of env)

I think this algorithm is a precise solution to the ocamldep problem
(well, I did not take mli files into account, but that shouldn't be too
difficult to add). However, it is horribly inefficient. Is there
anything better?
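
The permutation search can be sketched as follows. This is a heavily
simplified model, not a real ocamldep: a file is reduced to a list of
`open` directives and submodule definitions, submodules are only one level
deep, and the types `item` and `file` as well as all function names are
assumptions invented for this sketch. It needs OCaml 4.10 or later for
`List.concat_map` and `List.find_map`.

```ocaml
module SMap = Map.Make (String)

(* Simplified file contents: "open M" or "module M = struct ... end". *)
type item = Open of string | Submodule of string
type file = { name : string; items : item list }

(* Analyse one file. env maps known toplevel modules to their submodules;
   scope holds the module names visible locally. Returns the toplevel deps
   and the submodules the file defines, or None on an unknown module. *)
let analyse env file =
  let rec go scope deps exports = function
    | [] -> Some (List.rev deps, List.rev exports)
    | Submodule m :: rest -> go (m :: scope) deps (m :: exports) rest
    | Open m :: rest ->
      if List.mem m scope then go scope deps exports rest
      else (
        match SMap.find_opt m env with
        | Some subs -> go (subs @ scope) (m :: deps) exports rest
        | None -> None)
  in
  go [] [] [] file.items

(* All orderings of the project files (n! of them). *)
let rec permutations = function
  | [] -> [ [] ]
  | l ->
    List.concat_map
      (fun x ->
        let rest = List.filter (fun y -> y.name <> x.name) l in
        List.map (fun p -> x :: p) (permutations rest))
      l

(* Analyse the files in the given order; fail on the first error. *)
let try_order base_env order =
  let step acc file =
    match acc with
    | None -> None
    | Some (env, deps) -> (
      match analyse env file with
      | None -> None
      | Some (d, exports) ->
        Some (SMap.add file.name exports env, (file.name, d) :: deps))
  in
  match List.fold_left step (Some (base_env, [])) order with
  | None -> None
  | Some (_, deps) -> Some (List.rev deps)

(* Deps of the first ordering that analyses without error, if any. *)
let precise_deps base_env files =
  List.find_map (try_order base_env) (permutations files)
```

For the M1/M2 example, a client with `[Open "M1"; Open "M2"]` ends up with a
single dependency on M1 when M1 defines a submodule M2, even if a separate
file M2 exists.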

Gerd


On Sunday, 2016-07-10, at 13:03 +0200, Gerd Stolpmann wrote:
> Let's have a closer look why it is relatively error-prone to extract the
> dependencies. The tool in question is ocamldep. It is fairly dumb in so
> far it is only parsing the source code, and then looks at all
> module-related constructs (open, include, module, etc.). Because it
> never looks into already compiled interfaces and also proceeds file by
> file, it may sometimes emit wrong dependency information. For example,
> when there is
> 
> open M1
> open M2
> 
> at the beginning of a file, ocamldep doesn't know whether M2 is another
> top-level module, or whether it is a submodule of M1. ocamldep normally
> errs on the side of generating too many dependencies, which is then
> tried to be corrected by only accepting those deps corresponding to
> existing files. In this example, this would mean that a dependency to M2
> is emitted when there is a file M2.ml. Note that this is wrong when M2
> is actually a submodule of M1 AND the file M2.ml exists.
> 
> So how to fix this? In my opinion there are two solutions. You can
> either have a more intelligent ocamldep (e.g. one that reads in
> non-local cmi files and uses that information and also tries to
> interpret all project ml files at once and not file by file - btw, did
> anybody check whether there is an algorithm that precisely solves the
> problem?). The other solution path is to mark toplevel modules in the
> syntax of OCaml (e.g. you'd have to do "open ^M2" if M2 is a toplevel
> module).

* Re: [Caml-list] Is there an efficient precise ocamldep - Was: why is building ocaml hard?
  2016-07-10 11:33   ` [Caml-list] Is there an efficient precise ocamldep - Was: " Gerd Stolpmann
@ 2016-07-10 11:51     ` Petter A. Urkedal
  0 siblings, 0 replies; 11+ messages in thread
From: Petter A. Urkedal @ 2016-07-10 11:51 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Martin DeMello, caml-list

On 10 July 2016 at 13:33, Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
> Rephrasing this question. Given we wanted to develop an improved version
> of ocamldep that (a) reads in cmi files from non-project-local
> directories, and (b) processes all files of the project as a whole, and
> (c) outputs dependencies precisely. Is there an algorithm that is better
> than
>
> - for all permutations p of the project files:
>   - env := <modules defined in the non-project-local cmi files>
>   - for all files f of p:
>     - localenv := []
>     - AST := parse f
>     - interpret the module calculus of the AST, taking env
>       (the toplevel modules) and localenv (the local module
>       scope) into account, with these details:
>        * if there is an unknown module identifier this is an
>          error, and we go on with the next permutation
>        * any definition of a module modifies localenv
>        * the "open" directive is interpreted strictly using
>          env and localenv
>     - no error yet:
>       - env := env + (f -> localenv)
>         (i.e. add the module corresponding to the file to env)
>  - if there is a permutation p that doesn't run into an error,
>    re-run the dependency analyzer for p and output the deps
>    between the (now unambiguously identified) toplevel modules
>    (the keys of env)
>
> I think this algorithm is a precise solution to the ocamldep problem
> (well, I did not take mli files into account, but that shouldn't be too
> difficult to add). However, it is horribly inefficient. Is there
> anything better?

I think we can at least reduce the upper bound on the number of
iterations from n! to n(n-1)/2 by observing that once we find a module
which is successfully analysed, we can keep it for the next iteration:
a pass over the n local files will either succeed in finding a module
which is successfully analysed, leaving n - 1 to be analysed, or fail,
in which case the dependencies are unsatisfiable.
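
A minimal sketch of this greedy variant, under the same kind of simplified
model as a toy analyser would use: files are lists of `open` directives and
one-level submodule definitions. The types `item` and `file` and the
functions `analyse` and `greedy_deps` are hypothetical names invented for
the sketch, not part of any existing tool.

```ocaml
module SMap = Map.Make (String)

(* Simplified file contents: "open M" or "module M = struct ... end". *)
type item = Open of string | Submodule of string
type file = { name : string; items : item list }

(* Returns (toplevel deps, defined submodules), or None if the file
   refers to a module that is neither in scope nor in env. *)
let analyse env file =
  let rec go scope deps exports = function
    | [] -> Some (List.rev deps, List.rev exports)
    | Submodule m :: rest -> go (m :: scope) deps (m :: exports) rest
    | Open m :: rest ->
      if List.mem m scope then go scope deps exports rest
      else (
        match SMap.find_opt m env with
        | Some subs -> go (subs @ scope) (m :: deps) exports rest
        | None -> None)
  in
  go [] [] [] file.items

let greedy_deps base_env files =
  (* One pass: pick any pending file that analyses cleanly under env. *)
  let rec pick env skipped = function
    | [] -> None
    | f :: rest -> (
      match analyse env f with
      | Some (deps, exports) ->
        Some (f, deps, exports, List.rev_append skipped rest)
      | None -> pick env (f :: skipped) rest)
  in
  let rec loop env acc = function
    | [] -> Some (List.rev acc)
    | pending -> (
      match pick env [] pending with
      | None -> None (* no progress: the dependencies are unsatisfiable *)
      | Some (f, deps, exports, remaining) ->
        loop (SMap.add f.name exports env) ((f.name, deps) :: acc) remaining)
  in
  loop base_env [] files
```

Keeping a successfully analysed file is safe in this model because adding
more modules to env never changes how already resolvable names resolve.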


* Re: [Caml-list] why is building ocaml hard?
  2016-07-10 11:03 ` Gerd Stolpmann
  2016-07-10 11:33   ` [Caml-list] Is there an efficient precise ocamldep - Was: " Gerd Stolpmann
@ 2016-07-10 22:41   ` Tom Ridge
  2016-07-11  6:15   ` Martin DeMello
  2016-07-12  8:18   ` Goswin von Brederlow
  3 siblings, 0 replies; 11+ messages in thread
From: Tom Ridge @ 2016-07-10 22:41 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Martin DeMello, caml-list


Gerd raises interesting technical points.

A possibly too simple point of view, about usability rather than anything
technical: for a src/ directory with (possibly nested) subdirectories
(used to organize files, not to name modules) and no external
dependencies, I think most users would like to write `ocamlfind
ocamlc src/` and have it just work. If there are external dependencies,
then `ocamlfind -pkgs x,y,z ocamlc src/` (or similar) should just work.

Now, OCaml's build process is very flexible, so we should not expect
everything to be so simple. But in the simple case, things should ideally
be simple.


On 10 July 2016 at 12:03, Gerd Stolpmann <info@gerd-stolpmann.de> wrote:

> On Saturday, 2016-07-09, at 21:16 -0700, Martin DeMello wrote:
> > My consistent experience with OCaml has been that the build systems
> > are fiddly and hard to work with, but I've never seen a discussion of
> > why this is so (as opposed to problems with specific build tools).
> > Supposing you were to start from scratch and develop a new build
> > system in a bottom up manner, starting with a set of libraries and
> > utilities and working your way up to a framework or dsl, what would
> > the difficult steps be?
>
> Well, I don't think you can improve it that much (except... see below).
> Since I've taken over omake and studied it carefully, I think I have
> good insights into the problems.
>
> In short, the big difficulty of OCaml is the strict build topology. You
> need to build a module before the caller of the module. Most build-tool
> related failures come from that. Note that many other languages have
> more relaxed build topologies, or work around the problem by doing
> 2-pass builds (i.e. you first pre-compile and extract interfaces for the
> whole project, and do the actual code generation in the second pass).
>
> Let's have a closer look why it is relatively error-prone to extract the
> dependencies. The tool in question is ocamldep. It is fairly dumb in so
> far it is only parsing the source code, and then looks at all
> module-related constructs (open, include, module, etc.). Because it
> never looks into already compiled interfaces and also proceeds file by
> file, it may sometimes emit wrong dependency information. For example,
> when there is
>
> open M1
> open M2
>
> at the beginning of a file, ocamldep doesn't know whether M2 is another
> top-level module, or whether it is a submodule of M1. ocamldep normally
> errs on the side of generating too many dependencies, which is then
> tried to be corrected by only accepting those deps corresponding to
> existing files. In this example, this would mean that a dependency to M2
> is emitted when there is a file M2.ml. Note that this is wrong when M2
> is actually a submodule of M1 AND the file M2.ml exists.
>
> So how to fix this? In my opinion there are two solutions. You can
> either have a more intelligent ocamldep (e.g. one that reads in
> non-local cmi files and uses that information and also tries to
> interpret all project ml files at once and not file by file - btw, did
> anybody check whether there is an algorithm that precisely solves the
> problem?). The other solution path is to mark toplevel modules in the
> syntax of OCaml (e.g. you'd have to do "open ^M2" if M2 is a toplevel
> module).
>
> Besides ocamldep, there are also other aspects that affect the
> dependency analysis. E.g. with omake there is a distinction of
> project-local and other dependencies, and you need to set the
> OCAMLINCLUDES variable to add other directories to the local part of the
> analysis, whereas the non-local deps are nowadays handled with
> ocamlfind. First of all, this distinction is not really clear to every
> user, and second, there are some difficulties in processing that when
> both concepts overlap (e.g. you want to also get project-local
> dependency expansion with ocamlfind).
>
> Note that recently the dependency analysis even became harder because of
> flambda. Now cmx files play a bigger role. In particular a cmx file can
> refer to another cmx file that isn't a direct dependency.  In other
> words, there is a second kind of dependency that is added by the code
> generator. Current build tools cannot record these dependencies yet.
>
> Gerd
>
> --
> ------------------------------------------------------------
> Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
> My OCaml site:          http://www.camlcity.org
> Contact details:        http://www.camlcity.org/contact.html
> Company homepage:       http://www.gerd-stolpmann.de
> ------------------------------------------------------------
>
>


* Re: [Caml-list] why is building ocaml hard?
  2016-07-10 11:03 ` Gerd Stolpmann
  2016-07-10 11:33   ` [Caml-list] Is there an efficient precise ocamldep - Was: " Gerd Stolpmann
  2016-07-10 22:41   ` [Caml-list] " Tom Ridge
@ 2016-07-11  6:15   ` Martin DeMello
  2016-07-11  7:22     ` Frédéric Bour
  2016-07-11  9:14     ` Malcolm Matalka
  2016-07-12  8:18   ` Goswin von Brederlow
  3 siblings, 2 replies; 11+ messages in thread
From: Martin DeMello @ 2016-07-11  6:15 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: caml-list


On Sun, Jul 10, 2016 at 4:03 AM, Gerd Stolpmann <info@gerd-stolpmann.de>
wrote:

> So how to fix this? In my opinion there are two solutions. You can
> either have a more intelligent ocamldep (e.g. one that reads in
> non-local cmi files and uses that information and also tries to
> interpret all project ml files at once and not file by file - btw, did
> anybody check whether there is an algorithm that precisely solves the
> problem?). The other solution path is to mark toplevel modules in the
> syntax of OCaml (e.g. you'd have to do "open ^M2" if M2 is a toplevel
> module).
>

Would an acceptable third option be to simply record the DAG explicitly in
your build file? Working with Google's build system [open-sourced as bazel:
http://www.bazel.io/] has given me a great appreciation for simply writing
out build dependencies manually; sure, it is relatively tedious to have to
write out the graph yourself rather than have ocamldep figure it out, but
the time and effort needed are a small fraction of the overall development
time of your project, and the reward is a 100% reliable "detection" of the
build topology.
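
To make the idea concrete, here is a small sketch of deriving a build order
from a hand-written dependency graph. It is not how bazel works internally;
the module names in `project` and the function `build_order` are made up
for illustration, and names that are not listed in the graph are assumed to
be external and already built.

```ocaml
(* Hand-written DAG: each module with the modules it depends on. *)
let project = [ ("main", [ "parser"; "util" ]); ("parser", [ "util" ]); ("util", []) ]

type state = Active | Done

(* Depth-first topological sort; raises Failure on a dependency cycle. *)
let build_order graph =
  let visited = Hashtbl.create 16 in
  let order = ref [] in
  let rec visit name =
    match Hashtbl.find_opt visited name with
    | Some Done -> ()
    | Some Active -> failwith ("dependency cycle through " ^ name)
    | None ->
      Hashtbl.replace visited name Active;
      (* Unlisted names are treated as external: no further deps. *)
      let deps = try List.assoc name graph with Not_found -> [] in
      List.iter visit deps;
      Hashtbl.replace visited name Done;
      order := name :: !order
  in
  List.iter (fun (name, _) -> visit name) graph;
  List.rev !order

let () = List.iter print_endline (build_order project)
```

Every module appears after all of its dependencies, so compiling in this
order satisfies the strict build topology without consulting ocamldep.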

martin


* Re: [Caml-list] why is building ocaml hard?
  2016-07-11  6:15   ` Martin DeMello
@ 2016-07-11  7:22     ` Frédéric Bour
  2016-07-11 10:36       ` Gerd Stolpmann
  2016-07-13 12:10       ` David Allsopp
  2016-07-11  9:14     ` Malcolm Matalka
  1 sibling, 2 replies; 11+ messages in thread
From: Frédéric Bour @ 2016-07-11  7:22 UTC (permalink / raw)
  To: caml-list


I am beginning to think along those lines. Rather than just using 
top-modules and letting ocamldep guess dependencies, specifying them 
seems inherently more reliable and predictable.

Maybe with something like an import statement. If this is done in each 
file, it should scale easily.
Two things that might be related:
- putting more build information in ml files (I recall that Gabriel 
Scherer mentioned that once, is that right?)
- namespaces.

More generally, I identify the following problems when building OCaml code:
- under-specified dependencies, as discussed
- the compiler driver can produce multiple files in a single invocation.
File-level dependencies turn into a hypergraph rather than a graph
(which makes a Makefile-driven system fragile)
- an ML file alone is hard to understand out of context, because build
specifications are kept separate
- many names and partially overlapping concepts: top-modules, libraries,
ocamlfind packages, opam packages.

On 07/11/2016 03:15 PM, Martin DeMello wrote:
> On Sun, Jul 10, 2016 at 4:03 AM, Gerd Stolpmann 
> <info@gerd-stolpmann.de <mailto:info@gerd-stolpmann.de>> wrote:
>
>     So how to fix this? In my opinion there are two solutions. You can
>     either have a more intelligent ocamldep (e.g. one that reads in
>     non-local cmi files and uses that information and also tries to
>     interpret all project ml files at once and not file by file - btw, did
>     anybody check whether there is an algorithm that precisely solves the
>     problem?). The other solution path is to mark toplevel modules in the
>     syntax of OCaml (e.g. you'd have to do "open ^M2" if M2 is a toplevel
>     module).
>
>
> Would an acceptable third option be to simply record the dag 
> explicitly in your build file? Working with google's build system 
> [opensourced as bazel: http://www.bazel.io/] has given me a great 
> appreciation for simply writing out build dependencies manually; sure, 
> it is relatively tedious to have to write out the graph yourself 
> rather than have ocamldep figure it out, but the time and effort to do 
> so is a small fraction of the overall development time of your 
> project, and the reward is a 100% reliable "detection" of the build 
> topology.
>
> martin



* Re: [Caml-list] why is building ocaml hard?
  2016-07-11  6:15   ` Martin DeMello
  2016-07-11  7:22     ` Frédéric Bour
@ 2016-07-11  9:14     ` Malcolm Matalka
  1 sibling, 0 replies; 11+ messages in thread
From: Malcolm Matalka @ 2016-07-11  9:14 UTC (permalink / raw)
  To: Martin DeMello; +Cc: caml-list

Martin DeMello <martindemello@gmail.com> writes:

> On Sun, Jul 10, 2016 at 4:03 AM, Gerd Stolpmann <info@gerd-stolpmann.de>
> wrote:
>
>> So how to fix this? In my opinion there are two solutions. You can
>> either have a more intelligent ocamldep (e.g. one that reads in
>> non-local cmi files and uses that information and also tries to
>> interpret all project ml files at once and not file by file - btw, did
>> anybody check whether there is an algorithm that precisely solves the
>> problem?). The other solution path is to mark toplevel modules in the
>> syntax of OCaml (e.g. you'd have to do "open ^M2" if M2 is a toplevel
>> module).
>>
>
> Would an acceptable third option be to simply record the dag explicitly in
> your build file? Working with google's build system [opensourced as bazel:
> http://www.bazel.io/] has given me a great appreciation for simply writing
> out build dependencies manually; sure, it is relatively tedious to have to
> write out the graph yourself rather than have ocamldep figure it out, but
> the time and effort to do so is a small fraction of the overall development
> time of your project, and the reward is a 100% reliable "detection" of the
> build topology.
>
> martin

I've created a build tool called pds (in opam, although a newer version
needs to be released) which is meant to make it really easy to go from
nothing to a compiling project that installs.  One problem I found with
the various OCaml build systems was that they were very flexible, which
can be nice, but also made them more complicated.  I was willing to
sacrifice flexibility for simplicity.

The README for the current version can be found here:

https://bitbucket.org/mimirops/pds/raw/95da73d295d790c82ed900a76880a402b9120b49/README.org

I'm sure there are bugs in there, especially the Makefile it generates,
but I use it on all of my projects with success.  It does rely heavily
on ocamldep to come up with the correct order to compile things within a
project.

/Malcolm


* Re: [Caml-list] why is building ocaml hard?
  2016-07-11  7:22     ` Frédéric Bour
@ 2016-07-11 10:36       ` Gerd Stolpmann
  2016-07-13 12:10       ` David Allsopp
  1 sibling, 0 replies; 11+ messages in thread
From: Gerd Stolpmann @ 2016-07-11 10:36 UTC (permalink / raw)
  To: Frédéric Bour; +Cc: caml-list


On Monday, 2016-07-11, at 16:22 +0900, Frédéric Bour wrote:
> - the compiler driver can produce multiple files on a single
> invocation. File level dependencies turn into an hypergraph rather
> than a graph (making a Makefile driven system hardly stable)

Just a comment on this point. This is actually not a big problem when
you record which rules have already been run, because then you can
consider

file1 file2 file3: file4 file5 file6
    cmd

just as

file1: file4 file5 file6
    cmd
file2: file4 file5 file6
    cmd
file3: file4 file5 file6
    cmd

This is what omake does (and the rules are identified by their MD5
digests, which is very reliable).
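
A rough sketch of the bookkeeping described here, not omake's actual
implementation: the record type `rule` and the functions `rule_digest`,
`run_once` and `by_target` are hypothetical. A rule is identified by the
MD5 digest of its text (via the standard `Digest` module), and a rule that
is reachable through several targets runs only once.

```ocaml
type rule = { targets : string list; deps : string list; cmd : string }

(* Identify a rule by the MD5 digest of its textual content. *)
let rule_digest r =
  let text = String.concat "\n" (r.targets @ ("--" :: r.deps) @ [ "--"; r.cmd ]) in
  Digest.to_hex (Digest.string text)

(* Run the rule's command unless a rule with the same digest already ran.
   Returns true if the command was executed by this call. *)
let run_once executed run r =
  let key = rule_digest r in
  if Hashtbl.mem executed key then false
  else begin
    Hashtbl.add executed key ();
    run r.cmd;
    true
  end

(* View a multi-target rule as one entry per target, all sharing the rule. *)
let by_target r = List.map (fun t -> (t, r)) r.targets

let () =
  let executed = Hashtbl.create 16 in
  let r =
    { targets = [ "file1"; "file2"; "file3" ];
      deps = [ "file4"; "file5"; "file6" ];
      cmd = "cmd" }
  in
  List.iter
    (fun (target, rule) ->
      if run_once executed print_endline rule then
        print_endline ("ran rule while building " ^ target))
    (by_target r)
```

Requesting file1, file2 and file3 in turn executes `cmd` only for the
first request; the other two find the digest already recorded.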

The limitation is that you cannot handle overlapping targets well,
something like

file1 file2: file4
    cmd1
file1 file3: file4
    cmd2

but this is something hardly any build system can do (i.e. alternate
rules).

Gerd

> - an ML file alone is hard to understand out of context, because build
> specification are kept separate
> - many names and partially overlapping concepts; top-modules,
> libraries, ocamlfind packages, opam packages.
> 
> On 07/11/2016 03:15 PM, Martin DeMello wrote:
> 
> > On Sun, Jul 10, 2016 at 4:03 AM, Gerd Stolpmann
> > <info@gerd-stolpmann.de> wrote:
> >         So how to fix this? In my opinion there are two solutions.
> >         You can
> >         either have a more intelligent ocamldep (e.g. one that reads
> >         in
> >         non-local cmi files and uses that information and also tries
> >         to
> >         interpret all project ml files at once and not file by file
> >         - btw, did
> >         anybody check whether there is an algorithm that precisely
> >         solves the
> >         problem?). The other solution path is to mark toplevel
> >         modules in the
> >         syntax of OCaml (e.g. you'd have to do "open ^M2" if M2 is a
> >         toplevel
> >         module).
> > 
> > 
> > Would an acceptable third option be to simply record the dag
> > explicitly in your build file? Working with google's build system
> > [opensourced as bazel: http://www.bazel.io/] has given me a great
> > appreciation for simply writing out build dependencies manually;
> > sure, it is relatively tedious to have to write out the graph
> > yourself rather than have ocamldep figure it out, but the time and
> > effort to do so is a small fraction of the overall development time
> > of your project, and the reward is a 100% reliable "detection" of
> > the build topology.
> > 
> > 
> > martin 
> 


* Re: [Caml-list] why is building ocaml hard?
  2016-07-10 11:03 ` Gerd Stolpmann
                     ` (2 preceding siblings ...)
  2016-07-11  6:15   ` Martin DeMello
@ 2016-07-12  8:18   ` Goswin von Brederlow
  3 siblings, 0 replies; 11+ messages in thread
From: Goswin von Brederlow @ 2016-07-12  8:18 UTC (permalink / raw)
  To: caml-list

On Sun, Jul 10, 2016 at 01:03:26PM +0200, Gerd Stolpmann wrote:
> On Saturday, 2016-07-09, at 21:16 -0700, Martin DeMello wrote:
> > My consistent experience with OCaml has been that the build systems
> > are fiddly and hard to work with, but I've never seen a discussion of
> > why this is so (as opposed to problems with specific build tools).
> > Supposing you were to start from scratch and develop a new build
> > system in a bottom up manner, starting with a set of libraries and
> > utilities and working your way up to a framework or dsl, what would
> > the difficult steps be?
> 
> Well, I don't think you can improve it that much (except... see below).
> Since I've taken over omake and studied it carefully, I think I have
> good insights into the problems.

I'm using oasis with myocamlbuild and there is one thing I find a total
pain: missing files (typos in file names). The error reporting when
something goes wrong in myocamlbuild is a total nightmare, as it reports
the most unlikely build chain it can find to get the module you requested.

> In short, the big difficulty of OCaml is the strict build topology. You
> need to build a module before the caller of the module. Most build-tool
> related failures come from that. Note that many other languages have
> more relaxed build topologies, or work around the problem by doing
> 2-pass builds (i.e. you first pre-compile and extract interfaces for the
> whole project, and do the actual code generation in the second pass).
> 
> Let's have a closer look why it is relatively error-prone to extract the
> dependencies. The tool in question is ocamldep. It is fairly dumb in so
> far it is only parsing the source code, and then looks at all
> module-related constructs (open, include, module, etc.). Because it
> never looks into already compiled interfaces and also proceeds file by
> file, it may sometimes emit wrong dependency information. For example,
> when there is
> 
> open M1
> open M2
> 
> at the beginning of a file, ocamldep doesn't know whether M2 is another
> top-level module, or whether it is a submodule of M1. ocamldep normally
> errs on the side of generating too many dependencies, which is then
> tried to be corrected by only accepting those deps corresponding to
> existing files. In this example, this would mean that a dependency to M2
> is emitted when there is a file M2.ml. Note that this is wrong when M2
> is actually a submodule of M1 AND the file M2.ml exists.
> 
> So how to fix this? In my opinion there are two solutions. You can
> either have a more intelligent ocamldep (e.g. one that reads in
> non-local cmi files and uses that information and also tries to
> interpret all project ml files at once and not file by file - btw, did
> anybody check whether there is an algorithm that precisely solves the
> problem?).

Which basically means compiling all the sources recursively and
throwing away the AST. The algorithm is simple:

1) open file
2) parse code till you hit an unknown module
3) add dependency on module
4) check for cmi file for module and get list of submodules
   or check for mli file for module and gosub 1
   or check for ml file for module and gosub 1
   or assume module has no submodules
5) add module and submodules to known modules
6) goto 2

Note that this will still fail when you have

open M1
open M2

and M1 is generated and contains M2. You have to run ocamldep again
after M1 is generated.
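
The steps above can be sketched on a toy model. This is only an
illustration under strong assumptions: sources are reduced to `open`
directives and one-level submodule definitions, and the lookup functions
`cmi` and `src` (both hypothetical) stand in for reading a compiled
interface or parsing a source file. The `visiting` list is an addition of
this sketch to stop at cyclic references.

```ocaml
(* Simplified file contents: "open M" or "module M = struct ... end". *)
type item = Open of string | Submodule of string

(* cmi m: submodules of m from a compiled interface, if one exists.
   src m: parsed source of m, if a source file exists.
   Returns (dependencies of this file, submodules it defines). *)
let rec scan ~cmi ~src ~visiting items =
  let rec go scope deps exports = function
    | [] -> (List.rev deps, List.rev exports)
    | Submodule m :: rest -> go (m :: scope) deps (m :: exports) rest
    | Open m :: rest when List.mem m scope ->
      (* Already known locally, e.g. a submodule of an opened module. *)
      go scope deps exports rest
    | Open m :: rest ->
      (* Unknown module: record the dependency, then find its submodules. *)
      let subs =
        match cmi m with
        | Some subs -> subs
        | None -> (
          match src m with
          | Some items when not (List.mem m visiting) ->
            snd (scan ~cmi ~src ~visiting:(m :: visiting) items)
          | _ -> [] (* nothing found: assume no submodules *))
      in
      go (subs @ scope) (m :: deps) exports rest
  in
  go [] [] [] items
```

With a source for M1 that defines M2, scanning `[Open "M1"; Open "M2"]`
yields only M1 as a dependency. If M1 is generated and not present yet, the
same scan reports both M1 and M2, which is the failure case noted above.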

> The other solution path is to mark toplevel modules in the
> syntax of OCaml (e.g. you'd have to do "open ^M2" if M2 is a toplevel
> module).

That seems rather stupid. As in, why do I have to tell it that? ocamlc
already figures that out just fine. I don't like doing the computer's
job.

Here is a 3rd option:

3a) fix ocamlc to build recursively. When it finds an "open M1" and M1
is not yet build then build M1 first. This could be done in 3 ways:

  - Call itself to compile M1. Bad for parallel building and fails
    if M1 is generated (e.g. from an mly file).

  - Return an error to the build system to build M1 first and retry.
    It's not easy to dynamically add dependencies in most build systems.
    For most this would probably mean restarting itself until there
    is no more dependency error.

  - A mixture of the two: ocamlc asks the build system to provide
    M1 when it needs it.

The 3rd option seems to be the sanest, although figuring out a good
general interface might be tricky. E.g. for GNU make you have to pass
through the file descriptor for the job server and some ENV vars, and
temporarily return the build token when calling make again.
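
A rough sketch of the "return an error and retry" variant, to make the restart loop concrete. Everything here is hypothetical: `compile` is a stand-in for a compiler that reports the first dependency that is not built yet, and the `deps` table exists only to drive the example (a real compiler would discover the dependencies while processing the source). Cycles are not detected.

```ocaml
module S = Set.Make (String)

(* Result of one toy compiler run. *)
type outcome = Built | Needs of string

(* Stand-in for the compiler: fail on the first dependency of [m]
   that has not been built yet, otherwise succeed. *)
let compile ~(deps : string -> string list) ~(built : S.t) (m : string)
  : outcome =
  match List.find_opt (fun d -> not (S.mem d built)) (deps m) with
  | Some d -> Needs d
  | None -> Built

(* Build [target], restarting the compile of a module after each
   reported missing dependency. Returns the modules in build order. *)
let build ~deps target =
  let rec go built order m =
    match compile ~deps ~built m with
    | Built -> (S.add m built, m :: order)
    | Needs d ->
      let built, order = go built order d in
      go built order m                  (* retry after building [d] *)
  in
  let _, order = go S.empty [] target in
  List.rev order

(* Hypothetical example: A needs B and C, B needs C. *)
let example_deps = function
  | "A" -> ["B"; "C"]
  | "B" -> ["C"]
  | _ -> []
```

Here `build ~deps:example_deps "A"` yields C, B, A in that order. Note that A and B are each compiled more than once, which is the restart cost mentioned above; the third variant would avoid it by letting the compiler wait for the build system instead of failing.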

3b) output dependencies as a side effect of building

This seems not needed since 3a ensures everything is built in order.
But knowing the dependencies makes it easier the second time around,
and the build system can even watch for changed files and rebuild
automatically. Building files in order also prevents many instances of
ocamlc being in memory at the same time. Building from scratch (no
dependencies yet) might fail because memory is exhausted by all the
ocamlc instances being started.

Note: ocamlc is just an example. Same goes for ocamlopt, ocamlyacc,
ocamllex, ...

I believe an interactive design, where the build system and the
compiler interact to figure out the dependencies together is the only
way to cover all cases. The compiler knows about modules and
submodules and all the ways the source can create a dependency on
another module and the build system knows about generated files. Both
are needed to figure out the complete dependencies.

> Besides ocamldep, there are also other aspects that affect the
> dependency analysis. E.g. with omake there is a distinction of
> project-local and other dependencies, and you need to set the
> OCAMLINCLUDES variable to add other directories to the local part of the
> analysis, whereas the non-local deps are nowadays handled with
> ocamlfind. First of all, this distinction is not really clear to every
> user, and second, there are some difficulties in processing that when
> both concepts overlap (e.g. you want to also get project-local
> dependency expansion with ocamlfind).
> 
> Note that recently the dependency analysis even became harder because of
> flambda. Now cmx files play a bigger role. In particular a cmx file can
> refer to another cmx file that isn't a direct dependency.  In other
> words, there is a second kind of dependency that is added by the code
> generator. Current build tools cannot record these dependencies yet.
> 
> Gerd

How does that change anything? Afaik if A depends on B and B depends
on C and C changes, then B always needs to be rebuilt. Therefore there
can never be a change in C.cmx unless B.cmx also changes. So for A,
looking at B.cmx is enough. Right?

MfG
	Goswin

* RE: [Caml-list] why is building ocaml hard?
  2016-07-11  7:22     ` Frédéric Bour
  2016-07-11 10:36       ` Gerd Stolpmann
@ 2016-07-13 12:10       ` David Allsopp
  1 sibling, 0 replies; 11+ messages in thread
From: David Allsopp @ 2016-07-13 12:10 UTC (permalink / raw)
  To: Frédéric Bour, caml-list

Frédéric Bour wrote:
> More generally, I identify the following problems when building OCaml code:
> - under-specified dependencies, as discussed
> - the compiler driver can produce multiple files on a single invocation. 
> File level dependencies turn into a hypergraph rather than a graph (making
> a Makefile driven system hardly stable)

The most horrible one I've found is the production of both .cmi and .cm[ox] for a .ml file with no .mli - it makes parallel compilation of bytecode and native code extremely difficult (for opam - using OCamlMakefile - I found the only way in make was to resort to using lockfile for .ml files with no .mli; of course, I prefer to have a rule that there must always be an .mli, even if it's just `touch foo.mli` to start with!).

Are there other instances where the emitting of multiple files is a stumbling block?

David
