OK, I think a good place to start a tour of the compiler is parsing/parsetree.mli. This file is actually very well documented, with terse but effective examples of almost every constructor and type.

I had to refer to the OCaml manual for a few of the corner cases. For example, I didn't know about the #class type shortcut. I think a few comments explaining the more obscure facets of the language could be helpful.

Since the file is so well documented, I only have a few questions. I'll accept an answer or a hunch from anyone -- don't feel shy because you think you're not sure about the answer:

1. What is the difference between an extension and an attribute? From what I understand, both are ways of attaching additional metadata to the AST, which implementations of the ast-mapper can then process. Why are there two mechanisms?

2. What is demonstrated in lines 114-117 regarding polymorphic variant row fields:

  | Rtag of label * bool * core_type list
        (* [`A]                   ( true,  [] )
           [`A of T]              ( false, [T] )
           [`A of T1 & .. & Tn]   ( false, [T1;...Tn] )
           [`A of & T1 & .. & Tn] ( true,  [T1;...Tn] )
         *)

What does the bool value represent?
Why are the type separators in the comments using the & symbol?
What is the difference between the 3rd and 4th example?

3. Line 684: What is the purpose of the override flag on Pstr_open? The comment doesn't explain it.

4. The toplevel phrases are not clear. What is the purpose of Ptop_dir on line 721?
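
To make the questions above concrete, here is a minimal sketch of the surface syntax I have in mind, as far as I understand it from the manual. The attribute name my_note, the extension name my_ext, the file name in the directive and the module M are all made up for illustration, and the f1/f2 functions are adapted from the polymorphic variants section of the manual. I'm not claiming this answers anything; it only shows which constructs I'm asking about:

```ocaml
(* Question 1: an attribute decorates an existing node. As far as I can
   tell, the compiler ignores attributes it does not recognise. *)
let answer = 42 [@my_note "hypothetical attribute"]

(* An extension node stands in place of a node, e.g. [%my_ext]. It is kept
   in a comment here because, as I understand it, compilation fails on an
   extension node that no preprocessor has rewritten. *)

(* Question 2: the & separator can show up in inferred types when two
   functions expect different argument types for the same tag. *)
let f1 = function `A x -> x = 1 | `B -> true | `C -> false
let f2 = function `A x -> x = "a" | `B -> true
let f x = f1 x && f2 x
(* The toplevel should report a type of the form
   [< `A of string & int | `B ] -> bool for f. *)

(* Question 3: open! is the surface syntax that, I assume, sets the
   override flag on Pstr_open. Here the opened module shadows x. *)
module M = struct let x = 1 end
let x = 0
open! M
let y = x + 1

(* Question 4: my guess is that Ptop_dir covers toplevel directives such as
   #use "some_file.ml";; which are only accepted by the interactive
   toplevel, so it stays in a comment as well. *)
```

Only the `B case can actually be passed to f, since no argument of `A is both a string and an int.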

Like I said, feel free to jump in and answer any one of these questions.

Thanks in advance for everyone's help.


On Tue, Apr 1, 2014 at 6:03 AM, Mark Shinwell <mshinwell@janestreet.com> wrote:
I would suggest that it's probably better to keep the documentation as
comments where possible.  However, I think it is important to avoid
excessive commentary, especially if it is likely to get out of sync as
a result of future modifications to the code.  It may be that in some
cases making alterations to the code (for example, improving the name
of a variable) is a more satisfactory approach than adding a comment.

Thanks for working on this.


On 31 March 2014 18:51, Yotam Barnoy <yotambarnoy@gmail.com> wrote:
> I think it depends on how much feedback I get on any particular question. By
> default, I would like comments to go in the code. Additionally, there's the
> ocaml-internals wiki at https://github.com/ocamllabs/ocaml-internals which
> will be useful for any concepts that span multiple files, or that are too
> beginner-oriented. I'm guessing that for many things, it will just have to
> be decided on a case-by-case basis.
> Of course, the most important ingredient for the success of this 'project'
> is the willing, patient participation of the core team, as well as the other
> experts on this list.
> -Yotam
> On Mon, Mar 31, 2014 at 1:06 PM, Milan Stanojević <milanst@gmail.com> wrote:
>> Thank you for doing this, I'm interested in learning more about how
>> the compiler works.
>> Are you creating separate file(s) to document the compiler, or are you
>> adding comments to the ml files?
>> On Mon, Mar 31, 2014 at 11:39 AM, Yotam Barnoy <yotambarnoy@gmail.com> wrote:
>> > Hi everybody
>> >
>> > It's been mentioned before that the OCaml compiler's documentation is
>> > somewhat lacking. I've been going over the compiler code gradually (both
>> > the frontend and the backend) and while some parts are understandable
>> > enough, others are missing some basic explanations. Some explanations are
>> > also spread out throughout the codebase, making it hard to know what
>> > something means unless you've read another part of the codebase that
>> > relates to it.
>> >
>> > Since the call to submit documentation commits has gone mostly
>> > unanswered, I'd like to suggest a method of making both my own progress
>> > through the code easier and hopefully making it easier for others who
>> > will follow.
>> >
>> > What I'm going to do is, focusing on more or less one file at a time,
>> > I'll post newbie questions to the list about the code. Once I'm satisfied
>> > that I have a good enough understanding, I'll add comments to the
>> > aforementioned files and submit pull requests for them. I also encourage
>> > others to do the same.
>> >
>> > What I need from the list, and especially from the more knowledgeable
>> > members (who already know the compiler code) is the willingness to
>> > explain the concepts and answer my questions, annoying as they may be. I
>> > have a pretty decent background in compilers, ASTs, code generation, etc,
>> > but not so much in type inference.
>> >
>> > I'm not suggesting a particular timeframe for this process -- I'm doing
>> > this on the side while working on a research project and TAing, but I
>> > really would like to get to the point where I can make significant
>> > contributions to the toolchain, and if I can help others who follow in my
>> > footsteps, then that's a nice bonus.
>> >
>> > While I could have skipped this introduction and just proceeded with
>> > inundating the list with questions, I felt that this (hopefully) gives a
>> > purpose and perhaps motivation for those who have the answers to answer
>> > my questions even if they get annoying. In particular, I may often miss
>> > some parts that may seem obvious because I don't necessarily have the
>> > time to read all the connected code in depth. Hopefully you'll bear with
>> > me.
>> >
>> > Does this sound reasonable to the fine folks on the list?
>> >
>> > Yotam