Note: the reason why you get _weak type variables (which are not polymorphic, just not-inferred-yet) is that the type-checker cannot detect that (z // (fun () -> ...)) is a value: it would have to unfold the call to (//) for this, which it doesn't do in general, and here certainly could not do given that its definition is hidden behind an abstraction boundary.

I would recommend experimenting *without* the module interface and the auxiliary function, using the constructors directly; you would get slightly better types:

# S(S(Z, (fun () -> Int32.zero)), (fun () -> ""));;
- : (int32 -> string -> 'a, 'a) t = S (S (Z, <fun>), <fun>)
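To see the difference concretely, here is a small sketch. The names direct, at_int, at_bool, weak and weak_at_int are mine, and (//) is redefined locally as a plain function, without the module interface. It assumes nothing beyond the type definition from your message:

```ocaml
type ('f, 'r) t =
  | Z : ('r, 'r) t
  | S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t

(* Only constructors and function literals: this is a syntactic value,
   so 'r is generalized and the binding is polymorphic. *)
let direct = S (S (Z, (fun () -> Int32.zero)), (fun () -> ""))

(* The same value used at two different instantiations of 'r. *)
let at_int : (int32 -> string -> int, int) t = direct
let at_bool : (int32 -> string -> bool, bool) t = direct

(* The same construction through a function call: the right-hand side is
   an application, not a syntactic value, so 'r stays a weak variable. *)
let ( // ) t f = S (t, f)
let weak = Z // (fun () -> Int32.zero) // (fun () -> "")

(* The first use fixes the weak variable once and for all. *)
let weak_at_int : (int32 -> string -> int, int) t = weak

(* A second use at a different 'r would be rejected by the type-checker:
   let weak_at_bool : (int32 -> string -> bool, bool) t = weak *)
```

Note that the weak variable shows up here even though (//) is not hidden behind a signature: the application alone is enough to prevent generalization.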

Historically we have used module interfaces to implement "phantom types" -- because the type information there is only present in the interface, not in the definition. With GADTs, the type constraints are built into the definition itself, which is precisely what makes them more powerful; no need for an abstract interface on top.

The first part of your question is about understanding how the type-inference works (how variables are manipulated by the type-checker and then "propagated back up"). This sounds like a difficult way to understand GADTs: you want to learn the typing rules *and* the type-inference algorithm at once. But only the first part is necessary to use the feature productively (with a few tips on when to use annotations, which are easy to explain and in fact explained in the manual: http://caml.inria.fr/pub/docs/manual-ocaml/manual033.html ). So instead of asking: "how did the compiler get this type?", I would ask: "why is this the right type"? I think you could convince yourself that (1) it is a correct type and (2) any other valid type would be a specialization of this type; there is no simpler solution.
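One way to check claim (1) without thinking about inference at all is to write down the type you expect at each step and let the type-checker confirm it. This is only a sketch; step0, step1 and step2 are names I made up. The "'r." prefix requires the binding to be polymorphic in 'r, so each line fails to compile if the stated type is more general than what the constructors allow:

```ocaml
type ('f, 'r) t =
  | Z : ('r, 'r) t
  | S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t

(* Z forces both parameters to be the same type. *)
let step0 : 'r. ('r, 'r) t = Z

(* S needs an argument of type ('f, int32 -> 'r) t.  Using step0 at that
   type forces 'f = int32 -> 'r, and S returns ('f, 'r) t. *)
let step1 : 'r. (int32 -> 'r, 'r) t = S (step0, fun () -> Int32.zero)

(* S now needs ('f, string -> 'r) t.  Using step1 with its own 'r
   instantiated to string -> 'r gives 'f = int32 -> string -> 'r. *)
let step2 : 'r. (int32 -> string -> 'r, 'r) t = S (step1, fun () -> "")
```

Reading the comments from top to bottom gives the "why is this the right type" argument: each S strips one argument type from the second parameter and leaves the first parameter alone.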

The second part: you wrote:

let rec bind : type f r. f -> (f, r) t -> r = fun f -> function
  | Z ->
    f
  | S (t, v) ->
    bind (f (v ())) t

Let's focus on the second clause:

  | S (t, v) ->
    bind (f (v ())) t

We know that (f : f) holds, that the pattern-matching is on a value of type (f, r) t, and that we must return a value of type r.

When pattern-matching on S (t, v), we learn extra type information from the typing rule of S:

| S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t

If the matched value has type (f, r) t, then (t, v) has type ((f, a -> r) t * (unit -> a)) for *some* unknown/existential type a. So within the branch we have

bind : type f r. f -> (f, r) t -> r

f : f

t : (f, a -> r) t

v : unit -> a

expected return type: r

f does *not* have a function type here, so your idea of applying (f (v ())) cannot work (v does have a function type, so (v ()) is valid).

The only thing you can do with f is pass it to a recursive call of bind (with what arguments?), and then you will get an (a -> r) as result.

Do you see how to write a correct program from there?
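In case it helps to compare against your own attempt, here is one possible completion, as a sketch rather than the only way to write it. The names spec and result, and the example function passed to bind, are mine:

```ocaml
type ('f, 'r) t =
  | Z : ('r, 'r) t
  | S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t

let rec bind : type f r. f -> (f, r) t -> r = fun f -> function
  | Z ->
    (* Here the type-checker knows f = r, so f itself is the result. *)
    f
  | S (t, v) ->
    (* t : (f, a -> r) t, so the recursive call returns an (a -> r),
       which is then applied to (v () : a) to get an r. *)
    (bind f t) (v ())

(* Hypothetical usage: collect an int32 and a string, then combine them. *)
let spec = S (S (Z, (fun () -> 7l)), (fun () -> "x"))
let result = bind (fun n s -> Int32.to_string n ^ s) spec
```

With these definitions result is the string "7x". The recursion is polymorphic (the inner call uses bind at return type a -> r instead of r), which is why the explicit "type f r." annotation is needed.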

On Fri, Sep 20, 2019 at 5:42 AM Malcolm Matalka <mmatalka@gmail.com> wrote:

I have been using GADTs to make type-safe versions of APIs that are kind
of like printf. I've been using this pattern by rote and now I'm
getting to trying to understand how it works.

I have the following code:

module F : sig
  type ('f, 'r) t

  val z : ('r, 'r) t
  val (//) : ('f, 'a -> 'r) t -> (unit -> 'a) -> ('f, 'r) t
end = struct
  type ('f, 'r) t =
    | Z : ('r, 'r) t
    | S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t

  let z = Z
  let (//) t f = S (t, f)
end

And the following usage:

utop # F.(z // (fun () -> Int32.zero) // (fun () -> ""));;
- : (int32 -> string -> '_weak1, '_weak1) F.t = <abstr>

I understand how 'r is '_weak1 (I think), but I'm still trying to figure
out how 'f gets its type by applying (//). I think I have an
explanation below, and I'm hoping someone can say if it's correct and/or
simplify it.

Explanation:

The constructor Z says that 'f and 'r are the same (Z : ('r, 'r) t).
The constructor S says that the (_, _) t that it takes has some type 'f,
but that the second type variable must be of the form 'a -> 'r, sort of
deconstructing pattern match on the type (if that makes sense). And if
that (_, _) t is a Z, where we have stated the two type variables have
the same value, that means 'f must also be ('a -> 'r). So for the
first application of (//) it makes sense. If we apply (//) again, the
first parameter has the type (int32 -> '_weak1, '_weak1) t, but that
must actually mean '_weak1 in this case is string -> '_weak, and then
this is where things become jumbled in my head. If the passed in (_, _)
t must be (string -> '_weak), and the inner type actually must be (int32
-> '_weak), then, the inner inner type must be ('_weak), the type of 'r
at Z must be (string -> int32 -> '_weak), and since 'r, and 'f are the
same type for Z, 'f must also be (string -> int32 -> '_weak), and that
knowledge propagates back up? But that doesn't feel quite right to me,
either.

With the help of reading other code, I've figured out how to make a
function that uses a type like this that works like kprintf, however
where I'm going in this case is that I want to take a function that
matches 'f and apply it to the result of my functions.

Something like:

let rec bind : type f r. f -> (f, r) t -> r = fun f -> function
  | Z ->
    f
  | S (t, v) ->
    bind (f (v ())) t

However, this does not compile given:

Error: This expression has type f
This is not a function; it cannot be applied.

My understanding was if I have Z, and an input that has the type f, then
that is just the return type since at Z, f and r are the same type. And
then at S, I can peel away the type of f by applying the function f and
then recursively calling. But my understanding is missing something.

Does this make sense?

What am I not understanding?

Thank you,
/Malcolm