Note: the reason why you get _weak type variables (which are not
polymorphic, just not-inferred-yet)  is that the type-checker cannot detect
that (z // (fun () -> ...)) is a value: it would have to unfold the call to
(//) for this, which it doesn't do in general, and here certainly could not
do given that its definition is hidden behind an abstraction boundary.
I would recommend experimenting *without* the module interface and the
auxiliary function, with the constructors directly, you would get slightly
better types.

# S(S(Z, (fun () -> Int32.zero)), (fun () -> ""));;
- : (int32 -> string -> 'a, 'a) t = S (S (Z, <fun>), <fun>)

Historically we have used module interfaces to implement "phantom types" --
because the type information there is only present in the interface, not in
the definition. With GADTs, the type constraints are built into the
definition itself, which is precisely what makes them more powerful; no
need for an abstract interface on top.

The first part of your question is about understanding how the
type-inference work (how variables are manipulated by the type-checker and
then "propagated back up"). This sounds like a difficult way to understand
GADTs: you want to learn the typing rules *and* the type-inference
algorithm at once. But only the first part is necessary to use the feature
productively (with a few tips on when to use annotations, which are easy to
explain and in fact explained in the manual:
http://caml.inria.fr/pub/docs/manual-ocaml/manual033.html ). So instead of
asking: "how did the compiler get this type?", I would ask: "why is this
the right type"? I think you could convince yourself that (1) it is a
correct type and (2) any other valid type would be a specialization of this
type, there is no simpler solution.

The second part: you wrote:

  let rec bind : type f r. f -> (f, r) t -> r = fun f -> function
    | Z ->
      f
    | S (t, v) ->
      bind (f (v ())) t

Let's focus on the second clause:

    | S (t, v) ->
      bind (f (v ())) t

we know that (f : f) holds, and that the pattern-matching is on a value of
type (f, r) t, and we must return r.
When pattern-matching on S (t, v), we learn extra type information from the
typing rule of S:

    | S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t

if r has type (f, r) t, then (t, v) has type ((f, a -> r) t * (unit -> a))
for *some* unknown/existential type a. So within the branch we have

  bind : type f r. f -> (f, r) t -> r
  f : (f, a -> r) t
  v : unit -> a
  expected return type: r

f does *not* have a function type here, so your idea of applying (f (v ()))
cannot work (v does have a function type, so (v ()) is valid).

The only thing you can do on f is call (bind) recursively (with what
arguments?), and then you will get an (a -> r) as result.

Do you see how to write a correct program from there?


On Fri, Sep 20, 2019 at 5:42 AM Malcolm Matalka <mmatalka@gmail.com> wrote:

> I have been using GADTs to make type-safe versions of APIs that are kind
> of like printf.  I've been using this pattern by rote and now I'm
> getting to trying to understand how it works.
>
> I have the following code:
>
> module F : sig
>   type ('f, 'r) t
>
>   val z : ('r, 'r) t
>   val (//) : ('f, 'a -> 'r) t -> (unit -> 'a) -> ('f, 'r) t
> end = struct
>   type ('f, 'r) t =
>     | Z : ('r, 'r) t
>     | S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t
>
>   let z = Z
>   let (//) t f = S (t, f)
> end
>
> And the following usage:
>
> utop # F.(z // (fun () -> Int32.zero) // (fun () -> ""));;
> - : (int32 -> string -> '_weak1, '_weak1) F.t = <abstr>
>
> I understand how 'r is '_weak1 (I think), but I'm still trying to figure
> out how 'f gets its type by applying (//).  I think I have an
> explanation below, and I'm hoping someone can say if it's correct and/or
> simplify it.
>
> Explanation:
>
> The constructor Z says that 'f and 'r, are the same (Z : ('r, 'r) t).
> The constructor S says that the (_, _) t that it takes has some type 'f,
> but that the second type variable must be of the form 'a -> 'r, sort of
> deconstructing pattern match on the type (if that makes sense).  And if
> that (_, _) t is a Z, where we have stated the two type variables have
> the same value, that means 'f must also be ('a -> 'r). So for the
> first application of (//) it makes sense.  If we apply (//) again, the
> first parameter has the type (int32 -> '_weak1, '_weak1) t, but that
> must actually mean '_weak1 in this case is string -> '_weak, and then
> this is where things become jumbled in my head.  If the passed in (_, _)
> t must be (string -> '_weak), and the inner type actually must be (int32
> -> '_weak), then, the inner inner type must be ('_weak), the type of 'r
> at Z must be (string -> int32 -> '_weak), and since 'r, and 'f are the
> same type for Z, 'f must also be (string -> int32 -> '_weak), and that
> knowledge propagates back up?  But that doesn't feel quite right to me,
> either.
>
> With the help of reading other code, I've figured out how to make a
> function that uses a type like this that works like kprintf, however
> where I'm going in this case is that I want to take a function that
> matches 'f and apply it to the result of my functions.
>
> Something like:
>
>   let rec bind : type f r. f -> (f, r) t -> r = fun f -> function
>     | Z ->
>       f
>     | S (t, v) ->
>       bind (f (v ())) t
>
> However, this does not compile given:
>
> Error: This expression has type f
>        This is not a function; it cannot be applied.
>
> My understanding was if I have Z, and an input that has the type f, then
> that is just the return type since at Z, f and r are the same type.  And
> than at S, I can peel away the type of f by applying the function f and
> then recursively calling.  But my understanding is missing something.
>
> Does this make sense?
>
> What am I not understanding?
>
> Thank you,
> /Malcolm
>