```
From: Gabriel Scherer <gabriel.scherer@gmail.com>
To: Malcolm Matalka <mmatalka@gmail.com>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] How is this type inferred (GADTs)
Date: Fri, 20 Sep 2019 10:35:06 +0200
Message-ID: <CAPFanBFAr5LOEXTdRJtrX6FUQ2_LVjbVgZqiZHLMyE6KkUPL8A@mail.gmail.com> (raw)
In-Reply-To: <86v9tnfkum.fsf@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 9450 bytes --]
There is a direct relation between how type evolve during GADT typing and
how queries evolve during logic/relational programming. It can help to
think of a GADT declaration as the set of rules of a Prolog program. (I you
have never looked at Prolog, here is an occasion to learn something new).
That does *not* mean that GADTs let you do prolog-style programming in
types (the compiler will not automatically search for a proof, the user has
to write it down explicitly as a GADT term, so this work is not done
implicitly, as with type-class search for example), but it does give a way
to think about GADT declarations.
Your declaration (notice that I renamed the Z variable, for a good reason
that will appear clear below):
type (_, _) t =
| Z : ('f, 'f) t
| S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t
becomes
t F F.
t F R :- t F (A -> R), (unit -> A).
In any case, your declaration has the following effect:
- by the Z rule, all types of the form
('a1 -> 'a2 -> ... -> 'an -> 'r, 'a -> 'a2 -> ... -> 'an -> 'r) t
are inhabited
- by the S rule, you can "hide" an element on the right
(and you have to provide a thunk for it)
, moving from
('a1 -> ... -> 'an -> r, 'ak -> 'a{k+1} -> .. -> 'an -> r) t
to
('a1 -> ... -> 'an -> r, 'a{k+1} -> .. -> 'an -> r) t
If you do this repeatedly you end up on what you want: ('a1 -> .. -> 'an ->
r, r) t.
The way you write about it, it sounds like you had the following
*different* definition in mind
type (_, _) t =
| Z : ('r, 'r) t
| S : ('f, 'r) t * (unit -> 'a) -> ('a -> 'f, 'r) t
let test = S (S (Z, (fun () -> 3l)), (fun () -> "foo"))
corresponding to the following Prolog program that I do find easier to
think about:
t R R.
t (A -> F) R :- t F R, (unit -> A).
It turns out that with this other definition, the first implementation that
you had written actually type-check fines.
(You could try to write conversion functions between those two definitions
to convince yourself that they are morally equivalent, but this is actually
a difficult exercise.)
On Fri, Sep 20, 2019 at 10:11 AM Malcolm Matalka <mmatalka@gmail.com> wrote:
>
> Gabriel Scherer <gabriel.scherer@gmail.com> writes:
>
> > Note: the reason why you get _weak type variables (which are not
> > polymorphic, just not-inferred-yet) is that the type-checker cannot
> detect
> > that (z // (fun () -> ...)) is a value: it would have to unfold the call
> to
> > (//) for this, which it doesn't do in general, and here certainly could
> not
> > do given that its definition is hidden behind an abstraction boundary.
> > I would recommend experimenting *without* the module interface and the
> > auxiliary function, with the constructors directly, you would get
> slightly
> > better types.
> >
> > # S(S(Z, (fun () -> Int32.zero)), (fun () -> ""));;
> > - : (int32 -> string -> 'a, 'a) t = S (S (Z, <fun>), <fun>)
> >
> > Historically we have used module interfaces to implement "phantom types"
> --
> > because the type information there is only present in the interface, not
> in
> > the definition. With GADTs, the type constraints are built into the
> > definition itself, which is precisely what makes them more powerful; no
> > need for an abstract interface on top.
> >
> > The first part of your question is about understanding how the
> > type-inference work (how variables are manipulated by the type-checker
> and
> > then "propagated back up"). This sounds like a difficult way to
> understand
> > GADTs: you want to learn the typing rules *and* the type-inference
> > algorithm at once. But only the first part is necessary to use the
> feature
> > productively (with a few tips on when to use annotations, which are easy
> to
> > explain and in fact explained in the manual:
> > http://caml.inria.fr/pub/docs/manual-ocaml/manual033.html ). So instead
> of
> > asking: "how did the compiler get this type?", I would ask: "why is this
> > the right type"? I think you could convince yourself that (1) it is a
> > correct type and (2) any other valid type would be a specialization of
> this
> > type, there is no simpler solution.
> >
> > The second part: you wrote:
> >
> > let rec bind : type f r. f -> (f, r) t -> r = fun f -> function
> > | Z ->
> > f
> > | S (t, v) ->
> > bind (f (v ())) t
> >
> > Let's focus on the second clause:
> >
> > | S (t, v) ->
> > bind (f (v ())) t
> >
> > we know that (f : f) holds, and that the pattern-matching is on a value
> of
> > type (f, r) t, and we must return r.
> > When pattern-matching on S (t, v), we learn extra type information from
> the
> > typing rule of S:
> >
> > | S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t
> >
> > if r has type (f, r) t, then (t, v) has type ((f, a -> r) t * (unit ->
> a))
> > for *some* unknown/existential type a. So within the branch we have
> >
> > bind : type f r. f -> (f, r) t -> r
> > f : (f, a -> r) t
> > v : unit -> a
> > expected return type: r
> >
> > f does *not* have a function type here, so your idea of applying (f (v
> ()))
> > cannot work (v does have a function type, so (v ()) is valid).
> >
> > The only thing you can do on f is call (bind) recursively (with what
> > arguments?), and then you will get an (a -> r) as result.
> >
> > Do you see how to write a correct program from there?
>
> Ah-ha! I was close but my thinking was at the "wrong level" (I'm not
> sure how to put it). The working implementation of bind is:
>
> let rec bind : type f r. f -> (f, r) t -> r = fun f -> function
> | Z ->
> f
> | S (t, v) ->
> let f' = bind f t in
> f' (v ())
>
> And now I see why you wanted me to look at it not through the module
> interface. Looking at how the recursion would work, of course the
> most-nested S will have the function corresponding to the first
> parameter of the function. I was thinking that I would consume the
> parameters on the way down, but it's actually on the way up.
>
> I am still working out in my head the answer to "why is this the right
> type", I think it has to slosh around in there a bit before it truly
> makes sense to me.
>
> Thank you!
>
> >
> >
> >
> > On Fri, Sep 20, 2019 at 5:42 AM Malcolm Matalka <mmatalka@gmail.com>
> wrote:
> >
> >> I have been using GADTs to make type-safe versions of APIs that are kind
> >> of like printf. I've been using this pattern by rote and now I'm
> >> getting to trying to understand how it works.
> >>
> >> I have the following code:
> >>
> >> module F : sig
> >> type ('f, 'r) t
> >>
> >> val z : ('r, 'r) t
> >> val (//) : ('f, 'a -> 'r) t -> (unit -> 'a) -> ('f, 'r) t
> >> end = struct
> >> type ('f, 'r) t =
> >> | Z : ('r, 'r) t
> >> | S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t
> >>
> >> let z = Z
> >> let (//) t f = S (t, f)
> >> end
> >>
> >> And the following usage:
> >>
> >> utop # F.(z // (fun () -> Int32.zero) // (fun () -> ""));;
> >> - : (int32 -> string -> '_weak1, '_weak1) F.t = <abstr>
> >>
> >> I understand how 'r is '_weak1 (I think), but I'm still trying to figure
> >> out how 'f gets its type by applying (//). I think I have an
> >> explanation below, and I'm hoping someone can say if it's correct and/or
> >> simplify it.
> >>
> >> Explanation:
> >>
> >> The constructor Z says that 'f and 'r, are the same (Z : ('r, 'r) t).
> >> The constructor S says that the (_, _) t that it takes has some type 'f,
> >> but that the second type variable must be of the form 'a -> 'r, sort of
> >> deconstructing pattern match on the type (if that makes sense). And if
> >> that (_, _) t is a Z, where we have stated the two type variables have
> >> the same value, that means 'f must also be ('a -> 'r). So for the
> >> first application of (//) it makes sense. If we apply (//) again, the
> >> first parameter has the type (int32 -> '_weak1, '_weak1) t, but that
> >> must actually mean '_weak1 in this case is string -> '_weak, and then
> >> this is where things become jumbled in my head. If the passed in (_, _)
> >> t must be (string -> '_weak), and the inner type actually must be (int32
> >> -> '_weak), then, the inner inner type must be ('_weak), the type of 'r
> >> at Z must be (string -> int32 -> '_weak), and since 'r, and 'f are the
> >> same type for Z, 'f must also be (string -> int32 -> '_weak), and that
> >> knowledge propagates back up? But that doesn't feel quite right to me,
> >> either.
> >>
> >> With the help of reading other code, I've figured out how to make a
> >> function that uses a type like this that works like kprintf, however
> >> where I'm going in this case is that I want to take a function that
> >> matches 'f and apply it to the result of my functions.
> >>
> >> Something like:
> >>
> >> let rec bind : type f r. f -> (f, r) t -> r = fun f -> function
> >> | Z ->
> >> f
> >> | S (t, v) ->
> >> bind (f (v ())) t
> >>
> >> However, this does not compile given:
> >>
> >> Error: This expression has type f
> >> This is not a function; it cannot be applied.
> >>
> >> My understanding was if I have Z, and an input that has the type f, then
> >> that is just the return type since at Z, f and r are the same type. And
> >> than at S, I can peel away the type of f by applying the function f and
> >> then recursively calling. But my understanding is missing something.
> >>
> >> Does this make sense?
> >>
> >> What am I not understanding?
> >>
> >> Thank you,
> >> /Malcolm
> >>
>
>
[-- Attachment #2: Type: text/html, Size: 12567 bytes --]
<div dir="ltr"><div>There is a direct relation between how type evolve during GADT typing and how queries evolve during logic/relational programming. It can help to think of a GADT declaration as the set of rules of a Prolog program. (I you have never looked at Prolog, here is an occasion to learn something new). That does *not* mean that GADTs let you do prolog-style programming in types (the compiler will not automatically search for a proof, the user has to write it down explicitly as a GADT term, so this work is not done implicitly, as with type-class search for example), but it does give a way to think about GADT declarations.</div><div><br></div><div>Your declaration (notice that I renamed the Z variable, for a good reason that will appear clear below):<br></div><div><br></div><div>
type (_, _) t =<br>
| Z : ('f, 'f) t<br>
| S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t</div><div><br></div><div>becomes<br></div><div><br></div><div> t F F.<br></div><div> t F R :- t F (A -> R), (unit -> A).</div><div><br></div><div>In any case, your declaration has the following effect:</div><div>- by the Z rule, all types of the form</div><div> ('a1 -> 'a2 -> ... -> 'an -> 'r, 'a -> 'a2 -> ... -> 'an -> 'r) t <br></div><div> are inhabited<br></div><div>- by the S rule, you can "hide" an element on the right</div><div> (and you have to provide a thunk for it)<br></div><div>, moving from</div><div> ('a1 -> ... -> 'an -> r, 'ak -> 'a{k+1} -> .. -> 'an -> r) t<br></div><div> to</div><div> ('a1 -> ... -> 'an -> r, 'a{k+1} -> .. -> 'an -> r) t<br><div> <br></div><div>If you do this repeatedly you end up on what you want: ('a1 -> .. -> 'an -> r, r) t.</div><div><br></div><div>The way you write about it, it sounds like you had the following *different* definition in mind</div><div><br></div><div>type (_, _) t =<br>| Z : ('r, 'r) t<br>| S : ('f, 'r) t * (unit -> 'a) -> ('a -> 'f, 'r) t<br><br>let test = S (S (Z, (fun () -> 3l)), (fun () -> "foo"))</div><div><br></div><div>corresponding to the following Prolog program that I do find easier to think about:<br></div><div><br></div><div><div> t R R.<br></div><div> t (A -> F) R :- t F R, (unit -> A).</div><div><br></div></div><div>It turns out that with this other definition, the first implementation that you had written actually type-check fines.</div><div><br></div><div>(You could try to write conversion functions between those two definitions to convince yourself that they are morally equivalent, but this is actually a difficult exercise.)<br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 20, 2019 at 10:11 AM Malcolm Matalka <<a href="mailto:mmatalka@gmail.com">mmatalka@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
Gabriel Scherer <<a href="mailto:gabriel.scherer@gmail.com" target="_blank">gabriel.scherer@gmail.com</a>> writes:<br>
<br>
> Note: the reason why you get _weak type variables (which are not<br>
> polymorphic, just not-inferred-yet) is that the type-checker cannot detect<br>
> that (z // (fun () -> ...)) is a value: it would have to unfold the call to<br>
> (//) for this, which it doesn't do in general, and here certainly could not<br>
> do given that its definition is hidden behind an abstraction boundary.<br>
> I would recommend experimenting *without* the module interface and the<br>
> auxiliary function, with the constructors directly, you would get slightly<br>
> better types.<br>
><br>
> # S(S(Z, (fun () -> Int32.zero)), (fun () -> ""));;<br>
> - : (int32 -> string -> 'a, 'a) t = S (S (Z, <fun>), <fun>)<br>
><br>
> Historically we have used module interfaces to implement "phantom types" --<br>
> because the type information there is only present in the interface, not in<br>
> the definition. With GADTs, the type constraints are built into the<br>
> definition itself, which is precisely what makes them more powerful; no<br>
> need for an abstract interface on top.<br>
><br>
> The first part of your question is about understanding how the<br>
> type-inference work (how variables are manipulated by the type-checker and<br>
> then "propagated back up"). This sounds like a difficult way to understand<br>
> GADTs: you want to learn the typing rules *and* the type-inference<br>
> algorithm at once. But only the first part is necessary to use the feature<br>
> productively (with a few tips on when to use annotations, which are easy to<br>
> explain and in fact explained in the manual:<br>
> <a href="http://caml.inria.fr/pub/docs/manual-ocaml/manual033.html" rel="noreferrer" target="_blank">http://caml.inria.fr/pub/docs/manual-ocaml/manual033.html</a> ). So instead of<br>
> asking: "how did the compiler get this type?", I would ask: "why is this<br>
> the right type"? I think you could convince yourself that (1) it is a<br>
> correct type and (2) any other valid type would be a specialization of this<br>
> type, there is no simpler solution.<br>
><br>
> The second part: you wrote:<br>
><br>
> let rec bind : type f r. f -> (f, r) t -> r = fun f -> function<br>
> | Z -><br>
> f<br>
> | S (t, v) -><br>
> bind (f (v ())) t<br>
><br>
> Let's focus on the second clause:<br>
><br>
> | S (t, v) -><br>
> bind (f (v ())) t<br>
><br>
> we know that (f : f) holds, and that the pattern-matching is on a value of<br>
> type (f, r) t, and we must return r.<br>
> When pattern-matching on S (t, v), we learn extra type information from the<br>
> typing rule of S:<br>
><br>
> | S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t<br>
><br>
> if r has type (f, r) t, then (t, v) has type ((f, a -> r) t * (unit -> a))<br>
> for *some* unknown/existential type a. So within the branch we have<br>
><br>
> bind : type f r. f -> (f, r) t -> r<br>
> f : (f, a -> r) t<br>
> v : unit -> a<br>
> expected return type: r<br>
><br>
> f does *not* have a function type here, so your idea of applying (f (v ()))<br>
> cannot work (v does have a function type, so (v ()) is valid).<br>
><br>
> The only thing you can do on f is call (bind) recursively (with what<br>
> arguments?), and then you will get an (a -> r) as result.<br>
><br>
> Do you see how to write a correct program from there?<br>
<br>
Ah-ha! I was close but my thinking was at the "wrong level" (I'm not<br>
sure how to put it). The working implementation of bind is:<br>
<br>
let rec bind : type f r. f -> (f, r) t -> r = fun f -> function<br>
| Z -><br>
f<br>
| S (t, v) -><br>
let f' = bind f t in<br>
f' (v ())<br>
<br>
And now I see why you wanted me to look at it not through the module<br>
interface. Looking at how the recursion would work, of course the<br>
most-nested S will have the function corresponding to the first<br>
parameter of the function. I was thinking that I would consume the<br>
parameters on the way down, but it's actually on the way up.<br>
<br>
I am still working out in my head the answer to "why is this the right<br>
type", I think it has to slosh around in there a bit before it truly<br>
makes sense to me.<br>
<br>
Thank you!<br>
<br>
><br>
><br>
><br>
> On Fri, Sep 20, 2019 at 5:42 AM Malcolm Matalka <<a href="mailto:mmatalka@gmail.com" target="_blank">mmatalka@gmail.com</a>> wrote:<br>
><br>
>> I have been using GADTs to make type-safe versions of APIs that are kind<br>
>> of like printf. I've been using this pattern by rote and now I'm<br>
>> getting to trying to understand how it works.<br>
>><br>
>> I have the following code:<br>
>><br>
>> module F : sig<br>
>> type ('f, 'r) t<br>
>><br>
>> val z : ('r, 'r) t<br>
>> val (//) : ('f, 'a -> 'r) t -> (unit -> 'a) -> ('f, 'r) t<br>
>> end = struct<br>
>> type ('f, 'r) t =<br>
>> | Z : ('r, 'r) t<br>
>> | S : (('f, 'a -> 'r) t * (unit -> 'a)) -> ('f, 'r) t<br>
>><br>
>> let z = Z<br>
>> let (//) t f = S (t, f)<br>
>> end<br>
>><br>
>> And the following usage:<br>
>><br>
>> utop # F.(z // (fun () -> Int32.zero) // (fun () -> ""));;<br>
>> - : (int32 -> string -> '_weak1, '_weak1) F.t = <abstr><br>
>><br>
>> I understand how 'r is '_weak1 (I think), but I'm still trying to figure<br>
>> out how 'f gets its type by applying (//). I think I have an<br>
>> explanation below, and I'm hoping someone can say if it's correct and/or<br>
>> simplify it.<br>
>><br>
>> Explanation:<br>
>><br>
>> The constructor Z says that 'f and 'r, are the same (Z : ('r, 'r) t).<br>
>> The constructor S says that the (_, _) t that it takes has some type 'f,<br>
>> but that the second type variable must be of the form 'a -> 'r, sort of<br>
>> deconstructing pattern match on the type (if that makes sense). And if<br>
>> that (_, _) t is a Z, where we have stated the two type variables have<br>
>> the same value, that means 'f must also be ('a -> 'r). So for the<br>
>> first application of (//) it makes sense. If we apply (//) again, the<br>
>> first parameter has the type (int32 -> '_weak1, '_weak1) t, but that<br>
>> must actually mean '_weak1 in this case is string -> '_weak, and then<br>
>> this is where things become jumbled in my head. If the passed in (_, _)<br>
>> t must be (string -> '_weak), and the inner type actually must be (int32<br>
>> -> '_weak), then, the inner inner type must be ('_weak), the type of 'r<br>
>> at Z must be (string -> int32 -> '_weak), and since 'r, and 'f are the<br>
>> same type for Z, 'f must also be (string -> int32 -> '_weak), and that<br>
>> knowledge propagates back up? But that doesn't feel quite right to me,<br>
>> either.<br>
>><br>
>> With the help of reading other code, I've figured out how to make a<br>
>> function that uses a type like this that works like kprintf, however<br>
>> where I'm going in this case is that I want to take a function that<br>
>> matches 'f and apply it to the result of my functions.<br>
>><br>
>> Something like:<br>
>><br>
>> let rec bind : type f r. f -> (f, r) t -> r = fun f -> function<br>
>> | Z -><br>
>> f<br>
>> | S (t, v) -><br>
>> bind (f (v ())) t<br>
>><br>
>> However, this does not compile given:<br>
>><br>
>> Error: This expression has type f<br>
>> This is not a function; it cannot be applied.<br>
>><br>
>> My understanding was if I have Z, and an input that has the type f, then<br>
>> that is just the return type since at Z, f and r are the same type. And<br>
>> than at S, I can peel away the type of f by applying the function f and<br>
>> then recursively calling. But my understanding is missing something.<br>
>><br>
>> Does this make sense?<br>
>><br>
>> What am I not understanding?<br>
>><br>
>> Thank you,<br>
>> /Malcolm<br>
>><br>
<br>
</blockquote></div>
```

next prev parent reply indexThread overview:6+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-09-20 3:42 Malcolm Matalka 2019-09-20 6:35 ` Gabriel Scherer 2019-09-20 8:11 ` Malcolm Matalka2019-09-20 8:35 ` Gabriel Scherer [this message]2019-09-20 8:56 ` Gabriel Scherer 2019-09-20 14:25 ` Oleg

Reply instructions:You may reply publically to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the--to,--cc, and--in-reply-toswitches of git-send-email(1): git send-email \ --in-reply-to=CAPFanBFAr5LOEXTdRJtrX6FUQ2_LVjbVgZqiZHLMyE6KkUPL8A@mail.gmail.com \ --to=gabriel.scherer@gmail.com \ --cc=caml-list@inria.fr \ --cc=mmatalka@gmail.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting theIn-Reply-Toheader via mailto: links, try the mailto: link

caml-list - the Caml user's mailing list Archives are clonable: git clone --mirror http://inbox.vuxu.org/caml-list git clone --mirror https://inbox.ocaml.org/caml-list Example config snippet for mirrors Newsgroup available over NNTP: nntp://inbox.vuxu.org/vuxu.archive.caml-list AGPL code for this site: git clone https://public-inbox.org/public-inbox.git