Very interesting and thoughtprovoking writeup, thank you!
Incidentally, we're investigating the same venues, in our CMU BAP project,
as we found out that we need the extensibility in the style of type
classes/canonical structures to decouple complex dependencies which arise
in the program analysis domain.
In fact, we build our new BAP 2.0 framework largely on your
[taglessfinal][1] style which, let's admit it, works much better with type
classes. Therefore we ended up implementing extensible type representations
along with registries for our type classes. Unfortunately, the idea of
storing rules in the registry didn't visit us, but we're now thinking about
how to incorporate it (the classes that we have are very nontrivial,
usually having hundreds of methods, so we're currently using functors to
manually derive on class from another, and registering the resulting
structures  but using your approach we can register functors as well and
automate the derivation). We also didn't generalize the type class
instantiation, so our solutions do have some boilerplate (but I have to
admit, that the total number of type classes that we need is not very big,
so it really never bothered us). What could be surprising is that the
universe of types actually grew quite large, that large that the linear
search in the registry is not an option for us anymore. In fact, we have so
many comparisons between treps, that instead of extracting the extension
constructor number from an extensible variant we had to rely on our own
freshly generated identifier. But I'm running in front of myself, an
important lesson that we have learned is that treps should not only be
equality comparable but also ordered (and even hashable) so that we can
implement our registries as hash tables. It is also better to keep them
abstract so that we can later extend them without breaking user code (to
implement introspection as well as different resolution schemes). This is
basically an elaboration of your approach (which is also could be commonly
found in Janestreet's Core (Type_equal.Uid.t) and other implementations of
existentials). In our case, we ended up with the following implementation
```
type 'a witness = ..
module type Witness = sig
type t
type _ witness += Id : t witness
end
type 'a typeid = (module Witness with type t = 'a)
type 'a key = {
ord : int;
key : 'a typeid;
name : string; (* for introspection *)
show : 'a > Sexp.t; (* also for introspection *)
}
```
Now, we can use the `ord` field to order types, compare them, store in
maps, hash tables, and even arrays. E.g., this is how our `teq` function
looks like,
```
let same (type a b) x y : (a,b) Type_equal.t =
if x.id = y.id then
let module X = (val x.key : Witness with type t = a) in
let module Y = (val y.key : Witness with type t = b) in
match X.Id with
 Y.Id > Type_equal.T
 _ > failwith "broken type equality"
else failwith "types are not equal"
```
It is often used in the context where we already know that `x.id = y.id`,
e.g., when we already found an entry, so we just need to obtain the
equality witness (we use Janestreet's Type_equal.T, which is the same as
yours eq type).
Concerning the representation of the registry, we also experimented with
different approaches (since we have a few ways to make a type existential
in OCaml), and found out the following to be the most efficient and easy to
work with,
```
type ordered = {
order : 'a. 'a key > 'a > 'a > int;
} [@@unboxed]
```
Notice, that thanks to `[@@unboxed]` we got a free unpacked existential. We
will next store `ordered` in our registry, which is a hash table,
```
let ordered : ordered Hashtbl.M(Int).t = Hashtbl.create (module Int)
```
and register it as simple as,
```
let register: type p. p Key.t > (p > p > int) > unit = fun key order
>
Hashtbl.add_exn vtables ~key:(uid key) ~data:{
order = fun (type a) (k : a key) (x : a) (y : a) >
let T = same k key in (* obtain the witness that we found the right
structure *)
order x y
}
```
Instead of a hashtable, it is also possible to use `ordered array ref`
(since our `ord` is just an integer which we increment every time a new
class is declared). This will give us even faster lookup.
I hope that this was interesting. And if yes, I'm ready to elaborate more
on our design decision or to hear suggestions and critics. Here are a few
links:
 https://github.com/BinaryAnalysisPlatform/bap  the BAP project per se.
 https://binaryanalysisplatform.github.io/knowledgeintro1  a small
introductionary post about BAP 2.0 Knowledge representation

https://github.com/BinaryAnalysisPlatform/bap/blob/master/lib/knowledge/bap_knowledge.ml
 the implementation of the knowledge system

https://github.com/BinaryAnalysisPlatform/bap/tree/master/lib/bap_core_theory
 The Core Theory, an exemplar type class of the theory that we're
developing :)
Cheers,
Ivan Gotovchits
Research Scientist, CMU Cylab
[1]: http://okmij.org/ftp/taglessfinal/index.html
On Wed, Sep 4, 2019 at 11:23 AM Oleg wrote:
>
> This is to announce a simple, plain OCaml library to experiment with
> typeclass/implicits resolution, which can be thought of as evaluating
> a Prologlike program. One may allow `overlapping instances'  or
> prohibit them, insisting on uniqueness. One may make the search fully
> deterministic, fully nondeterministic, or something inbetween.
> There is an immediate, albeit modest practical benefit: the facility
> like "#install_printer", which was restricted to toplevel, is now
> available for all  as a small, selfcontained, plain OCaml library
> with no knowledge or dependencies on the compiler internals. We show
> an example at the end of the message.
>
> This message has been inspired by the remarkable paper
> Canonical Structures for the working Coq user
> Assia Mahboubi, Enrico Tassi
> DOI: 10.1007/9783642396342_5
> Its introduction is particularly insightful: the power of
> (mathematical) notation is in eliding distracting details. Yet to
> formally check a proof, or to run a program, the omitted has to be
> found. When pressed to fill in details, people `skillful in the art'
> look in the database of the `state of the art', with the context as
> the key. Computers can be programmed similarly; types well represent
> the needed context to guide the search.
>
> Mahboubi and Tassi's paper explains very well how this eliding and
> fillingin is realized, as a programmable unification, and used in
> Coq. Yet their insights go beyond Coq and deserve to be known better.
> This message and the accompanying code is to explain them in
> plain OCaml and encourage experimentation. It could have been titled
> `Canonical Structures for the working OCaml (meta) programmer'.
>
> The rudiment of canonical structures is already present in OCaml, in
> the form of the registry of printers for userdefined types. This
> facility is available only at the toplevel, however, and deeply
> intertwined with it. As a modest practical benefit, this facility is
> now available for all programs, as a plain, small, selfcontained
> library, with no compiler or other magic. The full potential of the
> method is realized however in (multi) staged programming. In fact, I'm
> planning to use it in the upcoming version of MetaOCaml to implement
> `lifting' from a value to the code that produces it  letting the
> users register lifting functions for their own data types.
>
>
> http://okmij.org/ftp/ML/canonical.ml
> The implementation and the examples, some of which are noted below.
> http://okmij.org/ftp/ML/trep.ml
> A generic library of type representations: something like
> Typeable in Haskell. Some day it may be builtin into the compiler
> http://okmij.org/ftp/ML/canonical_leadup.ml
> A wellcommented code that records the progressive development of
> canonical.ml. It is not part of the library, but may serve as
> its explanation.
>
> Here are a few examples, starting with the most trivial one
> module Show = MakeResolve(struct type 'a dict = 'a > string end)
> let () = Show.register Int string_of_int (* Define `instances' *)
> let () = Show.register Bool string_of_bool
> Show.find Int 1;;
>
> However contrived and flawed, it is instructive. Here (Int : int trep)
> is the value representing the type int. The type checker can certainly
> figure out that 1 is of the type int, and could potentially save us
> trouble writing Int explicitly. What the type checker cannot do by
> itself is to find out which function to use to convert an int to a
> string. After all, there are many of them. Show.register lets us
> register the *canonical* int>string function. Show.find is to search
> the database of such canonical functions: in effect, finding *the*
> evidence that the type int>string is populated. Keeping CurryHoward
> in mind, Show.find does a _proof search_.
>
> The type of Show.find is 'a trep > ('a > string). Compare with
> Haskell's show : Show a => a > String (or, desuraging => and Show)
> show : ('a > string) > ('a > string). Haskell's show indeed does
> not actually do anything: it is the identity function. All the hard
> work  finding out the right dictionary (the string producing
> function)  is done by the compiler. If one does not like the way the
> compiler goes about it  tough luck. There is little one may do save
> complaining on reddit. In contrast, the first argument of Show.find is
> trivial: it is a mere reflection of the type int, with no further
> information. Hence Show.find has to do a nontrivial work. In the
> case of int, this work is the straightforward database search 
> or, if you prefer, running the query ? dict(int,R) against a logic
> program
> dict(int,string_of_int).
> dict(bool,string_of_bool).
> The program becomes more interesting when it comes to pairs:
> dict(T,R) : T = pair(X,Y), !,
> dict(X,DX), dict(Y,DY), R=make_pair_dict(DX,DY).
> Here is how it is expressed in OCaml:
> let () =
> let open Show in
> let pat : type b. b trep > b rule_body option = function
>  Pair (x,y) >
> Some (Arg (x, fun dx > Arg (y, fun dy >
> Fact (fun (x,y) > "(" ^ dx x ^ "," ^ dy y ^ ")"))))
>  _ > None
> in register_rule {pat}
>
> let () = Show.register (Pair(Bool,Bool))
> (fun (x,y) > string_of_bool x ^ string_of_bool y)
>
> Our library permits `overlapping instances'. We hence registered the
> printer for generic pairs, and a particular printer just for pairs of
> booleans.
>
> The library is extensible with userdefined data types, for example:
> type my_fancy_datatype = Fancy of int * string * (int > string)
>
> After registering the type with trep library, and the printer
> type _ trep += MyFancy : my_fancy_datatype trep
> let () = Show.register MyFancy (function Fancy(x,y,_) >
> string_of_int x ^ "/" ^ y ^ "/" ^ "")
>
> one can print rather complex data with fancy, with no further ado:
> Show.find (List(List(Pair(MyFancy,Int)))) [[(Fancy ...,5)];[]]
>
> As Mahboubi and Tassi would say, proof synthesis at work!
>
> We should stress that what we have described is not a typeclass
> facility for OCaml. It is *meta* typeclass facility. Show.find has
> many drawbacks: we have to explicitly pass the trep argument like
> Int. The resolution happens at run time, and hence the failure of the
> resolution is a runtime exception. But the canonical instance
> resolution was intended to be a part of a type checker. There, the
> resolution failure is a type checking error. The trep argument,
> representing the type in the object program, is also at
> hand. Likewise, the drawbacks of Show.find disappear when we use the
> library in a metaprogram (code generator). The library then becomes a
> typeclass/implicits facility, for the generated code  the facility,
> we can easily (re)program.
>