caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Oleg <oleg@okmij.org>
To: ivg@ieee.org
Cc: caml-list@inria.fr
Subject: [Caml-list] Type-indexed heterogeneous collections (Was: Implicits for the masses)
Date: Tue, 10 Sep 2019 23:40:16 +0900	[thread overview]
Message-ID: <20190910144016.GA1725@Melchior.localnet> (raw)
In-Reply-To: <CALdWJ+wpwafYOddNYhTFY5Zz02k4GcWLBmZLGkekuJSMjrdd6Q@mail.gmail.com>


> https://github.com/BinaryAnalysisPlatform/bap - the BAP project per se.
> ...
That is quite a project! And interesting, too. Thank you for letting
me know. Now I understand what you mean by ``universe of types
actually grew quite large, that large that the linear search in the
registry is not an option for us anymore.''

What your message has made it very clear, to me, is that we really
need the standard type representation library -- hopefully being part
of Stdlib and eventually integrated with the compiler. For one, it is
high time we standardized the spelling of the ('a,'b) eq type and its
constructor. Mainly, as your message has well demonstrated, an
efficient trep is actually non-trivial, and takes lots of experience
to design. Incidentally, the situation on Haskell is very similar:
Typeable is now well integrated with the compiler. It is quite a
popular library.

I'd also like to draw attention to truly type-indexed heterogeneous
collections where trep is a part of a key rather than a part of a
value. To explain what I mean, let's recall the state of the art of
heterogeneous collections. Let ('key,'v) coll be an ordinary
homogeneous collection (a map, a hashtable, or an associative list)
with a look-up function
        lookup : 'key -> ('key,'v) coll -> 'v option
We can turn it into a heterogeneous collection that stores data of
various types by using dynamics dyn: an existential that contains the
data of some type t along with t trep, the representation of the type t.
You have showed a very efficient dyn (which I will keep in mind, thank
you). The dyn extraction
        from_dyn_opt : 'a trep -> dyn -> 'a option
checks if the trep given as an argument matches trep stored inside
dyn, and if so, returns dyn's value, of the requested type. Related
        from_dyn : 'a trep -> dyn -> 'a
throws an exception on mismatch. The
heterogeneous collection is then ('key, dyn) coll, with the
look-up operation that is a `composition' of lookup and from_dyn

        let lookup_het : 'key -> 'a trep -> ('key,dyn) coll -> 'a option =
          fun key trep coll ->
            match lookup key coll with
            | None -> None
            | Some dyn -> Some (from_dyn trep dyn)

I grant the elegance of the implementation and the reuse of the
existing collections. What bothers me however is that the 'a trep is
a part of the element value (part of dyn) rather a part of the
key. The real lookup is done solely on the key, and the
trep comparison is used as a verification, just to make sure we found
the right dyn. If the code is correct, the check does not
fail. Somehow I am always bothered by operations whose results isn't
actually used but still have to be done. Wouldn't it be nice if all
operations advanced us towards the goal, with no need for looking back
and double-checking.

Heterogeneous collections, at least, could be designed that way. Here
is the gist:
        type 'key binding = B : 'key * 'a trep * 'a -> 'key binding
(a more efficient existential could be designed, following the ideas
of your message.) The binding is what its name says: an association of
the key, the trep, and the value. The simplest heterogeneous
collection, the analogue of association lists, is a list of bindings.
        type 'key halist = 'key binding list

let lookup_hetkey : type a key. key -> a trep -> key halist -> a option =
  fun key trep coll ->
    let rec loop : key halist -> a option = function
      | [] -> None
      | B (key',trep',v) :: t -> match (key=key',teq trep' trep) with
        | (true,Some Refl) -> Some v
        | _                -> loop t
    in loop coll

Now both key and trep equally participate in search: the mismatch of
trep and trep' is not a failure but an indicator to continue search.
The clear drawback is that the shown code uses a linear search -- and
we cannot use off-the-shelf Maps or hashtables to speed it up. Still,
an adaptation of an extant Hashtable is rather straightforward. In
fact, the following file

        http://okmij.org/ftp/ML/hetcoll.ml

shows the implementation of type-indexed heterogeneous hash tables. It
is almost the same as Stdlib.Hashtable, but stripped to bare minimum
and slightly adjusted. (The biggest adjustment has nothing to do with
types; I simply didn't like how resizing was implemented). 

module type TypedHashedType =
  sig
    type t                              (* key type *)
    val equal: t -> t -> bool           (* key equality *)
    val hash:  t -> 'a trep -> int      (* Hash the key and trep *)
  end
module type S =
  sig
    type key
    type t
    val create: int -> t
    val clear : t -> unit
    val add: t -> key -> 'a trep -> 'a -> unit
    val find_opt: t -> key -> 'a trep -> 'a option
    val length: t -> int
  end

Although the interface is very minimal, it is enough for registering
your types and classes and my canonical functions. Stdlib.Map could
likewise be adjusted.


  reply	other threads:[~2019-09-10 14:38 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-04 15:25 [Caml-list] Implicits for the masses Oleg
2019-09-04 20:41 ` Ivan Gotovchits
2019-09-10 14:40   ` Oleg [this message]
2019-09-10 19:03     ` [Caml-list] Type-indexed heterogeneous collections (Was: Implicits for the masses) Ivan Gotovchits

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190910144016.GA1725@Melchior.localnet \
    --to=oleg@okmij.org \
    --cc=caml-list@inria.fr \
    --cc=ivg@ieee.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).