From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by c5ff346549e7 (Postfix) with ESMTP id 59D555D4 for ; Tue, 10 Sep 2019 19:01:40 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.64,490,1559512800"; d="scan'208,217";a="401134365" Received: from sympa.inria.fr ([193.51.193.213]) by mail2-relais-roc.national.inria.fr with ESMTP; 10 Sep 2019 21:01:37 +0200 Received: by sympa.inria.fr (Postfix, from userid 20132) id 640CA7EF5F; Tue, 10 Sep 2019 21:01:37 +0200 (CEST) Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id C0EC57EF4C for ; Tue, 10 Sep 2019 21:01:26 +0200 (CEST) Authentication-Results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=ivg@ieee.org; spf=Pass smtp.mailfrom=ivg@ieee.org; spf=None smtp.helo=postmaster@mail-lj1-f193.google.com IronPort-PHdr: =?us-ascii?q?9a23=3AQ4Rd8BBkd6wHtDNwjQt2UyQJP3N1i/DPJgcQr6Af?= =?us-ascii?q?oPdwSPT5o8bcNUDSrc9gkEXOFd2Cra4d0KyO6+u5AT1Ioc7Y9ixbKtoUD15NoP?= =?us-ascii?q?5VtjRoONSCB0z/IayiRA0BN+MGamVY+WqmO1NeAsf0ag6aiHSz6TkPBke3blIt?= =?us-ascii?q?daz6FYHIksu4yf259YHNbAVUnjq9Zq55IAmroQnLucQanIVvJrwtxhfVrXdEZv?= =?us-ascii?q?hayGd1Ll6Xgxrw+9288ZF+/yhOof4t69JMXaDndKkkULJUCygrPXoo78PxrxnD?= =?us-ascii?q?SgWP5noYUmoIlxdDHhbI4hLnUJrvqyX2ruVy1jWUMs3wVrA0RC+t77x3Rx/yiS?= =?us-ascii?q?cILCA2/WfKgcFtlq1boRahpxtiw47IZYyeKfRzcr/Bcd4cWGFMWNtaWS5cDYOm?= =?us-ascii?q?d4YAAOQBMuRYoYfzpFUAsAWwChW3Cez11jNFnGX70bEm3+kjFwzNwQwuH8gJsH?= =?us-ascii?q?TRtNj7N7kSXvqzzKLVzDvDaO9W2TDj6IfUchAhoO2MXaltesfWyEkvCQzFg06R?= =?us-ascii?q?qYP7ITyayP4Bs2+B7+pvTO+ijXMspQ92ojiq3Mgsi4/Ji5oaylDF6SV5wJs1Ks?= =?us-ascii?q?aiREFnZt6kFZ1dvDyZOYtuWs4uXX1ktSIgxrAFuZO3ZjYGxIgkyhLFdvCKd4aF?= =?us-ascii?q?7xT+X+iLOzh4nmhqeLenihay70egzur8W9Gx0FlQrypFlsDAtnQP1xDO88SHRO?= =?us-ascii?q?Zx80Ov1DqV2ADT7eZEIU8wlaXFMZIu3rkwlp8LvUTCGC/5hln2gbeIekk4/uWk?= =?us-ascii?q?8efqb7X8qpOCK4N5iRvyPrkql8GxGeg4NxIBX2mf+eSyzr3j+kj5Ta1Ljv0ona?= =?us-ascii?q?nUq5HaKtoFqaGnGQNV1Zwj6xmnAze8zNsYhWUHLE5CeB+fk4fmIVTOIPThAfe7?= =?us-ascii?q?glSsiytryuvdPrzhB5XNNmLMnK3gfbZ78U5cyRA8wcpR55JOWfk9J6fUXkL+/I?= =?us-ascii?q?jfChI2Gwu3xuflTtJn2dVaEUORC6nRH6TOtkGD5uMzOKHYZZEakDfwJvVj4OTh?= =?us-ascii?q?2ztxklYYeeyt3IALICSzF/FiZkGYembEg9EbEG5MsBBoH8Lwj1jXcCBaYT6dWL?= =?us-ascii?q?47+Do7CZ69RdPCWI+FgbGM0WG8BJIANTMOMUyFDXq9L9bMYPwLci/HepY9wAxB?= =?us-ascii?q?bqCoTsoa7T/rrBXzkuI1L+fZ92sfr52xjIEotd2Wrgk78HlPN+rY02yJSDsqzG?= =?us-ascii?q?YBRjtz2KIm5EIkmwzF3q9/jPhVU9dU4qERC1ZoBdvn1+V/TuvKdEfEd9aNRkyh?= =?us-ascii?q?R4z+UzA8Q993xMUBMR9w?= X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: =?us-ascii?q?A0DYAwD08XddZ8HQVdFlHgEGBwaBZ4FpL?= =?us-ascii?q?21SMyqEIXAtjXaCD4lzkSYJAQMBDCcIAQGEPwKCSRsHAQQ0EwIMAQEEAQEBAgE?= =?us-ascii?q?DAwQBEwsLCwYphTcMgjoigm8BAQECARIRBBkBATAIBAsJAgs3AgIiEgEFARwGA?= =?us-ascii?q?RIbB4MBgXYFDw+MV48MgQM8ijFzfzOCfQEBBYEyAYRBgUAJEoEii3gYgUA/hCM?= =?us-ascii?q?+gT+BCRkBAQKBKyBBgl6CWIkmgyiIN5ZYboIrhwGFDYQVhFQbgjSWVo1/gTiFX?= =?us-ascii?q?m6RAw8jgUZlgRR9CDsxBoI1CYI5g3KCSoJKhVsnMAqNS4JTAQE?= X-IPAS-Result: =?us-ascii?q?A0DYAwD08XddZ8HQVdFlHgEGBwaBZ4FpL21SMyqEIXAtjXa?= =?us-ascii?q?CD4lzkSYJAQMBDCcIAQGEPwKCSRsHAQQ0EwIMAQEEAQEBAgEDAwQBEwsLCwYph?= =?us-ascii?q?TcMgjoigm8BAQECARIRBBkBATAIBAsJAgs3AgIiEgEFARwGARIbB4MBgXYFDw+?= =?us-ascii?q?MV48MgQM8ijFzfzOCfQEBBYEyAYRBgUAJEoEii3gYgUA/hCM+gT+BCRkBAQKBK?= =?us-ascii?q?yBBgl6CWIkmgyiIN5ZYboIrhwGFDYQVhFQbgjSWVo1/gTiFXm6RAw8jgUZlgRR?= =?us-ascii?q?9CDsxBoI1CYI5g3KCSoJKhVsnMAqNS4JTAQE?= X-IronPort-AV: E=Sophos;i="5.64,490,1559512800"; d="scan'208,217";a="318950398" X-MGA-submission: =?us-ascii?q?MDH+tPf/7YkwN2SqrHeePFnStFxIqjng7t6CaT?= =?us-ascii?q?Pn3WEJ5vtxPh1MDrcx0Ugr3QvENe9kPbfieH0xXwtjO0VjfsI3Sp5RUi?= =?us-ascii?q?9FaxaMYg3A5herYUvPndXKFDsQEUz3yAm6ut3XZHWFYxWsJExlSbRQ4k?= =?us-ascii?q?XRjzjukdPMmCbwNYSgNsQtcg=3D=3D?= Received: from mail-lj1-f193.google.com ([209.85.208.193]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/AES128-GCM-SHA256; 10 Sep 2019 21:01:22 +0200 Received: by mail-lj1-f193.google.com with SMTP id l20so17572469ljj.3 for ; Tue, 10 Sep 2019 12:01:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ieee.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=LISjQj4Z29GaAI+R0P38f7+kz8kawu/bJtB5KoAuYS4=; b=er5PP7WutDLecRvbmrnOY8cBMtPhknF9EYZy1/smI8X+AFtwlB29pr3sNgZALhQpa2 A5UgtkbXn3JK49MMtVI+nu5KZe6mC5uF//31UG+zjOea90yYgjLAQWvHPBnv3h8gLYa8 djinyKVd8XVIVTv0kd/F9xZTpWh2GFtoLmFs8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=LISjQj4Z29GaAI+R0P38f7+kz8kawu/bJtB5KoAuYS4=; b=ocW2rcq2qLvP1dCfbSNM+ZO1+US4MzOtkKkxvyOkvXME2Vp8Uj8zti8voVtcrcBYym Uw60zsHTJF19zpVfT9bNBw6ZyFTEcxLtq895yrjE6WWtu70ELHvNUxSapeoJ2kLxgz1t lKxrgmCcbJVFQCy9rO9J4dsu3rAlWPSZhEOnJNA+T1atj3Yc+yXc2qAHAWulAiwTWjOt Fnsy8Yyuom4dXfdWzKtmhQg/scN5dPMiOkwiWamE5a1UfgkxHA+m86VmiAmbhZGs1ISF HUWudv2JxL5eBpys0Xej+nXXE5LCQ1ZuQw3vxXHoEEt39x38eLwAlzGghD7whf+zdDDR /JIA== X-Gm-Message-State: APjAAAUI51yMyXHmwVjFRVZYx6Y14398iUbqWm8LWXVd3s9NjfP/A+kC tQdCBz3H5DEinKFPKoIMmKzqCg9v8SjSmLF+VcN+Jw== X-Google-Smtp-Source: APXvYqxF6KLhrZs+2gSRLyWLiqvdaagOV8jaY9DRxNmopELLHRb/78pJSPS1c3R0qYFbaS04bmz4G0Hrza5+gzFWJUI= X-Received: by 2002:a2e:850b:: with SMTP id j11mr184916lji.147.1568142081489; Tue, 10 Sep 2019 12:01:21 -0700 (PDT) MIME-Version: 1.0 References: <20190910144016.GA1725@Melchior.localnet> In-Reply-To: <20190910144016.GA1725@Melchior.localnet> From: Ivan Gotovchits Date: Tue, 10 Sep 2019 15:03:15 -0400 Message-ID: To: Oleg , Ivan Gotovchits , caml-list Content-Type: multipart/alternative; boundary="000000000000b99ed505923787b8" Subject: Re: [Caml-list] Type-indexed heterogeneous collections (Was: Implicits for the masses) Reply-To: Ivan Gotovchits X-Loop: caml-list@inria.fr X-Sequence: 17799 Errors-to: caml-list-owner@inria.fr Precedence: list Precedence: bulk Sender: caml-list-request@inria.fr X-no-archive: yes List-Id: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --000000000000b99ed505923787b8 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Sep 10, 2019 at 10:38 AM Oleg wrote: > > > https://github.com/BinaryAnalysisPlatform/bap - the BAP project per se. > > ... > That is quite a project! And interesting, too. Thank you for letting > me know. Now I understand what you mean by ``universe of types > actually grew quite large, that large that the linear search in the > registry is not an option for us anymore.'' > > What your message has made it very clear, to me, is that we really > need the standard type representation library -- hopefully being part > of Stdlib and eventually integrated with the compiler. For one, it is > high time we standardized the spelling of the ('a,'b) eq type and its > constructor. Mainly, as your message has well demonstrated, an > efficient trep is actually non-trivial, and takes lots of experience > to design. Incidentally, the situation on Haskell is very similar: > Typeable is now well integrated with the compiler. It is quite a > popular library. > Definitely! So far the closest to what could be called a standard representation is the Base.Type_equal [1] module (and its extensions [2]). But unfortunately, it comes with strings attached and that hampers its adoption in other projects. Basically, we need at least to put the data constructor in a separate library (as it was done for the `result` type previously). Going beyond just defining the data type would require some consensus. A good example would be deciding how to implement the total order relation on type indices. The Base Library is using `Obj.extension_constructor` which has a few problems: 1) It is in the Obj module with all the ramifications 2) The implementation is inefficient and overly cautious (so cautious that it doesn't work with the Ancient module, for example) 3) The corresponding index is too global, not only shared by all extensible variants and exceptions but every new object increments the identifier (i.e., it is the same global counter which is used by `Oo.id`, which is probably an error which should be fixed in the compiler), therefore there is a chance to overflow it. That's why we decided to implement our own indexing, for example. [1]: https://github.com/janestreet/base/blob/master/src/type_equal.mli [2]: https://github.com/janestreet/typerep > > I'd also like to draw attention to truly type-indexed heterogeneous > collections where trep is a part of a key rather than a part of a > value. I like to treat such structures as a reification of OCaml records so that trep is now a first-class field and a record is an n-tuple with named fields. Since the fields are named, the n-tuple can grow and shrink, i.e., the size is not fixed. We can implement the n-tuple using heterogeneous maps, hashtables, or just an assoc list. So far we had a few libraries which provided heterogeneous maps indexed by the type index, the `Univ_map` [3] from Janestreet and Hmap [4] from Daniel B=C3=BCnzli, to name a few. Both libraries follow the same approach, namely parametrizing the existing non-heterogeneous Map functor with the order function induced by the age of a key, with some minor differences, e.g., Base's version is using the int type as the key (where the key denotes the extension constructor identifier), while Daniel is using the trep itself as the key, and extracting the order from the age of the key. However, though semantically data is indeed indexed by the trpep, syntactically it is represented with an extra layer of indirection, e.g., instead of having a single binding type 'key binding =3D Binding : 'a trep * 'a -> 'key binding we have a non-heterogeneous binding type ('a,'b) binding =3D Binding of 'a * 'b which packs a universal something like this as `Binding (trep,Pack (trep,x))`. This creates a lot of overhead memory-wise and (given that OCaml performance is usually conditioned by the allocation rate) performance-wise. OCaml AVL-trees are already having a lot of overhead, about 5 extra words per bindings (Base has a little bit less, but still a lot - about 3 words per binding). Therefore, (after lots of profiling) we actually ended up implementing heterogenous AVL trees, which doesn't suffer from this problem. We ended up with the following representation [5] ``` type record =3D | T0 | T1 : 'a key * 'a -> record | T2 : 'a key * 'a * 'b key * 'b -> record | T3 : 'a key * 'a * 'b key * 'b * 'c key * 'c -> record | T4 : 'a key * 'a * 'b key * 'b * 'c key * 'c * 'd key * 'd -> record | LL : record * 'a key * 'a * record -> record (* h(x) =3D h(y) - 1 *) | EQ : record * 'a key * 'a * record -> record (* h(x) =3D h(y) *) | LR : record * 'a key * 'a * record -> record (* h(x) =3D h(y) + 1 *) ``` Which stores the key and data directly on the branches of a tree without any unnecessary indirection. And also unfolds n-tuples with n < 5, an implementation detail which is not really important wrt to the current discussion, (but was a dealbreaker for us as it significantly reduced the memory consumption and the total number of rebalances, so that at the end of the day we got a factor of 5 improvements both in time and space). Another interesting feature is that if implemented as a functor, this implementation could be used to derive non-heterogeneous maps (e.g., when `'a key =3D int`). During the implementation, we had to find solutions to a few very interesting problems, like how to represent an existential folding function (or how to fold over heterogeneous map)[6]. Or more interesting, how to update keys and merge two maps [7] and do this efficiently (hint: you have to use CPS). [3]: https://github.com/janestreet/core_kernel/blob/master/src/univ_map.ml [4]: https://github.com/dbuenzli/hmap/blob/master/src/hmap.ml [5]: https://github.com/BinaryAnalysisPlatform/bap/blob/0352edfe42ba5560b178b6bf= 990cf15e81d78f57/lib/knowledge/bap_knowledge.ml#L766 [6]: https://github.com/BinaryAnalysisPlatform/bap/blob/0352edfe42ba5560b178b6bf= 990cf15e81d78f57/lib/knowledge/bap_knowledge.ml#L824 [7]: https://github.com/BinaryAnalysisPlatform/bap/blob/0352edfe42ba5560b178b6bf= 990cf15e81d78f57/lib/knowledge/bap_knowledge.ml#L1316 Treating heterogenous maps as a generalization of records brings us to another interesting development. We do not want to completely forfeit types of records and treat all records as a member of one universal record type. We would like to be able to define type schemes so that we can distinguish between a `student record` and a `course record` while keeping both universal and extensible. In BAP we implemented this using classes and slots (borrowing those ideas from frame languages of AI, any resemblance to OO is purely coincidental). We turn our untyped `record` into a value parameterized by its class, e.g., ``` type 'a value =3D { cls : 'a cls; data : record; } ``` The `'a cls` type could be phantom, e.g., `type 'a cls =3D 'a`, but we can benefit from storing some type information to implement runtime reflection. And an extra layer of indirection on top of treps enables the structural definition of a record type that can span across multiple modules: ``` type ('a,'p) slot =3D { cls : 'a cls; key : 'p key; } ``` so that now a class is defined by its slots, e.g., ``` val name : (student, string) slot val age : (student, age) slot ... val teacher : (course, teacher) slot ``` and an abstract interface of the Value will prevent us from confusing different fields: ``` module Value : sig type 'a t val get : ('a,'p) slot -> 'a t -> 'p val put : ('a,'p) slot -> 'a t -> p -> 'a t end ``` The real implementation in BAP [8], is a little bit more convoluted since we want to create hierarchies of classes (which we need for type-safe representation of their denotational semantics) and make classes indexable by runtime values. Therefore our class is actually a collection of sorts. And we also have triggers (i.e., computable fields), as well as we use domains (aka Scott-Ershov domains) to represent the properties of classes, which gives us a nice feature - our `Value.get` function is total as for each type of property we have the bottom value as the default. Finally, we can store our values in a database and hide this storage under a monad interface, so that we can implement different access schemes (and put a fixed point computation inside), but this is a completely different story. The takeaway is that OCaml as a language is now fully equipped for implementing efficient heterogeneous data structures and can be used in domains which previously were previously dominated by languages with dynamic typing. Besides, the Knowledge library is totally independent on BAP despite the fact that it is currently distributed from the BAP repository. We've tried not to leak any domain-specific details into the knowledge abstraction albeit keeping it versatile enough to be able to represent the knowledge representation of the program semantics. [8]: https://github.com/BinaryAnalysisPlatform/bap/blob/master/lib/knowledge/bap= _knowledge.mli#L153 Cheers, Ivan Gotovchits --000000000000b99ed505923787b8 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Tue, Sep 10, 2019 at 10:38 AM Oleg= <oleg@okmij.org= > wrote:

> https://github.com/BinaryAnalysisPlatform/bap = - the BAP project per se.
> ...
That is quite a project! And interesting, too. Thank you for letting
me know. Now I understand what you mean by ``universe of types
actually grew quite large, that large that the linear search in the
registry is not an option for us anymore.''

What your message has made it very clear, to me, is that we really
need the standard type representation library -- hopefully being part
of Stdlib and eventually integrated with the compiler. For one, it is
high time we standardized the spelling of the ('a,'b) eq type and i= ts
constructor. Mainly, as your message has well demonstrated, an
efficient trep is actually non-trivial, and takes lots of experience
to design. Incidentally, the situation on Haskell is very similar:
Typeable is now well integrated with the compiler. It is quite a
popular library.

Definitely! So far the= closest to what could be called a standard representation is the Base.Type= _equal [1] module (and its extensions [2]). But unfortunately, it comes wit= h strings attached and that hampers its adoption in other projects.=C2=A0
Basically, we need at least to put the data constructor in a separ= ate library (as it was done for the `result` type previously). Going beyond= just defining the data type would require some consensus. A good example
would be deciding how to implement the total order relation on typ= e indices. The Base Library is using `Obj.extension_constructor` which has = a few problems:
1) It is in the Obj module with all the ramificat= ions
2) The implementation is inefficient and overly cautious (so= cautious that it doesn't work with the Ancient module, for example)
3) The corresponding index is too global, not only shared by all ex= tensible variants and exceptions but every new object increments the identi= fier (i.e., it is the same global counter which is used by `Oo.id`,=C2=A0
which is probably an error which should be fixed in the compiler),= therefore there is a chance to overflow it.=C2=A0

That's why we decided to implement our own indexing, for example.=C2= =A0

=C2=A0

I'd also like to draw attention to truly type-indexed heterogeneous
collections where trep is a part of a key rather than a part of a
value.

I like to treat such structures as = a reification of OCaml records so that trep is now a first-class field and = a record is an n-tuple with named fields. Since the fields are named, the n= -tuple can grow and shrink, i.e., the size is not fixed. We can implement t= he n-tuple using heterogeneous maps, hashtables, or just an assoc list. So = far we had a few libraries which provided heterogeneous maps indexed by the= type index, the `Univ_map` [3] from Janestreet and Hmap [4] from Daniel B= =C3=BCnzli, to name a few.

Both libraries follow t= he same approach, namely parametrizing the existing non-heterogeneous Map f= unctor with the order function induced by the age of a key, with some minor= differences, e.g., Base's version is using the int type as the key (wh= ere the key denotes the extension constructor identifier), while Daniel is = using the trep itself as the key, and extracting the order from the age of = the key.
However, though semantically data is indeed indexed by t= he trpep, syntactically it is represented with an extra layer of indirectio= n, e.g., instead of having a single binding=C2=A0

= =C2=A0 =C2=A0 =C2=A0type 'key binding =3D Binding : 'a trep * '= a -> 'key binding

we have a non-heterog= eneous binding

=C2=A0 =C2=A0 type ('a,'b) = binding =3D Binding of 'a * 'b
=C2=A0 =C2=A0=C2=A0
<= div>which packs a universal something like this as `Binding (trep,Pack (tre= p,x))`.

This creates a lot of overhead memory-wise= and (given that OCaml performance is usually conditioned by the allocation= rate) performance-wise. OCaml AVL-trees are already having a lot of overhe= ad, about 5 extra words per bindings (Base has a little bit less, but still= a lot - about 3 words per binding). Therefore, (after lots of profiling) w= e actually ended up implementing heterogenous AVL trees, which doesn't = suffer from this problem. We ended up with the following representation [5]=
```
=C2=A0 type record =3D
=C2=A0 =C2=A0= | T0
=C2=A0 =C2=A0 | T1 : 'a key * 'a -> record
=C2=A0 = =C2=A0 | T2 : 'a key * 'a *
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0'b key * 'b -> record
=C2=A0 =C2=A0 | T3 : 'a key *= 'a *
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0'b key * 'b *=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0'c key * 'c -> reco= rd
=C2=A0 =C2=A0 | T4 : 'a key * 'a *
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0'b key * 'b *
=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0'c key * 'c *
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0'd key * 'd -> record
=C2=A0 =C2=A0 | LL : record * = 9;a key * 'a * record -> record (* h(x) =3D h(y) - 1 *)
=C2=A0 = =C2=A0 | EQ : record * 'a key * 'a * record -> record (* h(x) = =3D h(y) *)
=C2=A0 =C2=A0 | LR : record * 'a key * 'a * record -= > record (* h(x) =3D h(y) + 1 *)
```

<= div>Which stores the key and data directly on the branches of a tree withou= t any unnecessary indirection. And also unfolds n-tuples with n < 5, an = implementation detail which is not really important wrt to the current disc= ussion, (but was a dealbreaker for us as it significantly reduced the memor= y consumption and the total number of rebalances, so that at the end of the= day we got a factor of 5 improvements both in time and space).
<= br>
=C2=A0Another interesting feature is that if implemented as a= functor, this implementation could be used to derive non-heterogeneous map= s (e.g., when `'a key =3D int`).

During the im= plementation, we had to find solutions to a few very interesting problems, = like how to represent an existential folding function (or how to fold over = heterogeneous map)[6]. Or more interesting, how to update keys and merge tw= o maps [7] and do this efficiently (hint: you have to use CPS).
<= br>

[4]:=C2=A0https://github.com/dbuenzli/hmap/blob/master/src/hm= ap.ml

Trea= ting heterogenous maps as a generalization of records brings us to another = interesting development. We do not want to completely forfeit types of reco= rds and treat all records as a member of one universal record type. We woul= d like to be able to define type schemes so that we can distinguish between= a `student record` and a `course record` while keeping both universal and = extensible. In BAP we implemented this using classes and slots (borrowing t= hose ideas from frame languages of AI, any resemblance to OO is purely coin= cidental). We turn our untyped `record` into a value parameterized by its c= lass, e.g.,

```
type 'a value =3D {<= /div>
=C2=A0 =C2=A0 =C2=A0cls : 'a cls;
=C2=A0 =C2= =A0 =C2=A0data : record;
}
```

The `'a cls` type could be phantom, e.g., `type 'a cls =3D 'a`= , but we can benefit from storing some type information to implement runtim= e reflection. And an extra layer of indirection on top of treps enables the= structural definition of a record type that can span across multiple modul= es:
```
type ('a,'p) slot =3D {
=C2= =A0 =C2=A0cls : 'a cls;
=C2=A0 =C2=A0key : 'p key;
<= div>}
```

so that now a class is defined= by its slots, e.g.,

```
val name : (stu= dent, string) slot
val age : (student, age) slot
...
val teacher : (course, teacher) slot
```

and an abstract interface of the Value will prevent us from confus= ing different fields:
```
module Value : sig=C2=A0
=C2=A0 type 'a t
=C2=A0 val get : ('a,'p) slot = -> 'a t -> 'p
=C2=A0 val put : ('a,'p) slot= -> 'a t -> p -> 'a t
end
```

The real implementation in BAP [8], is a little bit more c= onvoluted since we want to create hierarchies of classes (which we need for= type-safe representation of their denotational semantics) and make classes= indexable by runtime values. Therefore our class is actually a collection = of sorts. And we also have triggers (i.e., computable fields), as well as w= e use domains (aka Scott-Ershov domains) to represent the properties of cla= sses, which gives us a nice feature - our `Value.get` function is total as = for each type of property we have the bottom value as the default.=C2=A0

Finally, we can store our values in a database and h= ide this storage under a monad interface, so that we can implement differen= t access schemes (and put a fixed point computation inside), but this is a = completely different story. The takeaway is that OCaml as a language is now= fully equipped for implementing efficient heterogeneous data structures an= d can be used in domains which previously were previously dominated by lang= uages with dynamic typing.=C2=A0

Besides, the Know= ledge library is totally independent on BAP despite the fact that it is cur= rently distributed from the BAP repository. We've tried not to leak any= domain-specific details into the knowledge abstraction albeit keeping it v= ersatile enough to be able to represent the knowledge representation of the= program semantics.=C2=A0


Cheers,<= /div>
Ivan Gotovchits

--000000000000b99ed505923787b8--