From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 9742481798 for ; Thu, 11 Jul 2013 16:17:37 +0200 (CEST) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of goswin-v-b@web.de) identity=pra; client-ip=212.227.15.14; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="goswin-v-b@web.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of goswin-v-b@web.de) identity=mailfrom; client-ip=212.227.15.14; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="goswin-v-b@web.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mout.web.de) identity=helo; client-ip=212.227.15.14; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="postmaster@mout.web.de"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Aq8BAJid3lHU4w8Om2dsb2JhbABawCeFMYEGFg4BAQEBAQYLCwkUKIIjAQEEAScTRAsLGAklDwUoiDABDAquFB+JCY9ogwlsA5QDg1aGEY5N X-IPAS-Result: Aq8BAJid3lHU4w8Om2dsb2JhbABawCeFMYEGFg4BAQEBAQYLCwkUKIIjAQEEAScTRAsLGAklDwUoiDABDAquFB+JCY9ogwlsA5QDg1aGEY5N X-IronPort-AV: E=Sophos;i="4.87,1043,1363129200"; d="scan'208";a="25644984" Received: from mout.web.de ([212.227.15.14]) by mail2-smtp-roc.national.inria.fr with ESMTP; 11 Jul 2013 16:17:37 +0200 Received: from frosties.localnet ([95.208.119.3]) by smtp.web.de (mrweb001) with ESMTPSA (Nemesis) id 0M5Oql-1TyPfz2L2V-00zaqp for ; Thu, 11 Jul 2013 16:17:36 +0200 Received: from mrvn by frosties.localnet with local (Exim 4.80) (envelope-from ) id 1UxHgd-0007Ac-Rf for caml-list@inria.fr; Thu, 11 Jul 2013 16:17:35 +0200 Date: Thu, 11 Jul 2013 16:17:35 +0200 From: Goswin von Brederlow To: caml-list@inria.fr Message-ID: <20130711141735.GC26521@frosties> References: <20130709204312.GA30194@frosties> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Provags-ID: V03:K0:8DEX8NZk30wBE/q7FxhVHD0s0sbNm1Hp5hulfRk/tLNXvsd/x4R N07pukSZQmPiG5so3fUQts9AhFZDB+1HpxPFOw3XfrPI40OB1uwh2aYgTbyZ6vreDq7BkG7 UpgfMsv2T81FD/Z2LuMWbTlR2DJfho98iIcO2sJ1ErNPNroCOovXkUXEmGVbJhGAP1EuaGw o1ZFOzluyX3yqUgl/iV+Q== Subject: Re: [Caml-list] GADTs and associative container On Wed, Jul 10, 2013 at 03:22:30AM +0100, Jeremy Yallop wrote: > It is indeed possible to create associative containers where the types > of the values depend on the types of the keys. Let's see what can to > be done to turn the standard hash table into such a container, using > GADTs for keys. > > We'll start with the interface. The standard hash table interface > (Hashtbl.S) looks like this, in part: > > module type S = > sig > type key > type 'a t > val create : int -> 'a t > val add : 'a t -> key -> 'a -> unit > val remove : 'a t -> key -> unit > val find : 'a t -> key -> 'a > val iter : (key -> 'a -> unit) -> 'a t -> unit > end Ok, so you are going with the FUNCTOR interface. I understand that allows for a higher order of type parameters. Right? I think that's where I got stuck with trying to use the simpler non-functor Hashtbl. > The type of tables ('a t) is parameterized by the type of values, > since each table holds a single type of value. We're aiming instead > to have value types depend on key types, so we'll move the type > parameter into the key type. Making this change mechanically > throughout the interface gives us the following: > > module type GS = > sig > type 'a key > type t > val create : int -> t > val add : t -> 'a key -> 'a -> unit > val remove : t -> 'a key -> unit > val find : t -> 'a key -> 'a > val iter : < f: 'a. 'a key -> 'a -> unit > -> t -> unit > end Can't GADT using functions be passed as arguments without stuffing them into a class object? > Actually, I've made one additional change, in the type of iter. In > the regular Hashtbl iter function we can get by with ML-style > polymorphism, where all the type variables are implicitly quantified > at the outermost point. This constrains the function passed to iter > to be monomorphic, which is fine, since regular Hashtbls only support > a single value type. In our revised interface, however, the function > argument must be polymorphic, since it needs to handle *any* suitable > pairing of keys and values. The object type allows quantifier > nesting, giving us the polymorphism we need. > > The type of iter is a hint of things to come: putting things *into* a > polymorphic hash table is a doddle, but there's a bit of a knack to > getting them *out* again intact, as we'll see further down. > > Next up is the definition of keys. The standard Hashtbl.Make functor > uses a definition of keys that bundles the key type together with > equality and hashing operations, like this: > > module type HashedType = > sig > type t > val equal : t -> t -> bool > val hash : t -> int > end > > Of course, it's no good having just any old definitions of equal and > hash. It's essential that equal l r implies hash l = hash r, for > example, and there are additional fairly obvious constraints on equal. > > Our analogue to HashedType, GHashedType, comes with some additional > operations (and so places additional demands on the creator of hash > tables). The first part of the signature looks essentially the same > as HashedType: we've added a parameter to the key type, but it's not > used as yet, so we can replace it with the don't-care underscore. > (Note that this means that our equality is heterogeneous, happy to > accept keys of disparate types.) The remainder of the signature deals > with packing up key-value pairs into existential boxes, and attempting > to get them out again; this will allow us to store different types of > key in a single table in our implementation. > > module type GHashedType = > sig > type _ key > val equal : _ key -> _ key -> bool > val hash : _ key -> int > > type pair = Pair : 'a key * 'a -> pair > val unpack : 'a key -> pair -> 'a > end > > As with HashedType there are requirements not captured in the types. > In particular, we'd like unpack k (Pair (k', v)) = v whenever equal k > k' is true. > > Time for the implementation. This is mostly straightforward: after > some preliminaries (mostly about hiding the type parameter in keys by > boxing them up appropriately), there are two functions (add and > remove) that put keys and values in boxes and store them in a > monomorphic table, and two functions (find and iter) that unbox keys > and values to recover the parameterization. > > module GHashtbl (G : GHashedType) : > GS with type 'a key = 'a G.key = > struct > include G > type k = Key : 'a key -> k > module H = Hashtbl.Make(struct > type t = k > let hash (Key k) = hash k > let equal (Key l) (Key r) = equal l r > end) > > type t = pair H.t > > let create n = H.create n > > let add tbl k v = H.add tbl (Key k) (Pair (k, v)) > > let remove tbl k = H.remove tbl (Key k) > > let find tbl key = unpack key (H.find tbl (Key key)) > > let iter (f : 'a -> unit>) tbl = > H.iter (fun _ (Pair (k, v)) -> f#f k v) tbl > end > > As is often the case, the unboxing is the interesting part. The find > function reveals why we introduced the unpack operation for keys (and > hence the pair type, which could otherwise have been hidden away in > the body of GHashtbl), and shows the secondary purpose of keys as > unboxers of values. The iter function makes use of the polymorphism > that we introduced in its signature earlier; when we unbox a pair we > have no idea how to instantiate the type variable in the contents, so > it's just as well we have a function to hand (f#f) that's polymorphic > enough to handle any possible instantiation. > > Time to try it out. Here's a sample implementation of GHashedType > that associates ints with int lists (which we'll use to store prime > factors) and strings with bools (which we'll use to indicate > capitalization). > > module KeyType (* : GHashedType *) = > struct > type _ key = > Int : int -> int list key > | String : string -> bool key > > let equal : type a b. a key -> b key -> bool = > fun l r -> match l, r with > | Int x, Int y -> x = y > | String x, String y -> x = y > | _ -> false > > let hash = Hashtbl.hash > > type pair = Pair : 'a key * 'a -> pair > > let rec unpack : type a. a key -> pair -> a = > fun k p -> match k, p with > | Int _, Pair (Int _, v) -> v > | String _, Pair (String _, v) -> v > | _ -> raise Not_found > end This boxing and unboxing feels wrong. It's wastefull because you create dummy objects for the Pair. The key part is a repetition of what is already the key in the table. Only the value is actualy new information. Secondly I don't want to have to write the unpacking function for every type. As I see it you are replacing a dependent type with a runtime check for type equality here. Idealy the type system should see that the "Not_found" part in unpack isn't even be possible. The compiler should output code that simply returns v without any checks. So I still wonder if one couldn't build a (better) associative container from scratch that supports GADTs to describe the type dependency between the key and the value without that custom unpack function. How much of the above is just there to make it work with the standard polymorphic Hahstbl.t? MfG Goswin