From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 57DC3BB81 for ; Sun, 27 Nov 2005 16:35:08 +0100 (CET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id jARFZ78F016538 for ; Sun, 27 Nov 2005 16:35:07 +0100 Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id QAA24375 for ; Sun, 27 Nov 2005 16:35:07 +0100 (MET) Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.197]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id jARFZ6mA008452 for ; Sun, 27 Nov 2005 16:35:06 +0100 Received: by wproxy.gmail.com with SMTP id i3so2011586wra for ; Sun, 27 Nov 2005 07:35:06 -0800 (PST) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=pEbILzDbXY+s0CMszdTnwAQndkJZtsIfSWpUkydX8fq56ya+yciIEw16iXnnxis+u1SM0RhyPzMHwQ3OXNBlFCX7pn0YdEwAxzpd8oftsly9Yv1ShxFgIYkBtJcIYQvJNf0j6Mrvf/kNYjBqB1oa/EnF/LHdrr5xVJVX+iB1pm8= Received: by 10.65.214.4 with SMTP id r4mr482002qbq; Sun, 27 Nov 2005 07:35:05 -0800 (PST) Received: by 10.65.192.4 with HTTP; Sun, 27 Nov 2005 07:35:05 -0800 (PST) Message-ID: Date: Sun, 27 Nov 2005 10:35:05 -0500 From: "Michael D. Adams" To: caml-list@inria.fr Subject: Re: [Caml-list] Efficency of varient types In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Content-Disposition: inline References: <4387ACC9.2040107@motion-twin.com> X-Miltered: at concorde with ID 4389D22B.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at nez-perce with ID 4389D22A.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 ocaml:01 bool:01 compiler:01 unboxed:01 compiler:01 variants:01 bool:01 variants:01 unboxed:01 pointer:01 char:01 verbose:01 pointer:01 pointers:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=RCVD_BY_IP autolearn=disabled version=3.0.3 On 11/27/05, Brian Hurt wrote: > On Sun, 27 Nov 2005, Michael D. Adams wrote: > > I agree, which is why it was my hope that OCaml might do some of that > > for me. Consider a home brew bool type, "type mybool =3D Mytrue | > > Myfalse". If the compiler were smart enough, it could represent that > > as an unboxed type. > > The compiler is smart enough to do that already- variants without > associated data are represented as ints. I am glad to hear that. > > From there it might be a small step to > > semi-unboxed types such as the one I started this discussion with, > > "type value =3D Int of int | Bool of bool | String of string". > > The problem here is that the variants take two words- one word which says > which variant it is, and the second word which is the unboxed int, unboxe= d > bool, or pointer to the boxed string. I should clarify what I am proposing because it also only requires one word. The bottom one or two bits of that word would say which variant it is. The remaining bits representing any data with that variant.=20 (Obviously this would only work if the associated data is small enough, but fortunately most of the primitives are small enough (e.g. int (31 bits), bool (1 bit), char (8 bits).)) (Please forgive me if the following is verbose, I want to make sure my idea is clear. In the scheme world I think it's called "pointer tagging" or "tagged pointers".) Consider "type value =3D Int of int | Bool of bool | None | String of string". For the moment I will assume that the GC only assumes 4 byte aligned values are pointers. Since int is 31 bits that leaves one bit to say what type it is. We don't want the GC grabbing it, so we will represent it with a single '1' bit appended. A 6 would be represented by 1101 (110 for the 6 and 1 for the "tag"). This means that "Int 6" only takes one word. Let's look at "Bool of bool". We only need one bit of representation for that, leaving 31 bits free for tagging. We can represent it by appending a three bit tag of '010' to the regular bool value. So true is '1010' and false is '0010'. (NB: this depends on the GC ignoring non-4-byte-aligned values.) Again "Bool true" and "Bool false" would only be one word. The "None" case can use any free bit pattern. We can give it the three bit tag '110'. Since there is no data, "None" would be '0110', only one word. A string can not fit in one word, so it will have to be boxed. We would store a pointer to the block structure storing the data that Nicolas mentioned (tag, GC bits, size). Since all pointers allocated by OCaml are four byte aligned, it would be something like '10001100'. Because the last two bits are '00' that forms as a sort of 'tag' that signals "Hey, this thing is a pointer to find out what variant it is you will have to actually dereference it and find the tag in the block structure". Again this pointer would be one word long. As you mentioned functions like List.rev don't care what their arguments are so long as they are a single word. The above representation satisfies that. The only other interesting part is pattern matching. Matching would require examination of the final bits of a value. The final bits would determine the constructor, unless they were '00' (i.e. word aligned) in which case the value is a pointer to a block that contains the tag that determines the constructor. In portable assembly (a.k.a. C) a match operation would look something like the following. if (x & 0x03) { /* It's an unboxed constructor */ if (x & 0x01) { /* It's an Int */ } else if (x & 0x07 =3D=3D 0x02) { /* It's a Bool */ } else if (x & 0x07 =3D=3D 0x06) { /* It's a None */ } else { /* Malformed value */ } } else { /* It's word aligned so it's a boxed constructor */ switch (Tag_val(x)) { case STRING_TAG: /* It's a String */ break; default: /* Malformed value */ } } Michael D. Adams mdmkolbe@gmail.com