caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: "Michael D. Adams" <mdmkolbe@gmail.com>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Efficency of varient types
Date: Sun, 27 Nov 2005 10:35:05 -0500	[thread overview]
Message-ID: <c62c8d860511270735v2366b474m75b6084a0055f8d5@mail.gmail.com> (raw)
In-Reply-To: <Pine.LNX.4.63.0511262336031.24132@localhost.localdomain>

On 11/27/05, Brian Hurt <bhurt@spnz.org> wrote:
> On Sun, 27 Nov 2005, Michael D. Adams wrote:
> > I agree, which is why it was my hope that OCaml might do some of that
> > for me.  Consider a home brew bool type, "type mybool = Mytrue |
> > Myfalse".  If the compiler were smart enough, it could represent that
> > as an unboxed type.
>
> The compiler is smart enough to do that already- variants without
> associated data are represented as ints.

I am glad to hear that.

> > From there it might be a small step to
> > semi-unboxed types such as the one I started this discussion with,
> > "type value = Int of int | Bool of bool | String of string".
>
> The problem here is that the variants take two words- one word which says
> which variant it is, and the second word which is the unboxed int, unboxed
> bool, or pointer to the boxed string.

I should clarify what I am proposing because it also only requires one
word.  The bottom one or two bits of that word would say which variant
it is.  The remaining bits representing any data with that variant. 
(Obviously this would only work if the associated data is small
enough, but fortunately most of the primitives are small enough (e.g.
int (31 bits), bool (1 bit), char (8 bits).))

(Please forgive me if the following is verbose, I want to make sure my
idea is clear.  In the scheme world I think it's called "pointer
tagging" or "tagged pointers".)

Consider "type value = Int of int | Bool of bool | None | String of
string".  For the moment I will assume that the GC only assumes 4 byte
aligned values are pointers.

Since int is 31 bits that leaves one bit to say what type it is.  We
don't want the GC grabbing it, so we will represent it with a single
'1' bit appended.  A 6 would be represented by 1101 (110 for the 6 and
1 for the "tag").  This means that "Int 6" only takes one word.

Let's look at "Bool of bool".  We only need one bit of representation
for that, leaving 31 bits free for tagging.  We can represent it by
appending a three bit tag of '010' to the regular bool value.  So true
is '1010' and false is '0010'.  (NB: this depends on the GC ignoring
non-4-byte-aligned values.) Again "Bool true" and "Bool false" would
only be one word.

The "None" case can use any free bit pattern.  We can give it the
three bit tag '110'.  Since there is no data, "None" would be '0110',
only one word.

A string can not fit in one word, so it will have to be boxed. We
would store a pointer to the block structure storing the data that
Nicolas mentioned (tag, GC bits, size).  Since all pointers allocated
by OCaml are four byte aligned, it would be something like '10001100'.
 Because the last two bits are '00' that forms as a sort of 'tag' that
signals "Hey, this thing is a pointer to find out what variant it is
you will have to actually dereference it and find the tag in the block
structure".  Again this pointer would be one word long.

As you mentioned functions like List.rev don't care what their
arguments are so long as they are a single word.  The above
representation satisfies that.

The only other interesting part is pattern matching.  Matching would
require examination of the final bits of a value.  The final bits
would determine the constructor, unless they were '00' (i.e. word
aligned) in which case the value is a pointer to a block that contains
the tag that determines the constructor.  In portable assembly (a.k.a.
C) a match operation would look something like the following.

if (x & 0x03) {
  /* It's an unboxed constructor */
  if (x & 0x01) { /* It's an Int */ }
  else if (x & 0x07 == 0x02) { /* It's a Bool */ }
  else if (x & 0x07 == 0x06) { /* It's a None */ }
  else { /* Malformed value */ }
} else {
  /* It's word aligned so it's a boxed constructor */
  switch (Tag_val(x)) {
    case STRING_TAG: /* It's a String */ break;
    default: /* Malformed value */
  }
}


Michael D. Adams
mdmkolbe@gmail.com


  parent reply	other threads:[~2005-11-27 15:35 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-11-25 23:53 Michael D. Adams
2005-11-26  0:31 ` [Caml-list] " Nicolas Cannasse
2005-11-26  1:22   ` Brian Hurt
2005-11-26  9:39     ` Nicolas Cannasse
2005-11-28  0:17     ` Obj or not Obj Christophe Raffalli
2005-11-28  8:41       ` [Caml-list] " Nicolas Cannasse
2005-11-28  9:27         ` Christophe Raffalli
2005-11-28  9:33         ` skaller
2005-11-28  8:43       ` Alain Frisch
2005-11-26  2:54   ` [Caml-list] Efficency of varient types skaller
2005-11-27  5:06   ` Michael D. Adams
2005-11-27  5:45     ` Brian Hurt
2005-11-27 10:02       ` Nicolas Cannasse
2005-11-27 15:35       ` Michael D. Adams [this message]
2005-11-27 18:08         ` Brian Hurt
2005-12-02 15:07           ` Michael D. Adams
2005-11-26  1:18 ` Jon Harrop
2005-11-27 14:57 ` Lukasz Stafiniak
2005-11-27 15:47   ` Lukasz Stafiniak
2005-11-28  8:14   ` Christophe Raffalli
2005-11-28  7:24 ` David Baelde
2005-11-28  7:49   ` Jacques Garrigue
2005-11-28 10:01     ` Jon Harrop
2005-11-28 10:26       ` Luc Maranget
2005-11-28  7:53   ` Ville-Pertti Keinonen
2005-12-01 17:05 ` Stefan Monnier
2005-12-02 15:07   ` [Caml-list] " Michael D. Adams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c62c8d860511270735v2366b474m75b6084a0055f8d5@mail.gmail.com \
    --to=mdmkolbe@gmail.com \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).