From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <weis@pauillac.inria.fr>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78])
	by yquem.inria.fr (Postfix) with ESMTP id F3224BB9C
	for <caml-list@yquem.inria.fr>; Fri,  2 Dec 2005 16:07:24 +0100 (CET)
Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35])
	by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id jB2F7OdW004496
	for <caml-list@yquem.inria.fr>; Fri, 2 Dec 2005 16:07:24 +0100
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id QAA27500 for <caml-list@pauillac.inria.fr>; Fri, 2 Dec 2005 16:07:23 +0100 (MET)
Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.198])
	by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id jB2F7M70004483
	for <caml-list@inria.fr>; Fri, 2 Dec 2005 16:07:23 +0100
Received: by wproxy.gmail.com with SMTP id i30so139647wra
        for <caml-list@inria.fr>; Fri, 02 Dec 2005 07:07:22 -0800 (PST)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
        s=beta; d=gmail.com;
        h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
        b=E43xW6Ns9rm23WWVWw7WVHq6zlRpr/c+ztBi0j+2W0NAxZ3csBpkZjw642FqDh6WKJ9JqEX+pOk0vDdtIGMlo3WT03QRVlfiUmCHXgRiNt/wsBkW8WDrrr9Yu7UQqaFwPzUzIM74dRxUNQFs5mUGFMJ16Wp6Umq43ODirHa8ti0=
Received: by 10.65.11.4 with SMTP id o4mr1509506qbi;
        Fri, 02 Dec 2005 07:07:21 -0800 (PST)
Received: by 10.65.192.4 with HTTP; Fri, 2 Dec 2005 07:07:21 -0800 (PST)
Message-ID: <c62c8d860512020707x4477138ct542bda1d9c6a3261@mail.gmail.com>
Date: Fri, 2 Dec 2005 10:07:21 -0500
From: "Michael D. Adams" <mdmkolbe@gmail.com>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Efficency of varient types
In-Reply-To: <Pine.LNX.4.63.0511271155540.24132@localhost.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
References: <c62c8d860511251553k782c574cg37fa1116d6fc3064@mail.gmail.com>
	 <4387ACC9.2040107@motion-twin.com>
	 <c62c8d860511262106g59cb7bccwcfa24124b97a0a2b@mail.gmail.com>
	 <Pine.LNX.4.63.0511262336031.24132@localhost.localdomain>
	 <c62c8d860511270735v2366b474m75b6084a0055f8d5@mail.gmail.com>
	 <Pine.LNX.4.63.0511271155540.24132@localhost.localdomain>
X-Miltered: at nez-perce with ID 4390632C.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Miltered: at nez-perce with ID 4390632A.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Spam: no; 0.00; caml-list:01 bool:01 char:01 pointers:01 pointer:01 segfault:01 ocaml:01 pointer:01 char:01 pointers:01 constructors:01 argued:01 ocaml:01 constructors:01 minor:01 
X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=RCVD_BY_IP autolearn=disabled 
	version=3.0.3

On 11/27/05, Brian Hurt <bhurt@spnz.org> wrote:
> On Sun, 27 Nov 2005, Michael D. Adams wrote:
> > I should clarify what I am proposing because it also only requires one
> > word.  The bottom one or two bits of that word would say which variant
> > it is.  The remaining bits representing any data with that variant.
> > (Obviously this would only work if the associated data is small
> > enough, but fortunately most of the primitives are small enough (e.g.
> > int (31 bits), bool (1 bit), char (8 bits).))
>
> Except that this runs into the problem of knowing the subtle details of
> the GC- not only how it works now, but what (if any) gaurentees are being
> given for future development.  For example, is it gaurenteed that the GC
> will recognize words with the low two bits being 10 as not pointers?  Or
> does the low bit being 0 means the GC may (or may not) assume it's a
> pointer, and weird word values would cause the GC to segfault?  Even if i=
t
> works now, will it continue to work with Ocaml 3.10, 3.20, 4.0, etc.  I
> don't know- I don't work at INRIA.  This is exactly what I was warning
> about earlier.

A brief glance at the code seems to confirm that the GC is only
checking the last bit even though it could check the last two bits.=20
That means I'll have to go with a 30 bit representation of ints with
the pointer tags like '01' for ints, '011' for bools, '111' for char,
'0' for pointers to boxed constructors.

<Tangent>
I use '01' for int instead of '11' b/c it requires fewer operations to
add and multiply ints.  The consideration of pointers doesn't matter
in this case.  Some have argued, though, that even when pointers do
matter, ints should still be '0' and pointers '1', b/c always
accessing pointers with an offset is cheap (there's an addressing mode
for it) and it makes int operations cheaper.
</Tangent>

Yes, this means I'm digging around the internal representation.  I
would rather OCaml were smart enough to figure out from a type
declaration that it could use pointer tags and unbox some of the
constructors.  Oh well, maybe in the next version of OCaml.  (/me
nudges Inria)

> The other thing I'll throw in here is a warning against premature
> optimization.  OK, so method A is 15x slower than method B.  But how big
> of an effect will this have on the overall program?  If method B is one
> clock cycle, and method A is 15 clock cycles, probably not much.  15 cloc=
k
> cycles is about the cost of one mispredicted branch, and much, much
> cheaper than a cache miss.  If you're doing anything at all interesting,
> the most likely case is that the 14 clock difference will be minor
> compared to the costs of the rest of the program.

I would have absolutely agreed with you until a couple months ago when
a friend pointed out to me the difference between optimizing an
application and optimizing a language.

Applications are usually dominated by some bottleneck that is slower
than the rest.  Most applications just need to be fast 'enough'.=20
Having 'ls' take 10 seconds is to slow but the difference between 10
ms and 1 ms usually doesn't matter.  These two facts mean that you
shouldn't prematurely optimize your application.

Compare this with a language implementation.  If some part of the
implementation that is commonly used, then every application that uses
that implementation will have to pay for it and in cases like the cost
of primative operations on a value (which is tied to the cost of
tags), those applications will have to pay for it almost everywhere.=20
Furthermore, "fast enough" doesn't apply as much with languages,
because if I make my implementation generate twice as fast code, then
that means someone optimizing their application with my implementation
only has to do half as much optimization to reach what ever their
"fast enough" level is.  So the rules for optimizing language
implementation are a little different than for applications.

Ok, there is a "fast enough" for my language implementation.  My
original objective was simply to show that if writing a compiler for a
language (e.g. scheme, python), it is a lot smarter to compile down to
OCaml than to compile down to C like a lot of implementations do.  The
results should be easier to write and will probably run faster because
you can use an already fast runtime (OCaml's) instead of writing your
own in C which will likely be slow unless you spend a lot of time on
it.  Thus I'm fast enough if I'm significantly faster than python.=20
However, that 15x slow down for using tags hurts b/c python is only
about 50x slower than C.  I know there will be other slow downs I will
have to pay for and 15x blows the budget.

As far as doing type inference, I will do that eventually, but I want
to try to see how fast I can make it without type inference, then add
inference later.  A friend of mine that wrote one of the worlds
fastest Scheme implementations (i.e. on par or better than equivalent
C), claims the cost of tagging is low if done 'right' (i.e. pointer
tags instead of block tags) and that type inference wont gain much
once that is done.  So to test his claim I need to first write a very
fast implementation that uses pointer tags, then see how much it
speeds up when I add type inference.

Michael D. Adams
mdmkolbe@gmail.com