From: Kuba Ober
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Odd performance result with HLVM
Date: Wed, 4 Mar 2009 11:30:06 -0500

On Mar 4, 2009, at 11:17 AM, Mikkel Fahnøe Jørgensen wrote:

> When looking at the benchmark game and other benchmarks I have seen, I
> noticed that Haskell is almost as fast as OCaml and sometimes faster.
> Some Lisp implementations are also pretty fast.
>
> However, when you look at memory consumption OCaml uses considerably
> less memory, except for languages in the C family.
>
> I suspect that many real world performance scenarios, such as heavily
> loaded web servers and complex simulations, depend very much on memory
> consumption. This is both because of GC overhead and because of the
> slower memory pipeline the more cache levels are involved.
>
> So in case of a new JIT solution for OCaml, I believe it is important
> to observe this aspect as well.

I believe it is also important not to dynamically allocate memory for no
good reason. All of my realtime numerical code uses statically allocated
memory, with overlaying based on the execution flow of basic blocks. That
has zero runtime overhead: the produced machine code has fixed addresses
for its data (not all of the data, of course).

It reduces to whether a "basic block" can be re-entered from its future
(downstream) or not. If it can, you have to use the stack or the heap. If
it cannot, you can do static allocation.

The potential cost arises when a given function is entered from many
points. Then you can get some overhead, since the overlaying has to take
into account every way the code can be reached. This can be mitigated by
generating more than one copy of the function; that makes sense when you
have some free code ROM but your RAM is almost full. This of course can
only be done when you do whole-project compilation.
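To make the overlaying concrete, here is a minimal hand-written C sketch of
the kind of layout the compiler effectively produces (the names, sizes and
phases are invented purely for illustration): two phases whose scratch data
are never live at the same time share one statically allocated region, so
every data address is fixed and nothing is allocated at runtime.

  #include <stdio.h>

  /* Two processing phases that are never re-entered from downstream:
     phase_a is finished before phase_b starts, so their scratch buffers
     can overlay the same statically allocated storage. */
  static union {
      double filter_state[256];   /* scratch used only inside phase_a */
      double fft_buffer[512];     /* scratch used only inside phase_b */
  } overlay;                      /* one fixed address, sized by the larger member */

  static double phase_a(const double *in, int n)
  {
      double acc = 0.0;
      for (int i = 0; i < n && i < 256; i++) {
          overlay.filter_state[i] = in[i] * 0.5;   /* fixed address, no malloc */
          acc += overlay.filter_state[i];
      }
      return acc;
  }

  static double phase_b(const double *in, int n)
  {
      double acc = 0.0;
      for (int i = 0; i < n && i < 512; i++) {
          overlay.fft_buffer[i] = in[i % 256] + 1.0;   /* reuses the same bytes */
          acc += overlay.fft_buffer[i];
      }
      return acc;
  }

  int main(void)
  {
      double input[256];
      for (int i = 0; i < 256; i++) input[i] = (double)i;

      /* phase_a's scratch is dead by the time phase_b runs, so the overlay is safe. */
      double a = phase_a(input, 256);
      double b = phase_b(input, 512);
      printf("%f %f\n", a, b);
      return 0;
  }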
If you compile "modules" separately, you have to fall back on doing it in
the linker, where all you have is the function call graph: the available
granularity is much worse, at a bigger RAM overhead. The code ROM overhead
is then nil, since the linker can hardly generate copies of functions; by
the point you are copying functions you might as well do other
optimizations too, so the linker is way too late to do that efficiently.

There's no reason not to use those techniques in code that runs on "large"
platforms. It'd at least artificially boost some benchmark results ;)

Cheers, Kuba