From: Jon Harrop
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] More Caml
Date: Wed, 24 Dec 2008 16:47:03 +0000

On Wednesday 24 December 2008 13:12:58 Mikkel Fahnøe Jørgensen wrote:
> 2008/12/23 Jon Harrop:
> > symbolics. I am guessing the performance of allocation will be
> > degraded 10-100x but many allocations can be removed. This leaves me
> > wondering how much slowdown is acceptable without deterring a lot of
> > users?
>
> I think this would be a major problem. A great advantage of OCaml is
> that you can do things you would not normally do in C/C++ for
> performance reasons, like mapping a list.

Yes. The counter-argument is that people wanting performance should not be
using long linked lists in the first place. I think that is actually quite
a compelling argument: it is more important to remove barriers to potential
performance than it is to make everything fast.

Indeed, that is one of the strongest reasons to choose OCaml over the
alternatives: you can write very fast code in OCaml without having to drop
to C. In contrast, Haskell compilers do a lot of generic optimizations,
like deforestation, that help in unimportant cases but place a low
performance ceiling on fast code, so you are forced to drop to another
language for speed.

> I believe it is desirable to split allocation into pools dedicated to
> each physical thread.

Absolutely. However, that is also extremely complicated and I just cannot
see myself implementing it any time soon. A parallel GC using a shadow
stack seems feasible and should still be a productive improvement.
Symbolic code will really suffer but numeric code will really gain.

> Memory barriers are really expensive, such as used for atomic increment,
> not to mention locking.

Particularly when mutating pointers to reference types in the heap.
However, reading and writing value types in the heap should still be fast,
e.g. parallel numerical methods.
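To make the distinction concrete, here is a toy sketch in today's OCaml
(the type and function names are just for illustration): a record whose
fields are all floats is stored as a flat unboxed block, so mutating it is
a plain store, whereas mutating a field that holds a heap pointer goes
through the caml_modify write barrier, and that barrier is exactly where a
parallel collector starts to cost you.

(* All-float record: unboxed representation, so these assignments are
   ordinary stores with no write barrier. *)
type vec = { mutable x : float; mutable y : float; mutable z : float }

let scale k v =
  v.x <- k *. v.x;
  v.y <- k *. v.y;
  v.z <- k *. v.z

(* Pointer-holding field: every assignment goes through the caml_modify
   write barrier, the case that becomes expensive under a parallel GC. *)
type 'a cell = { mutable head : 'a list }

let push c x = c.head <- x :: c.head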
> Small block allocation could be done in relatively small heap segments
> where each physical thread locks a larger heap only when it needs a new
> segment, but not while allocating within a segment. This can further be
> improved by adding NUMA support and scope-based allocation (arena). If
> full segments are marked non-mutable where possible, it is probably
> possible to reduce thread blocking during collection. Garbage collection
> could be limited to processing full segments. Probably not trivial
> though.

Indeed. There are also a variety of type-directed approaches that may be
useful given the information available to the JIT. Not to mention that I'm
expecting to JIT the GC code for each type as well... :-) (A rough sketch
of the kind of per-thread segment allocation you describe is appended
below my signature.)

Incidentally, the availability of fences as intrinsics in LLVM makes it
sound ideal for implementing GCs using wait-free methods. If I can get the
basics up and running then it may be best for me to focus on making this
VM an easy target for GC researchers to implement new algorithms.

> What do you think of adding an Erlang-style threading model?

That is effectively what F#'s asynchronous workflows aim to achieve.
However, I currently have no use for it, so I only ask that its existence
does not limit the capabilities I am interested in (parallelism).

> Speaking of the JIT compiler, would eval make any sense?

Yes, no reason why not. Perhaps the same infrastructure used to
type-specialize definitions as they are JIT compiled can be used for other
kinds of partial specialization, like low-dimensional vector/matrix
operations.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
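P.S. As promised above, a rough OCaml sketch of the per-thread segment
scheme (illustration only: the names, the segment size and the use of a
plain list as the global pool are all made up, and a Bytes.t segment
stands in for a real heap region). Allocation inside a thread's private
segment is a plain bump of an offset with no locking; the global mutex is
taken only on the slow path when a fresh segment is needed.

(* Requires the threads library (for Mutex). *)

let segment_size = 64 * 1024

type segment = { data : Bytes.t; mutable off : int }

(* Shared pool of all segments, so a collector could scan them.
   The mutex guards only the slow path below. *)
let pool_lock = Mutex.create ()
let pool : segment list ref = ref []

(* Slow path: take the global lock and hand out a fresh segment. *)
let new_segment () =
  let seg = { data = Bytes.create segment_size; off = 0 } in
  Mutex.lock pool_lock;
  pool := seg :: !pool;
  Mutex.unlock pool_lock;
  seg

(* Fast path: bump the thread-private segment's offset, no locking.
   Assumes n <= segment_size. Returns the (possibly new) segment and the
   offset of the freshly allocated block within its data. *)
let alloc seg n =
  if seg.off + n <= segment_size then begin
    let off = seg.off in
    seg.off <- off + n;
    seg, off
  end else begin
    let seg' = new_segment () in
    seg'.off <- n;
    seg', 0
  end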