From: Jon Harrop
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] More Caml
Date: Wed, 24 Dec 2008 16:47:03 +0000

On Wednesday 24 December 2008 13:12:58 Mikkel Fahnøe Jørgensen wrote:
> 2008/12/23 Jon Harrop:
> > symbolics. I am guessing the performance of allocation will be
> > degraded 10-100x but many allocations can be removed. This leaves me
> > wondering how much slowdown is acceptable without deterring a lot of
> > users?
>
> I think this would be a major problem. A great advantage of OCaml is
> that you can do things you would not normally do in C/C++ for
> performance reasons, like mapping a list.

Yes. The counter-argument is that people wanting performance should not be
using long linked lists in the first place. I think that is actually quite
a compelling argument: it is more important to remove barriers to potential
performance than it is to make everything fast.

Indeed, that is one of the strongest reasons to choose OCaml over the
alternatives: you can write very fast code in OCaml without having to drop
to C. In contrast, Haskell compilers do a lot of generic optimizations,
like deforestation, that help in unimportant cases but place a low
performance ceiling on fast code, so you are forced to drop to another
language for speed.

> I believe it is desirable to split allocation into pools dedicated to
> each physical thread.

Absolutely. However, that is also extremely complicated and I just cannot
see myself implementing it any time soon. A parallel GC using a shadow
stack seems feasible and should still be a productive improvement.
Symbolic code will really suffer but numeric code will really gain.

> Memory barriers are really expensive, such as used for atomic increment,
> not to mention locking.

Particularly when mutating pointers to reference types in the heap.
However, reading and writing value types in the heap should still be fast,
e.g. parallel numerical methods.
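To make the distinction concrete, here is a toy sketch in today's OCaml
(the type and function names are just for illustration): a record whose
fields are all floats is stored as a flat unboxed block, so mutating it is
a plain store, whereas mutating a field that holds a heap pointer goes
through the caml_modify write barrier, and that barrier is exactly where a
parallel collector starts to cost you.

(* All-float record: unboxed representation, so these assignments are
   ordinary stores with no write barrier. *)
type vec = { mutable x : float; mutable y : float; mutable z : float }

let scale k v =
  v.x <- k *. v.x;
  v.y <- k *. v.y;
  v.z <- k *. v.z

(* Pointer-holding field: every assignment goes through the caml_modify
   write barrier, the case that becomes expensive under a parallel GC. *)
type 'a cell = { mutable head : 'a list }

let push c x = c.head <- x :: c.head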
> Small block allocation could be done in relatively small heap segments
> where each physical thread locks a larger heap only when it needs a new
> segment, but not while allocating within a segment. This can further be
> improved by adding NUMA support and scope-based allocation (arena). If
> full segments are marked non-mutable where possible, it is probably
> possible to reduce thread blocking during collection. Garbage collection
> could be limited to processing full segments. Probably not trivial
> though.

Indeed. There are also a variety of type-directed approaches that may be
useful given the information available to the JIT. Not to mention that I'm
expecting to JIT the GC code for each type as well... :-) (A rough sketch
of the kind of per-thread segment allocation you describe is appended
below my signature.)

Incidentally, the availability of fences as intrinsics in LLVM makes it
sound ideal for implementing GCs using wait-free methods. If I can get the
basics up and running then it may be best for me to focus on making this
VM an easy target for GC researchers to implement new algorithms.

> What do you think of adding an Erlang-style threading model?

That is effectively what F#'s asynchronous workflows aim to achieve.
However, I currently have no use for it, so I only ask that its existence
does not limit the capabilities I am interested in (parallelism).

> Speaking of the JIT compiler, would eval make any sense?

Yes, no reason why not. Perhaps the same infrastructure used to
type-specialize definitions as they are JIT compiled can be used for other
kinds of partial specialization, like low-dimensional vector/matrix
operations.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
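P.S. As promised above, a rough OCaml sketch of the per-thread segment
scheme (illustration only: the names, the segment size and the use of a
plain list as the global pool are all made up, and a Bytes.t segment
stands in for a real heap region). Allocation inside a thread's private
segment is a plain bump of an offset with no locking; the global mutex is
taken only on the slow path when a fresh segment is needed.

(* Requires the threads library (for Mutex). *)

let segment_size = 64 * 1024

type segment = { data : Bytes.t; mutable off : int }

(* Shared pool of all segments, so a collector could scan them.
   The mutex guards only the slow path below. *)
let pool_lock = Mutex.create ()
let pool : segment list ref = ref []

(* Slow path: take the global lock and hand out a fresh segment. *)
let new_segment () =
  let seg = { data = Bytes.create segment_size; off = 0 } in
  Mutex.lock pool_lock;
  pool := seg :: !pool;
  Mutex.unlock pool_lock;
  seg

(* Fast path: bump the thread-private segment's offset, no locking.
   Assumes n <= segment_size. Returns the (possibly new) segment and the
   offset of the freshly allocated block within its data. *)
let alloc seg n =
  if seg.off + n <= segment_size then begin
    let off = seg.off in
    seg.off <- off + n;
    seg, off
  end else begin
    let seg' = new_segment () in
    seg'.off <- n;
    seg', 0
  end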