Message-ID: <480C5D45.7020903@exalead.com>
Date: Mon, 21 Apr 2008 11:24:21 +0200
From: Berke Durak
To: Xavier Leroy, Caml-list List
Subject: Re: [Caml-list] OCaml Summer Project decisions are in
References: <865618.45090.qm@web54601.mail.re2.yahoo.com> <1208546581.16295.108.camel@nyc-qws-018.delacy.com> <1208614908.6790.53.camel@benjamin-laptop>
    <480B6B89.2070403@inria.fr>
In-Reply-To: <480B6B89.2070403@inria.fr>

Xavier Leroy wrote:
> By Amdahl's law, a stop-the-world GC cannot scale well, but it might
> be good enough for 2 to 4 cores. For instance, assuming a sequential
> program that spends 25% of its time in garbage collection and
> parallelizes with low overhead, you would get a speedup of 2 on
> a 4-core machine.

Not a very bright prospect. Many people already have dual cores under
their desks, so quad cores should be the norm in a year or two. I'd
hate to see OCaml become one of the slower languages.

> A related question is how long it takes to stop a
> large number of threads. But only experimentation can tell.
> ..
> Another crucial point is the ability to stop all threads and obtain
> their roots, which is not obvious at all and a prerequisite for any
> form of concurrent garbage collection.

Stopping the world for GC in our heavily threaded ExaVM is quite
painful: the threads often run in native C/C++ code, which naturally
has no business checking the stop condition as often as Exascript code
does. This gives very poor performance on multi-core machines (we got
something like 10-15% CPU usage on a 32-core machine).

XL:
> There are two points that I'd like to emphasize. The first step towards
> any form of true parallelism in Caml (including message-passing
> between threads having their own heaps and sequential GCs) is to clean
> up the numerous global C variables that the run-time system uses.
> But maybe this was implicit in your point 1.
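
To make the quoted point about global C variables concrete, here is a
minimal C sketch of the general idea: the state that would otherwise
live in process-wide globals is gathered into one structure per
runtime, and each OS thread reaches its own copy through a thread-local
pointer. Everything in it (runtime_state, runtime_create, runtime_enter,
runtime_alloc, the field names) is hypothetical and chosen for
illustration; it is not the OCaml runtime's actual layout or API, and
the "collection" is a toy reset rather than a real minor GC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-runtime state. These fields stand in for values that
   a single-runtime design keeps in global C variables. */
typedef struct runtime_state {
    char  *young_start;        /* lowest address of this runtime's minor heap */
    char  *young_end;          /* one past the highest address */
    char  *young_ptr;          /* allocation pointer, moves downward */
    size_t minor_collections;  /* how often the toy "collector" ran */
} runtime_state;

/* One pointer per OS thread (C11 _Thread_local) instead of one set of
   globals per process. */
static _Thread_local runtime_state *current_runtime = NULL;

/* Create a runtime with its own, disjoint minor heap. */
runtime_state *runtime_create(size_t minor_heap_bytes)
{
    runtime_state *rt = malloc(sizeof *rt);
    if (rt == NULL) return NULL;
    rt->young_start = malloc(minor_heap_bytes);
    if (rt->young_start == NULL) { free(rt); return NULL; }
    rt->young_end = rt->young_start + minor_heap_bytes;
    rt->young_ptr = rt->young_end;
    rt->minor_collections = 0;
    return rt;
}

void runtime_destroy(runtime_state *rt)
{
    if (rt == NULL) return;
    if (current_runtime == rt) current_runtime = NULL;
    free(rt->young_start);
    free(rt);
}

/* Bind a runtime to the calling thread. */
void runtime_enter(runtime_state *rt) { current_runtime = rt; }

/* Bump-allocate from the calling thread's runtime only. No alignment
   handling: callers are assumed to request suitably rounded sizes. */
void *runtime_alloc(size_t bytes)
{
    runtime_state *rt = current_runtime;
    assert(rt != NULL);  /* the thread must have entered a runtime */
    size_t capacity = (size_t)(rt->young_end - rt->young_start);
    if (bytes > capacity) return NULL;
    if ((size_t)(rt->young_ptr - rt->young_start) < bytes) {
        /* Toy stand-in for a minor GC: discard everything, start over. */
        rt->young_ptr = rt->young_end;
        rt->minor_collections++;
    }
    rt->young_ptr -= bytes;
    return rt->young_ptr;
}
```

Each thread would call runtime_enter once with its own runtime_state;
after that, runtime_alloc touches only that thread's heap, so a
collection in one runtime never needs to stop the others. Natively
compiled code would need an equivalent indirection for values such as
caml_young_limit, which this sketch does not address.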

Besides moving the interpreter/runtime global state into some data
structure, what about emitted asm code that references things like
caml_young_limit? Those would have to be made thread-specific. Why not
reserve one or two of the extra x86-64 registers to hold them?

That could also be an occasion to "librarify" the interpreter in order
to allow multiple OCaml interpreters/run-times to execute in the same
process (with disjoint heaps). That way one could embed multiple OCaml
libraries in programs written in other languages (which could allow
easier monetization of OCaml software ;).

Also, we could explore alternate paths to parallelism, such as running
multiple OCaml interpreters/runtimes in different threads with disjoint
heaps. Those threads would run truly in parallel on different CPUs.

Then, we could try different custom solutions for cheap inter-thread
communication. For example, we could write a Marshal derivative that
emits a simpler inter-thread representation. It would work under the
hypothesis that the deserializing process runs the same code as the
serializing one, on the same machine, at the same time. This would
remove some serialization overhead:

- No need to compact integers for a byte-based serialization format
- No need to worry about endianness or float representations
- No need to output or verify program MD5s for functions
- Native data could be passed as-is.

Thus we would have one process running n threads, but those threads
having disjoint heap spaces, allowing for fully parallel GC and
execution (distinct heap spaces mean reduced cache/bus contention),
exchanging data in shared memory through an inter-thread Marshal
module.

(Note that using n processes with shared memory would not be as
convenient:

- You'd need to set up a shared memory segment using IPC, /dev/shm or
  other unportable, OS-specific mechanisms.
- You won't be able to use native multi-threaded libraries.
- You won't be able to share file descriptors or common C code/data.
- You won't be able to use inter-thread signaling/synchronization
  mechanisms for your implementation.)

People would then write libraries for shared immutable data, or a
shared store, probably using an STM-like approach. Those could probably
be written in OCaml. Also, some POSIX thread synchronization mechanisms
(mutexes, etc.) could be exposed as OCaml types.

As for the interpreter threads, they could use lightweight threads or
classic OCaml serializing threads for their internal non-parallel
concurrency.

I personally don't believe much in fine-grained parallelism using
threads and locks: it makes the GC difficult, relies on notoriously
difficult-to-use and non-composable synchronization mechanisms such as
locks and mutexes, doesn't scale to distributed computing, and doesn't
run as fast as most people think it would because of bus/cache
contention.

> At any rate, it's probably better not to ask too many questions at
> this point and let you concentrate on the task at hand. Feel free to
> contact me and Damien Doligez for specific questions or general
> advice.

I certainly don't want to discourage anyone from writing a parallel GC!

-- 
Berke DURAK