Date: Sat, 26 Mar 2011 10:31:49 +0000
From: "Richard W.M. Jones"
To: Gerd Stolpmann
Cc: Hugo Ferreira, Martin Jambon, caml-list@inria.fr
Subject: Re: [Caml-list] Efficient OCaml multicore -- roadmap?

Obligatory pointer to Uli Drepper's series about what every programmer
should know about memory.  Part 1 is here; parts 2-9 are linked at the
end of the article, just before the comments:

  http://lwn.net/Articles/250967/

To expand on what I said before: the sort of high-end hardware we're
seeing now has 128 cores and hundreds of gigabytes, or even a terabyte,
of non-uniform RAM.  The cores are grouped into NUMA nodes, each node
having its own attached memory, southbridge and hardware (network
ports etc.) -- in effect separate computers interconnected by some
very fast and very specialized "network" channels that are invisible
to the programmer.

If you 'cross' a node boundary, e.g. by having a program or its data
located in the memory of one node while it runs on a core in another
node, then you suffer a penalty (10-40% directly, plus a lot of
hard-to-measure indirect costs from consuming channel resources
between nodes).  Obviously we try hard to schedule things and to pin
processes, virtual machines and so on so that this never happens.

Having said the above, even straight SMP isn't very uniform.  You've
still got to take into account caches and cache consistency (which at
the hardware level is message passing or bus snooping).

Rich.

-- 
Richard Jones
Red Hat
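
A minimal sketch of the pinning described above, assuming Linux with
libnuma installed (node 0 is an arbitrary choice): restrict the process
to one node's cores and its allocations to that node's memory, so code
and data stay local and the remote-access penalty never applies.

  /* numa_pin.c - pin this process to one NUMA node.
   * Assumes Linux + libnuma; build with: cc numa_pin.c -lnuma */
  #include <stdio.h>
  #include <stdlib.h>
  #include <numa.h>

  int main(void)
  {
      if (numa_available() == -1) {
          fprintf(stderr, "no NUMA support on this machine\n");
          return EXIT_FAILURE;
      }

      /* Run all threads of this process on the CPUs of node 0... */
      if (numa_run_on_node(0) == -1) {
          perror("numa_run_on_node");
          return EXIT_FAILURE;
      }

      /* ...and satisfy future memory allocations from node 0's RAM. */
      numa_set_preferred(0);

      /* Real work goes here; pages it touches are now node-local. */
      printf("pinned to node 0 of %d\n", numa_max_node() + 1);
      return EXIT_SUCCESS;
  }

The same effect can be had without modifying the program at all, e.g.

  numactl --cpunodebind=0 --membind=0 ./myprog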
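
And a sketch of the point that even straight SMP isn't uniform: two
threads incrementing adjacent counters make the coherence protocol
bounce one cache line between cores on every write.  Padding the struct
so each counter sits on its own line (64 bytes is the usual x86 line
size) typically makes the identical loops run several times faster;
exact numbers are machine-dependent.

  /* false_share.c - cache-consistency cost on plain SMP.
   * Build with: cc -O2 -pthread false_share.c */
  #include <pthread.h>
  #include <stdio.h>

  #define ITERS 100000000UL

  static struct {
      volatile unsigned long a;
      /* char pad[64];   <- uncomment to give b its own cache line */
      volatile unsigned long b;
  } counters;

  static void *bump_a(void *arg)
  {
      (void) arg;
      for (unsigned long i = 0; i < ITERS; i++) counters.a++;
      return NULL;
  }

  static void *bump_b(void *arg)
  {
      (void) arg;
      for (unsigned long i = 0; i < ITERS; i++) counters.b++;
      return NULL;
  }

  int main(void)
  {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, bump_a, NULL);
      pthread_create(&t2, NULL, bump_b, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("a = %lu, b = %lu\n", counters.a, counters.b);
      return 0;
  }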