From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 89BD6BB81 for ; Wed, 8 Mar 2006 22:38:12 +0100 (CET) Received: from ash25e.internode.on.net (ash25e.internode.on.net [203.16.214.182]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id k28LcAjT032718 for ; Wed, 8 Mar 2006 22:38:11 +0100 Received: from [192.168.1.200] (ppp9-113.lns1.syd7.internode.on.net [59.167.9.113]) by ash25e.internode.on.net (8.13.5/8.13.5) with ESMTP id k28Lc1Nv004787; Thu, 9 Mar 2006 08:08:02 +1030 (CST) (envelope-from skaller@users.sourceforge.net) Subject: Re: [Caml-list] STM support in OCaml From: skaller To: Brian Hurt Cc: William Lovas , caml-list@yquem.inria.fr In-Reply-To: References: <440DB255.1030701@asfandyar.cjb.net> <1141751708.20944.355.camel@budgie.wigram> <440DD982.8080800@asfandyar.cjb.net> <1141779125.20944.405.camel@budgie.wigram> <20060308193633.GA5460@coruscant.stwing.upenn.edu> Content-Type: text/plain Organization: Async P/L Date: Thu, 09 Mar 2006 09:06:34 +1100 Message-Id: <1141855594.23909.63.camel@budgie.wigram> Mime-Version: 1.0 X-Mailer: Evolution 2.4.1 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 440F4EC2.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; ocaml:01 posix:01 afaik:01 interprocess:01 async:01 2006:98 roof:98 consultants:98 wrote:01 wrote:01 sourceforge:01 sourceforge:01 caml-list:01 slower:01 kernel:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 On Wed, 2006-03-08 at 14:45 -0600, Brian Hurt wrote: > One comment I will make is that a mutex is expensive, but not *that* > expensive. I just wrote a quick program (available if anyone cares) in > GNU C that measures the cost, in clocks, of locking and unlocking a posix > mutex. On my desktop box (AMD Athlon XP 2200+ 1.8GHz), I'm getting a cost > of like 44 clock cycles. Which makes it less expensive than an L2 cache > miss. Ahem. Now try that on an AMDx2 (dual core). The cost goes through the roof if one process has a thread on each core. Because each core has its own cache and both caches have to be flushed/ synchronised. And those caches are BIG! I had hoped to compare dual CPU with dual core by buying a two CPU box (i.e. with two dual cores on it) but they're 2-3 times more expensive because it seems there's no board that supports Athlons (you need Opterons which cost a lot more). I have no idea if Linux, for example, running SMP kernel, is smart enough to know if a mutex is shared between two processing units or not: AFAIK Linux doesn't support interprocess mutex. Windows does. Be interesting to compare. As mentioned before the only data I have at the moment is a two thread counter increment experiment on a dual CPU G5 box, where the speed up from 2 CPUs vs 1 was a factor of 15 .. times SLOWER. -- John Skaller Async PL, Realtime software consultants Checkout Felix: http://felix.sourceforge.net