From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id D57C3BC6B for ; Thu, 6 Sep 2007 09:18:55 +0200 (CEST) Received: from ipmail02.adl2.internode.on.net (ipmail02.adl2.internode.on.net [203.16.214.141]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l867IqSv030412 for ; Thu, 6 Sep 2007 09:18:54 +0200 X-IronPort-AV: E=Sophos;i="4.20,214,1186324200"; d="scan'208";a="182696006" Received: from ppp121-44-82-249.lns10.syd6.internode.on.net (HELO [192.168.1.201]) ([121.44.82.249]) by ipmail02.adl2.internode.on.net with ESMTP; 06 Sep 2007 16:47:03 +0930 Subject: Re: [Caml-list] More registers in modern day CPUs From: skaller To: Tom Cc: Caml-list List In-Reply-To: References: Content-Type: text/plain Date: Thu, 06 Sep 2007 17:17:02 +1000 Message-Id: <1189063022.9224.126.camel@rosella.wigram> Mime-Version: 1.0 X-Mailer: Evolution 2.10.1 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 46DFA9DC.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; 0200,:01 ocaml:01 ocaml:01 pointer:01 dependencies:01 coupling:01 dispatching:01 threads:01 sourceforge:01 imho:01 wrote:01 caml-list:01 emulate:01 lazy:02 data:02 On Thu, 2007-09-06 at 08:20 +0200, Tom wrote: > (This question may not be OCaml specific, you'd be surprised .. > However, would it be possible to "emulate" cpu registers using > software? By keeping registers in the main memory, but accessing them > often enough to keep them in primary cache? That would be quite fast I > believe... The technique is called 'boxing'. This is one reason why Ocaml is so fast, when you'd expect the extra dereferences required all the time to be a big penalty. Instead, if the address is used but not the data (eg generic operation) cache is saved compared to an expanded representation. The cache is loaded if the pointer is dereferenced, and subsequent derefs are effectively free provided only a small number of boxes is opened: there is an extra cost of one word for the address, which is the price of the lazy loading, and is amortised away by generic operations. This is even faster than one might think because cache can do speculative preload of the pointed at data. [Does Ocaml bother to generate those instructions?] IMHO, the main purpose of registers is to organise the interleaving of parallel operations (memory reads mainly) based on dependencies. They differ from main memory (and cache) in that they're usually thread local (whereas all the other stuff is shared) so they're expressing coupling between data and flow of control. for example in: R1 = a R2 = b R3 = R1 + R2 R4 = c R5 = d R6 = R4 + R5 you'd be mainly wrong to think of these instructions as operating on data. No. Not today. These instructions are chopping up the control flow into parallel threads: a b c d | | | | V V V V + + | | I think that's the main reason for registers, not memory operands. Registers only need a few bits to name, so the dispatching to functional units is easier to calculate with less hardware. -- John Skaller Felix, successor to C++: http://felix.sf.net