From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id 2929EBB9A for ; Tue, 27 Sep 2005 07:32:57 +0200 (CEST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j8R5WuIX008827 for ; Tue, 27 Sep 2005 07:32:56 +0200 Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id HAA21984 for ; Tue, 27 Sep 2005 07:32:55 +0200 (MET DST) Received: from smtp1.adl2.internode.on.net (smtp1.adl2.internode.on.net [203.16.214.181]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j8R5WrSw008822 for ; Tue, 27 Sep 2005 07:32:54 +0200 Received: from rosella (ppp16-174.lns2.syd7.internode.on.net [59.167.16.174]) by smtp1.adl2.internode.on.net (8.12.9/8.12.6) with ESMTP id j8R5WnKe028065; Tue, 27 Sep 2005 15:02:51 +0930 (CST) (envelope-from skaller@users.sourceforge.net) Subject: Re: [Caml-list] Re: Ant: Efficiency of let/and From: skaller To: Stefan Monnier Cc: caml-list@inria.fr In-Reply-To: <87hdc724wo.fsf-monnier+gmane.comp.lang.caml.inria@gnu.org> References: <20050926043240.24009.qmail@web26809.mail.ukl.yahoo.com> <87hdc724wo.fsf-monnier+gmane.comp.lang.caml.inria@gnu.org> Content-Type: text/plain Date: Tue, 27 Sep 2005 15:32:49 +1000 Message-Id: <1127799169.31518.154.camel@rosella> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-Miltered: at nez-perce with ID 4338D988.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at nez-perce with ID 4338D985.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 semantically:01 ocamlopt:01 compiler:01 compiler:01 parallelism:01 parallelism:01 pipelines:01 ocaml:01 arises:01 reusing:01 invocation:01 invocation:01 doubles:01 ocaml:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 On Mon, 2005-09-26 at 12:05 -0400, Stefan Monnier wrote: > > Syntactically and semantically there is no difference. I was wondering if > > the ocamlopt compiler took advatange of the implicit paralellism at all. > > If someone tries to use such things as `and' or > unspecified-argument-evaluation-order in the hopes that the compiler will > extract some imagined parallelism is simply deluding himself. > In some cases, the freedom to execute in any order does lead to better > code, but that code rarely if ever uses any kind of parallelism. This is not so at all. Limited Parallelism is indeed found in all modern processors, which can, for example, distribute multiple instructions on several pipelines, decode in parallel with computing values, or perform several integer and/or floating operations simultaneously. In fact, as far back as 197x, the Cyber 70 could do this, in fact it relied on it. In particular, register calculations and memory fetches and stores we always done in parallel, automatically, until a join point. When you did a load, for example, the next instruction would be executed immediately, without waiting for the load to complete, provided it didn't use the register being loaded (and I think .. it would then proceed to the next instruction, and execute THAT whilst suspending the previous one, provided it used distinct registers .. but I'm not sure). It may well be that this has no impact on Ocaml code. However it certainly DOES have an impact on low level C code. In addition, parallel assignment is a major benefit, not because the assignments are done concurrently, but because it can save memory. For example: x,y = 1,2 cf x,y = y,x The first pair of assignments is faster and uses no temporaries, the second requires a temporary. So an arbitrary parallel assignment can, in fact, be optimised to reduce the number of temporaries used, and thus the number of operations. Parallel assignments as written aren't that useful, however there is a particular optimisation of great benefit which is requires parallel assignment: self-tail-recursion. In this case, the assignment arises through reusing the same closure, and the optimisation is of the form: x,y,x = f(x,y,z), g(x,y,z), h(x,y,x) goto start In particular, the argument of the new invocation can depend on the argument of the old invocation, and, in tuple form the assignment would, without analysis, require a temporary the size of the argument: tmp = f(x,y,z), g(x,y,z), h(x,y,z) x,y,z = tmp Which, in the worst case, *doubles* the number of assignments, and thus the cost of looping -- in a tight inner loop this can be critical. In a language like Ocaml, a parallel operation can have a major performance advantage, because it explicitly indicates to the compiler that the order of evaluation is irrelevant: due to side-effects and separate compilation, the order of evaluation of sequential assignments is likely to be fixed, and therefore cannot be optimised. I do not know whether Ocaml does parallel assignment optimisations on self-tail recursions .. but I can tell you that Felix DOES. Perhaps this is why it calculates Ackermann's function faster than Ocaml and C? [I have no idea of the real reason .. but it does] -- John Skaller Felix, successor to C++: http://felix.sf.net