From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p4JAxdCZ007104 for ; Thu, 19 May 2011 12:59:39 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ai8CAJ731E3B/BfVkWdsb2JhbACmIAEBAQEJCwsHFCXJR4YZBJARhC+KSA X-IronPort-AV: E=Sophos;i="4.65,236,1304287200"; d="scan'208";a="94996059" Received: from msa04.smtpout.orange.fr (HELO msa.smtpout.orange.fr) ([193.252.23.213]) by mail2-smtp-roc.national.inria.fr with ESMTP; 19 May 2011 12:59:33 +0200 Received: from [192.168.1.63] ([83.199.83.241]) by mwinf5d39 with ME id lNzY1g00R5CQSYr03NzZiy; Thu, 19 May 2011 12:59:33 +0200 Message-ID: <4DD4F817.9060901@frisch.fr> Date: Thu, 19 May 2011 12:59:35 +0200 From: Alain Frisch User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Joel Reymont CC: caml-list References: <346B52D2-EE21-43D1-B41E-3AEB3BBF0013@gmail.com> <4DD4150E.6080400@frisch.fr> <4DD4D3C3.9050603@frisch.fr> <09C897B9-05FC-486C-A126-C644D04BAE94@gmail.com> In-Reply-To: <09C897B9-05FC-486C-A126-C644D04BAE94@gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Caml-list] optimizing numerical code On 05/19/2011 10:37 AM, Joel Reymont wrote: > > On May 19, 2011, at 10:24 AM, Alain Frisch wrote: > >> Actually, even in this branch, acc would not be unboxed. The new heuristic unboxes float variables unless they are both needed in boxed form AND mutated, which is the case for acc in your function. > > Shouldn't it be unboxed, though? Yes, in this case, it would make a lot of sense to work with an unboxed variable and only box it when needed. But finding a good heuristic is not trivial. Always applying this approach of "on-demand" boxing might have drawbacks: more allocations (if the same value is used several times in boxed form) and less sharing of allocations (the compiler combines adjacent allocations to avoid some overhead). In the example, it is easy to find out that the variable is used only once in boxed form (in a context executed at most once). Here is another "lazy" approach that could be worth trying. Consider a float variable x which is assigned and used both in boxed and unboxed form in the same function. Internally, we keep two variants of it: x_boxed and x_unboxed. When x is assigned the result of some float expression e: - if e can be computed directly in unboxed form (e.g. it is the result of some numerical operation), assign x_unboxed only; - if e comes in boxed form (e.g. it is the result of some function call), assign x_boxed and x_unboxed (by dereferencing x_boxed); When x is used an an unboxing context, use x_unboxed. When x is used as a boxed value (e.g. as an argument to a function call, or as the return value of the current function), check (at runtime) if the content of x_boxed is equal to x_unboxed; if yes, return x_boxed; otherwise, box x_unboxed, store the new block in x_boxed and returns it. This scheme guarantees that at most one allocation happens between two successive assignment to the variable, and that no allocation happens if it turns out that the current value is never used in boxed form. This comes at the price of some extra equality checks between floats and conditional jumps, but it might be worth trying. -- Alain