From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id XAA15737; Mon, 27 Oct 2003 23:11:26 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id XAA26525 for ; Mon, 27 Oct 2003 23:11:25 +0100 (MET) Received: from ms-smtp-02-eri0.ohiordc.rr.com (ms-smtp-02-smtplb.ohiordc.rr.com [65.24.5.136]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id h9RMBO120684 for ; Mon, 27 Oct 2003 23:11:24 +0100 (MET) Received: from vilya (cpe-024-033-201-104.columbus.rr.com [24.33.201.104]) by ms-smtp-02-eri0.ohiordc.rr.com (8.12.10/8.12.7) with ESMTP id h9RMBNqt023471; Mon, 27 Oct 2003 17:11:23 -0500 (EST) Received: from peuter ([192.168.2.2] ident=mail) by vilya with esmtp (Exim 3.36 #1 (Debian)) id 1AEFa7-0000jL-00; Mon, 27 Oct 2003 17:11:23 -0500 Received: from andrewl by peuter with local (Exim 3.36 #1 (Debian)) id 1AEFa7-0000Nw-00; Mon, 27 Oct 2003 17:11:23 -0500 Date: Mon, 27 Oct 2003 17:11:23 -0500 From: Andrew Lenharth To: William Chesters Cc: caml-list@inria.fr Subject: Re: [Caml-list] partial eval question Message-ID: <20031027221122.GA608@peuter> References: <004a01c39c2b$819db8f0$1fcf2952@Archimedes> <20031027081453.37b9f6ee.Damien.Pous@ens-lyon.fr> <16285.15401.421786.814601@beertje.william.bogus> <20031027185021.GA1793@vilya.homelinux.net> <16285.28219.572454.790216@beertje.william.bogus> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16285.28219.572454.790216@beertje.william.bogus> User-Agent: Mutt/1.5.4i X-Loop: caml-list@inria.fr X-Spam: no; 0.00; caml-list:01 chesters:01 recursion:01 unrolling:01 unrolling:01 runtime:01 gcc:01 optimising:01 gcc:01 intel's:01 outputs:01 gettimeofday:01 flags:01 keywork:01 multi-stage:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Mon, Oct 27, 2003 at 07:12:59PM +0000, William Chesters wrote: > What I wrote, the "obvious thing", is > -- easy to write, and hard to get wrong Easy to write, maybe, if recursion is hard. Harder for compilers to optimize. Tests latter. > -- gives much less confusing error messages if you get it slightly > wrong > -- easy to read Which is less important in high performance situlations. > -- uses a smaller subset of the language, so is especially easier > for non-C++ experts > -- more general, in that it doesn't blow up and use silly amounts > of space (and probably more time too, given cache churn) if N is not > tiny When I dump the code, I see 2 * N bytes of code in the unrolled loop. When I dump your code, I see no loop unrolling, hence no point in even trying to specialize the case. Anytime you are doing code generation, you must be careful to avoid blow up. Such is life. As you alluded to in the cache churn comment idea unrolling requires mesurement. However, with the template code we are explicitly using code generation, in the c code, we are simply hoping the compiler will do something we want. > -- more general also in that the same code does both the > general-purpose (n known only at runtime) and the special-purpose > job yes, and properly written the template code will support that. > > The C example relies on a fairly smart compiler to > > do interprocedual analysis. > > Depends what you mean by fairly smart. This is standard stuff: gcc is > really not the best optimising compiler around. This is true. Therefore I tried this on both gcc and icc (intel's compiler). icc did worse on the c than gcc did (interesting). Especially since icc was trying to do interprocedual analysis. The C++ versions were very similar at the assembly level in both compilers. The body of the loop was just series of fmul is both compiler outputs. quick measurement with gettimeofday: compiler N=3 N=30 gcc 5181 89866 icc,c 18437 106522 icc,c++ 1336 1296 g++ 1297 1297 for 1M iterations of assignment of pow into a volatile double These are very dependent on compiler flags in the c case. Having icc be more aggressive with loop unrolling made matters worse, since the code no longer fit in L1. There was much less needing to tune compiler options with the template approach, it consistently gave the best code. I would venture to say that for performance, there is still need for a programmer visible code generation engine. > > The C++ example only requires the inline keywork be honored, and > > you don't need explicit pow3 pow2, you have pow<3> pow<2> pow > constant>. > > > > Gives a bit more control over code generation. > > It does. I personally feel (actually have learned the hard way) that > using C++ templates for multi-stage programming is mostly much more > trouble than it is worth---especially when you realise what careful > exploitation of C-level specalisation by inlining can do. Relying on the compiler is risky business. Doing code generation and hoping the compiler will optimize something a certain way are really different things. Further, for really interesting code generation, C-level specalisation by inlining is utterly inadiquit. Some tasks, such as data marshalling engines, can benifit greatly from straight forward template style code generation (both in shorter development time and smaller code base), whereas there is no equivilent framework to do that in C. > If you really want more control over code generation (not forgetting > that just writing out what you want by hand is often the simplest > option in practice!) then I think C++ templates are a dead end---far > better to make the object language the same as the target language, > as in MetaOcaml and similar. Indeed, a unified approach is easier to use, though for performance the code generation needs to be tuned at runtime, a task C++ templates certainly cannot pretent to approach. Andrew Lenharth ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners