From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id EF8EBBC37 for ; Mon, 9 Nov 2009 05:23:51 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvoDAPIn90rNm0EPgWdsb2JhbACQOItAAQEWJL1NhD4EgWg X-IronPort-AV: E=Sophos;i="4.44,705,1249250400"; d="scan'208";a="36395098" Received: from virginia.nps.edu ([205.155.65.15]) by mail2-smtp-roc.national.inria.fr with ESMTP; 09 Nov 2009 05:23:51 +0100 Received: from Adric.ern.nps.edu ([172.20.216.170]) by virginia.nps.edu with Microsoft SMTPSVC(6.0.3790.3959); Sun, 8 Nov 2009 20:23:47 -0800 Received: by Adric.ern.nps.edu (Postfix, from userid 760) id 9330E1727F; Sun, 8 Nov 2009 20:23:28 -0800 (PST) From: oleg@okmij.org To: caml-list@inria.fr Subject: MetaOcaml and high-performance [was: AST versus Ocaml] Message-Id: <20091109042328.9330E1727F@Adric.ern.nps.edu> Date: Sun, 8 Nov 2009 20:23:28 -0800 (PST) X-OriginalArrivalTime: 09 Nov 2009 04:23:48.0092 (UTC) FILETIME=[6ED06BC0:01CA60F4] X-Spam: no; 0.00; oleg:01 metaocaml:01 ocaml:01 metaocaml:01 restrictive:01 compiler:01 ocaml:01 compiler:01 pepm:01 generative:01 gaussian:01 tobias:01 high-level:01 high-level:01 low-level:01 Jon Harrop wrote: > I did some experiments with MetaOCaml. Firstly, it is x86 only and not x64 > which means poor floating-point performance, many times slower than HLVM's. > The language itself is also very restrictive, e.g. you cannot generate > pattern matches dynamically so you cannot leverage the pattern match > compiler, which is a huge loss. In essence, effective use of MetaOCaml > requires writing in continuation passing style which OCaml and, therefore, > MetaOCaml do not handle well (closure invocations are slow in OCaml and CPS > is not optimized back into anything sane). So I do not consider MetaOCaml to > be a viable option for performance users. A few clarifications seems to be in order. First of all, the original poster asked about _offshoring_ with MetaOCaml. When the generated code is run with offshoring, a C of Fortran file is created, which can be compiled with your favorite compiler and dynamically linked back into the running OCaml program. Alternatively, you can use the generated C/Fortran code as it is, as a part of a C/Fortran project. We did exactly the latter in our FFT project: we used MetaOCaml to create C files for FFT kernels, and plugged the files into the FFTW benchmarking framework, which is pure C. It worked as expected. Because offshoring produces a portable C or Fortran code file, you can use the code on 32 or 64-bit platform. The reason the native MetaOCaml without offshoring does not work on amd64 is because at that time OCaml didn't emit PIC code for amd64. So, dynamic linking was impossible. That problem has long been fixed in later versions of OCaml. Offshoring is a good way to get around it, thanks Jan Kybic. Offshoring could just as well produce Verilog or LLVM code. Alas, we didn't get around to exploring that idea. Regarding continuation-passing style (CPS): if your code generatOR needs let-insertion, then the generatOR _may_ need to be encoded in CPS. The generatED code is _not_ in CPS; it is in the `conventional', so-called `direct style'. Even if we assume that CPS code is difficult to compile efficiently (which is not certain: in CPS, all `important' function calls are tail-calls, and OCaml is very good with tail-calls), that difficulty affects only the code generator rather than the generated code. Most of the time it is the generated code that is performance-critical. One may say that writing the generator in CPS is a bother. I have heard such objections. Please see our PEPM2009 paper about a way to address it http://okmij.org/ftp/Computation/Generative.html#circle-shift and write generators in the conventional style. Please see the example of writing a flexible Gaussian Elimination code, paying no penalty for abstractions. Fortunately, some people have considered MetaOCaml to be a viable option for performance users and have reported good results. For example, Tuning MetaOCaml Programs for High Performance Diploma Thesis of Tobias Langhammer. http://www.infosun.fmi.uni-passau.de/cl/arbeiten/Langhammer.pdf Here is a good quotation from the Introduction: ``This thesis proposes MetaOCaml for enriching the domain of high-performance computing by multi-staged programming. MetaOCaml extends the OCaml language. ... Benchmarks for all presented implementations confirm that the execution time can be reduced significantly by high-level optimizations. Some MetaOCaml programs even run as fast as respective C implementations. Furthermore, in situations where optimizations in pure MetaOCaml are limited, computation hotspots can be explicitly or implicitly exported to C. This combination of high-level and low-level techniques allows optimizations which cannot be obtained in pure C without enormous effort.''