From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.4 required=5.0 tests=SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 6525DBC69 for ; Thu, 31 May 2007 07:50:07 +0200 (CEST) Received: from wr-out-0506.google.com (wr-out-0506.google.com [64.233.184.226]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l4V5o6tg011591 for ; Thu, 31 May 2007 07:50:06 +0200 Received: by wr-out-0506.google.com with SMTP id i28so46514wra for ; Wed, 30 May 2007 22:50:06 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth; b=KTTACytRJ72NcMgBBhsVvN/r6gkBKkHantzm6avEjJuLzY4164+vALfkwOdZxlxPjXUVUFe/3PNETDLLhK1LxA4PNTARKY/K2RbHIgaSxiEcnyLu7+K/hGGS+FZ4DshTdAmXIaZh7SAt4n9IIowkXOjpF5MjtYLpTCYGAc27rio= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:sender:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition:x-google-sender-auth; b=BZRL8XvPblPF2A/YUVXPKMTFYrzzviBBt2WgGChVa/bUeAhIuFJZIdqQqKshWxcL4FBOqcwtkrG2eoq3pAxF2ULM3xAVozADuQsIzKHqaCAcKifDXImAbwvnTY3cP8BVQiP2nGsnuADlyxcUqEjmZDCWJ29YSqBU9nZCb4WxbWk= Received: by 10.78.160.2 with SMTP id i2mr126835hue.1180590605164; Wed, 30 May 2007 22:50:05 -0700 (PDT) Received: by 10.78.90.19 with HTTP; Wed, 30 May 2007 22:50:05 -0700 (PDT) Message-ID: <5195a210705302250u6a9e5adey4ed857480f9e5cd8@mail.gmail.com> Date: Thu, 31 May 2007 13:50:05 +0800 From: "Yuanchen Zhu" Sender: yuanchen.zhu@gmail.com To: caml-list@yquem.inria.fr Subject: Comparison of OCaml and MLton for numerics MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Google-Sender-Auth: 963145956a5a32d4 X-Miltered: at discorde with ID 465E620E.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; ocaml:01 ocaml:01 sml:01 trivial:01 prototyping:01 ocamlopt:01 -inline:01 -unsafe:01 translated:01 uglier:01 numerically:01 ocamlopt:01 img:98 img:98 1.0:98 Greetings! I've been using Ocaml for implementing computer graphics algorithms. One of the algorithms that I implemented was: Li etal, "Compressing and Companding High Dynamic Range Images with Subband Architectures" For intellectual entertainment, I next converted the code to SML, and got some interesting performance numbers on Ocaml and MLton. Since the implementation is not a trivial micro benchmark, I thought I'd share it with the group. I also have some questions regarding the future of ocaml. The most expensive part of the algorithm is image convolution. The convolution kernel is separable, so the actual computation is done for one horizontal/vertical strip at a time, i.e., it is 1-d convolution. For rapid prototyping and easier maintenance, I used high-order functions a lot in my code. For example, the horizontal convolution routine looks something like the following: let hconvolve kern (img:t) r = let ac y x s (kx, ky) = s +. ky *. getReflected img y (x + kx) 1.0 r in mapi (fun y x _ -> Array.fold_left (ac y x) 0.0 kern) img Where the mapi function is basically a 2D version of Array.mapi. The code is compiled using ocamlopt with arguments: -inline 4 -S -ffast-math -unsafe. Next I translated the Caml code to Standard ML and compiled using MLton with no additional arguments. The performance numbers were as following: Ocaml (unsafe) : user: 39.674s, real: 41.356s MLton (safe): user: 17.981s, real: 21.968s Note that I didn't find an option that turns off array bound-checking for MLton, so the MLton version is safe. Clearly polymorphic function application in the inner-loop is killing Ocaml. So I recoded the convolution function using for-loops. The code is now much uglier: let hconvolve kern (img:t) r = let sup = Array.length kern - 1 in let img' = create (size img) in for y = 0 to height img - 1 do for x = 0 to width img - 1 do let s = ref 0.0 in for i = 0 to sup do let (kx, ky) = kern.(i) in s := !s +. ky *. getReflected img y (x + kx) 1.0 r done; img'.(y).(x) <- !s; done done; img' The new running time is: Ocaml (unsafe) : user: 21.477s, real: 23.366s which is much in line with MLton: MLton (safe): user: 17.981s, real: 21.968s Although note that the MLton version has array-bound check enabled and used the two-line high order function version of hconvolve. So the moral of the story: To use Ocaml for numerically intensive work, code in C style in the inner loops. This brings me to the next question: is there any plan to implement type specialization optimization for ocamlopt? For numerics, this is really crucial if you want write both in an elegant functional style and get good performance. Also, I remember reading somewhere that the current code base of Ocaml is ill-suited for implementing this kind of optimization. May I ask what exactly needs to be done to the current code base in order to support that? I have some compiler-writing background and this sounds like an interesting project to work in my past time. Regards, Yuanchen