From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id D7871BC0C for ; Wed, 3 Jan 2007 17:40:45 +0100 (CET) Received: from mta4.srv.hcvlny.cv.net (mta4.srv.hcvlny.cv.net [167.206.4.199]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l03Geigd017593 for ; Wed, 3 Jan 2007 17:40:45 +0100 Received: from [192.168.15.101] (ool-457da870.dyn.optonline.net [69.125.168.112]) by mta4.srv.hcvlny.cv.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTP id <0JBA0054WWZTS7U0@mta4.srv.hcvlny.cv.net> for caml-list@yquem.inria.fr; Wed, 03 Jan 2007 11:40:44 -0500 (EST) Date: Wed, 03 Jan 2007 11:43:46 -0500 From: Serge Aleynikov Subject: Re: [Caml-list] Question on writing efficient Ocaml. In-reply-to: <200612290207.19141.jon@ffconsultancy.com> To: Jon Harrop , oversby@hotmail.com Cc: caml-list@yquem.inria.fr Message-id: <459BDD42.8080101@hq.idt.net> MIME-version: 1.0 Content-type: multipart/mixed; boundary="Boundary_(ID_ThCKxqQOWgV5TMFfL+IzSw)" References: <200612290207.19141.jon@ffconsultancy.com> User-Agent: Thunderbird 1.5.0.9 (Windows/20061207) X-Miltered: at discorde with ID 459BDC8C.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; ocaml:01 ocaml:01 erlang:01 cpp:01 cpp:01 erlang:01 -unsafe:01 verbose:01 unboxing:01 iter:01 flatten:01 argv:01 generality:01 polymorphism:01 optimising:01 X-Attachments: type="application/x-gzip" name="nqueens.tar.gz" name="nqueens.tar.gz" This is a multi-part message in MIME format. --Boundary_(ID_ThCKxqQOWgV5TMFfL+IzSw) Content-type: text/plain; charset=ISO-8859-15; format=flowed Content-transfer-encoding: 7BIT Here's a slightly modified version of the algorithm that finds all solutions for the N-Queens problem implemented in C/C++/OCaml/Erlang. $ wc -l nqueens.{c,cpp,ml,erl} 93 nqueens.c 109 nqueens.cpp 61 nqueens.ml 61 nqueens.erl On my host timing is as follows: $ make run # 8 queens, 1000 times # Program Time ./queens_c: 0.96 ./queens_cpp: 1.16 ./queens_ml: 0.77 ./queens_ml.bcode: 22.55 erl -noshell -run nqueens main 8 1000 -s erlang halt -- : 8.56 Note that Erlang code is natively compiled. Without the native flag it gives about the same time as queens_ml.bcode. I also have a parallel version of Erlang implementation of this algorithm that in SMP mode scales linearly with the number of CPUs, but since this "number crunching" problem domain is not for meant for Erlang, I haven't included it in this email. If we use optimization (-O3 for C/C++ and -unsafe for OCaml) the timing picture is slightly different. Serge Jon Harrop wrote: > On Thursday 28 December 2006 11:42, Ian Oversby wrote: >> I've written some Ocaml code to solve the problem of placing 8 queens on a >> board so that none of them attack each-other. > > Your program is very verbose: many times longer than is necessary. Most of > this verbosity can be attributed to performing various boxing, unboxing and > allocation tasks that are superfluous and simply slow the code down. However, > profiling suggests that you're also using a different algorithm in OCaml as > the core functions are called many more times in the OCaml than in the C++. > > Contrast your code with a minimal program to solve the n-queens problem in > OCaml: > > let rec unthreatened (x1, y1) (x2, y2) = > x1 <> x2 && y1 <> y2 && x2 - x1 <> y2 - y1 && x1 - y2 <> x2 - y1;; > > let rec search n f qs ps = > if length qs = n then f qs else > iter (fun q -> search n f (q::qs) (filter (unthreatened q) ps)) ps;; > > let ps = rev (flatten (init n (fun i -> init n (fun j -> i, j)))) > search n f [] ps > > This program starts with a list of all board positions "ps" and simply filters > out threatened positions as queens are added. Performance is poor compared to > your C++: > > 0.625s C++ (130 LOC) > 4.466s OCaml (28 LOC) > > My first solution is written for brevity and not performance so it is still 3x > slower than the C++: > > The core of the program is just this: > > open List;; > > let print_board n queens = > for y=0 to n-1 do > for x=0 to n-1 do > print_string (if mem (x, y) queens then "Q" else ".") > done; > print_newline() > done; > print_newline();; > > let rec fold_n2 f accu x y n = > if y=n then accu else > if x=n then fold_n2 f accu 0 (y + 1) n else > fold_n2 f (f accu x y) (x + 1) y n;; > > let unthreatened (x1, y1) (x2, y2) = > x1 <> x2 && y1 <> y2 && x2 - x1 <> y2 - y1 && x1 - y2 <> x2 - y1;; > > let rec search n f queens () x y = > if for_all (unthreatened (x, y)) queens then > if length queens = n - 1 then f ((x, y) :: queens) else > fold_n2 (search n f ((x, y) :: queens)) () (x + 1) y n;; > > then I wrote this to search once and print and then search a given number of > times (for benchmarking): > > exception Queens of (int * int) list;; > > let _ = > let n = 8 in > let f qs = raise (Queens qs) in > (try search n f [] () 0 0 with Queens qs -> print_board n qs); > match Sys.argv with > | [|_; reps|] -> > for rep=2 to int_of_string reps do > try search n (fun _ -> raise Exit) [] () 0 0 with Exit -> () > done > | _ -> () > > The "fold_n2" and "search" functions are both polymorphic folds. The former > folds over a square array and the latter folds over all solutions to the > n-queens problem. The generality of the "search" function is not used in this > case, as I just print the first solution found and bail using an exception. > > On my machine, 1000 solutions takes: > > 0.625s C++ (130 LOC) > 1.764s OCaml (30 LOC) > > So OCaml is >4x more concise but 2.8x slower than C++. > > Profiling shows that a lot of time is spent in List.for_all. We can roll this > ourselves to remove some polymorphism and improve performance: > > let rec unthreatened x1 y1 = function > | (x2, y2) :: t -> > x1 <> x2 && y1 <> y2 && x2 - x1 <> y2 - y1 && x1 - y2 <> x2 - y1 && > unthreatened x1 y1 t > | [] -> true;; > > let rec search n f queens () x y = > if unthreatened x y queens then > if length queens = n - 1 then f ((x, y) :: queens) else > fold_n2 (search n f ((x, y) :: queens)) () (x + 1) y n;; > > This gets the time down to: > > 1.159s OCaml (33 LOC) > > We can continue optimising by removing some generality. Let's turn the "fold" > into an "iter" so that "search" becomes monomorphic: > > let rec iter_n2 f x y n = > if y if x=n then iter_n2 f 0 (y + 1) n else > (f x y; > iter_n2 f (x + 1) y n);; > > let rec search n f queens x y = > if unthreatened x y queens then > if length queens = n - 1 then f ((x, y) :: queens) else > iter_n2 (search n f ((x, y) :: queens)) (x + 1) y n;; > > Performance is improved even more, at the cost of generality. > > 0.827s OCaml (34 LOC) > > So OCaml is now only 30% slower whilst still being ~4x more concise. > --Boundary_(ID_ThCKxqQOWgV5TMFfL+IzSw) Content-type: application/x-gzip; name=nqueens.tar.gz Content-transfer-encoding: base64 Content-disposition: inline; filename=nqueens.tar.gz H4sIAIvam0UAA+1be3PbOA7vv9WnwPqSVkpsx8qzY9fZzfZxk8y0t93k5mYuzXgUmYmVypJX kpNm2/SzHwCSEiXbeez2MTcjzG5tkSAIggD4A+VEf0yFiNK2/+jbUQdpe3OTP3e2d+jTdTc6 /Iy0iU+P3M7W5vbOVmdjZ+MRPu90th9B5xvqlNM0zbwE4FEqknNxC99QXIaT76HQ96W1Ffht mkziVHThVXA+ykA6BEyS+DQUY0jj8FIkYMEK7E2zUZx0AQ7JVrAXiuso+BBfwnM23i+jP9rB MGtHItsl9pdehkKRNjqtl8JvraMXUPuLeDwJQuw6931o/ccLQ3imvRBasVJg4CPvmvWPIPLD 6VDgHNkwiNuj3XJTGJxSm4W7mAU+BFG2Aqexlwx7RhPQKgZ+PMWvfeiUu7zTOMnEEPXkrux6 IobijPtO4zjsWRba6DAOp1kQRxBPs8k0I82syzgYqmebuIfBuMnDxt7HAU7Y5PHYnE5C79oB 65NF1iCOoAkXPX7if3LtVldla3AG9k9asSdPChlSBNEkQUFndmO59Wy41mgWMpxeznMWJ2AH fbcHATzvk4I9WF0NnGLw+hCHBsYQ3fM+atxHUKFPSadnwy5UBOdiLvqdHlzAcy3loiqlJCkl OTbv6HHQck+g3wcc8TM03jUAJ1lpVOa4matReTk3tyxX9mnb9/Vm0i4UTrSbt/esG3aQIxGG Ke1bEl+hkaI4g2yUCAyBCMWQu7AzpN6ZKDtLUHIM0ysWWCvXnZxEGYbNIr9fnMDnz6h/WvS1 8i6H+IKWIYQoEdk0iVBrsq1UwGzCmVuuXucbL4jAC8/jJMhGY/AxenF9ifCnSRpcivC6CA0Z x/cIDW2IxXawLzg4tSFoM3R4zLhQYRItRRuLjS9VcapOR/3Fap0Zh1RxzqPzReiwLHugCFMx M14ZY+541GfVnXHPG7K4tFkQsRW95Nxvgj/ykhX6fnl8okyWmzgLxiI1JghYqGlrbrDUgkkg rdk1rKEj4t+pd445GgMQJaPqlPyOg0wkHuXB9ASO1SxT+YwR1JRadU6KtSg/crnhRs9NW0jU By+LA5tHuWoUL4H7pHq7sE7hXjCuoxd3lUQdnAXzRpl5g5nRc+S0bALklpPQyjG9a5+Rk9IJ 4rBfx77crjT4U8Rn1OE4uflURkTJZp7G7ChlV9NjnjpwhG673SU6hRH5NAhKPql36SjOvBB3 aQj5TvBGVE6DG17lWSKEzAqOXLbanQ4F948GIjX9EIo08pp8O3B7F/7f3tnM8f/6lov4f319 u1Pj/+9BJv5/25qL/b8J9F9dVdDfcEAT/E8mD4H/pTbx0RcT1MPyQy9NYY/Scxcm09MQIT/2 d7uSg060Tzc9zff2HU+tQIhk7+Y5V/XyOfxSo5k3+TnLrRIMdKF0+GN6t3GAk6d2WzMaWdru 4FNpmDoQbDWF05RnlB2JK5rtGEXi0fZJJnamL1pFbIahCEUm4PhEFUNgMDI4M5ZjonTu+12k IsOK5wWrRuLK5ZOBn7mCMliZPumDJR+mh+CxdUmuUR5P0NaAbxWwUvAV4srtORQ3WvMqsCz4 UOPvmUX/SwI8h2B7fGWz1yDPzR31nzJ6tzt3vHKmufXd36rpaAVfra5bIOwv1HYM3f8f6jva gDsquwJv8Y7KPGI7Cwq+IF1Y8+UuYjifrvlusdr/TZ2Xr6+UU4oFyvWVCrgHFG3SaA+u13Q4 3qM002r/vRpsUREmRZazWu6C8wqwr1yBlUswfQ7cVoE9rAR7YA12SxGmHenWwqiowCBLrg2r Gams01tciDEpyFE553rzeJRzdCq9c0qy0t4tqsuU1GLSWZfDWMv8EcgTBJ4IPuVBL/prl2wa fo3DryFtPt2F/zcK/O9udjYJ/7v4UeP/70D2D7z/j31vHMaTrI3/Q2sa0f0cFA5plAPjEAc6 FmJa46jGFJKIM+gA2AZA4z5UeAXPWMvs4NAMonMSxJIUlPMgynNYkZxUSveTKurEyKeDXnNm IxGBLRMBnS5gFzj+N04GbZkTQKE6+GkOjKXkhacZuJDFqM4wnhmLgA45hnEkKphugBVBGEQ4 sXObvFKmqgpnSKcuC8GUwac2y2i5M1IKFdIsIcPaaBuvbSOMc+gcl7ZpvEPRbJnGClTBXHk9 89ZkzWedy+aoIwbV+MmsWPTusj6JF6Aurz4Gmdov5Q0Ibfh2nv2BbHc2jXxyHBb5GZdzRcPJ ImSM+dTa5aEXwD7pJQKuBKDv4cyARWaG3hlOxxG5YEkqWY1shp8Xzh1SEWmiOBSNYZkFCMzW MCiDP+MITxyFP3+eneA0xeOZZ2nJWRy+7MQnnHDBBMPAO4+j+VLvQSjVMKh9gfiqMLXKMvOC L9Dhd6sHysU8R/17ReNYHp3GtB0HrhDB5hyf4cyjba8oas9avUxol2yEllEbyDYqoL5jyM+S qZBGD5SvoPxbck0uXyMFXPc0mie0qvQtNmQ0S0L9kfA/oMUJwFKMYrWSatEcUj3Okgz3MeWf J96YEyTF0OF12g4omXoYCJdC5zqwOHBoJxHfpeRHe0niXbdDEZ1nI6BhBPzI1VzMoTommfk5 5SSSky+O5AxQSCUp/QX0O5ueKnv4Go+dyBuL9qmX8pdc1zYiPEfrSiQoQbh5Wi9pW4BnSkLx mc5/hTC3JIvGFJhaW2JXW2KREATdbGi3KqtA3IWs9dtlbShZnaqs4sSzFR6HsvKlo0SyDOOS oYtU2+U36GYfYXTtp8pLfA4bNiIGJ31UnbdTEkHhK/M1x2lvwYlQcZ/5ANw8fHv1K49HBf4X ybcrAO7A/1udnS2F/zd3ttY35f3/Vo3/vwctL98T/yPjQ/A/ss/F/9ie43/0OR9WI48PF8MR ien3adSViTjBQiCK05EI8UsyjTQn38nAM3DRhaCVEqOHOW/khRkKsFrjeDgNha24nbbV8lh/ +2lV26fUKT5O6H7vmKSuubpSX9s4wU4rjOOJ/bYJ+N/rJuwxYnpNDXuYj7hzv9y552LKfE2t e04TJMeqm/O4KJQaB1GoR/Kwsqj4A44N4i4ykRJoFXmrVSAFTMk+ZrY4FHQUvNQAKmW80OZN Oxrxkf+U31I/lcBE10hpcI7wjqCMxy8z+E6ChmOKRuAwbFvqZw+DX+l+rgkDfXHRVDCqDwP9 OoNhzoeeHgJqCBRDTFYJCuIuHi8I2uzGl9azK7ryPtbsJ05TXgrhHPQG3mZ5qlHbTloUkToa bSBxrCFy/aoBgCL3T/D0ix429tkV33LT2OZduOLdPuooMFhElNHumZouplk9DkgP3qCDfh+F SpXQmAO6b0xtKqWc3l3azJA9ILkGzcilwuxhYsmc80ZQe1veVi3DPhcSMpIAPYxuyvcr1+QW /yhn8Cu5CpoBrXdQ2OAtqbrf7x/07qvXsllmSdlAsgdl2cVmYSdfluuWA9kyb95lrrhktaVW sHgGupgvz9IqT8Gz2vutA2OyZS65ZmUXllE1lW22rrqcHSpX9pa6RUWmI3khjQGOwcXbJsuT dEDo/lwk9ltKNMUjDyg3qcE4KYLNjpJJX3Uss9NTHHAa7o69D2KQTSeYgd/Sj1lkNLyWdbX0 yLyCcW2VK+QkzUI+XQ43CZCiX0kJjOsO5XWzvCjlnOnmq3xNs6npjIBWePDLVYEHv9CF7LEW eKL8ln13gV5FKnupf6aznyczvbYD5JuOjeWt26nIqumBdtMQPB2XZdJ6IU9XtDwzB9IitS6o tp7mntrC4rzqY12kHEzKki6GGxCfyZK8WjyjOD2PenmQl7/7tH0cwocvcDLNhsfgwlSjzg41 +eGLXPXbkjBJ76tVgxcNUcVYvYXVXnvLWKn1oNqx2DnnmXVVvYHCbStCMb9Asvjt0fEhWlu5 6Qnvw0xzExqdBr2FmdMjpz9x8tgNgzSziUUGKj8agcvPOuSVi6qUQF2DLM5jWwmpNmth1XYl FB3vR4PXmv42vcE8fYZo/FvOcUf9564Xf/+xuY2FH/ZubWzU9d/3oKO93//56ugQ87P+5ZXx E6zi9UvxrX3qx8OiXDsV3tiyvBDP1iVbCXMsS8voFr/vsh4Xf++x9Jxe7iz9UjBOJl3zp2DI nP9CbIZ5HHaN10TW49J7pDncUuU5Y3yD21xQt1SNPi7VqUvP6Qdjwou61uNWMobWmbFwWGn7 4xX8N7asw/3/vuo/s47237w67FORallYveKoX+hKjf74BBrtNb3+hvkwmZiP47D8JJfTgPeV U61xR6m8ZJNKDqlLOjmVohlaLWhAD8U+HsaGcOGPYhQroLG05HffZ++zRg9scemF0KD7QGhN AHtmxGNJsb77xIXPkIohCoCnawhrw7V0rb0Ca2uTpzwXzUAXefVZUlNNNdVUU0011VRTTTXV VFNNNdVUU0011VRTTTXV9HXof8q6w8cAUAAA --Boundary_(ID_ThCKxqQOWgV5TMFfL+IzSw)--