caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Jon Harrop <jon@ffconsultancy.com>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Comparison of OCaml and MLton for numerics
Date: Sun, 10 Jun 2007 13:10:51 +0100	[thread overview]
Message-ID: <200706101310.52165.jon@ffconsultancy.com> (raw)
In-Reply-To: <46641BC1.9090305@cs.umd.edu>

On Monday 04 June 2007 15:03:45 Mike Furr wrote:
> I have already started using your suggestion of
> including a 1-element constructor for my trees as it does indeed seem to
> give a noticeable speedup.

I have another suggestion for you:

OCaml's exceptions are very fast and can be used to accelerate many no-op 
paths. For example, when inserting an element into a set that already 
contains that element, the current Set implementation will reallocate every 
node on the path to the element even though the resulting set is identical. 
You can improve performance enormously for the no-op case, avoiding this 
allocation, by raising an exception as soon as you realise the element is 
already present and catching it on the outside of the "insert" function to 
return the input set as the output.

A related optimization is to use physical equality to avoid allocation 
(copying) when the output will be equivalent to the input.

Insertion sort is a good example of this. A naive insertion sort may be 
written:

let rec insertion = function
  | [] -> []
  | h1::t ->
    match insertion t with 
    | h2::t when h1>h2 -> h2::insertion(h1::t)
    | t -> h1::t

but this copies already-sorted tail lists unnecessarily. It can be optimized 
by returning the original "list" when possible:

let rec insertion = function
  | [] -> []
  | h1::t as list ->
    match insertion t with 
    | h2::t when h1>h2 -> h2::insertion(h1::t)
    | t' -> if t==t' then list else h1::t'

More generally, applications such as term rewriters can benefit from 
specialized higher-order functions that incorporate this physical equality 
based optimization. My term rewriter used an "id_map" function:

val id_map : ('a -> 'a) -> 'a array -> 'a array

which returns the input when possible. This can avoid a lot of unnecessary 
allocation during rewriting and doubled the performance of the whole program!

My implementation of id_map was as follows:

let id_map f a =
  if a = [||] then a else
    let b = ref a in
    try
      for i = 0 to length a - 1 do
        let e = f a.(i) in
        if e != a.(i) then begin
          b := Array.copy a;
          (!b).(i) <- e;
          raise (Start (i+1))
        end
      done;
      a
    with Start start ->
      let b = !b in
      for i = start to length a - 1 do
        let e = f a.(i) in
        if e != a.(i) then b.(i) <- e;
      done;
      b

Another useful function along similar lines combines a map and a fold_left 
into one operation:

let id_map_fold_left f x a =
  if a = [||] then x, a else
    let r = ref x in
    let b = ref a in
    try
      for i = 0 to length a - 1 do
        let r', e = f !r a.(i) in
        r := r';
        if e != a.(i) then begin
          b := Array.copy a;
          (!b).(i) <- e;
          raise (Start (i+1))
        end
      done;
      !r, a
    with Start start ->
      let b = !b in
      for i = start to length a - 1 do
        let r', e = f !r a.(i) in
        r := r';
        if e != a.(i) then b.(i) <- e;
      done;
      !r, b

I used the "map" to rewrite one expression into another and the "fold_left" to 
accumulate the state of the interpreter.

I'm sure you can think of many combinations along similar lines. I think a new 
standard library would do very well to work such optimizations into the 
existing framework (e.g. the Set module) and providing some useful additional 
functions would be nice too.

I'll write articles about these sorts of things for the OCaml Journal.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


  parent reply	other threads:[~2007-06-10 12:17 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-31  5:50 Yuanchen Zhu
2007-05-31  6:17 ` [Caml-list] " Jon Harrop
2007-05-31  6:32   ` skaller
2007-05-31  7:31   ` Yuanchen Zhu
2007-05-31  9:08     ` Jon Harrop
2007-05-31  9:22       ` Yuanchen Zhu
2007-05-31 10:27         ` Jon Harrop
2007-05-31 21:30           ` Alain Frisch
2007-06-01  1:22             ` skaller
2007-06-01  1:36               ` Erik de Castro Lopo
2007-06-01  2:21                 ` skaller
2007-06-01  2:49                   ` Erick Tryzelaar
2007-06-01  3:05                     ` skaller
2007-06-01  5:30               ` Alain Frisch
2007-06-01  5:39                 ` Jon Harrop
2007-06-01  6:36                   ` Yuanchen Zhu
2007-06-01  8:09                 ` skaller
2007-06-01  8:53                   ` Richard Jones
2007-06-01  8:59                     ` Richard Jones
2007-06-01  9:22                       ` Stephan Tolksdorf
2007-06-01  9:49                         ` Richard Jones
2007-06-01  9:32                       ` Stephan Tolksdorf
2007-06-01 10:02                     ` skaller
2007-06-01 11:29                 ` Yaron Minsky
2007-06-01 11:43                   ` Erik de Castro Lopo
2007-06-01 11:58                     ` Jon Harrop
2007-06-01 13:49                       ` Julien Signoles
2007-06-01 14:18                         ` Stephen Weeks
2007-06-01 14:43                           ` Julien Signoles
2007-06-01 14:57                           ` Brian Hurt
2007-06-01 15:40                             ` Alain Frisch
2007-06-01 15:58                               ` Brian Hurt
2007-06-01 16:25                                 ` Markus Mottl
2007-06-01 16:47                               ` Jon Harrop
2007-06-01 23:26                             ` skaller
2007-06-01 23:49                               ` Brian Hurt
2007-06-02  3:26                                 ` skaller
2007-06-01 12:40                     ` Erik de Castro Lopo
2007-06-01 13:56                       ` Julien Signoles
2007-06-01 11:49                   ` David MENTRE
2007-06-01 14:41                     ` skaller
2007-06-01 16:52                       ` Jon Harrop
2007-06-01 23:33                         ` skaller
2007-06-01 16:14                     ` Markus Mottl
2007-06-01 16:46                       ` Jon Harrop
2007-06-01 17:13                       ` Jon Harrop
2007-06-04 14:03                         ` Mike Furr
2007-06-04 14:39                           ` Jon Harrop
2007-06-04 15:33                             ` Mike Furr
2007-06-04 18:08                             ` skaller
     [not found]                               ` <9d3ec8300706041518y115d22bdsa120d4010261d841@mail.gmail.com>
2007-06-04 22:19                                 ` Fwd: " Till Varoquaux
2007-06-04 23:40                                   ` Jon Harrop
2007-06-05  2:24                                   ` skaller
2007-06-04 22:44                               ` Pierre Etchemaïté
2007-06-05  1:42                                 ` Jon Harrop
2007-06-05 10:30                                   ` Jon Harrop
2007-06-10 12:10                           ` Jon Harrop [this message]
2007-06-10 12:58                             ` skaller
2007-06-01 14:15                 ` Stephen Weeks
2007-06-01 14:37                   ` Brian Hurt
2007-06-01 14:39                   ` Eric Cooper
2007-05-31  9:24       ` Yuanchen Zhu
2007-05-31 10:25       ` Loup Vaillant
2007-05-31 10:30         ` Jon Harrop
2007-05-31 12:12     ` skaller
2007-05-31  7:11 ` Daniel Bünzli
2007-05-31 15:15 ` Christophe Raffalli
2007-05-31 15:23   ` Jon Harrop
2007-05-31 15:35     ` Christophe Raffalli
     [not found]       ` <604682010705310923o5a1ee0eiee5ae697da9d3c60@mail.gmail.com>
2007-05-31 20:14         ` Stephen Weeks
2007-05-31 15:16 ` Christophe Raffalli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200706101310.52165.jon@ffconsultancy.com \
    --to=jon@ffconsultancy.com \
    --cc=caml-list@yquem.inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).