caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Jacques GARRIGUE <garrigue@kurims.kyoto-u.ac.jp>
To: postmaster@jdh30.plus.com
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Does Caml have slow arithmetics ?
Date: Thu, 08 Jul 2004 14:05:27 +0900 (JST)	[thread overview]
Message-ID: <20040708.140527.70987434.garrigue@kurims.kyoto-u.ac.jp> (raw)
In-Reply-To: <200407072226.36912.postmaster@jdh30.plus.com>

From: Jon Harrop <postmaster@jdh30.plus.com>

> Funny you should say that. I've only recently been trying my hand at such 
> low-level optimisations in OCaml and, to my surprise, I'm terrible at it (no 
> comments, please!).
> 
> Specifically, I recalled a post that ocamlopt doesn't do CSE. On the basis of 
> this I thought that I could optimise my original code:
> 
>       let pivot = (first + last)/2 in
>       let sum1=helper (scale+1) (2*entry) first pivot in
>       let sum2=helper (scale+1) (2*entry+1) pivot last in
>       Array.unsafe_set data2 entry (sum1 -. sum2);
>       sum1 +. sum2
> 
> by performing some CSE manually, for example:
> 
>       let pivot = (first + last)/2 in
>       let (sum1, sum2) =
>         let scale=scale+1 and entry=2*entry in
>         (helper scale entry first pivot, helper scale (entry+1) pivot last)
>       in
>       Array.unsafe_set data2 entry (sum1 -. sum2);
>       sum1 +. sum2

Interesting.
Actually, I suppose that this kind of situation could be optimized,
by working at the Clambda level. This is a bit unfortunate, as most
"non-local" optimizations are expected to be done at earlier stages,
but would be necessary to cope with inlining too.

Yet, in this example sharing the computation for scale and entry is
useless: addition and multiplication by a power of 2 cost virtually
nothing anyway.

> I also tried applying part of the arguments to Array.set to avoid
> the creation of a pair:
> 
>       let pivot = (first + last)/2 in
>       let set=Array.unsafe_set data2 entry in
>       let scale=scale+1 and entry=2*entry in
>       let sum1=helper scale entry first pivot in
>       let sum2=helper scale (entry+1) pivot last in
>       set (sum1 -. sum2);
>       sum1 +. sum2
> 
> Contrary to my expectations, these revised versions are 30 and 70%
> slower than the original, respectively.

This is a well-known behaviour, building the closure "set" is costly
for two reasons: a data structure holding it must be allocated
(causing eventually GC), and calls through a closure cost more than
direct ones (particularly if the "helper" function is known.)
In your case, there is even a 3rd reason: unsafe_set is not a real
function, but a primitive, which just generates the operations to
modify an array. By wrapping it in set, you make it a function.
And for some reason, ocamlopt does not inline set, leaving you with a
function call where there should have been none.

You may look at an abstraction of the code generated with the -cmm
option. For the above code it gives you:

(data
 int 3319
 "camlPivot__2":
 addr "caml_curry3"
 int 7
 addr "camlPivot__fun_75")

(function camlPivot__fun_75 (prim/73: addr prim/72: addr prim/71: addr)
 (if (!= (load unsigned int8 (+a prim/73 -4)) 254)
   (extcall "caml_modify" (+a (+a prim/73 (<< prim/72 1)) -2) prim/71 unit)
   (store float64u (+a (+a prim/73 (<< prim/72 2)) -4)
     (load float64u prim/71)))
 1a)

(function camlPivot__pivot_57
     (first/58: addr last/59: addr data2/60: addr entry/61: addr
      scale/62: addr helper/63: addr)
 (let
   (pivot/64 (+ (<< (/ (>>s (+ (+ first/58 last/59) -1) 1) 2) 1) 1)
    set/65 (app "caml_apply2" data2/60 entry/61 "camlPivot__2" addr)
    scale/66 (+ scale/62 2) entry/67 (+ (* 4 (>>s entry/61 1)) 1)
    sum1/68
      (app "caml_apply4" scale/66 entry/67 first/58 pivot/64 helper/63 addr)
    sum2/69
      (app "caml_apply4" scale/66 (+ entry/67 2) pivot/64 last/59 helper/63
        addr))
   (app (load set/65)
     (alloc 2301 (-f (load float64u sum1/68) (load float64u sum2/69))) set/65
     unit)
   (alloc 2301 (+f (load float64u sum1/68) (load float64u sum2/69)))))

You can even see another problem here: the function call to set requires
wrapping your float into a heap value, which introduces yet another
inneficiency.

Jacques Garrigue

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


      reply	other threads:[~2004-07-08  5:05 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-07-07  6:45 Diego Olivier Fernandez Pons
2004-07-07  8:14 ` Richard Jones
2004-07-07  9:13 ` Basile Starynkevitch [local]
2004-07-07 12:32   ` Diego Olivier Fernandez Pons
2004-07-07 13:00     ` Basile Starynkevitch [local]
2004-07-07 15:09       ` Olivier Pérès
2004-07-07 17:06         ` David Brown
2004-07-07 17:43           ` Olivier Pérès
2004-07-08  3:40             ` David Brown
2004-07-08 11:06               ` Olivier Pérès
2004-07-07 13:06     ` Richard Jones
2004-07-07 13:22     ` David Haguenauer
2004-07-07 15:48     ` Brian Hurt
2004-07-07 13:57   ` Evgeny Chukreev
2004-07-07 14:58     ` Xavier Leroy
2004-07-07 15:48       ` Evgeny Chukreev
2004-07-07 19:16       ` skaller
2004-07-08  3:44         ` David Brown
2004-07-08  5:50           ` skaller
2004-07-08  9:51           ` Andreas Rossberg
2004-07-08 12:03             ` Marcin 'Qrczak' Kowalczyk
2004-07-08 14:04             ` David Brown
2004-07-08 14:36               ` Luc Maranget
2004-07-08 15:11                 ` Alex Baretta
2004-07-08 15:49                   ` Luc Maranget
2004-07-09 14:06                     ` Alex Baretta
2004-07-09 16:20                       ` Markus Mottl
     [not found]                     ` <Luc.Maranget@inria.fr>
2004-07-09 17:54                       ` Norman Ramsey
2004-07-12  8:08                         ` Luc Maranget
2004-07-08 15:51                   ` Markus Mottl
2004-07-08 18:27                     ` skaller
2004-07-08 21:14                     ` [Caml-list] tail recursion and register poor Intel architecture Brandon J. Van Every
2004-07-08 21:35                       ` Brian Hurt
2004-07-08 23:01                         ` Brandon J. Van Every
2004-07-09  4:36                           ` Brian Hurt
2004-07-09  6:53                           ` Florian Hars
2004-07-09 14:44                             ` Brandon J. Van Every
2004-07-09  6:55                           ` skaller
2004-07-09 14:45                             ` Brandon J. Van Every
2004-07-09 16:09                               ` skaller
2004-07-10  9:19                                 ` [Caml-list] embedded OCaml Brandon J. Van Every
2004-07-11 23:11                                   ` Alex Baretta
2004-07-12  7:39                                     ` Brandon J. Van Every
2004-07-12 14:04                                       ` Alex Baretta
2004-07-12 18:48                                         ` Brandon J. Van Every
2004-07-12 23:22                                           ` Richard Jones
2004-07-13  6:39                                             ` Alex Baretta
2004-07-13  8:47                                               ` Brandon J. Van Every
2004-07-13  8:58                                                 ` Benjamin Geer
2004-07-13  9:47                                                   ` Brandon J. Van Every
2004-07-13  9:18                                                 ` Damien Doligez
2004-07-13  9:56                                                   ` [Caml-list] OCaml as business model Brandon J. Van Every
2004-07-13  9:59                                                     ` Richard Jones
2004-07-13 10:50                                                       ` Brandon J. Van Every
2004-07-13 11:20                                                     ` Damien Doligez
2004-07-13 12:01                                                       ` Brandon J. Van Every
2004-07-14  8:05                                                 ` [Caml-list] embedded OCaml I R T
2004-07-12 10:25                                     ` Christophe TROESTLER
2004-07-13  7:06                                       ` Alex Baretta
2004-07-08 17:12                 ` Tail calls (Was Re: [Caml-list] Does Caml have slow arithmetics ?) brogoff
2004-07-08 17:23                   ` Richard Jones
2004-07-12 17:07                   ` Xavier Leroy
2004-07-12 21:13                     ` skaller
2004-07-13  7:20                       ` Xavier Leroy
2004-07-08 15:00             ` [Caml-list] Does Caml have slow arithmetics ? Brian Hurt
2004-07-08 13:30         ` Alex Baretta
2004-07-07 21:26       ` Jon Harrop
2004-07-08  5:05         ` Jacques GARRIGUE [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040708.140527.70987434.garrigue@kurims.kyoto-u.ac.jp \
    --to=garrigue@kurims.kyoto-u.ac.jp \
    --cc=caml-list@inria.fr \
    --cc=postmaster@jdh30.plus.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).