Re: [Caml-list] Persistence

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Jon Harrop <jon@jdh30.plus.com>
To: "Don Syme" <dsyme@microsoft.com>
Cc: "Caml mailing list" <caml-list@inria.fr>
Subject: Re: [Caml-list] Persistence
Date: Mon, 21 Feb 2005 23:24:10 +0000	[thread overview]
Message-ID: <200502212324.11937.jon@jdh30.plus.com> (raw)
In-Reply-To: <5DCA48FADB33FF4D8C32A164DF24F2B0027899E2@EUR-MSG-03.europe.corp.microsoft.com>

On Monday 21 February 2005 20:17, Don Syme wrote:
> Could you send around the source code to this?  I intuitively knew that
> such performance results were likely, but had never had such a
> convincing example to wave at people.

Yes. The implementation of traveling salesman given in my book is another 
example which is several orders of magnitude faster than the C/C++/Fortran 
implementations given in all of the other text books that I have read.

This problem also happens to be a defacto standard test given to most natural 
science undergrads on their computing courses, which should be good for book 
sales. Tell everyone you know. :-)

> It would also be interesting to ask some C++ programmers how they would
> go about optimizing this code, for a couple of reasons
>
> (a) to see if they come up with anything faster; and

I managed to, after a while. However, doing this optimisation required me to 
have several things:

1. Estimates of run-time sizes of data structures.
2. Knowledge of the internals of the STL set implementation.
3. Knowledge of the complexities of various operations over STL sets.
4. Huge kahunas, as the resulting code is much more likely to be wrong.

The optimised C++ is also specialised by (1), whereas the OCaml code will 
always be efficient (because Xavier wrote the hard part ;-). Also, the C++ 
implementation consumes twice as much memory (which could be a real killer as 
many scientific computations use resources to the limit of the available 
computer).

Both the OCaml and the C++ can probably be optimised more, but I'm interested 
in problems where author-time is comparable to execution-time. Were I to 
continue optimising, I would move to hash sets, but these require an 
imperative approach which negates the principal advantage of the current 
OCaml implementation: persistence.

> (b) to see if they would ever even think of using
> sharing-of-complex-immutable-internal-data-structure-nodes as an
> optimization technique.

No way. :-)

Also (from the traveling salesman solution):

 (c) to see if they would ever even think of reinventing closures.

For people who just want a quick look, here is the OCaml code for the core 
function:

  let rec nth_nn =
    let memory = Hashtbl.create 1 in
    fun n (i, io) ->
      try Hashtbl.find memory (n, i)
      with Not_found -> match n with
        0 -> AtomSet.singleton (i, io)
      | 1 ->
          let nn = bonds.(i - 1) in
          if io = zero then nn else
          let aux (j, jo) s = AtomSet.add (j, add_i io jo) s in
          AtomSet.fold aux nn AtomSet.empty
      | n ->
          let pprev = nth_nn (n-2) (i, io) in
          let prev = nth_nn (n-1) (i, io) in
          let aux j t = AtomSet.union (nth_nn 1 j) t in
          let t = AtomSet.fold aux prev AtomSet.empty in
          let t = AtomSet.diff (AtomSet.diff t prev) pprev in
          Hashtbl.add memory (n, i) t;
          t in

$ time ./nth 30 1 <cfg-diamond >tmp

real    0m4.167s
user    0m3.940s
sys     0m0.000s

$ time ./nth.opt 30 1 <cfg-diamond >tmp

real    0m0.492s
user    0m0.450s
sys     0m0.000s

Here's the elegant C++ equivalent:

const AtomSet nth_nn(int n, int i, const vector<int> io) {
  const Map::key_type k=make_pair(n, make_pair(i, io));
  Map::const_iterator m=memory.find(k);
  if (m != memory.end()) return m->second;
  AtomSet s;
  if (n == 0) {
    s.insert(make_pair(i, io));
    return s;
  } else
    if (n == 1) {
      const AtomSet &nn=bonds[i-1];
      for (AtomSet::const_iterator it=nn.begin();
    it != nn.end(); it++) {
 int j = it->first;
 vector<int> jo = it->second;
 for (i=0; i<d; i++)
   jo[i] += io[i];
 s.insert(make_pair(j, jo));
      }
      return s;
    } else {
      const AtomSet
 &pprev = nth_nn(n-2, i, io),
 &prev = nth_nn(n-1, i, io);
      for (AtomSet::const_iterator it=prev.begin();
    it != prev.end(); it++) {
 const AtomSet &t = nth_nn(1, it->first, it->second);
 AtomSet t2;
 set_union(s.begin(), s.end(),
    t.begin(), t.end(),
    inserter(t2, t2.begin()));
 s=t2;
      }
      AtomSet s2;
      set_difference(s.begin(), s.end(),
       prev.begin(), prev.end(),
       inserter(s2, s2.begin()));
      s=AtomSet();
      set_difference(s2.begin(), s2.end(),
       pprev.begin(), pprev.end(),
       inserter(s, s.begin()));
    }
  memory[k] = s;
  return memory.find(k)->second;
}

With this C++, I get:

$ time ./nth 30 1 <cfg-diamond >tmp

real    1m31.178s
user    1m12.680s
sys     0m0.100s

With my sneaky optimisation:

$ time ./nth 30 1 <cfg-diamond >tmp

real    0m0.479s
user    0m0.480s
sys     0m0.000s

I've just noticed that the timing results are quite interesting functions of 
"n". I think I'll plot them...

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://ffconsultancy.com

next prev parent reply	other threads:[~2005-02-21 23:22 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-02-21 20:17 Don Syme
2005-02-21 23:24 ` Jon Harrop [this message]
2005-02-22  9:06   ` Diego Olivier Fernandez Pons
2005-02-22 11:59     ` Thomas Fischbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200502212324.11937.jon@jdh30.plus.com \
    --to=jon@jdh30.plus.com \
    --cc=caml-list@inria.fr \
    --cc=dsyme@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).