caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* RE: Re: [Caml-list] Persistence
@ 2005-02-21 20:17 Don Syme
  2005-02-21 23:24 ` Jon Harrop
  0 siblings, 1 reply; 4+ messages in thread
From: Don Syme @ 2005-02-21 20:17 UTC (permalink / raw)
  To: Jon Harrop, Caml mailing list


Hi Jon,

Could you send around the source code to this?  I intuitively knew that
such performance results were likely, but had never had such a
convincing example to wave at people.  

It would also be interesting to ask some C++ programmers how they would
go about optimizing this code, for a couple of reasons

(a) to see if they come up with anything faster; and 

(b) to see if they would ever even think of using
sharing-of-complex-immutable-internal-data-structure-nodes as an
optimization technique.

Don



-----Original Message-----
From: caml-list-admin@yquem.inria.fr
[mailto:caml-list-admin@yquem.inria.fr] On Behalf Of Jon Harrop
Sent: 18 February 2005 09:40
To: caml-list@yquem.inria.fr
Subject: Fwd: Re: [Caml-list] Persistence


Many moons ago:

On Sunday 06 February 2005 22:26, Radu Grigore wrote:
> I had written:
> > The STL will probably take <<1ms because the STL's size() is
probably
> > O(1) whereas Map's fold is O(n). In contrast, forking lineages is
> > probably O(n) with the STL (requiring a copy) and O(1) with ocaml's
Map.
>
> I haven't actually tested with STL but the implementation of size() is
> indeed a simple return. Do you have an example where forking lineages
> is useful?

I do now. :-)

I've just finished converting one of my example scientific programs from
OCaml to C++. The program basically computes the "n"th nearest
neighbours of
a vertex in a graph. The implementations are based upon sets of
vertices.

The OCaml version simply implements the obvious mathematical expression.
The
only difference is that reuse is implicit in the maths (which is purely
functional) and has to be made explicit in the OCaml code by memoizing
the
function.

My first stab at a C++ version did exactly the same thing. As you know,
OCaml
performance is supposed to to be within a factor of 2 of C/C++. Somewhat
suprisingly, on my first timed run the C++ code turned out to be over
100
times slower than the OCaml. After some profiling it transpired that
virtually all of the time was spent computing set unions which I had
stupidly
implemented using the STL set_union function. ;-)

The reason is quite simple. Thanks to the purely functional
implementation in
OCaml, sets are immutable. OCaml's Set.union function exploits this by
reusing pieces of tree from the input sets in the output ("copying"
arbitrarily-large subsets in O(1) time), happily knowing that they
cannot be
altered afterwards. In contrast, the C++ implementation of set_union can
make
no such assumption and is forced to copy most of the elements from both
of
the input sets. The performance of set_union in the STL is, therefore,
vastly
worse than the performance of Set.union in OCaml, thanks to persistence.

In summary, like me, you've probably been using persistence a lot
without
realising it. Every time you do set operations, for example.

HTH.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] Persistence
  2005-02-21 20:17 Re: [Caml-list] Persistence Don Syme
@ 2005-02-21 23:24 ` Jon Harrop
  2005-02-22  9:06   ` Diego Olivier Fernandez Pons
  0 siblings, 1 reply; 4+ messages in thread
From: Jon Harrop @ 2005-02-21 23:24 UTC (permalink / raw)
  To: Don Syme; +Cc: Caml mailing list

On Monday 21 February 2005 20:17, Don Syme wrote:
> Could you send around the source code to this?  I intuitively knew that
> such performance results were likely, but had never had such a
> convincing example to wave at people.

Yes. The implementation of traveling salesman given in my book is another 
example which is several orders of magnitude faster than the C/C++/Fortran 
implementations given in all of the other text books that I have read.

This problem also happens to be a defacto standard test given to most natural 
science undergrads on their computing courses, which should be good for book 
sales. Tell everyone you know. :-)

> It would also be interesting to ask some C++ programmers how they would
> go about optimizing this code, for a couple of reasons
>
> (a) to see if they come up with anything faster; and

I managed to, after a while. However, doing this optimisation required me to 
have several things:

1. Estimates of run-time sizes of data structures.
2. Knowledge of the internals of the STL set implementation.
3. Knowledge of the complexities of various operations over STL sets.
4. Huge kahunas, as the resulting code is much more likely to be wrong.

The optimised C++ is also specialised by (1), whereas the OCaml code will 
always be efficient (because Xavier wrote the hard part ;-). Also, the C++ 
implementation consumes twice as much memory (which could be a real killer as 
many scientific computations use resources to the limit of the available 
computer).

Both the OCaml and the C++ can probably be optimised more, but I'm interested 
in problems where author-time is comparable to execution-time. Were I to 
continue optimising, I would move to hash sets, but these require an 
imperative approach which negates the principal advantage of the current 
OCaml implementation: persistence.

> (b) to see if they would ever even think of using
> sharing-of-complex-immutable-internal-data-structure-nodes as an
> optimization technique.

No way. :-)

Also (from the traveling salesman solution):

 (c) to see if they would ever even think of reinventing closures.

For people who just want a quick look, here is the OCaml code for the core 
function:

  let rec nth_nn =
    let memory = Hashtbl.create 1 in
    fun n (i, io) ->
      try Hashtbl.find memory (n, i)
      with Not_found -> match n with
        0 -> AtomSet.singleton (i, io)
      | 1 ->
          let nn = bonds.(i - 1) in
          if io = zero then nn else
          let aux (j, jo) s = AtomSet.add (j, add_i io jo) s in
          AtomSet.fold aux nn AtomSet.empty
      | n ->
          let pprev = nth_nn (n-2) (i, io) in
          let prev = nth_nn (n-1) (i, io) in
          let aux j t = AtomSet.union (nth_nn 1 j) t in
          let t = AtomSet.fold aux prev AtomSet.empty in
          let t = AtomSet.diff (AtomSet.diff t prev) pprev in
          Hashtbl.add memory (n, i) t;
          t in

$ time ./nth 30 1 <cfg-diamond >tmp

real    0m4.167s
user    0m3.940s
sys     0m0.000s

$ time ./nth.opt 30 1 <cfg-diamond >tmp

real    0m0.492s
user    0m0.450s
sys     0m0.000s

Here's the elegant C++ equivalent:

const AtomSet nth_nn(int n, int i, const vector<int> io) {
  const Map::key_type k=make_pair(n, make_pair(i, io));
  Map::const_iterator m=memory.find(k);
  if (m != memory.end()) return m->second;
  AtomSet s;
  if (n == 0) {
    s.insert(make_pair(i, io));
    return s;
  } else
    if (n == 1) {
      const AtomSet &nn=bonds[i-1];
      for (AtomSet::const_iterator it=nn.begin();
    it != nn.end(); it++) {
 int j = it->first;
 vector<int> jo = it->second;
 for (i=0; i<d; i++)
   jo[i] += io[i];
 s.insert(make_pair(j, jo));
      }
      return s;
    } else {
      const AtomSet
 &pprev = nth_nn(n-2, i, io),
 &prev = nth_nn(n-1, i, io);
      for (AtomSet::const_iterator it=prev.begin();
    it != prev.end(); it++) {
 const AtomSet &t = nth_nn(1, it->first, it->second);
 AtomSet t2;
 set_union(s.begin(), s.end(),
    t.begin(), t.end(),
    inserter(t2, t2.begin()));
 s=t2;
      }
      AtomSet s2;
      set_difference(s.begin(), s.end(),
       prev.begin(), prev.end(),
       inserter(s2, s2.begin()));
      s=AtomSet();
      set_difference(s2.begin(), s2.end(),
       pprev.begin(), pprev.end(),
       inserter(s, s.begin()));
    }
  memory[k] = s;
  return memory.find(k)->second;
}

With this C++, I get:

$ time ./nth 30 1 <cfg-diamond >tmp

real    1m31.178s
user    1m12.680s
sys     0m0.100s

With my sneaky optimisation:

$ time ./nth 30 1 <cfg-diamond >tmp

real    0m0.479s
user    0m0.480s
sys     0m0.000s

I've just noticed that the timing results are quite interesting functions of 
"n". I think I'll plot them...

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://ffconsultancy.com


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] Persistence
  2005-02-21 23:24 ` Jon Harrop
@ 2005-02-22  9:06   ` Diego Olivier Fernandez Pons
  2005-02-22 11:59     ` Thomas Fischbacher
  0 siblings, 1 reply; 4+ messages in thread
From: Diego Olivier Fernandez Pons @ 2005-02-22  9:06 UTC (permalink / raw)
  To: caml-list

    Bonjour,

> > Could you send around the source code to this?  I intuitively knew
> > that such performance results were likely, but had never had such
> > a convincing example to wave at people.
>
> Yes. The implementation of traveling salesman given in my book is
> another example which is several orders of magnitude faster than the
> C/C++/Fortran implementations given in all of the other text books
> that I have read.

Branch-and-bound algorithms (pure integer, not LP based) are good
candidates for important speed-ups due to data persistence : when you
backtrack it is most of the time much faster to restart from a
memorized data structure than to reconstruct it, specially with
structured data like ordered sets, graphs, etc.

Generally speaking, the _good_ paradigm to program B&B solvers for
NP-hard problems is mixed functional/imperative programming (in other
words ML), because you benefit from fast recursion, data persistence,
garbage collection while you use very few floating point computations.

Actually, most of the commercial C/C++ libraries for constraint
programming (= integer branch and bound) include a 'functional layer'
that contains one or several garbage collectors, persistent data
structures, a system to make persistent user-defined data structures,
function objects, etc.

Winning against a mixed imperative/functional implementation in e.g.
C/C++ require a lot of low-level implementation work - and you will
end anyway implementing some kind of 'functional core'. Worse,
algorithmic wins (better algorithms) are most of the time several
order of magnitude superior than implementation wins (better
implementation).

Therefore it is also more convenient to use a high level language like
Caml in order to focalize on algorithms rather than implementation.


        Diego Olivier


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] Persistence
  2005-02-22  9:06   ` Diego Olivier Fernandez Pons
@ 2005-02-22 11:59     ` Thomas Fischbacher
  0 siblings, 0 replies; 4+ messages in thread
From: Thomas Fischbacher @ 2005-02-22 11:59 UTC (permalink / raw)
  To: Diego Olivier Fernandez Pons; +Cc: caml-list


On Tue, 22 Feb 2005, Diego Olivier Fernandez Pons wrote:

> Winning against a mixed imperative/functional implementation in e.g.
> C/C++ require a lot of low-level implementation work - and you will
> end anyway implementing some kind of 'functional core'.

Greenspun's tenth rule, isn't it?

-- 
regards,               tf@cip.physik.uni-muenchen.de              (o_
 Thomas Fischbacher -  http://www.cip.physik.uni-muenchen.de/~tf  //\
(lambda (n) ((lambda (p q r) (p p q r)) (lambda (g x y)           V_/_
(if (= x 0) y (g g (- x 1) (* x y)))) n 1))                  (Debian GNU)


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2005-02-22 11:59 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-02-21 20:17 Re: [Caml-list] Persistence Don Syme
2005-02-21 23:24 ` Jon Harrop
2005-02-22  9:06   ` Diego Olivier Fernandez Pons
2005-02-22 11:59     ` Thomas Fischbacher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).