From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id RAA23166 for caml-red; Wed, 8 Nov 2000 17:33:02 +0100 (MET)
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id JAA26064 for <caml-list@pauillac.inria.fr>; Tue, 7 Nov 2000 09:22:33 +0100 (MET)
Received: from pcrm.win.tue.nl (pcrm.win.tue.nl [131.155.69.41])
	by nez-perce.inria.fr (8.11.1/8.10.0) with ESMTP id eA78MWv12692
	for <caml-list@inria.fr>; Tue, 7 Nov 2000 09:22:32 +0100 (MET)
Received: from pcrm (stephan@localhost [127.0.0.1])
	by pcrm.win.tue.nl (8.9.3/8.9.3/SuSE Linux 8.9.3-0.1) with SMTP id JAA01465;
	Tue, 7 Nov 2000 09:26:07 +0100
From: Stephan Houben <stephan@pcrm.win.tue.nl>
Reply-To: stephanh@win.tue.nl
Organization: Scientific Computing Group
To: Chris Hecker <checker@d6.com>
Subject: Re: practical functional programming
Date: Tue, 7 Nov 2000 08:54:01 +0100
X-Mailer: KMail [version 1.0.28]
Content-Type: text/plain
Cc: caml-list@inria.fr
References: <4.3.2.7.2.20001103055229.00ab6880@shell16.ba.best.com> <4.3.2.7.2.20001106100700.00c447b0@shell16.ba.best.com>
In-Reply-To: <4.3.2.7.2.20001106100700.00c447b0@shell16.ba.best.com>
MIME-Version: 1.0
Message-Id: <00110709260602.01341@pcrm>
Content-Transfer-Encoding: 8bit
Sender: weis@pauillac.inria.fr

On Mon, 06 Nov 2000, Chris Hecker wrote:
 
> I guess my next question then would be related to efficiency: isn't there a lot of copying going on if you've got to make a new instance of your symbol table just to add an element?  

Well, one of the advantages of having a purely functional data structure is
that you can freely share substructures. So there is very little copying.
Because the updated structure shares most of its memory with the original
structure, holding on to both isn't prohibitively expensive.

Again, this makes it simple to "roll back" to a previous version.

> Again, I'm not saying efficiency is the only metric for software, but it does concern me when I see a large datastructure copied.

No copying. Just pointer sharing.

>  Is there some sort of reference counted sharing going on, 

Reference counting is a (bad) substitute for real garbage collection, such as
O'Caml has. No need for refcounting, since we have GC.

> or are there actually two copies of the data in memory if you wrote your
> example in caml?  

No, this is not the case. Most of the data is shared.

> I guess my hope would be that in the case where you
> didn't reference the old datastructure ever again you would get performance
> close to the equivalent imperative datastructure.

It depends on the data structure, but some of Okasaki's data structures come
close to this ideal.

> In the case where you did
> reference both old and new you'd get performance similar to writing an
> "undo-able" imperative datastructure from scratch, 

Writing an undo-able imperative data-structure is no easy job.
Most people would probably go for a functional data structure then.

For example: the new ReiserFs in Linux (journalling filesystem) needs
for its journalling a similar commit/roll-back functionality. Therefore, it
is based on balanced trees, which are a somwehat functional data structure.
(I definitely do not want to suggest that ReiserFs is somehow a purely
functional file system, just that the basic idea is somewhat similar to some
functional data structures.)

> possibly with an
> optimization for not accessing the old datastructure until the new one had
> been undone (like in your example, and I would assume that's the common case
> for using persistence).

I am afraid that compilers aren't that smart yet.

I think the bottom line is this: imperative datastructures are often (somewhat)
faster, but purely functional datastructures are more general. This shows the
importance of Okasaki's work: sometimes it is possible to have your cake and
eat it, too.

So now it is up to you to decide whether you need a purely functional or an 
imperative data structure. There is no hard&fast rule here: it depends
completely on the application. But the nice thing of O'Caml is that it gives you
the choice. Which is also the bad thing: if we were writing Perl, we would
almost always use hashes and not have to think about these issues.

Good luck with your decision,

Stephan

--  ir. Stephan H.M.J. Houben tel.
+31-40-2474358 / +31-40-2743497 e-mail: stephanh@win.tue.nl