From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id VAA18123; Wed, 25 Sep 2002 21:22:24 +0200 (MET DST)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id VAA18119 for <caml-list@pauillac.inria.fr>; Wed, 25 Sep 2002 21:22:23 +0200 (MET DST)
Received: from fichte.ai.univie.ac.at (fichte.ai.univie.ac.at [131.130.174.156])
	by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id g8PJMMD25416
	for <caml-list@inria.fr>; Wed, 25 Sep 2002 21:22:22 +0200 (MET DST)
Received: from fichte.ai.univie.ac.at (markus@localhost [127.0.0.1])
	by fichte.ai.univie.ac.at (8.12.3/8.12.3/Debian -4) with ESMTP id g8PJMJEI022445;
	Wed, 25 Sep 2002 21:22:19 +0200
Received: (from markus@localhost)
	by fichte.ai.univie.ac.at (8.12.3/8.12.3/Debian -4) id g8PJMIQA022444;
	Wed, 25 Sep 2002 21:22:18 +0200
Date: Wed, 25 Sep 2002 21:22:18 +0200
From: Markus Mottl <markus@oefai.at>
To: Brian Hurt <brian.hurt@qlogic.com>
Cc: Ocaml Mailing List <caml-list@inria.fr>
Subject: Re: [Caml-list] Probably FAQ: Why is list-append (list :: elem) so expensive?
Message-ID: <20020925192218.GA20345@fichte.ai.univie.ac.at>
Mail-Followup-To: Brian Hurt <brian.hurt@qlogic.com>,
	Ocaml Mailing List <caml-list@inria.fr>
References: <20020925161703.GC31883@fichte.ai.univie.ac.at> <Pine.LNX.4.33.0209251315440.1974-100000@eagle.ancor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.LNX.4.33.0209251315440.1974-100000@eagle.ancor.com>
User-Agent: Mutt/1.4i
Organization: Austrian Research Institute for Artificial Intelligence
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

On Wed, 25 Sep 2002, Brian Hurt wrote:
> I think I'm asking the wrong question.  Is there any circumstances where 
> Ocaml will reuse the cells from one list to make another list?
> 
> Consider the following Ocaml code:
> 
>     let dup_list lst =
>         let rec reverse_list l accum =
>             match l with
>                 [] -> accum
>                 | head :: tail -> reverse_list tail (head :: accum)
>         in
>             reverse_list (reverse_list lst)
>     ;;

Thanks to referential transparency of purely functional code it is easy
to prove by induction that twice reversing a list will yield the original
list again (home exercise :-).

However, you shouldn't expect that current compiler technology is
strong enough to exploit such equivalences even though some automated
theorem provers manage to prove them without human help. Most real world
high-level optimizers already struggle with doing partial evaluation
correctly and efficiently, not even to mention trickier stuff like above.

> Ok, the first time through reverse_list the compiler has to allocate a 
> whole new list, including a whole new set of list cells, for the 
> accumlation list.  This is because the code doesn't dare modify lst.  But 
> the second time reverse_list is called, it's obvious that l is the list 
> allocated the first time through, and it's garbage as soon as the call 
> completes.  In fact, each cell of the list is garbage after it's been 
> prepended to accum.  So why not reuse the cells from l to form the accum?  
> Yes, I know this violates the immutability of l.  But it seems to me that, 
> with the exception of garbage creation rates, the two are identical.

Yes, the two are indeed equivalent. And since our lists are purely
functional, there is no danger of overwriting them (it) during reuse. The
only noticable computational effect is that things will run faster after
your transformation.

> Phrased this way, I start wondering how much of an optimization this
> actually would be.

It's pretty demanding! Search space size for finding suitable high-level
transformations usually grows exponentially with the number of definitions
to consider. There are, however, certain classes of transformations that
can be found and applied somewhat efficiently. The Haskell-compiler
implements some of them (e.g. deforestation, which concerns the
elimination of certain intermediate datastructures).

One question is an economic one: where do compiler writers invest
their time? As the OCaml-compiler shows, you can gain really a lot by
forgetting about high-level optimizations (it doesn't even perform
common subexpression elimination) and by writing good backends
for machine-code generation instead. The problem is that the "easy"
high-level transformation usually do not give you so much speedup,
whereas the tricky ones (some can even yield super-exponential speedups)
boost compilation times by some orders of magnitude. Few developers want
to spend their life time waiting for their compilers to finish...

In case you want to learn more about automating functional program
transformations, you might find a hopefully somewhat understandable
introduction in my MSc-thesis (just skip the mistakes ;-)

  http://www.oefai.at/~markus/msc_thesis

> Heh.  One of the dangers of Ocaml is that it makes the algorithm clear
> enough that it encourages you to over-optimize your algorithms in
> the same way that C encourages you to cycle count and over-optimize
> your implementations.

This may happen at times. It is generally a good idea to just write the
simplest implementation you can imagine, profile your program and then
optimize it. Very often the simplest algorithm is also among the best
so it seldom pays to start optimizing before your program actually works.

Regards,
Markus Mottl

-- 
Markus Mottl                                             markus@oefai.at
Austrian Research Institute
for Artificial Intelligence                  http://www.oefai.at/~markus
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners