optimization and purity

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* optimization and purity
@ 1999-07-18 18:57 Markus Mottl
  1999-07-22  6:51 ` Diagnostic bug? John Skaller
  0 siblings, 1 reply; 8+ messages in thread
From: Markus Mottl @ 1999-07-18 18:57 UTC (permalink / raw)
  To: OCAML

Hello,

I would like to know whether anyone has already thought about means of
indicating or inferring purity of functions.

My specific problem:

I am just writing a library which features some functions that take
parameters which can only be generated by other functions of this module
(= the parameters have an abstract type).

E.g.:

  let rec foo () = f (make_a a) (make_b b) ... in foo ()

The conversion functions like "make_a" or "make_b" may take a not
insignificant amount of time. I know that "f" will be heavily used within
recursive functions or loops and that the conversion functions are all
pure (=referentially transparent).

So the user can move these expressions outside of loops, because this
will not change semantics. He will even have to do this if he wants that
his code runs fast.

E.g.:

  let made_a = make_a a
  and made_b = make_b b in
  let rec foo () = f made_a made_b ... in foo ()

But it can be tedious for the user to keep writing such code if
"f" is a very common function and if it takes numerous parameters.
Inexperienced users will probably not even know about such optimizations.

Thus, I have thought that it would be an interesting idea to tag all
kinds of external functions so as to indicate whether their evaluation
yields side effects or not.

E.g.:

  external pure make_a : ...

This would permit some very interesting optimizations, because this
information allows the compiler to infer purity for all (better: nearly
all (*)) other functions. In the upper case the user would not need
to do optimization by hand - the compiler can see that e.g. "make_a"
is pure and may, if appropriate (the presence of conditionals can make
things difficult), move its evaluation outside of the loop.

(*) If a function makes use of only pure functions, it is always pure
    itself. If a function contains calls to impure functions, it may be
    that these functions will never be evaluated no matter what the input
    to their "mother function" so this one is still pure as a whole. But
    the latter case is not decidable...

Considering the paragraph on "Optimization" on the page
"http://caml.inria.fr/ocaml/numerical.html", I wonder whether there are
already intentions to implement some other "higher-level" optimizations
in the OCaml-compiler.

For example lambda-lifting would be nice - especially to me, because I
make heavy use of functions-within-functions, be it to restrict their
scope (yields less complex programs) or to bind names to values in the
environment of embracing functions (less parameters needed both for
function definition and for calls).

Small example for problem that could be eliminated with lambda-lifting:

  let slow () =
    let rec f () = () in
    f ()

  let rec f () = ()
  let fast () = f ()

  let _ = for i = 1 to 1000000 do slow () done

Best regards,
Markus Mottl

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Diagnostic bug?
  1999-07-18 18:57 optimization and purity Markus Mottl
@ 1999-07-22  6:51 ` John Skaller
  1999-07-27 15:34   ` Hendrik Tews
  0 siblings, 1 reply; 8+ messages in thread
From: John Skaller @ 1999-07-22  6:51 UTC (permalink / raw)
  To: OCAML

The following code indicates an incorrect error diagnostic:

type t = T;;

let rec g (x:int) = 
  (let (aT:t) = f x in ())
and f x : unit = ();;

Line 5 characters 17-19:
This expression has type unit but is here used with type t.

The function f is explicitly declared with return type unit,
and clearly returns the unit value: there is no error
in f, and certainly the () is _not_ being used 'here'
with type t. 

The error may be considered to be in the call of f, 
in the function g, or it may be considered to be 
in the explicit declaration of the return type of f,
which clashes with the infered return type,
or even in the whole of f,
but it cannot be in the body of f: f is internally
consistent.

In my actual code, I got an error
in a 5000 character expression, and it took three
days to figure out the error wasn't in that expression
at all. 

I can guess how the problem arises in the inference
algorithm, but now I can't figure out a policy for tracking
down type errors (since they can no longer be trusted).
I found previously that adding explicit types helped
isolate errors, but in this case it didn't.

-------------------------------------------------------
John Skaller    email: skaller@maxtal.com.au
		http://www.maxtal.com.au/~skaller
		phone: 61-2-96600850
		snail: 10/1 Toxteth Rd, Glebe NSW 2037, Australia

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Diagnostic bug?
  1999-07-22  6:51 ` Diagnostic bug? John Skaller
@ 1999-07-27 15:34   ` Hendrik Tews
  1999-07-30 14:55     ` John Skaller
  0 siblings, 1 reply; 8+ messages in thread
From: Hendrik Tews @ 1999-07-27 15:34 UTC (permalink / raw)
  To: caml-list

Hi,

as an answer to John I am just telling my heuristics to track
down type errors:

Assuming, that John started with the following program:

    type t = T;;

    let rec g (x) = 
      let (aT : t) = f x in ()
    and f x = ();;

then, he gets the error 

    File "t.ml", line 5, characters 10-12:
    This expression has type unit but is here used with type t

Now this indicates, that there is something wrong with at least
one of ``f'', ``x'', or ``()''. The error occurs, because from
previous uses of the problematic identifiers the compiler
collected type information, which in incompatible with line 5.
Therefore my strategy is to add type annotations at the first use
of the problematic identifiers, which leads to 

    type t = T;;

    let rec g (x) = 
      let (aT : t) = (f : 'a -> unit) x  in ()
    and f x = ();;

or alternatively 

    type t = T;;

    let rec g (x) = 
      let (aT : t) = (f x : unit) in ()
    and f x = ();;

For both programs the compiler generates a usable error message.

John Skaller writes:

   In my actual code, I got an error
   in a 5000 character expression, and it took three
   days to figure out the error wasn't in that expression
   at all. 

You are right, error reporting is one of the weak points of the
ocaml system. You can find lots of strange examples in error
reporting. My favorite one is option -i, which prints the types
of top level identifiers, if the file can be compiled. Now, in
the situtation, where the inferred types would be most valuable
--a type error in the file-- the compiler prints just nothing :-(

Bye,

Hendrik

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Diagnostic bug?
  1999-07-27 15:34   ` Hendrik Tews
@ 1999-07-30 14:55     ` John Skaller
  0 siblings, 0 replies; 8+ messages in thread
From: John Skaller @ 1999-07-30 14:55 UTC (permalink / raw)
  To: Hendrik Tews, caml-list

At 17:34 27/07/99 +0200, Hendrik Tews wrote:
>Hi,
>
>as an answer to John I am just telling my heuristics to track
>down type errors:

[...]

>Therefore my strategy is to add type annotations at the first use
>of the problematic identifiers, 

	Ah, thank you! This is a good rule. 

>John Skaller writes:
>   
>   In my actual code, I got an error
>   in a 5000 character expression, and it took three
>   days to figure out the error wasn't in that expression
>   at all. 
>   
>You are right, error reporting is one of the weak points of the
>ocaml system. 

	Error reporting is a 'higher order' function
of a compiler: it is the weakest part of almost all
compilers. In fact, I find ocaml error reporting
quite reasonable compared with other systems.

	For example, just try tracking down errors in
complex template instantiations in C++.

-------------------------------------------------------
John Skaller    email: skaller@maxtal.com.au
		http://www.maxtal.com.au/~skaller
		phone: 61-2-96600850
		snail: 10/1 Toxteth Rd, Glebe NSW 2037, Australia





^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: optimization and purity
  1999-07-23 15:48 optimization and purity Andrew Kennedy
  1999-07-23 17:49 ` Simon Helsen
  1999-07-23 21:37 ` Pierre Weis
@ 1999-07-24 12:45 ` John Skaller
  2 siblings, 0 replies; 8+ messages in thread
From: John Skaller @ 1999-07-24 12:45 UTC (permalink / raw)
  To: Andrew Kennedy, 'Markus Mottl', caml-list

At 08:48 23/07/99 -0700, Andrew Kennedy wrote:
>> -----Original Message-----
>> From: Markus Mottl [mailto:mottl@miss.wu-wien.ac.at]
>> Subject: optimization and purity
>> 
>> I would like to know whether anyone has already thought about means of
>> indicating or inferring purity of functions.
>> 

>You are indeed correct that knowing the purity of functions would permit a
>number of optimisations such as the loop-hoisting example you present. 

	Actually, I'm quite interested in the same thing
from a different angle: rather than being concerned with
_compiler_ code optimisation, I'm concerned with human ability
to reason about code: ocaml loses the simplicity of reasoning about
derived from referential transparency available in 'pure' functional
languages. (But of course the two things are strongly related)

-------------------------------------------------------
John Skaller    email: skaller@maxtal.com.au
		http://www.maxtal.com.au/~skaller
		phone: 61-2-96600850
		snail: 10/1 Toxteth Rd, Glebe NSW 2037, Australia





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: optimization and purity
  1999-07-23 15:48 optimization and purity Andrew Kennedy
  1999-07-23 17:49 ` Simon Helsen
@ 1999-07-23 21:37 ` Pierre Weis
  1999-07-24 12:45 ` John Skaller
  2 siblings, 0 replies; 8+ messages in thread
From: Pierre Weis @ 1999-07-23 21:37 UTC (permalink / raw)
  To: Andrew Kennedy; +Cc: caml-list

> You are indeed correct that knowing the purity of functions would permit a
> number of optimisations such as the loop-hoisting example you present. There
> was quite a bit of work on "effect inference" a few years back, though much
> of it had a different motivation, namely to permit more expressions to be
> given polymorphic types in the let construct (e.g. in the current revision
> of Standard ML -- sorry I don't know the situation for CAML -- the
> expression let val x = rev [] in (1::x, true::x) end is ill-typed even
> though it would be sound to assign it the obvious type).

The Caml FAQ explains the problem, its solution in Caml, and why it is
considered a good solution. See
http://pauillac.inria.fr/caml/FAQ/FAQ_EXPERT-eng.html

This mailing list archives contain an article about the history of the
problem, http://pauillac.inria.fr/caml/caml-list/0485.html. It states
that the Caml implementors impulsed a new interest on the subject by
writing papers as soon as 1990.

This article also recalls that the Caml compiler incorporates the
restriction of polymorphism to non expansive expressions since 1995
(Caml-Light 0.7), even before Andrew Wright's article was published.

Best regards,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://cristal.inria.fr/~weis/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: optimization and purity
  1999-07-23 15:48 optimization and purity Andrew Kennedy
@ 1999-07-23 17:49 ` Simon Helsen
  1999-07-23 21:37 ` Pierre Weis
  1999-07-24 12:45 ` John Skaller
  2 siblings, 0 replies; 8+ messages in thread
From: Simon Helsen @ 1999-07-23 17:49 UTC (permalink / raw)
  To: Andrew Kennedy; +Cc: 'Markus Mottl', caml-list

Hi,

>You are indeed correct that knowing the purity of functions would permit a
>number of optimisations such as the loop-hoisting example you present. There
>was quite a bit of work on "effect inference" a few years back, though much
>of it had a different motivation, namely to permit more expressions to be
>given polymorphic types in the let construct (e.g. in the current revision
>of Standard ML -- sorry I don't know the situation for CAML -- the
>expression let val x = rev [] in (1::x, true::x) end is ill-typed even
>though it would be sound to assign it the obvious type). I believe that the

indeed, there used to be a semantic-based approach to figure out about
polymorphic references (using imperative variables), but they did not say
anything about the purity of a function. Moreover, the scheme was
difficult to use and hence abandoned (see earlier discussions on this list
and comp.lang.ml - subject value polymorphism) 

>MLj compiler, of which I am a developer, is the only ML compiler that
>currently performs type-based effect inference and uses the results to
>optimise code (I'd be happy to be contradicted here!). At present we have a

It depends on how you view it. The ML Kit also performs a region/type
based effect analysis, but to implement compile-time garbage collection. 
I don't know if they use it to perform low-level optimisations

However, quite related to compilation is partial evaluation (or program
specialisation). There you can use an effect analysis to classify more
program parts as static, i.e. executable at partial evaluation time. 
Indeed, such a technique has been incorporated in the PGG system, a
partial evaluator for the full Scheme language
(http://www.informatik.uni-freiburg.de/proglang/software/pgg/). It is the
only PE system (of which I know of) which uses such a type-based effect
analysis to specialize impure code at partial evaluation time. In fact, we
are now exploring new opportunities in Region/Effect analysis to produce a
powerful partial evaluator for (Standard - sorry) ML. 

>Note that the problem with many effect analyses of the past is that they
>didn't consider termination. This is unfortunate, for you need to know
>whether or not something might loop to use (for example) the first two of
>the rewrites above. However, it's very hard to do a good job of inferring
>termination behaviour -- in MLj, we basically assume that any recursive
>function might loop. Notice also that recursion (and hence looping) can get
>in by the back door -- two well-known routes are via recursive types and via

The problem of termination is equally there in partial evaluation. In PE
this issue is treated separately from impurity though, as it may occur in
pure code as well. Moreover, as you indicate, it is substantially more
difficult. In general, a partial evaluator guarantees to preserve
the semantics (including termination behaviour). What it usually does not
guarantee is to terminate itself. This is something where PE departs from
standard compilation and that may have consequences to what level of
optimisations you can put through. In fact, a PE system can be considered 
an aggresive program optimizer which performs function unfolding and
aggressive constant propagation.

cheers,

	Simon

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: optimization and purity
@ 1999-07-23 15:48 Andrew Kennedy
  1999-07-23 17:49 ` Simon Helsen
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Andrew Kennedy @ 1999-07-23 15:48 UTC (permalink / raw)
  To: 'Markus Mottl', caml-list

> -----Original Message-----
> From: Markus Mottl [mailto:mottl@miss.wu-wien.ac.at]
> Sent: Sunday, July 18, 1999 7:58 PM
> To: caml-list@inria.fr
> Subject: optimization and purity
> 
> 
> I would like to know whether anyone has already thought about means of
> indicating or inferring purity of functions.
> 
> 

You are indeed correct that knowing the purity of functions would permit a
number of optimisations such as the loop-hoisting example you present. There
was quite a bit of work on "effect inference" a few years back, though much
of it had a different motivation, namely to permit more expressions to be
given polymorphic types in the let construct (e.g. in the current revision
of Standard ML -- sorry I don't know the situation for CAML -- the
expression let val x = rev [] in (1::x, true::x) end is ill-typed even
though it would be sound to assign it the obvious type). I believe that the
MLj compiler, of which I am a developer, is the only ML compiler that
currently performs type-based effect inference and uses the results to
optimise code (I'd be happy to be contradicted here!). At present we have a
fairly crude set of effects:

  reads, writes, allocates, throws, loops

and the effect inference is also crude even though the *system* is quite
sophisticated (involving true subtyping induced by inclusion on sets of
effects). Expressions are divided into values (which syntactically are
effect-free) and computations (which might have side-effects). Computations
are given types that are annotated with their possible effects. Then we can
perform optimisations that depend on effect information (sorry about the SML
syntax):

(dead-code)
let val x = e1 in e2 end   -->   e2    
      if x not free in e2 
      and e1 has effect E for E \subseteq {reads,allocs}

(hoisting out of a function)
fn x => let val y = e1 in e2 end  -->   let val y = e1 in fn x => e2 end
      if e1 has the empty effect

(commuting computations)
let val x = e1 val y = e2 in e3 end --> let val y = e2 val x = e1 in e3 end
       if x,y not free in e1,e2
       and e1,e2 have effects E1,E2 
       such that E1,E2 \subseteq {reads,allocs,loops}
         OR E1 \subseteq {allocs,loops} and E2 \subseteq
{allocs,loops,writes}

Nick Benton and I have a paper submitted to HOOTS'99 on the theory of this
effect system in which we prove (e.g.) the equivalences listed above. If it
gets accepted I'll post a pointer to the final version. The bones of the
system can be seen in our ICFP'98 paper on MLj. I'll soon post a pointer to
that too if you like. Also in ICFP'98 you'll find Phil Wadler's paper on the
"marriage of effects and monads".

Note that the problem with many effect analyses of the past is that they
didn't consider termination. This is unfortunate, for you need to know
whether or not something might loop to use (for example) the first two of
the rewrites above. However, it's very hard to do a good job of inferring
termination behaviour -- in MLj, we basically assume that any recursive
function might loop. Notice also that recursion (and hence looping) can get
in by the back door -- two well-known routes are via recursive types and via
ref cells. Here's an example of the former:

datatype T = C of T->T;
val f = fn (y as C x) => x y;
val ouch = f (C f);

Hope this (partly) answers your question.

- Andrew.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~1999-08-12  9:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-07-18 18:57 optimization and purity Markus Mottl
1999-07-22  6:51 ` Diagnostic bug? John Skaller
1999-07-27 15:34   ` Hendrik Tews
1999-07-30 14:55     ` John Skaller
1999-07-23 15:48 optimization and purity Andrew Kennedy
1999-07-23 17:49 ` Simon Helsen
1999-07-23 21:37 ` Pierre Weis
1999-07-24 12:45 ` John Skaller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).