Just as a preface, it's very important to me that this feature remain relatively simple. I'm looking to incorporate this into the core language, and I think one of ocaml's strengths is its simplicity. Of course, simplicity is a relative term in and of itself, but I think one of the reasons for the success of e.g. OOP is the fact that the concept is simple and maps naturally to our conception of the world. I see many languages that drown in their complexity, haskell being a very good example of that. The cognitive overhead that's involved in programming haskell is not that different (and may be much more) than programming in C++ IMO. The resulting code is much more terse and safe, but reading and writing complex systems might as well involve memory management and class hierarchies. Ocaml hits a sweet spot (again, IMO) of simplicity, safety and performance, and I'm not looking to change that.



Obviously the purity of a higher order function can be dependent on
the purity of its arguments. You need a syntax for the type of a
function that can reflect that.


Well, that's what hpure and hst are for -- feel free to suggest better names. A pure higher-order function forces its function arguments to all be pure. An hpure higher-order function does nothing impure, but can become either st or impure based on what it's called with.

Also note that once we're dealing with 2nd order high level functions, or functions saved into other types (records, ADTs), we can't infer the type of the function -- annotations must be used. Fortunately, the majority of cases involve things like mapping/folding a lambda or an external function -- all things that can be fully inferred.


Unfortunately in ocaml you have to annotate a lot if done right. You
need to create a lot of .mli files, module types or class types. Not
to mention GADT need type annoation for nearly every function.

So I'm not too sure about keeping this transparent.


I haven't dealt much with GADTs myself, so I'm not sure if they present a special challenge.
 

It might be better to talk about the scope of impurity there. The
impurity dies with the death of the mutable. Like when you create a
local hashtbl to speed up computations, get the result and then return
the result (destroying the hashtbl). The impurity ends there. Global
mutables are just a special case of having the lifetime of the
programm.


Fair enough. This is why the st annotation would exist for functions/expressions that use the same local mutable state: the interaction of these statements must retain its ordering, but once the scope is exited, nothing else can interfere with this local state. Global state can always be interfered with by other code.
 

That won't work. You need to be able to declare purity in the source.
Otherwise, for example, how would you pass a module to a function that
must have a pure function? (as you say below)


I'm not sure what you mean. Are you asking for purity annotations in the syntax? It's always possible to do
let f : pure 'a -> 'a = fun x -> ...
as one option, but I agree that having some shortcut in the syntax would be useful. I'm open to suggestions. Regarding first class modules, those are in the same category as higher-order functions that cannot be inferred -- they must be annotated.
 
If you can annotate that in the source (.ml or .mli files) then that
should end up in the .cmi file as well, right. Then why would the
compiler generated purity, in case you don't have a .mli file, not end
up in there too?

Note that if a library does not annotate purity but the function
happens to be pure then you can't have the compiler fill that in. The
library makes no promise that the function will remain pure in the
future and by having the compiler adding that promise you would change
the API of the library. This would break dynamic linking.

So I realy think purity belongs in the .mli and .cmi files. Not
something seperate. And the compiler can only fill in the purity when
there is no .mli file. With .mli file it can only check that the given
type can be unified with the code regarding its purity.


This is a very good point that I haven't considered before. Inferred cross-module purity supplies more information than the interface provides. I can see a few ways to approach this:

1. Your method, which is the simplest method, of requiring all purity annotations to be in the .mli file. This may make the most sense, but it also means that to take advantage of the resulting potential optimizations, a company like Jane Street would have to go back and modify all of its .mli files, constraining the purity of functions. I really wanted this to be more of a transparent feature -- one which can help with optimization, but can also allow for more advanced uses (like forcing purity in monads and other special cases). However, maybe this is the only 'proper' way to do this. It would mean that ocaml becomes a fully purity-conscious language. .mli files are encouraged (for good reason), and if you constrain any part of your code to purity, any functions called by that code must also be pure, causing a cascade of purity annotations.

2. Allowing for the method I outlined, which is that impure functions can be constrained by the compiler even in the presence of impure signatures in the .mli file. The advantage here is that you don't need to change much in the code to get potential optimizations. It also doesn't transform ocaml in any meaningful way -- it simply allows for another style of programming. The problem is (as you mentioned), what happens if a library changes its implementation of a function that was not annotated with purity but was internally pure, and now makes it impure? Well, one approach is that if you annotated your code with purity, the code should break with a type error. You obviously had a reason to choose to annotate your code with purity. Choosing to use an external library's function that was NOT annotated with purity (but happened to be pure and therefore compiled) was not a very safe thing to do -- there was no guarantee that the function would stay pure. So maybe this isn't much of a problem after all.

3. Another approach is to follow a process similar to 2, but to limit it when it comes to libraries. For example, perhaps libraries should never ship with/be loaded with the file that supplies function purity metadata. So a library would be forced to annotate its functions with their purity, but a regular module would not. I think that would be fair.

Now that I think about it, an external compiler-generated metadata file containing inferred purity information could be simplified to just modules, their function names, and those functions' purity level (and an md5 of the mli/cmi+ml file). We can't infer anything else cross-module anyway -- any purity information within types would have to be specified by the user. This could even be appended to the cmi file to ensure that the data is always up to date. A simple [function;purity] list would be readable even within a binary file.
 

I think that would be a mistake. Exceptions are wide spread in ocaml.
By not annotating types with their exceptions you MAKE them side
effects because they become unpredictable.

On the other hand if types are annotated with the exceptions they
throw then they become just another return value and become pure.
Same input, same exception every time.

True this would complicate the type of functions because the return
type would be a pair of successfull type and exceptions. You need a
syntax for the type that allows the return type to be a combination of
the argument types, like:

val map : ('a -> 'b ['c] ) -> 'a list -> 'b list ['c]
val mapmap : ('a -> 'b ['c] ) ('b -> 'd ['e] ) -> 'a list -> 'd list ['c | 'e]

This will be hairy for purity, too, or even more so. You need ways to
say that the combined purity is the least pure of the inputs. Or with
to function arguments the purity is the result of passing one function
to the other. And so on.

Then again consider this code:

let f () = raise Not_found
val f : unit -> 'a [Not_found]

let g fn x = try fn x with Not_found -> Empty
val g : ('a -> 'b [< Not_found as 'c | 'd]) -> 'b [ Empty if 'c | 'd]

g takes a function that may raise Not_found or 'd. G then may itself
raise Empty (only if the function can raise Not_found though) or 'd.

let h fn1 fn2 x = fn2 fn1 x
val h : ('a -> 'b [< Not_found as 'c | 'd]) ->
        (('a -> 'b [< Not_found as 'c | 'd]) -> 'b [ Empty if 'c | 'd]) ->
        'a -> 'b [ Empty if 'c | 'd])

let h2 = h f g
val h2: 'a -> 'b [Empty]


This is what I was referring to above with simplicity. I think any attempt to handle exceptions, if it will succeed, will be overly complex. Exceptions weren't really made to be a sum type -- they were made to halt computation and transfer it to an arbitrarily high level in the call stack. I'm ok with allowing local usage of exceptions within the same function - analogous to a goto. Anything more than that seems overkill. Think about using your scheme to map a function over some values: if the function throws any exception and I want to treat it as pure, I'd have to have a map function to accept and handle those exceptions or I'm breaking purity.
 

Much of the stdlib uses exceptions. So you eigther have to rewrite it
to be pure or it will be too hard to write pure code.


Should be pretty easy to add some pure functions to stdlib. Same applies to Core and Batteries. Alternatively, an 'unsafe' function could wrap the exception-throwing code and translate it to types.
 
I like phantom types for that. But that gets hairy with containers.
E.g. a Hashtbl of strings can be itself mutable or contain mutable
strings or both. So the type paramters of the hashtbl need phantom
types too. And thats where my ocaml magic broke down when I tried to
generalize that.


Phantom types seem too complex a solution for something as pervasive as strings. Since one almost never writes to strings, we might be able to pull off some kind of hack for reusing the string type immutably without creating an immutable string type.
 
-Yotam


> On Tue, Jan 21, 2014 at 4:49 AM, Goswin von Brederlow <goswin-v-b@web.de>wrote:
>
> > On Mon, Jan 20, 2014 at 03:45:15PM -0500, Yotam Barnoy wrote:
> > > I wanted to gauge the interest of people on the list in adding purity
> > > annotations to ocaml. Purity is one of those things that could really
> > help
> > > with reducing memory allocations through deforestation and decreasing the
> > > running time of programs written in the functional paradigm, and it could
> > > be very useful for parallelism as well. The basic scheme I have in mind
> > is
> > > this:
> > >
> > > - Functions that do not access mutable structures would be marked pure.
> > > - Functions that access only local mutable structures would be marked as
> > st
> > > (a la state monad)
> >
> > Does local include the arguments passed to the function?
> >
> > > - Functions that access global mutable data would be unmarked (as they
> > are
> > > now).
> > > - Pure functions can call st functions/code so long as all of the state
> > > referred to by the st code is contained within said pure functions.
> >
> > Because if arguments don't count this makes no sense.
> >
> > But then shouldn't there be another level for functions that don't
> > alter its arguments?
> >
> > > - Functions that call higher order functions, but do not modify mutable
> > > state would be marked hpure (half-pure). These functions would be pure so
> > > long as the functions they call remain pure. This allows List.map,
> > > List.fold etc to work for both pure and impure code.
> > > - The same thing exists for st code: hst represents functions that take
> > > higher order functions but only performs local state mutation.
> > > - In order to take advantage of this mechanism, there's no need to
> > annotate
> > > functions. The type inference algorithm will figure out the strictest
> > type
> > > that can be applied to a function and will save the annotation to an
> > > external, saved annotation file. This means that non-annotated code can
> > > take advantage of purity without doing any extra work, and the programmer
> > > never has to think about purity.
> > > - Having the purity annotations as an option is useful to force certain
> > > parts of the code, such as monads, to be pure.
> > > - An edge case: local state can be made to refer to global state by some
> > > external function call. Therefore, local state is considered 'polluted'
> > > (and global) if it is passed to an impure function.
> > > - Exceptions: not sure how to handle them yet. The easiest solution is to
> > > forbid them in st/pure code. Another easy alternative is to only allow
> > > catching them in impure code, as haskell does.
> > >
> > > Thoughts?
> > >
> > > -Yotam
> >
> > 1) What does pure mean? What does it gain you? How do you want to use it?
> >
> > 2) What syntax do you suggest for annotating functions?
> >
> > 3) Why are exceptions a problem?
> >
> > 4) Will this allow to annotate exceptions too so the compiler can
> > track which exception could be thrown and when all exceptions are
> > caught? If no exception can escape an function then it can be pure
> > again, right?
> >
> > MfG
> >         Goswin

MfG
        Goswin

--
Caml-list mailing list.  Subscription management and archives:
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs