From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id A875E7EE99 for ; Tue, 21 Jan 2014 16:27:36 +0100 (CET) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of yotambarnoy@gmail.com) identity=pra; client-ip=209.85.128.48; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="yotambarnoy@gmail.com"; x-sender="yotambarnoy@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of yotambarnoy@gmail.com designates 209.85.128.48 as permitted sender) identity=mailfrom; client-ip=209.85.128.48; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="yotambarnoy@gmail.com"; x-sender="yotambarnoy@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-qe0-f48.google.com) identity=helo; client-ip=209.85.128.48; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="yotambarnoy@gmail.com"; x-sender="postmaster@mail-qe0-f48.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AoIEAIaR3lLRVYAwlGdsb2JhbABag0NWiEegJjKJfIhSgQkIFg4BAQEBBwsLCRIqgiUBAQEDARomARsQAggDAQMBCwYFCw0NISIBEQEFAQoSBhMICgmHVQEDCQgNm3SMXIMJkiMKGScDChVPhHIRAQUMjm8EB4Q4BIlHinWDZoEyjnkYKYR3Hg X-IPAS-Result: AoIEAIaR3lLRVYAwlGdsb2JhbABag0NWiEegJjKJfIhSgQkIFg4BAQEBBwsLCRIqgiUBAQEDARomARsQAggDAQMBCwYFCw0NISIBEQEFAQoSBhMICgmHVQEDCQgNm3SMXIMJkiMKGScDChVPhHIRAQUMjm8EB4Q4BIlHinWDZoEyjnkYKYR3Hg X-IronPort-AV: E=Sophos;i="4.95,696,1384297200"; d="scan'208";a="54217374" Received: from mail-qe0-f48.google.com ([209.85.128.48]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 21 Jan 2014 16:27:34 +0100 Received: by mail-qe0-f48.google.com with SMTP id b4so2569437qen.21 for ; Tue, 21 Jan 2014 07:27:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=ZZWx1znW8waHR+Uk1MbvldAIfRGbVHnIoHqnjuN/pbo=; b=AsYCFqnniUzldrUlSNjs1kELOXqF1HmRcdBptB/ekn8fjcoevOt+e9VpdVdqn3LTV7 LW/iNMz4t37hXLiazrh5PZ89qKjoFiVDD4LMogl0pdOz2f/nHpRmLyUvDcovpv/dAuFC Zn8MXZI/3FBWWtqGFPOsjpDrBRZJeZNNFpThHkptb6hAOz+AtyMJn62SWFPoPmucVkTT 00YLfSq7jaHb8clnnDpr3146ctDWaKX56IFvwS8jAvZqkSwQghjpeSDL9o4DevRzvbM5 2lrDDBtC1GJACnlMWX2pjimc+BhOZ5D3NvyQq7mM7uStvMjYBj3+x289yvqy35MVhwHZ i7Jw== X-Received: by 10.140.34.196 with SMTP id l62mr36433570qgl.51.1390318053575; Tue, 21 Jan 2014 07:27:33 -0800 (PST) MIME-Version: 1.0 Received: by 10.224.95.8 with HTTP; Tue, 21 Jan 2014 07:27:13 -0800 (PST) In-Reply-To: <20140121094939.GA13578@frosties> References: <20140121094939.GA13578@frosties> From: Yotam Barnoy Date: Tue, 21 Jan 2014 10:27:13 -0500 Message-ID: To: Goswin von Brederlow Cc: Ocaml Mailing List Content-Type: multipart/alternative; boundary=001a11c106f2b5517404f07ca5b8 Subject: Re: [Caml-list] Purity in ocaml --001a11c106f2b5517404f07ca5b8 Content-Type: text/plain; charset=ISO-8859-1 Great questions. There are indeed different definitions of the word 'pure'. What I'm going for here is very similar to what haskell does by default: 1) Purity in my definition means no side effects, including the lack of access to mutable state. That means no reading from or writing to mutable fields, strings, arrays, and refs. It also means no printing, no network messages etc. Pure code has many advantages, particularly in functional languages. Since the result of each call to a pure function is entirely dependent on its arguments (and its closure, which by definition is constant), pure calls can be made in any order and still produce the same result. This is great for optimization, for example, for grouping allocations together. Additionally, knowledge of purity can allow advanced optimization techniques like deforestation ( http://research.microsoft.com/apps/pubs/default.aspx?id=67493), which allows intermediate data structures to be eliminated. So, for example, a range creation that feeds a map function that feeds a fold function that outputs a value could be optimized into a single for loop, allowing all allocations to be eliminated. This is a huge performance benefit for functional languages. Pure code is also safer. The lack of side effects means that there's less chance for complexity and interdependence -- ideas well known to functional programmers. Purity tends to cause formation of programs arranged in a star schema: impure central code, usually manipulating input, output and global state, calls out to pure functions forming the arms of the star. which do the main calculation of the program. This is because impure code can call pure code, but pure code can only call other pure code. In fact, I already program this way in ocaml, and I imagine many others do as well. It's a great way to make sure your programs don't involve unintended interactions through global state. (In haskell, you're forced to program this way because of the type system). Finally, purity is often very useful for parallelization. Functions without side effects can easily be run in parallel in many paradigms. So that's the rationale for pure functions. Since ocaml allows for mutable state, it's useful to create another class of purity called st, which corresponds to the state monad in haskell. In haskell, having even local mutable state requires a state monad, which is both cumbersome and verbose. In ocaml, you get the state monad for free by allowing for a class of functions that can access local state. By local state I mean any mutable variables that are in some local scope. The reason for this is that local variables are not accessible from external code. A pure function can also include local mutable state. So long as that state is not accessible from another, external scope, it's not considered a side effect. The local state is thus contained within the pure function and the function maintains its purity. So a pure function could, for example, create some mutable state, and pass it by refs or by closure to an st function. This is the kind of state I already use on a regular basis: for example, computing something expensive using an internal hashtable that has only local scope and is thus destroyed once the function returns. It also makes sense to define a half-pure annotation to enable reuse of library functions. The classic map and fold functions don't do anything impure themselves, but they can't vouch for the higher-order functions they call. By declaring half-purity (hpure), you can now use these functions in both pure and impure context (something haskell cannot do). Any pure function that takes a higher function without restricting its purity is essentially a half-pure function. The nice thing is, this can all be done mostly transparently. You never need to annotate anything, since the compiler will just figure out which parts of your code are pure and impure and add annotations to your type signatures. In response to your question, a function that doesn't modify its mutable argument can be considered pure so long as it also doesn't read from said argument. This is very useful when you have a large record with some mutable fields. So long as a function reads only from the immutable fields, it's still pure. If it accesses the immutable fields, it becomes an st function, though that means it can also be called with a global reference, in which case it functions as impure in that context. If it calls an impure function directly, it becomes impure, and can no longer be called from a pure function. 2) I've been debating syntax for a long time. I've settled on pure, st, hpure, hst and unsafe in function types for now. In most cases, the programmer should not have to think of purity. The way I envision it, you write your function signatures as you do now, and the compiler would try to infer if it can make the signatures any stricter. It would save the stricter signatures into another signature file. If an extra signature file is present, the compiler will use that when compiling modules. Otherwise, it will use the regular signature file and assume every function in the module is impure (which is the default). Sometime you will want to control the purity of functions. For example, a parallel monadic library may want to make sure the functions it calls are pure. In this case you can put in type annotations for purity. The compiler will never make functions less strict than their purity annotations. So many times I've come across libraries where the author says, unfortunately ocaml doesn't side effects so we had to do X and hope for the best. Well, this changes that. It allows for ocaml to be as pure as haskell is when there's a need for it. Nevertheless, I don't expect pure code to become the dominant paradigm in ocaml. 3) Exceptions are considered a side effect. See this answer http://stackoverflow.com/a/10720037. I can't say I understand all the specifics, but one can reason that unless every exception thrown by a function is caught right at its invocation (like different variants), the behavior of said function is unpredictable. Haskell's solution is to only allow exceptions to be caught by impure code, and I'm mimicking this solution, with an additional allowance for catching exceptions raised within the same function (and serving as gotos). An alternative solution is to annotate the types of functions with all the exceptions they may raise, but this becomes extremely hairy when you consider higher order functions (functions become incompatible with each other's type because they raise different exceptions), so I'd rather not go that way. Pure code in general doesn't rely much on exceptions except to signal a catastrophic failure, and I don't see a reason to stray from that. For the most part, I don't see people programming exclusively in pure code anyway. Rather, an impure function can have pure, st and impure parts within it. For example, suppose an impure function writes to some global state, catches an exception, reads from some local state, and maps and folds pure lambdas over a collection. The function as a whole is clearly impure. However, the parts of the function that are pure can be rearranged and optimized -- the map/fold may be turned into a for loop if there are no intermediate dependencies between them and the st/impure parts. The st parts cannot be reordered - state modification to the same reference must retain its order - but they can be moved around to different areas of the function for optimization. Thus, with good cross-module information about purity, you can still do a lot within impure functions to improve performance. On the other hand, libraries for parallel computation probably want to enforce purity, and that means eschewing exceptions as well. One issue I haven't discussed is the matter of strings and arrays. Strings are mutable by default, which means that any read of a string amounts to reading mutable state, causing at least an st annotation. One way of dealing with this is by adding an immutable string type, which could be constructed by something like "hello world"p, where the p stands for pure. Another option is that since mutation of strings is so rare, a flag could exist per type scope to signal if strings are modified anywhere in the local scope. If they're not, they can be treated as immutable within the local type scope. Type scope can be defined as the extent up to which a type is kept abstract -- if I have a module with a type t that contains a string and writes to it, but t is not fully exported, then outside of this module, nobody can write to strings. It would also be useful to have immutable arrays, which can be turned into mutable arrays only after optimization. I hope this clarifies things. -Yotam On Tue, Jan 21, 2014 at 4:49 AM, Goswin von Brederlow wrote: > On Mon, Jan 20, 2014 at 03:45:15PM -0500, Yotam Barnoy wrote: > > I wanted to gauge the interest of people on the list in adding purity > > annotations to ocaml. Purity is one of those things that could really > help > > with reducing memory allocations through deforestation and decreasing the > > running time of programs written in the functional paradigm, and it could > > be very useful for parallelism as well. The basic scheme I have in mind > is > > this: > > > > - Functions that do not access mutable structures would be marked pure. > > - Functions that access only local mutable structures would be marked as > st > > (a la state monad) > > Does local include the arguments passed to the function? > > > - Functions that access global mutable data would be unmarked (as they > are > > now). > > - Pure functions can call st functions/code so long as all of the state > > referred to by the st code is contained within said pure functions. > > Because if arguments don't count this makes no sense. > > But then shouldn't there be another level for functions that don't > alter its arguments? > > > - Functions that call higher order functions, but do not modify mutable > > state would be marked hpure (half-pure). These functions would be pure so > > long as the functions they call remain pure. This allows List.map, > > List.fold etc to work for both pure and impure code. > > - The same thing exists for st code: hst represents functions that take > > higher order functions but only performs local state mutation. > > - In order to take advantage of this mechanism, there's no need to > annotate > > functions. The type inference algorithm will figure out the strictest > type > > that can be applied to a function and will save the annotation to an > > external, saved annotation file. This means that non-annotated code can > > take advantage of purity without doing any extra work, and the programmer > > never has to think about purity. > > - Having the purity annotations as an option is useful to force certain > > parts of the code, such as monads, to be pure. > > - An edge case: local state can be made to refer to global state by some > > external function call. Therefore, local state is considered 'polluted' > > (and global) if it is passed to an impure function. > > - Exceptions: not sure how to handle them yet. The easiest solution is to > > forbid them in st/pure code. Another easy alternative is to only allow > > catching them in impure code, as haskell does. > > > > Thoughts? > > > > -Yotam > > 1) What does pure mean? What does it gain you? How do you want to use it? > > 2) What syntax do you suggest for annotating functions? > > 3) Why are exceptions a problem? > > 4) Will this allow to annotate exceptions too so the compiler can > track which exception could be thrown and when all exceptions are > caught? If no exception can escape an function then it can be pure > again, right? > > MfG > Goswin > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > --001a11c106f2b5517404f07ca5b8 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Great questions. There are indeed different= definitions of the word 'pure'. What I'm going for here is ver= y similar to what haskell does by default:

1) Purity in my def= inition means no side effects, including the lack of access to mutable stat= e. That means no reading from or writing to mutable fields, strings, arrays= , and refs. It also means no printing, no network messages etc.

Pure code has many advantages, particularly in functional languages. Si= nce the result of each call to a pure function is entirely dependent on its= arguments (and its closure, which by definition is constant), pure calls c= an be made in any order and still produce the same result. This is great fo= r optimization, for example, for grouping allocations together.

Additionally, knowledge of purity can allow advanced optimization techn= iques like deforestation (http://research.microsoft.com/= apps/pubs/default.aspx?id=3D67493), which allows intermediate data stru= ctures to be eliminated. So, for example, a range creation that feeds a map= function that feeds a fold function that outputs a value could be optimize= d into a single for loop, allowing all allocations to be eliminated. This i= s a huge performance benefit for functional languages.

Pure code is also safer. The lack of side effects means that= there's less chance for complexity and interdependence -- ideas well k= nown to functional programmers. Purity tends to cause formation of programs= arranged in a star schema: impure central code, usually manipulating input= , output and global state, calls out to pure functions forming the arms of = the star. which do the main calculation of the program. This is because imp= ure code can call pure code, but pure code can only call other pure code. I= n fact, I already program this way in ocaml, and I imagine many others do a= s well. It's a great way to make sure your programs don't involve u= nintended interactions through global state. (In haskell, you're forced= to program this way because of the type system).

Finally, purity is often very useful for parallelization. Functio= ns without side effects can easily be run in parallel in many paradigms.
So that's the rationale for pure functions. Since ocaml allo= ws for mutable state, it's useful to create another class of purity cal= led st, which corresponds to the state monad in haskell. In haskell, having= even local mutable state requires a state monad, which is both cumbersome = and verbose. In ocaml, you get the state monad for free by allowing for a c= lass of functions that can access local state. By local state I mean any mu= table variables that are in some local scope. The reason for this is that l= ocal variables are not accessible from external code. A pure function can a= lso include local mutable state. So long as that state is not accessible fr= om another, external scope, it's not considered a side effect. The loca= l state is thus contained within the pure function and the function maintai= ns its purity. So a pure function could, for example, create some mutable s= tate, and pass it by refs or by closure to an st function. This is the kind= of state I already use on a regular basis: for example, computing somethin= g expensive using an internal hashtable that has only local scope and is th= us destroyed once the function returns.

It also makes sense to define a half-pure annotati= on to enable reuse of library functions. The classic map and fold functions= don't do anything impure themselves, but they can't vouch for the = higher-order functions they call. By declaring half-purity (hpure), you can= now use these functions in both pure and impure context (something haskell= cannot do). Any pure function that takes a higher function without restric= ting its purity is essentially a half-pure function.

The nice thing is, this can all be done mostly transparently= . You never need to annotate anything, since the compiler will just figure = out which parts of your code are pure and impure and add annotations to you= r type signatures.

In response to your question, a function that doe= sn't modify its mutable argument can be considered pure so long as it a= lso doesn't read from said argument. This is very useful when you have = a large record with some mutable fields. So long as a function reads only f= rom the immutable fields, it's still pure. If it accesses the immutable= fields, it becomes an st function, though that means it can also be called= with a global reference, in which case it functions as impure in that cont= ext. If it calls an impure function directly, it becomes impure, and can no= longer be called from a pure function.

2) I've been debating syntax for a long time. I've s= ettled on pure, st, hpure, hst and unsafe in function types for now. In mos= t cases, the programmer should not have to think of purity. The way I envis= ion it, you write your function signatures as you do now, and the compiler = would try to infer if it can make the signatures any stricter. It would sav= e the stricter signatures into another signature file. If an extra signatur= e file is present, the compiler will use that when compiling modules. Other= wise, it will use the regular signature file and assume every function in t= he module is impure (which is the default).

Sometime you will want to control the purity of functions. F= or example, a parallel monadic library may want to make sure the functions = it calls are pure. In this case you can put in type annotations for purity.= The compiler will never make functions less strict than their purity annot= ations.

So many times I've come across libraries where the author says, unf= ortunately ocaml doesn't side effects so we had to do X and hope for th= e best. Well, this changes that. It allows for ocaml to be as pure as haske= ll is when there's a need for it. Nevertheless, I don't expect pure= code to become the dominant paradigm in ocaml.

3) Exceptions are considered a side effect. See t= his answer http://stackover= flow.com/a/10720037. I can't say I understand all the specifics, bu= t one can reason that unless every exception thrown by a function is caught= right at its invocation (like different variants), the behavior of said fu= nction is unpredictable. Haskell's solution is to only allow exceptions= to be caught by impure code, and I'm mimicking this solution, with an = additional allowance for catching exceptions raised within the same functio= n (and serving as gotos). An alternative solution is to annotate the types = of functions with all the exceptions they may raise, but this becomes extre= mely hairy when you consider higher order functions (functions become incom= patible with each other's type because they raise different exceptions)= , so I'd rather not go that way.

Pure code in general doesn't rely much on exceptions except to sign= al a catastrophic failure, and I don't see a reason to stray from that.= For the most part, I don't see people programming exclusively in pure = code anyway. Rather, an impure function can have pure, st and impure parts = within it. For example, suppose an impure function writes to some global st= ate, catches an exception, reads from some local state, and maps and folds = pure lambdas over a collection. The function as a whole is clearly impure. = However, the parts of the function that are pure can be rearranged and opti= mized -- the map/fold may be turned into a for loop if there are no interme= diate dependencies between them and the st/impure parts. The st parts canno= t be reordered - state modification to the same reference must retain its o= rder - but they can be moved around to different areas of the function for = optimization. Thus, with good cross-module information about purity, you ca= n still do a lot within impure functions to improve performance. On the oth= er hand, libraries for parallel computation probably want to enforce purity= , and that means eschewing exceptions as well.

One issue I haven't discussed is the matter of strings a= nd arrays. Strings are mutable by default, which means that any read of a s= tring amounts to reading mutable state, causing at least an st annotation. = One way of dealing with this is by adding an immutable string type, which c= ould be constructed by something like "hello world"p, where the p= stands for pure. Another option is that since mutation of strings is so ra= re, a flag could exist per type scope to signal if strings are modified any= where in the local scope. If they're not, they can be treated as immuta= ble within the local type scope. Type scope can be defined as the extent up= to which a type is kept abstract -- if I have a module with a type t that = contains a string and writes to it, but t is not fully exported, then outsi= de of this module, nobody can write to strings. It would also be useful to = have immutable arrays, which can be turned into mutable arrays only after o= ptimization.

I hope this clarifies things.

-= Yotam


On Tue, Jan 21, 2014 at 4:49 AM, Goswin von Brederlow <go= swin-v-b@web.de> wrote:
On Mon, Jan 20, 2014= at 03:45:15PM -0500, Yotam Barnoy wrote:
> I wanted to gauge the interest of people on the list in adding purity<= br> > annotations to ocaml. Purity is one of those things that could really = help
> with reducing memory allocations through deforestation and decreasing = the
> running time of programs written in the functional paradigm, and it co= uld
> be very useful for parallelism as well. The basic scheme I have in min= d is
> this:
>
> - Functions that do not access mutable structures would be marked pure= .
> - Functions that access only local mutable structures would be marked = as st
> (a la state monad)

Does local include the arguments passed to the function?

> - Functions that access global mutable data would be unmarked (as they= are
> now).
> - Pure functions can call st functions/code so long as all of the stat= e
> referred to by the st code is contained within said pure functions.

Because if arguments don't count this makes no sense.

But then shouldn't there be another level for functions that don't<= br> alter its arguments?

> - Functions that call higher order functions, but do not modify mutabl= e
> state would be marked hpure (half-pure). These functions would be pure= so
> long as the functions they call remain pure. This allows List.map,
> List.fold etc to work for both pure and impure code.
> - The same thing exists for st code: hst represents functions that tak= e
> higher order functions but only performs local state mutation.
> - In order to take advantage of this mechanism, there's no need to= annotate
> functions. The type inference algorithm will figure out the strictest = type
> that can be applied to a function and will save the annotation to an > external, saved annotation file. This means that non-annotated code ca= n
> take advantage of purity without doing any extra work, and the program= mer
> never has to think about purity.
> - Having the purity annotations as an option is useful to force certai= n
> parts of the code, such as monads, to be pure.
> - An edge case: local state can be made to refer to global state by so= me
> external function call. Therefore, local state is considered 'poll= uted'
> (and global) if it is passed to an impure function.
> - Exceptions: not sure how to handle them yet. The easiest solution is= to
> forbid them in st/pure code. Another easy alternative is to only allow=
> catching them in impure code, as haskell does.
>
> Thoughts?
>
> -Yotam

1) What does pure mean? What does it gain you? How do you want to use= it?

2) What syntax do you suggest for annotating functions?

3) Why are exceptions a problem?

4) Will this allow to annotate exceptions too so the compiler can
track which exception could be thrown and when all exceptions are
caught? If no exception can escape an function then it can be pure
again, right?

MfG
=A0 =A0 =A0 =A0 Goswin

--
Caml-list mailing list. =A0Subscription management and archives:
ht= tps://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

--001a11c106f2b5517404f07ca5b8--