caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] default fold for visitors
@ 2017-08-17 11:57 Christoph Höger
  2017-08-17 12:48 ` Gabriel Scherer
  2017-08-19  9:25 ` François Pottier
  0 siblings, 2 replies; 3+ messages in thread
From: Christoph Höger @ 2017-08-17 11:57 UTC (permalink / raw)
  To: OCaml Mailing List, francois.pottier

[-- Attachment #1: Type: text/plain, Size: 1823 bytes --]

Dear all (especially Francois),

I am currently porting a DSL to OCaml and wanted to use the most modern ppx
approach for the typical boilerplate. One of the first things is the
implementation of the free variables with the help of ppx_visitors.
Naturally, the set of free variables forms a monoid, so I can comfortably
use the reduce variety. But that allocates quite a lot of temporary sets,
so I had a look into the fold variety.

If I understand the documentation correctly, this class requires to define
a build method for each variant of the datatype. I wonder if there is a way
to have a "default" function, namely the identity of the result value?

Consider, for example the simple lambda calculus:


type expr = Abs of (string[@opaque]) * expr | App of expr * expr | Var of
(string[@opaque]) [@@deriving visitors { variety = "fold" }]

In that case, you'd want to define

method build_Abs () x = StrSet.remove x
method build_Var () = StrSet.singleton

but also have to

method build_App () = StrSet.union

According to the manual, the visitor should be something like this:

method visit_App () l r fvs0 =
  let fvs1 = self#visit_expr () l fvs0 in
  let fvs2 = self#visit_expr () l fvs1 in
  self#build_App () fvs1 fvs2                (* Why is this /always/
necessary ? *)

I suppose this final step of building the results allows some flexibility,
but in my case I could as well just yield fvs2. I wonder if it would be
possible to have a default implementation like this:

method build_App () _ r = r

For now, I am perfectly fine with the reduce variety, but I suppose that
there are some use cases where you would actually need to fold, but only
some variants actually do something. Is this a reasonable idea or is there
some caveat in the design of ppx_visitors that makes it impossible?

thanks,

Christoph

[-- Attachment #2: Type: text/html, Size: 2000 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Caml-list] default fold for visitors
  2017-08-17 11:57 [Caml-list] default fold for visitors Christoph Höger
@ 2017-08-17 12:48 ` Gabriel Scherer
  2017-08-19  9:25 ` François Pottier
  1 sibling, 0 replies; 3+ messages in thread
From: Gabriel Scherer @ 2017-08-17 12:48 UTC (permalink / raw)
  To: Christoph Höger; +Cc: OCaml Mailing List, Francois Pottier

[-- Attachment #1: Type: text/plain, Size: 3827 bytes --]

This is only a minor comment not addressing the main question, but I was
surprised by Christoph's assumption that folding on a tree (repeatedly
calling Set.add) would be more efficient than a divide-and-conquer merge
approach (repeatedly calling Set.union), at least with a
balanced-search-tree implementation of sets. I would rather have assumed
that a divide-and-conquer merge may be faster, on the vague hunch that
merging whole trees instead of single leaves (adding a key is like merging
a leave) allows to reuse information between comparing the root of a tree
and merging its substrees.

I wrote a quick benchmark (
https://gitlab.com/gasche-snippets/tree-fold-merge-benchmark ) that
collects leaves of a binary tree into a set, and it turns out that which is
faster depends on the key ordering. If the keys are completely random, then
the fold-based code is twice faster. If the keys are mostly sorted, then
the merge-based code is twice faster. The fold-based code performance seems
independent of the key distribution, it is the merge-based code that goes
from twice slower to twice faster depending on the scenario.

My proposed explanation would be that merging two trees where all the keys
in a tree are larger than almost all the keys in the other is fast (this is
the case where little rebalancing is needed), so this hits a good case
where tree-form indeed reuses comparison information. On the contrary,
random trees are not faster than just calling add on all leaves.

(I would guess that in complexity tree union is always as good as
repeated-add or better, for all key distributions, but that the constant
factors of the current implementation are higher because we manipulate more
slightly complex data structures.)



On Thu, Aug 17, 2017 at 1:57 PM, Christoph Höger <
christoph.hoeger@celeraone.com> wrote:

> Dear all (especially Francois),
>
> I am currently porting a DSL to OCaml and wanted to use the most modern
> ppx approach for the typical boilerplate. One of the first things is the
> implementation of the free variables with the help of ppx_visitors.
> Naturally, the set of free variables forms a monoid, so I can comfortably
> use the reduce variety. But that allocates quite a lot of temporary sets,
> so I had a look into the fold variety.
>
> If I understand the documentation correctly, this class requires to define
> a build method for each variant of the datatype. I wonder if there is a way
> to have a "default" function, namely the identity of the result value?
>
> Consider, for example the simple lambda calculus:
>
>
> type expr = Abs of (string[@opaque]) * expr | App of expr * expr | Var of
> (string[@opaque]) [@@deriving visitors { variety = "fold" }]
>
> In that case, you'd want to define
>
> method build_Abs () x = StrSet.remove x
> method build_Var () = StrSet.singleton
>
> but also have to
>
> method build_App () = StrSet.union
>
> According to the manual, the visitor should be something like this:
>
> method visit_App () l r fvs0 =
>   let fvs1 = self#visit_expr () l fvs0 in
>   let fvs2 = self#visit_expr () l fvs1 in
>   self#build_App () fvs1 fvs2                (* Why is this /always/
> necessary ? *)
>
> I suppose this final step of building the results allows some flexibility,
> but in my case I could as well just yield fvs2. I wonder if it would be
> possible to have a default implementation like this:
>
> method build_App () _ r = r
>
> For now, I am perfectly fine with the reduce variety, but I suppose that
> there are some use cases where you would actually need to fold, but only
> some variants actually do something. Is this a reasonable idea or is there
> some caveat in the design of ppx_visitors that makes it impossible?
>
> thanks,
>
> Christoph
>

[-- Attachment #2: Type: text/html, Size: 4266 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Caml-list] default fold for visitors
  2017-08-17 11:57 [Caml-list] default fold for visitors Christoph Höger
  2017-08-17 12:48 ` Gabriel Scherer
@ 2017-08-19  9:25 ` François Pottier
  1 sibling, 0 replies; 3+ messages in thread
From: François Pottier @ 2017-08-19  9:25 UTC (permalink / raw)
  To: Christoph Höger, OCaml Mailing List


Dear Christoph,

> Naturally, the set of free variables forms a monoid, so I can
> comfortably use the reduce variety. But that allocates quite a lot of
> temporary sets, so I had a look into the fold variety.

The "fold" variety is more general than the "reduce" variety, but follows
the same pattern (a bottom-up computation, involving set unions), so I do
not think that using "fold" instead of "reduce" will allow you to allocate
fewer temporary sets.

An alternative approach is to use a mutable set (or a mutable reference to
an immutable set) and perform a linear (say, left-to-right) tree traversal
instead of a bottom-up tree traversal. Then, instead of Set.union, only
Set.add is needed. This can be done using the "iter" variety of visitors.

As noted by Gabriel, it is not clear a priori which of these two 
approaches is
faster. This depends on your data and on the data structure with which you
implement sets.

Concerning the implementation of visitors that traverse abstract syntax with
binding, let me mention my upcoming ICFP paper:

   http://gallium.inria.fr/~fpottier/publis/fpottier-visitors-unchained.pdf

and the accompanying library, alphaLib (as yet unreleased / undocumented):

   https://gitlab.inria.fr/fpottier/alphaLib

You may be particularly interested in the implementation of the computation
of the free variables of a term:

   https://gitlab.inria.fr/fpottier/alphaLib/blob/master/src/KitFa.ml
 
https://gitlab.inria.fr/fpottier/alphaLib/blob/master/src/AlphaLibMacros.cppo.ml

> If I understand the documentation correctly, this class requires to
> define a build method for each variant of the datatype. I wonder if
> there is a way to have a "default" function

At the moment, the "fold" variety requires a linear number of "build_"
methods to be written by hand. If you wish to avoid this, then you have
to use a more specific variety, such as "iter", "map", "reduce", etc.

I am aware that the varieties that exist at the moment are somewhat
arbitrary and do not cover every possible wish, but I am deliberately
trying to avoid an explosion in the number of varieties, which I am
afraid would make the package more difficult to maintain.

--
François Pottier
francois.pottier@inria.fr
http://gallium.inria.fr/~fpottier/

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-08-19  9:25 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-17 11:57 [Caml-list] default fold for visitors Christoph Höger
2017-08-17 12:48 ` Gabriel Scherer
2017-08-19  9:25 ` François Pottier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).