OCaml Weekly News

Hello

Here is the latest OCaml Weekly News, for the week of April 14 to 21, 2020.

Current_incr: a small incremental library with no dependencies

Thomas Leonard announced

The recent OCurrent 0.2 release included a little incremental library which might be interesting to some people. It is useful for writing programs that need to keep some computation up-to-date efficiently as the inputs change.

It is similar to the existing incremental and react libraries already in opam. Unlike incremental (which pulls in the whole of core_kernel), current_incr has no runtime dependencies (and build dependencies only on ocaml and dune). Unlike react, current_incr immediately stops computations when they are no longer needed (rather than relying on weak references and the garbage collector).

It is a fairly direct implementation of the Adaptive Functional Programming paper, and might be a good starting point for people wanting to learn about that.

You can get the library using opam:

opam install current_incr

Here's a simple example (in utop):

#require "current_incr";;

let total = Current_incr.var 10
let complete = Current_incr.var 5

let status =
  Current_incr.of_cc begin
    Current_incr.read (Current_incr.of_var total) @@ function
    | 0 -> Current_incr.write "No jobs"
    | total ->
      Current_incr.read (Current_incr.of_var complete) @@ fun complete ->
      let frac = float_of_int complete /. float_of_int total in
      Printf.sprintf "%d/%d jobs complete (%.1f%%)"
                     complete total (100. *. frac)
      |> Current_incr.write
  end

This defines two input variables (total and complete) and a "changeable computation" (status) whose output depends on them. At the top-level, we can observe the initial state using observe:

# print_endline @@ Current_incr.observe status;;
5/10 jobs complete (50.0%)

Unlike a plain ref cell, a Current_incr.var keeps track of which computations depend on it. After changing one or more variables, you must call propagate to update the results:

# Current_incr.change total 12;;
# Current_incr.change complete 4;;
# print_endline @@ Current_incr.observe status;;
5/10 jobs complete (50.0%)      (* Not yet updated *)

# Current_incr.propagate ();;
# print_endline @@ Current_incr.observe status;;
4/12 jobs complete (33.3%)
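
To make the bookkeeping concrete, here is a toy model written against the standard library only. It is not Current_incr's implementation or API: the names toy_var, read, change and propagate are hypothetical, and the sketch ignores nested reads, ordering and releasing of old computations, all of which the real library has to handle. It only shows a variable that remembers its readers and defers re-running them until propagate is called.

```ocaml
(* Toy model only: hypothetical names, not the Current_incr API or internals. *)
type 'a toy_var = {
  mutable value : 'a;
  mutable readers : (unit -> unit) list; (* computations to re-run on change *)
}

(* Recomputations scheduled by [change], waiting for [propagate]. *)
let pending : (unit -> unit) Queue.t = Queue.create ()

let toy_var value = { value; readers = [] }

(* Run [f] on the current value now, and remember it for later changes. *)
let read v f =
  let run () = f v.value in
  v.readers <- run :: v.readers;
  run ()

(* Update the value, but only schedule the dependent computations. *)
let change v x =
  v.value <- x;
  List.iter (fun run -> Queue.add run pending) v.readers

(* Re-run everything that was scheduled since the last call. *)
let propagate () =
  while not (Queue.is_empty pending) do
    (Queue.pop pending) ()
  done

let () =
  let total = toy_var 10 in
  let shown = ref "" in
  read total (fun n -> shown := Printf.sprintf "%d jobs" n);
  change total 12;
  print_endline !shown; (* still the old text: nothing has been re-run yet *)
  propagate ();
  print_endline !shown  (* now reflects the change *)
```

Batching the updates this way means several variables can be changed before any dependent computation is re-run.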

Computations can have side-effects, and you can use on_release to run some compensating action if the computation needs to be undone later. Here's a function that publishes a result, and also registers a compensation for it:

let publish msg =
  Printf.printf "PUBLISH: %s\n%!" msg;
  Current_incr.on_release @@ fun () ->
  Printf.printf "RETRACT: %s\n%!" msg

It can be used like this:

# let display = Current_incr.map publish status;;
PUBLISH: 4/12 jobs complete (33.3%)

# Current_incr.change total 0;;
# Current_incr.propagate ();;
RETRACT: 4/12 jobs complete (33.3%)
PUBLISH: No jobs

A major difference between this and the react library (which I've used previously in 0install's progress reporting and CueKeeper) is that Current_incr does not depend on the garbage collector to decide when to stop a computation. In react, you'd have to be careful to make sure that display didn't get GC'd (even though you don't need to refer to it again) because if it did then the output would stop getting updated. Also, setting total to 0 in react might cause the program to crash with a division-by-zero exception, because the frac computation will continue running until it gets GC'd, even though it isn't needed for anything now.

Current_incr's API is pretty small. You might want to wrap it to provide extra features, e.g.

  • Use of a result type to propagate errors.
  • Integration with Lwt to allow asynchronous computations.
  • Static analysis to render your computation with graphviz.
  • Persistence of state to disk.

If you need that, consider using the main OCurrent library, which extends current_incr with these features.

Scikit-learn for OCaml

UnixJunkie announced

Ronan Lehy just hacked this:

https://github.com/lehy/ocaml-sklearn

This might interest a significant number of people out there. We are no longer condemned to live in a world full of snakes that will bite us at run-time. :smiley:

Ronan Le Hy then said

So I came here to announce ocaml-sklearn as it just got published on Opam, but I see @UnixJunkie did it for me (arigato gozai masu). Anyway:

  • this aims to cover the complete scikit-learn API
  • this ambition is currently not totally realized, but I wanted to release something initial that one can play with
  • it's all @UnixJunkie's fault with his funny R wrappers.

So:

Anton Kochkov then added

OCaml and opam container images updated: new Fedora/Alpine/Ubuntu images

Anil Madhavapeddy announced

The Docker ocaml and opam container images have been updated:

  • Alpine 3.11, Fedora 31 and Ubuntu 20.04 (beta) are now included.
  • Ubuntu 19.04 and Fedora 29 and 30 are now deprecated.
  • OCaml 4.09.1 and 4.11.0~dev have been refreshed.

You can find the full details of the container images available on the OCaml infrastructure wiki.

The containers are generated from a set of scripts using ocaml-dockerfile, and will be migrating over the next six months to use an ocurrent-based infrastructure. There will be an announcement on this forum about any user-facing changes that involves, with plenty of time to transition your own CIs over. Thanks go to @talex5 and @XVilka for contributions to this round of updates.

OCamlformat 0.14.0

Jules announced

As Etienne mentioned, we have released OCamlformat 0.14.1, reverting the change to the defaults and our plans to deprecate the doc-comments option.

For projects that already upgraded to 0.14.0 (e.g. Coq), the doc-comments option will change its meaning again. It is necessary to add doc-comments=before to have the documentation comments placed before the item they document. Moreover, the new option doc-comments-val added in 0.14.0 has a higher precedence than doc-comments, even when it's not set. It is thus necessary to set them both to before to have the old "before" behavior. This will be improved in the next release (see https://github.com/ocaml-ppx/ocamlformat/pull/1340).
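
As a sketch, setting both options in a project's .ocamlformat file would look roughly like this. It uses only the option names and value given in this announcement; check the OCamlformat documentation for the version you have installed before relying on it.

```
doc-comments = before
doc-comments-val = before
```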

Thank you to our early adopters for bearing with us. We are improving our release process to reduce confusion for the next updates. As usual, if you have any feedback, please open an issue on https://github.com/ocaml-ppx/ocamlformat to discuss it with us.

Hashconsing an AST via PPX

Chet Murthy announced

[up-front (so nobody gets the wrong idea): I'm not pushing Camlp5. Rather, I'm just noting that this sort of thing is really easy to do, and I encourage someone to do something similar using the PPX infrastructure.]

I didn't want to derail the "Future of PPX" thread, so I thought I'd post separately to answer ivg@'s issue about hashconsing of ASTs using PPX. It's actually [uh, I think] really, really easy to implement hashconsing of ADTs, using a PPX extension. On a lark, I decided to do it today, and while the code I've got isn't sufficient to use, I think it's not very far away, and I have the perfect use-case already in mind. It took me two hours to implement the rewriter and the testcase, on top of the other infrastructure, which has no support for hashconsing of any sort.

Here are some examples of data-types and functions that are automatically hash-consed. The idea is that in the pattern-match the pattern is annotated with a variable (in this example, "z"); the expression that is supposed to be hash-consed against that pattern is annotated with that same variable. [The code that descends to the expression is a little weak right now, but I think that's easily fixable.] The algorithm goes as follows:

(1) "decorate" the pattern with "as z_<integer>" variables everywhere in constructors. This allows us to refer to parts of the original value.

(2) then find each expression that is marked with that same variable. Structurally descend the pattern and the expression in parallel and generate code to compare sub-structure and hashcons where appropriate.

And that's really it. I'm sure this can be implemented using the PPX tools.

Some comments: (1) what's nice is that we can just take already-written code like List.map and annotate it; that generates a hash-consed version. And since the generated code never uses deep structural equality (only pointer-equality) it should be only marginally slower than the original implementation.

(2) The variable in the annotation ("z") is used as the base for generating a whole slew of fresh variables, and I don't bother (yet) to check for clashes; this (again) is straightforward, but hey, I started two hours ago.

type t = Leaf of int | Node of t * int * t

module HCList = struct

let rec map f = function
    [][@hashrecons z] -> [][@hashrecons z]
  | (a::l)[@hashrecons z] -> let r = f a in ((r :: map f l)[@hashrecons z])

end

let deep =
  let rec deep = (function
      Leaf n[@hashrecons z] -> Leaf n[@hashrecons z]
    | Node (l, n, r) [@hashrecons z] ->
      Node (deep l, n, deep r) [@hashrecons z]
    )
  [@@ocaml.warning "-26"]
  in deep

type sexp =
  | Atom of string
  | List of sexp list

let sexp_deep =
  let rec deep = function
      Atom s[@hashrecons z] -> Atom s[@hashrecons z]
    | List l[@hashrecons z] -> List (HCList.map deep l)[@hashrecons z]
  in deep
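
For readers who want to see the intended effect without the rewriter, here is a hand-written guess at what an annotated map like HCList.map stands for. This is not the output of pa_hashrecons, and the name map_sharing is hypothetical. It only illustrates the idea from the algorithm above: bind the original value, rebuild the parts, compare them with pointer equality, and return the original when nothing changed.

```ocaml
(* Hand-written sketch of a sharing-preserving map; not generated by any
   rewriter. Note that [f] must return the same type it takes. *)
let rec map_sharing f = function
  | [] as z -> z
  | (a :: l) as z ->
    let r = f a in
    let l' = map_sharing f l in
    (* Pointer equality only: reuse the original cell when nothing changed. *)
    if r == a && l' == l then z else r :: l'

let () =
  let xs = [ "a"; "b" ] in
  (* The identity function changes nothing, so the very same list comes back. *)
  Printf.printf "%b\n" (map_sharing (fun s -> s) xs == xs) (* prints true *)
```

When only a prefix of the list changes, the unchanged tail is shared with the input instead of being copied.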

Links: First, at the commit, so they won't change

the testcase file: https://github.com/chetmurthy/pa_ppx/commit/5dd6b2ef3ca3677e11a0ad696074200101bd661f#diff-e6dffe78fc6c27bdffa41970c4a7f1ca

the "ppx rewriter": https://github.com/chetmurthy/pa_ppx/commit/5dd6b2ef3ca3677e11a0ad696074200101bd661f#diff-24aeaf51366017948f5735727f001c85

Second, the files with human-readable names, etc.:

the testcase: https://github.com/chetmurthy/pa_ppx/blob/master/tests/test_hashrecons.ml

the "ppx rewriter": https://github.com/chetmurthy/pa_ppx/blob/master/pa_hashrecons/pa_hashrecons.ml

The projects:

chetmurthy/pa_ppx: A reimplementation of ppx_deriving, all its plugins, ppx_import, and a few others.

https://github.com/chetmurthy/pa_ppx

chetmurthy/camlp5: Camlp5, version pre-8.00 on which the above is based. This is on the branch 26.attempt-pa-deriving.

Kakadu said

I experimented with this some time ago for the ML workshop. The idea was to provide a function t -> htbl -> htbl * t which rewrites a value of type t by removing equal subtrees. Essentially it is just a fold over the data type.

https://github.com/kakadu/GT/blob/master/regression/test816hash.ml#L74
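
A minimal sketch of that idea for the tree type from the earlier example, using only the standard library. It is not the code from the linked test: the name share is hypothetical, and because it uses a mutable Hashtbl it returns just the rewritten value instead of threading the table through as in the signature above.

```ocaml
type t = Leaf of int | Node of t * int * t

(* Rewrite [v] bottom-up so that structurally equal subtrees become the same
   physical value. [tbl] maps each subtree to its first-seen representative.
   Hashtbl uses structural hashing and equality here, which is simple but can
   be slow on deep trees. *)
let rec share tbl v =
  let v' =
    match v with
    | Leaf _ -> v
    | Node (l, n, r) ->
      let l' = share tbl l in
      let r' = share tbl r in
      if l' == l && r' == r then v else Node (l', n, r')
  in
  match Hashtbl.find_opt tbl v' with
  | Some representative -> representative
  | None ->
    Hashtbl.add tbl v' v';
    v'

let () =
  let tree = Node (Node (Leaf 1, 2, Leaf 3), 0, Node (Leaf 1, 2, Leaf 3)) in
  match share (Hashtbl.create 16) tree with
  | Node (l, _, r) -> Printf.printf "%b\n" (l == r) (* prints true *)
  | Leaf _ -> ()
```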

Chet Murthy asked and Josh Berdine replied

If you wanna use a hashtable (and, I presume, Obj.magic) you can write a single function that does the trick for all immutable data-types, right?

Yes, we have some magic @mbouaziz code in Infer that does this to create as much sharing as possible as values are Marshaled out.

Genprint v0.4

progman announced

A re-announcement of Genprint, a general value printing library, that addresses prior limitations that made it none too useful!

  1. It didn't work correctly for 4.08.0, the latest compiler release as of first announcement (though fine for 4.02 .. 4.07.1)
  2. There was an awkward need to specify a search path for .cmt files when working with the likes of Dune (which uses separate directories for source, .cmi and (for opt) .cmt files)
  3. More often than not values of interest would display simply as <abstr> owing to the presence of signature abstraction of the module's type of interest.

These issues have been addressed:

  1. Works with versions 4.04 .. 4.10.0 (earlier versions became invalid after a dependency change to ppxlib)
  2. The location of .cmt files is captured automatically by the PPX preprocessor.
  3. Signatures at the implementation level (.mli files) and internally (functor application constraints) are removed to reveal the inner structure of otherwise abstract values. For instance, the Ephemeron module:

    module EM=Ephemeron.K1.Make(struct type t=int let equal=(=) let hash=Hashtbl.hash end)
    open EM
    let _=
      let v=EM.create 0 in
      EM.add v 12345678 'X';
      let emprint ppf (v: Obj.Ephemeron.t) =
        Format.fprintf ppf "<C wrapper of key/data>" in
      [%install_printer emprint];
      [%pr ephem v];
    

    Which prints:

    ephem => {size = 1;
              data =
               [|Empty; Empty; Empty; Empty; Empty; Empty; Empty; Empty; Empty;
                 Empty; Empty; Cons (922381915, <C wrapper of key/data>, Empty);
                 Empty; Empty; Empty; Empty|];
              seed = 0; initial_size = 16}
    

    This also demos the [%install_printer] facility which mirrors the REPL's.

Installation is via the Opam main repository.

Additionally, the project repository contains two compiler versions of ocamldebug integrated with the Genprint library which thereby becomes its default printer.

All of which makes this library much more useful than previously. See the project page for the details.

Other OCaml News

From the ocamlcore planet blog

Editor note: Thanks to ezcurl, I can restore this section. I'm including all the links this week; next week I will prune the list to only the new ones.

Here are links from many OCaml blogs aggregated at OCaml Planet.

Old CWN

If you happen to miss a CWN, you can send me a message and I'll mail it to you, or go take a look at the archive or the RSS feed of the archives.

If you also wish to receive it every week by mail, you may subscribe online.