I do not yet have the expertise to see the cost of abstractions just by reading the source code, so the lack of a freely available and easily usable allocation profiler is often a hindrance when I write high-performance OCaml code. To fill that gap, I have implemented a simple allocation profiler for the bytecode interpreter [1]. Part of the motivation for targeting bytecode rather than native code is to profile the multicore OCaml compiler, which doesn’t support native compilation yet. In practice, I find it quite useful for getting an overview of allocation bottlenecks before applying targeted optimisations.

The profiler is quite naive at this point. Tooling support is non-existent: one has to search manually through the relevant text files to find the source of allocations. I am interested in understanding how to make this better. As a first step, I would like to do something similar to `ocamlprof`, which annotates the source code with profiling counts. I plan to keep improving the allocation profiler for vanilla OCaml as multicore OCaml development continues.

