I am pleased to announce the 111.06.00 release of the Core suite.

A new package appears with this release: async_ssl. It is an Async-pipe-based interface to OpenSSL. The bindings to OpenSSL are written with ctypes.

The following packages were upgraded:

- async_extra
- async_kernel
- async_unix
- core
- core_extended
- core_kernel
- jenga
- re2
- textutils
- typerep

Files and documentation for this release are available on our website and all packages are in opam:

  https://ocaml.janestreet.com/ocaml-core/111.06.00/individual/
  https://ocaml.janestreet.com/ocaml-core/111.06.00/doc/

Here is the list of changes for this version:

# 111.06.00

## async_extra

- Added `?on_wouldblock:(unit -> unit)` callback to `Udp.recvmmsg_loop` and `recvmmsg_no_sources_loop`.
- For functions that create `Rpc` connections, added optional arguments: `?max_message_size:int` and `?handshake_timeout:Time.Span.t`.

  These arguments were already available to `Connection.create`, but are now uniformly available to all functions that create connections.

## async_kernel

- Improved the performance of `Pipe.filter_map` by using batching.

## async_ssl

Initial release.

## async_unix

- In the `Busy_pollers.t` record, made the `kernel_scheduler` field be `sexp_opaque`.

  This avoids getting two copies of the kernel scheduler in sexps of the scheduler, which already has its own `kernel_scheduler` field.

## core

- Added inline benchmarks for `Iobuf` and `Time`.

  Here are some of the results from the new benchmarks, with some indexed tests dropped.

  | Name                                 | Time/Run | mWd/Run | Percentage |
  |--------------------------------------|----------|---------|------------|
  | [time.ml:Time] Time.to_string        | 848.74ns | 249.98w | 100.00%    |
  | [time.ml:Time] Time.to_ofday         | 59.66ns  | 38.00w  | 7.03%      |
  | [time.ml:Time] Time.now              | 39.78ns  | 2.00w   | 4.69%      |
  | [time.ml:Time] Time.Zone.find_office | 83.64ns  | 4.00w   | 9.85%      |
  | [time.ml:Time] Time.Span.of_hr       | 3.71ns   | 2.00w   | 0.44%      |
  | [time.ml:Time] Time.Span.of_min      | 3.69ns   | 2.00w   | 0.44%      |
  | [time.ml:Time] Time.Span.of_sec      | 2.72ns   |         | 0.32%      |
  | [time.ml:Time] Time.Span.of_ms       | 6.02ns   | 2.00w   | 0.71%      |
  | [time.ml:Time] Time.Span.of_ns       | 5.98ns   | 2.00w   | 0.71%      |

  | Name                                 | Time/Run | Percentage |
  |--------------------------------------|----------|------------|
  | [iobuf.ml:Blit tests] functor blit:5 | 15.53ns  | 7.66%      |
  | [iobuf.ml:Poke tests] char:0         | 4.11ns   | 2.03%      |
  | [iobuf.ml:Poke tests] uint8:0        | 5.35ns   | 2.64%      |
  | [iobuf.ml:Poke tests] int8:0         | 4.59ns   | 2.26%      |
  | [iobuf.ml:Poke tests] int16_be:0     | 5.19ns   | 2.56%      |
  | [iobuf.ml:Poke tests] int16_le:0     | 5.14ns   | 2.53%      |
  | [iobuf.ml:Poke tests] uint16_be:0    | 5.11ns   | 2.52%      |
  | [iobuf.ml:Poke tests] uint16_le:0    | 5.12ns   | 2.53%      |
  | [iobuf.ml:Poke tests] int32_be:0     | 5.17ns   | 2.55%      |
  | [iobuf.ml:Poke tests] int32_le:0     | 4.91ns   | 2.42%      |
  | [iobuf.ml:Poke tests] uint32_be:0    | 5.73ns   | 2.83%      |
  | [iobuf.ml:Poke tests] uint32_le:0    | 5.74ns   | 2.83%      |
  | [iobuf.ml:Poke tests] int64_be:0     | 5.33ns   | 2.63%      |
  | [iobuf.ml:Poke tests] int64_le:0     | 4.93ns   | 2.43%      |
  | [iobuf.ml:Peek tests] char:0         | 5.50ns   | 2.71%      |
  | [iobuf.ml:Peek tests] uint8:0        | 4.68ns   | 2.31%      |
  | [iobuf.ml:Peek tests] int8:0         | 4.91ns   | 2.42%      |
  | [iobuf.ml:Peek tests] int16_be:0     | 5.19ns   | 2.56%      |
  | [iobuf.ml:Peek tests] int16_le:0     | 4.90ns   | 2.42%      |
  | [iobuf.ml:Peek tests] uint16_be:0    | 5.17ns   | 2.55%      |
  | [iobuf.ml:Peek tests] uint16_le:0    | 5.10ns   | 2.51%      |
  | [iobuf.ml:Peek tests] int32_be:0     | 5.17ns   | 2.55%      |
  | [iobuf.ml:Peek tests] int32_le:0     | 4.92ns   | 2.42%      |
  | [iobuf.ml:Peek tests] uint32_be:0    | 5.45ns   | 2.69%      |
  | [iobuf.ml:Peek tests] uint32_le:0    | 5.46ns   | 2.69%      |
  | [iobuf.ml:Peek tests] int64_be:0     | 6.61ns   | 3.26%      |
  | [iobuf.ml:Peek tests] int64_le:0     | 6.31ns   | 3.11%      |

- Re-implemented `Thread_safe_queue` to improve performance and reduce allocation.

  The new implementation requires 3 words per element, down from the 7 words required by the old implementation.

  The new implementation pools elements so that they can be reused, so there is no allocation in steady-state use.

  The new implementation has `dequeue_exn` rather than `dequeue`, so that one can dequeue without allocating 2 words.

  Eliminated `create'`. One should just use `create` and explicit calls to `enqueue` and `dequeue_exn`.

  Eliminated `dequeue_until_empty`. One should use an explicit while loop guarded by `length` and using `dequeue_exn`.

  Moved `Thread_safe_queue` from `Core_kernel` to `Core`, since it's thread related.

  All in, there is now no allocation in a steady-state usage of enqueueing and dequeueing elements, as opposed to 9 words per `enqueue+dequeue` in the old implementation. This reduces the cost from `enqueue+dequeue` taking 166-216ns to `enqueue+dequeue_exn` taking 48-82ns (plus eliminating gc impacts). Here are some `BENCH` results, the first table being the old implementation, and the second table the new.

  | Name                                                       | Time/Run | mWd/Run | mjWd/Run |
  |------------------------------------------------------------|----------|---------|----------|
  | [thread_safe_queue.ml] enqueue + dequeue of immediate      | 183.89ns | 9.00w   | 7.02w    |
  | [thread_safe_queue.ml] enqueue + dequeue of young object   | 216.69ns | 11.00w  | 9.01w    |
  | [thread_safe_queue.ml] enqueue + dequeue_exn of old object | 166.75ns | 9.00w   | 7.02w    |

  | Name                                                         | Time/Run | mWd/Run |
  |--------------------------------------------------------------|----------|---------|
  | [thread_safe_queue.ml] enqueue + dequeue_exn of immediate    | 48.20ns  |         |
  | [thread_safe_queue.ml] enqueue + dequeue_exn of young object | 81.96ns  | 2.00w   |
  | [thread_safe_queue.ml] enqueue + dequeue_exn of old object   | 48.30ns  |         |

- Changed `{Bigstring,Iobuf}.recvmmsg_assume_fd_is_nonblocking`, when no message is available, to return a negative number rather than raise.

  This was done for performance reasons, because raising an exception is expensive, due to the stashing of the backtrace and the string creation.

- Added `Iobuf.unsafe_resize`.
- Changed `Bigstring.blit` so that it doesn't release the OCaml lock on `map_file` bigstrings.

  The old behavior of releasing the lock for blits of (small) bigstrings involving mmapped files was problematic and inconsistent. Its cost is high, and fundamentally any access to a mapped bigstring could cause some level of blocking.

- Added time-related `Arg_type.t` values to `Command.Spec`.
- Added module `Type_immediacy`, which has witnesses that express whether a type's values are always, sometimes, or never immediate.

  This code used to be in the `Typerep_immediate` library in typerep.

## core_kernel

- Added inline benchmarks for `Array`.

  Here are some of the results from the new benchmarks, with some indexed tests dropped.

  | Name                                                | Time/Run    | mWd/Run | mjWd/Run  |
  |-----------------------------------------------------|-------------|---------|-----------|
  | [core_array.ml:Alloc] create:0                      | 13.65ns     |         |           |
  | [core_array.ml:Alloc] create:100                    | 99.83ns     | 101.00w |           |
  | [core_array.ml:Alloc] create:255                    | 201.32ns    | 256.00w |           |
  | [core_array.ml:Alloc] create:256                    | 1_432.43ns  |         | 257.00w   |
  | [core_array.ml:Alloc] create:1000                   | 5_605.58ns  |         | 1_001.01w |
  | [core_array.ml:Blit.Poly] blit (tuple):10           | 87.10ns     |         |           |
  | [core_array.ml:Blit.Poly] blito (tuple):10          | 112.14ns    | 2.00w   |           |
  | [core_array.ml:Blit.Poly] blit (int):10             | 85.25ns     |         |           |
  | [core_array.ml:Blit.Poly] blito (int):10            | 107.23ns    | 2.00w   |           |
  | [core_array.ml:Blit.Poly] blit (float):10           | 84.71ns     |         |           |
  | [core_array.ml:Blit.Poly] blito (float):10          | 86.71ns     | 2.00w   |           |
  | [core_array.ml:Blit.Int] blit:10                    | 19.77ns     |         |           |
  | [core_array.ml:Blit.Int] blito:10                   | 23.54ns     | 2.00w   |           |
  | [core_array.ml:Blit.Float] blit:10                  | 19.87ns     |         |           |
  | [core_array.ml:Blit.Float] blito:10                 | 24.12ns     | 2.00w   |           |
  | [core_array.ml:Is empty] Polymorphic '='            | 18.21ns     |         |           |
  | [core_array.ml:Is empty] Array.equal                | 8.08ns      | 6.00w   |           |
  | [core_array.ml:Is empty] phys_equal                 | 2.98ns      |         |           |
  | [core_array.ml:Is empty] Array.is_empty (empty)     | 2.98ns      |         |           |
  | [core_array.ml:Is empty] Array.is_empty (non-empty) | 3.00ns      |         |           |

- Moved `Thread_safe_queue` to core.
- Generalized the type of `Exn.handle_uncaught_and_exit` to `(unit -> 'a) -> 'a`.

  In the case where `handle_uncaught_and_exit` succeeds, it can return the value of the supplied function.

  Its type had been:

  ```ocaml
  val handle_uncaught_and_exit : (unit -> never_returns) -> never_returns
  ```

- Added `Int.round*` functions for rounding to a multiple of another int.

  ```ocaml
  val round : ?dir:[ `Zero | `Nearest | `Up | `Down ] -> t -> to_multiple_of:t -> t
  val round_towards_zero : t -> to_multiple_of:t -> t
  val round_down : t -> to_multiple_of:t -> t
  val round_up : t -> to_multiple_of:t -> t
  val round_nearest : t -> to_multiple_of:t -> t
  ```

  These functions were added to `Int_intf.S`, implemented by `Int`, `Nativeint`, `Int32`, and `Int64`.

  Various int modules were also lightly refactored so that, in the future, operators common to all modules implementing the int interface can share their code via a functor.

## jenga

- Improved the error message when the same library is defined multiple times.
- Fixed an issue where jenga would sometimes complain about a self cycle when `foo.ml` uses a module `Foo`.
- With `-no-notifiers`, jenga doesn't use `inotify` to watch for file changes. This is useful for linting `jengaroot.ml`.
- Allowed writing jenga rules which restrict dependencies from an initial conservative approximation to a more accurate set discovered after an action is run.

## re2

- Added `Re2.Std`, so that one should now use `Re2` via `module Re2 = Re2.Std.Re2`.

  At some future date, we will rename the `Regex` module to `Re2_internal` to force the stragglers to update to the new convention.

## typerep

- Renamed `Typerep` libraries for more consistency with the rest of the framework.

  ```ocaml
  Typerep_kernel --> Typerep_lib
  Typerep_core   --> Typerep_extended
  Typereplib     --> Typerep_experimental
  ```

-- Jeremie Dimino, for the Core team