caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Gabriel Scherer <gabriel.scherer@gmail.com>
To: "François Pottier" <francois.pottier@inria.fr>,
	"Arthur Charguéraud" <arthur@chargueraud.org>
Cc: caml users <caml-list@inria.fr>
Subject: Re: [Caml-list] Announcing Sek, an efficient implementation of sequences
Date: Sat, 4 Apr 2020 15:17:36 +0200	[thread overview]
Message-ID: <CAPFanBH3e5Kbj6+vpzexqDPCaUEy3Vqm6iyBdkDNoHFagv8F5w@mail.gmail.com> (raw)
In-Reply-To: <e3069702-85bd-73a5-8ae0-8b6459f856cf@inria.fr>

[-- Attachment #1: Type: text/plain, Size: 5623 bytes --]

Looks very nice, thanks! Some feedback from reading the documentation.

1. gitlab.inria.fr makes it difficult for non-INRIA people to contribute
(they cannot open a pull/merge request). It is a mistake to use it for
publicly-distributed software. (In theory it would be fine if you never
needed outside help, if you were the perfect maintainer whose opam, dune
and whatever stuff is always up-to-date. In practice no one is.)

2.
> Instead, one should first build an ephemeral sequence through a series of
ephemeral push operations,
> then convert it to a persistent sequence. This way, the construction cost
is only *O(n+K)*.
I would guess that it is O(T + K), or n has to be defined first.

3. The list of function complexities is hard to digest. Would it be
possible to have, in addition or replacement, the complexity of each
operation mentioned in the documentation of the operation? (This makes it
easy to lookup the complexity of a given function)

4. I had never seen the trick of single-value functor parameters used as
labelled arguments. It is interesting. Another approach would be to have a
single Parameters module with one value for each runtime parameter. This
would be syntactically lighter, and I cannot think of any downside right
now. (One minor upside in general is that one can add new runtime
parameters without changing the functor abstraction code / formal parameter
list.)

5.
> Furthermore, in a series of push operations, starting from an empty
sequence, the amortized cost of one push operation is only *O(1)*.
If I undestand, this sentence is in tension with the sentence mentioning a
quadratic behavior up to T. Both are true, but they give
contradictory advise on how to write your code. Maybe you could recall here
that one should only take advantage of this property for sequences that are
expected to be much larger than T?

(In general there is little guidance in the documentation on how to set T,
whereas there is a number given for K. For example, if I don't know the
data size in advance, is it reasonable to use T=2? Maybe you could
benchmark a good choice of T for some workload depending on the average
size of your sequences (N=10^2, 10^4, 10^6), and let us know.

6. The natural mental model for snapshot_and_clear is a transfer of unique
ownership (into shareable ownership). If this is the model I am using in my
program, then performing any operation after the snapshot would be a
programming mistake. Your _and_clear specification avoids any undefined or
surprising/undocumented behavior in the case that happens, but it may still
hide my programming mistake. Could there be a control-transfer version that
"poisons" the ephemeral structure, so that any later operation on it raises
an exception?

On Sat, Apr 4, 2020 at 12:12 PM François Pottier <francois.pottier@inria.fr>
wrote:

>
> Fellow OCaml users,
>
> We are pleased to announce the first release of Sek, an OCaml library that
> offers an efficient implementation of sequences.
>
> The library offers both ephemeral (mutable) sequences and persistent
> (immutable) sequences, and offers constant-time conversions between these
> flavors.
>
> It supports all of the standard operations on stacks, queues, deques (e.g.
> push, pop at either end), catenable sequences (concat, split), and random
> access sequences (get, set).
>
> Data is stored internally in chunks (fixed-capacity arrays),
> which is why this data structure is known as a chunK SEquence.
>
> It is intended to achieve excellent time complexity and memory usage.
>
> This is an initial release. The library has not been tested in production,
> but has received extensive unit testing, via afl-fuzz and ocaml+afl --
> which are remarkably effective tools, by the way!
>
> This is work in progress; more features, such as iterators, will be added
> in the future.
>
> To install Sek, just type
>
>    opam update && opam install sek
>
> Documentation is at
>
>    http://cambium.inria.fr/~fpottier/sek/doc/sek/Sek/index.html
>
> Feedback is welcome!
>
> --
> Arthur Charguéraud
> François Pottier
> with contributions by Émilie Guermeur
>

On Sat, Apr 4, 2020 at 12:12 PM François Pottier <francois.pottier@inria.fr>
wrote:

>
> Fellow OCaml users,
>
> We are pleased to announce the first release of Sek, an OCaml library that
> offers an efficient implementation of sequences.
>
> The library offers both ephemeral (mutable) sequences and persistent
> (immutable) sequences, and offers constant-time conversions between these
> flavors.
>
> It supports all of the standard operations on stacks, queues, deques (e.g.
> push, pop at either end), catenable sequences (concat, split), and random
> access sequences (get, set).
>
> Data is stored internally in chunks (fixed-capacity arrays),
> which is why this data structure is known as a chunK SEquence.
>
> It is intended to achieve excellent time complexity and memory usage.
>
> This is an initial release. The library has not been tested in production,
> but has received extensive unit testing, via afl-fuzz and ocaml+afl --
> which are remarkably effective tools, by the way!
>
> This is work in progress; more features, such as iterators, will be added
> in the future.
>
> To install Sek, just type
>
>    opam update && opam install sek
>
> Documentation is at
>
>    http://cambium.inria.fr/~fpottier/sek/doc/sek/Sek/index.html
>
> Feedback is welcome!
>
> --
> Arthur Charguéraud
> François Pottier
> with contributions by Émilie Guermeur
>

[-- Attachment #2: Type: text/html, Size: 6903 bytes --]

  reply	other threads:[~2020-04-04 13:18 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-04 10:11 François Pottier
2020-04-04 13:17 ` Gabriel Scherer [this message]
2020-04-04 13:29   ` orbifx
2020-04-04 13:56     ` Gabriel Scherer
2020-04-04 14:04       ` orbifx
2020-04-07 22:06       ` Gerd Stolpmann
2020-04-05 11:35   ` François Pottier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPFanBH3e5Kbj6+vpzexqDPCaUEy3Vqm6iyBdkDNoHFagv8F5w@mail.gmail.com \
    --to=gabriel.scherer@gmail.com \
    --cc=arthur@chargueraud.org \
    --cc=caml-list@inria.fr \
    --cc=francois.pottier@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).