caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Announcing Sek, an efficient implementation of sequences
@ 2020-04-04 10:11 François Pottier
  2020-04-04 13:17 ` Gabriel Scherer
  0 siblings, 1 reply; 7+ messages in thread
From: François Pottier @ 2020-04-04 10:11 UTC (permalink / raw)
  To: caml users


Fellow OCaml users,

We are pleased to announce the first release of Sek, an OCaml library that
offers an efficient implementation of sequences.

The library offers both ephemeral (mutable) sequences and persistent
(immutable) sequences, and offers constant-time conversions between these
flavors.

It supports all of the standard operations on stacks, queues, deques (e.g.
push, pop at either end), catenable sequences (concat, split), and random
access sequences (get, set).

Data is stored internally in chunks (fixed-capacity arrays),
which is why this data structure is known as a chunK SEquence.

It is intended to achieve excellent time complexity and memory usage.

This is an initial release. The library has not been tested in production,
but has received extensive unit testing, via afl-fuzz and ocaml+afl --
which are remarkably effective tools, by the way!

This is work in progress; more features, such as iterators, will be added
in the future.

To install Sek, just type

   opam update && opam install sek

Documentation is at

   http://cambium.inria.fr/~fpottier/sek/doc/sek/Sek/index.html

Feedback is welcome!

--
Arthur Charguéraud
François Pottier
with contributions by Émilie Guermeur

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Announcing Sek, an efficient implementation of sequences
  2020-04-04 10:11 [Caml-list] Announcing Sek, an efficient implementation of sequences François Pottier
@ 2020-04-04 13:17 ` Gabriel Scherer
  2020-04-04 13:29   ` orbifx
  2020-04-05 11:35   ` François Pottier
  0 siblings, 2 replies; 7+ messages in thread
From: Gabriel Scherer @ 2020-04-04 13:17 UTC (permalink / raw)
  To: François Pottier, Arthur Charguéraud; +Cc: caml users

[-- Attachment #1: Type: text/plain, Size: 5623 bytes --]

Looks very nice, thanks! Some feedback from reading the documentation.

1. gitlab.inria.fr makes it difficult for non-INRIA people to contribute
(they cannot open a pull/merge request). It is a mistake to use it for
publicly-distributed software. (In theory it would be fine if you never
needed outside help, if you were the perfect maintainer whose opam, dune
and whatever stuff is always up-to-date. In practice no one is.)

2.
> Instead, one should first build an ephemeral sequence through a series of
ephemeral push operations,
> then convert it to a persistent sequence. This way, the construction cost
is only *O(n+K)*.
I would guess that it is O(T + K), or n has to be defined first.

3. The list of function complexities is hard to digest. Would it be
possible to have, in addition or replacement, the complexity of each
operation mentioned in the documentation of the operation? (This makes it
easy to lookup the complexity of a given function)

4. I had never seen the trick of single-value functor parameters used as
labelled arguments. It is interesting. Another approach would be to have a
single Parameters module with one value for each runtime parameter. This
would be syntactically lighter, and I cannot think of any downside right
now. (One minor upside in general is that one can add new runtime
parameters without changing the functor abstraction code / formal parameter
list.)

5.
> Furthermore, in a series of push operations, starting from an empty
sequence, the amortized cost of one push operation is only *O(1)*.
If I undestand, this sentence is in tension with the sentence mentioning a
quadratic behavior up to T. Both are true, but they give
contradictory advise on how to write your code. Maybe you could recall here
that one should only take advantage of this property for sequences that are
expected to be much larger than T?

(In general there is little guidance in the documentation on how to set T,
whereas there is a number given for K. For example, if I don't know the
data size in advance, is it reasonable to use T=2? Maybe you could
benchmark a good choice of T for some workload depending on the average
size of your sequences (N=10^2, 10^4, 10^6), and let us know.

6. The natural mental model for snapshot_and_clear is a transfer of unique
ownership (into shareable ownership). If this is the model I am using in my
program, then performing any operation after the snapshot would be a
programming mistake. Your _and_clear specification avoids any undefined or
surprising/undocumented behavior in the case that happens, but it may still
hide my programming mistake. Could there be a control-transfer version that
"poisons" the ephemeral structure, so that any later operation on it raises
an exception?

On Sat, Apr 4, 2020 at 12:12 PM François Pottier <francois.pottier@inria.fr>
wrote:

>
> Fellow OCaml users,
>
> We are pleased to announce the first release of Sek, an OCaml library that
> offers an efficient implementation of sequences.
>
> The library offers both ephemeral (mutable) sequences and persistent
> (immutable) sequences, and offers constant-time conversions between these
> flavors.
>
> It supports all of the standard operations on stacks, queues, deques (e.g.
> push, pop at either end), catenable sequences (concat, split), and random
> access sequences (get, set).
>
> Data is stored internally in chunks (fixed-capacity arrays),
> which is why this data structure is known as a chunK SEquence.
>
> It is intended to achieve excellent time complexity and memory usage.
>
> This is an initial release. The library has not been tested in production,
> but has received extensive unit testing, via afl-fuzz and ocaml+afl --
> which are remarkably effective tools, by the way!
>
> This is work in progress; more features, such as iterators, will be added
> in the future.
>
> To install Sek, just type
>
>    opam update && opam install sek
>
> Documentation is at
>
>    http://cambium.inria.fr/~fpottier/sek/doc/sek/Sek/index.html
>
> Feedback is welcome!
>
> --
> Arthur Charguéraud
> François Pottier
> with contributions by Émilie Guermeur
>

On Sat, Apr 4, 2020 at 12:12 PM François Pottier <francois.pottier@inria.fr>
wrote:

>
> Fellow OCaml users,
>
> We are pleased to announce the first release of Sek, an OCaml library that
> offers an efficient implementation of sequences.
>
> The library offers both ephemeral (mutable) sequences and persistent
> (immutable) sequences, and offers constant-time conversions between these
> flavors.
>
> It supports all of the standard operations on stacks, queues, deques (e.g.
> push, pop at either end), catenable sequences (concat, split), and random
> access sequences (get, set).
>
> Data is stored internally in chunks (fixed-capacity arrays),
> which is why this data structure is known as a chunK SEquence.
>
> It is intended to achieve excellent time complexity and memory usage.
>
> This is an initial release. The library has not been tested in production,
> but has received extensive unit testing, via afl-fuzz and ocaml+afl --
> which are remarkably effective tools, by the way!
>
> This is work in progress; more features, such as iterators, will be added
> in the future.
>
> To install Sek, just type
>
>    opam update && opam install sek
>
> Documentation is at
>
>    http://cambium.inria.fr/~fpottier/sek/doc/sek/Sek/index.html
>
> Feedback is welcome!
>
> --
> Arthur Charguéraud
> François Pottier
> with contributions by Émilie Guermeur
>

[-- Attachment #2: Type: text/html, Size: 6903 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Announcing Sek, an efficient implementation of sequences
  2020-04-04 13:17 ` Gabriel Scherer
@ 2020-04-04 13:29   ` orbifx
  2020-04-04 13:56     ` Gabriel Scherer
  2020-04-05 11:35   ` François Pottier
  1 sibling, 1 reply; 7+ messages in thread
From: orbifx @ 2020-04-04 13:29 UTC (permalink / raw)
  To: caml-list

On 04/04/2020 14:17, Gabriel Scherer wrote:
> 1. gitlab.inria.fr <http://gitlab.inria.fr> makes it difficult for non-INRIA people to contribute (they cannot open a pull/merge request).

Only for those who don't know how to contribute without "opening" a pull request. Which is otherwise as simple as sending a message with: here is my repo URL, please pull; or here is my patch attachment.

> It is a mistake to use it for publicly-distributed software.

I think not. I find it positively refreshing to see something outwith Gitlab's dominance. So it's a matter of opinion.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Announcing Sek, an efficient implementation of sequences
  2020-04-04 13:29   ` orbifx
@ 2020-04-04 13:56     ` Gabriel Scherer
  2020-04-04 14:04       ` orbifx
  2020-04-07 22:06       ` Gerd Stolpmann
  0 siblings, 2 replies; 7+ messages in thread
From: Gabriel Scherer @ 2020-04-04 13:56 UTC (permalink / raw)
  To: orbifx; +Cc: caml users

[-- Attachment #1: Type: text/plain, Size: 2070 bytes --]

I don't have strong opinion against mail-based contributions (I suspect
they are perceived as less beginner-friendly by newer contributors who have
grown up in the web-interface-based-world, but that's a discussion for
another time). The problem here is that a select group of contributors can
use a certain contribution procedure (here: Merge Requests), and external
contributors have to use a different approach (email or issue-based), or
ask permission to use the first-group's approach. In this setting the
second approach is *objectively* second-class, and this difference
objectively sucks.

There are other places that host Gitlab instances without this restriction.

The restriction is originally caused by design issues in Gitlab, which ties
"sending a Merge Request" with "having a fork on the same server" (which
paranoid sysadmins want to forbid to public users as it may create
resource-consumption issues). Gitlab is working on making it possible to
open a Merge Request by just sending a patch (
https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html#new-merge-request-by-email-core-only
), and maybe some other git-host server will rise to visibility with a
better federation story ( https://sourcehut.org/ ?).

(If you are interested in hosting a web platform for git hosting, I'm told
that gitea ( https://gitea.io/ ) is much easier to deploy than Gitlab.)

On Sat, Apr 4, 2020 at 3:30 PM orbifx <fox@orbitalfox.eu> wrote:

> On 04/04/2020 14:17, Gabriel Scherer wrote:
> > 1. gitlab.inria.fr <http://gitlab.inria.fr> makes it difficult for
> non-INRIA people to contribute (they cannot open a pull/merge request).
>
> Only for those who don't know how to contribute without "opening" a pull
> request. Which is otherwise as simple as sending a message with: here is my
> repo URL, please pull; or here is my patch attachment.
>
> > It is a mistake to use it for publicly-distributed software.
>
> I think not. I find it positively refreshing to see something outwith
> Gitlab's dominance. So it's a matter of opinion.
>
>

[-- Attachment #2: Type: text/html, Size: 2870 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Announcing Sek, an efficient implementation of sequences
  2020-04-04 13:56     ` Gabriel Scherer
@ 2020-04-04 14:04       ` orbifx
  2020-04-07 22:06       ` Gerd Stolpmann
  1 sibling, 0 replies; 7+ messages in thread
From: orbifx @ 2020-04-04 14:04 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: caml users

On 04/04/2020 14:56, Gabriel Scherer wrote:
> I don't have strong opinion against mail-based contributions [..]
> There are other places that host Gitlab instances without this restriction.[..] The restriction is originally caused by design issues in Gitlab,

Ow, I see what you mean now.

> (If you are interested in hosting a web platform for git hosting, I'm told that gitea ( https://gitea.io/ ) is much easier to deploy than Gitlab.)

I concur Gitea is easier install and lighter to run than Gitlab.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Announcing Sek, an efficient implementation of sequences
  2020-04-04 13:17 ` Gabriel Scherer
  2020-04-04 13:29   ` orbifx
@ 2020-04-05 11:35   ` François Pottier
  1 sibling, 0 replies; 7+ messages in thread
From: François Pottier @ 2020-04-05 11:35 UTC (permalink / raw)
  To: Gabriel Scherer, Arthur Charguéraud; +Cc: caml users


Hi Gabriel,

Thanks for your remarks; I have made a few improvements to the documentation
based on your remarks. In particular, the default value of T is 
currently 64.

On 04/04/2020 15:17, Gabriel Scherer wrote:
> 6. Your _and_clear specification avoids any undefined or
> surprising/undocumented behavior in the case that happens, but it may still
> hide my programming mistake.

This is true. We debated various options (described in NOTES.md) but 
chose the
version whose specification is simplest (no undefined behavior, no 
exception).
Because we would prefer to avoid offering many variants of each 
operation, we
will probably keep things this way.

--
François Pottier
francois.pottier@inria.fr
http://cambium.inria.fr/~fpottier/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Announcing Sek, an efficient implementation of sequences
  2020-04-04 13:56     ` Gabriel Scherer
  2020-04-04 14:04       ` orbifx
@ 2020-04-07 22:06       ` Gerd Stolpmann
  1 sibling, 0 replies; 7+ messages in thread
From: Gerd Stolpmann @ 2020-04-07 22:06 UTC (permalink / raw)
  To: caml-list


[-- Attachment #1.1.1: Type: text/plain, Size: 1690 bytes --]


Am 04.04.20 um 15:56 schrieb Gabriel Scherer:
> (If you are interested in hosting a web platform for git hosting, I'm
> told that gitea ( https://gitea.io/ ) is much easier to deploy than
> Gitlab.)

As I went through this in case of Gitlab: the deployment is very simple
- they have a docker container. One command and it is running. The issue
with a running a public Gitlab instance was that you need to invest some
time to maintain it (it is abused for hosting ads of any kind), and that
is why I stopped doing this.

Gerd



> On Sat, Apr 4, 2020 at 3:30 PM orbifx <fox@orbitalfox.eu
> <mailto:fox@orbitalfox.eu>> wrote:
>
>     On 04/04/2020 14:17, Gabriel Scherer wrote:
>     > 1. gitlab.inria.fr <http://gitlab.inria.fr>
>     <http://gitlab.inria.fr> makes it difficult for non-INRIA people
>     to contribute (they cannot open a pull/merge request).
>
>     Only for those who don't know how to contribute without "opening"
>     a pull request. Which is otherwise as simple as sending a message
>     with: here is my repo URL, please pull; or here is my patch
>     attachment.
>
>     > It is a mistake to use it for publicly-distributed software.
>
>     I think not. I find it positively refreshing to see something
>     outwith Gitlab's dominance. So it's a matter of opinion.
>
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------


[-- Attachment #1.1.2: Type: text/html, Size: 3410 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-04-07 22:07 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-04 10:11 [Caml-list] Announcing Sek, an efficient implementation of sequences François Pottier
2020-04-04 13:17 ` Gabriel Scherer
2020-04-04 13:29   ` orbifx
2020-04-04 13:56     ` Gabriel Scherer
2020-04-04 14:04       ` orbifx
2020-04-07 22:06       ` Gerd Stolpmann
2020-04-05 11:35   ` François Pottier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).