caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] O(n ln k) sorting for ocaml on github and a challenge
@ 2021-10-09  4:05 Christophe Raffalli
  2021-10-09  8:58 ` Mario Pereira
  2021-10-22  4:42 ` Kazuhiko Sakaguchi
  0 siblings, 2 replies; 6+ messages in thread
From: Christophe Raffalli @ 2021-10-09  4:05 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 523 bytes --]

Hello,

I just pushed on https://github.com/craff/block_sort.git an old piece of code: a
stable sorting algorithm which is in O(n ln k) where n is the size of the list
and k the number of changes of direction.  It is often much faster than
List.sort, on my tests, never more than 10% slower.  The tests are available on
github.

If people are interested I could provide a version for arrays.

A challenge would be to provide a O(n ln k) sorting on list that is always
faster than List.sort. Any ideas ?

Cheers,
Christophe

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] O(n ln k) sorting for ocaml on github and a challenge
  2021-10-09  4:05 [Caml-list] O(n ln k) sorting for ocaml on github and a challenge Christophe Raffalli
@ 2021-10-09  8:58 ` Mario Pereira
  2021-10-09 19:59   ` Christophe Raffalli
  2021-10-22  4:42 ` Kazuhiko Sakaguchi
  1 sibling, 1 reply; 6+ messages in thread
From: Mario Pereira @ 2021-10-09  8:58 UTC (permalink / raw)
  To: Christophe Raffalli; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 281 bytes --]

Hi Christophe,

Your implementation reminds me very much of TimSort (for an OCaml
implementation, see for instance
https://github.com/LesBoloss-es/sorting/blob/master/src/list/timsort.ml).
This is also a stable algorithm.

Was TimSort an inspiration?

Cheers
--
Mário

[-- Attachment #2: Type: text/html, Size: 570 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] O(n ln k) sorting for ocaml on github and a challenge
  2021-10-09  8:58 ` Mario Pereira
@ 2021-10-09 19:59   ` Christophe Raffalli
  2021-10-11 11:46     ` Niols
  0 siblings, 1 reply; 6+ messages in thread
From: Christophe Raffalli @ 2021-10-09 19:59 UTC (permalink / raw)
  To: Mario Pereira; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 2907 bytes --]

On 21-10-09 09:58:12, Mario Pereira wrote:
> Hi Christophe,
>
> Your implementation reminds me very much of TimSort (for an OCaml
> implementation, see for instance https://github.com/LesBoloss-es/sorting/blob/
> master/src/list/timsort.ml). This is also a stable algorithm.

Hi Mário,

Yes the idea is similar, but

- I use a reference to the list for the sorted/reverse-sorted block I build in
  first phase (like Timsort on arrays, which actually is stable, in place and as
  good complexity, but you have to get the balancing right, see below)

- in the second phase, I use a merge, returning sorted list at even depth and
rev-sorted at odd depth (like OCaml's List.sort). This avoid the Asc/Desc
constructor and the List.rev.

- the balancement is ensured by a simple arithmetic computation, not by a stack
and an invariant on sizes (probably building a balanced binary tree with the
initial block instead of a list could be a bit better, I am not sure).

- Last but not least: the code you point seems broken somewhere, it does not
  ends on large lists (probably quadratic because the balancing with size is
  probably wrong. cf the paragraph "bug" on wikipedia about Timsort. Anyway, the
  code you point is probably optimised for arrays.

> Was TimSort an inspiration?

The inspiration was a paper I read long ago about the O(n ln n) not being the
best we can do for lists, as O(n ln k) can be achieved where k is the number of
change of direction.

My work aims at being a replacement for OCaml sort (for list currently). Here is
the timing I get:

Correctness test passed
Stability test passed
Random lists:
          random: tf = 1.44, tg = 1.39, factor = 1.04x, gain = 3.70%
    random small: tf = 1.10, tg = 1.19, factor = 0.92x, gain = -8.27%
Worst cases:
          worst1: tf = 0.83, tg = 0.71, factor = 1.17x, gain = 14.21%
          worst2: tf = 0.65, tg = 0.70, factor = 0.93x, gain = -7.17%
Sorted (partially) lists:
          sorted: tf = 0.65, tg = 0.01, factor = 97.86x, gain = 98.98%
        reversed: tf = 0.65, tg = 0.09, factor = 7.62x, gain = 86.88%
      sorted@rev: tf = 0.65, tg = 0.21, factor = 3.14x, gain = 68.19%
      rev@sorted: tf = 0.65, tg = 0.21, factor = 3.13x, gain = 68.03%
Shuffled lists (permute k times 2 elements in a sorted list):
      shuffle 10: tf = 0.66, tg = 0.41, factor = 1.61x, gain = 38.01%
     shuffle 100: tf = 0.64, tg = 0.51, factor = 1.25x, gain = 20.28%
    shuffle 1000: tf = 0.63, tg = 0.56, factor = 1.14x, gain = 11.94%
   shuffle 10000: tf = 0.64, tg = 0.61, factor = 1.04x, gain = 4.30%

factor > 1 means Block_sort is faster that List.sort. Remark that in the case of
sorted list, my algorithm returns the original list using constant memory.

Cheers,
Christophe

PS: I probably will try something much more like Timsort on arrays later ...

> Cheers
> --
> Mário

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] O(n ln k) sorting for ocaml on github and a challenge
  2021-10-09 19:59   ` Christophe Raffalli
@ 2021-10-11 11:46     ` Niols
  2021-10-11 21:27       ` Mario Pereira
  0 siblings, 1 reply; 6+ messages in thread
From: Niols @ 2021-10-11 11:46 UTC (permalink / raw)
  To: caml-list

Hi Mário, Christophe,

I'm glad some other people are also doing that kind of things!

On 10/9/21 9:59 PM, Christophe Raffalli wrote:
> On 21-10-09 09:58:12, Mario Pereira wrote:
>> Your implementation reminds me very much of TimSort (for an OCaml
>> implementation, see for instance https://github.com/LesBoloss-es/sorting/blob/
>> master/src/list/timsort.ml). This is also a stable algorithm.
> 
> Yes the idea is similar, but
> 
> - I use a reference to the list for the sorted/reverse-sorted block I build in
>    first phase (like Timsort on arrays, which actually is stable, in place and as
>    good complexity, but you have to get the balancing right, see below)
> 
> - in the second phase, I use a merge, returning sorted list at even depth and
> rev-sorted at odd depth (like OCaml's List.sort). This avoid the Asc/Desc
> constructor and the List.rev.
> 
> - the balancement is ensured by a simple arithmetic computation, not by a stack
> and an invariant on sizes (probably building a balanced binary tree with the
> initial block instead of a list could be a bit better, I am not sure).
> 
> - Last but not least: the code you point seems broken somewhere, it does not
>    ends on large lists (probably quadratic because the balancing with size is
>    probably wrong. cf the paragraph "bug" on wikipedia about Timsort. Anyway, the
>    code you point is probably optimised for arrays.

The repository in question is very experimental. Our implementation of 
TimSort on list is indeed broken. I can't remember if it contains the 
famous TimSort bug, but we are aware that it exists and know of two ways 
to fix it. We quickly switched to working on arrays, though, and were 
planning on coming back to list eventually.

Our implementation of TimSort on arrays can be found here:

 
https://github.com/LesBoloss-es/sorting/blob/master/src/array/timsort.ml

It does include a fix of the bug. We have tested it successfully on a 
rather wide range of arrays.

We have also implemented PeekSort and PowerSort from “Nearly-Optimal 
Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to 
Existing Runs” by J. Ian Munro and Sebastian Wild. cf:

     https://www.wild-inter.net/publications/munro-wild-2018

They use a number of comparisons similar to TimSort. Our implementations 
are copied directly from the implementation of the authors. There is a 
lot of room for improvement to make the runtime much slower but we 
haven't gotten to that yet.

PowerSort, in particular, is designed to be efficient in cache. We 
wondered whether it would adapt nicely to lists but it is also only 
future work.

> My work aims at being a replacement for OCaml sort (for list currently). Here is
> the timing I get:
> 
> Correctness test passed
> Stability test passed
> Random lists:
>            random: tf = 1.44, tg = 1.39, factor = 1.04x, gain = 3.70%
>      random small: tf = 1.10, tg = 1.19, factor = 0.92x, gain = -8.27%
> Worst cases:
>            worst1: tf = 0.83, tg = 0.71, factor = 1.17x, gain = 14.21%
>            worst2: tf = 0.65, tg = 0.70, factor = 0.93x, gain = -7.17%
> Sorted (partially) lists:
>            sorted: tf = 0.65, tg = 0.01, factor = 97.86x, gain = 98.98%
>          reversed: tf = 0.65, tg = 0.09, factor = 7.62x, gain = 86.88%
>        sorted@rev: tf = 0.65, tg = 0.21, factor = 3.14x, gain = 68.19%
>        rev@sorted: tf = 0.65, tg = 0.21, factor = 3.13x, gain = 68.03%
> Shuffled lists (permute k times 2 elements in a sorted list):
>        shuffle 10: tf = 0.66, tg = 0.41, factor = 1.61x, gain = 38.01%
>       shuffle 100: tf = 0.64, tg = 0.51, factor = 1.25x, gain = 20.28%
>      shuffle 1000: tf = 0.63, tg = 0.56, factor = 1.14x, gain = 11.94%
>     shuffle 10000: tf = 0.64, tg = 0.61, factor = 1.04x, gain = 4.30%
> 
> factor > 1 means Block_sort is faster that List.sort. Remark that in the case of
> sorted list, my algorithm returns the original list using constant memory.

I guess we had similar goals. I do think it can be nice to have a 
sorting algorithm in the standard library that behaves well on lists 
that have some order in them, as such lists occur quite often in real life.

We also obtained similar results when comparing to the standard library 
(for arrays): On purely random lists, our TimSort is ~20% slower and 
uses some more comparisons. As soon as the lists contain some order, 
TimSort beats the standard library by far both in term of runtime and 
comparisons.

We were planning on starting to work again on this project soon-ish 
(hopefully within the coming month), and in particular we wanted to 
optimise a bit more PeekSort and PowerSort and then package the whole 
thing in a library. We'll follow closely what you are doing so as to not 
clash with your work!

Cheers,
Niols

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] O(n ln k) sorting for ocaml on github and a challenge
  2021-10-11 11:46     ` Niols
@ 2021-10-11 21:27       ` Mario Pereira
  0 siblings, 0 replies; 6+ messages in thread
From: Mario Pereira @ 2021-10-11 21:27 UTC (permalink / raw)
  To: Niols; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1154 bytes --]

Hi Niols,

The repository in question is very experimental. Our implementation of
> TimSort on list is indeed broken. I can't remember if it contains the
> famous TimSort bug, but we are aware that it exists and know of two ways
> to fix it. We quickly switched to working on arrays, though, and were
> planning on coming back to list eventually.
>
> Our implementation of TimSort on arrays can be found here:
>
>
> https://github.com/LesBoloss-es/sorting/blob/master/src/array/timsort.ml
>
> It does include a fix of the bug. We have tested it successfully on a
> rather wide range of arrays.
>

So interesting that you mentioned this :) I actually came across your
implementation as I was looking for an OCaml implementation of TimSort that
I could use as a test stress for Cameleer (a deductive verification tool
for OCaml-written code).

I didn't get that much far (cf, the in-progress proof is here:
https://github.com/ocaml-gospel/cameleer/blob/master/examples/in_progress/timsort.ml),
but I was planning to get back to it very soon. Maybe I will take a look
first at the implementation on arrays :)

Cheers
--
Mário

[-- Attachment #2: Type: text/html, Size: 1720 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] O(n ln k) sorting for ocaml on github and a challenge
  2021-10-09  4:05 [Caml-list] O(n ln k) sorting for ocaml on github and a challenge Christophe Raffalli
  2021-10-09  8:58 ` Mario Pereira
@ 2021-10-22  4:42 ` Kazuhiko Sakaguchi
  1 sibling, 0 replies; 6+ messages in thread
From: Kazuhiko Sakaguchi @ 2021-10-22  4:42 UTC (permalink / raw)
  To: Christophe Raffalli; +Cc: caml-list

Hi Christophe,

If I understand correctly, that's called "smooth sort" problem in
Paulson's "ML for the Working Programmer" book:
https://www.cl.cam.ac.uk/~lp15/MLbook/PDF/chapter3.pdf

I have tail-recursive smooth mergesort implemented (and verified) in Coq here:
https://github.com/pi8027/stablesort/blob/04a1441d36a379c31ea0221ace27ead1e93fc39e/theories/stablesort.v#L662-L685
Indeed, I can extract OCaml code from the Coq implementation, although
it is hard to read:
https://github.com/pi8027/stablesort/blob/benchmark-results/benchmark/ocaml/mergesort_coq.ml#L1285-L1364
Here are the benchmark results. Although I didn't use native integers,
my implementation seems to outperform List.stable_sort of OCaml:
https://github.com/pi8027/stablesort/blob/benchmark-results/output/main.pdf

This does not answer the initial question (challenge) since my
implementation targets lists rather than arrays. But, I think that the
`merge_sort_push` and `merge_sort_pop` functions are worth
understanding. It was initially devised to work around the termination
checker of Coq, but it is also useful to prevent stack overflow.
https://sympa.inria.fr/sympa/arc/coq-club/2009-04/msg00040.html

I hope it helps.

Best,
Kazuhiko

2021年10月9日(土) 13:06 Christophe Raffalli <christophe@raffalli.eu>:
>
> Hello,
>
> I just pushed on https://github.com/craff/block_sort.git an old piece of code: a
> stable sorting algorithm which is in O(n ln k) where n is the size of the list
> and k the number of changes of direction.  It is often much faster than
> List.sort, on my tests, never more than 10% slower.  The tests are available on
> github.
>
> If people are interested I could provide a version for arrays.
>
> A challenge would be to provide a O(n ln k) sorting on list that is always
> faster than List.sort. Any ideas ?
>
> Cheers,
> Christophe

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-10-22  4:44 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-09  4:05 [Caml-list] O(n ln k) sorting for ocaml on github and a challenge Christophe Raffalli
2021-10-09  8:58 ` Mario Pereira
2021-10-09 19:59   ` Christophe Raffalli
2021-10-11 11:46     ` Niols
2021-10-11 21:27       ` Mario Pereira
2021-10-22  4:42 ` Kazuhiko Sakaguchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).