caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Christophe Raffalli <christophe@raffalli.eu>
To: Mario Pereira <mjp.pereira@fct.unl.pt>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] O(n ln k) sorting for ocaml on github and a challenge
Date: Sat, 9 Oct 2021 09:59:10 -1000	[thread overview]
Message-ID: <20211009195910.mqivg4tsafiyb3tr@oulala> (raw)
In-Reply-To: <CAJ98AgmGGR-JaciOfMB=9wOt5im4qPg--EBrt5AJOvBg7b9Niw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2907 bytes --]

On 21-10-09 09:58:12, Mario Pereira wrote:
> Hi Christophe,
>
> Your implementation reminds me very much of TimSort (for an OCaml
> implementation, see for instance https://github.com/LesBoloss-es/sorting/blob/
> master/src/list/timsort.ml). This is also a stable algorithm.

Hi Mário,

Yes the idea is similar, but

- I use a reference to the list for the sorted/reverse-sorted block I build in
  first phase (like Timsort on arrays, which actually is stable, in place and as
  good complexity, but you have to get the balancing right, see below)

- in the second phase, I use a merge, returning sorted list at even depth and
rev-sorted at odd depth (like OCaml's List.sort). This avoid the Asc/Desc
constructor and the List.rev.

- the balancement is ensured by a simple arithmetic computation, not by a stack
and an invariant on sizes (probably building a balanced binary tree with the
initial block instead of a list could be a bit better, I am not sure).

- Last but not least: the code you point seems broken somewhere, it does not
  ends on large lists (probably quadratic because the balancing with size is
  probably wrong. cf the paragraph "bug" on wikipedia about Timsort. Anyway, the
  code you point is probably optimised for arrays.

> Was TimSort an inspiration?

The inspiration was a paper I read long ago about the O(n ln n) not being the
best we can do for lists, as O(n ln k) can be achieved where k is the number of
change of direction.

My work aims at being a replacement for OCaml sort (for list currently). Here is
the timing I get:

Correctness test passed
Stability test passed
Random lists:
          random: tf = 1.44, tg = 1.39, factor = 1.04x, gain = 3.70%
    random small: tf = 1.10, tg = 1.19, factor = 0.92x, gain = -8.27%
Worst cases:
          worst1: tf = 0.83, tg = 0.71, factor = 1.17x, gain = 14.21%
          worst2: tf = 0.65, tg = 0.70, factor = 0.93x, gain = -7.17%
Sorted (partially) lists:
          sorted: tf = 0.65, tg = 0.01, factor = 97.86x, gain = 98.98%
        reversed: tf = 0.65, tg = 0.09, factor = 7.62x, gain = 86.88%
      sorted@rev: tf = 0.65, tg = 0.21, factor = 3.14x, gain = 68.19%
      rev@sorted: tf = 0.65, tg = 0.21, factor = 3.13x, gain = 68.03%
Shuffled lists (permute k times 2 elements in a sorted list):
      shuffle 10: tf = 0.66, tg = 0.41, factor = 1.61x, gain = 38.01%
     shuffle 100: tf = 0.64, tg = 0.51, factor = 1.25x, gain = 20.28%
    shuffle 1000: tf = 0.63, tg = 0.56, factor = 1.14x, gain = 11.94%
   shuffle 10000: tf = 0.64, tg = 0.61, factor = 1.04x, gain = 4.30%

factor > 1 means Block_sort is faster that List.sort. Remark that in the case of
sorted list, my algorithm returns the original list using constant memory.

Cheers,
Christophe

PS: I probably will try something much more like Timsort on arrays later ...

> Cheers
> --
> Mário

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2021-10-09 19:59 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-09  4:05 Christophe Raffalli
2021-10-09  8:58 ` Mario Pereira
2021-10-09 19:59   ` Christophe Raffalli [this message]
2021-10-11 11:46     ` Niols
2021-10-11 21:27       ` Mario Pereira
2021-10-22  4:42 ` Kazuhiko Sakaguchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211009195910.mqivg4tsafiyb3tr@oulala \
    --to=christophe@raffalli.eu \
    --cc=caml-list@inria.fr \
    --cc=mjp.pereira@fct.unl.pt \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).