From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by c5ff346549e7 (Postfix) with ESMTPS id 25C285D6 for ; Mon, 11 Oct 2021 11:46:56 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.84,326,1620684000"; d="scan'208";a="533218121" Received: from prod-listesu18.inria.fr (HELO sympa.inria.fr) ([128.93.162.160]) by mail2-relais-roc.national.inria.fr with ESMTP; 11 Oct 2021 13:46:53 +0200 Received: by sympa.inria.fr (Postfix, from userid 20132) id 90B36E019F; Mon, 11 Oct 2021 13:46:53 +0200 (CEST) Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 00239E003C for ; Mon, 11 Oct 2021 13:46:47 +0200 (CEST) Authentication-Results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=niols@niols.fr; spf=None smtp.mailfrom=niols@niols.fr; spf=None smtp.helo=postmaster@smtp-190e.mail.infomaniak.ch IronPort-PHdr: =?us-ascii?q?A9a23=3A2in7BRXPT2tWJbWjbXSWYX+pTtjV8Kx+VTF92vM?= =?us-ascii?q?cY1JmTK2v8tzYMVDF4r011RmVB92duqsP1rWempujcFRI2YyGvnEGfc4EfD4+o?= =?us-ascii?q?uJSoTYdBtWYA1bwNv/gYn9yNs1DUFh44yPzahANS47xaFLIv3K98yMZFAnhOgp?= =?us-ascii?q?pPOT1HZPZg9iq2+yo9JDffRlEiCC5bL9vIxm7rQfcvdQKjIV/Lao81gHHqWZSd?= =?us-ascii?q?eRMwmNoK1OTnxLi6cq14ZVu7Sdete8/+sBZSan1cLg2QrJeDDQ9LmA6/9brugX?= =?us-ascii?q?ZTQuO/XQTTGMbmQdVDgff7RH6WpDxsjbmtud4xSKXM9H6QawyVD+/6apgVR3mh?= =?us-ascii?q?zodNzMh7W/ZlMJwgqJYrhyvqRNwzIzbb52aOvdlYqPQfskXSXZdUstfVSFMBJ6?= =?us-ascii?q?3YYsVD+oGOOZVt5Hzp1oJrRu6HgmnGeTiyjlJhn/x2a06yP8sEQfH3AwnG9IOq?= =?us-ascii?q?27YrNvvNKoLV+2+0abHwynZYfxMxTf99JbHcgonofyUX799b9TdxEcyGw3FgFu?= =?us-ascii?q?ctZLoMjGI2+kDt2WW4fdtWOCyhmAptw1/oiSjy9oyhoTUm44Y117K+yp9zYspO?= =?us-ascii?q?dG1R0h2asOqHptXsiGVLYp2QsU6TmFnuSY61r0GuYOgcyQQ1JsnwBvfZ+Sbc4i?= =?us-ascii?q?J5xLjT+GRIS1khH5/fbK/gw6+8Eahyu34S8a4yVdKrixZktbSrHABzRrT5dabS?= =?us-ascii?q?vZ740yv2i6P2hjO5uxHIU04j7fXJ4Ahz7IqiJYesV7PEjL5lUnukqOZbFko9vW?= =?us-ascii?q?t5uv5bLjrqZqROJFohQ3iMqkjn9CwDvo5PwQSWmWW+Pmz2bLm8E33T7hFkOY5n?= =?us-ascii?q?6zbvZ3eJMkXuqu0DxFP3Ysn5RayCSqt3s4CknkdNl1FfQqKj4j3NFHKJ/D1Fey?= =?us-ascii?q?/g1GwkDdz3vzKI7nsDonTIXTZlbfuZ7d960pGxAoyy9Bf6ZVUCrQbL/L1W0/+r?= =?us-ascii?q?t/YAgUlPAy02+rnCdN92Z0CWW+XH6OUM6PfvUWV6u4xI+SAfpEZtTbnJ/Q46PP?= =?us-ascii?q?ilXo5lkUcfamt05sXcne4HvF+LkWfYHrshdMBEXwRswo4Tezqj1mCUSVJa3a8R?= =?us-ascii?q?aIw/is7B56+DYffWoCth6SM0zuhEZ1TYmBKE1SMEXbzd4WYQPoMcyKTIsp5kjM?= =?us-ascii?q?eT7ShSokh1QuvtADg0bZnIPDUqWUkssfo3d1xounSjg0a9DpuDs3b3XveYXtzm?= =?us-ascii?q?zYuRjgyx+grp016zkyr16l/h+ceG8YFtKABaRszKZOJl78yMNv1QA+UJr9hrX6?= =?us-ascii?q?gQ9KiGnc/VIBpqzfhS0B8HNG5yB7ZjXLC6149jLmXHNlto/rRmX34JsI7ynDI0?= =?us-ascii?q?6woiVgvQY1ENT/+7pM=3D?= IronPort-HdrOrdr: =?us-ascii?q?A9a23=3AN74jEKzAvPAdaxsayGQMKrPwOb1zdoMgy1kn?= =?us-ascii?q?xilNoG9uA6mlfqGV7ZYmPHDP5gr5NEtLpTnEAtjifZq+z/5ICOsqUotKNTOO0F?= =?us-ascii?q?dAbrsP0WKI+Vbd8kPFm9K1mZ0AT5RD?= X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: =?us-ascii?q?A0AEAgASaxphhw4ZfblaHgEBCxIMQIMsg?= =?us-ascii?q?U9XATkxhEePSYF3LQMwg2iZTAsBAwENNwgCBAEBhGECgmsCHQYGNBMBAgQVAQE?= =?us-ascii?q?FAQEBAgEDAwQBEwEBAQENCw4IOiSFaA2CNSKDawEBAQECARIRDwEFLQcKCAsLG?= =?us-ascii?q?AICJgICVxMIAQEVCYJPAYJmIgMLm04BiyN6gTETboNRBYESg0+BXQaBECqNcyc?= =?us-ascii?q?PDYFIRYE8gkc3PoJLFwIChHWCZASDdBmBHRQQL1YTA0OBGZEXq2gzgXeBO4FCg?= =?us-ascii?q?VKHKZQABg8FJoZkjmWKdoY2ok+ZB4F3gWgOB00nETuCNQEzUBkOjjmDWYRZhgc?= =?us-ascii?q?/ATE4AgYBCgEBAwmBD4hmAQE?= X-IPAS-Result: =?us-ascii?q?A0AEAgASaxphhw4ZfblaHgEBCxIMQIMsgU9XATkxhEePSYF?= =?us-ascii?q?3LQMwg2iZTAsBAwENNwgCBAEBhGECgmsCHQYGNBMBAgQVAQEFAQEBAgEDAwQBE?= =?us-ascii?q?wEBAQENCw4IOiSFaA2CNSKDawEBAQECARIRDwEFLQcKCAsLGAICJgICVxMIAQE?= =?us-ascii?q?VCYJPAYJmIgMLm04BiyN6gTETboNRBYESg0+BXQaBECqNcycPDYFIRYE8gkc3P?= =?us-ascii?q?oJLFwIChHWCZASDdBmBHRQQL1YTA0OBGZEXq2gzgXeBO4FCgVKHKZQABg8FJoZ?= =?us-ascii?q?kjmWKdoY2ok+ZB4F3gWgOB00nETuCNQEzUBkOjjmDWYRZhgc/ATE4AgYBCgEBA?= =?us-ascii?q?wmBD4hmAQE?= X-IronPort-AV: E=Sophos;i="5.84,326,1620684000"; d="scan'208";a="395518754" X-MGA-submission: =?us-ascii?q?MDGnz5/23GYuibtjbXca2kx0zkb72Tk+gEKsWy?= =?us-ascii?q?vzZirfWrwTECCoHeKL4gpYUk3/p5eJrJsRdgj2Cn1LJ8/9ovxDkz68Tz?= =?us-ascii?q?sjivUdaIAOblpcCeiwc6EMJRydIaa9si/O4uQMTHQ3Gglh4spvqoms5n?= =?us-ascii?q?XjymaPLwZm+/YcGpIniUNW+g=3D=3D?= Received: from smtp-190e.mail.infomaniak.ch ([185.125.25.14]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Oct 2021 13:46:47 +0200 Received: from smtp-2-0000.mail.infomaniak.ch (unknown [10.5.36.107]) by smtp-3-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4HScTL2wvWzMq07X for ; Mon, 11 Oct 2021 13:46:46 +0200 (CEST) Received: from [IPv6:2a01:e34:ec05:ce60:6d4:f4eb:7960:2a9f] (unknown [IPV6:2a01:e34:ec05:ce60:6d4:f4eb:7960:2a9f]) by smtp-2-0000.mail.infomaniak.ch (Postfix) with ESMTPA id 4HScTL1Zz9zlhP4l for ; Mon, 11 Oct 2021 13:46:46 +0200 (CEST) To: caml-list@inria.fr References: <20211009040533.chu7kqqk27r5wcok@oulala> <20211009195910.mqivg4tsafiyb3tr@oulala> From: Niols Message-ID: <875a3212-c4a1-41fb-65f7-2a68b012af9f@niols.fr> Date: Mon, 11 Oct 2021 13:46:45 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: <20211009195910.mqivg4tsafiyb3tr@oulala> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 8bit Subject: Re: [Caml-list] O(n ln k) sorting for ocaml on github and a challenge Reply-To: Niols X-Loop: caml-list@inria.fr X-Sequence: 18602 Errors-To: caml-list-owner@inria.fr Precedence: list Precedence: bulk Sender: caml-list-request@inria.fr X-no-archive: yes List-Id: List-Help: List-Subscribe: List-Unsubscribe: List-Post: List-Owner: List-Archive: Archived-At: Hi Mário, Christophe, I'm glad some other people are also doing that kind of things! On 10/9/21 9:59 PM, Christophe Raffalli wrote: > On 21-10-09 09:58:12, Mario Pereira wrote: >> Your implementation reminds me very much of TimSort (for an OCaml >> implementation, see for instance https://github.com/LesBoloss-es/sorting/blob/ >> master/src/list/timsort.ml). This is also a stable algorithm. > > Yes the idea is similar, but > > - I use a reference to the list for the sorted/reverse-sorted block I build in > first phase (like Timsort on arrays, which actually is stable, in place and as > good complexity, but you have to get the balancing right, see below) > > - in the second phase, I use a merge, returning sorted list at even depth and > rev-sorted at odd depth (like OCaml's List.sort). This avoid the Asc/Desc > constructor and the List.rev. > > - the balancement is ensured by a simple arithmetic computation, not by a stack > and an invariant on sizes (probably building a balanced binary tree with the > initial block instead of a list could be a bit better, I am not sure). > > - Last but not least: the code you point seems broken somewhere, it does not > ends on large lists (probably quadratic because the balancing with size is > probably wrong. cf the paragraph "bug" on wikipedia about Timsort. Anyway, the > code you point is probably optimised for arrays. The repository in question is very experimental. Our implementation of TimSort on list is indeed broken. I can't remember if it contains the famous TimSort bug, but we are aware that it exists and know of two ways to fix it. We quickly switched to working on arrays, though, and were planning on coming back to list eventually. Our implementation of TimSort on arrays can be found here: https://github.com/LesBoloss-es/sorting/blob/master/src/array/timsort.ml It does include a fix of the bug. We have tested it successfully on a rather wide range of arrays. We have also implemented PeekSort and PowerSort from “Nearly-Optimal Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs” by J. Ian Munro and Sebastian Wild. cf: https://www.wild-inter.net/publications/munro-wild-2018 They use a number of comparisons similar to TimSort. Our implementations are copied directly from the implementation of the authors. There is a lot of room for improvement to make the runtime much slower but we haven't gotten to that yet. PowerSort, in particular, is designed to be efficient in cache. We wondered whether it would adapt nicely to lists but it is also only future work. > My work aims at being a replacement for OCaml sort (for list currently). Here is > the timing I get: > > Correctness test passed > Stability test passed > Random lists: > random: tf = 1.44, tg = 1.39, factor = 1.04x, gain = 3.70% > random small: tf = 1.10, tg = 1.19, factor = 0.92x, gain = -8.27% > Worst cases: > worst1: tf = 0.83, tg = 0.71, factor = 1.17x, gain = 14.21% > worst2: tf = 0.65, tg = 0.70, factor = 0.93x, gain = -7.17% > Sorted (partially) lists: > sorted: tf = 0.65, tg = 0.01, factor = 97.86x, gain = 98.98% > reversed: tf = 0.65, tg = 0.09, factor = 7.62x, gain = 86.88% > sorted@rev: tf = 0.65, tg = 0.21, factor = 3.14x, gain = 68.19% > rev@sorted: tf = 0.65, tg = 0.21, factor = 3.13x, gain = 68.03% > Shuffled lists (permute k times 2 elements in a sorted list): > shuffle 10: tf = 0.66, tg = 0.41, factor = 1.61x, gain = 38.01% > shuffle 100: tf = 0.64, tg = 0.51, factor = 1.25x, gain = 20.28% > shuffle 1000: tf = 0.63, tg = 0.56, factor = 1.14x, gain = 11.94% > shuffle 10000: tf = 0.64, tg = 0.61, factor = 1.04x, gain = 4.30% > > factor > 1 means Block_sort is faster that List.sort. Remark that in the case of > sorted list, my algorithm returns the original list using constant memory. I guess we had similar goals. I do think it can be nice to have a sorting algorithm in the standard library that behaves well on lists that have some order in them, as such lists occur quite often in real life. We also obtained similar results when comparing to the standard library (for arrays): On purely random lists, our TimSort is ~20% slower and uses some more comparisons. As soon as the lists contain some order, TimSort beats the standard library by far both in term of runtime and comparisons. We were planning on starting to work again on this project soon-ish (hopefully within the coming month), and in particular we wanted to optimise a bit more PeekSort and PowerSort and then package the whole thing in a library. We'll follow closely what you are doing so as to not clash with your work! Cheers, Niols