Date: Sat, 11 Feb 2023 10:22:49 +0100
From: Markus Wichmann
To: musl@lists.openwall.com
Subject: Re: [musl] Re:Re: [musl] qsort
Message-ID: <20230211092249.GC1903@voyager>
In-Reply-To: <4d290220.36d6.1860222ca46.Coremail.00107082@163.com>

On Mon, Jan 30, 2023 at 06:04:39PM +0800, David Wang wrote:
> On average, qsort with musl is significantly slower than c++ sort,
> while with glibc the average performance of qsort is better than c++
> sort.

Getting back to this: with the benchmarks recently performed, I notice one area where C++ is able to significantly outperform C, and that is generic functions.

C++ std::sort is a template function, so the compiler can specialize the code for each instantiation. In your case, for example, swapping two elements only ever means copying four bytes, which the compiler can simply inline. The C code can only call memcpy(): a single call to cycle() involves many memcpy() calls, and none of them can be inlined, because the element size is not known at the time the calls are compiled. We end up with tons of memcpy() calls of size four, while memcpy() is optimized for large sizes.

Merge sort, as glibc implements it, at least allows some memcpy() calls with larger sizes. Glibc also avoids calling memcpy() altogether by duplicating the copy code for common sizes and only resorting to memcpy() in the default case. That may be an avenue worth pursuing, at least for cycle(); a rough sketch of what I mean follows.
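To make that concrete, here is a minimal sketch of the kind of special-casing I have in mind. It is not the actual cycle() from qsort.c, and the name is made up; it only illustrates branching on the width once so that the common cases become fixed-size copies the compiler can inline:

#include <stdint.h>
#include <string.h>

/* Illustration only, not the real cycle(): rotate the n blocks
 * pointed to by ar[0..n-1] by one position, branching on the width
 * once so that for the common widths the compiler sees fixed-size
 * memcpy() calls it can turn into plain loads and stores. */
static void cycle_sketch(size_t width, unsigned char *ar[], int n)
{
	unsigned char tmp[256];
	int i;

	if (n < 2) return;

	if (width == 4) {
		uint32_t t;
		memcpy(&t, ar[0], 4);              /* fixed size: inlined */
		for (i = 0; i < n-1; i++)
			memcpy(ar[i], ar[i+1], 4);
		memcpy(ar[n-1], &t, 4);
	} else if (width == 8) {
		uint64_t t;
		memcpy(&t, ar[0], 8);
		for (i = 0; i < n-1; i++)
			memcpy(ar[i], ar[i+1], 8);
		memcpy(ar[n-1], &t, 8);
	} else {
		/* default case: generic copies as before (the real
		 * cycle() also handles widths larger than the scratch
		 * buffer in chunks, elided here) */
		memcpy(tmp, ar[0], width);
		for (i = 0; i < n-1; i++)
			memcpy(ar[i], ar[i+1], width);
		memcpy(ar[n-1], tmp, width);
	}
}

With optimization on, the fixed-size branches should compile down to plain loads and stores, and only the default case still goes through the out-of-line memcpy().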
I'm not quite certain why glibc chose the sizes it did for msort_with_tmp(), but it branches on 32-bit integers, 64-bit integers, longs (which under a System V ABI are always one of the former two) and void * (which under a System V ABI is always the same size as a long). I think special-casing 32-bit and 64-bit integers ought to get us most of the way there, since those are the most common sizes that can still be copied with a simple inline copy.

Also note that glibc further reduces the need for memcpy() by sorting indirectly if a single element exceeds 32 bytes: it sorts an array of pointers to the elements and only moves the elements themselves into place at the end. That is of course something glibc can do, since it already allocates memory and has a fall-back strategy in place. I'm not sure this is worthwhile for musl; at least in the static linking case, it would suddenly pull malloc() and free() into executables that previously did not contain them. A rough sketch of the idea is below, after my sig.

That last optimization would also have no bearing on the current batch of benchmarks, since in those a size of four is set in stone.

Ciao,
Markus
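P.S.: The rough sketch mentioned above. The name is made up and it bears no relation to the actual glibc or musl code; it only shows the idea of sorting an array of pointers and moving the big elements once at the end. glibc permutes the elements in place instead of gathering them into a second buffer, so it gets by with less scratch memory than this does:

#include <stdlib.h>
#include <string.h>

/* Rough sketch only: sort large elements indirectly by sorting an
 * array of pointers to them, then writing the elements back in
 * sorted order.  The pointer sort is a trivial insertion sort just
 * to keep the sketch self-contained; the real thing would reuse the
 * normal sort on the pointer array.  The point is that the big
 * elements are moved only once, at the end. */
static int sort_indirect_sketch(void *base, size_t nel, size_t width,
                                int (*cmp)(const void *, const void *))
{
	unsigned char *b = base;
	unsigned char **p;
	unsigned char *out;
	size_t i, j;

	if (nel < 2) return 0;
	p = malloc(nel * sizeof *p);
	out = malloc(nel * width);
	if (!p || !out) { free(p); free(out); return -1; }

	for (i = 0; i < nel; i++) p[i] = b + i*width;

	/* sort the pointers, comparing the elements they point to */
	for (i = 1; i < nel; i++) {
		unsigned char *t = p[i];
		for (j = i; j > 0 && cmp(p[j-1], t) > 0; j--) p[j] = p[j-1];
		p[j] = t;
	}

	/* gather the elements in sorted order, then copy them back */
	for (i = 0; i < nel; i++) memcpy(out + i*width, p[i], width);
	memcpy(base, out, nel * width);

	free(p);
	free(out);
	return 0;
}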