From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: memchr() performance
Date: Sun, 18 Sep 2016 16:40:30 -0400
Message-ID: <20160918204030.GC15995@brightrain.aerifal.cx>
References: <20160918185422.GA2577@dell12.lru.li>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Sun, Sep 18, 2016 at 08:54:22PM +0200, Georg Sauthoff wrote:
> (please CC me as I am not subscribed to this ML)
>
> Hello,
>
> fyi, I've done some benchmarking of different memchr() and std::find()
> versions.
>
> I also included the memchr() version from musl.
>
> In general, musl's memchr() implementation doesn't perform better than a
> simple unrolled loop (as used in libstdc++ std::find()) - and that is
> consistent over different CPU generations and architectures.
>
> On recent Intel CPUs it is even slower than a naive implementation:

Are you assuming vectorization of the naive version by the compiler? I
think it's reasonable to assume that on x86_64 but not on 32-bit since
many users build for a baseline ISA that does not have vector ops
(i486 or i586).

> https://gms.tf/stdfind-and-memchr-optimizations.html#measurements
> https://gms.tf/sparc-and-ppc-find-benchmark-results.html
>
> Of course, on x86, other implementations that use SIMD instructions
> perform even better.

I'm aware that musl's memchr (and more generally the related functions
like strchr, strlen, etc.) are not performing great, but it's not
clear to me what the right solution is, since the different approaches
vary A LOT in terms of how they compare with each other depending on
the exact cpu model and compiler. Improving this situation is probably
a big project.

Rich
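
For readers following the thread, the kinds of implementation being
compared can be sketched roughly as below. This is only an illustration:
the function names (memchr_naive, memchr_unrolled, memchr_swar) are
hypothetical, the word-at-a-time variant is a generic version of that
technique and not a copy of musl's or libstdc++'s source, and nothing
here says which one is faster on a given cpu or compiler.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive version: one byte per iteration. Whether the compiler
   vectorizes this depends on the target ISA and build flags. */
void *memchr_naive(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    unsigned char ch = (unsigned char)c;
    for (; n; p++, n--)
        if (*p == ch) return (void *)p;
    return NULL;
}

/* Manually unrolled by 4, similar in spirit to an unrolled find loop. */
void *memchr_unrolled(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    unsigned char ch = (unsigned char)c;
    for (; n >= 4; p += 4, n -= 4) {
        if (p[0] == ch) return (void *)p;
        if (p[1] == ch) return (void *)(p + 1);
        if (p[2] == ch) return (void *)(p + 2);
        if (p[3] == ch) return (void *)(p + 3);
    }
    for (; n; p++, n--)          /* remaining 0..3 bytes */
        if (*p == ch) return (void *)p;
    return NULL;
}

/* Word-at-a-time (SWAR) helpers: HASZERO(x) is nonzero if and only if
   some byte of x is zero. */
#define ONES  ((size_t)-1 / UCHAR_MAX)
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))
#define HASZERO(x) (((x) - ONES) & ~(x) & HIGHS)

/* Scans one size_t per iteration without vector instructions.
   memcpy is used for the load to avoid alignment and aliasing
   problems (an assumption of this sketch, not a tuned choice). */
void *memchr_swar(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    unsigned char ch = (unsigned char)c;
    size_t k = ONES * ch;        /* ch replicated into every byte */
    for (; n >= sizeof(size_t); p += sizeof(size_t), n -= sizeof(size_t)) {
        size_t w;
        memcpy(&w, p, sizeof w);
        if (HASZERO(w ^ k)) break;   /* a matching byte is in this word */
    }
    for (; n; p++, n--)          /* locate the exact byte, or finish the tail */
        if (*p == ch) return (void *)p;
    return NULL;
}
```

All three take the same arguments as memchr() and should return the
same pointer for the same input, so they can be dropped into a
benchmark harness interchangeably. How they rank is exactly the part
that varies with cpu model, compiler, and whether the naive loop gets
vectorized.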