From: Szabolcs Nagy
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: memchr() performance
Date: Sun, 18 Sep 2016 22:40:36 +0200
Message-ID: <20160918204036.GZ1280@port70.net>
In-Reply-To: <20160918185422.GA2577@dell12.lru.li>
To: Georg Sauthoff
Cc: musl@lists.openwall.com

* Georg Sauthoff [2016-09-18 20:54:22 +0200]:
>
> In general, musl's memchr() implementation doesn't perform better than a
> simple unrolled loop (as used in libstdc++ std::find()) - and that is
> consistent over different CPU generations and architectures.
>

memchr in musl was never updated (same for >5 years)
so probably should be and last time the position was

"In the particular case of strlen, the naive unrolled strlen with no
OOB access is actually optimal on most or all 32-bit archs, better
than what we have now. I suspect the same is true for strchr and other
related functions."
http://www.openwall.com/lists/musl/2016/01/05/5

but we did not have benchmark numbers at the time..

note that this benchmark does not measure the effect
of more branch prediction slots used in the unrolled case.

> On recent Intel CPUs it is even slower than a naive implementation:
>
> https://gms.tf/stdfind-and-memchr-optimizations.html#measurements
> https://gms.tf/sparc-and-ppc-find-benchmark-results.html
>
> Of course, on x86, other implementations that use SIMD instructions
> perform even better.
>

yes simd is expected to be faster.

but that needs asm which is expensive to maintain
(there is no portable simd language extension for c
and there is the aliasing issue: the reinterpret_cast
in your code is formally ub).

> Best regards
> Georg
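
for reference, a minimal c sketch of the variants discussed above.
it is an illustration only: the function names are hypothetical, and
this is neither musl's memchr nor the code from the linked benchmark.
it shows a naive byte loop, the same loop unrolled by 4, and a word
test that loads through memcpy instead of a pointer cast, which is one
way to avoid the aliasing problem (assuming a 32-bit word and that the
caller guarantees 4 readable bytes).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* hypothetical name: plain byte-at-a-time search, no oob access */
static void *naive_memchr(const void *src, int c, size_t n)
{
	const unsigned char *s = src;
	unsigned char ch = (unsigned char)c;

	for (; n; s++, n--)
		if (*s == ch) return (void *)s;
	return 0;
}

/* hypothetical name: same search unrolled by 4, still no oob access */
static void *unrolled_memchr(const void *src, int c, size_t n)
{
	const unsigned char *s = src;
	unsigned char ch = (unsigned char)c;

	for (; n >= 4; s += 4, n -= 4) {
		if (s[0] == ch) return (void *)s;
		if (s[1] == ch) return (void *)(s + 1);
		if (s[2] == ch) return (void *)(s + 2);
		if (s[3] == ch) return (void *)(s + 3);
	}
	/* fewer than 4 bytes left: finish byte by byte */
	for (; n; s++, n--)
		if (*s == ch) return (void *)s;
	return 0;
}

/* hypothetical name: returns 1 if any of the 4 bytes at s equals ch.
 * the word is loaded with memcpy into a local object, so no object is
 * accessed through an incompatible pointer type. the caller must make
 * sure s[0..3] are readable. */
static int word_has_byte(const unsigned char *s, unsigned char ch)
{
	uint32_t w;

	memcpy(&w, s, sizeof w);
	w ^= (uint32_t)0x01010101 * ch;  /* matching bytes become zero */
	/* classic "does this word contain a zero byte" test */
	return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}
```

compilers usually turn the fixed-size memcpy into a single load, so
the defined-behaviour version need not cost anything, but whether any
of these variants is faster on a given cpu has to be measured.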