Date: Thu, 29 Feb 2024 01:57:48 +0100
From: Robert Clausecker
To: musl@lists.openwall.com
References: <20240227140756.216904-1-tirtajames45@gmail.com> <20240227144926.GK4163@brightrain.aerifal.cx> <20240227145123.GL4163@brightrain.aerifal.cx>
Subject: Re: [musl] [PATCH] add memcmpeq: memcmp that returns length of first mismatch

Greetings,

On Thu, Feb 29, 2024 at 12:10:05AM +0000, Thorsten Glaser wrote:
> Pedro Falcato dixit:
>
> >Small note: This isn't quite true for remotely modern x86, unaligned
>
> It’s very much true, e.g. it breaks atomicity (ok, not relevant
> *here*, but in general).
>
> AIUI, even modern amd64 chips of all vendors are reverting to
> optimising rep movsb/lodsb instead again, for stringops.

That is not the case.  REP MOVSB and friends have a high startup
latency, so you only want to use them for large-ish blocks.  Too large
and all of a sudden AVX-512 is faster again.  For small blocks,
however, you do not want to use this instruction.  It's indeed much
better to do a pair of overlapping stores.
They do not perform crazy well, but it's still better than all
alternatives.  Also note that REP LODSB is pretty useless; did you
perhaps mean REP STOSB?

Source: have spent a good part of last year implementing in x86
assembly for FreeBSD's libc.

> Of course the status on other architectures should be sufficient to
> not use unaligned accesses.

Yours,
Robert Clausecker

-- 
()  ascii ribbon campaign - for an encoding-agnostic world
/\  - against html email  - against proprietary attachments
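
For illustration, here is a minimal C sketch of the "pair of overlapping
stores" idea for small blocks.  It is not taken from musl or from
FreeBSD's libc; the helper name copy_8_to_16 and the 8..16 byte size
range are assumptions chosen for the example.  The fixed-size memcpy
calls are the portable way to express unaligned 8-byte loads and stores,
which compilers typically turn into single move instructions on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes, where 8 <= n <= 16, using two
 * 8-byte moves.  The first covers bytes [0, 8), the second covers
 * bytes [n - 8, n).  For n < 16 the two ranges overlap, which is
 * harmless because the overlapping bytes receive the same values.
 */
static void copy_8_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;

    assert(n >= 8 && n <= 16);

    /* Do both loads before either store, so that src and dst may overlap. */
    memcpy(&head, src, 8);
    memcpy(&tail, (const unsigned char *)src + n - 8, 8);

    memcpy(dst, &head, 8);
    memcpy((unsigned char *)dst + n - 8, &tail, 8);
}
```

The point of the design is that every length in the range is handled by
the same four moves, with no loop and no branch on the exact length.  A
real implementation would presumably dispatch to similar cases for other
size ranges (for example 4..8 or 16..32 bytes) and fall back to a
different strategy for large blocks.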