From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=DKIM_ADSP_CUSTOM_MED, DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL autolearn=ham autolearn_force=no version=3.4.2 Received: from mother.openwall.net (mother.openwall.net [195.42.179.200]) by inbox.vuxu.org (OpenSMTPD) with SMTP id 10cda2db for ; Wed, 22 Jan 2020 19:37:41 +0000 (UTC) Received: (qmail 24166 invoked by uid 550); 22 Jan 2020 19:37:39 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 8021 invoked from network); 22 Jan 2020 17:12:43 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=to:from:subject:message-id:disposition-notification-to:date :user-agent:mime-version:content-language; bh=AOeg7sf+DRbUsNzcq8w7RR5IW/9UL58uR+xmYPW3z6I=; b=Zc+9Nfpe7DX6KJXrVXl8Q5FKm6p7t/GGuqvMMiR0QlCYxi1pT9Tqt9fakBMmWQRFNV wEbXEUzQaXLCqs0t4THu4OsOboxFakZ9qJW431/YkL99gE40cx2S+H4tD+aMTwgeEp+B 0Ca35U1nfAVvG3X8zxOVbBJIA2M/f45zOST5DzEuf8YVdjXcoKsJqz4fVZZRsoAJz1Om TyVVXTXY548fkpwAUU+m9M7hkE4gRHFHFLDdcK/Pb44ouiy9p/l7oVTvoE71Jm6FvC6L waAVZa9TFQqVvtCXea6kWRlQhCkfWVnwcSb20su6IuENMxsEg0j0fgceSKMQh6JD4p23 OSgw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:from:subject:message-id :disposition-notification-to:date:user-agent:mime-version :content-language; bh=AOeg7sf+DRbUsNzcq8w7RR5IW/9UL58uR+xmYPW3z6I=; b=mfwlfrMkTIclud240lzrkyJe/y+Mzkb63Kbwj5EJ+V3rWnB1RupFUtf6mxrxKXQ2pq UdcsvCgfEIg+iaW9HUwIew2w1z1l42nSzEpAq/d+XVjeZUi2ope8Cgq1Gxw+M6AqvXD3 qLjsuHAcHSpla7SNRftmR1ZGqsH2KY+b+biTL+4WDavCcyeuC2/GYsAVaNf5dR8pvjOg ps4OMveBxfCxIjsHCD1MdB8h7PRjNBR4EjgKnAkEGIfkN6JjwEgyWBDPQHKvOMra8TDb 5z9ZLKhRsEVmvCwD+Uf6aPxhbOaMsXyL+HfMR5kCVgxt39jtgPfZYBmL4RdXcjh32lnY YfUw== X-Gm-Message-State: APjAAAWANzyb8+B0dcXiHKUDv4VRyEcPDxOdDeGVJIOSFJU6B4N/7POK cGQ3E9Za1k5HfZ1NuSdhH1JCW+Fv X-Google-Smtp-Source: APXvYqz6vvgvq4vcPRUaJovnqMXCWhXVjBJmOnyJtnmp4Rk27sHE/7o+rchB9ejlCFZhZIIXldwycA== X-Received: by 2002:adf:dfc1:: with SMTP id q1mr12317794wrn.155.1579713151439; Wed, 22 Jan 2020 09:12:31 -0800 (PST) To: musl@lists.openwall.com From: Petr Skocik Message-ID: <3016485f-0790-4634-ae6c-d9a05cdb79aa@gmail.com> Date: Wed, 22 Jan 2020 18:12:28 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------202098DF9960FF133F632A4F" Content-Language: en-US Subject: [musl] faster memcmp This is a multi-part message in MIME format. --------------202098DF9960FF133F632A4F Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Hi, I've noticed musl's implementation of memcmp was about way over times slower than glibc's, which made memcpy-based comparisons of e.g., definitely contiguous structs actually quite a bit slower than applying memcmp on them. I've put together a memcmp that compares memory word by word if the operands are aligned or identically misaligned (I figure the comparisons of differently aligned objects is rare in real code). It's still pretty compact, both in C (~40 LOC) and assembly (126 B @ x86_64 gcc -Os) and much closer to gcc's performance (less than twice slower for comparison of 128B objects vs musl's 13 times). If you like it, take it. Regards, Petr Skocik --------------202098DF9960FF133F632A4F Content-Type: text/x-csrc; charset=UTF-8; name="my_memcmp.c" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="my_memcmp.c" #include #include #include __attribute((optimize("s"))) //a bit faster than -O2/-O3 int my_memcmp(const void *Vl, const void *Vr, size_t N) { typedef size_t uword_tp; const unsigned char *l=Vl, *r=Vr; size_t ByteWiseCnt; size_t aloff = (uintptr_t)l%_Alignof(uword_tp); if ( aloff != (uintptr_t)r%_Alignof(uword_tp) ) { /*differently aligned, finish with bytewise cmp and let it fallthru to return 0 unless the return happens in the bytewise loop*/ ByteWiseCnt = N; //the goto helps keep the output assembly small do_ByteWiseCmp: //run for all, at the beginning if equally misaligned, and possibly at the end N-=ByteWiseCnt; //<= disables the rest if run for all or at the end; adjusts N if run at the beginning for (; 0!=ByteWiseCnt; ByteWiseCnt--, l++, r++) if (*l != *r) return *l-*r; }else if(aloff!=0) { /*do_ByteWiseCmp until aligned*/ ByteWiseCnt= sizeof(uword_tp)-aloff < N ? sizeof(uword_tp)-aloff : N; goto do_ByteWiseCmp; } size_t nw = N/sizeof(uword_tp); N %= sizeof(uword_tp); for (; nw != 0; nw--,l+=sizeof(uword_tp),r+=sizeof(uword_tp)){ size_t l0, r0; memcpy(&l0, l, sizeof(uword_tp)); memcpy(&r0, r, sizeof(uword_tp)); if (l0 == r0) continue; else{ if (1) break; //smaller assembly else { ByteWiseCnt = sizeof(uword_tp); goto do_ByteWiseCmp; } //larger code } } //if (nw!=0), then ByteWiseCnt will return within the next sizeof(size_t) bytes and //nw needs to be >=sizeof(size_t), otherwise it needs to be N if (0==(ByteWiseCnt=nw*sizeof(size_t)+N)) return 0; goto do_ByteWiseCmp; } --------------202098DF9960FF133F632A4F--