mailing list of musl libc
 help / color / Atom feed
* [musl] faster memcmp
@ 2020-01-22 17:12 Petr Skocik
  0 siblings, 0 replies; only message in thread
From: Petr Skocik @ 2020-01-22 17:12 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 675 bytes --]


I've noticed musl's implementation of memcmp was about way over times
slower than glibc's, which made memcpy-based comparisons of e.g.,
definitely contiguous structs actually quite a bit slower than applying
memcmp on them.

I've put together a memcmp that compares memory word by word if the
operands are aligned or identically misaligned (I figure the comparisons
of differently aligned objects is rare in real code).

It's still pretty compact, both in C (~40 LOC) and assembly (126 B @
x86_64 gcc -Os) and much closer to gcc's performance (less than twice
slower for comparison of 128B objects vs musl's 13 times).

If you like it, take it.

Petr Skocik

[-- Attachment #2: my_memcmp.c --]
[-- Type: text/x-csrc, Size: 1598 bytes --]

#include <stdint.h>
#include <stddef.h>
#include <string.h>

__attribute((optimize("s"))) //a bit faster than -O2/-O3
int my_memcmp(const void *Vl, const void *Vr, size_t N)
	typedef size_t uword_tp;
	const unsigned char *l=Vl, *r=Vr;
	size_t ByteWiseCnt;
	size_t aloff = (uintptr_t)l%_Alignof(uword_tp);

	if ( aloff != (uintptr_t)r%_Alignof(uword_tp) ) {
		/*differently aligned, finish with bytewise cmp and let it fallthru to return 0
		 unless the return happens in the bytewise loop*/

		ByteWiseCnt = N;

		//the goto helps keep the output assembly small
		do_ByteWiseCmp: //run for all, at the beginning if equally misaligned, and possibly at the end
		N-=ByteWiseCnt; //<= disables the rest if run for all or at the end; adjusts N if run at the beginning 
		for (; 0!=ByteWiseCnt; ByteWiseCnt--, l++, r++)
			if (*l != *r) return *l-*r;

	}else if(aloff!=0) {
		/*do_ByteWiseCmp until aligned*/
		ByteWiseCnt= sizeof(uword_tp)-aloff < N ? sizeof(uword_tp)-aloff : N;
		goto do_ByteWiseCmp;

	size_t nw = N/sizeof(uword_tp); N %= sizeof(uword_tp);
	for (; nw != 0; nw--,l+=sizeof(uword_tp),r+=sizeof(uword_tp)){
		size_t l0, r0; memcpy(&l0, l, sizeof(uword_tp)); memcpy(&r0, r, sizeof(uword_tp));
		if (l0 == r0) continue;
		   	if (1) break; //smaller assembly
			else { ByteWiseCnt = sizeof(uword_tp); goto do_ByteWiseCmp; } //larger code

	//if (nw!=0), then ByteWiseCnt will return within the next sizeof(size_t) bytes and
	//nw needs to be >=sizeof(size_t), otherwise it needs to be N
	if (0==(ByteWiseCnt=nw*sizeof(size_t)+N))
		return 0;

	goto do_ByteWiseCmp;

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, back to index

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-22 17:12 [musl] faster memcmp Petr Skocik

mailing list of musl libc

Archives are clonable: git clone --mirror

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone