From: Andre Renaud
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Thinking about release
Date: Wed, 10 Jul 2013 10:26:46 +1200
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com
References: <20130613012517.GA5859@brightrain.aerifal.cx> <20130613014314.GC29800@brightrain.aerifal.cx> <20130709053711.GO29800@brightrain.aerifal.cx>
Content-Type: text/plain; charset=UTF-8

Replying to myself

> Certainly, if there were a more straightforward C implementation that
> achieved similar results, that would be superior. However, the existing
> musl C memcpy code is already optimised to some degree (doing 32-bit
> rather than 8-bit copies), and I've found it difficult to convince gcc
> to use the load-multiple and store-multiple instructions from C without
> resorting to pretty horrible code. It may still be preferable to the
> assembler, though. At this stage I haven't benchmarked this - I'll see
> if I can come up with something.

As a comparison, the existing memcpy.c implementation copies
sizeof(size_t) bytes at a time, which on ARM is 4 bytes. This ends up
being compiled to ordinary single-word load/store instructions. However,
GCC is smart enough to use ldm/stm (load-multiple/store-multiple)
instructions when copying structures larger than 4 bytes.
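As a minimal illustration of that (the struct and function names below
are just made up for this sketch, they aren't from musl), a plain
structure assignment is usually enough to get GCC to emit the
multi-register instructions on ARM at -O2:

#include <stddef.h>

/* A 16-byte copy unit; four words on 32-bit ARM. */
struct copy_unit { size_t d[4]; };

void copy_one_unit(struct copy_unit *dst, const struct copy_unit *src)
{
        /* GCC typically turns this assignment into an ldm/stm pair
         * rather than four separate ldr/str pairs. */
        *dst = *src;
}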
So if we change memcpy.c to use a structure whose size is > 4 (i.e. 16)
instead of size_t as its basic copy unit, we do see some improvement:

#include <stddef.h>
#include <stdint.h>

typedef struct multiple_size_t {
        size_t d[4];
} multiple_size_t;

#define SS (sizeof(multiple_size_t))
#define ALIGN (sizeof(multiple_size_t)-1)

void *my_memcpy(void * restrict dest, const void * restrict src, size_t n)
{
        unsigned char *d = dest;
        const unsigned char *s = src;

        /* If the two pointers can never reach mutual alignment,
         * copy byte-by-byte. */
        if (((uintptr_t)d & ALIGN) != ((uintptr_t)s & ALIGN))
                goto misaligned;

        /* Copy bytes until the destination is aligned to SS. */
        for (; ((uintptr_t)d & ALIGN) && n; n--) *d++ = *s++;

        if (n) {
                /* Copy SS bytes at a time via structure assignment. */
                multiple_size_t *wd = (void *)d;
                const multiple_size_t *ws = (const void *)s;
                for (; n>=SS; n-=SS) *wd++ = *ws++;
                d = (void *)wd;
                s = (const void *)ws;
misaligned:
                /* Copy any remaining bytes one at a time. */
                for (; n; n--) *d++ = *s++;
        }
        return dest;
}

This results in 95MB/s on my platform (up from 65MB/s for the existing
memcpy.c, and down from 105MB/s with the asm-optimised version), and it
is essentially as readable as the existing memcpy.c.

I'm not really familiar with any other CPU architectures, so I'm not
sure whether this would improve or hurt performance on other platforms.
Any comments on using something like this for memcpy instead? Obviously
it carries a higher penalty when the size of the area to be copied is
between sizeof(size_t) and sizeof(multiple_size_t).

Regards,
Andre