From: John Spencer
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: musl libc, memcpy
Date: Sat, 04 Aug 2012 01:22:10 +0200
Message-ID: <501C5D22.1000405@barfooze.de>
References: <20120730204100.GY544@brightrain.aerifal.cx> <20120801042722.GB544@brightrain.aerifal.cx> <20120801054011.GC544@brightrain.aerifal.cx> <20120801061904.GD544@brightrain.aerifal.cx>
In-Reply-To: <20120801061904.GD544@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

i've set up a performance test ( https://github.com/rofl0r/memcpy-test )

these are the average results for i386 (100 runs on big sizes, 10000 on
smaller ones)

                  asm version        current c-version
size: 3           172 ticks          199 ticks
size: 4           167 ticks          167 ticks
size: 5           197 ticks          186 ticks
size: 8           187 ticks          186 ticks
size: 15          195 ticks          196 ticks
size: 16          186 ticks          185 ticks
size: 23          202 ticks          199 ticks
size: 24          193 ticks          188 ticks
size: 25          205 ticks          212 ticks
size: 31          199 ticks          198 ticks
size: 32          195 ticks          192 ticks
size: 33          204 ticks          192 ticks
size: 63          213 ticks          255 ticks
size: 64          219 ticks          226 ticks
size: 65          208 ticks          238 ticks
size: 95          220 ticks          247 ticks
size: 96          214 ticks          239 ticks
size: 97          217 ticks          243 ticks
size: 127         233 ticks          261 ticks
size: 128         225 ticks          254 ticks
size: 129         229 ticks          266 ticks
size: 159         242 ticks          279 ticks
size: 160         235 ticks          268 ticks
size: 161         238 ticks          273 ticks
size: 191         255 ticks          288 ticks
size: 192         264 ticks          288 ticks
size: 193         248 ticks          287 ticks
size: 255         279 ticks          323 ticks
size: 256         266 ticks          313 ticks
size: 257         269 ticks          319 ticks
size: 383         332 ticks          391 ticks
size: 384         308 ticks          370 ticks
size: 385         307 ticks          384 ticks
size: 511         345 ticks          439 ticks
size: 512         315 ticks          434 ticks
size: 513         318 ticks          439 ticks
size: 767         370 ticks          571 ticks
size: 768         330 ticks          555 ticks
size: 769         334 ticks          566 ticks
size: 1023        382 ticks          740 ticks
size: 1024        349 ticks          727 ticks
size: 1025        358 ticks          694 ticks
size: 1535        423 ticks          936 ticks
size: 1536        393 ticks          930 ticks
size: 1537        400 ticks          929 ticks
size: 2048        448 ticks          1176 ticks
size: 4096        822 ticks          2404 ticks
size: 8192        3136 ticks         8310 ticks
size: 16384       6481 ticks         9780 ticks
size: 32768       11645 ticks        19060 ticks
size: 65536       29700 ticks        52051 ticks
size: 131072      307029 ticks       310875 ticks
size: 262144      608502 ticks       617698 ticks
size: 524288      1222116 ticks      1244987 ticks
size: 1048576     2500207 ticks      2712991 ticks
size: 2097152     5279016 ticks      5566665 ticks
size: 4194304     10586333 ticks     10849110 ticks
size: 8388608     21961730 ticks     22473953 ticks
size: 16777216    45966254 ticks     47159258 ticks
size: 33554432    92434464 ticks     95873868 ticks
size: 67108864    189858530 ticks    190456107 ticks

it looks as if the asm version is up to almost three times as fast
(822 vs 2404 ticks at size 4096), depending on the
size of data copied.

now waiting for the x86_64 version (if you could provide a working 64bit
rdtsc inline asm function, i'll gladly take that as well)

someone on ##asm suggested that movaps with xmm regs was fastest in his
tests. would be interesting to test such a version as well.

On 08/01/2012 08:19 AM, Rich Felker wrote:
> On Wed, Aug 01, 2012 at 01:40:11AM -0400, Rich Felker wrote:
>> On Wed, Aug 01, 2012 at 12:27:22AM -0400, Rich Felker wrote:
>>> I'm attaching a (possibly buggy; not heavily tested) rep-movsd-based
>>> version. I'd be interested in hearing how it performs.
>> And here is the attachment...
> And here's a version that might be faster; reportedly, rep movsd works
> better when the destination address is aligned.
>
> Rich
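
Regarding the request above for a 64-bit rdtsc inline asm function: the following is a minimal sketch, not code taken from the memcpy-test repository. The names rdtsc64 and avg_memcpy_ticks are hypothetical, and GCC or Clang extended asm syntax is assumed. On x86_64 the "=A" constraint that is commonly used on i386 does not describe the edx:eax pair, so the two halves are read into separate registers and combined. The fallback for other architectures is only there so the file still compiles; its numbers are not comparable to TSC ticks.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: read the time-stamp counter as a 64-bit value.
 * Works on both i386 and x86_64 because eax and edx are read
 * separately instead of relying on the "=A" constraint. */
static inline uint64_t rdtsc64(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* Assumption: non-x86 build, fall back to the coarse C clock. */
    return (uint64_t)clock();
#endif
}

/* Hypothetical helper: average tick count of `runs` memcpy calls
 * copying `n` bytes, mirroring the "average over N runs" method
 * described in the message. `runs` must be nonzero. */
static uint64_t avg_memcpy_ticks(void *dst, const void *src,
                                 size_t n, unsigned runs)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < runs; i++) {
        uint64_t t0 = rdtsc64();
        memcpy(dst, src, n);
        /* Compiler barrier: keeps the copy from being dropped or
         * moved past the second counter read. */
        __asm__ __volatile__ ("" : : "r"(dst) : "memory");
        total += rdtsc64() - t0;
    }
    return total / runs;
}
```

A call such as avg_memcpy_ticks(dst, src, 4096, 10000) would then give one cell of a table like the one above. rdtsc is not a serializing instruction, so very small copies may be measured imprecisely; this sketch does not try to correct for that.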