On Tue, Feb 17, 2015 at 5:12 PM, Rich Felker wrote:
> On Tue, Feb 17, 2015 at 02:08:52PM +0100, Denys Vlasenko wrote:
>> >> Please see attached file.
>> >
>> > I tried it and it's ~1 cycle slower for at least sizes 16-30;
>> > presumably we're seeing the cost of the extra compare/branch at these
>> > sizes but not at others. What does your timing test show?
>>
>> See below.
>> First column - result of my2.s
>> Second column - result of vda1.s
>>
>> Basically, the "rep stosq" code path got a bit faster, while
>> small memsets stayed the same.
>
> Can you post your test program for me to try out? Here's what I've
> been using, attached.

With your program I see similar results:

...
size 50:    min=10, avg=10        min=10, avg=10
size 52:    min=10, avg=10        min=10, avg=10
size 54:    min=10, avg=11        min=10, avg=11
size 56:    min=10, avg=11        min=10, avg=11
size 58:    min=10, avg=11        min=10, avg=10
size 60:    min=10, avg=10        min=10, avg=12
size 62:    min=10, avg=10        min=10, avg=11
size 64:    min=18, avg=18        min=18, avg=22
size 96:    min=17, avg=17        min=18, avg=18
size 128:   min=31, avg=32        min=32, avg=32
size 160:   min=35, avg=37        min=33, avg=37
size 192:   min=40, avg=40        min=36, avg=37
size 224:   min=43, avg=43        min=40, avg=40
size 256:   min=44, avg=47        min=43, avg=43
size 288:   min=47, avg=48        min=46, avg=47
size 320:   min=50, avg=52        min=52, avg=52
size 352:   min=53, avg=54        min=52, avg=60
size 384:   min=56, avg=57        min=55, avg=57
size 416:   min=59, avg=60        min=62, avg=63
size 448:   min=63, avg=65        min=66, avg=66
size 480:   min=66, avg=71        min=69, avg=69
size 512:   min=73, avg=74        min=73, avg=76
size 1024:  min=127, avg=129      min=127, avg=129
size 2048:  min=221, avg=236      min=221, avg=236
size 4096:  min=425, avg=444      min=424, avg=450
size 8192:  min=831, avg=881      min=830, avg=883
size 16384: min=1644, avg=1717    min=1643, avg=1748

My test program is attached, I use:

gcc -O2 -Wall memset-cycles.c FOO.s
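
For readers without the attachment, here is a rough sketch of how min/avg figures of this shape could be collected. It is not the attached memset-cycles.c. The names stats, ticks, measure and print_row are hypothetical, one sample per call is an assumption, and ticks() returns wall-clock nanoseconds from C11 timespec_get() rather than CPU cycles, so the absolute numbers will not match the table above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Running minimum and sum of per-call timings. */
struct stats {
    uint64_t min;
    uint64_t sum;
    unsigned n;
};

static void stats_init(struct stats *s)
{
    s->min = UINT64_MAX;
    s->sum = 0;
    s->n = 0;
}

static void stats_add(struct stats *s, uint64_t delta)
{
    if (delta < s->min)
        s->min = delta;
    s->sum += delta;
    s->n++;
}

/* Integer average, rounded down; 0 if nothing was recorded. */
static uint64_t stats_avg(const struct stats *s)
{
    return s->n ? s->sum / s->n : 0;
}

/* Assumed tick source: wall-clock nanoseconds, NOT a cycle counter.
   A cycle-level harness would read the CPU timestamp counter here. */
static uint64_t ticks(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

typedef void *(*memset_fn)(void *, int, size_t);

/* Time up to `iters` calls of `fn` on a `size`-byte buffer, one sample
   per call. Samples where the clock stepped backwards are dropped. */
static struct stats measure(memset_fn fn, void *buf, size_t size, unsigned iters)
{
    struct stats s;
    assert(fn && buf);
    stats_init(&s);
    for (unsigned i = 0; i < iters; i++) {
        uint64_t t0 = ticks();
        fn(buf, 0x55, size);
        uint64_t t1 = ticks();
        if (t1 >= t0)
            stats_add(&s, t1 - t0);
    }
    return s;
}

/* Print one row in the "size N: min=..., avg=..." shape used above. */
static void print_row(size_t size, const struct stats *s)
{
    printf("size %zu: min=%llu, avg=%llu\n", size,
           (unsigned long long)s->min, (unsigned long long)stats_avg(s));
}
```

To compare two implementations, one would call measure() once per implementation for each size and print the two results side by side. The minimum is the more stable of the two figures, since the average picks up interrupts and cache misses.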