I'm sending this to the list before committing it, just to get some comments/feedback.

The key feature of this memset, much like the x86 asm, is that it writes from both ends in a possibly-overlapping manner to minimize the number of branches. Unlike the asm, though, I've also used the write-from-both-ends logic to allow trivial alignment handling.

One aspect of this code that may appear ugly at first is the use of the __GNUC__ macro. I've been bothered for a long time by the aliasing violations in src/string/*.c, which are only "safe" insofar as the compiler cannot see across extern function calls. The purpose of checking for __GNUC__ and using the may_alias attribute is to document to the compiler that aliasing is taking place in a controlled manner. If we don't have a compiler that accepts this attribute, the code falls back to a naive loop with no aliasing violations. The prologue code, including alignment, is still kept, so that optimizing compilers can tell that the pointer is aligned when the naive loop is reached, possibly optimizing it back into something fast. (In fact, with -msse, gcc is able to make the naive version nearly twice as fast as the fancy C version, but unfortunately it's unable to do any gp-register based vectorization for non-SIMD targets. At some point we may want to add an override to turn off the fancy C code and let the compiler do all the work...)

So, I'd like to consider gradually transitioning all of the string code that breaks the aliasing rules over to an approach like this. Any thoughts on this? I hope it's not too ugly, but I don't know any other way that improves correctness and maintains or improves performance.

By the way, this new code obsoletes the memset asm for i386 and x86_64 that was added during this release cycle, so I guess I should just delete the asm. I tried some simple improvements to the asm to make it faster, but couldn't come close to beating the new C code.

Rich
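
For readers without the patch in front of them, here is a minimal sketch of the write-from-both-ends idea and the __GNUC__/may_alias fallback described above. It is an illustration under assumptions, not the code from the patch: the name sketch_memset, the 4-byte word size, and the exact size thresholds are all hypothetical choices made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name, chosen for this sketch only. */
void *sketch_memset(void *dest, int c, size_t n)
{
    unsigned char *s = dest;
    size_t k;

    /* Head and tail: each pair of stores writes from both ends. For
     * small n the stores overlap, so three size checks cover every
     * length up to 8 without a per-byte loop. */
    if (!n) return dest;
    s[0] = s[n-1] = (unsigned char)c;
    if (n <= 2) return dest;
    s[1] = s[n-2] = (unsigned char)c;
    s[2] = s[n-3] = (unsigned char)c;
    if (n <= 6) return dest;
    s[3] = s[n-4] = (unsigned char)c;
    if (n <= 8) return dest;

    /* Alignment comes for free: the first 4 and last 4 bytes are
     * already written, so skip up to 3 head bytes to reach a 4-byte
     * boundary and round the remaining length down to a multiple of 4. */
    k = -(uintptr_t)s & 3;
    s += k;
    n -= k;
    n &= -4;

#ifdef __GNUC__
    /* may_alias tells the compiler that these word stores may
     * legitimately alias objects of other types. */
    typedef uint32_t __attribute__((__may_alias__)) u32;
    u32 c32 = ((u32)-1) / 255 * (unsigned char)c; /* byte replicated 4x */
    for (; n; n -= 4, s += 4) *(u32 *)s = c32;
#else
    /* Fallback: plain byte loop with no aliasing violation. The
     * pointer is already aligned here, which an optimizing compiler
     * may be able to exploit. */
    for (; n; n--, s++) *s = (unsigned char)c;
#endif
    return dest;
}
```

Whether the word loop or the byte loop is faster depends on the compiler and target, so treat the #ifdef split as a correctness device first and a performance choice second.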