* Optimized C memset @ 2013-08-27 8:30 Rich Felker 2013-08-27 8:52 ` Jens Gustedt 2013-08-27 16:22 ` Optimized C memset [v2] Rich Felker 0 siblings, 2 replies; 14+ messages in thread From: Rich Felker @ 2013-08-27 8:30 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 1971 bytes --] I'm sending this to the list before committing it just to get some comments/feedback. The key feature of this memset, much like the x86 asm, is that it write from both ends in a possibly-overlapping manner to minimize the number of branches. Unlike in the asm, though, I've also used the write-from-both-ends logic to allow trivial alignment handling. One aspect of this code that may appear ugly at first is the usage of the __GNUC__ macro. I've been bothered for a long time by the aliasing violations in src/string/*.c which are only "safe" insomuch as the compiler cannot see across extern function calls. The purpose of checking for __GNUC__ and using the may_alias attribute is to document to the compiler that aliasing is taking place in a controlled manner. If we don't have a compiler that accepts this attribute, the code falls back to using a naive loop with no aliasing violations. The prologue code, including alignment, is still kept, so that optimizing compilers can tell that the pointer is aligned when the naive loop is reached, possibly optimizing it back into something fast. (In fact, with -msse, gcc is able to make the naive version nearly twice as fast as the fancy C version, but unfortunately it's unable to do any gp-register based vectorization for non-SIMD targets. At some point we may want to add an override to turn off the fancy C code and let the compiler do all the work...) So, I'd like to consider gradually transitioning all of the string code that breaks the aliasing rules over to using an approach like this. Any thoughts on this? I hope it's not too ugly, but I don't know any other way that improves correctness and maintains or improves performance. By the way, this new code obsoletes the memset asm for i386 and x86_64 that was added during this release cycle, so I guess I should just delete the asm. I tried some simple improvements to the asm to make it faster, but couldn't come close to beating the new C code. Rich [-- Attachment #2: memset5.c --] [-- Type: text/plain, Size: 1131 bytes --] #include <string.h> #include <stdint.h> #if 100*__GNUC__+__GNUC_MINOR__ >= 302 #define may_alias __attribute__((__may_alias__)) #else #define may_alias #endif typedef uint32_t may_alias u32; typedef uint64_t may_alias u64; void *memset(void *dest, int c, size_t n) { unsigned char *s = dest; u32 c32; u64 c64; size_t k; if (!n) return dest; s[0] = s[n-1] = c; if (n <= 2) return dest; s[1] = s[n-2] = c; s[2] = s[n-3] = c; if (n <= 6) return dest; s[3] = s[n-4] = c; if (n <= 8) return dest; k = -(uintptr_t)s & 3; s += k; n -= k; n &= -3; #ifdef __GNUC__ c32 = ((u32)-1)/255 * (unsigned char)c; *(u32 *)(s+0) = c32; *(u32 *)(s+n-4) = c32; if (n <= 8) return dest; *(u32 *)(s+4) = c32; *(u32 *)(s+8) = c32; *(u32 *)(s+n-12) = c32; *(u32 *)(s+n-8) = c32; if (n <= 24) return dest; *(u32 *)(s+12) = c32; *(u32 *)(s+n-16) = c32; s = (void *)((uintptr_t)(s+16) & -8); n -= 24; c64 = c32 | ((u64)c32 << 32); for (; n >= 32; n-=32, s+=32) { *(u64 *)(s+0) = c64; *(u64 *)(s+8) = c64; *(u64 *)(s+16) = c64; *(u64 *)(s+24) = c64; } #else for (; n; n--, s++) *s = c; #endif return dest; } ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset 2013-08-27 8:30 Optimized C memset Rich Felker @ 2013-08-27 8:52 ` Jens Gustedt 2013-08-27 9:17 ` Rich Felker 2013-08-27 16:22 ` Optimized C memset [v2] Rich Felker 1 sibling, 1 reply; 14+ messages in thread From: Jens Gustedt @ 2013-08-27 8:52 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 1760 bytes --] Hello, Am Dienstag, den 27.08.2013, 04:30 -0400 schrieb Rich Felker: > One aspect of this code that may appear ugly at first is the usage of > the __GNUC__ macro. It not only *appears* to be ugly :) You already have a slight inconsistency in that at one point the code depends on a particular version of "gcc" and other part only depends on the fact that the __GNUC__ macro is set. There are a lot of compilers out there faking to be a gcc of some version and that are not always feature consistent with the gcc version that they are claming to fake. ICC is notorious for that. So to my opinion this is a dangerous path to follow. (I don't know about any problems with the __may_alias__ attribute, though) To make this easier to maintain, I'd suggest to introduce a special feature test macro, something like __has_may_alias__, and have that set at the beginning of the file in a section that is clearly dedicated to compile time feature detection. Other compilers may have different syntax for such a feature (a _Pragma comes in mind) and may be detected quite differently than by gcc version numbering. Such specific feature test macros is the way that clang goes, and from my experience this is much easier to maintain and to understand when you stumble on such #ifdef'ed code. You not only know that this needs a special version of a compiler, but also for what reason. Best Jens -- :: INRIA Nancy Grand Est :: http://www.loria.fr/~gustedt/ :: :: AlGorille ::::::::::::::: office Nancy : +33 383593090 :: :: ICube :::::::::::::: office Strasbourg : +33 368854536 :: :: ::::::::::::::::::::::::::: gsm France : +33 651400183 :: :: :::::::::::::::::::: gsm international : +49 15737185122 :: [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset 2013-08-27 8:52 ` Jens Gustedt @ 2013-08-27 9:17 ` Rich Felker 2013-08-27 9:50 ` Jens Gustedt 0 siblings, 1 reply; 14+ messages in thread From: Rich Felker @ 2013-08-27 9:17 UTC (permalink / raw) To: musl On Tue, Aug 27, 2013 at 10:52:55AM +0200, Jens Gustedt wrote: > Hello, > > Am Dienstag, den 27.08.2013, 04:30 -0400 schrieb Rich Felker: > > One aspect of this code that may appear ugly at first is the usage of > > the __GNUC__ macro. > > It not only *appears* to be ugly :) > > You already have a slight inconsistency in that at one point the code > depends on a particular version of "gcc" and other part only depends > on the fact that the __GNUC__ macro is set. That's intentional. The __may_alias__ attribute has existed since GCC 3.2 (I did the research just a little bit ago). Earlier versions of GCC certainly aren't capable of making optimization that could see cross-translation-unit aliasing, so it's "safe" to do without the attribute on such versions. > There are a lot of compilers out there faking to be a gcc of some > version and that are not always feature consistent with the gcc > version that they are claming to fake. ICC is notorious for that. So > to my opinion this is a dangerous path to follow. (I don't know about > any problems with the __may_alias__ attribute, though) So far, my experience is that compilers which advertise themseleves as "GNU C" compilers have all the semantic features of GCC 3 at the very least. Is this an invalid assumption? Note that even if we "wrongly" detect a compiler as supporting __may_alias__, as long as it doesn't give errors on unknown GCC attributes, the situation is no worse than what we have right now: aliasing violations that the compiler can't see without performing cross-translation-unit analysis. So the code couldn't break without a very fancy compiler, and I would expect any compiler providing such fancy optimization and claiming it's "GNU C" to support the GNU C feature for allowing aliasing in certain places. > To make this easier to maintain, I'd suggest to introduce a special > feature test macro, something like __has_may_alias__, and have that > set at the beginning of the file in a section that is clearly > dedicated to compile time feature detection. Other compilers may have > different syntax for such a feature (a _Pragma comes in mind) and may > be detected quite differently than by gcc version numbering. Where would you suggest it be added? The result of the check is only used in one place, so I don't see how #if .... #define __has_may_alias__ #endif #ifdef __has_may_alias__ [use it] #endif is better than just #if .... [use it] #endif in any practical sense. If on the other hand you're suggesting that I should be more conservative and check not just for __GNUC__, but also for the availability of __may_alias__, in the actual code body (rather than just the typedefs), then it may make more sense. > Such specific feature test macros is the way that clang goes, and from > my experience this is much easier to maintain and to understand when > you stumble on such #ifdef'ed code. You not only know that this needs > a special version of a compiler, but also for what reason. Oh, I agree it's much cleaner. Sadly, though, GCC doesn't seem to give us that information... Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset 2013-08-27 9:17 ` Rich Felker @ 2013-08-27 9:50 ` Jens Gustedt 2013-08-27 14:21 ` Rich Felker 0 siblings, 1 reply; 14+ messages in thread From: Jens Gustedt @ 2013-08-27 9:50 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 2911 bytes --] Am Dienstag, den 27.08.2013, 05:17 -0400 schrieb Rich Felker: > So far, my experience is that compilers which advertise themseleves as > "GNU C" compilers have all the semantic features of GCC 3 at the very > least. Is this an invalid assumption? for gcc versions less than 4, probably, I don't remember an exception for that. For version in the 4, there are several such examples. > > To make this easier to maintain, I'd suggest to introduce a special > > feature test macro, something like __has_may_alias__, and have that > > set at the beginning of the file in a section that is clearly > > dedicated to compile time feature detection. Other compilers may have > > different syntax for such a feature (a _Pragma comes in mind) and may > > be detected quite differently than by gcc version numbering. > > Where would you suggest it be added? The result of the check is only > used in one place, so I don't see how I'd wouldn't have an #else case to the detection but do #ifndef may_alias # if 100*__GNUC__+__GNUC_MINOR__ >= 302 # define may_alias __attribute__((__may_alias__)) # endif #endif #ifdef may_alias # define __has_may_alias__ #else # define may_alias #endif This is not much more complicated than what you have and makes it easy to add other cases, without having to maintain the __has_may_alias__ feature macro itself. > If on the other hand you're suggesting that I should be more > conservative and check not just for __GNUC__, but also for the > availability of __may_alias__, in the actual code body (rather than > just the typedefs), then it may make more sense. yes, I thought in these lines. that code should be handled well by any modern compiler that you can convince to not playing games with aliasing, here. So the fact that you tested it with gcc-like compilers is merely an artefact. > > Such specific feature test macros is the way that clang goes, and from > > my experience this is much easier to maintain and to understand when > > you stumble on such #ifdef'ed code. You not only know that this needs > > a special version of a compiler, but also for what reason. > > Oh, I agree it's much cleaner. Sadly, though, GCC doesn't seem to give > us that information... yes, for gcc you'd always have to detect features in the way you are doing it know. But I'd think that feature detection should be separated from the use of the features. The uses might be spread over several places, but the detection should be done in one well detectable spot in the source code. Jens -- :: INRIA Nancy Grand Est :: http://www.loria.fr/~gustedt/ :: :: AlGorille ::::::::::::::: office Nancy : +33 383593090 :: :: ICube :::::::::::::: office Strasbourg : +33 368854536 :: :: ::::::::::::::::::::::::::: gsm France : +33 651400183 :: :: :::::::::::::::::::: gsm international : +49 15737185122 :: [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset 2013-08-27 9:50 ` Jens Gustedt @ 2013-08-27 14:21 ` Rich Felker 2013-08-27 14:34 ` Luca Barbato 2013-08-27 15:20 ` John Spencer 0 siblings, 2 replies; 14+ messages in thread From: Rich Felker @ 2013-08-27 14:21 UTC (permalink / raw) To: musl On Tue, Aug 27, 2013 at 11:50:40AM +0200, Jens Gustedt wrote: > Am Dienstag, den 27.08.2013, 05:17 -0400 schrieb Rich Felker: > > So far, my experience is that compilers which advertise themseleves as > > "GNU C" compilers have all the semantic features of GCC 3 at the very > > least. Is this an invalid assumption? > > for gcc versions less than 4, probably, I don't remember an exception > for that. For version in the 4, there are several such examples. I just checked and confirmed that cparser/libfirm and pcc both support the may_alias attribute. I would assume clang does. > > > To make this easier to maintain, I'd suggest to introduce a special > > > feature test macro, something like __has_may_alias__, and have that > > > set at the beginning of the file in a section that is clearly > > > dedicated to compile time feature detection. Other compilers may have > > > different syntax for such a feature (a _Pragma comes in mind) and may > > > be detected quite differently than by gcc version numbering. > > > > Where would you suggest it be added? The result of the check is only > > used in one place, so I don't see how > > > I'd wouldn't have an #else case to the detection but do > > #ifndef may_alias > # if 100*__GNUC__+__GNUC_MINOR__ >= 302 > # define may_alias __attribute__((__may_alias__)) > # endif > #endif > > #ifdef may_alias > # define __has_may_alias__ > #else > # define may_alias > #endif > > This is not much more complicated than what you have and makes it easy > to add other cases, without having to maintain the __has_may_alias__ > feature macro itself. At this point, __has_may_alias__ is not used for anything. So I fail to see how the above is any different from what I wrote in the source file. > > If on the other hand you're suggesting that I should be more > > conservative and check not just for __GNUC__, but also for the > > availability of __may_alias__, in the actual code body (rather than > > just the typedefs), then it may make more sense. > > yes, I thought in these lines. that code should be handled well by any > modern compiler that you can convince to not playing games with > aliasing, here. So the fact that you tested it with gcc-like compilers > is merely an artefact. If you're saying testing __has_may_alias__ rather than __GNUC__ buys us more portability, I don't see a reason to believe this. A hypothetical non-GNUC compiler might need an actual pointer variable (rather than cast through a pointer type with may_alias on it) to avoid the aliasing violation, or it might require a special attribute on the function indicating that the whole function has immunity from the aliasing rules, or any number of other possibilities. The general approach in musl to these sort of things (compiler and machine dependency) is not to design around some hypothetical abstract level of generality, but to be general to all known existing targets, and adapt to increase generality only when needed. (And, whenever possible, have a 100% portable fallback that relies on nothing compiler- or machine-specific.) The logic for accessing the libc struct in src/internal/libc.h is the best example of this approach. I guess what I'm saying is that I don't want more complex ifdeffery to support optimizations on some hypothetical non-GCC compiler that does may_alias in a way different from GCC, unless such a compiler actually exists, since any "solution" we try to make now to the problem of additional generality might not even solve the problem if/when such a compiler exists. > > > Such specific feature test macros is the way that clang goes, and from > > > my experience this is much easier to maintain and to understand when > > > you stumble on such #ifdef'ed code. You not only know that this needs > > > a special version of a compiler, but also for what reason. > > > > Oh, I agree it's much cleaner. Sadly, though, GCC doesn't seem to give > > us that information... > > yes, for gcc you'd always have to detect features in the way you are > doing it know. But I'd think that feature detection should be > separated from the use of the features. The uses might be spread over > several places, but the detection should be done in one well > detectable spot in the source code. The trade-off is between some duplication of the logic for which version of the code can be used, and added barrier to reading the code. One of the things I think our users like about musl versus glibc is that, for the vast majority of the code, you can fully determine what it's doing without reading other implementation-specific files that define magic macros for things you might not understand -- and that you can take the code and drop it into another project without having to find all the implementation-internal headers it depends on. If something needs to be changed about the logic for may_alias, a simple grep -r will find all the source files it's in and makes it easy to change several occurrences. So I tend to think preserving readability and ease of reuse are more important than avoiding duplication, but if others agree with you, I wouldn't be entirely opposed to adding a "string_impl.h" or similar header with some shared preprocessor logic for all of the string functions that might be doing sketchy things with aliasing and alignment. I'd appreciate comments on this matter from others on which way we should go. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset 2013-08-27 14:21 ` Rich Felker @ 2013-08-27 14:34 ` Luca Barbato 2013-08-27 14:39 ` Rich Felker 2013-08-27 15:20 ` John Spencer 1 sibling, 1 reply; 14+ messages in thread From: Luca Barbato @ 2013-08-27 14:34 UTC (permalink / raw) To: musl On 27/08/13 16:21, Rich Felker wrote: > If you're saying testing __has_may_alias__ rather than __GNUC__ buys > us more portability, I don't see a reason to believe this. A > hypothetical non-GNUC compiler might need an actual pointer variable > (rather than cast through a pointer type with may_alias on it) to > avoid the aliasing violation, or it might require a special attribute > on the function indicating that the whole function has immunity from > the aliasing rules, or any number of other possibilities. You might decouple the two: - support may_alias - safe to compile code that may alias Just that the latter is harder to check. lu ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset 2013-08-27 14:34 ` Luca Barbato @ 2013-08-27 14:39 ` Rich Felker 0 siblings, 0 replies; 14+ messages in thread From: Rich Felker @ 2013-08-27 14:39 UTC (permalink / raw) To: musl On Tue, Aug 27, 2013 at 04:34:31PM +0200, Luca Barbato wrote: > On 27/08/13 16:21, Rich Felker wrote: > > If you're saying testing __has_may_alias__ rather than __GNUC__ buys > > us more portability, I don't see a reason to believe this. A > > hypothetical non-GNUC compiler might need an actual pointer variable > > (rather than cast through a pointer type with may_alias on it) to > > avoid the aliasing violation, or it might require a special attribute > > on the function indicating that the whole function has immunity from > > the aliasing rules, or any number of other possibilities. > > You might decouple the two: > > - support may_alias > - safe to compile code that may alias > > Just that the latter is harder to check. Indeed. My current logic is that it's safe to compile code that may alias if the compiler is ancient GCC (note: musl does not even advertise support for pre-3.2 GCC) or if it supports may_alias. The most likely negative consequence of the proposed code is that build may completely fail on compilers that advertise themselves as __GNUC__ but fail to accept __attribute__((__may_alias__)). In this sense, a configure check might be preferable, but I hate to get started with those... Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset 2013-08-27 14:21 ` Rich Felker 2013-08-27 14:34 ` Luca Barbato @ 2013-08-27 15:20 ` John Spencer 2013-08-27 15:34 ` Rich Felker 1 sibling, 1 reply; 14+ messages in thread From: John Spencer @ 2013-08-27 15:20 UTC (permalink / raw) To: musl On 08/27/2013 04:21 PM, Rich Felker wrote: > One of the things I think our users like about musl versus glibc > is that, for the vast majority of the code, you can fully determine > what it's doing without reading other implementation-specific files > that define magic macros for things you might not understand -- and > that you can take the code and drop it into another project without > having to find all the implementation-internal headers it depends on. > > If something needs to be changed about the logic for may_alias, a > simple grep -r will find all the source files it's in and makes it > easy to change several occurrences. So I tend to think preserving > readability and ease of reuse are more important than avoiding > duplication, but if others agree with you, I wouldn't be entirely > opposed to adding a "string_impl.h" or similar header with some shared > preprocessor logic for all of the string functions that might be doing > sketchy things with aliasing and alignment. I'd appreciate comments on > this matter from others on which way we should go. my feeling is that we should stick to our current policy of minor macros being defined in the TUs that use them, making it both simpler to read, and faster to compile (less work for the preprocessor). sufficiently complex macros can go to internal headers instead, so there's only one spot to be taken care of. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset 2013-08-27 15:20 ` John Spencer @ 2013-08-27 15:34 ` Rich Felker 0 siblings, 0 replies; 14+ messages in thread From: Rich Felker @ 2013-08-27 15:34 UTC (permalink / raw) To: musl On Tue, Aug 27, 2013 at 05:20:01PM +0200, John Spencer wrote: > On 08/27/2013 04:21 PM, Rich Felker wrote: > >One of the things I think our users like about musl versus glibc > >is that, for the vast majority of the code, you can fully determine > >what it's doing without reading other implementation-specific files > >that define magic macros for things you might not understand -- and > >that you can take the code and drop it into another project without > >having to find all the implementation-internal headers it depends on. > > > >If something needs to be changed about the logic for may_alias, a > >simple grep -r will find all the source files it's in and makes it > >easy to change several occurrences. So I tend to think preserving > >readability and ease of reuse are more important than avoiding > >duplication, but if others agree with you, I wouldn't be entirely > >opposed to adding a "string_impl.h" or similar header with some shared > >preprocessor logic for all of the string functions that might be doing > >sketchy things with aliasing and alignment. I'd appreciate comments on > >this matter from others on which way we should go. > > > my feeling is that we should stick to our current policy of minor > macros being defined in the TUs that use them, making it both > simpler to read, and faster to compile (less work for the > preprocessor). > > sufficiently complex macros can go to internal headers instead, so > there's only one spot to be taken care of. Indeed, while it's tempting to do something like defining "repr32", "repr64", etc. types (with may_alias attribute) in a common header, I think it makes the code a lot less clear, and actually more difficult to maintain if requirements of individual translation units change (for example some want to use fixed bit sizes, others want to use system word size). What may be beneficial, however, is putting the logic for whether aliasing rule exemptions are possible at all, and the attribute to use to get them, into a shared header file. However I'm not convinced that this logic is anymore complex than "#ifdef __GNUC__", so I think for now I'd like to start off just doing it simple, and possibly switch later as more string functions are adapted to document their aliasing to the compiler. I've always found that if factoring out this kind of logic is beneficial, the right way to factor it emerges while actually implementing the code rather than by trying to guess ahead of time what will be best. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset [v2] 2013-08-27 8:30 Optimized C memset Rich Felker 2013-08-27 8:52 ` Jens Gustedt @ 2013-08-27 16:22 ` Rich Felker 2013-08-27 17:28 ` Jeremy Huntwork 2013-08-28 0:05 ` Andre Renaud 1 sibling, 2 replies; 14+ messages in thread From: Rich Felker @ 2013-08-27 16:22 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 570 bytes --] Here's version 2 (filename version 6, in honor of glibc ;) of the memset code. I fixed a bug in the logic for coverage of the tail (the part past what's covered by the loop) for some values of n and alignments, and cleaned up the __GNUC__ usage a bit to use less #ifdeffery. The remaining test at the top for the __GNUC__ version is ugly, I admit, and should possibly just be removed and replaced by a configure check to add -D__may_alias__= to the CFLAGS if the compiler defines __GNUC__ but does not recognize __attribute__((__may_alias__)) -- opinions on this? Rich [-- Attachment #2: memset6.c --] [-- Type: text/plain, Size: 1275 bytes --] #include <string.h> #include <stdint.h> #if defined(__GNUC__) && 100*__GNUC__+__GNUC_MINOR__ < 302 #define __may_alias__ #endif void *memset(void *dest, int c, size_t n) { unsigned char *s = dest; size_t k; if (!n) return dest; s[0] = s[n-1] = c; if (n <= 2) return dest; s[1] = s[n-2] = c; s[2] = s[n-3] = c; if (n <= 6) return dest; s[3] = s[n-4] = c; if (n <= 8) return dest; k = -(uintptr_t)s & 3; s += k; n -= k; n &= -3; #ifdef __GNUC__ typedef uint32_t __attribute__((__may_alias__)) u32; typedef uint64_t __attribute__((__may_alias__)) u64; u32 c32 = ((u32)-1)/255 * (unsigned char)c; *(u32 *)(s+0) = c32; *(u32 *)(s+n-4) = c32; if (n <= 8) return dest; *(u32 *)(s+4) = c32; *(u32 *)(s+8) = c32; *(u32 *)(s+n-12) = c32; *(u32 *)(s+n-8) = c32; if (n <= 24) return dest; *(u32 *)(s+12) = c32; *(u32 *)(s+16) = c32; *(u32 *)(s+20) = c32; *(u32 *)(s+24) = c32; *(u32 *)(s+n-28) = c32; *(u32 *)(s+n-24) = c32; *(u32 *)(s+n-20) = c32; *(u32 *)(s+n-16) = c32; k = 24 + ((uintptr_t)s & 4); s += k; n -= k; u64 c64 = c32 | ((u64)c32 << 32); for (; n >= 32; n-=32, s+=32) { *(u64 *)(s+0) = c64; *(u64 *)(s+8) = c64; *(u64 *)(s+16) = c64; *(u64 *)(s+24) = c64; } #else for (; n; n--, s++) *s = c; #endif return dest; } ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset [v2] 2013-08-27 16:22 ` Optimized C memset [v2] Rich Felker @ 2013-08-27 17:28 ` Jeremy Huntwork 2013-08-27 21:27 ` Rich Felker 2013-08-28 0:05 ` Andre Renaud 1 sibling, 1 reply; 14+ messages in thread From: Jeremy Huntwork @ 2013-08-27 17:28 UTC (permalink / raw) To: musl On Aug 27, 2013, at 12:22 PM, Rich Felker <dalias@aerifal.cx> wrote: > Here's version 2 (filename version 6, in honor of glibc ;) of the > memset code. I fixed a bug in the logic for coverage of the tail (the > part past what's covered by the loop) for some values of n and > alignments, and cleaned up the __GNUC__ usage a bit to use less > #ifdeffery. The remaining test at the top for the __GNUC__ version is > ugly, I admit, and should possibly just be removed and replaced by a > configure check to add -D__may_alias__= to the CFLAGS if the compiler > defines __GNUC__ but does not recognize __attribute__((__may_alias__)) > -- opinions on this? Doing the test via configure and setting a flag seems more consistent with comments that have been made elsewhere on this list about the correct way to feature test. JH ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset [v2] 2013-08-27 17:28 ` Jeremy Huntwork @ 2013-08-27 21:27 ` Rich Felker 0 siblings, 0 replies; 14+ messages in thread From: Rich Felker @ 2013-08-27 21:27 UTC (permalink / raw) To: musl On Tue, Aug 27, 2013 at 01:28:36PM -0400, Jeremy Huntwork wrote: > On Aug 27, 2013, at 12:22 PM, Rich Felker <dalias@aerifal.cx> wrote: > > > Here's version 2 (filename version 6, in honor of glibc ;) of the > > memset code. I fixed a bug in the logic for coverage of the tail (the > > part past what's covered by the loop) for some values of n and > > alignments, and cleaned up the __GNUC__ usage a bit to use less > > #ifdeffery. The remaining test at the top for the __GNUC__ version is > > ugly, I admit, and should possibly just be removed and replaced by a > > configure check to add -D__may_alias__= to the CFLAGS if the compiler > > defines __GNUC__ but does not recognize __attribute__((__may_alias__)) > > -- opinions on this? > > > Doing the test via configure and setting a flag seems more > consistent with comments that have been made elsewhere on this list > about the correct way to feature test. Agreed. Also, clang/LLVM seems to still be lacking may_alias: http://llvm.org/bugs/show_bug.cgi?id=11082 I'm not really worried about the semantic problem, since this is just about the most harmless type of aliasing you can have, and we're already using it (without proper annotations for the compiler) anyway. But I don't want build regressions. So I'll whip up something for configure... Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset [v2] 2013-08-27 16:22 ` Optimized C memset [v2] Rich Felker 2013-08-27 17:28 ` Jeremy Huntwork @ 2013-08-28 0:05 ` Andre Renaud 2013-08-28 1:24 ` Rich Felker 1 sibling, 1 reply; 14+ messages in thread From: Andre Renaud @ 2013-08-28 0:05 UTC (permalink / raw) To: musl Hi Rich, On 28 August 2013 04:22, Rich Felker <dalias@aerifal.cx> wrote: > Here's version 2 (filename version 6, in honor of glibc ;) of the > memset code. I fixed a bug in the logic for coverage of the tail (the > part past what's covered by the loop) for some values of n and > alignments, and cleaned up the __GNUC__ usage a bit to use less > #ifdeffery. The remaining test at the top for the __GNUC__ version is > ugly, I admit, and should possibly just be removed and replaced by a > configure check to add -D__may_alias__= to the CFLAGS if the compiler > defines __GNUC__ but does not recognize __attribute__((__may_alias__)) > -- opinions on this? Can you explain the algorithm a bit - I can't entirely follow the us of negation/masking, but it looks like at the end you're doing a loop of 64-bit aligned writes, but I don't see how it can work if the tail end ends in something that isn't 64-bit aligned? Is this assuming that unaligned writes will work ok? Regards, Andre ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Optimized C memset [v2] 2013-08-28 0:05 ` Andre Renaud @ 2013-08-28 1:24 ` Rich Felker 0 siblings, 0 replies; 14+ messages in thread From: Rich Felker @ 2013-08-28 1:24 UTC (permalink / raw) To: musl On Wed, Aug 28, 2013 at 12:05:43PM +1200, Andre Renaud wrote: > Hi Rich, > > On 28 August 2013 04:22, Rich Felker <dalias@aerifal.cx> wrote: > > Here's version 2 (filename version 6, in honor of glibc ;) of the > > memset code. I fixed a bug in the logic for coverage of the tail (the > > part past what's covered by the loop) for some values of n and > > alignments, and cleaned up the __GNUC__ usage a bit to use less > > #ifdeffery. The remaining test at the top for the __GNUC__ version is > > ugly, I admit, and should possibly just be removed and replaced by a > > configure check to add -D__may_alias__= to the CFLAGS if the compiler > > defines __GNUC__ but does not recognize __attribute__((__may_alias__)) > > -- opinions on this? > > Can you explain the algorithm a bit - I can't entirely follow the us > of negation/masking, but it looks like at the end you're doing a loop > of 64-bit aligned writes, but I don't see how it can work if the tail > end ends in something that isn't 64-bit aligned? Is this assuming that > unaligned writes will work ok? See the version I committed a couple hours ago. It has comments added. The basic thing you're missing is that the code before the loop fills from both the beginning and the end, not just the beginning. This allows for a really effective O(log n) branch strategy to fill n bytes: essentially, knowing n>=k allows you to fill up to 2*k bytes: 0,1,...,k-1 and n-1,n-2,n-3,...,n-k. If n<2*k, some of these will overlap, but it doesn't matter. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2013-08-28 1:24 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-08-27 8:30 Optimized C memset Rich Felker 2013-08-27 8:52 ` Jens Gustedt 2013-08-27 9:17 ` Rich Felker 2013-08-27 9:50 ` Jens Gustedt 2013-08-27 14:21 ` Rich Felker 2013-08-27 14:34 ` Luca Barbato 2013-08-27 14:39 ` Rich Felker 2013-08-27 15:20 ` John Spencer 2013-08-27 15:34 ` Rich Felker 2013-08-27 16:22 ` Optimized C memset [v2] Rich Felker 2013-08-27 17:28 ` Jeremy Huntwork 2013-08-27 21:27 ` Rich Felker 2013-08-28 0:05 ` Andre Renaud 2013-08-28 1:24 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).