mailing list of musl libc
 help / color / mirror / code / Atom feed
* Optimized C memset
@ 2013-08-27  8:30 Rich Felker
  2013-08-27  8:52 ` Jens Gustedt
  2013-08-27 16:22 ` Optimized C memset [v2] Rich Felker
  0 siblings, 2 replies; 14+ messages in thread
From: Rich Felker @ 2013-08-27  8:30 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1971 bytes --]

I'm sending this to the list before committing it just to get some
comments/feedback. The key feature of this memset, much like the x86
asm, is that it write from both ends in a possibly-overlapping manner
to minimize the number of branches. Unlike in the asm, though, I've
also used the write-from-both-ends logic to allow trivial alignment
handling.

One aspect of this code that may appear ugly at first is the usage of
the __GNUC__ macro. I've been bothered for a long time by the aliasing
violations in src/string/*.c which are only "safe" insomuch as the
compiler cannot see across extern function calls. The purpose of
checking for __GNUC__ and using the may_alias attribute is to document
to the compiler that aliasing is taking place in a controlled manner.
If we don't have a compiler that accepts this attribute, the code
falls back to using a naive loop with no aliasing violations. The
prologue code, including alignment, is still kept, so that optimizing
compilers can tell that the pointer is aligned when the naive loop is
reached, possibly optimizing it back into something fast. (In fact,
with -msse, gcc is able to make the naive version nearly twice as fast
as the fancy C version, but unfortunately it's unable to do any
gp-register based vectorization for non-SIMD targets. At some point we
may want to add an override to turn off the fancy C code and let the
compiler do all the work...)

So, I'd like to consider gradually transitioning all of the string
code that breaks the aliasing rules over to using an approach like
this. Any thoughts on this? I hope it's not too ugly, but I don't know
any other way that improves correctness and maintains or improves
performance.

By the way, this new code obsoletes the memset asm for i386 and x86_64
that was added during this release cycle, so I guess I should just
delete the asm. I tried some simple improvements to the asm to make it
faster, but couldn't come close to beating the new C code.

Rich

[-- Attachment #2: memset5.c --]
[-- Type: text/plain, Size: 1131 bytes --]

#include <string.h>
#include <stdint.h>

#if 100*__GNUC__+__GNUC_MINOR__ >= 302
#define may_alias __attribute__((__may_alias__))
#else
#define may_alias
#endif

typedef uint32_t may_alias u32;
typedef uint64_t may_alias u64;

void *memset(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	u32 c32;
	u64 c64;
	size_t k;

	if (!n) return dest;
	s[0] = s[n-1] = c;
	if (n <= 2) return dest;
	s[1] = s[n-2] = c;
	s[2] = s[n-3] = c;
	if (n <= 6) return dest;
	s[3] = s[n-4] = c;
	if (n <= 8) return dest;

	k = -(uintptr_t)s & 3;
	s += k;
	n -= k;
	n &= -3;

#ifdef __GNUC__
	c32 = ((u32)-1)/255 * (unsigned char)c;

	*(u32 *)(s+0) = c32;
	*(u32 *)(s+n-4) = c32;
	if (n <= 8) return dest;
	*(u32 *)(s+4) = c32;
	*(u32 *)(s+8) = c32;
	*(u32 *)(s+n-12) = c32;
	*(u32 *)(s+n-8) = c32;
	if (n <= 24) return dest;
	*(u32 *)(s+12) = c32;
	*(u32 *)(s+n-16) = c32;

	s = (void *)((uintptr_t)(s+16) & -8);
	n -= 24;

	c64 = c32 | ((u64)c32 << 32);
	for (; n >= 32; n-=32, s+=32) {
		*(u64 *)(s+0) = c64;
		*(u64 *)(s+8) = c64;
		*(u64 *)(s+16) = c64;
		*(u64 *)(s+24) = c64;
	}
#else
	for (; n; n--, s++) *s = c;
#endif

	return dest;
}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset
  2013-08-27  8:30 Optimized C memset Rich Felker
@ 2013-08-27  8:52 ` Jens Gustedt
  2013-08-27  9:17   ` Rich Felker
  2013-08-27 16:22 ` Optimized C memset [v2] Rich Felker
  1 sibling, 1 reply; 14+ messages in thread
From: Jens Gustedt @ 2013-08-27  8:52 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1760 bytes --]

Hello,

Am Dienstag, den 27.08.2013, 04:30 -0400 schrieb Rich Felker:
> One aspect of this code that may appear ugly at first is the usage of
> the __GNUC__ macro.

It not only *appears* to be ugly :)

You already have a slight inconsistency in that at one point the code
depends on a particular version of "gcc" and other part only depends
on the fact that the __GNUC__ macro is set.

There are a lot of compilers out there faking to be a gcc of some
version and that are not always feature consistent with the gcc
version that they are claming to fake. ICC is notorious for that. So
to my opinion this is a dangerous path to follow. (I don't know about
any problems with the __may_alias__ attribute, though)

To make this easier to maintain, I'd suggest to introduce a special
feature test macro, something like __has_may_alias__, and have that
set at the beginning of the file in a section that is clearly
dedicated to compile time feature detection. Other compilers may have
different syntax for such a feature (a _Pragma comes in mind) and may
be detected quite differently than by gcc version numbering.

Such specific feature test macros is the way that clang goes, and from
my experience this is much easier to maintain and to understand when
you stumble on such #ifdef'ed code. You not only know that this needs
a special version of a compiler, but also for what reason.

Best

Jens


-- 
:: INRIA Nancy Grand Est :: http://www.loria.fr/~gustedt/   ::
:: AlGorille ::::::::::::::: office Nancy : +33 383593090   ::
:: ICube :::::::::::::: office Strasbourg : +33 368854536   ::
:: ::::::::::::::::::::::::::: gsm France : +33 651400183   ::
:: :::::::::::::::::::: gsm international : +49 15737185122 ::




[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset
  2013-08-27  8:52 ` Jens Gustedt
@ 2013-08-27  9:17   ` Rich Felker
  2013-08-27  9:50     ` Jens Gustedt
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2013-08-27  9:17 UTC (permalink / raw)
  To: musl

On Tue, Aug 27, 2013 at 10:52:55AM +0200, Jens Gustedt wrote:
> Hello,
> 
> Am Dienstag, den 27.08.2013, 04:30 -0400 schrieb Rich Felker:
> > One aspect of this code that may appear ugly at first is the usage of
> > the __GNUC__ macro.
> 
> It not only *appears* to be ugly :)
> 
> You already have a slight inconsistency in that at one point the code
> depends on a particular version of "gcc" and other part only depends
> on the fact that the __GNUC__ macro is set.

That's intentional. The __may_alias__ attribute has existed since GCC
3.2 (I did the research just a little bit ago). Earlier versions of
GCC certainly aren't capable of making optimization that could see
cross-translation-unit aliasing, so it's "safe" to do without the
attribute on such versions.

> There are a lot of compilers out there faking to be a gcc of some
> version and that are not always feature consistent with the gcc
> version that they are claming to fake. ICC is notorious for that. So
> to my opinion this is a dangerous path to follow. (I don't know about
> any problems with the __may_alias__ attribute, though)

So far, my experience is that compilers which advertise themseleves as
"GNU C" compilers have all the semantic features of GCC 3 at the very
least. Is this an invalid assumption?

Note that even if we "wrongly" detect a compiler as supporting
__may_alias__, as long as it doesn't give errors on unknown GCC
attributes, the situation is no worse than what we have right now:
aliasing violations that the compiler can't see without performing
cross-translation-unit analysis. So the code couldn't break without a
very fancy compiler, and I would expect any compiler providing such
fancy optimization and claiming it's "GNU C" to support the GNU C
feature for allowing aliasing in certain places.

> To make this easier to maintain, I'd suggest to introduce a special
> feature test macro, something like __has_may_alias__, and have that
> set at the beginning of the file in a section that is clearly
> dedicated to compile time feature detection. Other compilers may have
> different syntax for such a feature (a _Pragma comes in mind) and may
> be detected quite differently than by gcc version numbering.

Where would you suggest it be added? The result of the check is only
used in one place, so I don't see how

#if ....
#define __has_may_alias__
#endif

#ifdef __has_may_alias__
[use it]
#endif

is better than just

#if ....
[use it]
#endif

in any practical sense.

If on the other hand you're suggesting that I should be more
conservative and check not just for __GNUC__, but also for the
availability of __may_alias__, in the actual code body (rather than
just the typedefs), then it may make more sense.

> Such specific feature test macros is the way that clang goes, and from
> my experience this is much easier to maintain and to understand when
> you stumble on such #ifdef'ed code. You not only know that this needs
> a special version of a compiler, but also for what reason.

Oh, I agree it's much cleaner. Sadly, though, GCC doesn't seem to give
us that information...

Rich


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset
  2013-08-27  9:17   ` Rich Felker
@ 2013-08-27  9:50     ` Jens Gustedt
  2013-08-27 14:21       ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Gustedt @ 2013-08-27  9:50 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2911 bytes --]

Am Dienstag, den 27.08.2013, 05:17 -0400 schrieb Rich Felker:
> So far, my experience is that compilers which advertise themseleves as
> "GNU C" compilers have all the semantic features of GCC 3 at the very
> least. Is this an invalid assumption?

for gcc versions less than 4, probably, I don't remember an exception
for that. For version in the 4, there are several such examples.

> > To make this easier to maintain, I'd suggest to introduce a special
> > feature test macro, something like __has_may_alias__, and have that
> > set at the beginning of the file in a section that is clearly
> > dedicated to compile time feature detection. Other compilers may have
> > different syntax for such a feature (a _Pragma comes in mind) and may
> > be detected quite differently than by gcc version numbering.
> 
> Where would you suggest it be added? The result of the check is only
> used in one place, so I don't see how


I'd wouldn't have an #else case to the detection but do

#ifndef may_alias
# if 100*__GNUC__+__GNUC_MINOR__ >= 302
#  define may_alias __attribute__((__may_alias__))
# endif
#endif

#ifdef may_alias
# define __has_may_alias__
#else
# define may_alias
#endif

This is not much more complicated than what you have and makes it easy
to add other cases, without having to maintain the __has_may_alias__
feature macro itself.


> If on the other hand you're suggesting that I should be more
> conservative and check not just for __GNUC__, but also for the
> availability of __may_alias__, in the actual code body (rather than
> just the typedefs), then it may make more sense.

yes, I thought in these lines. that code should be handled well by any
modern compiler that you can convince to not playing games with
aliasing, here. So the fact that you tested it with gcc-like compilers
is merely an artefact.

> > Such specific feature test macros is the way that clang goes, and from
> > my experience this is much easier to maintain and to understand when
> > you stumble on such #ifdef'ed code. You not only know that this needs
> > a special version of a compiler, but also for what reason.
> 
> Oh, I agree it's much cleaner. Sadly, though, GCC doesn't seem to give
> us that information...

yes, for gcc you'd always have to detect features in the way you are
doing it know. But I'd think that feature detection should be
separated from the use of the features. The uses might be spread over
several places, but the detection should be done in one well
detectable spot in the source code.

Jens

-- 
:: INRIA Nancy Grand Est :: http://www.loria.fr/~gustedt/   ::
:: AlGorille ::::::::::::::: office Nancy : +33 383593090   ::
:: ICube :::::::::::::: office Strasbourg : +33 368854536   ::
:: ::::::::::::::::::::::::::: gsm France : +33 651400183   ::
:: :::::::::::::::::::: gsm international : +49 15737185122 ::



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset
  2013-08-27  9:50     ` Jens Gustedt
@ 2013-08-27 14:21       ` Rich Felker
  2013-08-27 14:34         ` Luca Barbato
  2013-08-27 15:20         ` John Spencer
  0 siblings, 2 replies; 14+ messages in thread
From: Rich Felker @ 2013-08-27 14:21 UTC (permalink / raw)
  To: musl

On Tue, Aug 27, 2013 at 11:50:40AM +0200, Jens Gustedt wrote:
> Am Dienstag, den 27.08.2013, 05:17 -0400 schrieb Rich Felker:
> > So far, my experience is that compilers which advertise themseleves as
> > "GNU C" compilers have all the semantic features of GCC 3 at the very
> > least. Is this an invalid assumption?
> 
> for gcc versions less than 4, probably, I don't remember an exception
> for that. For version in the 4, there are several such examples.

I just checked and confirmed that cparser/libfirm and pcc both support
the may_alias attribute. I would assume clang does.

> > > To make this easier to maintain, I'd suggest to introduce a special
> > > feature test macro, something like __has_may_alias__, and have that
> > > set at the beginning of the file in a section that is clearly
> > > dedicated to compile time feature detection. Other compilers may have
> > > different syntax for such a feature (a _Pragma comes in mind) and may
> > > be detected quite differently than by gcc version numbering.
> > 
> > Where would you suggest it be added? The result of the check is only
> > used in one place, so I don't see how
> 
> 
> I'd wouldn't have an #else case to the detection but do
> 
> #ifndef may_alias
> # if 100*__GNUC__+__GNUC_MINOR__ >= 302
> #  define may_alias __attribute__((__may_alias__))
> # endif
> #endif
> 
> #ifdef may_alias
> # define __has_may_alias__
> #else
> # define may_alias
> #endif
> 
> This is not much more complicated than what you have and makes it easy
> to add other cases, without having to maintain the __has_may_alias__
> feature macro itself.

At this point, __has_may_alias__ is not used for anything. So I fail
to see how the above is any different from what I wrote in the source
file.

> > If on the other hand you're suggesting that I should be more
> > conservative and check not just for __GNUC__, but also for the
> > availability of __may_alias__, in the actual code body (rather than
> > just the typedefs), then it may make more sense.
> 
> yes, I thought in these lines. that code should be handled well by any
> modern compiler that you can convince to not playing games with
> aliasing, here. So the fact that you tested it with gcc-like compilers
> is merely an artefact.

If you're saying testing __has_may_alias__ rather than __GNUC__ buys
us more portability, I don't see a reason to believe this. A
hypothetical non-GNUC compiler might need an actual pointer variable
(rather than cast through a pointer type with may_alias on it) to
avoid the aliasing violation, or it might require a special attribute
on the function indicating that the whole function has immunity from
the aliasing rules, or any number of other possibilities.

The general approach in musl to these sort of things (compiler and
machine dependency) is not to design around some hypothetical abstract
level of generality, but to be general to all known existing targets,
and adapt to increase generality only when needed. (And, whenever
possible, have a 100% portable fallback that relies on nothing
compiler- or machine-specific.) The logic for accessing the libc
struct in src/internal/libc.h is the best example of this approach.

I guess what I'm saying is that I don't want more complex ifdeffery to
support optimizations on some hypothetical non-GCC compiler that does
may_alias in a way different from GCC, unless such a compiler actually
exists, since any "solution" we try to make now to the problem of
additional generality might not even solve the problem if/when such a
compiler exists.

> > > Such specific feature test macros is the way that clang goes, and from
> > > my experience this is much easier to maintain and to understand when
> > > you stumble on such #ifdef'ed code. You not only know that this needs
> > > a special version of a compiler, but also for what reason.
> > 
> > Oh, I agree it's much cleaner. Sadly, though, GCC doesn't seem to give
> > us that information...
> 
> yes, for gcc you'd always have to detect features in the way you are
> doing it know. But I'd think that feature detection should be
> separated from the use of the features. The uses might be spread over
> several places, but the detection should be done in one well
> detectable spot in the source code.

The trade-off is between some duplication of the logic for which
version of the code can be used, and added barrier to reading the
code. One of the things I think our users like about musl versus glibc
is that, for the vast majority of the code, you can fully determine
what it's doing without reading other implementation-specific files
that define magic macros for things you might not understand -- and
that you can take the code and drop it into another project without
having to find all the implementation-internal headers it depends on.

If something needs to be changed about the logic for may_alias, a
simple grep -r will find all the source files it's in and makes it
easy to change several occurrences. So I tend to think preserving
readability and ease of reuse are more important than avoiding
duplication, but if others agree with you, I wouldn't be entirely
opposed to adding a "string_impl.h" or similar header with some shared
preprocessor logic for all of the string functions that might be doing
sketchy things with aliasing and alignment. I'd appreciate comments on
this matter from others on which way we should go.

Rich


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset
  2013-08-27 14:21       ` Rich Felker
@ 2013-08-27 14:34         ` Luca Barbato
  2013-08-27 14:39           ` Rich Felker
  2013-08-27 15:20         ` John Spencer
  1 sibling, 1 reply; 14+ messages in thread
From: Luca Barbato @ 2013-08-27 14:34 UTC (permalink / raw)
  To: musl

On 27/08/13 16:21, Rich Felker wrote:
> If you're saying testing __has_may_alias__ rather than __GNUC__ buys
> us more portability, I don't see a reason to believe this. A
> hypothetical non-GNUC compiler might need an actual pointer variable
> (rather than cast through a pointer type with may_alias on it) to
> avoid the aliasing violation, or it might require a special attribute
> on the function indicating that the whole function has immunity from
> the aliasing rules, or any number of other possibilities.

You might decouple the two:

- support may_alias
- safe to compile code that may alias

Just that the latter is harder to check.

lu




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset
  2013-08-27 14:34         ` Luca Barbato
@ 2013-08-27 14:39           ` Rich Felker
  0 siblings, 0 replies; 14+ messages in thread
From: Rich Felker @ 2013-08-27 14:39 UTC (permalink / raw)
  To: musl

On Tue, Aug 27, 2013 at 04:34:31PM +0200, Luca Barbato wrote:
> On 27/08/13 16:21, Rich Felker wrote:
> > If you're saying testing __has_may_alias__ rather than __GNUC__ buys
> > us more portability, I don't see a reason to believe this. A
> > hypothetical non-GNUC compiler might need an actual pointer variable
> > (rather than cast through a pointer type with may_alias on it) to
> > avoid the aliasing violation, or it might require a special attribute
> > on the function indicating that the whole function has immunity from
> > the aliasing rules, or any number of other possibilities.
> 
> You might decouple the two:
> 
> - support may_alias
> - safe to compile code that may alias
> 
> Just that the latter is harder to check.

Indeed. My current logic is that it's safe to compile code that may
alias if the compiler is ancient GCC (note: musl does not even
advertise support for pre-3.2 GCC) or if it supports may_alias. The
most likely negative consequence of the proposed code is that build
may completely fail on compilers that advertise themselves as __GNUC__
but fail to accept __attribute__((__may_alias__)). In this sense, a
configure check might be preferable, but I hate to get started with
those...

Rich


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset
  2013-08-27 14:21       ` Rich Felker
  2013-08-27 14:34         ` Luca Barbato
@ 2013-08-27 15:20         ` John Spencer
  2013-08-27 15:34           ` Rich Felker
  1 sibling, 1 reply; 14+ messages in thread
From: John Spencer @ 2013-08-27 15:20 UTC (permalink / raw)
  To: musl

On 08/27/2013 04:21 PM, Rich Felker wrote:
> One of the things I think our users like about musl versus glibc
> is that, for the vast majority of the code, you can fully determine
> what it's doing without reading other implementation-specific files
> that define magic macros for things you might not understand -- and
> that you can take the code and drop it into another project without
> having to find all the implementation-internal headers it depends on.
>
> If something needs to be changed about the logic for may_alias, a
> simple grep -r will find all the source files it's in and makes it
> easy to change several occurrences. So I tend to think preserving
> readability and ease of reuse are more important than avoiding
> duplication, but if others agree with you, I wouldn't be entirely
> opposed to adding a "string_impl.h" or similar header with some shared
> preprocessor logic for all of the string functions that might be doing
> sketchy things with aliasing and alignment. I'd appreciate comments on
> this matter from others on which way we should go.


my feeling is that we should stick to our current policy of minor macros 
being defined in the TUs that use them, making it both simpler to read, 
and faster to compile (less work for the preprocessor).

sufficiently complex macros can go to internal headers instead, so 
there's only one spot to be taken care of.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset
  2013-08-27 15:20         ` John Spencer
@ 2013-08-27 15:34           ` Rich Felker
  0 siblings, 0 replies; 14+ messages in thread
From: Rich Felker @ 2013-08-27 15:34 UTC (permalink / raw)
  To: musl

On Tue, Aug 27, 2013 at 05:20:01PM +0200, John Spencer wrote:
> On 08/27/2013 04:21 PM, Rich Felker wrote:
> >One of the things I think our users like about musl versus glibc
> >is that, for the vast majority of the code, you can fully determine
> >what it's doing without reading other implementation-specific files
> >that define magic macros for things you might not understand -- and
> >that you can take the code and drop it into another project without
> >having to find all the implementation-internal headers it depends on.
> >
> >If something needs to be changed about the logic for may_alias, a
> >simple grep -r will find all the source files it's in and makes it
> >easy to change several occurrences. So I tend to think preserving
> >readability and ease of reuse are more important than avoiding
> >duplication, but if others agree with you, I wouldn't be entirely
> >opposed to adding a "string_impl.h" or similar header with some shared
> >preprocessor logic for all of the string functions that might be doing
> >sketchy things with aliasing and alignment. I'd appreciate comments on
> >this matter from others on which way we should go.
> 
> 
> my feeling is that we should stick to our current policy of minor
> macros being defined in the TUs that use them, making it both
> simpler to read, and faster to compile (less work for the
> preprocessor).
> 
> sufficiently complex macros can go to internal headers instead, so
> there's only one spot to be taken care of.

Indeed, while it's tempting to do something like defining "repr32",
"repr64", etc. types (with may_alias attribute) in a common header, I
think it makes the code a lot less clear, and actually more difficult
to maintain if requirements of individual translation units change
(for example some want to use fixed bit sizes, others want to use
system word size).

What may be beneficial, however, is putting the logic for whether
aliasing rule exemptions are possible at all, and the attribute to use
to get them, into a shared header file. However I'm not convinced that
this logic is anymore complex than "#ifdef __GNUC__", so I think for
now I'd like to start off just doing it simple, and possibly switch
later as more string functions are adapted to document their aliasing
to the compiler. I've always found that if factoring out this kind of
logic is beneficial, the right way to factor it emerges while actually
implementing the code rather than by trying to guess ahead of time
what will be best.

Rich


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset [v2]
  2013-08-27  8:30 Optimized C memset Rich Felker
  2013-08-27  8:52 ` Jens Gustedt
@ 2013-08-27 16:22 ` Rich Felker
  2013-08-27 17:28   ` Jeremy Huntwork
  2013-08-28  0:05   ` Andre Renaud
  1 sibling, 2 replies; 14+ messages in thread
From: Rich Felker @ 2013-08-27 16:22 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 570 bytes --]

Here's version 2 (filename version 6, in honor of glibc ;) of the
memset code. I fixed a bug in the logic for coverage of the tail (the
part past what's covered by the loop) for some values of n and
alignments, and cleaned up the __GNUC__ usage a bit to use less
#ifdeffery. The remaining test at the top for the __GNUC__ version is
ugly, I admit, and should possibly just be removed and replaced by a
configure check to add -D__may_alias__= to the CFLAGS if the compiler
defines __GNUC__ but does not recognize __attribute__((__may_alias__))
-- opinions on this?

Rich

[-- Attachment #2: memset6.c --]
[-- Type: text/plain, Size: 1275 bytes --]

#include <string.h>
#include <stdint.h>

#if defined(__GNUC__) && 100*__GNUC__+__GNUC_MINOR__ < 302
#define __may_alias__
#endif

void *memset(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	size_t k;

	if (!n) return dest;
	s[0] = s[n-1] = c;
	if (n <= 2) return dest;
	s[1] = s[n-2] = c;
	s[2] = s[n-3] = c;
	if (n <= 6) return dest;
	s[3] = s[n-4] = c;
	if (n <= 8) return dest;

	k = -(uintptr_t)s & 3;
	s += k;
	n -= k;
	n &= -3;

#ifdef __GNUC__
	typedef uint32_t __attribute__((__may_alias__)) u32;
	typedef uint64_t __attribute__((__may_alias__)) u64;

	u32 c32 = ((u32)-1)/255 * (unsigned char)c;

	*(u32 *)(s+0) = c32;
	*(u32 *)(s+n-4) = c32;
	if (n <= 8) return dest;
	*(u32 *)(s+4) = c32;
	*(u32 *)(s+8) = c32;
	*(u32 *)(s+n-12) = c32;
	*(u32 *)(s+n-8) = c32;
	if (n <= 24) return dest;
	*(u32 *)(s+12) = c32;
	*(u32 *)(s+16) = c32;
	*(u32 *)(s+20) = c32;
	*(u32 *)(s+24) = c32;
	*(u32 *)(s+n-28) = c32;
	*(u32 *)(s+n-24) = c32;
	*(u32 *)(s+n-20) = c32;
	*(u32 *)(s+n-16) = c32;

	k = 24 + ((uintptr_t)s & 4);
	s += k;
	n -= k;

	u64 c64 = c32 | ((u64)c32 << 32);
	for (; n >= 32; n-=32, s+=32) {
		*(u64 *)(s+0) = c64;
		*(u64 *)(s+8) = c64;
		*(u64 *)(s+16) = c64;
		*(u64 *)(s+24) = c64;
	}
#else
	for (; n; n--, s++) *s = c;
#endif

	return dest;
}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset [v2]
  2013-08-27 16:22 ` Optimized C memset [v2] Rich Felker
@ 2013-08-27 17:28   ` Jeremy Huntwork
  2013-08-27 21:27     ` Rich Felker
  2013-08-28  0:05   ` Andre Renaud
  1 sibling, 1 reply; 14+ messages in thread
From: Jeremy Huntwork @ 2013-08-27 17:28 UTC (permalink / raw)
  To: musl

On Aug 27, 2013, at 12:22 PM, Rich Felker <dalias@aerifal.cx> wrote:

> Here's version 2 (filename version 6, in honor of glibc ;) of the
> memset code. I fixed a bug in the logic for coverage of the tail (the
> part past what's covered by the loop) for some values of n and
> alignments, and cleaned up the __GNUC__ usage a bit to use less
> #ifdeffery. The remaining test at the top for the __GNUC__ version is
> ugly, I admit, and should possibly just be removed and replaced by a
> configure check to add -D__may_alias__= to the CFLAGS if the compiler
> defines __GNUC__ but does not recognize __attribute__((__may_alias__))
> -- opinions on this?


Doing the test via configure and setting a flag seems more consistent with comments that have been made elsewhere on this list about the correct way to feature test.

JH



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset [v2]
  2013-08-27 17:28   ` Jeremy Huntwork
@ 2013-08-27 21:27     ` Rich Felker
  0 siblings, 0 replies; 14+ messages in thread
From: Rich Felker @ 2013-08-27 21:27 UTC (permalink / raw)
  To: musl

On Tue, Aug 27, 2013 at 01:28:36PM -0400, Jeremy Huntwork wrote:
> On Aug 27, 2013, at 12:22 PM, Rich Felker <dalias@aerifal.cx> wrote:
> 
> > Here's version 2 (filename version 6, in honor of glibc ;) of the
> > memset code. I fixed a bug in the logic for coverage of the tail (the
> > part past what's covered by the loop) for some values of n and
> > alignments, and cleaned up the __GNUC__ usage a bit to use less
> > #ifdeffery. The remaining test at the top for the __GNUC__ version is
> > ugly, I admit, and should possibly just be removed and replaced by a
> > configure check to add -D__may_alias__= to the CFLAGS if the compiler
> > defines __GNUC__ but does not recognize __attribute__((__may_alias__))
> > -- opinions on this?
> 
> 
> Doing the test via configure and setting a flag seems more
> consistent with comments that have been made elsewhere on this list
> about the correct way to feature test.

Agreed. Also, clang/LLVM seems to still be lacking may_alias:

http://llvm.org/bugs/show_bug.cgi?id=11082

I'm not really worried about the semantic problem, since this is just
about the most harmless type of aliasing you can have, and we're
already using it (without proper annotations for the compiler) anyway.
But I don't want build regressions. So I'll whip up something for
configure...

Rich


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset [v2]
  2013-08-27 16:22 ` Optimized C memset [v2] Rich Felker
  2013-08-27 17:28   ` Jeremy Huntwork
@ 2013-08-28  0:05   ` Andre Renaud
  2013-08-28  1:24     ` Rich Felker
  1 sibling, 1 reply; 14+ messages in thread
From: Andre Renaud @ 2013-08-28  0:05 UTC (permalink / raw)
  To: musl

Hi Rich,

On 28 August 2013 04:22, Rich Felker <dalias@aerifal.cx> wrote:
> Here's version 2 (filename version 6, in honor of glibc ;) of the
> memset code. I fixed a bug in the logic for coverage of the tail (the
> part past what's covered by the loop) for some values of n and
> alignments, and cleaned up the __GNUC__ usage a bit to use less
> #ifdeffery. The remaining test at the top for the __GNUC__ version is
> ugly, I admit, and should possibly just be removed and replaced by a
> configure check to add -D__may_alias__= to the CFLAGS if the compiler
> defines __GNUC__ but does not recognize __attribute__((__may_alias__))
> -- opinions on this?

Can you explain the algorithm a bit - I can't entirely follow the us
of negation/masking, but it looks like at the end you're doing a loop
of 64-bit aligned writes, but I don't see how it can work if the tail
end ends in something that isn't 64-bit aligned? Is this assuming that
unaligned writes will work ok?

Regards,
Andre


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Optimized C memset [v2]
  2013-08-28  0:05   ` Andre Renaud
@ 2013-08-28  1:24     ` Rich Felker
  0 siblings, 0 replies; 14+ messages in thread
From: Rich Felker @ 2013-08-28  1:24 UTC (permalink / raw)
  To: musl

On Wed, Aug 28, 2013 at 12:05:43PM +1200, Andre Renaud wrote:
> Hi Rich,
> 
> On 28 August 2013 04:22, Rich Felker <dalias@aerifal.cx> wrote:
> > Here's version 2 (filename version 6, in honor of glibc ;) of the
> > memset code. I fixed a bug in the logic for coverage of the tail (the
> > part past what's covered by the loop) for some values of n and
> > alignments, and cleaned up the __GNUC__ usage a bit to use less
> > #ifdeffery. The remaining test at the top for the __GNUC__ version is
> > ugly, I admit, and should possibly just be removed and replaced by a
> > configure check to add -D__may_alias__= to the CFLAGS if the compiler
> > defines __GNUC__ but does not recognize __attribute__((__may_alias__))
> > -- opinions on this?
> 
> Can you explain the algorithm a bit - I can't entirely follow the us
> of negation/masking, but it looks like at the end you're doing a loop
> of 64-bit aligned writes, but I don't see how it can work if the tail
> end ends in something that isn't 64-bit aligned? Is this assuming that
> unaligned writes will work ok?

See the version I committed a couple hours ago. It has comments added.
The basic thing you're missing is that the code before the loop fills
from both the beginning and the end, not just the beginning. This
allows for a really effective O(log n) branch strategy to fill n
bytes: essentially, knowing n>=k allows you to fill up to 2*k bytes:
0,1,...,k-1 and n-1,n-2,n-3,...,n-k. If n<2*k, some of these will
overlap, but it doesn't matter.

Rich


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-08-28  1:24 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-27  8:30 Optimized C memset Rich Felker
2013-08-27  8:52 ` Jens Gustedt
2013-08-27  9:17   ` Rich Felker
2013-08-27  9:50     ` Jens Gustedt
2013-08-27 14:21       ` Rich Felker
2013-08-27 14:34         ` Luca Barbato
2013-08-27 14:39           ` Rich Felker
2013-08-27 15:20         ` John Spencer
2013-08-27 15:34           ` Rich Felker
2013-08-27 16:22 ` Optimized C memset [v2] Rich Felker
2013-08-27 17:28   ` Jeremy Huntwork
2013-08-27 21:27     ` Rich Felker
2013-08-28  0:05   ` Andre Renaud
2013-08-28  1:24     ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).