mailing list of musl libc
 help / color / Atom feed
* [PATCH] a_ctz_32: Instead of a subtraction by 31, use an xor
@ 2019-11-08  2:39 Rosen Penev
  2019-12-12 15:10 ` Alexander Monakov
  0 siblings, 1 reply; 3+ messages in thread
From: Rosen Penev @ 2019-11-08  2:39 UTC (permalink / raw)
  To: musl

This reduces produced assembly from a mov and sub to a single xor.

Tested with godbolt: https://godbolt.org/z/Qz-Qr5

Signed-off-by: Rosen Penev <rosenp@gmail.com>
---
 src/internal/atomic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/internal/atomic.h b/src/internal/atomic.h
index f938879b..196b3fb1 100644
--- a/src/internal/atomic.h
+++ b/src/internal/atomic.h
@@ -256,7 +256,7 @@ static inline void a_crash()
 static inline int a_ctz_32(uint32_t x)
 {
 #ifdef a_clz_32
-	return 31-a_clz_32(x&-x);
+	return a_clz_32(x&-x)^31;
 #else
 	static const char debruijn32[32] = {
 		0, 1, 23, 2, 29, 24, 19, 3, 30, 27, 25, 11, 20, 8, 4, 13,
-- 
2.23.0



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] a_ctz_32: Instead of a subtraction by 31, use an xor
  2019-11-08  2:39 [PATCH] a_ctz_32: Instead of a subtraction by 31, use an xor Rosen Penev
@ 2019-12-12 15:10 ` Alexander Monakov
  2019-12-12 15:17   ` Rich Felker
  0 siblings, 1 reply; 3+ messages in thread
From: Alexander Monakov @ 2019-12-12 15:10 UTC (permalink / raw)
  To: musl

On Thu, 7 Nov 2019, Rosen Penev wrote:

> This reduces produced assembly from a mov and sub to a single xor.
> 
> Tested with godbolt: https://godbolt.org/z/Qz-Qr5
> 
> Signed-off-by: Rosen Penev <rosenp@gmail.com>
> ---
>  src/internal/atomic.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/internal/atomic.h b/src/internal/atomic.h
> index f938879b..196b3fb1 100644
> --- a/src/internal/atomic.h
> +++ b/src/internal/atomic.h
> @@ -256,7 +256,7 @@ static inline void a_crash()
>  static inline int a_ctz_32(uint32_t x)
>  {
>  #ifdef a_clz_32
> -	return 31-a_clz_32(x&-x);
> +	return a_clz_32(x&-x)^31;

(since there was no other response...)

While certainly this is a nice improvement when considered in isolation,
looking at how this is used in the rest of the library (in ffs, ffsl, qsort)
reveals that returned value is used in further arithmetics.

So e.g. ffs/ffsl do a_ctz_l(i)+1, and +1 may be folded together with 31-...,
but not with the xor.  Thus the original is preferable.

Alexander


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] a_ctz_32: Instead of a subtraction by 31, use an xor
  2019-12-12 15:10 ` Alexander Monakov
@ 2019-12-12 15:17   ` Rich Felker
  0 siblings, 0 replies; 3+ messages in thread
From: Rich Felker @ 2019-12-12 15:17 UTC (permalink / raw)
  To: musl

On Thu, Dec 12, 2019 at 06:10:31PM +0300, Alexander Monakov wrote:
> On Thu, 7 Nov 2019, Rosen Penev wrote:
> 
> > This reduces produced assembly from a mov and sub to a single xor.
> > 
> > Tested with godbolt: https://godbolt.org/z/Qz-Qr5
> > 
> > Signed-off-by: Rosen Penev <rosenp@gmail.com>
> > ---
> >  src/internal/atomic.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/internal/atomic.h b/src/internal/atomic.h
> > index f938879b..196b3fb1 100644
> > --- a/src/internal/atomic.h
> > +++ b/src/internal/atomic.h
> > @@ -256,7 +256,7 @@ static inline void a_crash()
> >  static inline int a_ctz_32(uint32_t x)
> >  {
> >  #ifdef a_clz_32
> > -	return 31-a_clz_32(x&-x);
> > +	return a_clz_32(x&-x)^31;
> 
> (since there was no other response...)
> 
> While certainly this is a nice improvement when considered in isolation,
> looking at how this is used in the rest of the library (in ffs, ffsl, qsort)
> reveals that returned value is used in further arithmetics.
> 
> So e.g. ffs/ffsl do a_ctz_l(i)+1, and +1 may be folded together with 31-...,
> but not with the xor.  Thus the original is preferable.

Thanks for commenting on this. I hadn't thought about that, and indeed
the current version is probably better.

It would be even nicer if there were some way to make the compiler
aware that the range is limited (so that it could make such a
transformation itself in either direction if it wanted), but it
probably doesn't matter.

Rich


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, back to index

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-08  2:39 [PATCH] a_ctz_32: Instead of a subtraction by 31, use an xor Rosen Penev
2019-12-12 15:10 ` Alexander Monakov
2019-12-12 15:17   ` Rich Felker

mailing list of musl libc

Archives are clonable: git clone --mirror http://inbox.vuxu.org/musl

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://inbox.vuxu.org/vuxu.archive.musl


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git