9front - general discussion about 9front
 help / color / mirror / Atom feed
* [9front] [PATCH] branchless assembly abs() and labs() for amd64
@ 2020-12-25 16:36 Kemal
  2020-12-25 18:13 ` ori
  0 siblings, 1 reply; 6+ messages in thread
From: Kemal @ 2020-12-25 16:36 UTC (permalink / raw)
  To: 9front

hello,

i remembered a method to do branchless abs.
can i send it as a patch?

regards.

diff -r d9e940a768d1 sys/src/libc/amd64/abs.s
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/sys/src/libc/amd64/abs.s Fri Dec 25 15:54:42 2020 +0300
@@ -0,0 +1,16 @@
+/*
+this method of abs works this way:
+first it gets the sign bit and extends it with cdq
+sign bit is the most significant bit of an integer showing it's sign
+lets say we have a decimal number -127. according to two's
complement, this is "10000001" in binary. the sign bit here is the
first "1" cdq gets the ax's
+sign bit and extends it like "11111111" then puts it to dx. now we are going to
+do a little math trick here. if we xor dx and ax we will have
"01111110" which is 126. according to two's complement, anything goes
like "11...11" is -1, so to recover the missing 1 we can substract dx
from ax which will
+result in ax-(-1) which would be ax+1. we have 127, the abs value.
+i don't remember the source where i got this method from so i can't
cite them :-(
+*/
+TEXT abs(SB),$0
+ MOVL RARG, AX
+ CDQ
+ XORL DX, AX
+ SUBL DX, AX
+ RET
diff -r d9e940a768d1 sys/src/libc/amd64/labs.s
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/sys/src/libc/amd64/labs.s Fri Dec 25 15:54:42 2020 +0300
@@ -0,0 +1,7 @@
+/* check abs.s for explanation */
+TEXT labs(SB),$0
+ MOVL RARG, AX
+ CDQ
+ XORL DX, AX
+ SUBL DX, AX
+ RET
diff -r d9e940a768d1 sys/src/libc/amd64/mkfile
--- a/sys/src/libc/amd64/mkfile Mon Oct 19 01:20:29 2020 +0200
+++ b/sys/src/libc/amd64/mkfile Fri Dec 25 15:54:42 2020 +0300
@@ -3,10 +3,12 @@

 LIB=/$objtype/lib/libc.a
 SFILES=\
+ abs.s\
  argv0.s\
  atom.s\
  cycles.s\
  getfcr.s\
+ labs.s\
  main9.s\
  main9p.s\
  memccpy.s\

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [9front] [PATCH] branchless assembly abs() and labs() for amd64
  2020-12-25 16:36 [9front] [PATCH] branchless assembly abs() and labs() for amd64 Kemal
@ 2020-12-25 18:13 ` ori
  2020-12-25 18:28   ` boehm.igor
  0 siblings, 1 reply; 6+ messages in thread
From: ori @ 2020-12-25 18:13 UTC (permalink / raw)
  To: kemalinanc8, 9front

Quoth Kemal <kemalinanc8@gmail.com>:
> hello,
> 
> i remembered a method to do branchless abs.
> can i send it as a patch?

more code is more work, especially with platform
specific assembly. is there any real world program
where this matters a lot?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [9front] [PATCH] branchless assembly abs() and labs() for amd64
  2020-12-25 18:13 ` ori
@ 2020-12-25 18:28   ` boehm.igor
  2020-12-25 22:30     ` Kemal
  0 siblings, 1 reply; 6+ messages in thread
From: boehm.igor @ 2020-12-25 18:28 UTC (permalink / raw)
  To: 9front

> Quoth Kemal <kemalinanc8@gmail.com>:
>> hello,
>> 
>> i remembered a method to do branchless abs.
>> can i send it as a patch?

Nice :-)

> more code is more work, especially with platform
> specific assembly. is there any real world program
> where this matters a lot?

On top of having quantitative data (i.e.  a microbenchmark as well as
something real like a benchmark suite) available that show the benefit,
it would also be interesting what the compiler generates for
abs()/labs() and juxtapose that to see how much savings in code size,
improvement in performance, etc. this can accomplish.

That way it is easier to make the trade-off between maintainability
burden vs. performance gain.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [9front] [PATCH] branchless assembly abs() and labs() for amd64
  2020-12-25 18:28   ` boehm.igor
@ 2020-12-25 22:30     ` Kemal
  2020-12-26  1:38       ` ori
  0 siblings, 1 reply; 6+ messages in thread
From: Kemal @ 2020-12-25 22:30 UTC (permalink / raw)
  To: 9front

hello again,

on maintainability issue, we can do it on c with bit shifts like this:

#include <u.h>
#include <libc.h>

int
abs(int a)
{
 return (a ^ (a >> 8*sizeof(int)-1)) - (a >> 8*sizeof(int)-1));
}

so that we don't have to maintain more platform specific code.

on size, the assembly one seems to be the clear winner. the original
abs's object file (unlinked) is around 400~ bytes, assembly one is
120-140~, the branchless c one is 300~.

on performance, well i couldn't do much tests on this and i think my
test program doesn't compare these 3 implementations properly. i might
try performance benchmarks if i can setup a better suite.

on if abs is commonly used, well i couldn't find a program that uses
it heavily so having branchless abs is not important.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [9front] [PATCH] branchless assembly abs() and labs() for amd64
  2020-12-25 22:30     ` Kemal
@ 2020-12-26  1:38       ` ori
  2020-12-26 14:12         ` Kemal
  0 siblings, 1 reply; 6+ messages in thread
From: ori @ 2020-12-26  1:38 UTC (permalink / raw)
  To: kemalinanc8, 9front

Quoth Kemal <kemalinanc8@gmail.com>:
> hello again,
> 
> on maintainability issue, we can do it on c with bit shifts like this:

Simplicity and clarity should win unless the
added complexity buys us something. To me, the
bit twiddling operation seems less obviously
correct than the current:

		if(a < 0)
			return -a;
		return a;

I appreciate the effort, but I'd prefer that we
leave this patch be unless it's solving a problem
Optimization should ideally start with measuring
a real program.

> on size, the assembly one seems to be the clear winner. the original
> abs's object file (unlinked) is around 400~ bytes, assembly one is
> 120-140~, the branchless c one is 300~.

Note that unlinked sizes don't matter much: our
object files do not contain encoded assembly, but
instead a simpler "bytecode" that the linker will
optimize lightly and then encode.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [9front] [PATCH] branchless assembly abs() and labs() for amd64
  2020-12-26  1:38       ` ori
@ 2020-12-26 14:12         ` Kemal
  0 siblings, 0 replies; 6+ messages in thread
From: Kemal @ 2020-12-26 14:12 UTC (permalink / raw)
  To: ori, 9front

i agree that this optimization is useless when there is no program
using it heavily so unless we have one you can discard this patch.
> I appreciate the effort, but I'd prefer that we
> leave this patch be unless it's solving a problem
> Optimization should ideally start with measuring
> a real program.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-12-26 14:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-25 16:36 [9front] [PATCH] branchless assembly abs() and labs() for amd64 Kemal
2020-12-25 18:13 ` ori
2020-12-25 18:28   ` boehm.igor
2020-12-25 22:30     ` Kemal
2020-12-26  1:38       ` ori
2020-12-26 14:12         ` Kemal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).