New comment by tornaria on void-packages repository

https://github.com/void-linux/void-packages/pull/32453#issuecomment-897002429

Comment:
> Can you try the `-fno-semantic-interposition` compiler argument? Python has started using it for libpython for similar performance gains. I think it might allow us to keep the space gain of dynamic linking and the speed gain of static linking.

A quick informal test shows that adding that option improves the dynamic binary a bit, but it comes nowhere near recovering what is lost with pthreads. Running the script below with the current version in the repos (no pthreads, no CFLAGS):
```
$ gp -q regress-best5.gp
4470 4491 4500 4532 4509   best=4470
```

Now with the new version, enabling pthreads:
- `CFLAGS=""`
  - `gp-dyn: 5970 6000 6006 6010 6020   best=5970`
  - `gp-sta: 4219 4280 4210 4154 4203   best=4154`

- `CFLAGS="-flto"`
  - `gp-dyn: 6226 6168 6145 6196 6095   best=6095`
  - `gp-sta: 4166 4045 4028 4010 4013   best=4010`

- `CFLAGS="-flto -fno-semantic-interposition"`
  - `gp-dyn: 5669 5680 5702 5686 5628   best=5628`
  - `gp-sta: 3900 3934 4050 3921 3963   best=3900`

So for this particular test, going dynamic costs about 44% with `-flto -fno-semantic-interposition` (best 5628 vs. 3900), compared with about 52% with `-flto` alone (best 6095 vs. 4010).
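
For anyone who wants to recheck the arithmetic, here is a minimal Python sketch that recomputes the dynamic-vs-static slowdown from the `best=` values quoted above. It is not part of the original test; the dictionary layout and the `slowdown_percent` name are purely illustrative.

```python
# Best-of-5 timings copied from the results above, keyed by CFLAGS.
# "dyn" = gp with libpari linked dynamically, "sta" = libpari linked statically.
BEST = {
    "": {"dyn": 5970, "sta": 4154},
    "-flto": {"dyn": 6095, "sta": 4010},
    "-flto -fno-semantic-interposition": {"dyn": 5628, "sta": 3900},
}


def slowdown_percent(dyn, sta):
    """Return how much slower the dynamic binary is, relative to the static one."""
    return 100.0 * (dyn - sta) / sta


if __name__ == "__main__":
    for flags, t in BEST.items():
        pct = slowdown_percent(t["dyn"], t["sta"])
        print(f'CFLAGS="{flags}": dynamic is {pct:.1f}% slower than static')
```

With these numbers the slowdown comes out at roughly 44% with `-fno-semantic-interposition` and roughly 52% with `-flto` alone.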

Note that we are NOT statically linking system libraries. Only `libpari` is linked statically into the `gp` binary instead of being linked dynamically.

---




- regress-best_of_5.gp
```
sigmatwist1(n,k)=
{
   if(denominator(n)>1||n<=0,return(0));
   n>>=valuation(n,2);
   sumdiv(n,d,if(d%4==1,d^k,-d^k));
}

sumtwist1(k,m,N)=
{
   sigmatwist1(m/N,k-1)+2*sum(s=1,sqrtint(m),sigmatwist1((m-s^2)/N,k-1));
}

a={vector(5,i,
  gettime();
  sumtwist1(3,(10^12+4)/4,1);
  t=gettime();
  print1(t," ");
  t)};
  print("  best=",vecmin(a));
\q
```
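
For readers who don't know GP, here is a rough Python port of the two functions in the benchmark, only to show what is being computed. It is a sketch and not what was timed: it assumes `m` and `N` are integers, uses plain trial division in place of `sumdiv`, and is far too slow for the `(10^12+4)/4` input used above.

```python
from fractions import Fraction
from math import isqrt


def sigmatwist1(n, k):
    """Twisted divisor sum over the odd part of n; 0 unless n is a positive integer."""
    n = Fraction(n)
    if n.denominator > 1 or n <= 0:
        return 0
    n = n.numerator
    # Equivalent of n >>= valuation(n, 2): strip all factors of 2.
    while n % 2 == 0:
        n //= 2
    total = 0
    # Trial division stands in for GP's sumdiv (an assumption made for simplicity).
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            for e in {d, n // d}:  # the set avoids counting sqrt(n) twice
                total += e**k if e % 4 == 1 else -(e**k)
    return total


def sumtwist1(k, m, N):
    """Port of sumtwist1 from the GP script, for integer m and N."""
    head = sigmatwist1(Fraction(m, N), k - 1)
    tail = sum(
        sigmatwist1(Fraction(m - s * s, N), k - 1)
        for s in range(1, isqrt(m) + 1)
    )
    return head + 2 * tail


if __name__ == "__main__":
    # Small input only; the real benchmark argument is far out of reach here.
    print(sumtwist1(3, 5, 1))
```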