New comment by tornaria on void-packages repository

https://github.com/void-linux/void-packages/pull/32453#issuecomment-897002429

Comment:
> Can you try the `-fno-semantic-interposition` compiler argument? Python has started using it for libpython for similar performance gains. I think it might allow us to keep the space gain of dynamic linking and the speed gain of static linking.

A quick informal test shows that while adding that option improves the dynamic binary a bit, it comes nowhere near recovering what is lost by enabling pthreads.

Running the script below with the current version in the repos (no pthreads, no CFLAGS):
```
$ gp -q regress-best5.gp
4470 4491 4500 4532 4509 best=4470
```

Now with the new version, enabling pthreads:

- CFLAGS=""
  - gp-dyn: 5970 6000 6006 6010 6020 best=5970
  - gp-sta: 4219 4280 4210 4154 4203 best=4154
- CFLAGS="-flto"
  - gp-dyn: 6226 6168 6145 6196 6095 best=6095
  - gp-sta: 4166 4045 4028 4010 4013 best=4010
- CFLAGS="-flto -fno-semantic-interposition"
  - gp-dyn: 5669 5680 5702 5686 5628 best=5628
  - gp-sta: 3900 3934 4050 3921 3963 best=3900

So for this particular test, going dynamic with `-flto -fno-semantic-interposition` costs roughly 44% (5628 vs 3900), instead of roughly 52% (6095 vs 4010) with `-flto` alone.

Note that we are NOT statically linking system libraries. Only `libpari` is statically linked into `gp` instead of being dynamically linked.

---

- regress-best_of_5.gp
```
sigmatwist1(n,k)=
{
  if(denominator(n)>1||n<=0,return(0));
  n>>=valuation(n,2);
  sumdiv(n,d,if(d%4==1,d^k,-d^k));
}

sumtwist1(k,m,N)=
{
  sigmatwist1(m/N,k-1)+2*sum(s=1,sqrtint(m),sigmatwist1((m-s^2)/N,k-1));
}

a={vector(5,i,
  gettime();
  sumtwist1(3,(10^12+4)/4,1);
  t=gettime();
  print1(t," ");
  t)};
print(" best=",vecmin(a));
\q
```
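
For anyone who wants to recheck the percentages, here is a minimal Python sketch that recomputes the dynamic-versus-static slowdown from the best-of-5 timings above. The `best` table and the `slowdown_pct` helper are names made up for this illustration, not part of the benchmark script, and the slowdown is assumed to be measured relative to the static binary.

```python
# Best-of-5 timings copied from the results above (as printed by gettime()).
# Keys are the CFLAGS used; "dyn"/"sta" are the gp-dyn and gp-sta binaries.
best = {
    "": {"dyn": 5970, "sta": 4154},
    "-flto": {"dyn": 6095, "sta": 4010},
    "-flto -fno-semantic-interposition": {"dyn": 5628, "sta": 3900},
}


def slowdown_pct(dyn, sta):
    """Extra time taken by the dynamic binary, as a percentage of the static time."""
    return 100.0 * (dyn - sta) / sta


for cflags, t in best.items():
    pct = slowdown_pct(t["dyn"], t["sta"])
    print(f'CFLAGS="{cflags}": gp-dyn is {pct:.0f}% slower than gp-sta')
```

With these numbers the slowdown comes out at about 44% with empty CFLAGS, 52% with `-flto`, and 44% with `-flto -fno-semantic-interposition`, so the flag mostly cancels the extra gap that `-flto` opens between the two builds.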