On Sun, Jun 28, 2015 at 02:48:34AM +0300, Alexander Monakov wrote:
> Based on http://ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html
>
> Do a little hand-holding for the compiler and fold magic post-shift into
> 32-bit high word -> low word shift on 64-bit platforms.
> ---

On 32-bit Atom, I was able to get an improvement from 116ms to 113ms
running an empty program linked with a .so produced using the attached
script. This is probably roughly best-case improvement, so the umod
stuff is probably not likely to be worthwhile unless/until we can
improve the areas that are currently dominating the runtime. That's
rather disappointing, but maybe we can improve enough other stuff for
it to make sense to do the umod optimization.

Rich
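
For readers unfamiliar with the technique under discussion, here is a minimal sketch of unsigned 32-bit modulo by a precomputed multiplier and shift. It is not the code from the patch: the names struct udiv, udiv_init, udiv_do and umod_do are hypothetical, and the choice between a rounded-up multiplier and a rounded-down multiplier with an incremented dividend is an assumption made in the spirit of the linked post. The divisor is assumed to be nonzero.

```c
#include <stdint.h>

/* Hypothetical helper type, not taken from the patch. */
struct udiv {
	uint32_t div;   /* original divisor, kept for the remainder step */
	uint32_t mul;   /* precomputed "magic" multiplier */
	uint8_t shift;  /* total right shift applied to the 64-bit product */
	uint8_t inc;    /* 1 if the dividend is incremented before multiplying */
};

/* Precompute multiplier and shift for a nonzero divisor d. */
static void udiv_init(struct udiv *u, uint32_t d)
{
	unsigned p = 0;
	while ((d >> p) > 1) p++;          /* p = floor(log2(d)) */

	u->div = d;
	u->inc = 0;
	if (!(d & (d - 1))) {              /* power of two: plain shift */
		u->mul = 1;
		u->shift = p;
		return;
	}

	unsigned k = 32 + p;               /* k <= 63, so 2^k fits in 64 bits */
	uint64_t pow = (uint64_t)1 << k;
	uint64_t m = pow / d;              /* floor(2^k / d), below 2^32 */
	uint64_t r = pow - m * d;          /* 2^k mod d, with 0 < r < d */

	if (d - r <= ((uint64_t)1 << p))
		m++;                       /* rounded-up multiplier is exact */
	else
		u->inc = 1;                /* rounded-down multiplier, use n+1 */

	u->mul = (uint32_t)m;
	u->shift = k;
}

/* Quotient n / d using one widening multiply and one shift. */
static uint32_t udiv_do(const struct udiv *u, uint32_t n)
{
	/* n + inc is at most 2^32 and mul is below 2^32, so no overflow. */
	uint64_t prod = (uint64_t)u->mul * ((uint64_t)n + u->inc);
	return (uint32_t)(prod >> u->shift);
}

/* Remainder n % d derived from the quotient. */
static uint32_t umod_do(const struct udiv *u, uint32_t n)
{
	return n - udiv_do(u, n) * u->div;
}
```

A caller would run udiv_init once per divisor (for example, once per hash table size) and then call umod_do for each lookup. For divisors that are not powers of two the shift is 32 + p, so a 64-bit target can apply it as one shift of the full product, while a 32-bit target would typically take the high word of the product and shift it by p. That is one plausible reading of the "fold magic post-shift" remark in the quoted description, not a statement about what the patch actually does.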