On Tue, 30 Jun 2015, Rich Felker wrote:
> Discussion on #musl with Timo Teräs has produced the following
> results:
>
> - Moving bloom filter size to struct dso gives 5% improvement in clang
>   (built as 110 .so's) start time, simply because of a reduction of
>   number of instructions in the hot path. So I think we should apply
>   that patch.

I think most of the improvement here actually comes from fewer cache misses.
As a result, I think we should take this idea further and shuffle struct dso
a little bit so that fields accessed in the hot find_sym loop are packed
together, if possible.

> - The whole outer for loop in find_sym is the hot path for
>   performance. As such, eliminating the lazy calculation of gnu_hash
>   and simply doing it before the loop should be a measurable win, just
>   by removing the if (!ghm) branch.

On a related note, it's possible to avoid calculating sysv hash, if gnu-hash
is enabled system-wide, by not setting 'global' flag on the vdso item (as
mentioned on IRC in your conversation with Timo).

> - Even the check if (!dso->global) continue; has nontrivial cost.
>   Since I want to replace this representation with a separate
>   linked-list chain for global dsos anyway (for other reasons) I think
>   that's worth prioritizing for performance too.

I'm curious what the other reasons are? :)

> - The strength-reduction of remainder operations does not seem to
>   provide worthwhile benefits yet, simply because so little of the
>   overall time is spent on the division/remainder.

On IRC we noted that on AArch64 it's slower than native div/mod on our
microbenchmark, and on ARM the speedup is smaller than expected. My testing
on x86 indicates that it's not profitable in the dynamic linker (not sure
why).

Alexander
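
P.S. To make the loop shape discussed above concrete, here is a rough sketch
of a find_sym-style walk that computes the GNU hash once before the loop and
follows a dedicated chain of global objects, so neither the lazy-hash branch
nor the per-object 'global' check sits in the hot path. This is not musl's
actual code: struct dso_sketch, next_global, lookup and find_sym_sketch are
hypothetical names made up for illustration, and the real per-object lookup
(bloom filter, buckets, chains) is hidden behind a callback.

```c
#include <stddef.h>
#include <stdint.h>

/* GNU-style symbol hash: the usual h = h*33 + c recurrence seeded with 5381,
 * wrapping modulo 2^32. */
static uint32_t gnu_hash(const char *s0)
{
	const unsigned char *s = (const unsigned char *)s0;
	uint32_t h = 5381;
	for (; *s; s++)
		h = h*33 + *s;
	return h;
}

/* Hypothetical object descriptor, not musl's struct dso. The fields the
 * lookup loop touches are placed first so they tend to share a cache line. */
struct dso_sketch {
	struct dso_sketch *next_global;  /* chain of global objects only */
	const void *(*lookup)(const struct dso_sketch *, const char *, uint32_t);
	/* cold fields (name, mappings, ...) would follow here */
};

/* Walk only the global chain; the hash is computed unconditionally up
 * front instead of lazily inside the loop. */
static const void *find_sym_sketch(const struct dso_sketch *head,
                                   const char *name)
{
	uint32_t gh = gnu_hash(name);
	for (const struct dso_sketch *p = head; p; p = p->next_global) {
		const void *sym = p->lookup(p, name, gh);
		if (sym)
			return sym;
	}
	return NULL;
}
```

Whether this layout actually helps would have to be measured; the sketch only
shows where the branches go away.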