From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=MAILING_LIST_MULTI, RCVD_IN_MSPIKE_H2 autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 23442 invoked from network); 10 Feb 2023 13:11:02 -0000 Received: from second.openwall.net (193.110.157.125) by inbox.vuxu.org with ESMTPUTF8; 10 Feb 2023 13:11:02 -0000 Received: (qmail 14239 invoked by uid 550); 10 Feb 2023 13:10:59 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 14173 invoked from network); 10 Feb 2023 13:10:58 -0000 Date: Fri, 10 Feb 2023 08:10:45 -0500 From: Rich Felker To: David Wang <00107082@163.com> Cc: musl@lists.openwall.com, Markus Wichmann Message-ID: <20230210131044.GZ4163@brightrain.aerifal.cx> References: <4d290220.36d6.1860222ca46.Coremail.00107082@163.com> <20230201180115.GB2626@voyager> <20230209190316.GU4163@brightrain.aerifal.cx> <75d9cfae.35eb.1863ac4e3c0.Coremail.00107082@163.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <75d9cfae.35eb.1863ac4e3c0.Coremail.00107082@163.com> User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: [musl] Re:Re: [musl] Re:Re: [musl] qsort On Fri, Feb 10, 2023 at 06:00:27PM +0800, David Wang wrote: > > > >Before we consider any such change though we should probably see > >profiling data to determine where the time is being spent in > >smoothsort, in case it's stupid stuff that's fixable. > > > >Rich > > I made a rough profiling against current implementation of qsort in > musl (Just copy the source code and recompile it to expose symbols) > > > The function call counters are as following: > > main: 931387 > qsort2: 917148 <----this is qsort > __qsort_r: 911594 > trinkle: 811314 > sift: 537263 > wrapper_cmp: 257403 <--- wrapper cmp function > mycmp: 167410 <---real cmp function > cycle: 105809 > shr: 41585 > pntz: 27833 > _init: 14925 > a_ctz_64: 11341 > a_ctz_l: 9127 > shl: 8333 > > > 1. wrapper_cmp took about 25%, overhead (257403-167410) seems high > when the real comp functions is very simple(which just compares > integer values), could be optimized out for qsort > 2. most "qsort" time spend in "trinkle", and most "trinkle" time > spend in "sift". > > > A tree-view profiling report is attached. (There may be several > incorrect call-chains collected, but I think the proportion among > function-call counters is correct, with high probability....) What tool was used for this? gprof or anything else invasive is not meaningful; for tiny functions, the entire time measured will be the profiling overhead. perf(1) is the only way I know to get meaningful numbers. In particular, it makes no sense that significant time was spent in wrapper_cmp, which looks like (i386): 0: ff 64 24 0c jmp *0xc(%esp) or (x86_64): 0: ff e2 jmpq *%rdx or (arm): 0: 4710 bx r2 but I can imagine it being (relatively) gigantic with a call out to profiling code. Rich