Date: Fri, 10 Feb 2023 09:19:55 -0500
From: Rich Felker
To: David Wang <00107082@163.com>
Cc: musl@lists.openwall.com, Markus Wichmann
Subject: Re: [musl] Re:Re: [musl] Re:Re: [musl] Re:Re: [musl] qsort

On Fri, Feb 10, 2023 at 09:45:12PM +0800, David Wang wrote:
>
>
>
> At 2023-02-10 21:10:45, "Rich Felker" wrote:
> >On Fri, Feb 10, 2023 at 06:00:27PM +0800, David Wang wrote:
>
> >What tool was used for this? gprof or anything else invasive is not
> >meaningful; for tiny functions, the entire time measured will be the
> >profiling overhead. perf(1) is the only way I know to get meaningful
> >numbers.
> >
> >In particular, it makes no sense that significant time was spent in
> >wrapper_cmp, which looks like (i386):
> >
> >   0:   ff 64 24 0c             jmp    *0xc(%esp)
> >
> >or (x86_64):
> >
> >   0:   ff e2                   jmpq   *%rdx
> >
> >or (arm):
> >
> >   0:   4710                    bx     r2
> >
> >but I can imagine it being (relatively) gigantic with a call out to
> >profiling code.
> >
> >Rich
>
> I have implemented a profiling tool myself, using perf_event_open to
> start profiling and mmap to collect callchains; the source code is
> here:
> https://github.com/zq-david-wang/linux-tools/blob/main/perf/profiler/profiler.cpp
> (It is still buggy; there is always a strange callchain which I could
> not figure out... and I am still working on it.)

Thanks for sharing. It's nice to see another tool like this.

> Also, I did not use any optimization when compiling the code, which
> could make a difference; I will take the time to give it a try.

Yes, that would make a big difference. For this to be meaningful, the
measurement needs to be made with optimizations enabled.

> About wrapper_cmp: in my last profiling run, 931387 samples were
> collected in total; 257403 of them contain the callchain
> ->wrapper_cmp, and among those 257403 samples, 167410 contain the
> callchain ->wrapper_cmp->mycmp. That is why I think there is extra
> overhead in wrapper_cmp. Maybe compiler optimization would change the
> result, and I will make further checks.

Yes. On i386 here, -O0 takes wrapper_cmp from 1 instruction to 10
instructions.

Rich
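
P.S. For anyone not reading along in the source: wrapper_cmp is just a
trampoline that forwards the two element pointers to the user's
comparison function, which rides in the context argument of the
reentrant sort. A minimal sketch of that pattern (paraphrased, not the
verbatim musl code; my_qsort and the exact spelling of the
declarations are illustrative):

    #define _GNU_SOURCE
    #include <stdlib.h>

    typedef int (*cmpfun)(const void *, const void *);

    /* Recover the user's comparison function from the context pointer
       and forward the two element pointers to it. */
    static int wrapper_cmp(const void *v1, const void *v2, void *ctx)
    {
            return ((cmpfun)ctx)(v1, v2);
    }

    /* The plain entry point just passes the user's function itself as
       the context argument of the _r variant. */
    void my_qsort(void *base, size_t nel, size_t width, cmpfun cmp)
    {
            qsort_r(base, nel, width, wrapper_cmp, (void *)cmp);
    }

With optimization on, the forwarding call is in tail position, so it
compiles down to the single indirect jump shown in the disassembly
above; at -O0 the compiler instead emits a stack frame, argument
reloads, and a call/return sequence, which is where the extra
instructions (and a good chunk of the extra samples) come from.

For comparison with the profiler linked above, the basic setup it
describes (perf_event_open to start sampling, mmap to expose the ring
buffer that carries the callchains) looks roughly like this; the event
choice, sampling frequency, and buffer size are arbitrary illustrative
values, not taken from that tool:

    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            struct perf_event_attr attr;
            memset(&attr, 0, sizeof attr);
            attr.size = sizeof attr;
            attr.type = PERF_TYPE_SOFTWARE;
            attr.config = PERF_COUNT_SW_CPU_CLOCK;
            attr.freq = 1;
            attr.sample_freq = 999;   /* ~999 samples per second */
            attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_CALLCHAIN;
            attr.disabled = 1;
            attr.exclude_kernel = 1;

            /* There is no libc wrapper; call the syscall directly.
               pid=0, cpu=-1: profile this process on any CPU. */
            int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
            if (fd < 0) { perror("perf_event_open"); return 1; }

            /* The ring buffer is one metadata page plus a power-of-two
               number of data pages. */
            long ps = sysconf(_SC_PAGESIZE);
            void *ring = mmap(0, (1 + 8) * ps, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
            if (ring == MAP_FAILED) { perror("mmap"); return 1; }

            ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
            /* ... run the workload, then walk the ring buffer: each
               PERF_RECORD_SAMPLE record carries the ip plus the
               callchain (a count followed by an array of ips). */
            ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
            return 0;
    }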