From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailout.scc.kit.edu (mailout.scc.kit.edu [129.13.185.202])
    by krisdoz.my.domain (8.14.5/8.14.5) with ESMTP id s070MhOd016807
    for ; Mon, 6 Jan 2014 19:22:43 -0500 (EST)
Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82])
    by scc-mailout-02.scc.kit.edu with esmtp (Exim 4.72 #1)
    id 1W0KRO-0000Lg-Iq; Tue, 07 Jan 2014 01:22:42 +0100
Received: from donnerwolke.usta.de ([172.24.96.3])
    by hekate.usta.de with esmtp (Exim 4.77)
    (envelope-from )
    id 1W0KRO-0000Au-J8
    for tech@mdocml.bsd.lv; Tue, 07 Jan 2014 01:22:42 +0100
Received: from iris.usta.de ([172.24.96.5] helo=usta.de)
    by donnerwolke.usta.de with esmtp (Exim 4.72)
    (envelope-from )
    id 1W0KRO-0003gM-HP
    for tech@mdocml.bsd.lv; Tue, 07 Jan 2014 01:22:42 +0100
Received: from schwarze by usta.de with local (Exim 4.77)
    (envelope-from )
    id 1W0KRO-0001BS-61
    for tech@mdocml.bsd.lv; Tue, 07 Jan 2014 01:22:42 +0100
Date: Tue, 7 Jan 2014 01:22:41 +0100
From: Ingo Schwarze
To: tech@mdocml.bsd.lv
Subject: Re: mdocml: Gprof(1) is fun.
Message-ID: <20140107002241.GC4788@iris.usta.de>
References: <201401062346.s06Nk7Xq005688@krisdoz.my.domain>
    <20140106235620.GA31237@britannica.bec.de>
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20140106235620.GA31237@britannica.bec.de>
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi,

Joerg Sonnenberger wrote on Tue, Jan 07, 2014 at 12:56:20AM +0100:
> On Mon, Jan 06, 2014 at 06:46:07PM -0500, schwarze@mdocml.bsd.lv wrote:

>> Do not copy predefined strings into the dynamic string table, just
>> leave them in their own static table and use that one as a fallback
>> at lookup time.  This saves us copying and deleting them for each manual.
>> No functional change.

> What about doing a binary sort of that table and also storing the size
> and checking that first?

Nice idea, that might help a bit more.

There is one other trivial improvement: caching the result of the
uname(3) multi-sysctl(3) library call, which is likely to yield
another 4-5% speedup.  After that, the gprof(1) analysis probably
ought to be redone to find the areas that are now taking the biggest
chunks of time.  Quite possibly, what you suggest will be among them.

Here is a krautcomputing backup of my first gprof(1) analysis,
mostly so that I don't lose it...

Yours,
  Ingo

----- 8< ----- schnipp ----- >8 ----- 8< ----- schnapp ----- >8 -----

All numbers in percent of total user+system time of the profiled
mandocdb(8) executable when run with -Q over /usr/share/man/.

Profiling overhead (55% total):
-------------------------------
__mcount              35.3
write                  8.5   43.8
fsync                  5.8   49.6
_thread_sys_unlink     2.6   52.2
_thread_sys_lseek      2.5   54.7
fcntl                  0.8   55.5

In the following call tree,
 * "->" marks functions where different code paths join,
   coming down from different directions.  This is only noted
   where more than one of these paths is relevant for performance.
 * "!!!" marks standard library functions where a lot of time
   was wasted in a particular code path.  These were marked as
   a first step to identify candidates for optimization.
 * "<<<" marks mandoc(3) functions selected for optimization,
   as a second step in deciding what to work on.
 * "DONE" refers to code optimized on Jan 6, 2014;
   all these optimizations are committed.

The total speedup achieved so far is 45% (3.1 -> 1.7 seconds),
which is nearly a factor of two.

Time spent on payload (45% total, about three seconds absolute)
---------------------------------------------------------------
main mandocdb                    44.6
 mpages_merge                    42.9
  mparse_readfd                  33.5
   mparse_parse_buffer           30.5
    mparse_buf_r                 30.0
     roff_parseln                17.3
      roff_Dd                    10.5  DONE
       roff_setstr               17.6  DONE
        roff_setstrn             17.5  DONE <<< TARGET 1 <<<
         strcmp                   5.1  DONE !!!
         mandoc_malloc            4.1  DONE
          malloc                  4.0  DONE !!!
         mandoc_strndup           3.0  DONE
          mandoc_malloc                DONE -> roff_setstrn
          memcpy                  1.4  DONE !!!
      roff_parse                  2.9
       roff_getstrn               2.3  <<< TARGET 2 <<<
        strncmp                   1.4  !!!
      roff_TH                     1.6  DONE
       roff_setstr                     DONE -> roff_Dd
      roff_cond_sub               1.5
       roff_parse                      -> roff_parseln
      roff_ds                     0.8
       roff_setstr                     -> roff_Dd
     mdoc_parseln                 7.9
      mdoc_pmacro                 7.7
       mdoc_macro                 6.3
        blk_full                  3.9
         rew_sub                  1.7
          rew_last                3.6
           mdoc_valid_post        5.7
            post_dd               4.1  DONE
             mandoc_normdate      5.7  DONE <<< TARGET 3 <<<
              a2time              5.0  DONE
               mktime             4.9  DONE !!!
              time2a              0.8  DONE
               strftime           0.5  DONE !!!
            post_os               1.0
             uname                1.0  !!!
         dword                         -> in_line_eoln
        in_line_eoln              2.3
         dword                    2.3
          mdoc_word_alloc         1.9
           node_append            2.4
            mdoc_valid_post
           roff_strdup            1.1
            mandoc_realloc        0.8
             realloc              0.7
         rew_elem                 1.6
          rew_last                     -> rew_sub
        in_line
     man_parseln                  1.8
      man_pmacro                  1.7
       blk_imp                    1.1
        rew_scope                 0.7
         man_unscope              0.9
          man_valid_post          1.7
           post_TH                1.7  DONE
            mandoc_normdate            DONE -> post_dd
       in_line_eoln               0.6
    mparse_end                    0.5
   _thread_sys_open               1.8
   _thread_sys_close              1.0
   munmap                         0.8
  mparse_reset                    8.8
   roff_reset                     8.6  DONE
    roff_setstr                        DONE -> roff_Dd
    roff_free1                    4.2  DONE
     roff_freestr                 4.0  DONE <<< TARGET 4 <<<
      free                        4.6  DONE !!!
       munmap                          DONE -> mparse_readfd
 treescan                         1.6
  fts_read                        1.4
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
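
For illustration, the two ideas discussed in this thread (keeping the
predefined strings in a static table used as a fallback at lookup time,
and searching that table by size first) might be sketched in C roughly
as follows.  This is a minimal sketch under assumptions, not the mandoc
code: the names predef, userstr, predef_lookup and str_lookup are
hypothetical, the table entries are sample values rather than the real
predefined strings, and ordering the table by length before content is
just one way to combine a size check with a binary search.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry of the static table: name, name length, value. */
struct predef {
	const char	*name;
	size_t		 namesz;
	const char	*val;
};

/* Store the length at compile time so lookups never call strlen(). */
#define PREDEF(n, v) { (n), sizeof(n) - 1, (v) }

/*
 * Sample entries only (an assumption, not the real table).
 * The table must be sorted by length first, then bytewise by name.
 */
static const struct predef predefs[] = {
	PREDEF("R",  "(R)"),
	PREDEF("Am", "&"),
	PREDEF("Ba", "|"),
	PREDEF("Ge", ">="),
	PREDEF("Le", "<="),
	PREDEF("Ne", "!="),
	PREDEF("Tm", "(Tm)"),
};

/* Compare sizes first; only touch the bytes when the sizes match. */
static int
predef_cmp(const char *name, size_t namesz, const struct predef *p)
{
	if (namesz != p->namesz)
		return namesz < p->namesz ? -1 : 1;
	return memcmp(name, p->name, namesz);
}

/* Binary search in the static table; returns NULL if not found. */
const char *
predef_lookup(const char *name, size_t namesz)
{
	size_t	 lo = 0, hi = sizeof(predefs) / sizeof(predefs[0]);

	while (lo < hi) {
		size_t	 mid = lo + (hi - lo) / 2;
		int	 c = predef_cmp(name, namesz, &predefs[mid]);

		if (c == 0)
			return predefs[mid].val;
		if (c < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return NULL;
}

/* Hypothetical dynamic (user-defined) string list. */
struct userstr {
	const char	*name;
	size_t		 namesz;
	const char	*val;
	struct userstr	*next;
};

/*
 * Look in the dynamic list first; fall back to the static table.
 * Nothing is ever copied from the static table into the list.
 */
const char *
str_lookup(const struct userstr *user, const char *name, size_t namesz)
{
	for (; user != NULL; user = user->next)
		if (user->namesz == namesz &&
		    memcmp(user->name, name, namesz) == 0)
			return user->val;
	return predef_lookup(name, namesz);
}
```

With this layout, a user-defined string shadows a predefined one of the
same name simply because the dynamic list is searched first, and
resetting the parser between manuals only has to free the dynamic list.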