From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=0.2 required=5.0 tests=DKIM_INVALID,DKIM_SIGNED, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.4 Received: (qmail 24380 invoked from network); 15 Nov 2023 01:26:10 -0000 Received: from bsd.lv (HELO mandoc.bsd.lv) (66.111.2.12) by inbox.vuxu.org with ESMTPUTF8; 15 Nov 2023 01:26:10 -0000 Received: from fantadrom.bsd.lv (localhost [127.0.0.1]) by mandoc.bsd.lv (OpenSMTPD) with ESMTP id 52e4b231 for ; Wed, 15 Nov 2023 01:26:08 +0000 (UTC) Received: from scc-mailout-kit-01.scc.kit.edu (scc-mailout-kit-01.scc.kit.edu [129.13.231.81]) by mandoc.bsd.lv (OpenSMTPD) with ESMTP id 19e46271 for ; Wed, 15 Nov 2023 01:26:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=usta.de; s=kit1; h=In-Reply-To:Content-Type:MIME-Version:References:Message-ID:Subject :Cc:To:From:Date:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=jQcQxqH/qYqRzV/gIrUSJEfceZIR7EdKlA0onZzAk10=; b=sVS2mMzF2ljOXL9jumVkcbRib4 TwilrQEfBkKjbwRWx8cZnT/miriP0/bYRGeaqSAjY5iEayWhhkTwz2CXYpXkSFlDJ9+kV6RmuyPMb p1761Pj0k5C9wMssjSZ/BdRVV614wjfAlJiIYebdcbIEuPH3Auvz37Q6tqnCx39zLrqAUHLBbU2H0 O9n3XB6tDAC4LZvY2z9n2GI0ix7ONhGOaf0bshyVh3x50KZSFe2t8T+174q4Ydr02DpLHj1nklATX 19IZjna2PSPVuFra0NcO946Dp5j9hY/nTqMg8a6DGBj5VrtWIKtTlfGm39FgI0hRNxPMVU8OJgqdO l4jeVWbA==; Received: from hekate.asta.kit.edu ([2a00:1398:5:f401::77]) by scc-mailout-kit-01.scc.kit.edu with esmtps (TLS1.3:ECDHE_SECP256R1__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (envelope-from ) id 1r34fP-0020ru-0p; Wed, 15 Nov 2023 02:26:07 +0100 Received: from login-1.asta.kit.edu ([2a00:1398:5:f400::72]) by hekate.asta.kit.edu with esmtp (Exim 4.94.2) (envelope-from ) id 1r34fO-001hcA-7p; Wed, 15 Nov 2023 02:26:05 +0100 Received: from schwarze by login-1.asta.kit.edu with local (Exim 4.94.2) (envelope-from ) id 1r34fN-004Cn1-HT; Wed, 15 Nov 2023 02:26:05 +0100 Date: Wed, 15 Nov 2023 02:26:05 +0100 From: Ingo Schwarze To: Wesley Moore Cc: tech@mandoc.bsd.lv Subject: Re: Improve performance of makewhatis Message-ID: References: <20231113014330.2247710-1-wes@wezm.net> X-Mailinglist: mandoc-tech Reply-To: tech@mandoc.bsd.lv MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20231113014330.2247710-1-wes@wezm.net> Hi Wesley, Wesley Moore wrote on Mon, Nov 13, 2023 at 11:42:42AM +1000: > In this patch set I aimed to improve the performance of makewhatis. > On my Chimera Linux systems Interesting. I just added Chimera Linux to these two pages: https://mandoc.bsd.lv/ports.html https://mandoc.bsd.lv/porthistory.html Can you confirm that these statements are accurate? * Chimera Linux includes mandoc in its base system since 2021 Oct 30. * There were no official nor unofficial packages of mandoc for Chimera Linux before that. * It uses the mandoc(1) parsing and formatting engine by default since the same date. * The default apropos(1) program is mandoc since the same date. * The default man(1) program is mandoc since the same date. * Chimera Linux does not provide an officially supported alternative for the man(1) program that users can select. * Chimera Linux does not publish its manual pages on the web on a site similar to https://manpages.debian.org/ or https://man.openbsd.org/ or https://man.voidlinux.org/ . > makewhatis -Tutf8 is run to rebuild the index > whenever a package containing man pages is installed or updated. That's not particularly smart. Rebuilding the whole database from scratch is unavoidably slow. The makewhatis(8) options -d and -u are specifically provided to serve the needs of package management systems. Specifically, if a package management systems cares about performance, it can use -d to add just those manual page files it just installed to the existing database, rather than rebuilding the whole database from scratch, which obviously implies reading every manual page file in the whole system from disk and parsing all those files. > I noticed this took multiple seconds even on a Ryzen 9 7950X system. That may not be a major problem because installing an additional package is not usually a fast operation (it usually requires both network access and expensive database locking and management outside the domain of manual pages) and it isn't done often either. So i expect most users won't feel offended if installing new software takes a few seconds. > Profiling showed a lot of time spent in the loops of roff_getstrn. Hmm, even though i'm not convinced your usecase is particularly important, that's interesting, and i wouldn't have expected it. It may be worth looking into that. > To speed this up I made use of hash tables. I tested the changes on > several different systems: > > - OpenBSD amd64 in VM > - Linux on Raspberry Pi 4 > - Linux on Ryzen 9 7950X > > In each case the wall clock run time for makewhatis -Tutf8 -n showed > an reduction of between 25% and 35%. For makewhatis(8), i would say that's such a small gain that it likely isn't worth much complication. Then again, it also affects mandoc(1), so maybe looking into it is worthwhile after all. Here is what i found with gprof(1) on OpenBSD-current: share local 100% 100% main -> mandocdb 92% 95% (mandocdb) -> mpages_merge 66% 67% (mpages_merge) -> mparse_readfd -> mparse_buf_r 32% 26% (mparse_buf_r) -> roff_parseln 19% 21% MANY -> free -> ofree 16.3% 2.5% (mparse_buf_r) -> mdoc_parseln 6.8% 18.4% (mparse_buf_r) -> man_parseln 15.6% 14.7% MANY -> roff_word_alloc 14.9% 17.0% (_libc_malloc etc) -> omalloc 14.7% 10.9% (mpages_merge) -> mparse_reset 14.6% 10.6% (roff_parse, roff_expand) -> roff_getstrn The "share" column refers to /usr/share/man/, i.e. the OpenBSD base system manuals, a good example of a large, mdoc(7)-heavy collection of manual pages. The "local" column refers to /usr/local/man/, i.e. the optional third-party software i happen to have installed, which contains mostly man(7) manual pages. On each line, the two numbers give the time spent in the respective function including functions called from it, relative to the time spent in main() including all functions called from there. The last name on each line is the respective function. The names before the arrows specify from where it is usually called. So only 10-15% of the time is spent in roff_getstrn(), which cannot possibly result in the 25-35% overall speedup you report, so something looks fishy here. Then again, even 25-35% is not a large gain. In general, improving the structure is much more important than improving the performance because the code has grown historically, becoming more and more complicated and less and less uniform in structure, i.e. harder and harder to maintain. In particular, such a small gain is clearly not worth further complication. On the other hand, the critical code path for the optimization potential you found is: roff_parse -> roff_getstrn In that area, code structure is not particularly good. Parsing for macros and strings is done in various different ways: 1. The code path roff_parse -> roff_getstrn looks for user-defined and renamed macros using r->strtab and r->rentab. 2. The code path roff_expand -> roff_getstrn looks for predefined strings using a linear search in predefs[]. 3. The code paths roff_block(ROFF_am) -> roff_getstrn and roff_rn -> roff_getstrn look for standard high-level macros directly in the roff_name[] array. 4. The functions mdoc_pmacro() in mdoc.c and man_pmacro() in man.c look for standard high-level macros using local mdoc->mdocmac and man->manmac hash tables with roffhash_find(). So it is likely this can be better structured with less duplication and not gratuitiously using different data structures, while also using hash tables for macro and string lookup throughout. > The one caveat of this change is the roff/while/into regression test is > failing with my changes. I've tried to work out why but I don't know > roff well enough understand the intent of the test nor have I been able > to clearly determine why my changes affect it. I was hoping to get some > help with this. The roff/while/into test uses .de in a very strange way, and .de internally uses the string storage, so that's how it may be related, but i would have to look at the details. Then again, studying that is not urgent because your patch is definitely not ready for commit. It adds complication rather than removing some. In particular: * Do not touch roff_int.h. It is only intended for internal API features needed by more than one parser, see mandoc_headers(3) for details, but what you are doing here is local to the roff parser (roff.c). * struct roff_entry is essentially the same as struct roffkv. There should not be two data structures for the same purpose. * roff_free_entry() is trivial and only used at one place, so it should be inlined. * Renaming roff_setstrn to roff_setentry causes quite some churn, so don't do that. If that means using ohash for xmbtab, too, so be it. * roff_strhash_alloc is also next to trivial. Maybe it can be inlined - maybe not because it is called from a few more places. If we introduce it, roffhash_alloc should also use it. * The duplication between roffhash_free() and roff_strhash_free() and between roffhash_find() and roff_strhash_find() is rather ugly. Not yet sure what to do about that. * roff_strhash_insert() is trivial and called only from one place. Some ideas are also good: * Using struct ohash *strtab directly in struct roff is good, just like it is already done for struct ohash *reqtab. * Introducing r->pretab and clearing it in roff_free() rather than in the badly named roff_free1() - just like r->reqtab - is good. One other thing. I hate patch series, don't ever send them. They are usually a symptom of poor development methodology, and usually i reject them without even inspecting them. Here, i made an exception because your idea seems to have some merit. Every patch needs to * perform one well-defined task and * make the code better even if nothing else is ever committed building on it. Splitting a patch like you did it here only adds obfuscation If a patch becomes too big for review, then the *task* it performs needs to be split into logically separate steps, rather than splitting code changes into mutiple patch files that essentially all save the same purpose and are similar and closely related in their content. Such *logical* splitting can become tricky, but i doubt that is needed here. I'm not yet completely sure what the best way forward is. Probably something like: 1. Identifying all places that do string lookup in tables or lists. Some can probably be left alone: arch.c, att.c, cgi.c, chars.c, eqn.c, mansearch.c, mdoc_argnames in mdoc.c, msec.c, st.c, regtab in roff.c, tbl_layout.c, tbl_opts.c But the others should probably be dealt with in a unified manner: reqtab, mdocmac, manmac on the one hand strtab, rentab, pretab, xmbtab on the other hand Not sure yet whether xtab can be included into xmbtab. 2. Design and implement a common handling for these using ohash that is as simple as possible. Not sure yet whether these are multiple steps or a single step, but doing one separate step for every *tab is definitely not the way. I guess that's enough for today... Yours, Ingo -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv