From: "Wesley Moore"
To: tech@mandoc.bsd.lv
Subject: Re: Improve performance of makewhatis
Date: Wed, 15 Nov 2023 18:29:15 +1000
Message-Id: <1ff732ac-40f7-44e3-a140-ba9771cad72a@app.fastmail.com>
References: <20231113014330.2247710-1-wes@wezm.net>
Reply-To: tech@mandoc.bsd.lv
X-Mailinglist: mandoc-tech

Hi Ingo,

Thanks for the detailed reply.

On Wed, 15 Nov 2023, at 11:26 AM, Ingo Schwarze wrote:
>
> Can you confirm that these statements are accurate?
>
> * Chimera Linux includes mandoc in its base system since 2021 Oct 30.

That's correct.

> * There were no official nor unofficial packages of mandoc
>   for Chimera Linux before that.

Yes, that's right.

> * It uses the mandoc(1) parsing and formatting engine by default
>   since the same date.

Yes (pretty much). It looks like it was added a day later:
https://github.com/chimera-linux/cports/commit/3c493733f7f0df345816cc69e273fca32fe70830

> * The default apropos(1) program is mandoc since the same date.
> * The default man(1) program is mandoc since the same date.
> * Chimera Linux does not provide an officially supported
>   alternative for the man(1) program that users can select.
> * Chimera Linux does not publish its manual pages on the web
>   on a site similar to https://manpages.debian.org/
>   or https://man.openbsd.org/ or https://man.voidlinux.org/ .

Yes, all of the above are correct.

>> makewhatis -Tutf8 is run to rebuild the index
>> whenever a package containing man pages is installed or updated.
>> I noticed this took multiple seconds even on a Ryzen 9 7950X system.
>
> That may not be a major problem because installing an additional
> package is not usually a fast operation (it usually requires both
> network access and expensive database locking and management outside
> the domain of manual pages) and it isn't done often either.
>
> So i expect most users won't feel offended if installing new
> software takes a few seconds.

Perhaps that is the case, but it bothered me enough to go to the effort
of writing these patches...

[snip]

> So only 10-15% of the time is spent in roff_getstrn(), which cannot
> possibly result in the 25-35% overall speedup you report, so something
> looks fishy here.  Then again, even 25-35% is not a large gain.

Those are the numbers I saw from my basic benchmarking: I measured the
wall clock time of makewhatis -Tutf8 -n across three runs and averaged
the results for each machine.
The raw data from my notes is available here:
https://gist.github.com/wezm/4043bde0dc2974c88b7706c60b58f900

[snip]

> One other thing.  I hate patch series, don't ever send them.
> They are usually a symptom of poor development methodology,
> and usually i reject them without even inspecting them.
> Here, i made an exception because your idea seems to have some merit.
>
> Every patch needs to
>  * perform one well-defined task and
>  * make the code better even if nothing else is ever committed
>    building on it.
>
> Splitting a patch like you did it here only adds obfuscation.
> If a patch becomes too big for review, then the *task* it performs
> needs to be split into logically separate steps, rather than
> splitting code changes into multiple patch files that essentially
> all serve the same purpose and are similar and closely related in
> their content.  Such *logical* splitting can become tricky, but
> i doubt that is needed here.

Sorry about that. While I've been programming for a couple of decades,
the email-based patch workflow is very new to me and different from all
my experience to date. If I post any more changes I'll be sure to
follow your suggestions.

> I'm not yet completely sure what the best way forward is.
> Probably something like:
>  1. Identifying all places that do string lookup in tables or lists.
>     Some can probably be left alone:
>       arch.c, att.c, cgi.c, chars.c, eqn.c, mansearch.c,
>       mdoc_argnames in mdoc.c, msec.c, st.c, regtab in roff.c,
>       tbl_layout.c, tbl_opts.c
>     But the others should probably be dealt with in a unified manner:
>       reqtab, mdocmac, manmac on the one hand
>       strtab, rentab, pretab, xmbtab on the other hand
>     Not sure yet whether xtab can be included into xmbtab.
>  2. Design and implement a common handling for these using ohash
>     that is as simple as possible.

When I have a moment I'll take a look through these and try to come up
with an approach/plan and post it for feedback before making code
changes.
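
As a rough illustration of what "common handling" of string-keyed
tables could look like, here is a minimal sketch. It is an assumption
for discussion only: it uses a plain open-addressing table rather than
the ohash API, it is not code from the patches or from mandoc, and the
names tab_init, tab_insert, tab_find and TAB_SLOTS are hypothetical.
Keys are passed with an explicit length so that a name inside a larger
input buffer can be looked up without copying it first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAB_SLOTS 64		/* hypothetical fixed size; must be a power of two */

struct tab_entry {
	const char	*key;	/* NULL marks an empty slot */
	size_t		 keylen;
	int		 value;	/* e.g. a macro or request token */
};

struct tab {
	struct tab_entry slot[TAB_SLOTS];
	size_t		 used;
};

/* FNV-1a over an explicit length; keys need not be NUL-terminated. */
uint32_t
tab_hash(const char *key, size_t len)
{
	uint32_t h = 2166136261u;
	size_t	 i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)key[i];
		h *= 16777619u;
	}
	return h;
}

/* Return the slot holding the key, or the empty slot where it would go. */
struct tab_entry *
tab_slot(struct tab *t, const char *key, size_t len)
{
	size_t i = tab_hash(key, len) & (TAB_SLOTS - 1);

	while (t->slot[i].key != NULL &&
	    (t->slot[i].keylen != len ||
	     memcmp(t->slot[i].key, key, len) != 0))
		i = (i + 1) & (TAB_SLOTS - 1);	/* linear probing */
	return &t->slot[i];
}

void
tab_init(struct tab *t)
{
	size_t i;

	for (i = 0; i < TAB_SLOTS; i++)
		t->slot[i].key = NULL;
	t->used = 0;
}

/* Insert or overwrite; the key string must outlive the table. */
void
tab_insert(struct tab *t, const char *key, int value)
{
	size_t		  len = strlen(key);
	struct tab_entry *e = tab_slot(t, key, len);

	if (e->key == NULL) {
		/* Always keep one slot empty so that probing terminates. */
		assert(t->used < TAB_SLOTS - 1);
		e->key = key;
		e->keylen = len;
		t->used++;
	}
	e->value = value;
}

/* Look up the first len bytes of key; return the value, or -1 if absent. */
int
tab_find(struct tab *t, const char *key, size_t len)
{
	struct tab_entry *e = tab_slot(t, key, len);

	return e->key == NULL ? -1 : e->value;
}
```

With something of this shape, each table would only need its own
initialisation data while sharing one lookup path. A real version would
presumably sit on top of ohash and grow dynamically instead of using a
fixed number of slots.
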
> Not sure yet whether these are multiple steps or a single step,
> but doing one separate step for every *tab is definitely not the
> way.
>
> I guess that's enough for today...
>
> Yours,
>   Ingo

Wes
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv