From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.2 required=5.0 tests=DKIM_INVALID,DKIM_SIGNED,
	T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.4
Received: (qmail 24380 invoked from network); 15 Nov 2023 01:26:10 -0000
Received: from bsd.lv (HELO mandoc.bsd.lv) (66.111.2.12)
  by inbox.vuxu.org with ESMTPUTF8; 15 Nov 2023 01:26:10 -0000
Received: from fantadrom.bsd.lv (localhost [127.0.0.1])
	by mandoc.bsd.lv (OpenSMTPD) with ESMTP id 52e4b231
	for <ml@inbox.vuxu.org>;
	Wed, 15 Nov 2023 01:26:08 +0000 (UTC)
Received: from scc-mailout-kit-01.scc.kit.edu (scc-mailout-kit-01.scc.kit.edu [129.13.231.81])
	by mandoc.bsd.lv (OpenSMTPD) with ESMTP id 19e46271
	for <tech@mandoc.bsd.lv>;
	Wed, 15 Nov 2023 01:26:08 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=usta.de;
	s=kit1; h=In-Reply-To:Content-Type:MIME-Version:References:Message-ID:Subject
	:Cc:To:From:Date:Sender:Reply-To:Content-Transfer-Encoding:Content-ID:
	Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc
	:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe:
	List-Post:List-Owner:List-Archive;
	bh=jQcQxqH/qYqRzV/gIrUSJEfceZIR7EdKlA0onZzAk10=; b=sVS2mMzF2ljOXL9jumVkcbRib4
	TwilrQEfBkKjbwRWx8cZnT/miriP0/bYRGeaqSAjY5iEayWhhkTwz2CXYpXkSFlDJ9+kV6RmuyPMb
	p1761Pj0k5C9wMssjSZ/BdRVV614wjfAlJiIYebdcbIEuPH3Auvz37Q6tqnCx39zLrqAUHLBbU2H0
	O9n3XB6tDAC4LZvY2z9n2GI0ix7ONhGOaf0bshyVh3x50KZSFe2t8T+174q4Ydr02DpLHj1nklATX
	19IZjna2PSPVuFra0NcO946Dp5j9hY/nTqMg8a6DGBj5VrtWIKtTlfGm39FgI0hRNxPMVU8OJgqdO
	l4jeVWbA==;
Received: from hekate.asta.kit.edu ([2a00:1398:5:f401::77])
	by scc-mailout-kit-01.scc.kit.edu with esmtps (TLS1.3:ECDHE_SECP256R1__RSA_PSS_RSAE_SHA256__AES_256_GCM:256)
	(envelope-from <schwarze@usta.de>)
	id 1r34fP-0020ru-0p;
	Wed, 15 Nov 2023 02:26:07 +0100
Received: from login-1.asta.kit.edu ([2a00:1398:5:f400::72])
	by hekate.asta.kit.edu with esmtp (Exim 4.94.2)
	(envelope-from <schwarze@usta.de>)
	id 1r34fO-001hcA-7p; Wed, 15 Nov 2023 02:26:05 +0100
Received: from schwarze by login-1.asta.kit.edu with local (Exim 4.94.2)
	(envelope-from <schwarze@usta.de>)
	id 1r34fN-004Cn1-HT; Wed, 15 Nov 2023 02:26:05 +0100
Date: Wed, 15 Nov 2023 02:26:05 +0100
From: Ingo Schwarze <schwarze@usta.de>
To: Wesley Moore <wes@wezm.net>
Cc: tech@mandoc.bsd.lv
Subject: Re: Improve performance of makewhatis
Message-ID: <ZVQeLXHqOUUA36tB@asta-kit.de>
References: <20231113014330.2247710-1-wes@wezm.net>
X-Mailinglist: mandoc-tech
Reply-To: tech@mandoc.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20231113014330.2247710-1-wes@wezm.net>

Hi Wesley,

Wesley Moore wrote on Mon, Nov 13, 2023 at 11:42:42AM +1000:

> In this patch set I aimed to improve the performance of makewhatis.
> On my Chimera Linux systems

Interesting.  I just added Chimera Linux to these two pages:

  https://mandoc.bsd.lv/ports.html
  https://mandoc.bsd.lv/porthistory.html

Can you confirm that these statements are accurate?

 * Chimera Linux includes mandoc in its base system since 2021 Oct 30.
 * There were no official nor unofficial packages of mandoc
   for Chimera Linux before that.
 * It uses the mandoc(1) parsing and formatting engine by default
   since the same date.
 * The default apropos(1) program is mandoc since the same date.
 * The default man(1) program is mandoc since the same date.
 * Chimera Linux does not provide an officially supported
   alternative for the man(1) program that users can select.
 * Chimera Linux does not publish its manual pages on the web
   on a site similar to https://manpages.debian.org/
   or https://man.openbsd.org/ or https://man.voidlinux.org/ .

> makewhatis -Tutf8 is run to rebuild the index
> whenever a package containing man pages is installed or updated.

That's not particularly smart.  Rebuilding the whole database from
scratch is unavoidably slow.  The makewhatis(8) options -d and -u
are specifically provided to serve the needs of package management
systems.  Specifically, if a package management systems cares about
performance, it can use -d to add just those manual page files it
just installed to the existing database, rather than rebuilding the
whole database from scratch, which obviously implies reading every
manual page file in the whole system from disk and parsing all those
files.

> I noticed this took multiple seconds even on a Ryzen 9 7950X system.

That may not be a major problem because installing an additional
package is not usually a fast operation (it usually requires both
network access and expensive database locking and management outside
the domain of manual pages) and it isn't done often either.

So i expect most users won't feel offended if installing new
software takes a few seconds.

> Profiling showed a lot of time spent in the loops of roff_getstrn.

Hmm, even though i'm not convinced your usecase is particularly
important, that's interesting, and i wouldn't have expected it.
It may be worth looking into that.

> To speed this up I made use of hash tables. I tested the changes on
> several different systems:
> 
> - OpenBSD amd64 in VM
> - Linux on Raspberry Pi 4
> - Linux on Ryzen 9 7950X
> 
> In each case the wall clock run time for makewhatis -Tutf8 -n showed
> an reduction of between 25% and 35%.

For makewhatis(8), i would say that's such a small gain that it likely
isn't worth much complication.  Then again, it also affects mandoc(1),
so maybe looking into it is worthwhile after all.

Here is what i found with gprof(1) on OpenBSD-current:

share local
 100%  100% main -> mandocdb
  92%   95% (mandocdb) -> mpages_merge
  66%   67% (mpages_merge) -> mparse_readfd -> mparse_buf_r
  32%   26% (mparse_buf_r) -> roff_parseln
  19%   21% MANY -> free -> ofree
16.3%  2.5% (mparse_buf_r) -> mdoc_parseln
 6.8% 18.4% (mparse_buf_r) -> man_parseln
15.6% 14.7% MANY -> roff_word_alloc
14.9% 17.0% (_libc_malloc etc) -> omalloc
14.7% 10.9% (mpages_merge) -> mparse_reset
14.6% 10.6% (roff_parse, roff_expand) -> roff_getstrn

The "share" column refers to /usr/share/man/, i.e. the OpenBSD base
system manuals, a good example of a large, mdoc(7)-heavy collection
of manual pages.  The "local" column refers to /usr/local/man/,
i.e. the optional third-party software i happen to have installed,
which contains mostly man(7) manual pages.

On each line, the two numbers give the time spent in the respective
function including functions called from it, relative to the time
spent in main() including all functions called from there.
The last name on each line is the respective function.
The names before the arrows specify from where it is usually called.

So only 10-15% of the time is spent in roff_getstrn(), which cannot
possibly result in the 25-35% overall speedup you report, so something
looks fishy here.  Then again, even 25-35% is not a large gain.

In general, improving the structure is much more important than
improving the performance because the code has grown historically,
becoming more and more complicated and less and less uniform in
structure, i.e. harder and harder to maintain.  In particular, such
a small gain is clearly not worth further complication.

On the other hand, the critical code path for the optimization
potential you found is:  roff_parse -> roff_getstrn
In that area, code structure is not particularly good.
Parsing for macros and strings is done in various different ways:

 1. The code path roff_parse -> roff_getstrn looks for
    user-defined and renamed macros using r->strtab and r->rentab.
 2. The code path roff_expand -> roff_getstrn looks for
    predefined strings using a linear search in predefs[].
 3. The code paths roff_block(ROFF_am) -> roff_getstrn and
    roff_rn -> roff_getstrn look for standard high-level macros
    directly in the roff_name[] array.
 4. The functions mdoc_pmacro() in mdoc.c and man_pmacro() in man.c
    look for standard high-level macros using local mdoc->mdocmac
    and man->manmac hash tables with roffhash_find().

So it is likely this can be better structured with less duplication
and not gratuitiously using different data structures, while also
using hash tables for macro and string lookup throughout.

> The one caveat of this change is the roff/while/into regression test is
> failing with my changes. I've tried to work out why but I don't know
> roff well enough understand the intent of the test nor have I been able
> to clearly determine why my changes affect it. I was hoping to get some
> help with this.

The roff/while/into test uses .de in a very strange way, and .de
internally uses the string storage, so that's how it may be related,
but i would have to look at the details.  Then again, studying that
is not urgent because your patch is definitely not ready for commit.
It adds complication rather than removing some.  In particular:

 * Do not touch roff_int.h.  It is only intended for internal API
   features needed by more than one parser, see mandoc_headers(3)
   for details, but what you are doing here is local to the roff
   parser (roff.c).
 * struct roff_entry is essentially the same as struct roffkv.
   There should not be two data structures for the same purpose.
 * roff_free_entry() is trivial and only used at one place,
   so it should be inlined.
 * Renaming roff_setstrn to roff_setentry causes quite some churn,
   so don't do that.  If that means using ohash for xmbtab, too,
   so be it.
 * roff_strhash_alloc is also next to trivial.  Maybe it can be
   inlined - maybe not because it is called from a few more
   places.  If we introduce it, roffhash_alloc should also
   use it.
 * The duplication between roffhash_free() and roff_strhash_free()
   and between roffhash_find() and roff_strhash_find() is rather
   ugly.  Not yet sure what to do about that.
 * roff_strhash_insert() is trivial and called only from one place.

Some ideas are also good:

 * Using struct ohash *strtab directly in struct roff is good,
   just like it is already done for struct ohash *reqtab.
 * Introducing r->pretab and clearing it in roff_free()
   rather than in the badly named roff_free1() - just like
   r->reqtab - is good.

One other thing.  I hate patch series, don't ever send them.
They are usually a symptom of poor development methodology,
and usually i reject them without even inspecting them.
Here, i made an exception because your idea seems to have some merit.

Every patch needs to
 * perform one well-defined task and
 * make the code better even if nothing else is ever committed
   building on it.

Splitting a patch like you did it here only adds obfuscation
If a patch becomes too big for review, then the *task* it performs
needs to be split into logically separate steps, rather than
splitting code changes into mutiple patch files that essentially
all save the same purpose and are similar and closely related in
their content.  Such *logical* splitting can become tricky, but
i doubt that is needed here.

I'm not yet completely sure what the best way forward is.
Probably something like:
 1. Identifying all places that do string lookup in tables or lists.
    Some can probably be left alone:
      arch.c, att.c, cgi.c, chars.c, eqn.c, mansearch.c,
      mdoc_argnames in mdoc.c, msec.c, st.c, regtab in roff.c,
      tbl_layout.c, tbl_opts.c
    But the others should probably be dealt with in a unified manner:
      reqtab, mdocmac, manmac  on the one hand
      strtab, rentab, pretab, xmbtab  on the other hand
      Not sure yet whether xtab can be included into xmbtab.
 2. Design and implement a common handling for these using ohash
    that is as simple as possible.

Not sure yet whether these are multiple steps or a single step,
but doing one separate step for every *tab is definitely not the
way.

I guess that's enough for today...

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv