From: Ingo Schwarze <schwarze@usta.de>
To: Soeren Tempel <soeren@soeren-tempel.net>
Cc: tech@mandoc.bsd.lv
Subject: Re: Discrepancy in mansearch and fs_lookup behavior
Date: Wed, 1 Sep 2021 22:35:23 +0200
Message-ID: <20210901203523.GB15706@athene.usta.de>
In-Reply-To: <1ZRQZL79I0W2O.3ILLDNAL4JXUP@8pit.net>

Hi Soeren,

Soeren Tempel wrote on Mon, Aug 30, 2021 at 09:26:59PM +0200:

> I am currently working on rearranging the Alpine Linux POSIX and Perl
> man pages

Hold on a second; if there are mandoc bugs or misfeatures in this area,
uprooting major areas of manual page forests in a particular operating
system may not be the best option for dealing with that.

> to make them work properly without mandoc.

Do you mean, "with mandoc"?

> For some historic reason, Alpine presently install these as follows:
> 
> 	/usr/share/man/man3/open.3pm.gz
> 	/usr/share/man/man3/open.3p.gz
> 
> where 3pm is the perl open man page and 3p is the POSIX open man page.

Certainly not maximally robust, but not totally unreasonable at
first sight.

There are three places where section identifiers can be stored:
 1) at the end of the directory name
 2) at the end of the file name
 3) in the .Dt or .TH macro.

Needless to say, it is most robust to have all three agree, but that
is not always possible for a wide variety of reasons.  Operating
system conventions are just one such reason.  Some manual pages may
belong to more than one section.  For example, a section 8 manual page
may do double duty as a section 5 manual page for the associated
configuration file.  And third party manual pages may be hard to
control.  Packagers may be more willing to adjust file and directory
names at package build time and more hesitant to patch upstream file
content, in particular with respect to something that can be perceived
as mostly an aesthetic issue.  And there may be more reasons.

Hence, in general, mandoc tries to handle mismatching section identifiers
gracefully, putting the page in all sections mentioned.  Apart from the
section stated inside the file, man3/open.3pm is supposed to show up
in both sections "3" and "3pm" and man3/open.3p in both "3" and "3p".
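
The following is a minimal sketch of that idea, not the actual
mandocdb.c logic: it collects section identifiers from the directory
name, the macro argument, and the file name, and keeps every distinct
one.  The function name collect_sections(), the handling of a ".gz"
suffix, and the ordering of the result are assumptions made for
illustration only.

```python
import re
from pathlib import PurePosixPath


def collect_sections(relpath, macro_section=None):
    """Return all distinct section identifiers found for one page.

    relpath       -- path relative to the manpath, e.g. "man3/open.3pm"
    macro_section -- section argument of .Dt or .TH, if already known
    """
    path = PurePosixPath(relpath)
    sections = []

    def add(sec):
        # Keep each identifier once; comparison is case-sensitive,
        # so "3P" and "3p" are both kept.
        if sec and sec not in sections:
            sections.append(sec)

    # 1) The end of the directory name: "man3" -> "3".
    match = re.fullmatch(r"(?:man|cat)(.+)", path.parent.name)
    if match:
        add(match.group(1))

    # 3) The section given in the .Dt or .TH macro.
    add(macro_section)

    # 2) The end of the file name, ignoring a compression suffix.
    name = path.name
    if name.endswith(".gz"):
        name = name[:-3]
    if "." in name:
        add(name.rsplit(".", 1)[1])

    return sections
```

Under these assumptions, collect_sections("man3/open.3pm", "3p")
yields the three identifiers 3, 3p, and 3pm, so a mismatch between
the three places widens the set of sections instead of hiding the page.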


In the following, let us keep the discussion of makewhatis(8),
mansearch() and fs_search() strictly separate.  This is the order
of decreasing importance.

First, regarding makewhatis(8).  To test, i ran:

  rm -rf Test
  mkdir Test
  mkdir Test/man3
  cp /usr/ports/pobj/man-pages-posix-2017a/man-pages-posix-2017/man3p/open.3p\
     Test/man3/
  cp /usr/share/man/man3p/open.3p Test/man3/open.3pm
  makewhatis Test
  alias dbm_dump=/usr/obj/regress/usr.bin/mandoc/db/dbm_dump/dbm_dump
  dbm_dump Test/mandoc.db

which yielded:

  === PAGES ===
  page name # [fh1t]open # [t]openat 
  page sect # 3 # 3P # 3p 
  page desc # open file 
  page file src # man3/open.3p 
  page name # [fh1t]open 
  page sect # 3 # 3p # 3pm 
  page desc # perl pragma to set default PerlIO layers for input and output 
  page file src # man3/open.3pm 
  === END OF PAGES ===

That looks reasonable to me.  In particular, it has the "3" from
the directory name, the 3P/3p from the file content, and the 3p/3pm
from the file name.


Next up, regarding mansearch().

  man -M Test open      -> Perl; random choice because bits, sec, name,
                                 and arch all compare equal
  man -M Test 3 open    -> Perl; ditto, and both file extensions mismatch
  man -M Test 3p open   -> POSIX; preferred due to exact file extension match
  man -M Test 3pm open  -> Perl; because the POSIX page does not match at all

That all looks correct to me, too.
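
The four outcomes above can be reproduced with a much simplified
model.  This is only a sketch and not how mansearch() is implemented:
the real code compares bits, sec, name, and arch, whereas the
hypothetical choose_page() below only filters by section, prefers an
exact file extension match, and otherwise takes the first candidate,
which stands in for the "random choice".

```python
def choose_page(pages, requested=None):
    """Pick one page from a list of (filename, sections) tuples.

    Returns the chosen file name, or None if no page matches
    the requested section.
    """
    if requested is None:
        candidates = list(pages)
    else:
        # A page qualifies if the requested section is among
        # any of the sections recorded for it.
        candidates = [page for page in pages if requested in page[1]]
    if not candidates:
        return None

    if requested is not None:
        # Prefer a page whose file extension matches exactly.
        for filename, _sections in candidates:
            if filename.rsplit(".", 1)[-1] == requested:
                return filename

    # Everything compares equal: take the first one (arbitrary).
    return candidates[0][0]


# The two pages from the dbm_dump output, Perl page listed first.
pages = [
    ("man3/open.3pm", ["3", "3p", "3pm"]),
    ("man3/open.3p", ["3", "3P", "3p"]),
]
```

With this list, asking for "3p" selects man3/open.3p because of the
exact extension match, while asking for "3", "3pm", or no section at
all selects man3/open.3pm.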


Now finally, fs_search().

  rm -f Test/mandoc.db
  man -M Test open      -> POSIX; random
  man -M Test 3p open   -> No entry; because directory does not exist
  man -M Test 3pm open  -> No entry; because directory does not exist

I think the globbing in the second part of fs_lookup could be relaxed
to allow "manpath/man3*/name.[01-9]*" for sec == 3*, and multiple
paths could be added to *res.  Then the main program would do the
usual prioritization.
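
To illustrate the difference, here is a rough sketch of the relaxed
lookup in Python rather than in the C of main.c.  The function name
fs_candidates() and the relaxed flag are hypothetical, and reading
"sec == 3*" as "use the first character of the requested section for
the directory glob" is an assumption on top of the description above.

```python
import glob
import os


def fs_candidates(manpath, sec, name, relaxed=False):
    """Return all files that a file system lookup might consider.

    Strict:  manpath/man<sec>/name.[01-9]*
    Relaxed: manpath/man<first char of sec>*/name.[01-9]*
    """
    if relaxed:
        # Assumption: "3p" and "3pm" both widen to the glob "man3*".
        dir_glob = "man" + glob.escape(sec[:1]) + "*"
    else:
        dir_glob = "man" + glob.escape(sec)
    pattern = os.path.join(
        glob.escape(manpath),
        dir_glob,
        glob.escape(name) + ".[01-9]*",
    )
    # Return every match instead of only the first one, so that
    # the caller can do the prioritization.
    return sorted(glob.glob(pattern))
```

With a tree that contains only man3/open.3p and man3/open.3pm, the
strict variant finds nothing for section "3p" because there is no
man3p directory, while the relaxed variant returns both files and
leaves the choice between them to the caller.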

I think i'll start working on that but found it useful to provide
feedback without too much delay, and i would welcome feedback on my
other points and questions in turn.


> With this setup `man 3p open` will always open the Perl man page and
> there seems to be no way to open the POSIX man page.

I can't reproduce, see above.  Are you sure?  Is this with or without
mandoc.db?  Are you using the latest mandoc from CVS?

> Looking at the fs_lookup implementation I believe this to be the
> case because mandoc expects each section to have its own subdirectory
> in MANDIR.

Yes.

> I am also aware that OpenBSD uses the 3p section for Perl man pages
> which is a bit confusing but probably Alpine's fault.

Using 3p for POSIX seems to be an upstream decision, not sure whether
by the Austin group or by the kernel.org packager.  Anyway, it does
not look like Alpine's fault.

> My present understanding is that this would have to be fixed on the
> Alpine side by moving these man pages to their own subdirectory. Please
> let me know if there is an alternative solution. While experimenting
> with moving these pages, I noticed a discrepancy in the man page lookup
> behavior of mansearch and fs_lookup. Assuming the above man pages are
> installed as follows:
> 
> 	/usr/share/man/man3/open.3pm.gz
> 	/usr/share/man/man3p/open.3p.gz
> 
> if a mandoc.db exists, `man 3p open` will display the Perl (open.3pm.gz)
> man page (mansearch).

I cannot reproduce, see above.

> If it doesn't (fs_lookup), it will display the
> POSIX man page (open.3p.gz).

Yes.

> I find this surprising as I would expect
> the two algorithms to be equivalent.

They cannot be equivalent: fs_search() needs to be *much* simpler
because we cannot possibly duplicate all the mandocdb.c logic in main.c.
That would be too much code and too slow.  Besides, main.c cannot look
at file contents.

When it doesn't cause too much complexity, making the two more similar
to each other may be worthwhile for particular cases that matter
for practical purposes, though.

> I am reporting this here as I believe this to be a "minor bug" that
> doesn't interest the majority of mandoc users.

Good choice, thank you.

> See also: https://gitlab.alpinelinux.org/alpine/aports/-/issues/12958

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv


