tech@mandoc.bsd.lv
 help / color / mirror / Atom feed
* Discrepancy in mansearch and fs_lookup behavior
@ 2021-08-30 19:26 Sören Tempel
  2021-09-01 20:35 ` Ingo Schwarze
  0 siblings, 1 reply; 8+ messages in thread
From: Sören Tempel @ 2021-08-30 19:26 UTC (permalink / raw)
  To: tech

Hello!

I am currently working on rearranging the Alpine Linux POSIX and Perl
man pages to make them work properly without mandoc. For some historic
reason, Alpine presently install these as follows:

	/usr/share/man/man3/open.3pm.gz
	/usr/share/man/man3/open.3p.gz

where 3pm is the perl open man page and 3p is the POSIX open man page.
With this setup `man 3p open` will always open the Perl man page and
there seems to be no way to open the POSIX man page. Looking at the
fs_lookup implementation I believe this to be the case because mandoc
expects each section to have its own subdirectory in MANDIR. I am also
aware that OpenBSD uses the 3p section for Perl man pages which is a bit
confusing but probably Alpine's fault.

My present understanding is that this would have to be fixed on the
Alpine side by moving these man pages to their own subdirectory. Please
let me know if there is an alternative solution. While experimenting
with moving these pages, I noticed a discrepancy in the man page lookup
behavior of mansearch and fs_lookup. Assuming the above man pages are
installed as follows:

	/usr/share/man/man3/open.3pm.gz
	/usr/share/man/man3p/open.3p.gz

if a mandoc.db exists, `man 3p open` will display the Perl (open.3pm.gz)
man page (mansearch). If it doesn't (fs_lookup), it will display the
POSIX man page (open.3p.gz). I find this surprising as I would expect
the two algorithms to be equivalent. I am reporting this here as I
believe this to be a "minor bug" that doesn't interest the majority of
mandoc users.

See also: https://gitlab.alpinelinux.org/alpine/aports/-/issues/12958

Greetings,
Sören
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Discrepancy in mansearch and fs_lookup behavior
  2021-08-30 19:26 Discrepancy in mansearch and fs_lookup behavior Sören Tempel
@ 2021-09-01 20:35 ` Ingo Schwarze
  2021-09-02 19:09   ` Ingo Schwarze
  2021-09-04 10:12   ` Sören Tempel
  0 siblings, 2 replies; 8+ messages in thread
From: Ingo Schwarze @ 2021-09-01 20:35 UTC (permalink / raw)
  To: Soeren Tempel; +Cc: tech

Hi Soeren,

Soeren Tempel wrote on Mon, Aug 30, 2021 at 09:26:59PM +0200:

> I am currently working on rearranging the Alpine Linux POSIX and Perl
> man pages

Hold on a second; if there are mandoc bugs or misfeatures in this area,
uprooting major areas of manual page forests in a particular operating
system may not be the best option for dealing with that.

> to make them work properly without mandoc.

Do you mean, "with mandoc"?

> For some historic reason, Alpine presently install these as follows:
> 
> 	/usr/share/man/man3/open.3pm.gz
> 	/usr/share/man/man3/open.3p.gz
> 
> where 3pm is the perl open man page and 3p is the POSIX open man page.

Certainly not maximally robust, but not totally unreasonable on
first sight.

There are three places where section identifiers can be stored:
 1) at the end of the directory name
 2) at the end of the file name
 3) in the .Dt or .TH macro.

Needless to say, it is most robust to have all three agree, but that
is not always possible for a wide variety of reasons.  Operating
system conventions are just one such reason.  Some manual pages may
belong to more than one section.  For example, a section 8 manual page
may do double duty as a section 5 manual page for the associated
configuration file.  And third party manual pages may be hard to
control.  Packagers may be more willing to adjust file and directory
names at package build time and more hesitant to patch upstream file
content, in particular with respect to something that can be perceived
as mostly an aesthetic issue.  And there may be more reasons.

Hence, in general, mandoc tries to handle mismatching section identifiers
gracefully, putting the page in all sections mentioned.  Apart from the
section stated inside the file, man3/open.3pm is supposed to show up
in both sections "3" and "3pm" and man3/open.3p in both "3" and "3p".


In the following, let us keep the discussion of makewhatis(8),
mansearch() and fs_search() strictly separate.  This is the order
of decreasing importance.

First, regarding makewhatis(8).  To test, i ran:

  rm -rf Test
  mkdir Test
  mkdir Test/man3
  cp /usr/ports/pobj/man-pages-posix-2017a/man-pages-posix-2017/man3p/open.3p\
     Test/man3/
  cp /usr/share/man/man3p/open.3p Test/man3/open.3pm
  makewhatis Test
  alias dbm_dump=/usr/obj/regress/usr.bin/mandoc/db/dbm_dump/dbm_dump
  dbm_dump Test/mandoc.db

which yielded:

  === PAGES ===
  page name # [fh1t]open # [t]openat 
  page sect # 3 # 3P # 3p 
  page desc # open file 
  page file src # man3/open.3p 
  page name # [fh1t]open 
  page sect # 3 # 3p # 3pm 
  page desc # perl pragma to set default PerlIO layers for input and output 
  page file src # man3/open.3pm 
  === END OF PAGES ===

That looks reasonable to me.  In particular, it has the "3" from
the directory name, the 3P/3p from the file content, and the 3p/3pm
from the file name.


Next up, regarding mansearch().

  man -M Test open      -> Perl; random choice because bits, sec, name,
                                 and arch all compare equal
  man -M Test 3 open    -> Perl; dto. and both file extensions mismatch
  man -M Test 3p open   -> POSIX; preferred due to exact file extension match
  man -M Test 3pm open  -> Perl; because the POSIX page does not match at all

That all looks correct to me, too.


Now finally, fs_search().

  rm -f Test/mandoc.db
  man -M Test open      -> POSIX; random
  man -M Test 3p open   -> No entry; because directory does not exist
  man -M Test 3pm open  -> No entry; because directory does not exist

I think the globbing in the second part of fs_lookup could be relaxed
to allow "manpath/man3*/name.[01-9]*" for sec == 3*, and multiple
paths could be added to *res.  Then the main program would do the
usual prioritization.

I think i'll start working on that but found it useful to provide
feedback without too much delay, and i would welcome feedback on my
other points and questions in turn.


> With this setup `man 3p open` will always open the Perl man page and
> there seems to be no way to open the POSIX man page.

I can't reproduce, see above.  Are you sure?  Is this with or without
mandoc.db?  Are you using the latest mandoc from CVS?

> Looking at the fs_lookup implementation I believe this to be the
> case because mandoc expects each section to have its own subdirectory
> in MANDIR.

Yes.

> I am also aware that OpenBSD uses the 3p section for Perl man pages
> which is a bit confusing but probably Alpine's fault.

Using 3p for POSIX seems to be an upstream decision, not sure whether
by the Austin group or by the kernel.org packager.  Anyway, it does
not look like Alpine's fault.

> My present understanding is that this would have to be fixed on the
> Alpine side by moving these man pages to their own subdirectory. Please
> let me know if there is an alternative solution. While experimenting
> with moving these pages, I noticed a discrepancy in the man page lookup
> behavior of mansearch and fs_lookup. Assuming the above man pages are
> installed as follows:
> 
> 	/usr/share/man/man3/open.3pm.gz
> 	/usr/share/man/man3p/open.3p.gz
> 
> if a mandoc.db exists, `man 3p open` will display the Perl (open.3pm.gz)
> man page (mansearch).

I cannot reproduce, see above.

> If it doesn't (fs_lookup), it will display the
> POSIX man page (open.3p.gz).

Yes.

> I find this surprising as I would expect
> the two algorithms to be equivalent.

They cannot be equivalent: fs_search() needs to be *much* simpler
because we cannot possibly duplicate all the mandocdb.c logic in main.c.
That would be too much code and too slow.  Besides, main.c cannot look
at file contents.

When it doesn't cause too much complexity, making the two more similar
to each other may be be worthwhile for particular cases that matter
for practical purposes, though.

> I am reporting this here as I believe this to be a "minor bug" that
> doesn't interest the majority of mandoc users.

Good choice, thank you.

> See also: https://gitlab.alpinelinux.org/alpine/aports/-/issues/12958

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Discrepancy in mansearch and fs_lookup behavior
  2021-09-01 20:35 ` Ingo Schwarze
@ 2021-09-02 19:09   ` Ingo Schwarze
  2021-09-04 10:12   ` Sören Tempel
  1 sibling, 0 replies; 8+ messages in thread
From: Ingo Schwarze @ 2021-09-02 19:09 UTC (permalink / raw)
  To: Soeren Tempel; +Cc: tech

Hi Soeren,

Ingo Schwarze wrote on Wed, Sep 01, 2021 at 10:35:23PM +0200:

> Now finally, fs_search().
> 
>   rm -f Test/mandoc.db
>   man -M Test open      -> POSIX; random
>   man -M Test 3p open   -> No entry; because directory does not exist
>   man -M Test 3pm open  -> No entry; because directory does not exist
> 
> I think the globbing in the second part of fs_lookup could be relaxed
> to allow "manpath/man3*/name.[01-9]*" for sec == 3*, and multiple
> paths could be added to *res.  Then the main program would do the
> usual prioritization.

See below for a patch resulting in

    man -M Test open      -> POSIX; random
    man -M Test 3p open   -> POSIX because exact match of file name extension
    man -M Test 3pm open  -> Perl because man3/open.3p does not match at all

[...]
> They cannot be equivalent: fs_search() needs to be *much* simpler
> because we cannot possibly duplicate all the mandocdb.c logic in main.c.
> That would be too much code and too slow.  Besides, main.c cannot look
> at file contents.
> 
> When it doesn't cause too much complexity, making the two more similar
> to each other may be be worthwhile for particular cases that matter
> for practical purposes, though.

This was trickier than anticipated.  When given "man -M Test 3p open",
my first try was allowing "Test/man3*/name.[01-9]*" which worked well
enough for finding the desired files, but blew up quite badly elsewhere
with lots of false positives and fallout all over the place.

To keep false positives at bay, it is definitely required that (unless
both match in the first place) either the directory name or the file
name extension match the requested section number.

What i'm proposing here is the following list of priorities,
described using the example of "man 3p open":

 1. man3p/open.3p          # double exact match
 2. cat3p/open.0           # preformatted exact match
 3. man3p/<arch>/open.3p   # architecture dependent page
 4. man3p/open.[01-9]*     # directory + name match
 5. man3/open.3p*          # name + extension match

Does the patch make sense to you and work for your use case?

Yours,
  Ingo


Index: main.c
===================================================================
RCS file: /home/cvs/mandoc/mandoc/main.c,v
retrieving revision 1.356
diff -u -p -r1.356 main.c
--- main.c	14 Aug 2021 13:53:08 -0000	1.356
+++ main.c	2 Sep 2021 18:36:49 -0000
@@ -94,9 +94,11 @@ struct	outstate {
 int			  mandocdb(int, char *[]);
 
 static	void		  check_xr(struct manpaths *);
-static	int		  fs_lookup(const struct manpaths *,
-				size_t ipath, const char *,
-				const char *, const char *,
+static	void		  fs_append(char **, size_t, int,
+				size_t, const char *, enum form,
+				struct manpage **, size_t *);
+static	int		  fs_lookup(const struct manpaths *, size_t,
+				const char *, const char *, const char *,
 				struct manpage **, size_t *);
 static	int		  fs_search(const struct mansearch *,
 				const struct manpaths *, const char *,
@@ -700,6 +702,30 @@ glob_esc(char **dst, const char *src, co
 		*(*dst)++ = *suffix++;
 }
 
+static void
+fs_append(char **file, size_t filesz, int copy, size_t ipath,
+    const char *sec, enum form form, struct manpage **res, size_t *ressz)
+{
+	struct manpage	*page;
+
+	*res = mandoc_reallocarray(*res, *ressz + filesz, sizeof(**res));
+	page = *res + *ressz;
+	*ressz += filesz;
+	for (;;) {
+		page->file = copy ? mandoc_strdup(*file) : *file;
+		page->names = NULL;
+		page->output = NULL;
+		page->bits = NAME_FILE & NAME_MASK;
+		page->ipath = ipath;
+		page->sec = (*sec >= '1' && *sec <= '9') ? *sec - '1' + 1 : 10;
+		page->form = form;
+		if (--filesz == 0)
+			break;
+		file++;
+		page++;
+	}
+}
+
 static int
 fs_lookup(const struct manpaths *paths, size_t ipath,
 	const char *sec, const char *arch, const char *name,
@@ -707,16 +733,19 @@ fs_lookup(const struct manpaths *paths, 
 {
 	struct stat	 sb;
 	glob_t		 globinfo;
-	struct manpage	*page;
-	char		*file, *cp;
+	char		*file, *cp, secnum[2];
 	int		 globres;
 	enum form	 form;
 
 	const char *const slman = "/man";
 	const char *const slash = "/";
 	const char *const sglob = ".[01-9]*";
+	const char *const dot   = ".";
+	const char *const aster = "*";
 
+	memset(&globinfo, 0, sizeof(globinfo));
 	form = FORM_SRC;
+
 	mandoc_asprintf(&file, "%s/man%s/%s.%s",
 	    paths->paths[ipath], sec, name, sec);
 	if (stat(file, &sb) != -1)
@@ -751,14 +780,34 @@ fs_lookup(const struct manpaths *paths, 
 		mandoc_msg(MANDOCERR_GLOB, 0, 0,
 		    "%s: %s", file, strerror(errno));
 	free(file);
+	file = NULL;
 	if (globres == 0)
-		file = mandoc_strdup(*globinfo.gl_pathv);
+		goto found;
 	globfree(&globinfo);
-	if (globres == 0) {
-		if (stat(file, &sb) != -1)
-			goto found;
+
+	if (sec[1] != '\0' && *ressz == 0) {
+		secnum[0] = sec[0];
+		secnum[1] = '\0';
+		cp = file = mandoc_malloc(strlen(paths->paths[ipath]) * 2 +
+		    strlen(slman) + strlen(secnum) * 2 + strlen(slash) +
+		    strlen(name) * 2 + strlen(dot) +
+		    strlen(sec) * 2 + strlen(aster) + 1);
+		glob_esc(&cp, paths->paths[ipath], slman);
+		glob_esc(&cp, secnum, slash);
+		glob_esc(&cp, name, dot);
+		glob_esc(&cp, sec, aster);
+		*cp = '\0';
+		globres = glob(file, 0, NULL, &globinfo);
+		if (globres != 0 && globres != GLOB_NOMATCH)
+			mandoc_msg(MANDOCERR_GLOB, 0, 0,
+			    "%s: %s", file, strerror(errno));
 		free(file);
+		file = NULL;
+		if (globres == 0)
+			goto found;
+		globfree(&globinfo);
 	}
+
 	if (res != NULL || ipath + 1 != paths->sz)
 		return -1;
 
@@ -770,19 +819,14 @@ fs_lookup(const struct manpaths *paths, 
 found:
 	warnx("outdated mandoc.db lacks %s(%s) entry, run %s %s",
 	    name, sec, BINM_MAKEWHATIS, paths->paths[ipath]);
-	if (res == NULL) {
+	if (res == NULL)
 		free(file);
-		return 0;
-	}
-	*res = mandoc_reallocarray(*res, ++*ressz, sizeof(**res));
-	page = *res + (*ressz - 1);
-	page->file = file;
-	page->names = NULL;
-	page->output = NULL;
-	page->bits = NAME_FILE & NAME_MASK;
-	page->ipath = ipath;
-	page->sec = (*sec >= '1' && *sec <= '9') ? *sec - '1' + 1 : 10;
-	page->form = form;
+	else if (file == NULL)
+		fs_append(globinfo.gl_pathv, globinfo.gl_pathc, 1,
+		    ipath, sec, form, res, ressz);
+	else
+		fs_append(&file, 1, 0, ipath, sec, form, res, ressz);
+	globfree(&globinfo);
 	return 0;
 }
 
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Discrepancy in mansearch and fs_lookup behavior
  2021-09-01 20:35 ` Ingo Schwarze
  2021-09-02 19:09   ` Ingo Schwarze
@ 2021-09-04 10:12   ` Sören Tempel
  2021-09-04 13:18     ` Ingo Schwarze
  1 sibling, 1 reply; 8+ messages in thread
From: Sören Tempel @ 2021-09-04 10:12 UTC (permalink / raw)
  To: tech

Ingo Schwarze <schwarze@usta.de> wrote:
> Hi Soeren,

Hello Ingo,

> Hold on a second; if there are mandoc bugs or misfeatures in this area,
> uprooting major areas of manual page forests in a particular operating
> system may not be the best option for dealing with that.

Indeed, I would also prefer not move the Alpine man page. It was just
unclear to me whether mandoc intended to support lookup for section
which do not have their own subdirectory in MANDIR from reading the
fs_lookup code it seemed to me that this was not supported, thus my
initial idea was to move the pages.

> Do you mean, "with mandoc"?

Yes.

> In the following, let us keep the discussion of makewhatis(8),
> mansearch() and fs_search() strictly separate.  This is the order
> of decreasing importance.

In the following I will perform all tests with the open(3p) and the
open(3pm) man page installed as follows in MANDIR:

	/usr/share/man/man3/open.3pm.gz
	/usr/share/man/man3/open.3p.gz

>   === PAGES ===
>   page name # [fh1t]open # [t]openat 
>   page sect # 3 # 3P # 3p 
>   page desc # open file 
>   page file src # man3/open.3p 
>   page name # [fh1t]open 
>   page sect # 3 # 3p # 3pm 
>   page desc # perl pragma to set default PerlIO layers for input and output 
>   page file src # man3/open.3pm 
>   === END OF PAGES ===
> 
> That looks reasonable to me.  In particular, it has the "3" from
> the directory name, the 3P/3p from the file content, and the 3p/3pm
> from the file name.

The dbm_dump output for these two pages looks as follows on a standard
Alpine Linux Edge installation with a mandoc.db generated with
makewhatis(8) from mandoc-1.14.5:

	page name # [fh1t]open # [t]openat
	page sect # 3 # 3P # 3p
	page desc # open file
	page file src # man3/open.3p.gz
	page name # [fh1t]open
	page sect # 3 # 3pm
	page desc # perl pragma to set default PerlIO layers for input and output
	page file src # man3/open.3pm.gz

This does look similar to your output. The only difference seems to be
that my man pages are gzip compressed.

> Next up, regarding mansearch().
> 
>   man -M Test open      -> Perl; random choice because bits, sec, name,
>                                  and arch all compare equal
>   man -M Test 3 open    -> Perl; dto. and both file extensions mismatch
>   man -M Test 3p open   -> POSIX; preferred due to exact file extension match

If I try to duplicate your mansearch() tests I get the following results:

	man open    -> open(2) Linux man page (/usr/share/man/man2/open.2.gz)
	man 3 open  -> open(3pm) Perl man page
	man 3p open -> open(3pm) Perl man page

I get these results with both mandoc-1.14.5 and mandoc from current CVS.
Only for `man 3pm open` these two mandoc versions seem to return
different results:

	man 3pm open:
		CVS version:   open(3pm) Perl man page
		mandoc-1.14.5: open(2) Linux man page

For `man 3p open` I briefly stepped through the mansearch() code with
gdb on the mandoc CVS version. What happens here is that mansearch()
returns the following results:

	(gdb) p resnsz
	$1 = 2
	(gdb) p resn[0]
	$2 = {
		file = 0x7ffff7ffe8e0 "/usr/share/man/man3/open.3pm.gz",
		names = 0x7ffff7ffed20 "open(3, 3pm)",
		output = 0x7ffff7f60b00 "perl pragma to set default PerlIO layers for input and output",
		bits = 30,
		ipath = 0,
		sec = 3,
		form = FORM_SRC
	}
	(gdb) p resn[1]
	$3 = {
		file = 0x7ffff7ffe8b0 "/usr/share/man/man3/open.3p.gz",
		names = 0x7ffff7ffed00 "open, openat(3, 3P, 3p)",
		output = 0x7ffff7f608f0 "open file",
		bits = 30,
		ipath = 0,
		sec = 3,
		form = FORM_SRC
	}
	
mandoc then searches for the best match according to priority. In this
case both resn[0] and resn[1] get a priority of 35. However, since it
iterates over resn[0] first and the comparison with best_prio is a
greater than or equal comparison [1], resn[0] is the one that gets
selected. Maybe this is a problem with the .gz file extension?

	if (search.sec != NULL &&
	    (strlen(sec) <= ssz  + 3 ||
	     strcmp(sec + strlen(sec) - ssz,
	      search.sec) != 0))
	   prio += 20; /* Wrong file ext. */

The above code is where I would expect the two man pages to differ since
search.sec is 3p and as such should cause the open(3p) man page to get the
better priority. Unfortunately, this does not work correctly since the file
extension extracted here is gz, not 3p. This would also explain why we get
different mansearch() results.

Regarding fs_search(): I tested the patch from your other reply. This
patch work entirely fine for me and does what it is supposed to do. I
applied this patch on top of the CVS version. With mandoc.db removed, I
get the following results:

	man 3 open    -> open(3p) POSIX man page
	man 3p open   -> open(3p) POSIX man page
	man 3pm open  -> open(3pm) Perl man page

So yes, your patch does seem to improve the fs_search() behaviour, thanks a
lot! This is what I would expect the mansearch() to also return.

> > With this setup `man 3p open` will always open the Perl man page and
> > there seems to be no way to open the POSIX man page.
> 
> I can't reproduce, see above.  Are you sure?  Is this with or without
> mandoc.db?  Are you using the latest mandoc from CVS?

I tried to clarify this above. There presently does seem to be no way to
access open(3p) on Alpine with mansearch() neither with the CVS version
nor with 1.14.5.

Greetings,
Sören

[1]: https://github.com/openbsd/src/blob/2207c4325726fdc5c4bcd0011af0fdf7d3dab137/usr.bin/mandoc/main.c#L519-L520
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Discrepancy in mansearch and fs_lookup behavior
  2021-09-04 10:12   ` Sören Tempel
@ 2021-09-04 13:18     ` Ingo Schwarze
  2021-09-04 16:16       ` Ingo Schwarze
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Schwarze @ 2021-09-04 13:18 UTC (permalink / raw)
  To: Sören Tempel; +Cc: tech

Hi Soeren,

Soeren Tempel wrote on Sat, Sep 04, 2021 at 12:12:50PM +0200:
> Ingo Schwarze <schwarze@usta.de> wrote:

>> Hold on a second; if there are mandoc bugs or misfeatures in this area,
>> uprooting major areas of manual page forests in a particular operating
>> system may not be the best option for dealing with that.

> Indeed, I would also prefer not move the Alpine man page. It was just
> unclear to me whether mandoc intended to support lookup for section
> which do not have their own subdirectory in MANDIR

I did not intend to in the past - but merely because i was unaware
of the need to support such a configuration.  This is a typical case
of software needing real-world use and testing to mature.

> from reading the fs_lookup code it seemed to me that this was not
> supported, thus my initial idea was to move the pages.

I hope we get this sorted out and you don't need to - hopefully
without breaking the setup in other operating systems either.
Illumos may be particularly fragile, so i hope people will test
the next release over there.  And i'd like to see tests on FreeBSD
too, but those will almost certainly happen anyway.

[...]
>>   === PAGES ===
>>   page name # [fh1t]open 
>>   page sect # 3 # 3p # 3pm 
[...]
> 	page name # [fh1t]open
> 	page sect # 3 # 3pm

Looks like your Perl pages contain ".TH open 3" or ".TH open 3pm"
whereas OpenBSD has ".TH open 3p", but i don't expect that to
cause trouble.  Actually, not having "3p" in there is good for you.
It lessens the risk of mixups.

[...]
>> Next up, regarding mansearch().
>> 
>>   man -M Test open    -> Perl; random choice because bits, sec, name,
>>                                  and arch all compare equal
>>   man -M Test 3 open  -> Perl; dto. and both file extensions mismatch
>>   man -M Test 3p open -> POSIX; preferred due to exact file extension match

> If I try to duplicate your mansearch() tests I get the following results:
> 
> 	man open    -> open(2) Linux man page (/usr/share/man/man2/open.2.gz)
> 	man 3 open  -> open(3pm) Perl man page
> 	man 3p open -> open(3pm) Perl man page
> 
> I get these results with both mandoc-1.14.5 and mandoc from current CVS.

Let's talk about CVS HEAD only, mixing in statements about old releases
can cause nothing but confusion.

[...]
> For `man 3p open` I briefly stepped through the mansearch() code with
> gdb on the mandoc CVS version. What happens here is that mansearch()
> returns the following results:
[... snip correct results ...]

> mandoc then searches for the best match according to priority. In this
> case both resn[0] and resn[1] get a priority of 35. However, since it
> iterates over resn[0] first and the comparison with best_prio is a
> greater than or equal comparison [1], resn[0] is the one that gets
> selected. Maybe this is a problem with the .gz file extension?

Yes, most likely.  I shall have a look separately and try to fix that.

[...]
> Regarding fs_search(): I tested the patch from your other reply. This
> patch work entirely fine for me and does what it is supposed to do. I
> applied this patch on top of the CVS version. With mandoc.db removed, I
> get the following results:
> 
> 	man 3 open    -> open(3p) POSIX man page
> 	man 3p open   -> open(3p) POSIX man page
> 	man 3pm open  -> open(3pm) Perl man page
> 
> So yes, your patch does seem to improve the fs_search() behaviour,
> thanks a lot!

Thank you very much for testing!  It is really good to have a real-world
test before commit rather only what i did in my Test/ sandbox.

The patch appended below is now committed.

Hang on a moment for the remaining issue with the .gz extension
probably breaking the prio code, i'll try to figure that out and
send a patch for that, too.

Yours,
  Ingo


Log Message:
-----------
In the fallback code to look for manual pages without using mandoc.db(5),
accept files "man<one-digit-section>/<name>.<full-section>"
in addition to the already supported "man<full-section>/name.[01-9]*". 
Needed for example on Alpine Linux which puts its Perl manuals
into "man3/<name>.3pm" and the POSIX manuals into "man3/<name>.3p".

While here, allow the glob(3) at the end of fs_lookup() to add multiple 
matches to the result set.  This improves man -w output and may also
help some cases of plain man(1), allowing main() to prioritize properly
rather than fs_lookup() picking a random match.    

Issue reported and patch tested
by Soeren Tempel <soeren at soeren hyphen tempel dot net>.

Modified Files:
--------------
    mandoc:
        main.c

Revision Data
-------------
Index: main.c
===================================================================
RCS file: /home/cvs/mandoc/mandoc/main.c,v
retrieving revision 1.356
retrieving revision 1.357
diff -Lmain.c -Lmain.c -u -p -r1.356 -r1.357
--- main.c
+++ main.c
@@ -94,9 +94,11 @@ struct	outstate {
 int			  mandocdb(int, char *[]);
 
 static	void		  check_xr(struct manpaths *);
-static	int		  fs_lookup(const struct manpaths *,
-				size_t ipath, const char *,
-				const char *, const char *,
+static	void		  fs_append(char **, size_t, int,
+				size_t, const char *, enum form,
+				struct manpage **, size_t *);
+static	int		  fs_lookup(const struct manpaths *, size_t,
+				const char *, const char *, const char *,
 				struct manpage **, size_t *);
 static	int		  fs_search(const struct mansearch *,
 				const struct manpaths *, const char *,
@@ -700,6 +702,30 @@ glob_esc(char **dst, const char *src, co
 		*(*dst)++ = *suffix++;
 }
 
+static void
+fs_append(char **file, size_t filesz, int copy, size_t ipath,
+    const char *sec, enum form form, struct manpage **res, size_t *ressz)
+{
+	struct manpage	*page;
+
+	*res = mandoc_reallocarray(*res, *ressz + filesz, sizeof(**res));
+	page = *res + *ressz;
+	*ressz += filesz;
+	for (;;) {
+		page->file = copy ? mandoc_strdup(*file) : *file;
+		page->names = NULL;
+		page->output = NULL;
+		page->bits = NAME_FILE & NAME_MASK;
+		page->ipath = ipath;
+		page->sec = (*sec >= '1' && *sec <= '9') ? *sec - '1' + 1 : 10;
+		page->form = form;
+		if (--filesz == 0)
+			break;
+		file++;
+		page++;
+	}
+}
+
 static int
 fs_lookup(const struct manpaths *paths, size_t ipath,
 	const char *sec, const char *arch, const char *name,
@@ -707,16 +733,19 @@ fs_lookup(const struct manpaths *paths, 
 {
 	struct stat	 sb;
 	glob_t		 globinfo;
-	struct manpage	*page;
-	char		*file, *cp;
+	char		*file, *cp, secnum[2];
 	int		 globres;
 	enum form	 form;
 
 	const char *const slman = "/man";
 	const char *const slash = "/";
 	const char *const sglob = ".[01-9]*";
+	const char *const dot   = ".";
+	const char *const aster = "*";
 
+	memset(&globinfo, 0, sizeof(globinfo));
 	form = FORM_SRC;
+
 	mandoc_asprintf(&file, "%s/man%s/%s.%s",
 	    paths->paths[ipath], sec, name, sec);
 	if (stat(file, &sb) != -1)
@@ -751,14 +780,34 @@ fs_lookup(const struct manpaths *paths, 
 		mandoc_msg(MANDOCERR_GLOB, 0, 0,
 		    "%s: %s", file, strerror(errno));
 	free(file);
+	file = NULL;
 	if (globres == 0)
-		file = mandoc_strdup(*globinfo.gl_pathv);
+		goto found;
 	globfree(&globinfo);
-	if (globres == 0) {
-		if (stat(file, &sb) != -1)
-			goto found;
+
+	if (sec[1] != '\0' && *ressz == 0) {
+		secnum[0] = sec[0];
+		secnum[1] = '\0';
+		cp = file = mandoc_malloc(strlen(paths->paths[ipath]) * 2 +
+		    strlen(slman) + strlen(secnum) * 2 + strlen(slash) +
+		    strlen(name) * 2 + strlen(dot) +
+		    strlen(sec) * 2 + strlen(aster) + 1);
+		glob_esc(&cp, paths->paths[ipath], slman);
+		glob_esc(&cp, secnum, slash);
+		glob_esc(&cp, name, dot);
+		glob_esc(&cp, sec, aster);
+		*cp = '\0';
+		globres = glob(file, 0, NULL, &globinfo);
+		if (globres != 0 && globres != GLOB_NOMATCH)
+			mandoc_msg(MANDOCERR_GLOB, 0, 0,
+			    "%s: %s", file, strerror(errno));
 		free(file);
+		file = NULL;
+		if (globres == 0)
+			goto found;
+		globfree(&globinfo);
 	}
+
 	if (res != NULL || ipath + 1 != paths->sz)
 		return -1;
 
@@ -770,19 +819,14 @@ fs_lookup(const struct manpaths *paths, 
 found:
 	warnx("outdated mandoc.db lacks %s(%s) entry, run %s %s",
 	    name, sec, BINM_MAKEWHATIS, paths->paths[ipath]);
-	if (res == NULL) {
+	if (res == NULL)
 		free(file);
-		return 0;
-	}
-	*res = mandoc_reallocarray(*res, ++*ressz, sizeof(**res));
-	page = *res + (*ressz - 1);
-	page->file = file;
-	page->names = NULL;
-	page->output = NULL;
-	page->bits = NAME_FILE & NAME_MASK;
-	page->ipath = ipath;
-	page->sec = (*sec >= '1' && *sec <= '9') ? *sec - '1' + 1 : 10;
-	page->form = form;
+	else if (file == NULL)
+		fs_append(globinfo.gl_pathv, globinfo.gl_pathc, 1,
+		    ipath, sec, form, res, ressz);
+	else
+		fs_append(&file, 1, 0, ipath, sec, form, res, ressz);
+	globfree(&globinfo);
 	return 0;
 }
 
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Discrepancy in mansearch and fs_lookup behavior
  2021-09-04 13:18     ` Ingo Schwarze
@ 2021-09-04 16:16       ` Ingo Schwarze
  2021-09-04 17:51         ` Sören Tempel
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Schwarze @ 2021-09-04 16:16 UTC (permalink / raw)
  To: Soeren Tempel; +Cc: tech

Hi Soeren,

Ingo Schwarze wrote on Sat, Sep 04, 2021 at 03:18:32PM +0200:

> Hang on a moment for the remaining issue with the .gz extension
> probably breaking the prio code, i'll try to figure that out and
> send a patch for that, too.

I believe the patch below fixes the issue that main() does not
properly prioritize gzipped files according to their file name
extensions.

Does it make sense to you, and does it work for you?

The main() program is growing to become a bit unwieldy, but i'm not
going to do any major refactoring right now.

Yours,
  Ingo


Index: main.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.259
diff -u -r1.259 main.c
--- main.c	4 Sep 2021 12:47:04 -0000	1.259
+++ main.c	4 Sep 2021 16:04:44 -0000
@@ -131,7 +131,7 @@
 	struct mparse	*mp;		/* Opaque parser object. */
 	const char	*conf_file;	/* -C: alternate config file. */
 	const char	*os_s;		/* -I: Operating system for display. */
-	const char	*progname, *sec;
+	const char	*progname, *sec, *ep;
 	char		*defpaths;	/* -M: override manpaths. */
 	char		*auxpaths;	/* -m: additional manpaths. */
 	char		*oarg;		/* -O: output option string. */
@@ -514,11 +514,16 @@
 					sec++; /* Prefer without suffix. */
 				if (*sec != '/')
 					prio += 10; /* Wrong dir name. */
-				if (search.sec != NULL &&
-				    (strlen(sec) <= ssz  + 3 ||
-				     strcmp(sec + strlen(sec) - ssz,
-				      search.sec) != 0))
-					prio += 20; /* Wrong file ext. */
+				if (search.sec != NULL) {
+					ep = strchr(sec, '\0');
+					if (ep - sec > 3 &&
+					    strncmp(ep - 3, ".gz", 3) == 0)
+						ep -= 3;
+					if ((size_t)(ep - sec) < ssz + 3 ||
+					    strncmp(ep - ssz, search.sec,
+					     ssz) != 0)      /* Wrong file */
+						prio += 20;  /* extension. */
+				}
 				if (prio >= best_prio)
 					continue;
 				best_prio = prio;
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Discrepancy in mansearch and fs_lookup behavior
  2021-09-04 16:16       ` Ingo Schwarze
@ 2021-09-04 17:51         ` Sören Tempel
  2021-09-05 12:47           ` Ingo Schwarze
  0 siblings, 1 reply; 8+ messages in thread
From: Sören Tempel @ 2021-09-04 17:51 UTC (permalink / raw)
  To: tech

Ingo Schwarze <schwarze@usta.de> wrote:
> Hi Soeren,

Hello Ingo,

> I believe the patch below fixes the issue that main() does not
> properly prioritize gzipped files according to their file name
> extensions.
> 
> Does it make sense to you, and does it work for you?

Yes, the patch does make sense and it does work well for me:

	man 3 open   -> Perl open(3pm) man page
	man 3p open  -> POSIX open(3p) man page
	man 3pm open -> Perl open(3pm) man page

Thanks a lot for fixing this so quickly!

BTW: Is there any chance that you could make a new release soonish? Has
been a while since the last release and CVS has accumulated quite a few
useful fixes since 1.14.5.

Greetings,
Sören
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Discrepancy in mansearch and fs_lookup behavior
  2021-09-04 17:51         ` Sören Tempel
@ 2021-09-05 12:47           ` Ingo Schwarze
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Schwarze @ 2021-09-05 12:47 UTC (permalink / raw)
  To: Soeren Tempel; +Cc: tech

Hi Soeren,

Soeren Tempel wrote on Sat, Sep 04, 2021 at 07:51:58PM +0200:
> Ingo Schwarze <schwarze@usta.de> wrote:

>> I believe the patch below fixes the issue that main() does not
>> properly prioritize gzipped files according to their file name
>> extensions.
>> 
>> Does it make sense to you, and does it work for you?

> Yes, the patch does make sense and it does work well for me:
> 
> 	man 3 open   -> Perl open(3pm) man page
> 	man 3p open  -> POSIX open(3p) man page
> 	man 3pm open -> Perl open(3pm) man page
> 
> Thanks a lot for fixing this so quickly!

Thank you for reporting and testing!

The patch is now committed, see the commit below.

> BTW: Is there any chance that you could make a new release soonish? Has
> been a while since the last release and CVS has accumulated quite a few
> useful fixes since 1.14.5.

Yes, the next release is long overdue.  I have been paying attention
to not do major reorganizations that might destabilize the code base
for quite some time now.  I'm trying to get those bugs fixed that
are easy to fix with a low risk, then draft the release notes,
then send out a beta to downstream maintainers for testing without
too much additional delay.

My list of pending patches is already free from mandoc entries.
My list of urgent bug reports still contains one or two entries
related to tbl(7).  After that, a final pass over the TODO file
makes sense, to see whether anything trivial remains in there -
but the bulk of the TODO entries will certainly not be addressed
before the upcoming release.

Yours,
  Ingo


Log Message:
-----------
during prioritization for man(1), correctly extract the section name
from the file name extension of gzipped manual page files; bug found 
on Alpine Linux by Soeren Tempel <soeren at soeren hyphen tempel dot net>,
who also tested this patch

Modified Files:
--------------
    mandoc:
        main.c

Revision Data
-------------
Index: main.c
===================================================================
RCS file: /home/cvs/mandoc/mandoc/main.c,v
retrieving revision 1.357
retrieving revision 1.358
diff -Lmain.c -Lmain.c -u -p -r1.357 -r1.358
--- main.c
+++ main.c
@@ -132,7 +132,7 @@ main(int argc, char *argv[])
 	struct mparse	*mp;		/* Opaque parser object. */
 	const char	*conf_file;	/* -C: alternate config file. */
 	const char	*os_s;		/* -I: Operating system for display. */
-	const char	*progname, *sec;
+	const char	*progname, *sec, *ep;
 	char		*defpaths;	/* -M: override manpaths. */
 	char		*auxpaths;	/* -m: additional manpaths. */
 	char		*oarg;		/* -O: output option string. */
@@ -536,11 +536,16 @@ main(int argc, char *argv[])
 					sec++; /* Prefer without suffix. */
 				if (*sec != '/')
 					prio += 10; /* Wrong dir name. */
-				if (search.sec != NULL &&
-				    (strlen(sec) <= ssz  + 3 ||
-				     strcmp(sec + strlen(sec) - ssz,
-				      search.sec) != 0))
-					prio += 20; /* Wrong file ext. */
+				if (search.sec != NULL) {
+					ep = strchr(sec, '\0');
+					if (ep - sec > 3 &&
+					    strncmp(ep - 3, ".gz", 3) == 0)
+						ep -= 3;
+					if ((size_t)(ep - sec) < ssz + 3 ||
+					    strncmp(ep - ssz, search.sec,
+					     ssz) != 0)      /* Wrong file */
+						prio += 20;  /* extension. */
+				}
 				if (prio >= best_prio)
 					continue;
 				best_prio = prio;
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-09-05 12:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-30 19:26 Discrepancy in mansearch and fs_lookup behavior Sören Tempel
2021-09-01 20:35 ` Ingo Schwarze
2021-09-02 19:09   ` Ingo Schwarze
2021-09-04 10:12   ` Sören Tempel
2021-09-04 13:18     ` Ingo Schwarze
2021-09-04 16:16       ` Ingo Schwarze
2021-09-04 17:51         ` Sören Tempel
2021-09-05 12:47           ` Ingo Schwarze

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).