From: Ingo Schwarze <schwarze@usta.de>
To: tech@mdocml.bsd.lv
Subject: Re: Improve mandocdb catpage/man heuristics.
Date: Wed, 28 Dec 2011 02:27:40 +0100
Message-ID: <20111228012740.GB30880@iris.usta.de>
In-Reply-To: <4EFA5B78.5030507@bsd.lv>

Hi Kristaps,

these are clear improvements, so i have committed them to OpenBSD.

Kristaps Dzonsons wrote on Wed, Dec 28, 2011 at 01:57:44AM +0200:

> This improves the mandocdb(8) catpage heuristic to, well, more or
> less as good as it's going to get.  It now reads multiple lines into
> a buffer, joining the lines with a space.

Actually, some man(7) pages have a similar problem, see for
example curs_extend(3) which contains

  .SH NAME
  \fBcurses_version\fP,
  \fBuse_extended_names\fP \- miscellaneous curses extensions

These need to take multiple lines into account, too, but maybe
a simpler algorithm than in pformatted is sufficient:

  After .SH NAME, skip all lines until you find "- ",
  then use everything until EOL as .Nd.
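
As a rough illustration only, the rule above could look like the
following Python sketch.  It is not the mandocdb(8) C code; the
function name name_description() and the list-of-lines input are
assumptions made for the example, and the sketch stops at the next
.SH so that it never leaves the NAME section.

```python
def name_description(lines):
    """Return the description from a man(7) NAME section, or None.

    Hypothetical helper: after ".SH NAME", skip lines until one
    contains "- ", then take everything up to end of line.
    """
    in_name = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(".SH"):
            if in_name:
                # NAME ended without a "- " separator.
                break
            # Accept both `.SH NAME` and `.SH "NAME"`.
            in_name = line[3:].strip().strip('"') == "NAME"
            continue
        if not in_name:
            continue
        pos = line.find("- ")
        if pos != -1:
            # Everything after "- " up to end of line.
            return line[pos + 2:].strip()
    return None


# The curs_extend(3) lines quoted above.
page = [
    ".SH NAME",
    "\\fBcurses_version\\fP,",
    "\\fBuse_extended_names\\fP \\- miscellaneous curses extensions",
    ".SH SYNOPSIS",
]
print(name_description(page))
```

Because "\- " and "\-\- " both end in "- ", the same search also
covers the escaped forms.  A name list that itself contains a hyphen
followed by a space would be split too early, so this is a heuristic
and nothing more.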

> While here, I removed the 70-character limit.

Yes, i'm not married to that.  Probably it's wrong to have this
at all.  I only put it in when first writing the code to avoid
getting distracted from the main tasks by badly formatted corner
case pages.  Now that the basic infrastructure is in place, we
can figure out whether such corner cases really exist, and how
many of them, and what to do about them.  Truncating is a very
naive - well, i hardly dare say "solution".

> I recoded this into apropos.c's and whatis.c's printf(3)
> statements.

Already an improvement.

> We should really
> consider a better way: if not 70-char, to the COLUMN limit?

Not sure yet, open for suggestions.
I'm not even sure how big the problem is...

> Lastly, I added an extra man(7) heuristic for separating names and
> descriptions, namely the \-\- I observed in some POD manuals.  This
> cleans up "apropos -s 3p ~.*" quite a lot.

Maybe man(7) can use a heuristic similar to the one for cat pages,
just assuming that "- " starts the description?

Thanks,
  Ingo
