Re: Reliable way to determine that mandoc cannot render a manpage

From: Ingo Schwarze <schwarze@usta.de>
To: Baptiste Daroussin <bapt@FreeBSD.org>
Cc: tech@mdocml.bsd.lv
Subject: Re: Reliable way to determine that mandoc cannot render a manpage
Date: Sun, 4 Jan 2015 15:25:40 +0100
Message-ID: <20150104142540.GA22437@iris.usta.de>
In-Reply-To: <20150103233758.GH75600@ivaldir.etoilebsd.net>

Hi Baptiste,

Baptiste Daroussin wrote on Sun, Jan 04, 2015 at 12:37:58AM +0100:
> On Sun, Jan 04, 2015 at 12:31:18AM +0100, Baptiste Daroussin wrote:

>> On FreeBSD when switching to mandoc(1) as a default renderer for
>> manpages I made it falling back on groff(1) (for now :))

You will probably need some kind of fallback for quite some time,
if not for good.  While it is not good style to use the full power
of roff(7) in manuals, there will always be some manuals that use
low-level roff(7) features not implemented in mandoc(1).
We have since implemented many features that, a few years ago, we
never expected mandoc(1) to support, and that has much reduced the
number of such manuals, but it's not clear that we will ever bring
it down to zero.

>> if mandoc is not able to render the manpage
>> 
>> To discover the bad manpages I run mandoc -Tlint -Werror and if
>> a failure occurs then the fall back happens.
>> 
>> After checking I can see that mandoc is often correctly able to
>> render the manpages even if they have errors.

Yes.  The mandoc(1) manual says:

  error  An input file contains syntax that cannot be safely interpreted,
         either because it is invalid or because mandoc does not
         implement it yet.  By discarding part of the input or inserting
         missing tokens, the parser is able to continue, and the error
         does not prevent generation of formatted output, but typically,
         preparing that output involves information loss, broken document
         structure or unintended formatting.

So, errors happen for two reasons:

 1. The document uses syntax that mandoc(1) does not implement,
    but groff(1) does.  In that case, you want to use groff(1).

 2. The document uses syntax that is just wrong, where it isn't
    even specified what it should do, and that consequently *no*
    formatter can handle properly.  In that case, it is not clear
    whether whatever implementation-dependent behaviour mandoc(1)
    or groff(1) exhibit happens to be closer to what the author
    actually intended.  I tend to think that in the majority of
    cases, mandoc(1) is a better choice than groff(1) for such
    malformed pages; I think it is a bit more forgiving and the
    output often makes a bit of sense even for clearly malformed
    input, while groff(1) more often resorts to the principle of
    "garbage in, garbage out".  That isn't always true, though;
    there are certainly some counter-examples where groff(1)
    handles specific malformed input better.

At first sight, it seems that throwing the same error level in both
of these cases is stupid; the distinction between "malformed input"
and "unsupported input" is definitely relevant.  There are two
reasons why mandoc(1) throws the same level:

 1. Historical reasons.  We struggled for years to properly
    implement the current distinction warning/error/fatal.  What
    we now have isn't perfect, but quite good; but we haven't yet
    got round to implementing the above distinction within the
    "error" level.

 2. The distinction is not quite as easy as it seems because
    it requires knowledge not about what mandoc(1) can do, but
    about what *other* software can do.  From mandoc(1)'s
    perspective, there isn't really much of a difference:
    It sees syntax it doesn't understand.  How should it know
    whether some other software might understand it?

Consequently, making this distinction requires mandoc(1) to contain
a kind of partial implementation (at least regarding the parsing)
of the features mandoc(1) does *not* support.  For example,
distinguishing unsupported from mistyped requests and macros
requires mandoc(1) to contain a full list of all existing requests
and macros, even those it does not support.
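To make the point about needing a full list concrete, here is a
minimal sketch in Python.  It is not how mandoc(1) works internally;
the function name is hypothetical, and the two tables are tiny
illustrative placeholders that do not reflect which requests mandoc(1)
actually implements.  It also ignores macros defined by the document
itself.

```python
# Illustrative placeholders only: NOT mandoc's real tables.
SUPPORTED = frozenset({"br", "sp"})
# Requests that exist in roff but are assumed unsupported here.
KNOWN_ELSEWHERE = frozenset({"di", "wh"})


def classify_request(name):
    """Classify a request name as supported, unsupported, or unknown."""
    if name in SUPPORTED:
        return "supported"
    if name in KNOWN_ELSEWHERE:
        # Known to exist, so another formatter may handle it.
        return "unsupported"
    # Not in either table: most likely a typo.
    return "unknown"
```

Without the second table, "unsupported" and "unknown" collapse into
one case, which is exactly the situation described above.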

I think mandoc(1) is now mature enough to start trying to implement
such a distinction, but that work has not been started yet.

When we switched OpenBSD to use mandoc(1) by default more than four
years ago, mandoc(1) clearly wasn't mature enough to even attempt
making this distinction automatically.  So we decided not to attempt
it at runtime (like in FreeBSD) but instead to decide manually at port
checkin time, port by port, defaulting to mandoc(1) but starting
from a state where all ports having manuals had an explicit USE_GROFF.

>> So my question is: is there a better way to figure out
>> automatically if mandoc will be able to render a manpage?

For a detailed discussion of that question, please read

  http://www.openbsd.org/faq/ports/specialtopics.html#Mandoc

The short answer is:  If you want to be on the safe side, that is,
if you want to avoid some manuals being misformatted because
mandoc(1) is used but cannot fully handle them, requiring error-free
operation with -Werror is the best you can do right now.
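A runtime fallback along these lines could be sketched as follows in
Python.  This is only an illustration, not the FreeBSD implementation:
the function names are hypothetical, and it assumes that the check
quoted earlier in the thread (mandoc -Tlint -Werror) exits with
status 0 exactly when no error was reported.

```python
import subprocess


def mandoc_lint_status(path):
    """Run the lint check from the thread and return the exit status."""
    result = subprocess.run(
        ["mandoc", "-Tlint", "-Werror", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def choose_renderer(path, lint=mandoc_lint_status):
    """Pick a formatter; lint is injectable so the policy can be tested."""
    # Assumed policy: any non-zero status sends the page to groff.
    return "mandoc" if lint(path) == 0 else "groff"
```

Passing the lint function as a parameter keeps the decision separate
from the call to mandoc(1), so the policy can be exercised without
mandoc(1) being installed.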

> Nevermind -Wfatal is what I was looking for...

That seems like a very bad idea to me.

Nowadays, mandoc(1) hardly ever throws any fatal errors at all.  I
already considered removing that level altogether, but so far didn't
do it.  Look at the portable mandoc(1) manual for a complete list
of all three (sic!) remaining "FATAL errors":

  * input too large
  * NOT IMPLEMENTED: .so with absolute path or ".."
  * .so request failed

If you only require error-free operation with -Wfatal, that effectively
amounts to *always* using mandoc(1), almost never using groff(1),
not even for manuals that mandoc mishandles and that groff would
handle much better.
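The effect of the threshold can be modelled in a few lines of Python.
This is a simplified sketch, not mandoc(1) code: it assumes only the
three levels named above, ordered warning < error < fatal, and a
hypothetical helper that is told the worst message a page produced.

```python
# Message levels from least to most severe, as named in this thread.
LEVELS = ("warning", "error", "fatal")


def falls_back(worst_message, threshold):
    """Return True if a page would be handed to the fallback formatter.

    worst_message is the most severe level reported for the page,
    or None if the page produced no messages at all.
    """
    if worst_message is None:
        return False
    return LEVELS.index(worst_message) >= LEVELS.index(threshold)
```

In this model, a page whose worst message is "error" falls back with
a threshold of "error" but not with a threshold of "fatal", and since
fatal errors hardly ever occur, the latter threshold almost never
triggers the fallback.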

Yours,
  Ingo
