Date: Sun, 4 Jan 2015 15:25:40 +0100
From: Ingo Schwarze
To: Baptiste Daroussin
Cc: tech@mdocml.bsd.lv
Subject: Re: Reliable way to determine that mandoc cannot render a manpage
Message-ID: <20150104142540.GA22437@iris.usta.de>
References: <20150103233118.GG75600@ivaldir.etoilebsd.net> <20150103233758.GH75600@ivaldir.etoilebsd.net>
In-Reply-To: <20150103233758.GH75600@ivaldir.etoilebsd.net>

Hi Baptiste,

Baptiste Daroussin wrote on Sun, Jan 04, 2015 at 12:37:58AM +0100:
> On Sun, Jan 04, 2015 at 12:31:18AM +0100, Baptiste Daroussin wrote:

>> On FreeBSD when switching to mandoc(1) as a default renderer for
>> manpages I made it falling back on groff(1) (for now :))

You will probably need some kind of fallback for quite some time,
if not for good.
While it is not good style to use the full power of roff(7)
in manuals, there will always be some manuals that use low-level
roff(7) features not implemented in mandoc(1).  By implementing many
features in mandoc(1) that a few years ago we would never have
expected to support, we have much reduced the number of such manuals,
but it's not clear that we will ever bring it down to zero.

>> if mandoc is not able to render the manpage
>>
>> To discover the bad manpages I run mandoc -Tlint -Werror and if
>> a failure occurs then the fall back happens.
>>
>> After checking I can see that mandoc is often correctly able to
>> render the manpages even if they have errors.

Yes.  The mandoc(1) manual says:

  error
    An input file contains syntax that cannot be safely interpreted,
    either because it is invalid or because mandoc does not implement
    it yet.  By discarding part of the input or inserting missing
    tokens, the parser is able to continue, and the error does not
    prevent generation of formatted output, but typically, preparing
    that output involves information loss, broken document structure
    or unintended formatting.

So, errors happen for two reasons:

 1. The document uses syntax that mandoc(1) does not implement,
    but groff(1) does.  In that case, you want to use groff(1).

 2. The document uses syntax that is just wrong, where it isn't
    even specified what it should do, and that consequently *no*
    formatter can handle properly.  In that case, it is not clear
    whether whatever implementation-dependent behaviour mandoc(1)
    or groff(1) exhibit happens to be closer to what the author
    actually intended.

I tend to think that in the majority of cases, mandoc(1) is a better
choice than groff(1) for such malformed pages; I think it is a bit
more forgiving and the output often makes a bit of sense even for
clearly malformed input, while groff(1) more often resorts to the
principle of "garbage in, garbage out".
That isn't always true, though; there are certainly some
counter-examples where groff(1) handles specific malformed input
better.

At first sight, it seems that throwing the same error level in both
of these cases is stupid; the distinction between "malformed input"
and "unsupported input" is definitely relevant.  There are two
reasons why mandoc(1) throws the same level:

 1. Historical reasons.  We struggled for years to properly
    implement the current distinction warning/error/fatal.
    What we now have isn't perfect, but quite good; but we haven't
    yet got round to implementing the above distinction within
    the "error" level.

 2. The distinction is not quite as easy as it seems because
    it requires knowledge not about what mandoc(1) can do,
    but about what *other* software can do.  From mandoc(1)'s
    perspective, there isn't really much of a difference:
    It sees syntax it doesn't understand.  How should it know
    whether some other software might understand it?
    Consequently, making this distinction requires mandoc(1)
    to contain a kind of partial implementation (at least
    regarding the parsing) of the features mandoc(1) does *not*
    support.  For example, distinguishing unsupported from
    mistyped requests and macros requires mandoc(1) to contain
    a full list of all existing requests and macros, even those
    it does not support.

I think mandoc(1) is now mature enough to start trying to implement
such a distinction, but that work has not been started yet.

When we switched OpenBSD to use mandoc(1) by default more than four
years ago, mandoc(1) clearly wasn't mature enough to even attempt
making this distinction automatically.  So we decided not to attempt
it at runtime (like in FreeBSD) but instead to decide manually at
port checkin time, port by port, defaulting to mandoc(1) but starting
from a state where all ports having manuals had an explicit USE_GROFF.

>> So my question is: is there a better way to figure out
>> automatically if mandoc will be able to render a manpage?
For a detailed discussion of that question, please read

  http://www.openbsd.org/faq/ports/specialtopics.html#Mandoc

The short answer is:  If you want to be on the safe side, that is,
if you want to avoid having some manuals misformatted because
mandoc(1) is used but cannot fully handle them, requiring error-free
operation with -Werror is the best you can do right now.

> Nevermind -Wfatal is what I was looking for...

That seems like a very bad idea to me.  Nowadays, mandoc(1) hardly
ever throws any fatal errors at all.  I already considered removing
that level altogether, but so far didn't do it.  Look at the portable
mandoc(1) manual for a complete list of all three (sic!) remaining
"FATAL errors":

 * input too large
 * NOT IMPLEMENTED: .so with absolute path or ".."
 * .so request failed

If you only require error-free operation with -Wfatal, that
effectively amounts to *always* using mandoc(1), almost never
using groff(1), not even for manuals that mandoc mishandles and
that groff would handle much better.

Yours,
  Ingo
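
For illustration, the runtime fallback discussed above could be
sketched in Python roughly as follows.  This is only a sketch under
stated assumptions: the helper names lint_is_clean and choose_renderer
are hypothetical, the check relies on "mandoc -Tlint -Werror" exiting
with a non-zero status when it reports an error, and the groff(1)
command line is a guess at a reasonable fallback, not what FreeBSD
actually runs.

```python
import subprocess


def lint_is_clean(path):
    """Return True if `mandoc -Tlint -Werror` exits with status 0.

    Hypothetical helper: the lint messages themselves are discarded,
    only the exit status is used.
    """
    try:
        result = subprocess.run(
            ["mandoc", "-Tlint", "-Werror", path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # mandoc is not installed, so it certainly cannot render the page.
        return False
    return result.returncode == 0


def choose_renderer(path, is_clean=lint_is_clean):
    """Return the formatter command line to use for one manual page.

    `is_clean` is injectable so the decision logic can be exercised
    without mandoc being installed.
    """
    if is_clean(path):
        return ["mandoc", path]
    # Assumed fallback: groff with its "-mandoc" wrapper macro package,
    # which selects the man or mdoc macros, on the ASCII output device.
    return ["groff", "-mandoc", "-Tascii", path]
```

Swapping -Werror for -Wfatal in lint_is_clean would make the check
succeed for almost every page, which is exactly the problem described
above: the fallback to groff(1) would then almost never be taken.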