Date: Thu, 19 Oct 2023 16:45:21 +0200
From: Ingo Schwarze
To: Alejandro Colomar
Cc: tech@mandoc.bsd.lv
Subject: Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP

Hi Alejandro,

Alejandro Colomar wrote on Mon, Oct 16, 2023 at 07:22:11PM +0200:
> On Mon, Oct 16, 2023 at 06:28:05PM +0200, Ingo Schwarze wrote:
>> Alejandro Colomar wrote on Mon, Oct 16, 2023 at 01:32:30PM +0200:
>>> groff -man -Thtml seems to never produce a blank line before a TP.
>> What do you mean by "blank line"?

> What my eyes experience as a relatively large inter-paragraph space.

Heh.  That's not a very useful definition when talking about HTML code, given that the HTML language does not provide any non-deprecated syntax or semantics related to paragraph spacing.  :)

>>> mandoc -man -Thtml produces one in some cases, and I can't see a
>>> pattern.
>>>
>>> I found this bug while reading feature_test_macros(7) in the Debian
>>> online manpages:
>>>

>> You can see that particular page rendered here:
>> https://man.bsd.lv/Test/ftm.7

> I don't see the bug there.  I'm going to guess it's just another case of
> a missing CSS file.

Actually, there *is* a problem with the HTML code in that page, even though you did not see it because the CSS file hides it.  That page contains this HTML code:
_ISOC11_SOURCE (since glibc 2.16)
Exposes declarations consistent with the ISO C11 standard. Defining this macro also enables C99 and C95 features (like _ISOC99_SOURCE).
Invoking the C compiler with the option -std=c11 produces the same effects as defining this macro.
_LARGEFILE64_SOURCE
Expose definitions for the alternative API specified by the LFS (Large File Summit) as a "transitional extension" to the Single UNIX Specification. [...]

That's of course quite nonsensical because _ISOC11_SOURCE and _LARGEFILE64_SOURCE are intended by the page author as adjacent entries in the same list, but mandoc(1) puts them into different lists, with yet another single-item list in between.

>> It is a long page, and i have been unable to figure out what exactly
>> you are talking about.
>>
>> Please point me to the precise position in the file where vertical
>> spacing before a .TP macro feels lacking or excessive to you,

> In the Debian bullseye page, check the inter-paragraph space before the
> tag _LARGEFILE64_SOURCE (I see a long vertical space) and the tag
> _LARGEFILE_SOURCE.

Now i see, thank you for pointing me to the specific place.

The trouble is caused by the following man(7) idiom:

  .TP
  first tag
  first body
  .IP
  still in the first body
  .TP
  second tag
  second body

The author's intent here is that the two .TP macros mark up adjacent items in the same list and the .IP marks up an ordinary paragraph break within the item body of the first list entry.

Now, using .IP for an ordinary paragraph break is no doubt surprising, but it works (from a purely presentational point of view) because .IP asserts the same vertical spacing as .PP would and because .IP asserts the same indentation as the previous .TP did.

So logically, what the author wanted was a list with two entries, one containing two paragraphs of text, the other containing one paragraph of text.

Technically, what we got is three paragraphs of text, all indented by the same amount, the first and last containing a tag and the middle one having an empty tag, with no indication that there is any relation between any of the paragraphs, let alone that they form a list.  Figuring out the logical list structure, if any, is left as a guessing exercise to the formatter.
Here the conceptual inadequacy of the man(7) language becomes blatantly obvious.  With very few exceptions, the language does not provide any concept of block nesting.  The only exception is that various macros can be nested inside .SH, .SS, and .RS.  But nothing can be nested inside a list item body: neither a paragraph break, nor .RS, nor another list.

Consequently, i think the fundamental design of the man(7) language is too weak to adequately express a list item containing more than a single paragraph of text, and the crude presentational workaround of splitting the item body into two unrelated paragraphs with different introducing macros indeed looks like the best workaround available.

The longer i look at the man(7) language, the more convinced i become that it is so rotten to the core that trying to provide a style guide for writing good man(7) pages is nothing but a fool's errand, and trying to add a few semantic macros to the man(7) language is an even worse fool's errand because a few additions won't cure the fundamental design flaws.  If you put lipstick on a pig, it's still not going to win any Miss Espana contest.

Let me quote only one other example that i ran into just today.  The first real-world example of .MR usage i encountered required a trailing \c escape sequence on the preceding line.  Now how ironic is that?  A brand-new macro introduced to improve semantics, but using it requires terribly arcane low-level presentational markup.  I'm progressively becoming convinced that the language is irredeemable.

Consequently, the following needs to be done in mandoc:

1. Currently, when formatting .TP or .IP with a non-empty head, the HTML formatter looks at the previous and at the following abstract syntax tree (AST) node to figure out whether the tagged paragraph is part of a list.
   If that previous or following AST node is .IP or .RS with an empty head, it will have to iterate until it finds an AST node that is neither .IP nor .RS or has a non-empty head, evaluating the properties of that node instead of the directly preceding or following node.

2. When formatting .IP or .RS with an empty head, mandoc needs to iterate backwards, searching for an AST node that is neither .IP nor .RS or has a non-empty head, and figure out whether that node is a list item, which again, as explained above, requires iterating both forwards and backwards.  If it turns out we are inside a list, interrupting the list must be prevented.  Instead, .IP with an empty head must be formatted like .PP, and .RS with an empty head must be formatted somewhat like .br.

Probably, doing all this in the HTML formatter module would be over the top.  I believe such complicated AST inspection should be done by the validation module (man_validate.c), which should set AST node flags similar to

 - this node starts a new list
 - this node starts a new list item
 - this node merely indicates a paragraph break
 - this node ends a list

which the formatters can then readily use.

This required logic is so complicated that i won't code it right away; there are more urgent matters to be taken care of.  Instead, i will add it to the mandoc TODO list.

Yours,
  Ingo

-- 
To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv