Date: Thu, 19 Oct 2023 18:19:23 +0200
From: Ingo Schwarze
To: Alejandro Colomar
Cc: tech@mandoc.bsd.lv, branden@debian.org
Subject: Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP

Hi Alejandro,

Alejandro Colomar wrote on Thu, Oct 19, 2023 at 05:17:10PM +0200:

> I had this gripe with man(7) some years ago. I thought of using the
> following instead, which slightly complicates the source code, but makes
> it more logical.
>
> $ cat nested_indent.man
> .TH nested_indent 7 2023-10-19 experiments
> .SH Ingo said:
> .TP
> Todo
> Currently, when formatting .TP or .IP with a non-empty head,
> [yada yada]
> .RS
> .PP
> When formatting .IP or .RS with an empty head, mandoc needs
> [yada yada]
> .RE
>
> As you can see, here the indentation is controlled by a single RS/RE
> pair, and everything within it uses PP as a normal paragraph separator.

While that also generates correct terminal and typographical (PS, PDF)
output in the same purely presentational sense as .TP .IP .TP, it does
not help with respect to the semantic problem we are discussing here.
Look at the AST generated by mandoc(1):

  $ mandoc -T tree nested_indent.man
  title = "nested_indent"
  sec = "7"
  vol = "Miscellaneous Information Manual"
  os = "experiments"
  date = "2023-10-19"

  SH (block) *2:2
    SH (head) 2:2 ID=HREF=Ingo_said:
        Ingo (text) 2:5
        said: (text) 2:10
    SH (body) 2:2
        TP (block) *3:2
          TP (head) 3:2 ID=HREF
              Todo (text) *4:1
          TP (body) 4:1
              Currently, when formatting .TP or .IP \
                with a nonempty head, (text) *5:1
              [yada yada] (text) *6:1
        RS (block) *7:2
          RS (head) 7:2
          RS (body) 7:2
              PP (block) *8:2
                PP (head) 8:2
                PP (body) 8:2
                    When formatting .IP or .RS with an empty head, mandoc needs (text) *9:1
                    [yada yada] (text) *10:1
        TP (block) *12:2
          TP (head) 12:2 ID=HREF=final
              final tag (text) *13:1
          TP (body) 13:1
              final body (text) *14:1

You see that the first .TP, the .RS, and the second .TP are all
child nodes of the top-level .SH. The .RS is not a child of the
.TP but a sibling. The two .TP nodes still aren't siblings of
each other.

Now at first sight, you might blame me for that and call it a mandoc
artifact, arguing that mandoc instead ought to treat the .RS as a
child of the first .TP. But no, that would be incorrect parsing
for the following reason: the .TP implies an indentation, and the
.RS also implies an indentation. If the .RS were a child of the
.TP, we would get double indentation.
You can make that argument even more convincing by adding a width
argument to .RS and varying that argument. That way, you see that
the .RS is indented relative to the .SH, not relative to the .TP.

There are some cases where it is not completely clear whether one
man(7) node following another man(7) node is a child or a sibling.
mandoc(1) makes arbitrary choices in such ambiguous cases, usually
opting for sibling relations where possible and avoiding unnecessary
child relationships. But this is not an ambiguous case. Just like
the .IP, the .RS is definitely a sibling and not a child of the .TP.

As i said, no block can nest inside .TP. That's why i brought up
.RS in my reply and developed rules for handling it in a similar
way as .IP, even though you did not mention .RS before.

> You could put the RS before the first paragraph, but then an unwanted
> line break appears after the tag.

No matter where you put the .RS, it will never be a child of .TP.

> (Maybe man(7) could be tweaked so
> that RS doesn't insert the line break after a TP.)

Not really a useful idea because .RS doesn't help with the actual
problem in the first place.

> In the end I didn't switch to that scheme, because IP just worked, but
> I might consider it if it proves to be useful. What do you think?

As i said, i am not aware of a better solution than .TP .IP .TP.
In particular, .RS is not better because it causes exactly the same
trouble and potentially more trouble besides.

But i also said that trying to define "good style" for man(7) is
a fool's errand. Because man(7) code is so exceedingly difficult
to write, man(7) code that is very clearly bad style is very often
found in the wild, so there is ample opportunity for saying "this
is bad style." In some cases, it is also possible to point out
better style, for example

  .BR "some word" .

is clearly better style than

  .B some word\c
  \&.

even though both are correct man(7) code and even though there are
situations in man(7) where \c is unavoidable.
But very frequently, situations arise where man(7) doesn't really
allow any good solution, and the best you can do is not making the
source gratuitously worse than it needs to be. The .TP .IP .TP
idiom is such an example. It's definitely ugly from both semantic
and stylistic points of view, but no good solution is available.

I'm willing to go further and claim that no better solution can be
designed even if you are willing to introduce a new macro or change
the way the .TP API is defined, even in incompatible ways, because
it's not this particular macro that is broken. What is broken is
the fundamental design of the language: the language not only
predates the concept of semantic markup, but it also predates the
concept of block nesting in markup languages. Yes, that is hard to
believe for people born after 1970 because those people have
essentially grown up with HTML and LaTeX and those two markup
languages have defined their concept of what a markup language is,
but let's face it, man(7) predates those fundamental concepts, and
it shows all over the place.

As long as people are using the language, mandoc(1) needs to somehow
deal with the mess. I'm not happy with that because it is wasting
a lot of development time which could be spent in more productive
ways, but what can i do...

Yours,
  Ingo