Date: Thu, 19 Oct 2023 17:17:10 +0200
From: Alejandro Colomar
To: Ingo Schwarze
Cc: tech@mandoc.bsd.lv, "G. Branden Robinson"
Subject: Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP

On Thu, Oct 19, 2023 at 04:45:21PM +0200, Ingo Schwarze wrote:
> Hi Alejandro,
>
> Alejandro Colomar wrote on Mon, Oct 16, 2023 at 07:22:11PM +0200:
> > On Mon, Oct 16, 2023 at 06:28:05PM +0200, Ingo Schwarze wrote:
> >> Alejandro Colomar wrote on Mon, Oct 16, 2023 at 01:32:30PM +0200:
>
> >>> groff -man -Thtml seems to never produce a blank line before a TP.
> >> What do you mean by "blank line"?
> > What my eyes experience as a relatively large inter-paragraph space.
>
> Heh. That's not a very useful definition when talking about HTML code,
> given that the HTML language does not provide any non-deprecated syntax
> or semantics related to paragraph spacing. :)

I know. :)

>
> >>> mandoc -man -Thtml produces one in some cases, and I can't see a
> >>> pattern.
> >>>
> >>> I found this bug while reading feature_test_macros(7) in the Debian
> >>> online manpages:
> >>>
>
> >> You can see that particular page rendered here:
> >> https://man.bsd.lv/Test/ftm.7
>
> > I don't see the bug there. I'm going to guess it's just another case of
> > a missing CSS file.
>
> Actually, there *is* a problem with the HTML code in that page,
> even though you did not see it because the CSS file hides it.
>
> That page contains this HTML code:
>
>   _ISOC11_SOURCE (since glibc 2.16)
>       Exposes declarations consistent with the ISO C11 standard.
>       Defining this macro also enables C99 and C95 features
>       (like _ISOC99_SOURCE).
>
>       Invoking the C compiler with the option -std=c11
>       produces the same effects as defining this macro.
>
>   _LARGEFILE64_SOURCE
>       Expose definitions for the alternative API specified by the
>       LFS (Large File Summit) as a "transitional extension"
>       to the Single UNIX Specification. [...]
>
> That's of course quite nonsensical because _ISOC11_SOURCE and
> _LARGEFILE64_SOURCE are intended by the page author as adjacent
> entries in the same list, but mandoc(1) puts them into different
> lists, with yet another single-item list in between.
>
> >> It is a long page, and I have been unable to figure out what exactly
> >> you are talking about.
> >>
> >> Please point me to the precise position in the file where vertical
> >> spacing before a .TP macro feels lacking or excessive to you,
>
> > In the Debian bullseye page, check the inter-paragraph space before the
> > tag _LARGEFILE64_SOURCE (I see a long vertical space) and the tag
> > _LARGEFILE_SOURCE.
>
> Now I see, thank you for pointing me to the specific place.
>
> The trouble is caused by the following man(7) idiom:
>
> .TP
> first tag
> first body
> .IP
> still in the first body
> .TP
> second tag
> second body

Hmm, that's an old enemy showing up.

>
> The author's intent here is that the two .TP macros mark up adjacent
> items in the same list and the .IP marks up an ordinary paragraph
> break within the item body of the first list entry.
>
> Now, using .IP for an ordinary paragraph break is no doubt surprising,
> but it works (from a purely presentational point of view) because .IP
> does assert the same vertical spacing as .PP would and because .IP
> asserts the same indentation as the previous .TP did.
>
> So logically, what the author wanted was a list with two entries,
> one containing two paragraphs of text, the other containing one
> paragraph of text.
> Technically, what we got is three paragraphs of
> text, all indented by the same amount, the first and last containing
> a tag and the middle one having an empty tag, with no indication that
> there is any relation between any of the paragraphs, let alone that
> they form a list. Figuring out the logical list structure, if any,
> is left as a guessing exercise to the formatter.
>
> Here the conceptual inadequacy of the man(7) language becomes
> blatantly obvious. With very few exceptions, the language does not
> provide any concept of block nesting. The only exception is that
> various macros can be nested inside .SH, .SS, and .RS. But nothing
> can be nested inside a list item body, neither a paragraph break,
> nor .RS, nor another list.

I had this gripe with man(7) some years ago. I thought of using the
following instead, which slightly complicates the source code but makes
it more logical:

$ cat nested_indent.man
.TH nested_indent 7 2023-10-19 experiments
.SH Ingo said:
.TP
Todo
Currently, when formatting .TP or .IP with a non-empty head,
the HTML formatter looks at the previous and at the following
abstract syntax tree (AST) node to figure out whether the
tagged paragraph is part of a list.
If that previous or following AST node is .IP or .RS with an
empty head, it will have to iterate until it finds an AST node
that is neither .IP nor .RS or has a non-empty head, evaluating
the properties of that node instead of the directly preceding
or following node.
.RS
.PP
When formatting .IP or .RS with an empty head, mandoc needs
to iterate backwards, searching for an AST node that is neither
\&.IP nor .RS or has a non-empty head, and figure out whether that
node is a list item, which again, as explained above, requires
iterating both forwards and backwards.
If it turns out we are inside a list, interrupting the list
must be prevented. Instead, .IP with an empty head must be
formatted like .PP, and .RS with an empty head must be formatted
somewhat like .br.
.RE

As you can see, here the indentation is controlled by a single RS/RE
pair, and everything within it uses PP as a normal paragraph separator.
You could put the RS before the first paragraph, but then an unwanted
line break appears after the tag. (Maybe man(7) could be tweaked so
that RS doesn't insert the line break after a TP.)

In the end I didn't switch to that scheme, because IP just worked, but
I might consider it if it proves to be useful. What do you think?

[CC += Branden, in case he wants to add his opinion too]

Cheers,
Alex

>
> Consequently, I think the fundamental design of the man(7) language
> is too weak to adequately express a list item containing more than a
> single paragraph of text, and the crude presentational workaround of
> splitting the item body into two unrelated paragraphs with different
> introducing macros indeed looks like the best workaround available.
>
> The longer I look at the man(7) language, the more convinced I become
> that it is so rotten to the core that trying to provide a style
> guide to write good man(7) pages is nothing but a fool's errand,
> and trying to add a few semantical macros to the man(7) language is
> an even worse fool's errand because a few additions won't cure the
> fundamental design flaws. If you put lipstick on a pig, it's still
> not going to win any Miss Espana contest.
>
> Let me quote only one other example that I ran into just today.
> The first real-world example of .MR usage I encountered required
> a trailing \c escape sequence on the preceding line. Now how
> ironic is that? A brand-new macro introduced to improve semantics,
> but using it requires terribly arcane low-level presentational
> markup. I'm progressively becoming convinced that the language
> is irredeemable.
>
>
> Consequently, the following needs to be done in mandoc:
>
> 1. Currently, when formatting .TP or .IP with a non-empty head,
>    the HTML formatter looks at the previous and at the following
>    abstract syntax tree (AST) node to figure out whether the
>    tagged paragraph is part of a list.
>    If that previous or following AST node is .IP or .RS with an
>    empty head, it will have to iterate until it finds an AST node
>    that is neither .IP nor .RS or has a non-empty head, evaluating
>    the properties of that node instead of the directly preceding
>    or following node.
>
> 2. When formatting .IP or .RS with an empty head, mandoc needs
>    to iterate backwards, searching for an AST node that is neither
>    .IP nor .RS or has a non-empty head, and figure out whether that
>    node is a list item, which again, as explained above, requires
>    iterating both forwards and backwards.
>    If it turns out we are inside a list, interrupting the list
>    must be prevented. Instead, .IP with an empty head must be
>    formatted like .PP, and .RS with an empty head must be formatted
>    somewhat like .br.
>
> Probably, doing all this in the HTML formatter module would be
> over the top. I believe such complicated AST inspection should
> be done by the validation module (man_validate.c), which should
> set AST node flags similar to
>
>  - this node starts a new list
>  - this node starts a new list item
>  - this node merely indicates a paragraph break
>  - this node ends a list
>
> which the formatters can then readily use.
>
> This required logic is so complicated that I won't code it right
> away; there are more urgent matters to be taken care of.
> Instead, I will add it to the mandoc TODO list.
>
> Yours,
> Ingo