Date: Wed, 18 Oct 2023 13:32:31 +0200
From: Alejandro Colomar
To: tech@mandoc.bsd.lv
Subject: Re: mandoc -man -Thtml: unwanted line break after bullet (.IP)
X-Mailinglist: mandoc-tech
Reply-To: tech@mandoc.bsd.lv

Hi Ingo,

On Wed, Oct 18, 2023 at 02:04:46AM +0200, Ingo Schwarze wrote:
[...]
>
> The problem here is that just like it is *significantly* more difficult
> to write a good man(7) page than a good mdoc(7) page - due to the fact
> that the man(7) language is totally inadequate, by its fundamental
> design, to express semantic markup - it is *massively* more difficult
> to produce good HTML output from man(7) than from mdoc(7), and for
> exactly the same reason that makes writing man(7) so difficult
> for humans in the first place:  The man(7) language is a purely
> presentational language with no feature whatsoever to convey anything
> semantic, whereas the HTML language is a purely semantic language
> with no feature whatsoever for expressing anything presentational.
>
> Consequently, the mdoc(7) HTML formatter is very straightforward:
>   .Bl -tag    becomes <dl>
>   .Bl -bullet becomes <ul>
>   .Bl -enum   becomes <ol>
> and we are *certain* that we have met the manual page author's intention.
> End of the story, everybody is happy now.

Yup.  I've set some rules (as much as the Linux man-pages can set any
standard) for writing lists consistently in man(7) pages.  Maybe that
helps get better heuristics if authors follow them:

   Lists
       There are different kinds of lists:

       Tagged paragraphs
              These are used for a list of tags and their descriptions.
              When the tags are constants (either macros or numbers)
              they are in bold.  Use the .TP macro.

              An example is this "Tagged paragraphs" subsection itself.

       Ordered lists
              Elements are preceded by a number in parentheses (1), (2).
              These represent a set of steps that have an order.

              When there are substeps, they will be numbered like (4.2).

       Positional lists
              Elements are preceded by a number (index) in square
              brackets [4], [5].  These represent fields in a set.  The
              first index will be:

              0      When it represents fields of a C data structure, to
                     be consistent with arrays.
              1      When it represents fields of a file, to be
                     consistent with tools like cut(1).

       Alternatives list
              Elements are preceded by a letter in parentheses (a), (b).
              These represent a set of (normally) exclusive alternatives.

       Bullet lists
              Elements are preceded by bullet symbols (\[bu]).  Anything
              that doesn’t fit elsewhere is usually covered by this type
              of list.

       Numbered notes
              Not really a list, but the syntax is identical to
              "positional lists".

       There should always be exactly 2 spaces between the list symbol
       and the elements.  This doesn’t apply to "tagged paragraphs",
       which use the default indentation rules.

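To make the "better heuristics" idea concrete, here is a minimal Python
sketch of how a formatter might classify an .IP line if authors follow
the rules above.  It is only an illustration under stated assumptions:
the names classify_ip() and expected_indent() are hypothetical, the
patterns are one reading of the list rules, and none of this is code
from mandoc or groff.  It also splits naively on whitespace, so quoted
tags containing spaces are not handled.

```python
import re

# Hypothetical patterns derived from the list rules quoted above.
# "Numbered notes" share the [n] syntax, so they land in "positional".
_PATTERNS = [
    ("ordered",      re.compile(r"\(\d+(\.\d+)*\)")),  # (1), (4.2)
    ("positional",   re.compile(r"\[\d+\]")),          # [0], [4]
    ("alternatives", re.compile(r"\([a-z]\)")),        # (a), (b)
    ("bullet",       re.compile(r"\\\[bu\]|\\\(bu")),  # \[bu], \(bu
]


def classify_ip(line):
    """Return (kind, indent) for an .IP line, or None for other lines."""
    fields = line.split()
    if not fields or fields[0] != ".IP":
        return None
    if len(fields) < 2:
        return ("untagged", None)
    tag = fields[1]
    # The optional second argument is the indentation width.
    has_indent = len(fields) > 2 and fields[2].isdigit()
    indent = int(fields[2]) if has_indent else None
    for kind, pattern in _PATTERNS:
        if pattern.fullmatch(tag):
            return (kind, indent)
    return ("other", indent)


def expected_indent(tag):
    """Indent implied by the "exactly 2 spaces" rule for a given tag."""
    # A bullet escape renders as one glyph; other tags render literally.
    is_bullet = _PATTERNS[3][1].fullmatch(tag) is not None
    width = 1 if is_bullet else len(tag)
    return width + 2
```

Under these assumptions, `.IP \[bu] 3` is classified as a bullet list
and `.IP (1.1) 7` as an ordered list, and both indents agree with what
expected_indent() computes for their tags.  A tag such as `*` falls
into "other", which a formatter could then treat however it likes.
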
The Linux man-pages project already uses them, with only 2 exceptions:

	$ grep -rh '^\.IP .*' man* | sort | uniq -c
	     42 .IP (1) 5
	      1 .IP (1.1) 7
	      1 .IP (1.2)
	      1 .IP (1.3)
	      1 .IP (1.4)
	     42 .IP (2)
	      1 .IP (2.1) 7
	      1 .IP (2.2)
	     31 .IP (3)
	     20 .IP (4)
	     14 .IP (5)
	      1 .IP (5.1) 7
	      1 .IP (5.2)
	      6 .IP (6)
	      4 .IP (7)
	      3 .IP (8)
	      2 .IP (9)
	      7 .IP (a) 5
	      7 .IP (b)
	      2 .IP (c)
	     28 .IP *
	      5 .IP * 2
	      3 .IP [0] 5
	      3 .IP [1]
	      7 .IP [1] 5
	      9 .IP [2]
	      7 .IP [3]
	      5 .IP [4]
	      3 .IP [5]
	      3 .IP [6]
	      2 .IP [7]
	      1 .IP [8]
	     96 .IP \(bu 2
	   1017 .IP \[bu]
	    387 .IP \[bu] 3

The exceptions are from pages that I don't control (bpf-helpers.7 is
autogenerated; I've asked the tz project if they are interested in
using this style):

	$ grep -rl '\\(bu' man*
	man7/bpf-helpers.7
	$ grep -rl '\.IP \*' man*
	man5/tzfile.5

>
> For man(7), almost anything we do involves crude guesswork.
> Both .TP and .IP can become any of <dl>, <ul>, or <ol> and which
> is the right one has to be *guessed* from the content.
> In addition, while mdoc(7) makes it explicit where a list begins
> and where that list ends, in man(7), we have to *guess* whether
> any given .TP/.IP macro begins, continues, or ends a list.
>
> Look at https://cvsweb.bsd.lv/mandoc/man_html.c?rev=HEAD ,
> function man_IP_pre().  Right now, it maps
>   .IP *      to .Bl -bullet
>   .IP \(bu   to .Bl -bullet
>   .IP -      to .Bl -dash
>   anything else to .Bl -tag
> together with its helper function list_continues().

Seems good.  As you say, it's just missing \[bu].

>
> Now, the function list_continues() already recognizes \(bu as a
> potential marker for a bullet list.  But that buster manual page
> uses \[bu] instead.  I'm not saying that is wrong, mind you.
>
> What i am saying is that parsing and formatting the man(7) language
> is a nightmare and fragile as hell.  The fundamental design of that
> language is totally 1970ies-style, and it shows in each and every
> corner.  I'm not blaming Doug; at the time man(7) was designed, it was
> a monumental step forward, and nobody could have been expected to do
> better in the 1970ies.  But after Cynthia Livingston invented mdoc(7)
> in 1989/90 and Tim Berners-Lee invented HTML in 1989/90, the fundamental
> concept of man(7) was totally outdated with no redeeming qualities
> and no hope for healing, and it should have been completely abandoned
> by 1995 at the latest.  A documentation language that is essentially
> presentational obviously has no place in this millennium.
>
> All the same, this conversation has been useful, and i need to change
> three aspects of mandoc.  So, thanks for reporting!

You're welcome!

>
>  1. change the Makefile to always install mandoc.css
>  2. better document what mandoc.css is needed for,
>     what the embedded default CSS does and does not provide,
>     and that using a custom CSS file requires a high level of
>     proficiency in both the CSS and the mdoc(7) languages
>  3. teach list_continues() that \[bu] is the same as \(bu.
>
> But this is really an uphill battle.  As long as you rely on man(7),
> the fundamental design of that language implies that you will again
> and again bump into similar formatting issues - even with mandoc.
>
> Still, as long as man(7) is used in the wild, please keep reporting
> such issues.  I rarely look at man(7) pages, so i need reports from
> the field to be able to make progress.

:-)

Cheers,
Alex

>
> Yours,
>   Ingo

--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv