Hi Ingo,
On Wed, Oct 18, 2023 at 02:04:46AM +0200, Ingo Schwarze wrote:
[...]
>
> The problem here is that just like it is *significantly* more difficult
> to write a good man(7) page than a good mdoc(7) page - due to the fact
> that the man(7) language is totally inadequate, by its fundamental
> design, to express semantic markup - it is *massively* more difficult
> to produce good HTML output from man(7) than from mdoc(7), and for
> exactly the same reason that makes writing man(7) so difficult
> for humans in the first place: The man(7) language is a purely
> presentational language with no feature whatsoever to convey anything
> semantic, whereas the HTML language is a purely semantic language
> with no feature whatsoever for expressing anything presentational.
>
> Consequently, the mdoc(7) HTML formatter is very straightforward:
>   .Bl -tag    becomes <dl>
>   .Bl -bullet becomes <ul>
>   .Bl -enum   becomes <ol>
> and we are *certain* that we have met the manual page author's intention.
> End of the story, everybody is happy now.
Yup. I've set some rules (as much as the Linux man-pages can set any
standard) for writing lists consistently in man(7) pages. Maybe that
helps get better heuristics if authors follow them:
   Lists
       There are different kinds of lists:

       Tagged paragraphs
              These are used for a list of tags and their descrip‐
              tions.  When the tags are constants (either macros or
              numbers) they are in bold.  Use the .TP macro.

              An example is this "Tagged paragraphs" subsection itself.

       Ordered lists
              Elements are preceded by a number in parentheses (1),
              (2).  These represent a set of steps that have an order.

              When there are substeps, they will be numbered like
              (4.2).

       Positional lists
              Elements are preceded by a number (index) in square
              brackets [4], [5].  These represent fields in a set.
              The first index will be:

              0  When it represents fields of a C data structure,
                 to be consistent with arrays.

              1  When it represents fields of a file, to be con‐
                 sistent with tools like cut(1).

       Alternatives list
              Elements are preceded by a letter in parentheses (a),
              (b).  These represent a set of (normally) exclusive al‐
              ternatives.

       Bullet lists
              Elements are preceded by bullet symbols (\[bu]).  Any‐
              thing that doesn’t fit elsewhere is usually covered by
              this type of list.

       Numbered notes
              Not really a list, but the syntax is identical to "posi‐
              tional lists".

       There should always be exactly 2 spaces between the list symbol
       and the elements.  This doesn’t apply to "tagged paragraphs",
       which use the default indentation rules.
The Linux man-pages project already uses them, with only 2 exceptions:
$ grep -rh '^\.IP .*' man* | sort | uniq -c
42 .IP (1) 5
1 .IP (1.1) 7
1 .IP (1.2)
1 .IP (1.3)
1 .IP (1.4)
42 .IP (2)
1 .IP (2.1) 7
1 .IP (2.2)
31 .IP (3)
20 .IP (4)
14 .IP (5)
1 .IP (5.1) 7
1 .IP (5.2)
6 .IP (6)
4 .IP (7)
3 .IP (8)
2 .IP (9)
7 .IP (a) 5
7 .IP (b)
2 .IP (c)
28 .IP *
5 .IP * 2
3 .IP [0] 5
3 .IP [1]
7 .IP [1] 5
9 .IP [2]
7 .IP [3]
5 .IP [4]
3 .IP [5]
3 .IP [6]
2 .IP [7]
1 .IP [8]
96 .IP \(bu 2
1017 .IP \[bu]
387 .IP \[bu] 3
The exceptions are from pages that I don't control (bpf-helpers.7 is
autogenerated; I've asked the tz project if they are interested in using
this style):
$ grep -rl '\\(bu' man*
man7/bpf-helpers.7
$ grep -rl '\.IP \*' man*
man5/tzfile.5
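
To make the idea concrete, here is a minimal sketch of how a formatter might guess the list type from an .IP tag, assuming authors follow the rules above. It is only an illustration in Python, not code from mandoc or groff; the function name classify_ip_tag and the labels it returns are hypothetical. Note that "numbered notes" use the same syntax as positional lists, so a heuristic like this cannot tell the two apart.

```python
import re

# Hypothetical patterns, one per list type described in the excerpt above.
_PATTERNS = [
    ("ordered", re.compile(r"\(\d+(\.\d+)*\)")),   # (1), (2), (4.2)
    ("positional", re.compile(r"\[\d+\]")),        # [0], [4]
    ("alternatives", re.compile(r"\([a-z]\)")),    # (a), (b)
    ("bullet", re.compile(r"\\\[bu\]")),           # \[bu]
]


def classify_ip_tag(tag):
    """Guess the list type from the first argument of an .IP macro."""
    for kind, pattern in _PATTERNS:
        if pattern.fullmatch(tag):
            return kind
    # Tags such as "*" or "\(bu" do not follow the conventions.
    return "unknown"
```

With this sketch, the two exceptions found by the grep commands above (`*` and `\(bu`) would come out as "unknown", which is what makes them easy to spot.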
>
> For man(7), almost anything we do involves crude guesswork.
> Both .TP and .IP can become any of <dl>, <ul>, or <ol>, and which
> is the right one has to be *guessed* from the content.
> In addition, while mdoc(7) makes it explicit where a list begins
> and where that list ends, in man(7), we have to *guess* whether
> any given .TP/.IP macro begins, continues, or ends a list.
>
> Look at https://cvsweb.bsd.lv/mandoc/man_html.c?rev=HEAD ,
> function man_IP_pre(). Right now, it maps
>   .IP *          to .Bl -bullet
>   .IP \(bu       to .Bl -bullet
>   .IP -          to .Bl -dash
>   anything else  to .Bl -tag
> together with its helper function list_continues().
Seems good. As you say, it's just missing \[bu].
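
Just to illustrate what I understand, the mapping you describe, with \[bu] treated like \(bu, might look roughly like this. It is a Python sketch of the idea only, not the C code in man_html.c; the names normalize_tag and ip_list_type are made up for the example.

```python
def normalize_tag(tag):
    r"""Treat the two roff spellings of the bullet glyph, \(bu and \[bu], alike."""
    return r"\(bu" if tag == r"\[bu]" else tag


def ip_list_type(tag):
    """Map the first .IP argument to the mdoc(7) list type named above."""
    tag = normalize_tag(tag)
    if tag in ("*", r"\(bu"):
        return "bullet"
    if tag == "-":
        return "dash"
    # Anything else is treated as a tagged list.
    return "tag"
```

Normalizing the tag once up front would let both the mapping and the "does this list continue?" check share the same notion of a bullet.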
>
> Now, the function list_continues() already recognizes \(bu as a
> potential marker for a bullet list. But that buster manual page
> uses \[bu] instead. I'm not saying that is wrong, mind you.
>
> What i am saying is that parsing and formatting the man(7) language
> is a nightmare and fragile as hell. The fundamental design of that
> language is totally 1970s-style, and it shows in each and every
> corner. I'm not blaming Doug; at the time man(7) was designed, it was
> a monumental step forward, and nobody could have been expected to do
> better in the 1970s. But after Cynthia Livingston invented mdoc(7)
> in 1989/90 and Tim Berners-Lee invented HTML in 1989/90, the fundamental
> concept of man(7) was totally outdated with no redeeming qualities
> and no hope for healing, and it should have been completely abandoned
> by 1995 at the latest. A documentation language that is essentially
> presentational obviously has no place in this millennium.
>
> All the same, this conversation has been useful, and i need to change
> three aspects of mandoc. So, thanks for reporting!
You're welcome!
>
> 1. change the Makefile to always install mandoc.css
> 2. better document what mandoc.css is needed for,
> what the embedded default CSS does and does not provide,
> and that using a custom CSS file requires a high level of
> proficiency in both the CSS and the mdoc(7) languages
> 3. teach list_continues() that \[bu] is the same as \(bu.
>
> But this is really an uphill battle. As long as you rely on man(7),
> the fundamental design of that language implies that you will again
> and again bump into similar formatting issues - even with mandoc.
>
> Still, as long as man(7) is used in the wild, please keep reporting
> such issues. I rarely look at man(7) pages, so i need reports from
> the field to be able to make progress.
:-)
Cheers,
Alex
>
> Yours,
> Ingo