tech@mandoc.bsd.lv
From: Alejandro Colomar <alx@kernel.org>
To: tech@mandoc.bsd.lv
Subject: Re: mandoc -man -Thtml: unwanted line break after bullet (.IP)
Date: Wed, 18 Oct 2023 13:32:31 +0200
Message-ID: <ZS_CVZJ-Me80kZx5@debian>
In-Reply-To: <ZS8hHsrCp9Bn2/tt@asta-kit.de>

Hi Ingo,


On Wed, Oct 18, 2023 at 02:04:46AM +0200, Ingo Schwarze wrote:
[...]
> 
> The problem here is that just like it is *significantly* more difficult
> to write a good man(7) page than a good mdoc(7) page - due to the fact
> that the man(7) language is totally inadequate, by its fundamental
> design, to express semantic markup - it is *massively* more difficult
> to produce good HTML output from man(7) than from mdoc(7), and for
> exactly the same reason that makes writing man(7) so difficult
> for humans in the first place: The man(7) language is a purely
> presentational language with no feature whatsoever to convey anything
> semantic, whereas the HTML language is a purely semantic language
> with no feature whatsoever for expressing anything presentational.
> 
> Consequently, the mdoc(7) HTML formatter is very straightforward:
>   .Bl -tag      becomes   <dl>
>   .Bl -bullet   becomes   <ul>
>   .Bl -enum     becomes   <ol>
> and we are *certain* that we have met the manual page author's intention.
> End of the story, everybody is happy now.

Yup.  I've set some rules (as much as the Linux man-pages can set any
standard) for writing lists consistently in man(7) pages.  Maybe that
helps get better heuristics if authors follow them:

   Lists
       There are different kinds of lists:

       Tagged paragraphs
              These are used for a list of  tags  and  their  descrip‐
              tions.   When  the  tags are constants (either macros or
              numbers) they are in bold.  Use the .TP macro.

              An example is this "Tagged paragraphs" subsection itself.

       Ordered lists
              Elements are preceded by a number  in  parentheses  (1),
              (2).  These represent a set of steps that have an order.

              When  there  are  substeps,  they  will be numbered like
              (4.2).

       Positional lists
              Elements are preceded by  a  number  (index)  in  square
              brackets  [4],  [5].   These  represent fields in a set.
              The first index will be:

              0      When it represents fields of a C data  structure,
                     to be consistent with arrays.
              1      When  it  represents fields of a file, to be con‐
                     sistent with tools like cut(1).

       Alternatives list
              Elements are preceded by a letter  in  parentheses  (a),
              (b).   These represent a set of (normally) exclusive al‐
              ternatives.

       Bullet lists
              Elements are preceded by bullet symbols  (\[bu]).   Any‐
              thing  that  doesn’t fit elsewhere is usually covered by
              this type of list.

       Numbered notes
              Not really a list, but the syntax is identical to "posi‐
              tional lists".

       There should always be exactly 2 spaces between the list symbol
       and the elements.  This doesn’t apply to  "tagged  paragraphs",
       which use the default indentation rules.
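
To make the idea concrete, here is a minimal sketch in C of how a
formatter could recognize these list kinds from the first argument of
.IP.  All names here (mp_list, mp_classify, is_number) are hypothetical
and are not taken from mandoc or from the man-pages sources; it also
assumes the tag arrives as the raw source string, before escape
sequences are interpreted.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical list kinds, following the man-pages(7) conventions above. */
enum mp_list {
	MP_ORDERED,	/* (1), (2), (4.2) */
	MP_POSITIONAL,	/* [0], [1], [4]; also used for numbered notes */
	MP_ALTERNATIVE,	/* (a), (b) */
	MP_BULLET,	/* \[bu] */
	MP_OTHER	/* anything else, e.g. "*" or "\(bu" */
};

/*
 * Return 1 if s[0..len) consists of digits, optionally (if allow_dots)
 * separated by single dots, as in "4" or "4.2".  Return 0 otherwise.
 */
static int
is_number(const char *s, size_t len, int allow_dots)
{
	size_t	 i;
	int	 prev_digit = 0;

	for (i = 0; i < len; i++) {
		if (isdigit((unsigned char)s[i]))
			prev_digit = 1;
		else if (allow_dots && s[i] == '.' && prev_digit)
			prev_digit = 0;
		else
			return 0;
	}
	return prev_digit;	/* rejects "" and a trailing dot */
}

/* Classify the raw first argument of an .IP macro. */
enum mp_list
mp_classify(const char *tag)
{
	size_t	 len;

	assert(tag != NULL);
	len = strlen(tag);

	if (strcmp(tag, "\\[bu]") == 0)
		return MP_BULLET;
	if (len >= 3 && tag[0] == '(' && tag[len - 1] == ')') {
		if (is_number(tag + 1, len - 2, 1))
			return MP_ORDERED;
		if (len == 3 && islower((unsigned char)tag[1]))
			return MP_ALTERNATIVE;
	}
	if (len >= 3 && tag[0] == '[' && tag[len - 1] == ']' &&
	    is_number(tag + 1, len - 2, 0))
		return MP_POSITIONAL;
	return MP_OTHER;
}
```

With this sketch, mp_classify("(4.2)") yields MP_ORDERED and
mp_classify("[0]") yields MP_POSITIONAL, while tags outside the
convention, such as "*", fall through to MP_OTHER.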

The Linux man-pages project already uses them, with only 2 exceptions:

$ grep -rh '^\.IP .*' man* | sort | uniq -c
     42 .IP (1) 5
      1 .IP (1.1) 7
      1 .IP (1.2)
      1 .IP (1.3)
      1 .IP (1.4)
     42 .IP (2)
      1 .IP (2.1) 7
      1 .IP (2.2)
     31 .IP (3)
     20 .IP (4)
     14 .IP (5)
      1 .IP (5.1) 7
      1 .IP (5.2)
      6 .IP (6)
      4 .IP (7)
      3 .IP (8)
      2 .IP (9)
      7 .IP (a) 5
      7 .IP (b)
      2 .IP (c)
     28 .IP *
      5 .IP * 2
      3 .IP [0] 5
      3 .IP [1]
      7 .IP [1] 5
      9 .IP [2]
      7 .IP [3]
      5 .IP [4]
      3 .IP [5]
      3 .IP [6]
      2 .IP [7]
      1 .IP [8]
     96 .IP \(bu 2
   1017 .IP \[bu]
    387 .IP \[bu] 3

The exceptions are from pages that I don't control (bpf-helpers.7 is
autogenerated; I've asked the tz project if they are interested in using
this style):

$ grep -rl '\\(bu' man*
man7/bpf-helpers.7
$ grep -rl '\.IP \*' man*
man5/tzfile.5

> 
> For man(7), almost anything we do involves crude guesswork.
> Both .TP and .IP can become any of <dl>, <ul>, or <ol> and which
> is the right one has to be *guessed* from the content.
> In addition, while mdoc(7) makes it explicit where a list begins
> and where that list ends, in man(7), we have to *guess* whether
> any given .TP/.IP macro begins, continues, or ends a list.
> 
> Look at https://cvsweb.bsd.lv/mandoc/man_html.c?rev=HEAD ,
> function man_IP_pre().  Right now, it maps
>   .IP *           to   .Bl -bullet
>   .IP \(bu        to   .Bl -bullet
>   .IP -           to   .Bl -dash
>   anything else   to   .Bl -tag
> together with its helper function list_continues().

Seems good.  As you say, it's just missing \[bu].
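
For illustration only, the mapping quoted above with \[bu] added could
be sketched in C roughly as follows.  This is not the code from
man_html.c; the names ip_list and ip_classify are hypothetical, and the
sketch assumes the tag is compared as the raw source string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; not taken from mandoc's man_html.c. */
enum ip_list {
	IP_LIST_BULLET,	/* corresponds to .Bl -bullet */
	IP_LIST_DASH,	/* corresponds to .Bl -dash */
	IP_LIST_TAG	/* corresponds to .Bl -tag */
};

/* Classify the first argument of an .IP macro as written in the source. */
enum ip_list
ip_classify(const char *tag)
{
	assert(tag != NULL);

	/* "*", \(bu and \[bu] are all treated as bullet markers. */
	if (strcmp(tag, "*") == 0 ||
	    strcmp(tag, "\\(bu") == 0 ||
	    strcmp(tag, "\\[bu]") == 0)
		return IP_LIST_BULLET;
	if (strcmp(tag, "-") == 0)
		return IP_LIST_DASH;

	/* Anything else is handled like a tagged list. */
	return IP_LIST_TAG;
}
```

Here both spellings of the bullet escape land in the same branch, so a
page using \[bu] would get the same list type as one using \(bu.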

> 
> Now, the function list_continues() already recognizes \(bu as a
> potential marker for a bullet list.  But that buster manual page
> uses \[bu] instead.  I'm not saying that is wrong, mind you.
> 
> What i am saying is that parsing and formatting the man(7) language
> is a nightmare and fragile as hell.  The fundamental design of that
> language is totally 1970ies-style, and it shows in each and every
> corner.  I'm not blaming Doug; at the time man(7) was designed, it was
> a monumental step forward, and nobody could have been expected to do
> better in the 1970ies.  But after Cynthia Livingston invented mdoc(7)
> in 1989/90 and Tim Berners-Lee invented HTML in 1989/90, the fundamental
> concept of man(7) was totally outdated with no redeeming qualities
> and no hope for healing, and it should have been completely abandoned
> by 1995 at the latest.  A documentation language that is essentially
> presentational obviously has no place in this millennium.
> 
> All the same, this conversation has been useful, and i need to change
> three aspects of mandoc.  So, thanks for reporting!

You're welcome!

> 
>  1. change the Makefile to always install mandoc.css
>  2. better document what mandoc.css is needed for,
>     what the embedded default CSS does and does not provide,
>     and that using a custom CSS file requires a high level of
>     proficiency in both the CSS and the mdoc(7) languages
>  3. teach list_continues() that \[bu] is the same as \(bu.
> 
> But this is really an uphill battle.  As long as you rely on man(7),
> the fundamental design of that language implies that you will again
> and again bump into similar formatting issues - even with mandoc.
> 
> Still, as long as man(7) is used in the wild, please keep reporting
> such issues.  I rarely look at man(7) pages, so i need reports from
> the field to be able to make progress.

:-)

Cheers,
Alex

> 
> Yours,
>   Ingo
> --
>  To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
> 

-- 
<https://www.alejandro-colomar.es/>

Thread overview: 18+ messages
2023-10-16 13:17 Alejandro Colomar
2023-10-16 14:52 ` Ingo Schwarze
2023-10-16 15:20   ` Jan Stary
2023-10-16 15:43     ` Ingo Schwarze
2023-10-16 16:03     ` Ingo Schwarze
2023-10-16 17:10   ` Alejandro Colomar
2023-10-16 17:16     ` Alejandro Colomar
2023-10-16 17:28     ` Alejandro Colomar
2023-10-17 19:02       ` Ingo Schwarze
2023-10-17 21:39         ` Alejandro Colomar
2023-10-18  0:04           ` Ingo Schwarze
2023-10-18 11:32             ` Alejandro Colomar [this message]
2023-10-18 14:48             ` Ingo Schwarze
2023-10-18 14:56               ` Alejandro Colomar
2023-10-18 16:20             ` Ingo Schwarze
2023-10-18 18:52               ` Alejandro Colomar
2023-10-19 11:59             ` Ingo Schwarze
2023-10-19 12:48               ` Alejandro Colomar
