tech@mandoc.bsd.lv
From: Ingo Schwarze <schwarze@usta.de>
To: Alejandro Colomar <alx@kernel.org>
Cc: tech@mandoc.bsd.lv
Subject: Re: mandoc -man -Thtml: unwanted line break after bullet (.IP)
Date: Wed, 18 Oct 2023 02:04:46 +0200
Message-ID: <ZS8hHsrCp9Bn2/tt@asta-kit.de>
In-Reply-To: <ZS7_EQjHv3IK83i4@debian>

Hi Alejandro,

Alejandro Colomar wrote on Tue, Oct 17, 2023 at 11:39:23PM +0200:

> Here's what I see in the bookworm one: <https://i.imgur.com/EW9B5Cq.png>
> And buster: <https://i.imgur.com/6WBolqG.png>

Ah, thank you for pointing me to the specific place where you found
a difference.  Yes, you are right, there is a difference there.

However, that difference is caused by a difference in the manual page
source code, not by a difference in the formatter.

The bookworm manual page contains:

  First, though, a summary of a few details for the impatient:
  .IP \[bu] 3
  The macros that you most likely need to use in modern source code are

mandoc(1) renders that as:

  <p class="Pp">First, though, a summary of a few details
  for the impatient:</p>
  <dl class="Bl-tag">
    <dt>&#x2022;</dt>
    <dd>The macros that you most likely need to use in modern source...

that is, as a tagged list, because it is not (yet) smart enough to
figure out that ".IP \[bu]" is intended as a bullet list.

By contrast, the buster manual page contains:

  First, though a summary of a few details for the impatient:
  .IP * 3
  The macros that you most likely need to use in modern source code are

mandoc(1) renders that as:

  <p class="Pp">First, though a summary of a few details
  for the impatient:</p>
  <ul class="Bl-bullet">
    <li>The macros that you most likely need to use in modern source...

which is obviously much more visually pleasing and also makes more sense
semantically.

The problem here is that just like it is *significantly* more difficult
to write a good man(7) page than a good mdoc(7) page - due to the fact
that the man(7) language is totally inadequate, by its fundamental
design, to express semantic markup - it is *massively* more difficult
to produce good HTML output from man(7) than from mdoc(7), and for
exactly the same reason that makes writing man(7) so difficult
for humans in the first place: The man(7) language is a purely
presentational language with no feature whatsoever to convey anything
semantic, whereas the HTML language is a purely semantic language
with no feature whatsoever for expressing anything presentational.

Consequently, the mdoc(7) HTML formatter is very straightforward:
  .Bl -tag      becomes   <dl>
  .Bl -bullet   becomes   <ul>
  .Bl -enum     becomes   <ol>
and we are *certain* that we have met the manual page author's intention.
End of the story, everybody is happy now.

For man(7), almost anything we do involves crude guesswork.
Both .TP and .IP can become any of <dl>, <ul>, or <ol> and which
is the right one has to be *guessed* from the content.
In addition, while mdoc(7) makes it explicit where a list begins
and where that list ends, in man(7), we have to *guess* whether
any given .TP/.IP macro begins, continues, or ends a list.

Look at https://cvsweb.bsd.lv/mandoc/man_html.c?rev=HEAD ,
function man_IP_pre().  Right now, it maps
  .IP *           to   .Bl -bullet
  .IP \(bu        to   .Bl -bullet
  .IP -           to   .Bl -dash
  anything else   to   .Bl -tag
together with its helper function list_continues().
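
To illustrate, here is a minimal Python sketch of the mapping listed
above.  It is not the C code from man_html.c, only a restatement of
the table; the function name classify_ip_head() is made up for this
example, and it looks at nothing but the first argument of .IP.

```python
def classify_ip_head(head: str) -> str:
    """Guess a list type from the first argument of an .IP macro.

    Hypothetical helper mirroring the table above, not mandoc's code.
    """
    if head in ("*", r"\(bu"):
        return "bullet"   # treated like .Bl -bullet
    if head == "-":
        return "dash"     # treated like .Bl -dash
    return "tag"          # anything else: treated like .Bl -tag
```

With this table, classify_ip_head(r"\(bu") yields "bullet", whereas
classify_ip_head(r"\[bu]") falls through to "tag", which is the
tagged-list output shown for the bookworm page.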

Now, the function list_continues() already recognizes \(bu as a
potential marker for a bullet list.  But the bookworm manual page
uses \[bu] instead.  I'm not saying that is wrong, mind you.

What i am saying is that parsing and formatting the man(7) language
is a nightmare and fragile as hell.  The fundamental design of that
language is totally 1970ies-style, and it shows in each and every
corner.  I'm not blaming Doug; at the time man(7) was designed, it was
a monumental step forward, and nobody could have been expected to do
better in the 1970ies.  But after Cynthia Livingston invented mdoc(7)
in 1989/90 and Tim Berners-Lee invented HTML in 1989/90, the fundamental
concept of man(7) was totally outdated with no redeeming qualities
and no hope for healing, and it should have been completely abandoned
by 1995 at the latest.  A documentation language that is essentially
presentational obviously has no place in this millennium.

All the same, this conversation has been useful, and i need to change
three aspects of mandoc.  So, thanks for reporting!

 1. change the Makefile to always install mandoc.css
 2. better document what mandoc.css is needed for,
    what the embedded default CSS does and does not provide,
    and that using a custom CSS file requires a high level of
    proficiency in both the CSS and the mdoc(7) languages
 3. teach list_continues() that \[bu] is the same as \(bu.
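
Item 3 rests on the fact that, in roff, \[xx] with a two-character
name denotes the same special character as \(xx.  A rough Python
sketch of that normalization follows.  It is only an illustration
under simplifying assumptions, not the planned change to the C code:
the names normalize_special_chars() and is_bullet_head() are
invented here, and an escaped backslash before "[" is not handled.

```python
import re

# \[xx] with exactly two name characters is equivalent to \(xx.
_BRACKET_TWO_CHAR = re.compile(r"\\\[([^\]\s]{2})\]")


def normalize_special_chars(text: str) -> str:
    """Rewrite two-character \\[xx] escapes to the \\(xx form."""
    return _BRACKET_TWO_CHAR.sub(lambda m: "\\(" + m.group(1), text)


def is_bullet_head(head: str) -> bool:
    """Hypothetical check: does this .IP head look like a bullet?"""
    return normalize_special_chars(head) in ("*", r"\(bu")
```

After normalization, both ".IP \(bu" and ".IP \[bu]" are recognized
as bullet markers, while longer names such as \[rarrow] are left
untouched.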

But this is really an uphill battle.  As long as you rely on man(7),
the fundamental design of that language implies that you will again
and again bump into similar formatting issues - even with mandoc.

Still, as long as man(7) is used in the wild, please keep reporting
such issues.  I rarely look at man(7) pages, so i need reports from
the field to be able to make progress.

Yours,
  Ingo

