tech@mandoc.bsd.lv
From: Ingo Schwarze <schwarze@usta.de>
To: Alejandro Colomar <alx@kernel.org>
Cc: tech@mandoc.bsd.lv
Subject: Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP
Date: Thu, 19 Oct 2023 16:45:21 +0200
Message-ID: <ZTFBAV11eRPvDkWA@asta-kit.de>
In-Reply-To: <ZS1xRG55oh25kZGf@debian>

Hi Alejandro,

Alejandro Colomar wrote on Mon, Oct 16, 2023 at 07:22:11PM +0200:
> On Mon, Oct 16, 2023 at 06:28:05PM +0200, Ingo Schwarze wrote:
>> Alejandro Colomar wrote on Mon, Oct 16, 2023 at 01:32:30PM +0200:

>>> groff -man -Thtml seems to never produce a blank line before a TP.
>> What do you mean by "blank line"?
> What my eyes experience as a relatively large inter-paragraph space.

Heh.  That's not a very useful definition when talking about HTML code,
given that the HTML language does not provide any non-deprecated syntax
or semantics related to paragraph spacing.  :)

>>> mandoc -man -Thtml produces one in some cases, and I can't see a
>>> pattern.
>>> 
>>> I found this bug while reading feature_test_macros(7) in the Debian
>>> online manpages:
>>> <https://manpages.debian.org/bullseye/manpages/ftm.7.en.html>

>> You can see that particular page rendered here:
>>   https://man.bsd.lv/Test/ftm.7

> I don't see the bug there.  I'm going to guess it's just another case of
> a missing CSS file.

Actually, there *is* a problem with the HTML code in that page,
even though you did not see it because the CSS file hides it.

That page contains this HTML code:

  <dl class="Bl-tag">
    <dt><b>_ISOC11_SOURCE</b> (since glibc 2.16)</dt>
    <dd>Exposes declarations consistent with the ISO C11 standard.
      Defining this macro also enables C99 and C95 features
      (like <b>_ISOC99_SOURCE</b>).</dd>
  </dl>
  <dl class="Bl-tag">
    <dt></dt>
    <dd>Invoking the C compiler with the option <i>-std=c11</i>
      produces the same effects as defining this macro.</dd>
  </dl>
  <dl class="Bl-tag">
    <dt><b>_LARGEFILE64_SOURCE</b></dt>
    <dd>Expose definitions for the alternative API specified by the
      LFS (Large File Summit) as a &quot;transitional extension&quot;
      to the Single UNIX Specification. [...]

That's of course quite nonsensical because _ISOC11_SOURCE and
_LARGEFILE64_SOURCE are intended by the page author as adjacent
entries in the same list, but mandoc(1) puts them into different
lists, with yet another single-item list in between.

>> It is a long page, and i have been unable to figure out what exactly
>> you are talking about.
>> 
>> Please point me to the precise position in the file where vertical
>> spacing before a .TP macro feels lacking or excessive to you,

> In the Debian bullseye page, check the inter-paragraph space before the
> tag _LARGEFILE64_SOURCE (I see a long vertical space) and the tag
> _LARGEFILE_SOURCE.

Now i see, thank you for pointing me to the specific place.

The trouble is caused by the following man(7) idiom:

  .TP
  first tag
  first body
  .IP
  still in the first body
  .TP
  second tag
  second body

The author's intent here is that the two .TP macros mark up adjacent
items in the same list and the .IP marks up an ordinary paragraph
break within the item body of the first list entry.

Now, using .IP for an ordinary paragraph break is no doubt surprising,
but it works (from a purely presentational point of view) because .IP
does assert the same vertical spacing as .PP would and because .IP
asserts the same indentation as the previous .TP did.

So logically, what the author wanted was a list with two entries,
one containing two paragraphs of text, the other containing one
paragraph of text.  Technically, what we got is three paragraphs of
text, all indented by the same amount, the first and last containing
a tag and the middle one having an empty tag, with no indication that
there is any relation between any of the paragraphs, let alone that
they form a list.  Figuring out the logical list structure, if any,
is left as a guessing exercise to the formatter.
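To make the flat structure concrete, here is a minimal sketch in
Python (chosen for brevity; mandoc itself is written in C).  It is not
mandoc's parser.  The names Para and flat_paragraphs are hypothetical,
and it only understands .TP and .IP, taking the first word after .IP
as the tag.  It shows that the idiom above yields three sibling
paragraphs, the middle one with an empty tag, and nothing that records
a list.

```python
from typing import NamedTuple


class Para(NamedTuple):
    macro: str        # "TP" or "IP"
    head: str         # tag text; empty for a bare .IP
    body: list        # text lines belonging to this paragraph


def flat_paragraphs(source: str) -> list:
    """Split man(7) source into sibling paragraphs; nothing nests."""
    paras = []
    want_tp_head = False
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("."):
            name, _, arg = line[1:].partition(" ")
            if name == "TP":
                # The tag of .TP is the next input line.
                paras.append(Para("TP", "", []))
                want_tp_head = True
            elif name == "IP":
                # Simplification: the first word after .IP is the tag.
                words = arg.split()
                paras.append(Para("IP", words[0] if words else "", []))
                want_tp_head = False
            continue  # all other macros are ignored in this sketch
        if want_tp_head:
            paras[-1] = paras[-1]._replace(head=line)
            want_tp_head = False
        elif paras:
            paras[-1].body.append(line)
    return paras
```

The result is a plain sequence: any grouping into list items has to be
inferred afterwards by whoever consumes it.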

Here the conceptual inadequacy of the man(7) language becomes
blatantly obvious.  With very few exceptions, the language does not
provide any concept of block nesting.  The exceptions are that
various macros can be nested inside .SH, .SS, and .RS.  But nothing
can be nested inside a list item body, neither a paragraph break,
nor .RS, nor another list.

Consequently, i think the fundamental design of the man(7) language
is too weak to adequately express a list item containing more than a
single paragraph of text, and the crude presentational workaround of
splitting the item body into two unrelated paragraphs with different
introducing macros indeed looks like the best workaround available.

The longer i look at the man(7) language, the more convinced i become
that it is so rotten to the core that trying to provide a style
guide to write good man(7) pages is nothing but a fool's errand,
and trying to add a few semantic macros to the man(7) language is
an even worse fool's errand because a few additions won't cure the
fundamental design flaws.  If you put lipstick on a pig, it's still
not going to win any Miss Espana contest.

Let me quote only one other example that i ran into just today.
The first real-world example of .MR usage i encountered required
a trailing \c escape sequence on the preceding line.  Now how
ironic is that?  A brand-new macro introduced to improve semantics,
but using it requires terribly arcane low-level presentational
markup.  I'm progressively becoming convinced that the language
is irredeemable.


Consequently, the following needs to be done in mandoc:

1. Currently, when formatting .TP or .IP with a non-empty head,
   the HTML formatter looks at the previous and at the following
   abstract syntax tree (AST) node to figure out whether the
   tagged paragraph is part of a list.
   If that previous or following AST node is .IP or .RS with an
   empty head, it will have to iterate until it finds an AST node
   that is neither .IP nor .RS or has a non-empty head, evaluating
   the properties of that node instead of the directly preceding
   or following node.

2. When formatting .IP or .RS with an empty head, mandoc needs
   to iterate backwards, searching for an AST node that is neither
   .IP nor .RS or has a non-empty head, and figure out whether that
   node is a list item, which again, as explained above, requires
   iterating both forwards and backwards.
   If it turns out we are inside a list, interrupting the list
   must be prevented.  Instead, .IP with an empty head must be
   formatted like .PP, and .RS with an empty head must be formatted
   somewhat like .br.
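
The two steps above can be sketched as follows.  This is Python for
brevity, not mandoc code, and every name in it (Node,
significant_neighbour, tagged_in_list, continuation_in_list) is
hypothetical.  It also assumes a criterion the steps leave open:
a tagged paragraph counts as part of a list when its nearest
significant neighbour on either side is itself a tagged paragraph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    macro: str       # "TP", "IP", "RS", "PP", ...
    head: str = ""   # empty string models an empty head


def is_continuation(n: Node) -> bool:
    """.IP or .RS with an empty head."""
    return n.macro in ("IP", "RS") and n.head == ""


def is_tagged(n: Node) -> bool:
    """.TP or .IP with a non-empty head."""
    return n.macro in ("TP", "IP") and n.head != ""


def significant_neighbour(nodes, i: int, step: int) -> Optional[Node]:
    """Walk from index i in direction step (+1 or -1), skipping
    empty-head .IP/.RS nodes; return the first other node, if any."""
    j = i + step
    while 0 <= j < len(nodes) and is_continuation(nodes[j]):
        j += step
    return nodes[j] if 0 <= j < len(nodes) else None


def tagged_in_list(nodes, i: int) -> bool:
    """Step 1 (assumed criterion): a significant neighbour is tagged."""
    for step in (-1, 1):
        n = significant_neighbour(nodes, i, step)
        if n is not None and is_tagged(n):
            return True
    return False


def continuation_in_list(nodes, i: int) -> bool:
    """Step 2: walk backwards to the node owning this paragraph break,
    then ask whether that node is a list item."""
    j = i
    while j >= 0 and is_continuation(nodes[j]):
        j -= 1
    return j >= 0 and is_tagged(nodes[j]) and tagged_in_list(nodes, j)
```

For the sequence .TP, empty .IP, .TP, continuation_in_list() is true
for the middle node, so a formatter would render it as a paragraph
break rather than closing the list.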

Probably, doing all this in the HTML formatter module would be
over the top.  I believe such complicated AST inspection should
be done by the validation module (man_validate.c), which should
set AST node flags similar to

 - this node starts a new list
 - this node starts a new list item
 - this node merely indicates a paragraph break
 - this node ends a list

which the formatters can then readily use.
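
As a rough illustration of such flags, here is a sketch in Python, not
a proposal for the actual man_validate.c code.  ListFlag and annotate
are hypothetical names, nodes are reduced to (macro, head) pairs, and
two assumptions are made: a run of tagged paragraphs and empty-head
.IP/.RS nodes is treated as a list only if it contains at least two
tagged paragraphs, and the "ends a list" flag is put on the last node
of the run.

```python
from enum import Flag, auto


class ListFlag(Flag):
    NONE = 0
    LIST_START = auto()   # this node starts a new list
    ITEM_START = auto()   # this node starts a new list item
    PARA_BREAK = auto()   # this node merely indicates a paragraph break
    LIST_END = auto()     # this node ends a list


def annotate(nodes):
    """nodes: (macro, head) pairs in document order.
    Returns one ListFlag value per node."""

    def tagged(n):
        return n[0] in ("TP", "IP") and n[1] != ""

    def continuation(n):
        return n[0] in ("IP", "RS") and n[1] == ""

    flags = [ListFlag.NONE] * len(nodes)
    i = 0
    while i < len(nodes):
        if not tagged(nodes[i]):
            i += 1
            continue
        # Extend the run over tagged paragraphs and empty-head .IP/.RS.
        j = i
        while j + 1 < len(nodes) and (
                tagged(nodes[j + 1]) or continuation(nodes[j + 1])):
            j += 1
        items = [k for k in range(i, j + 1) if tagged(nodes[k])]
        if len(items) >= 2:   # assumption: one tagged paragraph is no list
            for k in range(i, j + 1):
                flags[k] = (ListFlag.ITEM_START if tagged(nodes[k])
                            else ListFlag.PARA_BREAK)
            flags[i] |= ListFlag.LIST_START
            flags[j] |= ListFlag.LIST_END
        i = j + 1
    return flags
```

With flags like these precomputed, a formatter would only need to test
bits on the current node instead of inspecting its neighbours.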

This required logic is so complicated that i won't code it right
away; there are more urgent matters to be taken care of.
Instead, i will add it to the mandoc TODO list.

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv

