tech@mandoc.bsd.lv
From: Alejandro Colomar <alx@kernel.org>
To: Ingo Schwarze <schwarze@usta.de>
Cc: tech@mandoc.bsd.lv, "G. Branden Robinson" <branden@debian.org>
Subject: Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP
Date: Thu, 19 Oct 2023 17:17:10 +0200
Message-ID: <ZTFIfEt1T2eSHMLC@debian>
In-Reply-To: <ZTFBAV11eRPvDkWA@asta-kit.de>

On Thu, Oct 19, 2023 at 04:45:21PM +0200, Ingo Schwarze wrote:
> Hi Alejandro,
> 
> Alejandro Colomar wrote on Mon, Oct 16, 2023 at 07:22:11PM +0200:
> > On Mon, Oct 16, 2023 at 06:28:05PM +0200, Ingo Schwarze wrote:
> >> Alejandro Colomar wrote on Mon, Oct 16, 2023 at 01:32:30PM +0200:
> 
> >>> groff -man -Thtml seems to never produce a blank line before a TP.
> >> What do you mean by "blank line"?
> > What my eyes experience as a relatively large inter-paragraph space.
> 
> Heh.  That's not a very useful definition when talking about HTML code,
> given that the HTML language does not provide any non-deprecated syntax
> or semantics related to paragraph spacing.  :)

I know.  :)

> 
> >>> mandoc -man -Thtml produces one in some cases, and I can't see a
> >>> pattern.
> >>> 
> >>> I found this bug while reading feature_test_macros(7) in the Debian
> >>> online manpages:
> >>> <https://manpages.debian.org/bullseye/manpages/ftm.7.en.html>
> 
> >> You can see that particular page rendered here:
> >>   https://man.bsd.lv/Test/ftm.7
> 
> > I don't see the bug there.  I'm going to guess it's just another case of
> > a missing CSS file.
> 
> Actually, there *is* a problem with the HTML code in that page,
> even though you did not see it because the CSS file hides it.
> 
> That page contains this HTML code:
> 
>   <dl class="Bl-tag">
>     <dt><b>_ISOC11_SOURCE</b> (since glibc 2.16)</dt>
>     <dd>Exposes declarations consistent with the ISO C11 standard.
>       Defining this macro also enables C99 and C95 features
>       (like <b>_ISOC99_SOURCE</b>).</dd>
>   </dl>
>   <dl class="Bl-tag">
>     <dt></dt>
>     <dd>Invoking the C compiler with the option <i>-std=c11</i>
>       produces the same effects as defining this macro.</dd>
>   </dl>
>   <dl class="Bl-tag">
>     <dt><b>_LARGEFILE64_SOURCE</b></dt>
>     <dd>Expose definitions for the alternative API specified by the
>       LFS (Large File Summit) as a &quot;transitional extension&quot;
>       to the Single UNIX Specification. [...]
> 
> That's of course quite nonsensical because _ISOC11_SOURCE and
> _LARGEFILE64_SOURCE are intended by the page author as adjacent
> entries in the same list, but mandoc(1) puts them into different
> lists, with yet another single-item list in between.
> 
> >> It is a long page, and i have been unable to figure out what exactly
> >> you are talking about.
> >> 
> >> Please point me to the precise position in the file where vertical
> >> spacing before a .TP macro feels lacking or excessive to you,
> 
> > In the Debian bullseye page, check the inter-paragraph space before the
> > tag _LARGEFILE64_SOURCE (I see a long vertical space) and the tag
> > _LARGEFILE_SOURCE.
> 
> Now i see, thank you for pointing me to the specific place.
> 
> The trouble is caused by the following man(7) idiom:
> 
>   .TP
>   first tag
>   first body
>   .IP
>   still in the first body
>   .TP
>   second tag
>   second body

Hmm, that's an old enemy showing up.

> 
> The author's intent here is that the two .TP macros mark up adjacent
> items in the same list and the .IP marks up an ordinary paragraph
> break within the item body of the first list entry.
> 
> Now, using .IP for an ordinary paragraph break is no doubt surprising,
> but it works (from a purely presentational point of view) because .IP
> does assert the same vertical spacing as .PP would and because .IP
> asserts the same indentation as the previous .TP did.
> 
> So logically, what the author wanted was a list with two entries,
> one containing two paragraphs of text, the other containing one
> paragraph of text.  Technically, what we got is three paragraphs of
> text, all indented by the same amount, the first and last containing
> a tag and the middle one having an empty tag, with no indication that
> there is any relation between any of the paragraphs, let alone that
> they form a list.  Figuring out the logical list structure, if any,
> is left as a guessing exercise to the formatter.
> 
> Here the conceptual inadequacy of the man(7) language becomes
> blatantly obvious.  With very few exceptions, the language does not
> provide any concept of block nesting.  The only exception is that
> various macros can be nested inside .SH, .SS, and .RS.  But nothing
> can be nested inside a list item body, neither a paragraph break,
> nor .RS, nor another list.

I had this gripe with man(7) some years ago.  I thought of using the
following instead, which slightly complicates the source code, but makes
it more logical.

	$ cat nested_indent.man 
	.TH nested_indent 7 2023-10-19 experiments
	.SH Ingo said:
	.TP
	Todo
	Currently, when formatting .TP or .IP with a non-empty head,
	the HTML formatter looks at the previous and at the following
	abstract syntax tree (AST) node to figure out whether the
	tagged paragraph is part of a list.
	If that previous or following AST node is .IP or .RS with an
	empty head, it will have to iterate until it finds an AST node
	that is neither .IP nor .RS or has a non-empty head, evaluating
	the properties of that node instead of the directly preceding
	or following node.
	.RS
	.PP
	When formatting .IP or .RS with an empty head, mandoc needs
	to iterate backwards, searching for an AST node that is neither
	\&.IP nor .RS or has a non-empty head, and figure out whether that
	node is a list item, which again, as explained above, requires
	iterating both forwards and backwards.
	If it turns out we are inside a list, interrupting the list
	must be prevented.  Instead, .IP with an empty head must be
	formatted like .PP, and .RS with an empty head must be formatted
	somewhat like .br.
	.RE

As you can see, here the indentation is controlled by a single RS/RE
pair, and everything within it uses PP as a normal paragraph separator.
You could put the RS before the first paragraph, but then an unwanted
line break appears after the tag.  (Maybe man(7) could be tweaked so
that RS doesn't insert the line break after a TP.)

In the end I didn't switch to that scheme, because IP just worked, but
I might consider it if it proves to be useful.  What do you think?

[CC += Branden, in case he wants to add his opinion too]

Cheers,
Alex

> 
> Consequently, i think the fundamental design of the man(7) language
> is too weak to adequately express a list item containing more than a
> single paragraph of text, and the crude presentational workaround of
> splitting the item body into two unrelated paragraphs with different
> introducing macros indeed looks like the best workaround available.
> 
> The longer i look at the man(7) language, the more convinced i become
> that it is so rotten to the core that trying to provide a style
> guide to write good man(7) pages is nothing but a fool's errand,
> and trying to add a few semantical macros to the man(7) language is
> an even worse fool's errand because a few additions won't cure the
> fundamental design flaws.  If you put lipstick on a pig, it's still
> not going to win any Miss Espana contest.
> 
> Let me quote only one other example that i ran into just today.
> The first real-world example of .MR usage i encountered required
> a trailing \c escape sequence on the preceding line.  Now how
> ironic is that?  A brand-new macro introduced to improve semantics,
> but using it requires terribly arcane low-level presentational
> markup.  I'm progressively becoming convinced that the language
> is irredeemable.
> 
> 
> Consequently, the following needs to be done in mandoc:
> 
> 1. Currently, when formatting .TP or .IP with a non-empty head,
>    the HTML formatter looks at the previous and at the following
>    abstract syntax tree (AST) node to figure out whether the
>    tagged paragraph is part of a list.
>    If that previous or following AST node is .IP or .RS with an
>    empty head, it will have to iterate until it finds an AST node
>    that is neither .IP nor .RS or has a non-empty head, evaluating
>    the properties of that node instead of the directly preceding
>    or following node.
> 
> 2. When formatting .IP or .RS with an empty head, mandoc needs
>    to iterate backwards, searching for an AST node that is neither
>    .IP nor .RS or has a non-empty head, and figure out whether that
>    node is a list item, which again, as explained above, requires
>    iterating both forwards and backwards.
>    If it turns out we are inside a list, interrupting the list
>    must be prevented.  Instead, .IP with an empty head must be
>    formatted like .PP, and .RS with an empty head must be formatted
>    somewhat like .br.
> 
> Probably, doing all this in the HTML formatter module would be
> over the top.  I believe such complicated AST inspection should
> be done by the validation module (man_validate.c), which should
> set AST node flags similar to
> 
>  - this node starts a new list
>  - this node starts a new list item
>  - this node merely indicates a paragraph break
>  - this node ends a list
> 
> which the formatters can then readily use.
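
A rough sketch of how such a flag-setting pass could look, in C.
Everything in it is hypothetical: struct node, the K_* kinds, the F_*
flags, and the flat array of sibling nodes are made up for illustration
and are not mandoc's actual AST or the man_validate.c interface.  It
also assumes that every .TP, and every .IP with a non-empty head, counts
as a list item, so a lone tagged paragraph forms a one-item list.  A
single forward scan that remembers whether we are inside a list stands
in for the backward and forward iteration described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds; not mandoc's real AST. */
enum kind { K_TP, K_IP, K_RS, K_PP, K_OTHER };

/* Hypothetical flags mirroring the four bullet points above. */
enum {
	F_LIST_START = 1 << 0,	/* this node starts a new list */
	F_LIST_ITEM  = 1 << 1,	/* this node starts a new list item */
	F_PARA_BREAK = 1 << 2,	/* this node is merely a paragraph break */
	F_LIST_END   = 1 << 3	/* this node is the last one in a list */
};

struct node {
	enum kind	kind;
	int		has_head;	/* non-empty tag or head argument */
	unsigned	flags;
};

/* .TP or .IP with a non-empty head: a tagged paragraph. */
static int
is_tagged(const struct node *n)
{
	return (n->kind == K_TP || n->kind == K_IP) && n->has_head;
}

/* .IP or .RS with an empty head: may continue an item body. */
static int
is_cont(const struct node *n)
{
	return (n->kind == K_IP || n->kind == K_RS) && !n->has_head;
}

/* Walk the siblings once and set the flags on each node. */
static void
classify(struct node *v, size_t n)
{
	size_t	i, last = 0;
	int	in_list = 0;

	assert(v != NULL || n == 0);
	for (i = 0; i < n; i++) {
		v[i].flags = 0;
		if (is_tagged(&v[i])) {
			v[i].flags |= F_LIST_ITEM;
			if (!in_list)
				v[i].flags |= F_LIST_START;
			in_list = 1;
			last = i;
		} else if (is_cont(&v[i]) && in_list) {
			/* Do not interrupt the list. */
			v[i].flags |= F_PARA_BREAK;
			last = i;
		} else if (in_list) {
			/* Any other node closes the list. */
			v[last].flags |= F_LIST_END;
			in_list = 0;
		}
	}
	if (in_list)
		v[last].flags |= F_LIST_END;
}
```

For the .TP/.IP/.TP idiom followed by a .PP, this marks the first .TP
as list start plus item, the .IP as a paragraph break, and the second
.TP as item plus list end; the .PP gets no flags.
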
> 
> This required logic is so complicated that i won't code it right
> away; there are more urgent matters to be taken care of.
> Instead, i will add it to the mandoc TODO list.
> 
> Yours,
>   Ingo

-- 
<https://www.alejandro-colomar.es/>
