tech@mandoc.bsd.lv
* mandoc -man -Thtml bug: inconsistent vertical space before .TP
@ 2023-10-16 11:32 Alejandro Colomar
  2023-10-16 16:28 ` Ingo Schwarze
  0 siblings, 1 reply; 8+ messages in thread
From: Alejandro Colomar @ 2023-10-16 11:32 UTC
  To: Ingo Schwarze; +Cc: tech


Hi Ingo,

groff -man -Thtml seems to never produce a blank line before a TP.
mandoc -man -Thtml produces one in some cases, and I can't see a
pattern.

I found this bug while reading feature_test_macros(7) in the Debian
online manpages:
<https://manpages.debian.org/bullseye/manpages/ftm.7.en.html>

To reproduce this bug, run the following in the Linux man-pages repo:

  groff -man -T html man7/feature_test_macros.7 > ftm.g.html
  open ftm.g.html
  mandoc -man -T html man7/feature_test_macros.7 > ftm.m.html
  open ftm.m.html


Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>



* Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP
  2023-10-16 11:32 mandoc -man -Thtml bug: inconsistent vertical space before .TP Alejandro Colomar
@ 2023-10-16 16:28 ` Ingo Schwarze
  2023-10-16 17:22   ` Alejandro Colomar
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Schwarze @ 2023-10-16 16:28 UTC
  To: Alejandro Colomar; +Cc: tech

Hi Alejandro,

Alejandro Colomar wrote on Mon, Oct 16, 2023 at 01:32:30PM +0200:

> groff -man -Thtml seems to never produce a blank line before a TP.

What do you mean by "blank line"?
I'm not aware that any such concept is defined in HTML,
at least not outside the "pre" element (and possibly a few
elements similar to "pre").

> mandoc -man -Thtml produces one in some cases, and I can't see a
> pattern.
> 
> I found this bug while reading feature_test_macros(7) in the Debian
> online manpages:
> <https://manpages.debian.org/bullseye/manpages/ftm.7.en.html>
> 
> To reproduce this bug, run the following in the Linux man-pages repo:
> 
>   groff -man -T html man7/feature_test_macros.7 > ftm.g.html
>   open ftm.g.html
>   mandoc -man -T html man7/feature_test_macros.7 > ftm.m.html
>   open ftm.m.html

You can see that particular page rendered here:

  https://man.bsd.lv/Test/ftm.7

It is a long page, and i have been unable to figure out what exactly
you are talking about.

Please point me to the precise position in the file where vertical
spacing before a .TP macro feels lacking or excessive to you,
for example by giving the exact text following that .TP macro,
and ideally also provide an example of where you feel vertical
spacing is about right.

Also note that comparing to groff -T html output is completely
useless because groff -T html output is of utterly atrocious
quality, violating the HTML standard in almost every way imaginable.

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv



* Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP
  2023-10-16 16:28 ` Ingo Schwarze
@ 2023-10-16 17:22   ` Alejandro Colomar
  2023-10-19 14:45     ` Ingo Schwarze
  0 siblings, 1 reply; 8+ messages in thread
From: Alejandro Colomar @ 2023-10-16 17:22 UTC
  To: tech


Hi Ingo,

On Mon, Oct 16, 2023 at 06:28:05PM +0200, Ingo Schwarze wrote:
> Hi Alejandro,
> 
> Alejandro Colomar wrote on Mon, Oct 16, 2023 at 01:32:30PM +0200:
> 
> > groff -man -Thtml seems to never produce a blank line before a TP.
> 
> What do you mean by "blank line"?

What my eyes experience as a relatively large inter-paragraph space.

> I'm not aware that any such concept is defined in HTML,
> at least not outside the "pre" element (and possibly a few
> elements similar to "pre").
> 
> > mandoc -man -Thtml produces one in some cases, and I can't see a
> > pattern.
> > 
> > I found this bug while reading feature_test_macros(7) in the Debian
> > online manpages:
> > <https://manpages.debian.org/bullseye/manpages/ftm.7.en.html>
> > 
> > To reproduce this bug, run the following in the Linux man-pages repo:
> > 
> >   groff -man -T html man7/feature_test_macros.7 > ftm.g.html
> >   open ftm.g.html
> >   mandoc -man -T html man7/feature_test_macros.7 > ftm.m.html
> >   open ftm.m.html
> 
> You can see that particular page rendered here:
> 
>   https://man.bsd.lv/Test/ftm.7

I don't see the bug there.  I'm going to guess it's just another case of
a missing CSS file.

> 
> It is a long page, and i have been unable to figure out what exactly
> you are talking about.
> 
> Please point me to the precise position in the file where vertical
> spacing before a .TP macro feels lacking or excessive to you,

In the Debian bullseye page, check the inter-paragraph space before the
tag _LARGEFILE64_SOURCE (I see a long vertical space) and the tag
_LARGEFILE_SOURCE.

> for example by giving the exact text following that .TP macro,

The lines after the tags say
"Expose definitions for the alternative API specified by the LFS"
and
"This macro was historically used to expose certain functions"
respectively.

> and ideally also provide an example of where you feel vertical
> spacing is about right.

The page you posted in <man.bsd.lv> seems right to me.

> 
> Also note that comparing to groff -T html output is completely
> useless because groff -T html output is of utterly atrocious
> quality, violating the HTML standard in almost every way imaginable.

:)

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>



* Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP
  2023-10-16 17:22   ` Alejandro Colomar
@ 2023-10-19 14:45     ` Ingo Schwarze
  2023-10-19 15:10       ` Ingo Schwarze
  2023-10-19 15:17       ` Alejandro Colomar
  0 siblings, 2 replies; 8+ messages in thread
From: Ingo Schwarze @ 2023-10-19 14:45 UTC
  To: Alejandro Colomar; +Cc: tech

Hi Alejandro,

Alejandro Colomar wrote on Mon, Oct 16, 2023 at 07:22:11PM +0200:
> On Mon, Oct 16, 2023 at 06:28:05PM +0200, Ingo Schwarze wrote:
>> Alejandro Colomar wrote on Mon, Oct 16, 2023 at 01:32:30PM +0200:

>>> groff -man -Thtml seems to never produce a blank line before a TP.
>> What do you mean by "blank line"?
> What my eyes experience as a relatively large inter-paragraph space.

Heh.  That's not a very useful definition when talking about HTML code,
given that the HTML language does not provide any non-deprecated syntax
or semantics related to paragraph spacing.  :)

>>> mandoc -man -Thtml produces one in some cases, and I can't see a
>>> pattern.
>>> 
>>> I found this bug while reading feature_test_macros(7) in the Debian
>>> online manpages:
>>> <https://manpages.debian.org/bullseye/manpages/ftm.7.en.html>

>> You can see that particular page rendered here:
>>   https://man.bsd.lv/Test/ftm.7

> I don't see the bug there.  I'm going to guess it's just another case of
> a missing CSS file.

Actually, there *is* a problem with the HTML code in that page,
even though you did not see it because the CSS file hides it.

That page contains this HTML code:

  <dl class="Bl-tag">
    <dt><b>_ISOC11_SOURCE</b> (since glibc 2.16)</dt>
    <dd>Exposes declarations consistent with the ISO C11 standard.
      Defining this macro also enables C99 and C95 features
      (like <b>_ISOC99_SOURCE</b>).</dd>
  </dl>
  <dl class="Bl-tag">
    <dt></dt>
    <dd>Invoking the C compiler with the option <i>-std=c11</i>
      produces the same effects as defining this macro.</dd>
  </dl>
  <dl class="Bl-tag">
    <dt><b>_LARGEFILE64_SOURCE</b></dt>
    <dd>Expose definitions for the alternative API specified by the
      LFS (Large File Summit) as a &quot;transitional extension&quot;
      to the Single UNIX Specification. [...]

That's of course quite nonsensical because _ISOC11_SOURCE and
_LARGEFILE64_SOURCE are intended by the page author as adjacent
entries in the same list, but mandoc(1) puts them into different
lists, with yet another single-item list in between.
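For comparison, the structure the page author presumably had in mind
is one list with two entries, the first entry holding two paragraphs.
The following is only a minimal Python sketch of that target shape,
not mandoc code: render_list is a hypothetical helper, and wrapping
each body paragraph in a <p> element is an assumption about how a
multi-paragraph item body might be expressed.

```python
from html import escape


def render_list(items):
    """Render (tag, paragraphs) pairs as ONE definition list.

    Hypothetical helper for illustration; not part of mandoc.
    """
    out = ['<dl class="Bl-tag">']
    for tag, paragraphs in items:
        out.append(f"  <dt>{escape(tag)}</dt>")
        out.append("  <dd>")
        # Every paragraph of the item body stays inside the same <dd>.
        for text in paragraphs:
            out.append(f"    <p>{escape(text)}</p>")
        out.append("  </dd>")
    out.append("</dl>")
    return "\n".join(out)


page = render_list([
    ("_ISOC11_SOURCE", [
        "Exposes declarations consistent with the ISO C11 standard.",
        "Invoking the C compiler with the option -std=c11 "
        "produces the same effects as defining this macro.",
    ]),
    ("_LARGEFILE64_SOURCE", [
        "Expose definitions for the alternative API specified by the LFS.",
    ]),
])
print(page)
```

The point is merely that both tags end up as entries of the same
list and the second paragraph no longer gets a list of its own.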

>> It is a long page, and i have been unable to figure out what exactly
>> you are talking about.
>> 
>> Please point me to the precise position in the file where vertical
>> spacing before a .TP macro feels lacking or excessive to you,

> In the Debian bullseye page, check the inter-paragraph space before the
> tag _LARGEFILE64_SOURCE (I see a long vertical space) and the tag
> _LARGEFILE_SOURCE.

Now i see, thank you for pointing me to the specific place.

The trouble is caused by the following man(7) idiom:

  .TP
  first tag
  first body
  .IP
  still in the first body
  .TP
  second tag
  second body

The author's intent here is that the two .TP macros mark up adjacent
items in the same list and the .IP marks up an ordinary paragraph
break within the item body of the first list entry.

Now, using .IP for an ordinary paragraph break is no doubt surprising,
but it works (from a purely presentational point of view) because .IP
does assert the same vertical spacing as .PP would and because .IP
asserts the same indentation as the previous .TP did.

So logically, what the author wanted was a list with two entries,
one containing two paragraphs of text, the other containing one
paragraph of text.  Technically, what we got is three paragraphs of
text, all indented by the same amount, the first and last containing
a tag and the middle one having an empty tag, with no indication that
there is any relation between any of the paragraphs, let alone that
they form a list.  Figuring out the logical list structure, if any,
is left as a guessing exercise to the formatter.

Here the conceptual inadequacy of the man(7) language becomes
blatantly obvious.  With very few exceptions, the language does not
provide any concept of block nesting.  The only exception is that
various macros can be nested inside .SH, .SS, and .RS.  But nothing
can be nested inside a list item body, neither a paragraph break,
nor .RS, nor another list.

Consequently, i think the fundamental design of the man(7) language
is too weak to adequately express a list item containing more than a
single paragraph of text, and the crude presentational workaround of
splitting the item body into two unrelated paragraphs with different
introducing macros indeed looks like the best workaround available.

The longer i look at the man(7) language, the more convinced i become
that it is so rotten to the core that trying to provide a style
guide to write good man(7) pages is nothing but a fool's errand,
and trying to add a few semantical macros to the man(7) language is
an even worse fool's errand because a few additions won't cure the
fundamental design flaws.  If you put lipstick on a pig, it's still
not going to win any Miss Espana contest.

Let me quote only one other example that i ran into just today.
The first real-world example of .MR usage i encountered required
a trailing \c escape sequence on the preceding line.  Now how
ironic is that?  A brand-new macro introduced to improve semantics,
but using it requires terribly arcane low-level presentational
markup.  I'm progressively becoming convinced that the language
is irredeemable.


Consequently, the following needs to be done in mandoc:

1. Currently, when formatting .TP or .IP with a non-empty head,
   the HTML formatter looks at the previous and at the following
   abstract syntax tree (AST) node to figure out whether the
   tagged paragraph is part of a list.
   If that previous or following AST node is .IP or .RS with an
   empty head, it will have to iterate until it finds an AST node
   that is neither .IP nor .RS or has a non-empty head, evaluating
   the properties of that node instead of the directly preceding
   or following node.

2. When formatting .IP or .RS with an empty head, mandoc needs
   to iterate backwards, searching for an AST node that is neither
   .IP nor .RS or has a non-empty head, and figure out whether that
   node is a list item, which again, as explained above, requires
   iterating both forwards and backwards.
   If it turns out we are inside a list, interrupting the list
   must be prevented.  Instead, .IP with an empty head must be
   formatted like .PP, and .RS with an empty head must be formatted
   somewhat like .br.

Probably, doing all this in the HTML formatter module would be
over the top.  I believe such complicated AST inspection should
be done by the validation module (man_validate.c), which should
set AST node flags similar to

 - this node starts a new list
 - this node starts a new list item
 - this node merely indicates a paragraph break
 - this node ends a list

which the formatters can then readily use.
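
The flag assignment described above can be sketched roughly as
follows.  This is a Python model for illustration only, not mandoc
code: the Node class and the ListFlag names are hypothetical, and it
makes the simplifying assumption that every .TP or .IP with a
non-empty head is a list item.  The real pass would live in
man_validate.c and work on mandoc's own node structures.

```python
from dataclasses import dataclass
from enum import Flag, auto


class ListFlag(Flag):
    NONE = 0
    LIST_START = auto()   # this node starts a new list
    ITEM_START = auto()   # this node starts a new list item
    PARA_BREAK = auto()   # this node merely indicates a paragraph break
    LIST_END = auto()     # this node is the last one belonging to a list


@dataclass
class Node:
    macro: str                      # "TP", "IP", "RS", "PP", ...
    head_empty: bool = True
    flags: ListFlag = ListFlag.NONE


def is_item(node):
    # Assumption: a tagged paragraph with a non-empty head is a list item.
    return node.macro in ("TP", "IP") and not node.head_empty


def is_continuation(node):
    return node.macro in ("IP", "RS") and node.head_empty


def flag_lists(siblings):
    """Set list flags on a flat sequence of sibling nodes."""
    last = None  # last node of the currently open list, if any
    for node in siblings:
        if is_item(node):
            node.flags |= ListFlag.ITEM_START
            if last is None:
                node.flags |= ListFlag.LIST_START
            last = node
        elif is_continuation(node) and last is not None:
            # Empty-head .IP/.RS inside a list must not interrupt it.
            node.flags |= ListFlag.PARA_BREAK
            last = node
        else:
            if last is not None:
                last.flags |= ListFlag.LIST_END
            last = None
    if last is not None:
        last.flags |= ListFlag.LIST_END
    return siblings


# The .TP .IP .TP idiom, followed by an unrelated .PP.
nodes = flag_lists([
    Node("TP", head_empty=False),
    Node("IP"),
    Node("TP", head_empty=False),
    Node("PP"),
])
for n in nodes:
    print(n.macro, n.flags)
```

Because the state of the open list is carried along in one forward
pass, no node needs to search backwards or forwards on its own; the
formatters would only have to look at the flags.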

This required logic is so complicated that i won't code it right
away, there are more urgent matters to be taken care of.
Instead, i will add it to the mandoc TODO list.

Yours,
  Ingo



* Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP
  2023-10-19 14:45     ` Ingo Schwarze
@ 2023-10-19 15:10       ` Ingo Schwarze
  2023-10-19 15:17       ` Alejandro Colomar
  1 sibling, 0 replies; 8+ messages in thread
From: Ingo Schwarze @ 2023-10-19 15:10 UTC
  To: Alejandro Colomar; +Cc: tech

Hi Alejandro,

Ingo Schwarze wrote on Thu, Oct 19, 2023 at 04:45:21PM +0200:

> Consequently, the following needs to be done in mandoc:
> 
> 1. Currently, when formatting .TP or .IP with a non-empty head,
>    the HTML formatter looks at the previous and at the following
>    abstract syntax tree (AST) node to figure out whether the
>    tagged paragraph is part of a list.
>    If that previous or following AST node is .IP or .RS with an
>    empty head, it will have to iterate until it finds an AST node
>    that is neither .IP nor .RS or has a non-empty head, evaluating
>    the properties of that node instead of the directly preceding
>    or following node.
> 
> 2. When formatting .IP or .RS with an empty head, mandoc needs
>    to iterate backwards, searching for an AST node that is neither
>    .IP nor .RS or has a non-empty head, and figure out whether that
>    node is a list item, which again, as explained above, requires
>    iterating both forwards and backwards.
>    If it turns out we are inside a list, interrupting the list
>    must be prevented.  Instead, .IP with an empty head must be
>    formatted like .PP, and .RS with an empty head must be formatted
>    somewhat like .br.
> 
> Probably, doing all this in the HTML formatter module would be
> over the top.  I believe such complicated AST inspection should
> be done by the validation module (man_validate.c), which should
> set AST node flags similar to
> 
>  - this node starts a new list
>  - this node starts a new list item
>  - this node merely indicates a paragraph break
>  - this node ends a list
> 
> which the formatters can then readily use.
> 
> This required logic is so complicated that i won't code it right
> away, there are more urgent matters to be taken care of.
> Instead, i will add it to the mandoc TODO list.

FYI: Added, see the commit below.
  Ingo


Log Message:
-----------
new entries: .MR and .TP .IP .TP

Modified Files:
--------------
    mandoc:
        TODO

Revision Data
-------------
Index: TODO
===================================================================
RCS file: /home/cvs/mandoc/mandoc/TODO,v
retrieving revision 1.329
retrieving revision 1.330
diff -LTODO -LTODO -u -p -r1.329 -r1.330
--- TODO
+++ TODO
@@ -239,6 +239,9 @@ are mere guesses, and some may be wrong.
 
 --- missing man features -----------------------------------------------
 
+- groff_man(7) .MR
+  loc **  exist *  algo *  size *  imp ***
+
 - MANWIDTH
   Markus Waldeck <waldeck at gmx dot de> 9 Jun 2015 05:49:56 +0200
   Laura Morales <lauretas at mail dot com> 26 Apr 2018 08:15:55 +0200
@@ -504,6 +507,10 @@ are mere guesses, and some may be wrong.
   loc **  exist **  algo **  size *  imp **
 
 --- HTML issues --------------------------------------------------------
+
+- support the idiom .TP .IP .TP for multi-paragraph list item bodies
+  to: Alejandro Colomar Thu, 19 Oct 2023 16:45:21 +0200
+  loc **  exist **  algo **  size **  imp **
 
 - .Nm without an argument and .Bx cause premature </pre>
   Nab Sun, 5 Jun 2022 18:30:09 +0200



* Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP
  2023-10-19 14:45     ` Ingo Schwarze
  2023-10-19 15:10       ` Ingo Schwarze
@ 2023-10-19 15:17       ` Alejandro Colomar
  2023-10-19 16:19         ` Ingo Schwarze
  1 sibling, 1 reply; 8+ messages in thread
From: Alejandro Colomar @ 2023-10-19 15:17 UTC
  To: Ingo Schwarze; +Cc: tech, G. Branden Robinson


On Thu, Oct 19, 2023 at 04:45:21PM +0200, Ingo Schwarze wrote:
> Hi Alejandro,
> 
> Alejandro Colomar wrote on Mon, Oct 16, 2023 at 07:22:11PM +0200:
> > On Mon, Oct 16, 2023 at 06:28:05PM +0200, Ingo Schwarze wrote:
> >> Alejandro Colomar wrote on Mon, Oct 16, 2023 at 01:32:30PM +0200:
> 
> >>> groff -man -Thtml seems to never produce a blank line before a TP.
> >> What do you mean by "blank line"?
> > What my eyes experience as a relatively large inter-paragraph space.
> 
> Heh.  That's not a very useful definition when talking about HTML code,
> given that the HTML language does not provide any non-deprecated syntax
> or semantics related to paragraph spacing.  :)

I know.  :)

> 
> >>> mandoc -man -Thtml produces one in some cases, and I can't see a
> >>> pattern.
> >>> 
> >>> I found this bug while reading feature_test_macros(7) in the Debian
> >>> online manpages:
> >>> <https://manpages.debian.org/bullseye/manpages/ftm.7.en.html>
> 
> >> You can see that particular page rendered here:
> >>   https://man.bsd.lv/Test/ftm.7
> 
> > I don't see the bug there.  I'm going to guess it's just another case of
> > a missing CSS file.
> 
> Actually, there *is* a problem with the HTML code in that page,
> even though you did not see it because the CSS file hides it.
> 
> That page contains this HTML code:
> 
>   <dl class="Bl-tag">
>     <dt><b>_ISOC11_SOURCE</b> (since glibc 2.16)</dt>
>     <dd>Exposes declarations consistent with the ISO C11 standard.
>       Defining this macro also enables C99 and C95 features
>       (like <b>_ISOC99_SOURCE</b>).</dd>
>   </dl>
>   <dl class="Bl-tag">
>     <dt></dt>
>     <dd>Invoking the C compiler with the option <i>-std=c11</i>
>       produces the same effects as defining this macro.</dd>
>   </dl>
>   <dl class="Bl-tag">
>     <dt><b>_LARGEFILE64_SOURCE</b></dt>
>     <dd>Expose definitions for the alternative API specified by the
>       LFS (Large File Summit) as a &quot;transitional extension&quot;
>       to the Single UNIX Specification. [...]
> 
> That's of course quite nonsensical because _ISOC11_SOURCE and
> _LARGEFILE64_SOURCE are intended by the page author as adjacent
> entries in the same list, but mandoc(1) puts them into different
> lists, with yet another single-item list in between.
> 
> >> It is a long page, and i have been unable to figure out what exactly
> >> you are talking about.
> >> 
> >> Please point me to the precise position in the file where vertical
> >> spacing before a .TP macro feels lacking or excessive to you,
> 
> > In the Debian bullseye page, check the inter-paragraph space before the
> > tag _LARGEFILE64_SOURCE (I see a long vertical space) and the tag
> > _LARGEFILE_SOURCE.
> 
> Now i see, thank you for pointing me to the specific place.
> 
> The trouble is caused by the following man(7) idiom:
> 
>   .TP
>   first tag
>   first body
>   .IP
>   still in the first body
>   .TP
>   second tag
>   second body

Hmm, that's an old enemy showing up.

> 
> The author's intent here is that the two .TP macros mark up adjacent
> items in the same list and the .IP marks up an ordinary paragraph
> break within the item body of the first list entry.
> 
> Now, using .IP for an ordinary paragraph break is no doubt surprising,
> but it works (from a purely presentational point of view) because .IP
> does assert the same vertical spacing as .PP would and because .IP
> asserts the same indentation as the previous .TP did.
> 
> So logically, what the author wanted was a list with two entries,
> one containing two paragraphs of text, the other containing one
> paragraph of text.  Technically, what we got is three paragraphs of
> text, all indented by the same amount, the first and last containing
> a tag and the middle one having an empty tag, with no indication that
> there is any relation between any of the paragraphs, let alone that
> they form a list.  Figuring out the logical list structure, if any,
> is left as a guessing exercise to the formatter.
> 
> Here the conceptual inadequacy of the man(7) language becomes
> blatantly obvious.  With very few exceptions, the language does not
> provide any concept of block nesting.  The only exception is that
> various macros can be nested inside .SH, .SS, and .RS.  But nothing
> can be nested inside a list item body, neither a paragraph break,
> nor .RS, nor another list.

I had this gripe with man(7) some years ago.  I thought of using the
following instead, which slightly complicates the source code, but makes
it more logical.

	$ cat nested_indent.man 
	.TH nested_indent 7 2023-10-19 experiments
	.SH Ingo said:
	.TP
	Todo
	Currently, when formatting .TP or .IP with a non-empty head,
	the HTML formatter looks at the previous and at the following
	abstract syntax tree (AST) node to figure out whether the
	tagged paragraph is part of a list.
	If that previous or following AST node is .IP or .RS with an
	empty head, it will have to iterate until it finds an AST node
	that is neither .IP nor .RS or has a non-empty head, evaluating
	the properties of that node instead of the directly preceding
	or following node.
	.RS
	.PP
	When formatting .IP or .RS with an empty head, mandoc needs
	to iterate backwards, searching for an AST node that is neither
	\&.IP nor .RS or has a non-empty head, and figure out whether that
	node is a list item, which again, as explained above, requires
	iterating both forwards and backwards.
	If it turns out we are inside a list, interrupting the list
	must be prevented.  Instead, .IP with an empty head must be
	formatted like .PP, and .RS with an empty head must be formatted
	somewhat like .br.
	.RE

As you can see, here the indentation is controlled by a single RS/RE
pair, and everything within it uses PP as a normal paragraph separator.
You could put the RS before the first paragraph, but then an unwanted
line break appears after the tag.  (Maybe man(7) could be tweaked so
that RS doesn't insert the line break after a TP.)

In the end I didn't switch to that scheme, because IP just worked, but
I might consider it if it proves to be useful.  What do you think?

[CC += Branden, in case he wants to add his opinion too]

Cheers,
Alex

> 
> Consequently, i think the fundamental design of the man(7) language
> is too weak to adequately express a list item containing more than a
> single paragraph of text, and the crude presentational workaround of
> splitting the item body into two unrelated paragraphs with different
> introducing macros indeed looks like the best workaround available.
> 
> The longer i look at the man(7) language, the more convinced i become
> that it is so rotten to the core that trying to provide a style
> guide to write good man(7) pages is nothing but a fool's errand,
> and trying to add a few semantical macros to the man(7) language is
> an even worse fool's errand because a few additions won't cure the
> fundamental design flaws.  If you put lipstick on a pig, it's still
> not going to win any Miss Espana contest.
> 
> Let me quote only one other example that i ran into just today.
> The first real-world example of .MR usage i encountered required
> a trailing \c escape sequence on the preceding line.  Now how
> ironic is that?  A brand-new macro introduced to improve semantics,
> but using it requires terribly arcane low-level presentational
> markup.  I'm progressively becoming convinced that the language
> is irredeemable.
> 
> 
> Consequently, the following needs to be done in mandoc:
> 
> 1. Currently, when formatting .TP or .IP with a non-empty head,
>    the HTML formatter looks at the previous and at the following
>    abstract syntax tree (AST) node to figure out whether the
>    tagged paragraph is part of a list.
>    If that previous or following AST node is .IP or .RS with an
>    empty head, it will have to iterate until it finds an AST node
>    that is neither .IP nor .RS or has a non-empty head, evaluating
>    the properties of that node instead of the directly preceding
>    or following node.
> 
> 2. When formatting .IP or .RS with an empty head, mandoc needs
>    to iterate backwards, searching for an AST node that is neither
>    .IP nor .RS or has a non-empty head, and figure out whether that
>    node is a list item, which again, as explained above, requires
>    iterating both forwards and backwards.
>    If it turns out we are inside a list, interrupting the list
>    must be prevented.  Instead, .IP with an empty head must be
>    formatted like .PP, and .RS with an empty head must be formatted
>    somewhat like .br.
> 
> Probably, doing all this in the HTML formatter module would be
> over the top.  I believe such complicated AST inspection should
> be done by the validation module (man_validate.c), which should
> set AST node flags similar to
> 
>  - this node starts a new list
>  - this node starts a new list item
>  - this node merely indicates a paragraph break
>  - this node ends a list
> 
> which the formatters can then readily use.
> 
> This required logic is so complicated that i won't code it right
> away, there are more urgent matters to be taken care of.
> Instead, i will add it to the mandoc TODO list.
> 
> Yours,
>   Ingo

-- 
<https://www.alejandro-colomar.es/>



* Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP
  2023-10-19 15:17       ` Alejandro Colomar
@ 2023-10-19 16:19         ` Ingo Schwarze
  2023-10-19 21:32           ` Alejandro Colomar
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Schwarze @ 2023-10-19 16:19 UTC
  To: Alejandro Colomar; +Cc: tech, branden

Hi Alejandro,

Alejandro Colomar wrote on Thu, Oct 19, 2023 at 05:17:10PM +0200:

> I had this gripe with man(7) some years ago.  I thought of using the
> following instead, which slightly complicates the source code, but makes
> it more logical.
> 
> 	$ cat nested_indent.man 
> 	.TH nested_indent 7 2023-10-19 experiments
> 	.SH Ingo said:
> 	.TP
> 	Todo
> 	Currently, when formatting .TP or .IP with a non-empty head,
> 	[yada yada]
> 	.RS
> 	.PP
> 	When formatting .IP or .RS with an empty head, mandoc needs
> 	[yada yada]
> 	.RE
> 
> As you can see, here the indentation is controlled by a single RS/RE
> pair, and everything within it uses PP as a normal paragraph separator.

While that also generates correct terminal and typographical (PS, PDF)
output in the same purely presentational sense as .TP .IP .TP, it
does not help with respect to the semantic problem we are discussing
here.

Look at the AST generated by mandoc(1):

   $ mandoc -T tree nested_indent.man
  title = "nested_indent"
  sec   = "7"
  vol   = "Miscellaneous Information Manual"
  os    = "experiments"
  date  = "2023-10-19"
  
  SH (block) *2:2
    SH (head) 2:2 ID=HREF=Ingo_said:
        Ingo (text) 2:5
        said: (text) 2:10
    SH (body) 2:2
        TP (block) *3:2
          TP (head) 3:2 ID=HREF
              Todo (text) *4:1
          TP (body) 4:1
              Currently, when formatting .TP or .IP \
                  with a nonempty head, (text) *5:1
              [yada yada] (text) *6:1
        RS (block) *7:2
          RS (head) 7:2
          RS (body) 7:2
              PP (block) *8:2
                PP (head) 8:2
                PP (body) 8:2
                    When formatting .IP or .RS with an empty head,
                        mandoc needs (text) *9:1
                    [yada yada] (text) *10:1
        TP (block) *12:2
          TP (head) 12:2 ID=HREF=final
              final tag (text) *13:1
          TP (body) 13:1
              final body (text) *14:1

You see that the first .TP, the .RS, and the second .TP are all child
nodes of the top-level .SH.  The .RS is not a child of the .TP but
a sibling.  The two .TP nodes still aren't siblings of each other.

Now on first sight, you might blame me for that and call it a mandoc
artifact, arguing that mandoc instead ought to treat the .RS as a
child of the first .TP.  But no, that would be incorrect parsing
for the following reason: the .TP implies an indentation, and
the .RS also implies an indentation.  If the .RS were a child of
the .TP, we would get double indentation.  You can make that
argument even more convincing by adding a width argument to .RS
and varying that argument.  That way, you see that the .RS is
indented relative to the .SH, not relative to the .TP.
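
The indentation argument can be made concrete with a toy model.  The
Python sketch below is an illustration only: the Block class, the
text_offset helper, and all numbers (7 for .SH and .TP, a width
argument of 4 for .RS) are assumptions picked for the example, not
values taken from mandoc or groff.  It compares the offset of the .RS
body under both parsing hypotheses.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    macro: str
    indent: int = 0                       # indentation this block adds
    children: list = field(default_factory=list)


def text_offset(root, path):
    """Sum the indentation of root and of each block along path.

    path is a list of child indices leading from root to the block
    whose body offset is wanted.
    """
    node = root
    total = node.indent
    for index in path:
        node = node.children[index]
        total += node.indent
    return total


# Hypothesis 1: .RS is a sibling of .TP (what the parser produces).
as_sibling = Block("SH", 7, [Block("TP", 7), Block("RS", 4)])

# Hypothesis 2: .RS is a child of .TP (the tempting but wrong reading).
as_child = Block("SH", 7, [Block("TP", 7, [Block("RS", 4)])])

sibling_offset = text_offset(as_sibling, [1])
child_offset = text_offset(as_child, [0, 0])
print(sibling_offset, child_offset)
```

With these example numbers, the sibling reading puts the .RS body at
offset 11 and the child reading at offset 18.  Varying the .RS width
changes the first figure independently of the .TP indentation, which
matches the observation that .RS is indented relative to the .SH.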

There are some cases where it is not completely clear whether one
man(7) node following another man(7) node is a child or a sibling.
mandoc(1) makes arbitrary choices in such ambiguous cases, usually
opting for sibling relations where possible and avoiding unnecessary
child relationships.  But this is not an ambiguous case.  Just like
the .IP, the .RS is definitely a sibling and not a child of the .TP.
As i said, no block can nest inside .TP.

That's why i brought up .RS in my reply and developed rules
for handling it in a similar way as .IP, even though you did
not mention .RS before.

> You could put the RS before the first paragraph, but then an unwanted
> line break appears after the tag.

No matter where you put the .RS, it will never be a child of .TP.

> (Maybe man(7) could be tweaked so
> that RS doesn't insert the line break after a TP.)

Not really a useful idea because .RS doesn't help with the actual
problem in the first place.

> In the end I didn't switch to that scheme, because IP just worked, but
> I might consider it if it proves to be useful.  What do you think?

As i said, i am not aware of a better solution than .TP .IP .TP.
In particular, .RS is not better because it causes exactly the
same trouble and potentially more trouble besides.

But i also said that trying to define "good style" for man(7)
is a fool's errand.  Because man(7) code is so exceedingly difficult
to write, man(7) code that is very clearly bad style is very often
found in the wild, so there is ample opportunity for saying "this
is bad style."  In some cases, it is also possible to point out
better style, for example

  .BR "some word" .

is clearly better style than

  .B some word\c
  \&.

even though both are correct man(7) code and even though there are
situations in man(7) where \c is unavoidable.

But very frequently, situations arise where man(7) doesn't really
allow any good solution, and the best you can do is not making
the source gratuitously worse than it needs to be.

The .TP .IP .TP idiom is such an example.  It's definitely ugly from
both semantic and stylistic points of view, but no good solution
is available.  I'm willing to go further and claim that no better
solution can be designed even if you are willing to introduce a new
macro or change the way the .TP API is defined, even in incompatible
ways, because it's not this particular macro that is broken.  What is
broken is the fundamental design of the language: the language not
only predates the concept of semantic markup, but it also predates
the concept of block nesting in markup languages.  Yes, that is hard
to believe for people born after 1970 because those people have
essentially grown up with HTML and LaTeX and those two markup
languages have defined their concept of what a markup language is,
but let's face it, man(7) predates those fundamental concepts,
and it shows all over the place.

As long as people are using the language, mandoc(1) needs to somehow
deal with the mess.  I'm not happy with that because it is wasting a
lot of development time which could be spent in more productive ways,
but what can i do...

Yours,
  Ingo



* Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP
  2023-10-19 16:19         ` Ingo Schwarze
@ 2023-10-19 21:32           ` Alejandro Colomar
  0 siblings, 0 replies; 8+ messages in thread
From: Alejandro Colomar @ 2023-10-19 21:32 UTC (permalink / raw)
  To: Ingo Schwarze; +Cc: tech, branden

On Thu, Oct 19, 2023 at 06:19:23PM +0200, Ingo Schwarze wrote:
> Hi Alejandro,
> 
> Alejandro Colomar wrote on Thu, Oct 19, 2023 at 05:17:10PM +0200:
> 
> > I had this gripe with man(7) some years ago.  I thought of using the
> > following instead, which slightly complicates the source code, but makes
> > it more logical.
> > 
> > 	$ cat nested_indent.man 
> > 	.TH nested_indent 7 2023-10-19 experiments
> > 	.SH Ingo said:
> > 	.TP
> > 	Todo
> > 	Currently, when formatting .TP or .IP with a non-empty head,
> > 	[yada yada]
> > 	.RS
> > 	.PP
> > 	When formatting .IP or .RS with an empty head, mandoc needs
> > 	[yada yada]
> > 	.RE
> > 
> > As you can see, here the indentation is controlled by a single RS/RE
> > pair, and everything within it uses PP as a normal paragraph separator.
> 
> While that also generates correct terminal and typographical (PS, PDF)
> output in the same purely presentational sense as .TP .IP .TP, it
> does not help with respect to the semantic problem we are discussing
> here.
> 
> Look at the AST generated by mandoc(1):
> 
>    $ mandoc -T tree nested_indent.man
>   title = "nested_indent"
>   sec   = "7"
>   vol   = "Miscellaneous Information Manual"
>   os    = "experiments"
>   date  = "2023-10-19"
>   
>   SH (block) *2:2
>     SH (head) 2:2 ID=HREF=Ingo_said:
>         Ingo (text) 2:5
>         said: (text) 2:10
>     SH (body) 2:2
>         TP (block) *3:2
>           TP (head) 3:2 ID=HREF
>               Todo (text) *4:1
>           TP (body) 4:1
>               Currently, when formatting .TP or .IP \
>                   with a nonempty head, (text) *5:1
>               [yada yada] (text) *6:1
>         RS (block) *7:2
>           RS (head) 7:2
>           RS (body) 7:2
>               PP (block) *8:2
>                 PP (head) 8:2
>                 PP (body) 8:2
>                     When formatting .IP or .RS with an empty head,
>                         mandoc needs (text) *9:1
>                     [yada yada] (text) *10:1
>         TP (block) *12:2
>           TP (head) 12:2 ID=HREF=final
>               final tag (text) *13:1
>           TP (body) 13:1
>               final body (text) *14:1
> 
> You see that the first .TP, the .RS, and the second .TP are all child
> nodes of the top-level .SH.  The .RS is not a child of the .TP but
> a sibling.  The two .TP nodes still aren't siblings of each other.
> 
> Now on first sight, you might blame me for that and call it a mandoc
> artifact, arguing that mandoc instead ought to treat the .RS as a
> child of the first .TP.  But no, that would be incorrect parsing
> for the following reason: the .TP implies an indentation, and
> the .RS also implies an indentation.  If the .RS were a child of
> the .TP, we would get double indentation.  You can make that
> argument even more convincing by adding a width argument to .RS
> and varying that argument.  That way, you see that the .RS is
> indented relative to the .SH, not relative to the .TP.
> 
> There are some cases where it is not completely clear whether one
> man(7) node following another man(7) node is a child or a sibling.
> mandoc(1) makes arbitrary choices in such ambiguous cases, usually
> opting for sibling relations where possible and avoiding unnecessary
> child relationships.  But this is not an ambiguous case.  Just like
> the .IP, the .RS is definitely a sibling and not a child of the .TP.
> As i said, no block can nest inside .TP.
> 
> That's why i brought up .RS in my reply and developed rules
> for handling it in a similar way as .IP, even though you did
> not mention .RS before.
> 
> > You could put the RS before the first paragraph, but then an unwanted
> > line break appears after the tag.
> 
> No matter where you put the .RS, it will never be a child of .TP.
> 
> > (Maybe man(7) could be tweaked so
> > that RS doesn't insert the line break after a TP.)
> 
> Not really a useful idea because .RS doesn't help with the actual
> problem in the first place.
> 
> > In the end I didn't switch to that scheme, because IP just worked, but
> > I might consider it if it proves to be useful.  What do you think?
> 
> As i said, i am not aware of a better solution than .TP .IP .TP.
> In particular, .RS is not better because it causes exactly the
> same trouble and potentially more trouble besides.
> 
> But i also said that trying to define "good style" for man(7)
> is a fool's errand.  Because man(7) code is so exceedingly difficult
> to write, man(7) code that is very clearly bad style is very often
> found in the wild, so there is ample opportunity for saying "this
> is bad style."  In some cases, it is also possible to point out
> better style, for example
> 
>   .BR "some word" .
> 
> is clearly better style than
> 
>   .B some word\c
>   \&.
> 
> even though both are correct man(7) code and even though there are
> situations in man(7) where \c is unavoidable.
> 
> But very frequently, situations arise where man(7) doesn't really
> allow any good solution, and the best you can do is not making
> the source gratuitously worse than it needs to be.
> 
> The .TP .IP .TP idiom is such an example.  It's definitely ugly from
> both semantic and stylistic points of view, but no good solution
> is available.  I'm willing to go further and claim that no better
> solution can be designed even if you are willing to introduce a new
> macro or change the way the .TP API is defined, even in incompatible
> ways, because it's not this particular macro that is broken.  What is
> broken is the fundamental design of the language: the language not
> only predates the concept of semantic markup, but it also predates
> the concept of block nesting in markup languages.  Yes, that is hard
> to believe for people born after 1970 because those people have
> essentially grown up with HTML and LaTeX and those two markup
> languages have defined their concept of what a markup language is,
> but let's face it, man(7) predates those fundamental concepts,
> and it shows all over the place.
> 
> As long as people are using the language, mandoc(1) needs to somehow
> deal with the mess.  I'm not happy with that because it is wasting a
> lot of development time which could be spent in more productive ways,
> but what can i do...
> 
> Yours,
>   Ingo

Hi Ingo,

Hmmm.  You convinced me (about the problems of man(7)), I think.


Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>



end of thread, other threads:[~2023-10-19 21:32 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-16 11:32 mandoc -man -Thtml bug: inconsistent vertical space before .TP Alejandro Colomar
2023-10-16 16:28 ` Ingo Schwarze
2023-10-16 17:22   ` Alejandro Colomar
2023-10-19 14:45     ` Ingo Schwarze
2023-10-19 15:10       ` Ingo Schwarze
2023-10-19 15:17       ` Alejandro Colomar
2023-10-19 16:19         ` Ingo Schwarze
2023-10-19 21:32           ` Alejandro Colomar
