tech@mandoc.bsd.lv
From: Ingo Schwarze <schwarze@usta.de>
To: Alejandro Colomar <alx@kernel.org>
Cc: tech@mandoc.bsd.lv, branden@debian.org
Subject: Re: mandoc -man -Thtml bug: inconsistent vertical space before .TP
Date: Thu, 19 Oct 2023 18:19:23 +0200
Message-ID: <ZTFXC0LDDyS0+X/7@asta-kit.de>
In-Reply-To: <ZTFIfEt1T2eSHMLC@debian>

Hi Alejandro,

Alejandro Colomar wrote on Thu, Oct 19, 2023 at 05:17:10PM +0200:

> I had this gripe with man(7) some years ago.  I thought of using the
> following instead, which slightly complicates the source code, but makes
> it more logical.
> 
> 	$ cat nested_indent.man 
> 	.TH nested_indent 7 2023-10-19 experiments
> 	.SH Ingo said:
> 	.TP
> 	Todo
> 	Currently, when formatting .TP or .IP with a non-empty head,
> 	[yada yada]
> 	.RS
> 	.PP
> 	When formatting .IP or .RS with an empty head, mandoc needs
> 	[yada yada]
> 	.RE
> 
> As you can see, here the indentation is controlled by a single RS/RE
> pair, and everything within it uses PP as a normal paragraph separator.

While that also generates correct terminal and typographical (PS, PDF)
output in the same purely presentational sense as .TP .IP .TP, it
does not help with respect to the semantic problem we are discussing
here.

Look at the AST generated by mandoc(1):

  $ mandoc -T tree nested_indent.man
  title = "nested_indent"
  sec   = "7"
  vol   = "Miscellaneous Information Manual"
  os    = "experiments"
  date  = "2023-10-19"
  
  SH (block) *2:2
    SH (head) 2:2 ID=HREF=Ingo_said:
        Ingo (text) 2:5
        said: (text) 2:10
    SH (body) 2:2
        TP (block) *3:2
          TP (head) 3:2 ID=HREF
              Todo (text) *4:1
          TP (body) 4:1
              Currently, when formatting .TP or .IP \
                  with a nonempty head, (text) *5:1
              [yada yada] (text) *6:1
        RS (block) *7:2
          RS (head) 7:2
          RS (body) 7:2
              PP (block) *8:2
                PP (head) 8:2
                PP (body) 8:2
                    When formatting .IP or .RS with an empty head,
                        mandoc needs (text) *9:1
                    [yada yada] (text) *10:1
        TP (block) *12:2
          TP (head) 12:2 ID=HREF=final
              final tag (text) *13:1
          TP (body) 13:1
              final body (text) *14:1

You see that the first .TP, the .RS, and the second .TP are all child
nodes of the body of the top-level .SH.  The .RS is not a child of
the .TP but a sibling.  The two .TP nodes still aren't adjacent
siblings of each other: the .RS sits between them.

Now at first sight, you might blame me for that and call it a mandoc
artifact, arguing that mandoc ought to treat the .RS as a child of
the first .TP instead.  But no, that would be incorrect parsing,
for the following reason: the .TP implies an indentation, and
the .RS also implies an indentation.  If the .RS were a child of
the .TP, we would get double indentation.  You can make that
argument even more convincing by adding a width argument to .RS
and varying it.  That way, you see that the .RS is
indented relative to the .SH, not relative to the .TP.
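To make that experiment concrete, here is a sketch of the variation
(the widths 7n and 3n are hypothetical values chosen for illustration,
not taken from the original file): give the .TP an explicit width
and try a small width on the .RS.

  .TP 7n
  Todo
  Currently, when formatting .TP or .IP with a non-empty head,
  [yada yada]
  .RS 3n
  .PP
  When formatting .IP or .RS with an empty head, mandoc needs
  [yada yada]
  .RE

If the .RS is indented relative to the .SH, as argued above, the
.RS 3n text should start to the left of the .TP body at 7n, and
changing 3n to, say, 12n should move it to the right of that body.
If the .RS were a child of the .TP, both variants would instead land
to the right of the .TP body (at 10n and 19n), because the two
indentations would add up.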

There are some cases where it is not completely clear whether one
man(7) node following another man(7) node is a child or a sibling.
mandoc(1) makes arbitrary choices in such ambiguous cases, usually
opting for sibling relations where possible and avoiding unnecessary
child relationships.  But this is not an ambiguous case.  Just like
the .IP, the .RS is definitely a sibling and not a child of the .TP.
As I said, no block can nest inside .TP.

That's why I brought up .RS in my reply and developed rules
for handling it similarly to .IP, even though you had
not mentioned .RS before.

> You could put the RS before the first paragraph, but then an unwanted
> line break appears after the tag.

No matter where you put the .RS, it will never be a child of .TP.
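For comparison, a rough sketch of the variant quoted above, with the
.RS placed before the first paragraph (the exact markup was not posted
in this thread, so this is an assumption):

  .TP
  Todo
  .RS
  Currently, when formatting .TP or .IP with a non-empty head,
  [yada yada]
  .PP
  When formatting .IP or .RS with an empty head, mandoc needs
  [yada yada]
  .RE

By the rule above, the .RS would still end the .TP and become its
sibling, leaving the "Todo" tag with an empty body, which would be
consistent with the unwanted line break after the tag.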

> (Maybe man(7) could be tweaked so
> that RS doesn't insert the line break after a TP.)

Not really a useful idea, because .RS doesn't help with the actual
problem in the first place.

> In the end I didn't switch to that scheme, because IP just worked, but
> I might consider it if it proves to be useful.  What do you think?

As I said, I am not aware of a better solution than .TP .IP .TP.
In particular, .RS is not better, because it causes exactly the
same trouble and potentially more trouble besides.
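For reference, a minimal sketch of that idiom applied to the example
quoted above (the final tag and body are the placeholders visible in
the tree dump):

  .TP
  Todo
  Currently, when formatting .TP or .IP with a non-empty head,
  [yada yada]
  .IP
  When formatting .IP or .RS with an empty head, mandoc needs
  [yada yada]
  .TP
  final tag
  final body

The .IP without arguments reuses the indentation of the preceding .TP
body, so the second paragraph looks as if it belonged to the "Todo"
item, even though in the syntax tree the first .TP, the .IP, and the
second .TP are all siblings.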

But I also said that trying to define "good style" for man(7)
is a fool's errand.  Because man(7) code is so exceedingly difficult
to write, man(7) code that is very clearly bad style is found in the
wild very often, so there is ample opportunity for saying "this
is bad style."  In some cases, it is also possible to point out
better style, for example

  .BR "some word" .

is clearly better style than

  .B some word\c
  \&.

even though both are correct man(7) code and even though there are
situations in man(7) where \c is unavoidable.

But very frequently, situations arise where man(7) doesn't really
allow any good solution, and the best you can do is not to make
the source gratuitously worse than it needs to be.

The .TP .IP .TP idiom is such an example.  It's definitely ugly from
both semantic and stylistic points of view, but no good solution
is available.  I'm willing to go further and claim that no better
solution can be designed even if you are willing to introduce a new
macro or change the way the .TP API is defined, even in incompatible
ways, because it's not this particular macro that is broken.  What is
broken is the fundamental design of the language: the language not
only predates the concept of semantic markup, but it also predates
the concept of block nesting in markup languages.  Yes, that is hard
to believe for people born after 1970, because those people have
essentially grown up with HTML and LaTeX, and those two markup
languages have shaped their concept of what a markup language is.
But let's face it, man(7) predates those fundamental concepts,
and it shows all over the place.

As long as people are using the language, mandoc(1) needs to deal
with the mess somehow.  I'm not happy with that, because it wastes a
lot of development time that could be spent more productively,
but what can I do...

Yours,
  Ingo



Thread overview: 8+ messages
2023-10-16 11:32 Alejandro Colomar
2023-10-16 16:28 ` Ingo Schwarze
2023-10-16 17:22   ` Alejandro Colomar
2023-10-19 14:45     ` Ingo Schwarze
2023-10-19 15:10       ` Ingo Schwarze
2023-10-19 15:17       ` Alejandro Colomar
2023-10-19 16:19         ` Ingo Schwarze [this message]
2023-10-19 21:32           ` Alejandro Colomar
