The Unix Heritage Society mailing list
* [TUHS] Re: Paragraphs formatted differently depending on previous ones
From: Douglas McIlroy @ 2025-05-03 12:14 UTC
  To: G. Branden Robinson, TUHS main list

Branden,

> The relevant function fits on one screen, if your terminal window is at
> least 36 lines high.  :)  (Much of it is given over to comments.)

>   https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/env.cpp?id=d96a9c58bbe296b065fa250e3ea1e1a410cdde81#n2185

Actually there's still another function, spread_space, that contains
the inner R-L and L-R loops. The whole thing has become astonishingly
complicated compared to what I remember as a few (carefully crafted)
lines of code in the early roff. I admire your intrepid forays into
the groff woods, of which this part must be among the less murky.

Doug


* [TUHS] Re: Paragraphs formatted differently depending on previous ones
From: G. Branden Robinson @ 2025-05-03 12:58 UTC
  To: groff; +Cc: TUHS main list

[looping the groff list back in; Doug emailed TUHS instead]

At 2025-05-03T08:14:18-0400, Douglas McIlroy wrote:
> > The relevant function fits on one screen, if your terminal window is
> > at least 36 lines high.  :)  (Much of it is given over to comments.)
> 
> >   https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/env.cpp?id=d96a9c58bbe296b065fa250e3ea1e1a410cdde81#n2185
> 
> Actually there's still another function, spread_space, that contains
> the inner R-L and L-R loops.

Yes.  `distribute_space()` is in "env.cpp" (environment handling) and
operates on the output line.  `spread_space()` is in "node.cpp" and is
what alters the width of `word_space_node` (and derived
`unbreakable_space_node`) objects on the line.  Whereas in troff mode
every adjustable space on an underset line often receives some of the
extra space, in nroff mode, where space can be added only in whole
character cells, frequently only some of them do, as shown below.

Some of this stuff will be more visible for debugging purposes with the
new `pline` request and improved `pm` request in the forthcoming groff
1.24.0 release.

Here's an altered version of the adjustment demonstrator I cooked up for
Alex.  It uses a shorter line length and fewer repetitions of "alex",
but still illustrates alternating adjustment "parity", as I term it.

$ { echo .ll 15n; echo .di dd; for n in $(seq 7); do echo alex; done; \
  printf '.pl \\n(nlu\n'; echo .di; echo .pm dd; echo .dd; } \
  | nroff 2>/dev/null | cat -s
alex alex  alex
alex  alex alex
alex
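
A minimal sketch of the idea, in C++ since that is what GNU troff is
written in, and under stated assumptions: it is not the code of
`distribute_space()` or `spread_space()`, the function name
`distribute_extra` is hypothetical, and widths are counted in whole
character cells as on a terminal.  It spreads a line's extra space over
its inter-word gaps, handing out the indivisible remainder one cell per
gap from whichever end the caller selects.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not groff's API): spread `extra` cells of padding
// over `ngaps` inter-word gaps.  Every gap gets extra / ngaps cells; the
// remaining extra % ngaps cells go one per gap, starting at the right end
// if `from_right` is true and at the left end otherwise.
std::vector<int> distribute_extra(std::size_t ngaps, int extra,
                                  bool from_right) {
  std::vector<int> pad(ngaps, 0);
  if (ngaps == 0 || extra <= 0)
    return pad;  // nothing to adjust
  const int n = static_cast<int>(ngaps);
  const int base = extra / n;
  const int remainder = extra % n;
  for (int i = 0; i < n; ++i)
    pad[i] = base;
  // Hand out the leftover cells from the chosen end inward.
  for (int k = 0; k < remainder; ++k)
    pad[from_right ? n - 1 - k : k] += 1;
  return pad;
}
```

For the lines above, "alex alex alex" occupies 14 cells of a 15-cell
line, leaving 1 extra cell for 2 gaps: `distribute_extra(2, 1, true)`
yields `{0, 1}` and `distribute_extra(2, 1, false)` yields `{1, 0}`, the
two patterns visible in the output.  Choosing which end to start from
on each line is the "parity" bookkeeping; the sketch leaves that choice
to the caller.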

If we discard normal output with the `-z` option, reënable output to
standard error (merging it into standard output with `2>&1`), and pipe
that to jq(1) for formatting, we get more information, which I'll
relegate to a footnote because it's lengthy.[1]

It also serves to illustrate how we can dump diversions, and the
intriguing properties thereof in GNU troff.

$ { echo .ll 15n; echo .di dd; for n in $(seq 7); do echo alex; done; \
  printf '.pl \\n(nlu\n'; echo .di; echo .pm dd; } | nroff -z 2>&1 | jq

> The whole thing has become astonishingly complicated compared to what
> I remember as a few (carefully crafted) lines of code in the early
> roff.

The first computer I ever touched, and programmed, had 16 KB of RAM.
Necessity is a mother in more than one sense.  ;-)

I'm doing what I can with the GNU troff code base to make it more
intelligible.  Among the windmills I'm tilting at are improved type
annotations (like using `bool` for Booleans instead of integers for Yet
Another Purpose), explicitly annotated null pointers, and above all,
more meaningful variable and function names.  Kernighan and Plauger,
then Pike, beat this drum repeatedly in their books on programming
style, but we're still not rid of hackers who think naming a variable
`bflag` is a good idea.
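
A hypothetical before-and-after illustration of those practices; every
name below is invented for the example, and none is a declaration from
groff.

```cpp
// Before (hypothetical): an int pressed into service as a Boolean, a
// name that says nothing, and a literal 0 standing in for a null pointer.
struct OldState {
  int bflag = 0;
  const char *nm = 0;
};

// After (hypothetical): the type says it is a Boolean, the names say
// what they mean, and the null pointer is spelled explicitly.
struct NewState {
  bool is_adjusting = false;
  const char *diversion_name = nullptr;
};

// With a real bool and an explicit nullptr, the test reads as intended.
inline bool is_diverting(const NewState &s) {
  return s.diversion_name != nullptr;
}
```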

> I admire your intrepid forays into the groff woods, of which this part
> must be among the less murky.

Thank you!  Reforming the handling of device extension requests and
escapes so that they could encode Unicode characters, and converting
them into nodes, was almost more murk than I could stand.  I think it
might have helped to have some of the new introspection features 12-18
months ago, but we have them now.  There's always more to learn, and to
document.

For those who hadn't noticed, I'll put the relevant part of the "NEWS"
file in another footnote.[2]

I'm on the verge of adding another introspection request, `pftr`, to
dump the dictionary of font translations (remappings), because mdoc is
proving to be a headache this week with Savannah #66126.

Regards,
Branden

[1] Observe how some of the `word_space_node`s have an `hunits` value of
    `24`, and others `48`.  The latter are the "adjusted" spaces; the
    unadjusted width recorded in each one's `width_list` is `24`.

$ { echo .ll 15n; echo .di dd; for n in $(seq 7); do echo alex; done; printf '.pl \\n(nlu\n'; echo .di; echo .pm dd; } | nroff -z 2>&1 | jq
{
  "name": "dd",
  "file name": "<standard input>",
  "starting line number": 2,
  "length": 35,
  "contents": "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\n\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\n",
  "node list": [
    {
      "type": "line_start_node",
      "diversion level": 0,
      "is_special_node": false
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "a"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "l"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "e"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "x"
    },
    {
      "type": "word_space_node",
      "diversion level": 0,
      "is_special_node": false,
      "hunits": 48,
      "undiscardable": true,
      "is hyphenless breakpoint": false,
      "terminal_color": "default",
      "width_list": [
        {
          "width": 24,
          "sentence_width": 24
        }
      ],
      "unformat": false
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "a"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "l"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "e"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "x"
    },
    {
      "type": "word_space_node",
      "diversion level": 0,
      "is_special_node": false,
      "hunits": 24,
      "undiscardable": true,
      "is hyphenless breakpoint": false,
      "terminal_color": "default",
      "width_list": [
        {
          "width": 24,
          "sentence_width": 24
        }
      ],
      "unformat": false
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "a"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "l"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "e"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "x"
    },
    {
      "type": "vertical_size_node",
      "diversion level": 0,
      "is_special_node": false,
      "vunits": -40
    },
    {
      "type": "vertical_size_node",
      "diversion level": 0,
      "is_special_node": false,
      "vunits": 0
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "a"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "l"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "e"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "x"
    },
    {
      "type": "word_space_node",
      "diversion level": 0,
      "is_special_node": false,
      "hunits": 24,
      "undiscardable": true,
      "is hyphenless breakpoint": false,
      "terminal_color": "default",
      "width_list": [
        {
          "width": 24,
          "sentence_width": 24
        }
      ],
      "unformat": false
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "a"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "l"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "e"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "x"
    },
    {
      "type": "word_space_node",
      "diversion level": 0,
      "is_special_node": false,
      "hunits": 48,
      "undiscardable": true,
      "is hyphenless breakpoint": false,
      "terminal_color": "default",
      "width_list": [
        {
          "width": 24,
          "sentence_width": 24
        }
      ],
      "unformat": false
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "a"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "l"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "e"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "x"
    },
    {
      "type": "vertical_size_node",
      "diversion level": 0,
      "is_special_node": false,
      "vunits": -40
    },
    {
      "type": "vertical_size_node",
      "diversion level": 0,
      "is_special_node": false,
      "vunits": 0
    }
  ]
}
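
The two values can be checked by hand.  A minimal worked calculation,
assuming the geometry of groff's terminal output devices, where one
character cell (and hence one en in nroff) is 24 basic units; the
constant names are hypothetical, and the assumption is written into the
code rather than read from groff.

```cpp
// Assumption: 24 basic units per character cell, as on groff's
// terminal devices; in nroff an en ("n") is one cell.
constexpr int kUnitsPerCell = 24;

constexpr int cells_to_units(int cells) { return cells * kUnitsPerCell; }

// `.ll 15n`: a line 15 cells long.
constexpr int kLineLength = cells_to_units(15);
// "alex alex alex": three 4-cell words plus two 1-cell spaces.
constexpr int kTextWidth = cells_to_units(3 * 4 + 2);
// Space the formatter must add to fill the line.
constexpr int kShortfall = kLineLength - kTextWidth;

static_assert(kLineLength == 360, "15n is 360 units");
static_assert(kTextWidth == 336, "14 cells are 336 units");
static_assert(kShortfall == kUnitsPerCell, "exactly one cell is missing");
// Only one of the two spaces can take that whole cell: 24 + 24 = 48.
static_assert(kUnitsPerCell + kShortfall == 48, "adjusted space is 48");
```

Since the shortfall is one indivisible cell and each full line has two
spaces, exactly one space per line grows from 24 to 48 units, which is
what the node list shows.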

[2] NEWS:

---snip---
*  A new request, `pcolor`, reports to the standard error stream details
   of each color name specified as an argument, including its color
   space identifier and channel value assignments.  Without arguments,
   all defined colors are listed.  (A device's default stroke and/or
   fill colors, "default", are not listed since they are immutable and
   their details unknown to the formatter.)

*  A new request, `pchar`, reports to the standard error stream data
   about any ordinary, special, or indexed character arguments.

*  A new request, `pcomposite`, reports to the standard error stream the
   list of defined composite character mappings.

*  A new request, `phw`, reports to the standard error stream the
   list of hyphenation exceptions associated with the current
   hyphenation language.

*  A new request, `pline`, reports to the standard error stream the list
   of output nodes (an internal data structure) corresponding to the
   pending output line.  The list is empty if no such nodes exist.

*  The `pm` request now interprets any arguments as a sequence of macro,
   string, or diversion names, and reports their contents.

*  The `pnr` request now additionally reports the autoincrementation
   amount and interpolation format of each register (if it is not
   string-valued).

*  The `pnr` request now accepts arguments.  It treats each as
   identifying a register and reports its properties to the standard
   error stream.

*  A new request, `pstream`, reports to the standard error stream the
   name of each stream opened with the `open` or `opena` requests, the
   name of the file backing it, and its mode (writing or appending).
---end snip---

* [TUHS] Re: Paragraphs formatted differently depending on previous ones
From: G. Branden Robinson @ 2025-05-06 14:16 UTC
  To: groff; +Cc: tuhs

[looping TUHS back in since I'm correcting a message I sent there]

Hi Dave,

At 2025-05-06T08:36:55-0500, Dave Kemper wrote:
> On Fri, May 2, 2025 at 7:35 PM G. Branden Robinson
> <g.branden.robinson@gmail.com> wrote:
> > I guess another way of saying this is that, as I conceive it, a line
> > that is "adequately full" contributes to the page's uniformity of
> > grayness by definition.
> 
> For an example of less-than-ideal results if this is _not_ considered
> the case (groff's behavior before this change), see
> http://savannah.gnu.org/bugs/?60673#comment0 (the initial report that
> precipitated the commit Doug is commenting on).

Yes.  In my reply to Doug I incorrectly characterized the resolution of
this bug as a "2023" change of mine, but I actually landed the change in
2021.  It simply took until 2023 to appear in a released _groff_.

To make this message more TUHS-o-riffic, let's format that input using
DWB 3.3 troff and Heirloom Doctools troff (descended from Solaris troff,
in turn descended from DWB 2.0 troff [I think]), both of which descend
from Kernighan's device-independent troff of circa 1980.

$ DWBHOME=. ./bin/nroff groff-60673.roff | cat -s
While the example in bug  57836's  original  report  is  somewhat
contrived and a bit of an edge case in real life, there turns out
to be a more innate  bug  in  grotty's  balancing  algorithm.  As
mentioned before (and easily observable), when grotty adds spaces
to a line in the process  of  justifying  it,  the  algorithm  it
utilizes adds spaces from opposite ends of each line. But when it
adds this space, it does not take  into  account  lines  with  no
adjustment at all required. Therefore if space only need be added
to every other line of the text, all  the  space  ends  up  being
added to the same end of the line, degrading the uniform grayness
of the output, as can be seen  in  this  example.  There  is  one
fairly simple way to address this: grotty shouldn't "count" lines
that don't need to be adjusted;  instead,  it  should  apply  the
alternation pattern only to those lines that do need adjustment.

$ ./bin/nroff groff-60673.roff | cat -s
While the example in bug  57836's  original  report  is  somewhat
contrived and a bit of an edge case in real life, there turns out
to be a more innate  bug  in  grotty's  balancing  algorithm.  As
mentioned before (and easily observable), when grotty adds spaces
to a line in the process  of  justifying  it,  the  algorithm  it
utilizes adds spaces from opposite ends of each line. But when it
adds this space, it does not take  into  account  lines  with  no
adjustment at all required. Therefore if space only need be added
to every other line of the text, all  the  space  ends  up  being
added to the same end of the line, degrading the uniform grayness
of the output, as can be seen  in  this  example.  There  is  one
fairly simple way to address this: grotty shouldn't "count" lines
that don't need to be adjusted;  instead,  it  should  apply  the
alternation pattern only to those lines that do need adjustment.

They are the same, and differ from groff 1.22.4 and earlier only in that
they adjust spaces starting from the right end of the line instead of
the left.

At the risk of tooting our own horn, here's how groff 1.23.0+ handles
the same input.

$ ~/groff-1.23.0/bin/nroff groff-60673.roff | cat -s
While  the  example  in  bug  57836’s original report is somewhat
contrived and a bit of an edge case in real life, there turns out
to be a more innate  bug  in  grotty’s  balancing  algorithm.  As
mentioned before (and easily observable), when grotty adds spaces
to  a  line  in  the  process  of justifying it, the algorithm it
utilizes adds spaces from opposite ends of each line. But when it
adds this space, it does not take  into  account  lines  with  no
adjustment at all required. Therefore if space only need be added
to  every  other  line  of  the text, all the space ends up being
added to the same end of the line, degrading the uniform grayness
of the output, as can be seen  in  this  example.  There  is  one
fairly simple way to address this: grotty shouldn’t "count" lines
that  don’t  need  to  be  adjusted; instead, it should apply the
alternation pattern only to those lines that do need adjustment.

Three observations:

1.  One can find the input at Dave's URL above.
2.  The input disables inter-sentence spacing.
3.  The adjustment algorithm is a property not of grotty(1) (the output
    driver), but of GNU troff itself.
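
The difference between the two behaviors can be modeled in a few lines.
This is a hypothetical sketch, not GNU troff's code: the function name
`extra_space_ends` is invented, each line is reduced to the number of
cells of extra space it needs, and starting the first adjusted line at
the left end is an arbitrary assumption.  With `count_every_line` true,
the parity flips on every output line, as the bug report describes; with
it false, the parity flips only after a line that actually received
extra space, which is the remedy the report proposes.

```cpp
#include <string>
#include <vector>

// Hypothetical model.  `extra[i]` is the number of cells of extra space
// output line i needs (0: the line is already full).  Returns one
// character per line: 'L' or 'R' for the end from which that line's
// extra space is handed out, or '-' if the line needs none.
std::string extra_space_ends(const std::vector<int> &extra,
                             bool count_every_line) {
  std::string ends;
  bool from_right = false;  // arbitrary assumption: start at the left
  for (int cells : extra) {
    const bool needs_adjustment = cells > 0;
    ends += needs_adjustment ? (from_right ? 'R' : 'L') : '-';
    // Old behavior: flip on every line.  Remedy: flip only after a line
    // that was actually adjusted.
    if (count_every_line || needs_adjustment)
      from_right = !from_right;
  }
  return ends;
}
```

If only every other line needs space, counting every line returns
"L-L-L-" for the input {1, 0, 1, 0, 1, 0}, piling all the extra space
onto one end, while counting only adjusted lines returns "L-R-L-".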

Regards,
Branden
