From mboxrd@z Thu Jan  1 00:00:00 1970
From: ralph@inputplus.co.uk (Ralph Corderoy)
Date: Sun, 04 Mar 2018 21:50:23 +0000
Subject: [TUHS] [groff] The hyphenation algorithm produces wrong results
In-Reply-To: <645D5FCC-7AAB-43D0-8035-FABB23986EAA@bitblocks.com>
References: <201803042023.w24KN0Kt013712@coolidge.cs.Dartmouth.EDU>
 <CAC20D2OABQOq6w230j3NddaX=bRQ-963zBD=uzXHMDYY1S7V1w@mail.gmail.com>
 <645D5FCC-7AAB-43D0-8035-FABB23986EAA@bitblocks.com>
Message-ID: <20180304215023.883981F96E@orac.inputplus.co.uk>

Hi Doug,

Bakul wrote:
> I remembered reading about Knuth's line-breaking algorithm in Software
> Practice & Experience in early eighties and being quite impressed with
> it.  So may be that clear description of the algorithm has something
> to do with it?  Ah, here it is:
>
> “Breaking Paragraphs into lines” by Donald Knuth & Plass, SP&E, Volume
> 11, issue 11, Nov. 1981

That's more to do with TeX looking at the whole paragraph when deciding
where to split lines.  Hyphenation is part of that because a word might
help out by being the ideal thing to split and have the rest of the
lines sit easily in their length, but TeX's hyphenation algorithm is
distinct again.

Ted Harding gives some background on the groff list back in 2001,
https://lists.gnu.org/archive/html/groff/2001-03/msg00026.html
but I expect groff used TeX's algorithm because it was published, could
handle multiple languages, e.g. hyphen.us, and the data files were
available to contort into what groff ended up using in its simplified
TeX algorithm.

    $ cd /usr/share/groff/1.22.3/tmac
    $ ls hyphen*
    hyphen.den  hyphenex.cs hyphenex.us hyphen.sv   hyphen.us
    hyphen.cs   hyphen.det  hyphenex.de hyphen.fr
    $

They've comments explaining their content.

Werner Lemburg on the groff list probably knows for certain as he had to
fathom all this out before becoming groff's excellent maintainer for
many years.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy