From mboxrd@z Thu Jan  1 00:00:00 1970
From: ralph@inputplus.co.uk (Ralph Corderoy)
Date: Sun, 04 Mar 2018 21:50:23 +0000
Subject: [TUHS] [groff] The hyphenation algorithm produces wrong results
In-Reply-To: <645D5FCC-7AAB-43D0-8035-FABB23986EAA@bitblocks.com>
References: <201803042023.w24KN0Kt013712@coolidge.cs.Dartmouth.EDU>
 <645D5FCC-7AAB-43D0-8035-FABB23986EAA@bitblocks.com>
Message-ID: <20180304215023.883981F96E@orac.inputplus.co.uk>

Hi Doug,

Bakul wrote:
> I remembered reading about Knuth's line-breaking algorithm in Software
> Practice & Experience in early eighties and being quite impressed with
> it.  So may be that clear description of the algorithm has something
> to do with it?  Ah, here it is:
>
> “Breaking Paragraphs into lines” by Donald Knuth & Plass, SP&E, Volume
> 11, issue 11, Nov. 1981

That's more to do with TeX looking at the whole paragraph when deciding
where to split lines.  Hyphenation is part of that, because splitting a
word might be the ideal way to let the rest of the lines sit easily in
their length, but TeX's hyphenation algorithm is distinct again.

Ted Harding gave some background on the groff list back in 2001,
https://lists.gnu.org/archive/html/groff/2001-03/msg00026.html
but I expect groff used TeX's algorithm because it was published, could
handle multiple languages, e.g. hyphen.us, and the data files were
available to contort into what groff ended up using in its simplified
TeX algorithm.

    $ cd /usr/share/groff/1.22.3/tmac
    $ ls hyphen*
    hyphen.den  hyphenex.cs  hyphenex.us  hyphen.sv  hyphen.us
    hyphen.cs   hyphen.det   hyphenex.de  hyphen.fr
    $

They've comments explaining their content.

Werner Lemberg on the groff list probably knows for certain, as he had
to fathom all this out before becoming groff's excellent maintainer for
many years.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy