The Unix Heritage Society mailing list
From: ralph@inputplus.co.uk (Ralph Corderoy)
Subject: [TUHS] [groff] The hyphenation algorithm produces wrong results
Date: Sun, 04 Mar 2018 21:50:23 +0000
Message-ID: <20180304215023.883981F96E@orac.inputplus.co.uk>
In-Reply-To: <645D5FCC-7AAB-43D0-8035-FABB23986EAA@bitblocks.com>

Hi Doug,

Bakul wrote:
> I remembered reading about Knuth's line-breaking algorithm in Software
> Practice & Experience in early eighties and being quite impressed with
> it.  So maybe that clear description of the algorithm has something
> to do with it?  Ah, here it is:
>
> “Breaking Paragraphs into lines” by Donald Knuth & Plass, SP&E, Volume
> 11, issue 11, Nov. 1981

That paper is more to do with TeX looking at the whole paragraph when
deciding where to break lines.  Hyphenation feeds into that, because
splitting one word at the right place can let the remaining lines fit
their length comfortably, but TeX's hyphenation algorithm is a separate
thing again.
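
To illustrate what "looking at the whole paragraph" means, here is a
minimal sketch in Python.  It is not the Knuth-Plass algorithm itself,
which works with boxes, glue and penalties; it is a simplified dynamic
program that assumes fixed-width characters and charges each line except
the last the square of its unused space.  The name break_lines and the
cost function are assumptions made for this example.

```python
def break_lines(words, width):
    """Choose breaks that minimise total squared slack over the paragraph."""
    n = len(words)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: minimum cost of setting words[i:]
    nxt = [n] * (n + 1)      # nxt[i]: index where the line starting at i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1          # cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            # Stop once the line overflows, but let a single overlong
            # word stand on a line of its own.
            if length > width and j > i + 1:
                break
            # The last line of the paragraph is not penalised.
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                nxt[i] = j
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


print(break_lines("aaa bb cc ddddd".split(), 6))
# ['aaa', 'bb cc', 'ddddd']
```

A greedy first-fit breaker would give "aaa bb", "cc", "ddddd" here,
leaving "cc" on a very loose line.  Looking at the whole paragraph
accepts a shorter first line to get a better overall fit.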

Ted Harding gave some background on the groff list back in 2001,
https://lists.gnu.org/archive/html/groff/2001-03/msg00026.html
but I expect groff used TeX's algorithm because it was published, could
handle multiple languages through per-language pattern files, e.g.
hyphen.us, and the data files were available to contort into the form
used by groff's simplified version of the TeX algorithm.

    $ cd /usr/share/groff/1.22.3/tmac
    $ ls hyphen*
    hyphen.den  hyphenex.cs hyphenex.us hyphen.sv   hyphen.us
    hyphen.cs   hyphen.det  hyphenex.de hyphen.fr
    $

The files have comments explaining their content.
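
As a rough illustration of how pattern-based hyphenation of the TeX kind
works, here is a minimal Python sketch.  Each pattern is a run of letters
with digits in the gaps; every pattern that matches a substring of the
word contributes its digits, the highest digit wins at each gap, and an
odd result permits a break.  The handful of patterns below were picked
for illustration and are not a copy of hyphen.us; the names
parse_pattern and hyphenate, and the left_min/right_min defaults, are
assumptions for this example rather than anything taken from groff.

```python
import re


def parse_pattern(pat):
    """Split a pattern such as 'hy3ph' into ('hyph', [0, 0, 3, 0, 0])."""
    letters = re.sub(r"\d", "", pat)
    points = [0] * (len(letters) + 1)
    i = 0
    for ch in pat:
        if ch.isdigit():
            points[i] = int(ch)   # digit sits in the gap before letter i
        else:
            i += 1
    return letters, points


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the pieces of word between the permitted break points."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."      # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)       # scores[j]: gap before padded[j]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            points = table.get(padded[start:end])
            if points:
                for k, value in enumerate(points):
                    scores[start + k] = max(scores[start + k], value)
    pieces, last = [], 0
    # Keep at least left_min letters before and right_min after a break.
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:         # odd value permits a break
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


# Illustrative patterns only, chosen for this example.
PATTERNS = ["hy3ph", "he2n", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

print("-".join(hyphenate("hyphenation", PATTERNS)))
# hy-phen-a-tion
```

With so few patterns the result reflects only this toy set.  A full
pattern file has thousands of entries, and its even-valued patterns
suppress breaks that a small set like this one lets through.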

Werner Lemberg on the groff list probably knows for certain as he had to
fathom all this before becoming groff's excellent maintainer for
many years.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy

