The Unix Heritage Society mailing list
* [TUHS] [groff] The hyphenation algorithm produces wrong results
From: Doug McIlroy @ 2018-03-04 20:23 UTC



I hadn't realized that groff hyphenation had been taken from
TeX, not troff. Is that because TeX did a better job, or
because troff's was deemed proprietary?



From: Clem Cole @ 2018-03-04 20:42 UTC



On Sun, Mar 4, 2018 at 3:23 PM, Doug McIlroy <doug at cs.dartmouth.edu> wrote:

>
> I hadn't realized that groff hyphenation had been taken from
> TeX, not troff. Is that because TeX did a better job, or
> because troff's was deemed proprietary?
>

Given the author, I would guess the latter, as he wanted it to be FOSS and
would not have looked at the ditroff source - but that guess is worth just
that ;-)


From: Bakul Shah @ 2018-03-04 21:00 UTC





> On Mar 4, 2018, at 12:42 PM, Clem Cole <clemc at ccc.com> wrote:
> 
> 
>> On Sun, Mar 4, 2018 at 3:23 PM, Doug McIlroy <doug at cs.dartmouth.edu> wrote:
>> 
>> I hadn't realized that groff hyphenation had been taken from
>> TeX, not troff. Is that because TeX did a better job, or
>> because troff's was deemed proprietary?
>> 
> 
> Given the author, I would guess the latter, as he wanted it to be FOSS and would not have looked at the ditroff source - but that guess is worth just that ;-)

I remembered reading about Knuth's line-breaking algorithm in
Software Practice & Experience in the early eighties and being quite
impressed with it. So maybe that clear description of the algorithm
has something to do with it? Ah, here it is:

“Breaking Paragraphs into Lines” by Donald Knuth & Michael Plass,
SP&E, Volume 11, Issue 11, Nov. 1981

(Download from Wiley is not free)



From: Toby Thain @ 2018-03-04 21:32 UTC



On 2018-03-04 4:00 PM, Bakul Shah wrote:
> 
> 
> On Mar 4, 2018, at 12:42 PM, Clem Cole <clemc at ccc.com> wrote:
> 
>>
>> On Sun, Mar 4, 2018 at 3:23 PM, Doug McIlroy <doug at cs.dartmouth.edu> wrote:
>>
>>
>>     I hadn't realized that groff hyphenation had been taken from
>>     TeX, not troff. Is that because TeX did a better job, or
>>     because troff's was deemed proprietary?
>>
>> Given the author, I would guess the latter, as he wanted it to be FOSS and
>> would not have looked at the ditroff source - but that guess is worth
>> just that ;-)
> 
> I remembered reading about Knuth's line-breaking algorithm in
> Software Practice & Experience in the early eighties and being quite
> impressed with it. So maybe that clear description of the algorithm
> has something to do with it? Ah, here it is:
>
> “Breaking Paragraphs into Lines” by Donald Knuth & Michael Plass,
> SP&E, Volume 11, Issue 11, Nov. 1981

That's the line breaker, which is an important contributor to the
quality of TeX output.

But TeX's *hyphenation* algorithm per se was invented by Franklin Mark
Liang and was indeed considerably better than its predecessors and
competitors (including most or all commercial typesetting software --
which was a big part of the motivation for it):

https://tug.org/docs/liang/liang-thesis.pdf
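To make Liang's approach concrete: a pattern is a letter string with digits between the letters; every pattern that matches anywhere in the word (with '.' marking the word's ends) contributes its digits, the largest digit at each inter-letter position wins, and an odd result permits a hyphen there while an even one forbids it. What follows is only a minimal Python sketch of that idea, assuming a tiny hand-picked pattern list (real pattern files hold thousands of patterns) and made-up helper names parse_pattern, build_patterns and hyphenate. It leaves out the packed trie Liang used to store patterns compactly.

```python
# Minimal sketch of Liang-style pattern hyphenation (the idea behind TeX's
# hyphenator). PATTERNS is a tiny illustrative set, not a real pattern file.

def parse_pattern(pat):
    """Split a pattern such as 'hy3ph' into its letters ('hyph') and the
    values before, between, and after those letters: [0, 0, 3, 0, 0]."""
    letters, values = [], [0]
    for ch in pat:
        if ch.isdigit():
            values[-1] = int(ch)      # digit applies to the current gap
        else:
            letters.append(ch)
            values.append(0)          # open the gap after this letter
    return "".join(letters), values


def build_patterns(pattern_list):
    """Map each pattern's letter string to its list of values."""
    return dict(parse_pattern(p) for p in pattern_list)


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the pieces of `word` split at permitted hyphenation points."""
    w = "." + word.lower() + "."      # '.' marks the word boundaries
    points = [0] * (len(w) + 1)       # points[k]: value just before w[k]
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            values = patterns.get(w[i:j])
            if values is not None:
                # Every matching pattern contributes; the maximum wins.
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)
    pieces, start = [], 0
    # The gap between word[m-1] and word[m] is points[m + 1]; odd allows a
    # hyphen. Never break within left_min / right_min letters of the ends.
    for m in range(left_min, len(word) - right_min + 1):
        if points[m + 1] % 2 == 1:
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return pieces


PATTERNS = build_patterns(
    ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io"]
)

print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

The 2/3 minimums mirror plain TeX's \lefthyphenmin and \righthyphenmin settings. Note how "hen5at" outvotes "he2n" and "1na" at the same gap, which is how the pattern scheme encodes both rules and their exceptions.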

--Toby


> 
> (Download from Wiley is not free)
> 
> 
> 



From: Ralph Corderoy @ 2018-03-04 21:50 UTC



Hi Doug,

Bakul wrote:
> I remembered reading about Knuth's line-breaking algorithm in Software
> Practice & Experience in the early eighties and being quite impressed with
> it.  So maybe that clear description of the algorithm has something
> to do with it?  Ah, here it is:
>
> “Breaking Paragraphs into Lines” by Donald Knuth & Michael Plass, SP&E,
> Volume 11, Issue 11, Nov. 1981

That's more to do with TeX looking at the whole paragraph when deciding
where to split lines.  Hyphenation is part of that, because splitting a
word might be the ideal choice that lets the rest of the lines sit easily
at their length, but TeX's hyphenation algorithm is a separate thing.
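The difference between breaking one line at a time and looking at the whole paragraph can be shown with a toy example. Below is a rough Python sketch, with hypothetical function names, comparing a greedy line-at-a-time breaker with a simplified "total-fit" breaker that chooses all breaks together by dynamic programming, minimising the sum of squared leftover space on every line but the last. It is a much-reduced stand-in for Knuth and Plass's method, which works with boxes, glue, and penalties; in their scheme, hyphenation points found by a separate algorithm become extra candidate breaks carrying a penalty.

```python
# Toy comparison: greedy vs. whole-paragraph ("total-fit") line breaking.
# Simplified, NOT the Knuth-Plass algorithm: widths are character counts,
# a space is 1 wide, and there is no stretch/shrink, penalty, or hyphenation.

def greedy_break(words, width):
    """Fill each line with as many words as fit, one line at a time."""
    lines, cur = [], []
    for w in words:
        if cur and len(" ".join(cur + [w])) > width:
            lines.append(" ".join(cur))
            cur = [w]
        else:
            cur.append(w)
    if cur:
        lines.append(" ".join(cur))
    return lines


def total_fit_break(words, width):
    """Choose all breaks at once, minimising the sum of squared leftover
    space on every line except the last."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: cheapest setting of words[i:]
    best[n] = 0
    nxt = [n] * (n + 1)              # nxt[i]: first word of the next line
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1   # width of words[i:j] with spaces
            if length > width and j > i + 1:
                break                         # overfull; a lone long word is allowed
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i], nxt[i] = cost + best[j], j
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


words = "aaa bb cc ddddd".split()
print(greedy_break(words, 6))     # ['aaa bb', 'cc', 'ddddd']
print(total_fit_break(words, 6))  # ['aaa', 'bb cc', 'ddddd']
```

At width 6 the greedy breaker fills the first line exactly and strands "cc" (cost 16 under this measure), while the total-fit breaker accepts some slack on the first line to get a more even paragraph (cost 9 + 1 = 10).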

Ted Harding gives some background on the groff list back in 2001,
https://lists.gnu.org/archive/html/groff/2001-03/msg00026.html
but I expect groff used TeX's algorithm because it was published, could
handle multiple languages, e.g. hyphen.us, and the data files were
available to contort into what groff ended up using in its simplified
TeX algorithm.

    $ cd /usr/share/groff/1.22.3/tmac
    $ ls hyphen*
    hyphen.den  hyphenex.cs hyphenex.us hyphen.sv   hyphen.us
    hyphen.cs   hyphen.det  hyphenex.de hyphen.fr
    $

They've comments explaining their content.

Werner Lemburg on the groff list probably knows for certain as he had to
fathom all this out before becoming groff's excellent maintainer for
many years.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


From: Bakul Shah @ 2018-03-04 22:36 UTC





> On Mar 4, 2018, at 1:50 PM, Ralph Corderoy <ralph at inputplus.co.uk> wrote:
> 
> Hi Doug,
> 
> Bakul wrote:
>> I remembered reading about Knuth's line-breaking algorithm in Software
>> Practice & Experience in the early eighties and being quite impressed with
>> it.  So maybe that clear description of the algorithm has something
>> to do with it?  Ah, here it is:
>> 
>> “Breaking Paragraphs into Lines” by Donald Knuth & Michael Plass, SP&E,
>> Volume 11, Issue 11, Nov. 1981
> 
> That's more to do with TeX looking at the whole paragraph when deciding
> where to split lines.  Hyphenation is part of that, because splitting a
> word might be the ideal choice that lets the rest of the lines sit easily
> at their length, but TeX's hyphenation algorithm is a separate thing.
> 
> Ted Harding gives some background on the groff list back in 2001,
> https://lists.gnu.org/archive/html/groff/2001-03/msg00026.html
> but I expect groff used TeX's algorithm because it was published, could
> handle multiple languages, e.g. hyphen.us, and the data files were
> available to contort into what groff ended up using in its simplified
> TeX algorithm.
> 
>    $ cd /usr/share/groff/1.22.3/tmac
>    $ ls hyphen*
>    hyphen.den  hyphenex.cs hyphenex.us hyphen.sv   hyphen.us
>    hyphen.cs   hyphen.det  hyphenex.de hyphen.fr
>    $
> 
> They've comments explaining their content.
> 
> Werner Lemburg on the groff list probably knows for certain as he had to
> fathom all this out before becoming groff's excellent maintainer for
> many years.
> 
> -- 
> Cheers, Ralph.
> https://plus.google.com/+RalphCorderoy

Thanks Ralph and Toby. “Because it was clearly described and published”
was the point I was trying to make and should’ve stopped there : ). The
SP&E article had made a strong impression on me and that is what I instantly
thought of. 

