ntg-context - mailing list for ConTeXt users
* Ligature suppression word list
From: denis.maier @ 2021-04-03 15:06 UTC
  To: ntg-context



Hi everyone

Now that Hans has implemented the new ligature suppression mechanism via language goodies - thanks again Hans! - we now need to come up with wordlists.

I've started working on a list of German words with ligatures that should be suppressed. The list is derived from the word list that comes with the LuaLaTeX selnolig package: https://github.com/micoloretan/selnolig/blob/master/selnolig-german-wordlist.tex

You can find the current list here: https://github.com/denismaier/context-nolig-wordlist

The list is currently organized as follows:


  1.  L. 25-35: Words where automatic pattern matching is harder than usual because they contain several ligatures, some of which must be suppressed while others must be preserved. In «Auflagefläche» it's even the same letter combination: the fl in «Auf|lage» must be broken, while the fl in «fläche» stays. So here we use the bar | to mark manually the points where no ligature may occur.
  2.  L. 36ff.: Most of the words are currently in this block, which lists words where an ff, fl, fi, ffi, or ffl ligature has to be broken up after the first f.
  3.  L. 1804ff.: Words where the sequences ffi, ffl, or fff have to be broken up after the second f, so that the first two fs still form a ligature.
  4.  The remaining blocks, starting at l. 1900, l. 2073, l. 2157, l. 2225, and l. 2277, suppress the ligatures for «ft» and «fft», «fb» and «ffb», «fh» and «ffh», «fj» and «ffj», and «fk» and «ffk», respectively.
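To make the notations of the first three blocks concrete, here is a minimal Python sketch of how an entry could be turned into break positions. It is only an illustration, not how Hans's goodies mechanism works internally. It assumes that block 1 entries carry explicit bars, that block 2 entries are plain words broken after the first f of every ff/fi/fl run, and that block 3 entries are broken after «ff». The breaks are marked with U+200C ZERO WIDTH NON-JOINER, the standard Unicode character for asking a font not to ligate. The function names are made up.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: standard Unicode ligature blocker


def explicit_breaks(entry: str) -> tuple[str, list[int]]:
    """Block 1 style: 'Auf|lagefläche' -> ('Auflagefläche', [3])."""
    word, breaks = "", []
    for ch in entry:
        if ch == "|":
            breaks.append(len(word))  # break before the next real character
        else:
            word += ch
    return word, breaks


def first_f_breaks(word: str) -> list[int]:
    """Block 2 style (assumed): break after the first f of each ff/fi/fl run."""
    # An f not preceded by another f, followed by f, i, or l.
    return [m.end() for m in re.finditer(r"(?<!f)f(?=[fil])", word)]


def second_f_breaks(word: str) -> list[int]:
    """Block 3 style (assumed): keep 'ff' together, break before the next f/i/l."""
    return [m.end() for m in re.finditer(r"ff(?=[fil])", word)]


def apply_breaks(word: str, breaks: list[int], mark: str = ZWNJ) -> str:
    """Insert `mark` at every break position, right to left so indices stay valid."""
    for pos in sorted(breaks, reverse=True):
        word = word[:pos] + mark + word[pos:]
    return word
```

For example, `apply_breaks(*explicit_breaks("Auf|lagefläche"), mark="|")` gives back `Auf|lagefläche`, while `first_f_breaks("Auflagefläche")` would also break the fl in «fläche». That is exactly why such words need the explicit bars of block 1. Likewise, `second_f_breaks("Schifffahrt")` yields «Schiff|fahrt», which the first-f rule would get wrong.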

Obviously, the list is far from complete, and it's an open question whether it ever can be. Please have a look and feel free to propose more words to include, either by mail or directly on GitHub.

More generally, there's the question of how such a list should be extended. I was thinking about two options:

  1.  The new language features include a tracker that reports, for a given document, which words had ligatures suppressed and which words the mechanism left untouched. It should be possible to analyze the log file and create lists of words containing ligatures; deriving new candidates for the suppression word list from those should then be a rather simple step.
  2.  A bigger solution might be to use selnolig's patterns in a script that can be run over a large corpus, such as the DWDS (Digitales Wörterbuch der deutschen Sprache). That should give us a more complete list of words where ligatures must be suppressed.
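Option 2 could start out roughly like the following Python sketch, assuming the corpus is available as plain text. It collects words containing an f-sequence that could form a ligature, proposes breaks via patterns, and reports only words that are not yet in the list. The two regular expressions (a boundary after «auf» or «dorf») are hypothetical stand-ins for selnolig's real patterns. A real script would port selnolig's pattern list instead. The same `new_entries` step could also be fed the words reported by the tracker in option 1, but no particular log format is assumed here.

```python
import re
from collections import Counter
from typing import Optional

# f followed by a letter that could form one of the ligatures listed above.
CANDIDATE = re.compile(r"f[filtbhjk]")

# HYPOTHETICAL illustrative patterns, NOT selnolig's actual ones:
# a morpheme boundary right after "auf" or "dorf" when a ligature could follow.
PATTERNS = [re.compile(p) for p in (r"auf(?=[filtbhjk])", r"dorf(?=[filtbhjk])")]

WORD = re.compile(r"[^\W\d_]+")  # runs of letters, including umlauts and ß


def candidate_words(text: str) -> Counter:
    """Count words in `text` that contain at least one ligature candidate."""
    return Counter(w for w in WORD.findall(text) if CANDIDATE.search(w.lower()))


def propose(word: str) -> Optional[str]:
    """Return `word` with '|' at pattern-matched boundaries, or None if no pattern fires."""
    lower = word.lower()
    breaks = sorted({m.end() for p in PATTERNS for m in p.finditer(lower)})
    if not breaks:
        return None
    for pos in reversed(breaks):  # right to left keeps earlier indices valid
        word = word[:pos] + "|" + word[pos:]
    return word


def new_entries(text: str, known: set[str]) -> list[str]:
    """Proposals for candidate words in `text` that are not in `known` yet, most frequent first."""
    out = []
    for word, _count in candidate_words(text).most_common():
        if word in known:
            continue
        proposal = propose(word)
        if proposal:
            out.append(proposal)
    return out
```

Sorting by frequency means the most common misses in the corpus get reviewed first. Every proposal would still need a human check, since patterns like these inevitably produce false positives.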

What do you think?

Best,
Denis


___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________
