From: Hans Hagen via ntg-context <ntg-context@ntg.nl>
To: denis.maier@unibe.ch,
mailing list for ConTeXt users <ntg-context@ntg.nl>
Cc: Hans Hagen <j.hagen@freedom.nl>
Subject: [NTG-context] Re: Tracker for hyphens at the end of lines
Date: Wed, 9 Aug 2023 14:17:30 +0200
Message-ID: <25fc1fa2-0da0-02d9-5053-2763643b24f0@freedom.nl>
In-Reply-To: <ZRAP278MB0495AAD71C1AFB6AB0E121BB8312A@ZRAP278MB0495.CHEP278.PROD.OUTLOOK.COM>
On 8/9/2023 12:10 PM, denis.maier@unibe.ch wrote:
> Keith, you can also check hyphenations using a script:
>
> -- check-hyphens.lua
> --[[
> analyze hyphenations based on a ConTeXt log file
> enable hyphenation tracking in the ConTeXt file with
> \enabletrackers[hyphenation.applied]
> then run this script with
> lua check-hyphens.lua input_file whitelist.ending
> for the input_file we assume .log, so no need to add this
> for the whitelist a file ending has to be supplied
> the whitelist is optional
> ]]
>
> -- local lines = string.splitlines(io.loaddata("oeps.tex")or "") or { }
>
> -- local pprint = require('pprint')
>
> function main (input_file, whitelist_file)
> local lines = lines_from(input_file .. ".log")
> local whitelist = {}
> if whitelist_file == nil then
> whitelist = {}
> else
> whitelist = lines_from(whitelist_file)
> end
> --pprint (lines)
> --pprint (whitelist)
> local filteredWordlist = filterHyphenationsWordlist
> (cleanLines
> (getHyphenationLines(lines)),
> whitelist)
> -- pprint(filteredWordlist)
> saveResultsToFile(filteredWordlist, 'check-hyphens.log')
> end
>
> -- http://lua-users.org/wiki/FileInputOutput
>
> -- see if the file exists
> function file_exists(file)
> local f = io.open(file, "rb")
> if f then f:close() end
> return f ~= nil
> end
>
> -- get all lines from a file, returns an empty
> -- list/table if the file does not exist
> function lines_from(file)
> if not file_exists(file) then return {} end
> local lines = {}
> for line in io.lines(file) do
> lines[#lines + 1] = line
> end
> return lines
> end
>
> -- String testing
> function starts_with(str, start)
> return str:sub(1, #start) == start
> end
>
> -- get relevant lines
> function getHyphenationLines(lines)
> local lines_with_hyphenations = {}
> for k,v in pairs(lines) do
> if
> (starts_with(v, "hyphenated")
> and not string.find(v, "start hyphenated words")
> and not string.find(v, "stop hyphenated words"))
> then table.insert(lines_with_hyphenations, v) end
> end
> return lines_with_hyphenations
> end
>
> -- String cleaning
> -- wrapper functions
>
> function cleanLines (xs)
> local cleanedLines = {}
> for k,v in pairs(xs) do
> table.insert(cleanedLines, cleanLine(v))
> end
> return cleanedLines
> end
>
> function cleanLine (x)
> return removeTrailingPunctuation(getWord(x))
> end
>
> -- 1. Start reading at colon
> function getWord(x)
> -- in practice we read from character 26 onwards
> return string.sub(x,26)
> end
>
> -- 2. Remove trailing punctuation
> function removeTrailingPunctuation (x)
> if x:sub(-1) == ',' then
> return x:sub(1, -2)
> else
> return x
> end
> end
>
> -- test if word is in second list
> function inList (x, list)
> for k,v in ipairs(list) do
> if v == x then
> return true
> end
> end
> return nil
> end
>
> -- Filter hyphenated words based on second list (whitelist)
> function filterHyphenationsWordlist (xs, list)
> local result = {}
> for k,v in ipairs(xs) do
> if not inList(v, list) then table.insert (result, v) end
> end
> return result
> end
>
> function saveResultsToFile(results, output_file)
> -- open the requested file in write mode
> local f = assert(io.open(output_file, "w"))
> -- write one word per line
> for k,v in ipairs(results) do
> f:write(v..'\n')
> end
> -- close the file
> f:close()
> end
>
> -- Run
> main(arg[1], arg[2])
Ok, a little Lua lesson, if you don't mind.
---- xxx.tex ----
\enabletrackers[hyphenation.applied]
\starttext
\input tufte
\stoptext
---- xxx.tmp ----
re-fine
---- xxx.lua ----
local function check(logname,whitename)
    if not logname then
        return
    end
    local data = io.loaddata(logname) or ""
    if data == "" then
        return
    end
    local blob = string.match(data,"start hyphenated words(.-)stop hyphenated words")
    if not blob then
        return
    end
    local white = table.tohash(string.splitlines(whitename and io.loaddata(whitename) or ""))
    for n, s in string.gmatch(blob,"(%d+) *: (%S+)") do
        if white[s] then
            -- we're good
        else
            print(n,s)
        end
    end
end
check(environment.files[1],environment.files[2])
-- print("TEST 1")
-- check("xxx.log")
-- print("TEST 2")
-- check("xxx.log","xxx.tmp")
-------------------
>mtxrun --script xxx xxx.log
1 dis-tinguish
1 harmo-nize
1 re-fine
>mtxrun --script xxx xxx.log xxx.tmp
1 dis-tinguish
1 harmo-nize
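Outside mtxrun the helpers io.loaddata, string.splitlines and table.tohash are not available. A minimal plain Lua sketch of the same idea could look like the following. It assumes the log lines match the pattern used above, and the names loaddata, tohash and unlisted are made up for this illustration (they are not ConTeXt functions).

```lua
-- Plain Lua sketch of the check above, without the mtxrun helper libraries.
-- All function names in this block are hypothetical.

-- Read a whole file into a string; returns "" when it cannot be opened.
local function loaddata(name)
    local f = name and io.open(name, "rb")
    if not f then
        return ""
    end
    local data = f:read("*a")
    f:close()
    return data
end

-- Turn newline separated text into a set: { word = true, ... }.
local function tohash(text)
    local hash = { }
    for line in string.gmatch(text, "[^\r\n]+") do
        hash[line] = true
    end
    return hash
end

-- Return "count word" entries from the log text that are not whitelisted.
-- Assumes the same start/stop markers and line layout as the pattern above.
function unlisted(logdata, whitedata)
    local found = { }
    local blob  = string.match(logdata, "start hyphenated words(.-)stop hyphenated words")
    if not blob then
        return found
    end
    local white = tohash(whitedata or "")
    for n, s in string.gmatch(blob, "(%d+) *: (%S+)") do
        if not white[s] then
            found[#found + 1] = n .. " " .. s
        end
    end
    return found
end

-- Command line use: lua check.lua xxx.log [xxx.tmp]
for _, entry in ipairs(unlisted(loaddata(arg and arg[1]), loaddata(arg and arg[2]))) do
    print(entry)
end
```

Working on strings instead of file names keeps the filter easy to try out on a few hand-made lines before pointing it at a real log.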
That said, I wonder if we should add the filename, just in case one
includes 20 files; a whitelist could also be an option to the tracker.
Now the good news is that the tracker is actually already a bit more
clever. After a run you will see
xxx-hyphenation-new.lua
that has the hyphenated words (not the numbers)
and you can make a whitelist
xxx-hyphenation-old.lua
in which case you only get the new ones.
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------