ntg-context - mailing list for ConTeXt users
* Analyze hyphenations
@ 2021-04-18 21:29 denis.maier
From: denis.maier @ 2021-04-18 21:29 UTC (permalink / raw)
  To: ntg-context


[-- Attachment #1.1: Type: text/plain, Size: 686 bytes --]

Hi,

a couple of weeks ago Hans implemented a tracker that adds a list of hyphenated words to the log file. I finally managed to play with it; the result is attached to this mail.
The script analyzes hyphenations based on a ConTeXt log file. You need to enable hyphenation tracking in the ConTeXt file with \enabletrackers[hyphenation.applied]. Then you can run the script with
lua check-hyphens.lua input_file whitelist.ending
The script appends .log to input_file, so supply the input file without its file ending. The whitelist is optional; if you pass one, its file ending has to be supplied.
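
To make the whitelist step concrete, here is a minimal sketch of the filtering, assuming the whitelist holds one word per line, spelled exactly as the word appears in the log. The sample words and the helper names to_set and filter_words are made up for illustration, and the sketch uses a set lookup where the attached script does a linear search.

```lua
-- Sample data: both lists are made up for illustration.
local hyphenated = { "hy-phen-ation", "track-er", "anal-y-sis" }
local whitelist  = { "track-er" }

-- Turn a list of words into a set for constant-time membership tests.
local function to_set(list)
    local set = {}
    for _, word in ipairs(list) do
        set[word] = true
    end
    return set
end

-- Keep only the words that are not on the whitelist, preserving order.
local function filter_words(words, allowed)
    local allowed_set = to_set(allowed)
    local result = {}
    for _, word in ipairs(words) do
        if not allowed_set[word] then
            result[#result + 1] = word
        end
    end
    return result
end

local remaining = filter_words(hyphenated, whitelist)
print(table.concat(remaining, "\n"))
```

Only exact matches are dropped, so a whitelist entry has to be written the same way the word is written in the log.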

Maybe I should wikify this; any suggestions where it would fit?
HTH,
Denis

[-- Attachment #2: check-hyphens.lua --]
[-- Type: application/octet-stream, Size: 3537 bytes --]

-- check-hyphens.lua
--[[ 
    analyze hyphenations based on a ConTeXt log file
    enable hyphenation tracking in the ConTeXt file with
    \enabletrackers[hyphenation.applied]
    then run this script with
    lua check-hyphens.lua input_file whitelist.ending
    for the input_file we assume .log, so no need to add this
    for the whitelist a file ending has to be supplied
    the whitelist is optional
]] 

-- local pprint = require('pprint')

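function main (input_file, whitelist_file)
    if input_file == nil then
        io.stderr:write("usage: lua check-hyphens.lua input_file [whitelist.ending]\n")
        os.exit(1)
    end
    local lines = lines_from(input_file .. ".log")
    -- the whitelist is optional; without one nothing is filtered out
    local whitelist = {}
    if whitelist_file ~= nil then
        whitelist = lines_from(whitelist_file)
    end
    -- keep each "(" on the same line as the function name: Lua 5.1 rejects
    -- a call whose "(" starts a new line as ambiguous syntax
    local hyphenationLines = getHyphenationLines(lines)
    local filteredWordlist = filterHyphenationsWordlist(
        cleanLines(hyphenationLines), whitelist)
    -- pprint(filteredWordlist)
    saveResultsToFile(filteredWordlist, 'check-hyphens.log')
end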

-- http://lua-users.org/wiki/FileInputOutput

-- see if the file exists
function file_exists(file)
    local f = io.open(file, "rb")
    if f then f:close() end
    return f ~= nil
end
  
-- get all lines from a file, returns an empty 
-- list/table if the file does not exist
function lines_from(file)
    if not file_exists(file) then return {} end
    local lines = {}
    for line in io.lines(file) do 
        lines[#lines + 1] = line
    end
    return lines
end

-- String testing
function starts_with(str, start)
    return str:sub(1, #start) == start
end

-- get relevant lines
function getHyphenationLines(lines)
    local lines_with_hyphenations = {}
    -- ipairs keeps the lines in log order; pairs gives no ordering guarantee
    for _, v in ipairs(lines) do
        if starts_with(v, "hyphenated")
            and not string.find(v, "start hyphenated words", 1, true)
            and not string.find(v, "stop hyphenated words", 1, true)
        then
            table.insert(lines_with_hyphenations, v)
        end
    end
    return lines_with_hyphenations
end

-- String cleaning
-- wrapper functions

function cleanLines (xs)
    local cleanedLines = {}
    -- ipairs preserves the order of the input list
    for _, v in ipairs(xs) do
        table.insert(cleanedLines, cleanLine(v))
    end
    return cleanedLines
end

function cleanLine (x)
    return removeTrailingPunctuation(getWord(x))
end

-- 1. Start reading at colon
function getWord(x)
    -- but we actually read from character 26 onward (fixed offset)
    return string.sub(x, 26)
end

-- 2. Remove trailing punctuation (a single trailing comma)
function removeTrailingPunctuation (x)
    -- strip only when the comma is the last character, not anywhere in x
    if x:sub(-1) == ',' then
        return x:sub(1, -2)
    else
        return x
    end
end

-- test if word is in second list
function inList (x, list)
    for _, v in ipairs(list) do
        if v == x then
            return true
        end
    end
    return false
end

-- Filter hyphenated words based on second list (whitelist)
function filterHyphenationsWordlist (xs, list)
    local result = {}
    for k,v in ipairs(xs) do
        if not inList(v, list) then table.insert (result, v) end
    end
    return result
end

function saveResultsToFile(results, output_file)
    -- open the requested file in write mode
    local f, err = io.open(output_file, "w")
    if not f then
        error("cannot open " .. output_file .. ": " .. tostring(err))
    end
    -- write one word per line
    for _, v in ipairs(results) do
        f:write(v, '\n')
    end
    -- close the file
    f:close()
end

-- Run
main(arg[1], arg[2])
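
The fixed offset in getWord depends on the exact width of the log prefix. As a possible alternative, here is a minimal sketch that takes whatever follows the last colon on the line. The sample line is hypothetical (the real tracker output may be laid out differently), and word_after_colon is an illustrative name, not part of the script above.

```lua
-- Hypothetical log line; the real tracker output may look different.
local sample = "hyphenated      > en : hy-phen-ation,"

-- Return the text after the last colon, with trailing whitespace and
-- one trailing comma removed. Returns nil if the line has no colon.
local function word_after_colon(line)
    local word = line:match(":%s*([^:]*)$")
    if not word then
        return nil
    end
    word = word:gsub("%s+$", "")   -- drop trailing whitespace
    word = word:gsub(",$", "")     -- drop a single trailing comma
    return word
end

print(word_after_colon(sample))
```

This only works if the hyphenated word itself never contains a colon, which is an assumption about the log format.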

[-- Attachment #3: Type: text/plain, Size: 493 bytes --]

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________
