Message-ID: <25fc1fa2-0da0-02d9-5053-2763643b24f0@freedom.nl>
Date: Wed, 9 Aug 2023 14:17:30 +0200
To: denis.maier@unibe.ch, mailing list for ConTeXt users
References: <359cd8b6-455a-c7d6-82c1-013049bde319@freedom.nl> <47bff55e-1d67-bb7d-07c8-edf1f8178743@gmail.com>
Subject: [NTG-context] Re: Tracker for hyphens at the end of lines
From: Hans Hagen via ntg-context
Cc: Hans Hagen

On 8/9/2023 12:10 PM, denis.maier@unibe.ch wrote:
> Keith, you can also check hyphenations using a script:
>
> -- check-hyphens.lua
> --[[
>   analyze hyphenations based on a ConTeXt log file
>   enable hyphenation tracking in the ConTeXt file with
>      \enabletrackers[hyphenation.applied]
>   then run this script with
>      lua check-hyphens.lua input_file whitelist.ending
>   for the input_file we assume .log, so no need to add this
>   for the whitelist a file ending has to be supplied
>   the whitelist is optional
> ]]
>
> -- local lines = string.splitlines(io.loaddata("oeps.tex")or "") or { }
>
> -- local pprint = require('pprint')
>
> function main (input_file, whitelist_file)
>   local lines = lines_from(input_file .. ".log")
>   local whitelist = {}
>   if whitelist_file == nil then
>     whitelist = {}
>   else
>     whitelist = lines_from(whitelist_file)
>   end
>   --pprint (lines)
>   --pprint (whitelist)
>   -- keep the call on one line: Lua 5.1 rejects "(" starting a new line here
>   local filteredWordlist = filterHyphenationsWordlist(
>     cleanLines(getHyphenationLines(lines)), whitelist)
>   -- pprint(filteredWordlist)
>   saveResultsToFile(filteredWordlist, 'check-hyphens.log')
> end
>
> -- http://lua-users.org/wiki/FileInputOutput
>
> -- see if the file exists
> function file_exists(file)
>   local f = io.open(file, "rb")
>   if f then f:close() end
>   return f ~= nil
> end
>
> -- get all lines from a file, returns an empty
> -- list/table if the file does not exist
> function lines_from(file)
>   if not file_exists(file) then return {} end
>   local lines = {}
>   for line in io.lines(file) do
>     lines[#lines + 1] = line
>   end
>   return lines
> end
>
> -- String testing
> function starts_with(str, start)
>   return str:sub(1, #start) == start
> end
>
> -- get relevant lines
> function getHyphenationLines(lines)
>   local lines_with_hyphenations = {}
>   for k,v in pairs(lines) do
>     if
>       (starts_with(v, "hyphenated")
>        and not string.find(v, "start hyphenated words")
>        and not string.find(v, "stop hyphenated words"))
>     then table.insert(lines_with_hyphenations, v) end
>   end
>   return lines_with_hyphenations
> end
>
> -- String cleaning
> -- wrapper functions
>
> function cleanLines (xs)
>   local cleanedLines = {}
>   for k,v in pairs(xs) do
>     table.insert(cleanedLines, cleanLine(v))
>   end
>   return cleanedLines
> end
>
> function cleanLine (x)
>   return removeTrailingPunctuation(getWord(x))
> end
>
> -- 1. Start reading at colon
> function getWord(x)
>   -- but we read from character 26
>   return string.sub(x,26)
> end
>
> -- 2. Remove trailing punctuation
> function removeTrailingPunctuation (x)
>   -- only strip a comma that is actually the last character
>   if x:sub(-1) == ',' then
>     return x:sub(1, -2)
>   else
>     return x
>   end
> end
>
> -- test if word is in second list
> function inList (x, list)
>   for k,v in ipairs(list) do
>     if v == x then
>       return true
>     end
>   end
>   return nil
> end
>
> -- Filter hyphenated words based on second list (whitelist)
> function filterHyphenationsWordlist (xs, list)
>   local result = {}
>   for k,v in ipairs(xs) do
>     if not inList(v, list) then table.insert (result, v) end
>   end
>   return result
> end
>
> function saveResultsToFile(results, output_file)
>   -- opens the requested file in write mode
>   local out = assert(io.open(output_file, "w"))
>   -- iterate over the results, one word per line
>   for k,v in ipairs(results) do
>     out:write(v..'\n')
>   end
>   -- closes the open file
>   out:close()
> end
>
> -- Run
> main(arg[1], arg[2])

Ok, a little Lua lesson, if you don't mind.

---- xxx.tex ----

\enabletrackers[hyphenation.applied]

\starttext
    \input tufte
\stoptext

---- xxx.tmp ----

re-fine

---- xxx.lua ----

local function check(logname,whitename)
    if not logname then
        return
    end
    local data = io.loaddata(logname) or ""
    if data == "" then
        return
    end
    local blob = string.match(data,"start hyphenated words(.-)stop hyphenated words")
    if not blob then
        return
    end
    local white = table.tohash(string.splitlines(whitename and io.loaddata(whitename) or ""))
    for n, s in string.gmatch(blob,"(%d+) *: (%S+)") do
        if white[s] then
            -- we're good
        else
            print(n,s)
        end
    end
end

check(environment.files[1],environment.files[2])

-- print("TEST 1")
-- check("xxx.log")
-- print("TEST 2")
-- check("xxx.log","xxx.tmp")

-------------------

>mtxrun --script xxx xxx.log

1       dis-tinguish
1       harmo-nize
1       re-fine

>mtxrun --script xxx xxx.log xxx.tmp

1       dis-tinguish
1       harmo-nize

That said, I wonder if we should add the filename, just in case one includes 20 files, and a whitelist could be an option to the tracker.
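
For anyone who wants to try the matching outside of mtxrun, here is a minimal plain-Lua sketch of the same idea as xxx.lua above: cut out the block between the two marker lines, pick up the "number : word" pairs, and skip whitelisted words. The helper names (to_set, new_hyphenations) and the sample log excerpt are made up for illustration, and the real log lines may carry different prefixes; only the marker strings and the pattern come from the script above.

```lua
-- Plain-Lua sketch: avoids the ConTeXt helpers io.loaddata and table.tohash.

-- Turn whitelist text (one word per line) into a set for direct lookups.
local function to_set(text)
  local set = {}
  for word in string.gmatch(text or "", "%S+") do
    set[word] = true
  end
  return set
end

-- Collect the "number : word" entries of the tracker block
-- that are not in the whitelist.
local function new_hyphenations(log_text, whitelist_text)
  local found = {}
  local blob = string.match(log_text,
    "start hyphenated words(.-)stop hyphenated words")
  if not blob then
    return found
  end
  local white = to_set(whitelist_text)
  for n, s in string.gmatch(blob, "(%d+) *: (%S+)") do
    if not white[s] then
      found[#found + 1] = { number = tonumber(n), word = s }
    end
  end
  return found
end

-- Hypothetical log excerpt (assumption: the actual layout may differ).
local sample_log = [[
hyphenated      > start hyphenated words
hyphenated      > 1 : dis-tinguish
hyphenated      > 1 : harmo-nize
hyphenated      > 1 : re-fine
hyphenated      > stop hyphenated words
]]

for _, hit in ipairs(new_hyphenations(sample_log, "re-fine\n")) do
  print(hit.number, hit.word)
end
```

Keeping the whitelist as a set means each word is checked with one table lookup instead of a loop over the whole list, which is the main difference from the quoted script.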

Now the good news is that the tracker is actually already a bit more clever. After a run you will see xxx-hyphenation-new.lua, which has the hyphenated words (not the numbers), and you can make a whitelist xxx-hyphenation-old.lua, in which case you only get the new ones.

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
       tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------