From: bapt a
Newsgroups: gmane.text.pandoc
To: pandoc-discuss
Subject: Lua filter to automatically tag keywords for TeX indexing
Date: Wed, 2 Nov 2022 18:20:37 -0700 (PDT)
Message-ID: <7f570676-2876-4e29-a8c0-9a765617f141n@googlegroups.com>

Hi all,

I've started writing a technical book using Quarto markdown, which uses pandoc with Lua filters under the hood to produce a website as well as the publisher's PDF format (via LaTeX).

I'd like to keep the source document as plain as possible, and I'm wondering if I could avoid the use of [concept]{.index}, which gets turned into \index{concept}, and instead write a Lua filter with my custom list of keywords, and have pandoc automatically match them as they appear in the text.

As a proof of principle I wrote the following code (see below), which matches specific keywords and reformats them as small caps. I quickly realised that a keyword followed by punctuation, such as "concept, ...", will fail to match, so I'm using gsub to strip trailing punctuation before matching. It works, but I'm a bit worried:

- What's the overhead of such a filter in practice? From what I understand, every single string element in the AST will be processed by gsub and then tested for a match. Do Lua filters walk the AST fast enough that I shouldn't worry about it? (As far as I can tell from small examples, it seems fine.)

- Assuming this idea is reasonable, I might want to do a few similar operations, e.g. reformatting programming language names (as in this example code), wrapping keywords in \index{}, etc., and the exact format will often depend on the output target (HTML vs TeX etc.). Is there a better construct for this than successive if/else statements to look for matches? (I don't know much Lua.)

Best regards,

baptiste

Lua filter:
-----

local text = require 'text'
local pandoc = require 'pandoc'

-- keywords to look for in the document
local langs = {"Matlab", "R", "Julia", "C++"}

-- return true if val, minus any trailing punctuation, is in the list tab
function Includes(tab, val)
  -- strip trailing punctuation before matching
  -- (anchored with $ so that only the end of the string is affected)
  local bare = string.gsub(val, "[%.,;:]+$", "")
  for _, value in ipairs(tab) do
    if value == bare then
      return true
    end
  end
  return false
end

-- replace a matching Str element with its lower-cased text in small caps
function Replace_langname(elem)
  if Includes(langs, elem.text) then
    return pandoc.SmallCaps(text.lower(elem.text))
  else
    return elem
  end
end

return {{Str = Replace_langname}}
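One possible direction for both questions, given here as a minimal sketch rather than a tested implementation: store the keywords as a set, so each Str costs one table lookup instead of a loop over the list, and pick the output-specific formatting from a table keyed by pandoc's FORMAT global instead of chaining if/else. The names Split_trailing and formatters are hypothetical, and the choice of an \index{} entry for LaTeX and small caps elsewhere is only an illustration. The sketch does not escape LaTeX special characters in the keyword.

```lua
-- Sketch only: the helper names and the formatters below are illustrative.

-- Keywords stored as a set: one table lookup per Str, no loop.
local langs = { Matlab = true, R = true, Julia = true, ["C++"] = true }

-- Split a token into its bare word and any trailing punctuation.
function Split_trailing(s)
  return s:match("^(.-)([%.,;:]*)$")
end

-- One formatter per output target; "default" covers all other formats.
local formatters = {
  latex = function(word)
    -- keep the word as is and add an index entry right after it
    return { pandoc.Str(word),
             pandoc.RawInline("latex", "\\index{" .. word .. "}") }
  end,
  default = function(word)
    return { pandoc.SmallCaps({ pandoc.Str(word:lower()) }) }
  end,
}

-- With no table returned, pandoc picks up the global function named Str.
function Str(elem)
  local word, punct = Split_trailing(elem.text)
  if not langs[word] then
    return nil -- nil leaves the element unchanged
  end
  local format_keyword = formatters[FORMAT] or formatters.default
  local out = format_keyword(word)
  if punct ~= "" then
    out[#out + 1] = pandoc.Str(punct) -- put the punctuation back
  end
  return out
end
```

Supporting another output target would then mean adding one entry to formatters, and the trailing punctuation stays outside the reformatted keyword.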