From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.text.pandoc/31678
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: bapt a <auguieba-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Newsgroups: gmane.text.pandoc
Subject: Lua filter to automatically tag keywords for TeX indexing
Date: Wed, 2 Nov 2022 18:20:37 -0700 (PDT)
Message-ID: <7f570676-2876-4e29-a8c0-9a765617f141n@googlegroups.com>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_4452_199012712.1667438437801"
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="16292"; mail-complaints-to="usenet@ciao.gmane.io"
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Original-X-From: pandoc-discuss+bncBDG3FYUYQUCBBZ5ORSNQMGQEMSXRLAQ-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org Thu Nov 03 02:20:42 2022
Return-path: <pandoc-discuss+bncBDG3FYUYQUCBBZ5ORSNQMGQEMSXRLAQ-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Envelope-to: gtp-pandoc-discuss@m.gmane-mx.org
Original-Received: from mail-oi1-f185.google.com ([209.85.167.185])
	by ciao.gmane.io with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128)
	(Exim 4.92)
	(envelope-from <pandoc-discuss+bncBDG3FYUYQUCBBZ5ORSNQMGQEMSXRLAQ-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>)
	id 1oqOuQ-00042G-Cz
	for gtp-pandoc-discuss@m.gmane-mx.org; Thu, 03 Nov 2022 02:20:42 +0100
Original-Received: by mail-oi1-f185.google.com with SMTP id o21-20020a544795000000b0035a2a65eb10sf281673oic.18
        for <gtp-pandoc-discuss@m.gmane-mx.org>; Wed, 02 Nov 2022 18:20:42 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=googlegroups.com; s=20210112;
        h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
         :list-id:mailing-list:precedence:reply-to:x-original-sender
         :mime-version:subject:message-id:to:from:date:sender:from:to:cc
         :subject:date:message-id:reply-to;
        bh=42fabHqn+JovfUTcq7r9ZJB8iMgMehw52iULwn5vosY=;
        b=G7nZ/50gwT64Ec3RRG/YYahJxMrwxrEdQaO0xqXUXiTb3lsrElE3vxK6I6DNCi1kqa
         jdYeyZMN5r8OCIEg0rK2Y/nlvbRSlBqMxrtVKixyjFjM8dFEgxx1UI2zv1g4lT+f1lqJ
         18480ZHDB3ZH/UW5i2MKWQuDoviRuC3W3dMD6DdUbD4FpKdd5lFlSUsLqGLveM9Kr29/
         h62shKghF+ymMOsnL4CEavn8Cd/NOPLWG7HcxUZXulSVSySDFqpkzzYlZDN9lnuHv/Sj
         L8QcXCgMIqWzbSU9p57XjKzsAnRHr6Khvrd2V1ClXMFPfuVjXsGXPvigIiXxWPujswT9
         z+sw==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112;
        h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
         :list-id:mailing-list:precedence:reply-to:x-original-sender
         :mime-version:subject:message-id:to:from:date:from:to:cc:subject
         :date:message-id:reply-to;
        bh=42fabHqn+JovfUTcq7r9ZJB8iMgMehw52iULwn5vosY=;
        b=baD08XMoS13bjTSM+6JV6nXAXXEUZoxpFi2fRQRa+Of9HJLoe7W6mUnQ09ue2dxLeU
         X+JQKFDjy4cpmvrN3pnBERY5mdtDM8PoLPy/iyjKLq0Bnb7jyh4vuKAT8rrVjRoihqDT
         iGzp4ryUKV1Jw1K+J+nXHdQMB5eMmSRLu9a84BCnjmMDBi7zxjxudnYCuoHSIDGqtQZ6
         e+i5G0/qo16AMhyL6QFrubnnpgq5n17Ktq5ZGovNhE0+zZlq+8e2KhK9UlPJDSUbI7kL
         fyIW5J/9JlS8AR4RcUVqHsc3LPRiV2CxrwrxyUpBMF+G3yTdccOtCa7FtH5YnxkfDazM
         J2mA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
         :x-spam-checked-in-group:list-id:mailing-list:precedence:reply-to
         :x-original-sender:mime-version:subject:message-id:to:from:date
         :x-gm-message-state:sender:from:to:cc:subject:date:message-id
         :reply-to;
        bh=42fabHqn+JovfUTcq7r9ZJB8iMgMehw52iULwn5vosY=;
        b=qKwtYZ0vDMstn/rjnkBDuvbKHvbPgpNEKw1rUwTGdB1sazpJcfMFTu7MnueGp+1Flz
         RdKsdnfhjV8UIsIUAuMEF6LdXAop9HTzBi3YloMPoGbKzPgMI2xYr1IkMX2Z7qx//cD0
         /uDo/T9iA8ftza3R3YF9CsbvR8c2fWup6980g5CSF3gc3N36uiBxvmYQBY+BXAsPzB5K
         GdUf0ykHgFFUcWV7LAa0I4nllWHtsgUp8r0ypRRkaIFHi6ZB2/k1pUHE8ImN1iVXYItF
         fBu/IhhQxF0NPB0sHEnAXX1p574z5rH8ZlwylVVUkSZZvkUIoeKdCaPhtBAbIvOwfIl9
         PpvA==
Original-Sender: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
X-Gm-Message-State: ACrzQf2FAaS4sdWwA4rNSPIDCExohRXGZN5HkBRG7aldlceeUFtUZ9UM
	lQLMwQojYV9ojAjCzb8qCM0=
X-Google-Smtp-Source: AMsMyM4W/4S1r8OFtRm0XYAlPOOm0UXq1T7d9scHDwTkoF2ZPyp/ElCw4RsuEdTf6U0sLs9qTMFkiw==
X-Received: by 2002:a05:6808:9b6:b0:359:eb1a:de7b with SMTP id e22-20020a05680809b600b00359eb1ade7bmr13169969oig.124.1667438441236;
        Wed, 02 Nov 2022 18:20:41 -0700 (PDT)
X-BeenThere: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Original-Received: by 2002:a05:6870:5305:b0:132:4cb:dd6 with SMTP id
 j5-20020a056870530500b0013204cb0dd6ls248123oan.2.-pod-prod-gmail; Wed, 02 Nov
 2022 18:20:38 -0700 (PDT)
X-Received: by 2002:a05:6870:c182:b0:12a:e54e:c6e8 with SMTP id h2-20020a056870c18200b0012ae54ec6e8mr26342993oad.207.1667438438569;
        Wed, 02 Nov 2022 18:20:38 -0700 (PDT)
X-Original-Sender: auguieba-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Precedence: list
Mailing-list: list pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org; contact pandoc-discuss+owners-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
List-ID: <pandoc-discuss.googlegroups.com>
X-Google-Group-Id: 1007024079513
List-Post: <https://groups.google.com/group/pandoc-discuss/post>, <mailto:pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
List-Help: <https://groups.google.com/support/>, <mailto:pandoc-discuss+help-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
List-Archive: <https://groups.google.com/group/pandoc-discuss
List-Subscribe: <https://groups.google.com/group/pandoc-discuss/subscribe>, <mailto:pandoc-discuss+subscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
List-Unsubscribe: <mailto:googlegroups-manage+1007024079513+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>,
 <https://groups.google.com/group/pandoc-discuss/subscribe>
Xref: news.gmane.io gmane.text.pandoc:31678
Archived-At: <http://permalink.gmane.org/gmane.text.pandoc/31678>

------=_Part_4452_199012712.1667438437801
Content-Type: multipart/alternative; 
	boundary="----=_Part_4453_1647693604.1667438437802"

------=_Part_4453_1647693604.1667438437802
Content-Type: text/plain; charset="UTF-8"

Hi all,

I've started writing a technical book in Quarto Markdown, which uses
pandoc with Lua filters under the hood to produce both a website and the
publisher's PDF format (via LaTeX).

I like to keep the source document as plain as possible, so I'm
wondering whether I could avoid writing [concept]{.index}, which gets
turned into \index{concept}, and instead write a Lua filter that holds
my custom list of keywords and has pandoc match them automatically
wherever they appear in the text.
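
For the \index{} case, a filter would only need to emit a raw LaTeX
command next to each matched word. A minimal plain-Lua sketch of building
that command, assuming makeindex's default conventions, where `"` is the
quote character that makes the special characters `@`, `!`, `|` and `"`
literal. The name `index_entry` is hypothetical; in an actual filter the
string would presumably be wrapped in `pandoc.RawInline('latex', ...)`.

```lua
-- Hypothetical helper: build a LaTeX \index{} command for a keyword.
-- Assumes makeindex's default quote character ("), which makes the
-- special characters @ ! | and " literal when placed in front of them.
local function index_entry(term)
  -- prefix every makeindex special character with the quote character
  local quoted = term:gsub('["@!|]', '"%0')
  return "\\index{" .. quoted .. "}"
end

print(index_entry("Julia"))  --> \index{Julia}
print(index_entry("C++"))    --> \index{C++}
```
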
As a proof of concept I wrote the code below, which matches specific
keywords and reformats them in small caps. I quickly realised that a
keyword followed by punctuation, as in "concept, ...", fails to match,
because pandoc keeps the punctuation in the same Str element as the word;
so I'm using gsub to strip such punctuation before matching. It works,
but I'm a bit worried:

- What is the overhead of such a filter in practice? As I understand it,
every Str element in the AST will be run through gsub and then tested for
a match. Is a Lua filter's walk over the AST fast enough that I shouldn't
worry about this? (As far as I can tell from small examples, it seems
fine.)

- Assuming the idea is reasonable, I might want to do a few similar
operations, e.g. reformatting programming-language names (as in the
example code below) or wrapping keywords in \index{}, and the exact format
will often depend on the output target (HTML vs TeX, etc.). Is there a
better construct for this than successive if/else statements looking for
matches? (I don't know much Lua.)
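
Two common Lua idioms bear on these questions. The sketch below is plain
Lua, assumes single-word keywords, and its helper names (`split_trailing`,
`to_set`, `match_keyword`) are hypothetical. Turning the keyword list into
a set (a table keyed by keyword) once means each Str costs one pattern
match plus a single table lookup, however many keywords there are, rather
than a scan of the whole list. Splitting the text with `string.match`
instead of deleting the punctuation with `gsub` also keeps the
punctuation, so a filter could format only the word and re-append the
punctuation unchanged.

```lua
-- Split a Str text such as "Julia," into the bare word ("Julia") and its
-- trailing punctuation (","). The punctuation set [.,;:!?] is an assumption.
local function split_trailing(token)
  -- ".-" is lazy, so the final class captures all trailing punctuation
  return token:match("^(.-)([%.,;:!?]*)$")
end

-- Build a set (table keyed by keyword) from a plain list, once.
local function to_set(list)
  local set = {}
  for _, v in ipairs(list) do
    set[v] = true
  end
  return set
end

local langs = to_set({"Matlab", "R", "Julia", "C++"})

-- One hash lookup instead of scanning the list for every Str.
-- Returns the bare word and its punctuation on a match, nil otherwise.
local function match_keyword(set, token)
  local bare, punct = split_trailing(token)
  if set[bare] then
    return bare, punct
  end
  return nil
end

local word, punct = match_keyword(langs, "Julia,")
-- word == "Julia", punct == ","
```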

Best regards,

baptiste

Lua filter:
-----

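local text = require 'text'      -- pandoc's UTF-8-aware string functions
local pandoc = require 'pandoc'

-- keywords to look for in the document
local langs = {"Matlab", "R", "Julia", "C++"}

-- true if val, once trailing punctuation is removed, is an entry of tab
local function Includes(tab, val)
  -- strip trailing punctuation (. , ; :) before matching; a Lua
  -- character class needs no '|' separators, and "+$" anchors the
  -- match to the end of the string
  local bare = string.gsub(val, "[%.,;:]+$", "")

  for _, value in ipairs(tab) do
    if value == bare then
      return true
    end
  end

  return false
end

-- render matching language names in small caps
local function Replace_langname(elem)
  if Includes(langs, elem.text) then
    return pandoc.SmallCaps({pandoc.Str(text.lower(elem.text))})
  end
  return elem
end

return {{Str = Replace_langname}}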
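
On the if/else question, one option is a pair of lookup tables: one
mapping each keyword to a category, and one mapping each category and
output format to a formatting function. The plain-Lua sketch below is
illustrative only: the categories, the formatters and their string output
are assumptions. Inside a pandoc filter the format key would come from
the global `FORMAT` (whose exact value, e.g. "html" vs "html5", depends on
how pandoc is invoked), and the formatters would return pandoc elements
such as `pandoc.SmallCaps` or `pandoc.RawInline` rather than strings.

```lua
-- keyword -> category (hypothetical categories)
local categories = {
  Matlab = "lang", R = "lang", Julia = "lang", ["C++"] = "lang",
  concept = "index",
}

-- category -> output format -> formatter (illustrative string output)
local formatters = {
  lang = {
    latex = function(w) return "\\textsc{" .. w:lower() .. "}" end,
    html  = function(w) return '<span class="smallcaps">' .. w:lower() .. "</span>" end,
  },
  index = {
    latex = function(w) return w .. "\\index{" .. w .. "}" end,
    html  = function(w) return w end,  -- no index in HTML output
  },
}

-- Look up the category, then the formatter for this format;
-- fall back to the unchanged word when either lookup fails.
local function render(word, format)
  local category = categories[word]
  local by_format = category and formatters[category]
  local fmt = by_format and by_format[format]
  if fmt then
    return fmt(word)
  end
  return word
end
```

Adding a new keyword or output format then means adding a table entry
rather than another if/else branch.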


-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/7f570676-2876-4e29-a8c0-9a765617f141n%40googlegroups.com.


------=_Part_4453_1647693604.1667438437802--

------=_Part_4452_199012712.1667438437801--