From: Bastien DUMONT
Newsgroups: gmane.text.pandoc
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Lua filter to automatically tag keywords for TeX indexing
Date: Thu, 3 Nov 2022 19:18:20 +0000
References: <7f570676-2876-4e29-a8c0-9a765617f141n@googlegroups.com>

Iterating over every keyword for each string in the document can be expensive. You can speed up the process by rewriting Includes like this:

```
local bool_tables = {}

function Includes(tab, val)
  -- strip trailing punctuation before matching
  local bare = string.gsub(val, "[%.,;:]+$", "")

  -- The first time a given value of tab is tested in Includes,
  -- a table is added in bool_tables which contains
  -- a boolean key-value entry for each element in tab.
  if not bool_tables[tab] then
    bool_tables[tab] = {}
    for _, elem in ipairs(tab) do
      bool_tables[tab][elem] = true
    end
  end

  -- So we only have to test once for the key-value entry
  -- instead of iterating over all elements in tab
  -- every time Includes is called.
  -- The lookup uses the stripped word, and a missing key gives false.
  return bool_tables[tab][bare] == true
end
```

On Wednesday 02 November 2022 at 06:20:37PM, bapt a wrote:
> Hi all,
>
> I've started writing a technical book using Quarto markdown, which uses pandoc
> with Lua filters under the hood to produce a website as well as the publisher's
> pdf format (via LaTeX).
> I quite like to keep the source document as plain as possible, and I'm
> wondering if I could avoid the use of [concept]{.index}, which gets turned into
> \index{concept}, and instead write a Lua filter with my custom list of
> keywords, and have pandoc automatically match them as they appear in the text.
> As a proof of principle I wrote the following code (see below), which matches
> specific keywords, and reformats them as small-caps. I quickly realised that
> trailing punctuation, such as "concept, ..." will fail to match, so I'm using
> gsub to strip such punctuation before matching. It works, but I'm a bit
> worried:
>
> - what's the overhead of such a filter, in practice? From what I understand,
> every single string element in the AST will be processed by gsub then tested
> for a match. Are Lua filters walking down the AST fast enough that I shouldn't
> worry about it? (as far as I can tell on small examples, it seems fine)
>
> - assuming this idea is reasonable, I might want to do a few similar
> operations, e.g. reformatting program languages (as in this example code),
> wrapping keywords in \index{}, etc., and the exact format will often depend on
> the output target (html vs TeX etc.). Is there a better construct for this than
> successive if/else statements to look for matches? (I don't know much Lua)
>
> Best regards,
>
> baptiste
>
> Lua filter:
> -----
>
> local text = require 'text'
> local pandoc = require 'pandoc'
>
> -- keywords to look for in the document
> local langs = {"Matlab", "R", "Julia", "C++"}
>
> function Includes(tab, val)
>   -- strip trailing punctuation before matching
>   local bare = string.gsub(val,"[%.|,|;|:]", "")
>
>   for index, value in ipairs(tab) do
>     if value == bare then
>       return true
>     end
>   end
>
>   return false
> end
>
> function Replace_langname(elem)
>   if Includes(langs, elem.text) then
>     return pandoc.SmallCaps(text.lower(elem.text))
>   else
>     return elem
>   end
> end
>
> return {{Str = Replace_langname}}
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email
> to [1]pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit
> [2]https://groups.google.com/d/msgid/pandoc-discuss/7f570676-2876-4e29-a8c0-9a765617f141n%40googlegroups.com.
>
> References:
>
> [1] mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> [2] https://groups.google.com/d/msgid/pandoc-discuss/7f570676-2876-4e29-a8c0-9a765617f141n%40googlegroups.com?utm_medium=email&utm_source=footer
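
On the second question in the quoted message (a better construct than successive if/else statements), one common Lua idiom is a lookup table of functions indexed by keyword category and output target. What follows is only a minimal sketch in plain Lua, not a tested pandoc filter. The names `keywords`, `formatters` and `render`, the category labels and the target strings are all made up for illustration. It returns plain strings, whereas a real filter would return pandoc elements and would probably read the target from pandoc's `FORMAT` global.

```lua
-- Hypothetical keyword list: each keyword maps to a category label.
keywords = {
  Matlab = "lang", R = "lang", Julia = "lang", ["C++"] = "lang",
  concept = "index",
}

-- Hypothetical formatters: one function per (category, target) pair.
-- Adding a new target or category means adding a table entry,
-- not another if/else branch.
formatters = {
  lang = {
    latex = function(w) return "\\textsc{" .. w:lower() .. "}" end,
    html  = function(w)
      return '<span class="smallcaps">' .. w:lower() .. "</span>"
    end,
  },
  index = {
    latex = function(w) return w .. "\\index{" .. w .. "}" end,
  },
}

-- Split off trailing punctuation, look the bare word up once,
-- and put the punctuation back after formatting.
function render(word, target)
  local bare, trail = word:match("^(.-)([%.,;:]*)$")
  local kind = keywords[bare]
  local fmt = kind and formatters[kind][target]
  if not fmt then
    return word  -- not a keyword, or no formatter for this target
  end
  return fmt(bare) .. trail
end

print(render("Julia,", "latex"))   --> \textsc{julia},
print(render("concept.", "latex")) --> concept\index{concept}.
print(render("concept", "html"))   --> concept
```

Both lookups are single table accesses, so the cost per string does not grow with the number of keywords, and the trailing punctuation is kept in the output instead of being thrown away.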