From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: Lua filter to automatically tag keywords for TeX indexing
Date: Thu, 3 Nov 2022 12:39:22 -0700

> On Nov 2, 2022, at 6:20 PM, bapt a wrote:
>
> Hi all,
>
> I've started writing a technical book using Quarto markdown, which uses pandoc with Lua filters under the hood to produce a website as well as the publisher's pdf format (via LaTeX).
> I quite like to keep the source document as plain as possible, and I'm wondering if I could avoid the use of [concept]{.index}, which gets turned into \index{concept}, and instead write a Lua filter with my custom list of keywords, and have pandoc automatically match them as they appear in the text.
> As a proof of principle I wrote the following code (see below), which matches specific keywords, and reformats them as small-caps. I quickly realised that trailing punctuation, such as "concept, ..." will fail to match, so I'm using gsub to strip such punctuation before matching. It works, but I'm a bit worried:
>
> - what's the overhead of such a filter, in practice? From what I understand, every single string element in the AST will be processed by gsub then tested for a match. Are Lua filters walking down the AST fast enough that I shouldn't worry about it? (as far as I can tell on small examples, it seems fine)

The AST walking is very fast. See the benchmarks at the beginning of https://pandoc.org/lua-filters.html for one example.

> - assuming this idea is reasonable, I might want to do a few similar operations, e.g. reformatting program languages (as in this example code), wrapping keywords in \index{}, etc., and the exact format will often depend on the output target (html vs TeX etc.). Is there a better construct for this than successive if/else statements to look for matches? (I don't know much Lua)

In Lua you can do

    string.gsub(val, "(%l+)", function (word)
      if indexable[word] then
        -- .. whatever ..
      end
    end)

This will run the function on every group of lowercase letters in the string val. Here I'm assuming you have a Lua table indexable that maps words to true, e.g.

    { cow = true, horse = true }

That will be much faster than iterating through an array as you're doing here.
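
To make the table-lookup idea concrete, here is a minimal sketch in plain Lua that runs outside pandoc. The function name tag_keywords and the keyword set are hypothetical, and appending \index{...} directly to the string is only for illustration; in a real filter the LaTeX part would most likely need to be emitted as raw output for the target format rather than as ordinary text.

```lua
-- Hypothetical keyword set: keys are the words to index, values are `true`.
local indexable = { cow = true, horse = true }

-- Append an \index{...} entry after every keyword found in `text`.
-- `tag_keywords` is an illustrative name, not part of pandoc.
local function tag_keywords(text)
  -- gsub calls the function once per run of lowercase letters. When the
  -- function returns nothing (nil), that match is left unchanged.
  local result = text:gsub("(%l+)", function(word)
    if indexable[word] then
      return word .. "\\index{" .. word .. "}"
    end
  end)
  return result
end

print(tag_keywords("The cow, the horse and the dog."))
-- prints: The cow\index{cow}, the horse\index{horse} and the dog.
```

Because the pattern only captures letters, trailing punctuation such as the comma after "cow" never reaches the lookup, so no separate stripping step is needed. Each word costs one hash lookup however many keywords the table holds.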