From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: lua filter to count words in a document
Date: Mon, 18 Dec 2017 22:22:47 -0800
Message-ID: <20171219062247.GB765@Johns-MacBook-Pro.local>
References: <20171113165515.GA49016@protagoras>

@tarleb, any idea what is happening here?

+++ Greg Tucker-Kellogg [Dec 18 17 17:08 ]:
> This filter appears to fail when the metadata includes elements such
> as multiple authors with keys. For example, the following fails when
> the metadata block is included (but not if the metadata block is a
> simple list of authors).
> The error message is
>
>     Error running filter wordcount.lua:
>     attempt to call a nil value
>
> And the return value is 83
>
>     ---
>     title: The document title
>     author:
>     - name: Author One
>       affiliation: University of Somewhere
>     - name: Author Two
>       affiliation: University of Nowhere
>     ---
>
>     # This document has a few words (14)
>
>     This is a test of some words
>
> On Tuesday, November 14, 2017 at 12:54:55 AM UTC+8, John MacFarlane
> wrote:
>
> > This lua filter can be used to count the words in a
> > document, in any format pandoc can read. It omits
> > metadata words (title, abstract, authors), and of course
> > it ignores all of the non-content words like HTML tags,
> > LaTeX commands, the `#` that marks an ATX header, and
> > so on.
> >
> > To use, save as wordcount.lua and do
> >
> >     pandoc --lua-filter wordcount.lua inputfile
> >
> > ```lua
> > -- counts words in a document
> >
> > words = 0
> >
> > wordcount = {
> >   Str = function(el)
> >     -- we don't count a word if it's entirely punctuation:
> >     if el.text:match("%P") then
> >       words = words + 1
> >     end
> >   end,
> >
> >   Code = function(el)
> >     _,n = el.text:gsub("%S+","")
> >     words = words + n
> >   end,
> >
> >   CodeBlock = function(el)
> >     _,n = el.text:gsub("%S+","")
> >     words = words + n
> >   end
> > }
> >
> > function Pandoc(el)
> >   -- skip metadata, just count body:
> >   pandoc.walk_block(pandoc.Div(el.blocks), wordcount)
> >   print(words .. " words in body")
> >   os.exit(0)
> > end
> > ```
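
For reference, the two counting rules in the quoted filter can be tried
outside pandoc. The following is only a minimal sketch in plain Lua, not
part of the filter itself: the helper names count_str and count_code and
the sample token list are made up for illustration, and the tokens are
an assumption about how text might be split into Str elements. It does
not touch metadata, so it says nothing about the error reported above.

```lua
-- Standalone sketch of the counting rules used by the quoted filter.
-- Runs in plain Lua; no pandoc required.
-- count_str and count_code are hypothetical helper names.

-- Rule for Str elements: a token counts as one word unless it is made
-- up entirely of punctuation ("%P" matches any non-punctuation char).
local function count_str(text)
  if text:match("%P") then
    return 1
  end
  return 0
end

-- Rule for Code and CodeBlock elements: count runs of non-space
-- characters. gsub returns the substitution count as its second result.
local function count_code(text)
  local _, n = text:gsub("%S+", "")
  return n
end

-- Illustrative input (assumed tokens, not real pandoc output).
local words = 0
for _, token in ipairs({"This", "is", "a", "test", "--", "words."}) do
  words = words + count_str(token)
end
words = words + count_code("pandoc --lua-filter wordcount.lua inputfile")

print(words .. " words")  -- prints "9 words"
```

The "--" token is skipped because it contains only punctuation, while
"words." still counts because it contains letters; the code string adds
one word per whitespace-separated chunk.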