* Extending lua wordcount filter to count specific parts of text
@ 2020-08-17 19:42 h gv
From: h gv @ 2020-08-17 19:42 UTC (permalink / raw)
To: pandoc-discuss
I'd like to extend the lua wordcount filter to tell me a bit more about
specific parts of my text, specifically how many words are in the footnotes
and how many words are in "original quotations," which I mark off with the
<qu></qu> tag in my markdown (and which I then strip later via another
filter for certain versions). I got the footnote part to work but can't
figure out the RawInline html bit. Any guidance would be appreciated.
Here's my filter, followed by a simple markdown doc and the results:
```
-- counts words in a document

words = 0
notewords = 0
quotewords = 0
notenoquotewords = 0
noquotewords = 0

wordcount = {

  Note = function(el)
    pandoc.walk_inline(el, {
      Str = function(el)
        if el.text:match("%P") then
          notewords = notewords + 1
        end
      end })
  end,

  RawInline = function(el)
    if el.text == '<qu>' then
      pandoc.walk_inline(el, {
        Str = function(el)
          if el.text:match("%P") then
            quotewords = quotewords + 1
          end
        end })
    end
  end,

  Str = function(el)
    -- we don't count a word if it's entirely punctuation:
    if el.text:match("%P") then
      words = words + 1
    end
  end,

  Code = function(el)
    local _, n = el.text:gsub("%S+", "")
    words = words + n
  end,

  CodeBlock = function(el)
    local _, n = el.text:gsub("%S+", "")
    words = words + n
  end
}

function Pandoc(el)
  -- skip metadata, just count body:
  pandoc.walk_block(pandoc.Div(el.blocks), wordcount)
  mainwords = words - notewords
  notenoquotewords = notewords - quotewords
  noquotewords = words - quotewords
  print(words .. " total words")
  print(mainwords .. " words in main text")
  print(notewords .. " words in notes")
  print(noquotewords .. " total words minus original quotes")
  print(quotewords .. " words in original quotes")
  print(notenoquotewords .. " words in notes minus original quotes")
  os.exit(0)
end
```
test.md, a minimal working example (MWE) markdown file:
```
Suspendisse malesuada venenatis mauris. Curabitur ornare mollis velit. Sed
vitae metus.
"Morbi posuere mi id odio."[^1]

[^1]: Citation. <qu>("Original quotation here.")</qu>
```
`pandoc --lua-filter wordcount.lua test.md`
> 20 total words
> 16 words in main text
> 4 words in notes
> 20 total words minus original quotes
> 0 words in original quotes
> 4 words in notes minus original quotes
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/49b04b07-285b-47f5-8b6b-b123db559b07o%40googlegroups.com.
* Re: Extending lua wordcount filter to count specific parts of text
@ 2020-08-19 21:02 ` hgvhgvhgv
From: hgvhgvhgv @ 2020-08-19 21:02 UTC (permalink / raw)
To: pandoc-discuss
Now I understand why what I wrote doesn't work: a RawInline object holds
only the tag itself (what's inside the <>) and isn't a traversable
container like Note, so there is nothing inside it to walk.
But based on reading through the listserv, I'm not sure whether I can do
what I want with two arbitrary tags as RawInline objects. It would be easier
if my <qu> tags were <span class="qu"> (though less ideal from a readability
standpoint). I can convert these to spans in this way (
https://groups.google.com/g/pandoc-discuss/c/yQjvOhIQ40A/m/RclMzdtiCAAJ).
But then it seems like I have to go back to markdown to get the lua filter
to recognize these as Span objects? Or is there some way to do it all in
one pass? Maybe a completely different approach is necessary (somehow
collecting what's between the two RawInline tags into a table or list?).
Sorry for my obtuse first attempts.
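
The "collect what's between the two tags into a table" idea can be sketched
roughly as below. This is only one possible approach, not a confirmed answer
from the list. It assumes a pandoc version that supports the `Inlines` filter
function (2.9.2 or later, as far as I know), and it assumes the opening and
closing tag of each pair sit in the same paragraph at the same nesting level.
The names `is_qu_tag`, `quoted_runs` and `count_quote_inlines` are made up
for this illustration.

```lua
-- Sketch only: count the words that sit between <qu> and </qu>.
-- Assumption: both tags of a pair appear in the same list of inlines.

quotewords = 0

-- True if `el` is the raw HTML tag `tag`, e.g. "<qu>" or "</qu>".
function is_qu_tag(el, tag)
  return el.t == "RawInline" and el.format == "html" and el.text == tag
end

-- Split a flat list of inlines into the runs enclosed by <qu> ... </qu>.
-- A <qu> without a matching </qu> in the same list is ignored.
function quoted_runs(inlines)
  local runs, current = {}, nil
  for _, el in ipairs(inlines) do
    if is_qu_tag(el, "<qu>") then
      current = {}
    elseif is_qu_tag(el, "</qu>") then
      if current then runs[#runs + 1] = current end
      current = nil
    elseif current then
      current[#current + 1] = el
    end
  end
  return runs
end

-- Meant to be used as the `Inlines` entry of a filter table; pandoc then
-- calls it once for every list of inlines it walks.
function count_quote_inlines(inlines)
  for _, run in ipairs(quoted_runs(inlines)) do
    -- Wrap the run in a Span so that walk_inline also reaches Str
    -- elements nested inside it (for instance within Quoted).
    pandoc.walk_inline(pandoc.Span(run), {
      Str = function(s)
        -- same rule as the main filter: skip pure punctuation
        if s.text:match("%P") then
          quotewords = quotewords + 1
        end
      end
    })
  end
  -- returning nothing leaves the document unchanged
end
```

To try it with the filter from the first message, one option is to replace
the `RawInline` entry of the `wordcount` table with
`Inlines = count_quote_inlines`, so the existing `Pandoc` function drives it
and everything happens in a single pass. The same loop could instead build a
Span with class `qu` from each run if the spans are wanted for later filters.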
--
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/e516c89b-05fc-4607-9237-98d2d01577c0n%40googlegroups.com.