* document to json
@ 2022-02-01 10:38 jgran
0 siblings, 1 reply; 4+ messages in thread
From: jgran @ 2022-02-01 10:38 UTC (permalink / raw)
To: pandoc-discuss
Hi,
I would like to use pandoc to transform a collection of Word documents to
JSON for use in search engines (meilisearch, tantivy or datasette, for
instance), and also to learn Lua filters!
Unfortunately I'm struggling to get the content as is, say as Markdown for
each paragraph. I don't understand how to accumulate content in a usable
format. I'm lost among the recursive Str, BulletList, etc.
I posted on Stack Overflow (Pandoc Lua Filter restructure JSON output
<https://stackoverflow.com/questions/70834828/pandoc-lua-filter-restructure-json-output>)
but got no answer... Sorry for the double post.
I found *pandoc.utils.stringify*, but it only returns concatenated Str
without spaces, so it's only useful for debugging AFAIK.
Is there a *stringify* equivalent that returns content in Markdown format?
Or what's the best way to walk paragraphs (walk_block/walk_inline) to
extract content?
Thanks in advance!
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/002e3430-c171-4af5-8b94-095c054193fdn%40googlegroups.com.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: document to json
@ 2022-02-01 16:26 ` Albert Krewinkel
0 siblings, 1 reply; 4+ messages in thread
From: Albert Krewinkel @ 2022-02-01 16:26 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: jgran
jgran <jgrnduel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> I would like to use pandoc to transform a collection of Word documents
> to JSON for use in search engines (meilisearch, tantivy or datasette,
> for instance), and also to learn Lua filters!
Interesting use-case!
> Unfortunately I'm struggling to get the content as is, say as Markdown
> for each paragraph. I don't understand how to accumulate content in a
> usable format. I'm lost among the recursive Str, BulletList, etc.
>
> I posted on Stack Overflow (Pandoc Lua Filter restructure JSON output)
> but got no answer... Sorry for the double post.
No problem, bringing it here was the right step.
I think one reason that this is difficult to answer is that the expected
output is not clearly defined. I wasn't sure which parts are supposed to
be Markdown strings and which should become structured JSON.
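To make the question of output shape concrete, here is one hypothetical choice (not something decided in this thread): each search document is a flat JSON object with string fields such as `id`, `title` and `body`, where `body` holds Markdown. The helper names `json_string` and `records_to_json` are invented for illustration. The code is plain Lua with no pandoc dependency, so it also works with pandoc versions that lack a JSON module; newer pandoc releases ship a `pandoc.json` module, which would be preferable where available.

```lua
-- Hypothetical helpers (not part of pandoc): encode flat records whose
-- values are strings as a JSON array.
local JSON_ESCAPES = {
  ['"'] = '\\"', ['\\'] = '\\\\',
  ['\n'] = '\\n', ['\r'] = '\\r', ['\t'] = '\\t',
}

-- Quote a Lua string as a JSON string literal (RFC 8259 escaping).
-- Non-ASCII UTF-8 bytes pass through unchanged, which JSON allows.
local function json_string(s)
  local escaped = tostring(s):gsub('[%c"\\]', function(c)
    -- Named escapes where JSON has them, \u00XX for other control bytes.
    return JSON_ESCAPES[c] or string.format('\\u%04x', c:byte())
  end)
  return '"' .. escaped .. '"'
end

-- Encode a list of records; `fields` fixes the key order, because Lua
-- table iteration order is unspecified. Missing fields become "".
local function records_to_json(records, fields)
  local items = {}
  for _, rec in ipairs(records) do
    local members = {}
    for _, key in ipairs(fields) do
      members[#members + 1] = json_string(key) .. ':' .. json_string(rec[key] or '')
    end
    items[#items + 1] = '{' .. table.concat(members, ',') .. '}'
  end
  return '[' .. table.concat(items, ',') .. ']'
end
```

Keeping every value a string (with Markdown inside `body`) sidesteps the "which parts are structured" question: only the record boundaries are structured, everything else is text for the search engine to index.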
> I found pandoc.utils.stringify, but it only returns concatenated Str
> without spaces, so it's only useful for debugging AFAIK.
>
> Is there a stringify equivalent that returns content in Markdown
> format? Or what's the best way to walk paragraphs
> (walk_block/walk_inline) to extract content?
Take a look at [pandoc.write], which was added in pandoc 2.17. It
expects whole documents as input, which you can create with
`pandoc.Pandoc`.
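A minimal sketch of that suggestion, assuming pandoc 2.17 or later so that `pandoc.write` exists: wrap the blocks in `pandoc.Pandoc` and render them as Markdown. `blocks_to_markdown` and `inlines_to_markdown` are hypothetical names, and the `Para` function is only a usage example.

```lua
-- Hypothetical helper: render a list of blocks (a paragraph, a list, a
-- section body, ...) as Markdown text. pandoc.write needs a whole
-- document, so the blocks are wrapped in pandoc.Pandoc first.
local function blocks_to_markdown(blocks)
  return pandoc.write(pandoc.Pandoc(blocks), 'markdown')
end

-- Hypothetical helper for inline content (e.g. a heading's text): wrap it
-- in a Plain block, then reuse blocks_to_markdown.
local function inlines_to_markdown(inlines)
  return blocks_to_markdown({ pandoc.Plain(inlines) })
end

-- Example filter function: print each paragraph as Markdown to stderr,
-- keeping the emphasis and link markup that stringify drops.
-- Returning nothing leaves the document unchanged.
function Para(para)
  io.stderr:write(blocks_to_markdown({ para }), '\n\n')
end
```

Used as a filter (for example `pandoc -L md-paras.lua input.docx -o out.html`, file names illustrative), this writes the Markdown to stderr and leaves pandoc's normal output untouched.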
You might also find good use for [pandoc.utils.make_sections], a
function to turn a flat document into a more hierarchical structure.
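A sketch of how `pandoc.utils.make_sections(number_sections, base_level, blocks)` could feed one record per section. It assumes that each section comes back as a `Div` with class `section` whose first element is the heading; `has_class`, `collect_sections` and the record fields are hypothetical. Content before the first heading is skipped, and turning the records into JSON is a separate step.

```lua
-- Hypothetical helper: does the element carry the given class?
local function has_class(el, class)
  for _, c in ipairs(el.classes or {}) do
    if c == class then return true end
  end
  return false
end

-- Hypothetical helper: walk the output of pandoc.utils.make_sections and
-- collect one flat record per section. Nested sections become records of
-- their own, so a section's body excludes its subsections.
local function collect_sections(blocks, records)
  for _, blk in ipairs(blocks) do
    if blk.t == 'Div' and has_class(blk, 'section') then
      local header = blk.content[1]
      local body, subsections = {}, {}
      for i = 2, #blk.content do
        local child = blk.content[i]
        if child.t == 'Div' and has_class(child, 'section') then
          subsections[#subsections + 1] = child
        else
          body[#body + 1] = child
        end
      end
      records[#records + 1] = {
        -- Use the Div's id if set, otherwise fall back to the heading's.
        id = (blk.identifier ~= '' and blk.identifier) or header.identifier,
        title = pandoc.utils.stringify(header),
        body = pandoc.write(pandoc.Pandoc(body), 'markdown'),
      }
      collect_sections(subsections, records)
    end
  end
  return records
end

-- Usage inside a filter or custom writer (doc is the pandoc.Pandoc value):
--   local records = collect_sections(
--     pandoc.utils.make_sections(false, nil, doc.blocks), {})
```

Giving each subsection its own record keeps search hits small; folding subsections into their parent's body would be an equally reasonable design.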
[pandoc.write]: https://pandoc.org/lua-filters.html#pandoc.write
[pandoc.utils.make_sections]: https://pandoc.org/lua-filters.html#pandoc.utils.make_sections
--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
* Re: document to json
@ 2022-02-03 11:20 ` jgran
0 siblings, 1 reply; 4+ messages in thread
From: jgran @ 2022-02-03 11:20 UTC (permalink / raw)
To: pandoc-discuss
Thanks a lot for your answer.
But, beginner question: what's the basic command line for running the
pandoc.write example?
Pandoc requires a document as input. In the following example, should it be
'pandoc -L .\write.lua anyfile'? How do I output only the result of the
'write.lua' filter? What should it return?
Thx
file write.lua:
local doc = pandoc.Pandoc( {pandoc.Para {pandoc.Strong 'Tea'}} )
local html = pandoc.write(doc, 'html')
assert(html == "<p><strong>Tea</strong></p>")
return { ?? } -- what should be returned so that the HTML is the sole output?
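As a minimal sketch (not the documentation's own usage) of how the example could run as a filter: a filter's result is always a document, which pandoc then renders in the target format, but its `Pandoc` function may replace the whole document. Returning a document that contains only a raw HTML block should make `pandoc -L write.lua -t html anyfile` print just the HTML, since raw HTML passes through the HTML writer unchanged and, without `-s`, no template is wrapped around it.

```lua
-- write.lua, run with: pandoc -L write.lua -t html anyfile
-- The input document is read, then replaced entirely.
function Pandoc(doc)
  local tea = pandoc.Pandoc({ pandoc.Para({ pandoc.Strong('Tea') }) })
  local html = pandoc.write(tea, 'html')
  assert(html == '<p><strong>Tea</strong></p>')
  -- A new document whose only content is the HTML string as a raw block;
  -- the HTML writer emits raw HTML blocks verbatim.
  return pandoc.Pandoc({ pandoc.RawBlock('html', html) })
end
```

This trick only works when the raw format matches the output format; for arbitrary text such as JSON, a custom writer is the more natural fit.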
* Re: document to json
@ 2022-02-06 10:04 ` Albert Krewinkel
0 siblings, 0 replies; 4+ messages in thread
From: Albert Krewinkel @ 2022-02-06 10:04 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: jgran
jgran <jgrnduel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> But, beginner question: what's the basic command line for executing the
> pandoc.write example?
The example in the docs is not meant as standalone code. It must be used
as part of a Lua filter or custom writer. A good start is probably to
study some of the complete examples in the docs
<https://pandoc.org/lua-filters.html#examples>. Work your way from there.
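A sketch of the custom-writer route, assuming a pandoc version that supports new-style custom writers (a global `Writer(doc, opts)` function that returns a string, as described in pandoc's custom writer documentation). Whatever `Writer` returns becomes pandoc's entire output, which is what the earlier question about printing "only the result" was after. The file name `paras.lua` and the choice to keep only top-level paragraphs are illustrative.

```lua
-- paras.lua (hypothetical name), run with: pandoc -t paras.lua input.docx
-- Emits only the document's top-level paragraphs, each rendered as
-- Markdown and separated by a blank line; everything else is dropped.
function Writer(doc, opts)
  local parts = {}
  for _, block in ipairs(doc.blocks) do
    if block.t == 'Para' then
      parts[#parts + 1] = pandoc.write(pandoc.Pandoc({ block }), 'markdown')
    end
  end
  return table.concat(parts, '\n\n')
end
```

Replacing the loop body with per-section records and a JSON encoder would turn this into the search-index exporter described at the start of the thread.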
Cheers,
Albert
--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
Thread overview: 4+ messages
2022-02-01 10:38 document to json jgran
2022-02-01 16:26 ` Albert Krewinkel
2022-02-03 11:20 ` jgran
2022-02-06 10:04 ` Albert Krewinkel