* document to json
From: jgran @ 2022-02-01 10:38 UTC
To: pandoc-discuss

Hi,

I would like to use pandoc to transform a collection of Word documents
to JSON for use in search engines (meilisearch, tantivy or datasette,
for instance), and also to learn Lua filters!

Unfortunately, I'm struggling to get the content as is, say as Markdown
for each paragraph. I don't see how to accumulate content in a usable
format. I'm lost among the recursive Str, BulletList, etc.

I posted on Stack Overflow (Pandoc Lua Filter restructure JSON output
<https://stackoverflow.com/questions/70834828/pandoc-lua-filter-restructure-json-output>)
but got no answer... Sorry for the double post.

I found pandoc.utils.stringify, but it only returns the concatenated
Str elements without spaces, so it is only useful for debugging AFAIK.

Is there a stringify equivalent that returns content in Markdown
format? Or what's the best way to walk paragraphs
(walk_block/walk_inline) to extract content?

Thanks in advance!

* Re: document to json
From: Albert Krewinkel @ 2022-02-01 16:26 UTC
To: pandoc-discuss; +Cc: jgran

jgran writes:

> I would like to use pandoc to transform a Word collection to json for
> using in search engines (meilisearch, tantivy or datasette for
> instance) and for learning lua-filter also!

Interesting use-case!

> Unfortunately I'm struggling to get content as is, say in markdown for
> paragraph. I don't get how to accumulate content in usable format. I'm
> lost among recursive Str, BulletList, etc.
>
> I posted on StackOverFlow (Pandoc Lua Filter restructure JSON output)
> but got not answer... Sorry for this double post.

No problem, bringing it here was the right step.

I think one reason that this is difficult to answer is that the expected
output is not clearly defined. I wasn't sure which parts are supposed to
be Markdown strings and which should become structured JSON.

> I found pandoc.utils.stringify, but it only returns no-spaced
> concatenated Str, so only useful for debuggin AFAIK.
>
> Is there a stringify equivalent for returning content in markdown
> format? Or what's the best way to walk paragraphs
> (walk_block/walk_inline) for extracting content?

Take a look at [pandoc.write], which was added in pandoc 2.17. It
expects whole documents as input, which you can create with
`pandoc.Pandoc`.

You might also find good use for [pandoc.utils.make_sections], a
function to turn a flat document into a more hierarchical structure.

[pandoc.write]: https://pandoc.org/lua-filters.html#pandoc.write
[pandoc.utils.make_sections]: https://pandoc.org/lua-filters.html#pandoc.utils.make_sections

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
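
A minimal sketch of the pandoc.write suggestion above, assuming pandoc
2.17 or later. The file name para-markdown.lua and the table name
`paragraphs` are made up for illustration, and printing to stderr is
only a stand-in for whatever JSON assembly is eventually wanted. Each
paragraph is wrapped in a throwaway document because pandoc.write
expects a whole Pandoc value.

```lua
-- para-markdown.lua (hypothetical file name)
-- Assumes pandoc >= 2.17, where pandoc.write is available.

-- Markdown renderings of all paragraphs, in document order.
local paragraphs = {}

-- Called for every Para, including those nested in lists or quotes.
function Para(para)
  -- pandoc.write takes a full document, so wrap the single block.
  local md = pandoc.write(pandoc.Pandoc({para}), 'markdown')
  paragraphs[#paragraphs + 1] = md
  -- Returning nothing leaves the paragraph unchanged.
end

-- Runs after all block-level functions; the document itself is kept.
function Pandoc(doc)
  for i, md in ipairs(paragraphs) do
    io.stderr:write(string.format('[%d] %s\n', i, md))
  end
end
```

Invoked as `pandoc -L para-markdown.lua input.docx`, this would leave
the normal output on stdout untouched and list the collected Markdown
strings on stderr.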

* Re: document to json
From: jgran @ 2022-02-03 11:20 UTC
To: pandoc-discuss

Thanks a lot for your answer.

But, beginner question: what's the basic command line for executing the
pandoc.write example? Pandoc requires a document as input. For the
following example, should it be 'pandoc -L .\write.lua anyfile'? How
can I output only the result of the 'write.lua' filter? What should it
return?

Thanks

File write.lua:

    local doc = pandoc.Pandoc(
      {pandoc.Para {pandoc.Strong 'Tea'}}
    )
    local html = pandoc.write(doc, 'html')
    assert(html == "<p><strong>Tea</strong></p>")
    return { ?? } -- what to return to get the HTML as the sole output

* Re: document to json
From: Albert Krewinkel @ 2022-02-06 10:04 UTC
To: pandoc-discuss; +Cc: jgran

jgran writes:

> But, beginner question, what's the basic command line for executing the
> pandoc.write example?

The example in the docs is not meant as standalone code. It must be used
as part of a Lua filter or custom writer.

A good start is probably to study some of the complete examples in the
docs <https://pandoc.org/lua-filters.html#examples>. Work your way from
there.

Cheers,
Albert
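
One possible way to turn the write.lua snippet from the question into a
complete filter, offered as a sketch rather than as the documented
approach. It assumes pandoc 2.17 or later. Ignoring the input document
and returning a single raw HTML block are illustrative choices, not
something the thread prescribes.

```lua
-- write.lua: a filter that discards the input and emits fixed HTML.
-- Assumes pandoc >= 2.17 for pandoc.write.

function Pandoc(doc)
  -- Build the small example document from the pandoc.write docs.
  local tea = pandoc.Pandoc({pandoc.Para {pandoc.Strong 'Tea'}})
  local html = pandoc.write(tea, 'html')

  -- Replace the whole document with one raw HTML block, so an HTML
  -- writer passes the string through unchanged.
  return pandoc.Pandoc({pandoc.RawBlock('html', html)})
end
```

Pandoc still needs some input file, so a call would look like
`pandoc -L write.lua -t html anyfile`; the output should then consist
of just the HTML fragment produced by pandoc.write.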