From: Albert Krewinkel <albert+pandoc-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Cc: jgran <jgrnduel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: document to json
Date: Tue, 01 Feb 2022 17:26:59 +0100
Message-ID: <878ruuecx0.fsf@zeitkraut.de>
In-Reply-To: <002e3430-c171-4af5-8b94-095c054193fdn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
jgran <jgrnduel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> I would like to use pandoc to transform a collection of Word documents
> to JSON for use in search engines (meilisearch, tantivy or datasette,
> for instance), and also to learn Lua filters!
Interesting use-case!
> Unfortunately I'm struggling to get the content as is, say as Markdown
> for each paragraph. I don't see how to accumulate content in a usable
> format. I'm lost among recursive Str, BulletList, etc.
>
> I posted on Stack Overflow (Pandoc Lua Filter restructure JSON output)
> but got no answer... Sorry for this double post.
No problem, bringing it here was the right step.
I think one reason that this is difficult to answer is that the expected
output is not clearly defined. I wasn't sure which parts are supposed to
be Markdown strings and which should become structured JSON.
> I found pandoc.utils.stringify, but it only returns no-spaced
> concatenated Str, so only useful for debugging AFAIK.
>
> Is there a stringify equivalent for returning content in markdown
> format? Or what's the best way to walk paragraphs
> (walk_block/walk_inline) for extracting content?
Take a look at [pandoc.write], which was added in pandoc 2.17. It
expects whole documents as input, which you can create with
`pandoc.Pandoc`.
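A minimal sketch of that idea, assuming pandoc 2.17 or later: each top-level block is wrapped in its own one-block document and rendered to Markdown. The helper name `block_to_markdown` is made up for illustration, and printing to stderr stands in for whatever accumulation the final JSON needs.

```lua
-- Hypothetical helper: render a single block as a Markdown string.
-- pandoc.write expects a whole document, so the block is wrapped
-- in a one-block Pandoc document first.
local function block_to_markdown(block)
  local doc = pandoc.Pandoc({block})
  return pandoc.write(doc, 'markdown')
end

function Pandoc(doc)
  for i, block in ipairs(doc.blocks) do
    -- Print the index and the Markdown rendering of each top-level block.
    io.stderr:write(i, ': ', block_to_markdown(block), '\n')
  end
  -- Return the document unchanged; this filter only inspects it.
  return doc
end
```

Saved as, say, `blocks.lua`, this could be run with `pandoc --lua-filter=blocks.lua input.docx`.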
You might also find good use for [pandoc.utils.make_sections], a
function to turn a flat document into a more hierarchical structure.
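As a rough sketch of how that might look, assuming the sections come back as Divs with the class `section` and the heading as their first element: the filter below lists the titles of the top-level sections only. Nested sections sit inside those Divs and are not visited here.

```lua
function Pandoc(doc)
  -- Arguments: number_sections, base_level, blocks.
  local sections = pandoc.utils.make_sections(false, nil, doc.blocks)
  for _, block in ipairs(sections) do
    -- Blocks before the first heading are not wrapped in a section Div.
    if block.t == 'Div' and block.classes:includes('section') then
      local header = block.content[1]
      io.stderr:write(pandoc.utils.stringify(header), '\n')
    end
  end
  -- Return the document unchanged; this filter only inspects it.
  return doc
end
```

Each section Div could then be passed to `pandoc.write`, wrapped in a `pandoc.Pandoc` document, to get its content as a Markdown string.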
[pandoc.write]: https://pandoc.org/lua-filters.html#pandoc.write
[pandoc.utils.make_sections]: https://pandoc.org/lua-filters.html#pandoc.utils.make_sections
--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124