public inbox archive for pandoc-discuss@googlegroups.com
From: Albert Krewinkel <albert+pandoc-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Cc: jgran <jgrnduel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: document to json
Date: Tue, 01 Feb 2022 17:26:59 +0100
Message-ID: <878ruuecx0.fsf@zeitkraut.de>
In-Reply-To: <002e3430-c171-4af5-8b94-095c054193fdn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>

jgran <jgrnduel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> I would like to use pandoc to transform a Word collection to JSON for
> use in search engines (meilisearch, tantivy or datasette, for
> instance) and also to learn Lua filters!

Interesting use-case!

> Unfortunately I'm struggling to get content as is, say as Markdown for
> a paragraph. I don't get how to accumulate content in a usable format.
> I'm lost among recursive Str, BulletList, etc.
>
> I posted on Stack Overflow (Pandoc Lua Filter restructure JSON output)
> but got no answer... Sorry for this double post.

No problem, bringing it here was the right step.

I think one reason that this is difficult to answer is that the expected
output is not clearly defined. I wasn't sure which parts are supposed to
be Markdown strings and which should become structured JSON.

> I found pandoc.utils.stringify, but it only returns the Str contents
> concatenated without spaces, so it's only useful for debugging AFAIK.
>
> Is there a stringify equivalent for returning content in markdown
> format? Or what's the best way to walk paragraphs
> (walk_block/walk_inline) for extracting content?

Take a look at [pandoc.write], which was added in pandoc 2.17. It
expects whole documents as input, which you can create with
`pandoc.Pandoc`.
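A minimal sketch of that approach, assuming pandoc 2.17 or later: each top-level paragraph is wrapped in its own single-block document with `pandoc.Pandoc` and rendered to Markdown with `pandoc.write`. The helper name `blocks_to_markdown` is hypothetical, and writing to stderr is only a stand-in for whatever JSON-building step you actually need.

```lua
-- Sketch only: assumes pandoc >= 2.17, where pandoc.write is available.

-- Hypothetical helper: render a list of blocks as a Markdown string.
-- pandoc.write expects a whole document, so wrap the blocks first.
local function blocks_to_markdown (blocks)
  return pandoc.write(pandoc.Pandoc(blocks), 'markdown')
end

-- Called once with the whole document.
function Pandoc (doc)
  for _, block in ipairs(doc.blocks) do
    if block.t == 'Para' then
      -- Stand-in for collecting records: print each paragraph as Markdown.
      io.stderr:write(blocks_to_markdown({block}), '\n---\n')
    end
  end
  -- Return the document unchanged.
  return doc
end
```

Usage (the filter file name is hypothetical):
`pandoc input.docx --lua-filter=paras.lua -o /dev/null`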

You might also find good use for [pandoc.utils.make_sections], a
function to turn a flat document into a more hierarchical structure.
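A rough sketch combining the two, again assuming pandoc 2.17 or later: the document is grouped into sections with `pandoc.utils.make_sections`, and each section's body is rendered to Markdown. It relies on each section becoming a `Div` whose first block is the section's `Header`; nested subsections stay inside their parent's body. The helper `to_markdown` is hypothetical, and stderr again stands in for real JSON output.

```lua
-- Sketch only: assumes pandoc >= 2.17 (for pandoc.write).

-- Hypothetical helper: render a list of blocks as a Markdown string.
local function to_markdown (blocks)
  return pandoc.write(pandoc.Pandoc(blocks), 'markdown')
end

function Pandoc (doc)
  -- false: no section numbers; nil: keep the existing heading levels.
  local sections = pandoc.utils.make_sections(false, nil, doc.blocks)
  for _, sec in ipairs(sections) do
    -- Blocks before the first heading are not wrapped in a section Div.
    local first = sec.t == 'Div' and sec.content[1]
    if first and first.t == 'Header' then
      local title = pandoc.utils.stringify(first)
      -- Everything after the heading, including nested subsection Divs.
      local body = {table.unpack(sec.content, 2)}
      io.stderr:write(title, '\n', to_markdown(body), '\n---\n')
    end
  end
  return doc
end
```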

[pandoc.write]: https://pandoc.org/lua-filters.html#pandoc.write
[pandoc.utils.make_sections]: https://pandoc.org/lua-filters.html#pandoc.utils.make_sections


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124

