document to json

public inbox archive for pandoc-discuss@googlegroups.com

* document to json
@ 2022-02-01 10:38 jgran
       [not found] ` <002e3430-c171-4af5-8b94-095c054193fdn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: jgran @ 2022-02-01 10:38 UTC (permalink / raw)
  To: pandoc-discuss


Hi,

I would like to use pandoc to transform a collection of Word documents to JSON, for use
in search engines (meilisearch, tantivy or datasette, for instance), and also as a way
to learn Lua filters.
Unfortunately I'm struggling to get the content as is, say as Markdown for each
paragraph. I don't see how to accumulate content in a usable format. I'm lost
among the recursive Str, BulletList, etc.
I posted on Stack Overflow (Pandoc Lua Filter restructure JSON output
<https://stackoverflow.com/questions/70834828/pandoc-lua-filter-restructure-json-output>)
but got no answer... Sorry for the double post.

I found *pandoc.utils.stringify*, but it only returns the concatenated Str
elements without spaces, so as far as I know it is only useful for debugging.
Is there a *stringify* equivalent that returns content in Markdown format?
Or what's the best way to walk paragraphs (walk_block/walk_inline) to
extract content?
Thanks in advance!

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/002e3430-c171-4af5-8b94-095c054193fdn%40googlegroups.com.


* Re: document to json
       [not found] ` <002e3430-c171-4af5-8b94-095c054193fdn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-02-01 16:26   ` Albert Krewinkel
       [not found]     ` <878ruuecx0.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Albert Krewinkel @ 2022-02-01 16:26 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: jgran

jgran <jgrnduel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> I would like to use pandoc to transform a Word collection to json for
> using in search engines (meilisearch, tantivy or datasette for
> instance) and for learning lua-filter also!

Interesting use-case!

> Unfortunately I'm struggling to get content as is, say in markdown for
> paragraph. I don't get how to accumulate content in usable format. I'm
> lost among recursive Str, BulletList, etc.
>
> I posted on Stack Overflow (Pandoc Lua Filter restructure JSON output)
> but got no answer... Sorry for this double post.

No problem, bringing it here was the right step.

I think one reason that this is difficult to answer is that the expected
output is not clearly defined. I wasn't sure which parts are supposed to
be Markdown strings and which should become structured JSON.

> I found pandoc.utils.stringify, but it only returns no-spaced
> concatenated Str, so only useful for debugging AFAIK.
>
> Is there a stringify equivalent for returning content in markdown
> format? Or what's the best way to walk paragraphs
> (walk_block/walk_inline) for extracting content?

Take a look at [pandoc.write], which was added in pandoc 2.17. It
expects whole documents as input, which you can create with
`pandoc.Pandoc`.
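
As a minimal sketch of that idea (not code taken from the pandoc docs): the
helper name `to_markdown` is made up for illustration, and the filter assumes
pandoc 2.17 or later. It wraps each paragraph in a throwaway document, renders
it with `pandoc.write`, and logs the Markdown to stderr without changing the
document itself.

```lua
-- Sketch only: requires pandoc 2.17 or later for pandoc.write.
-- `to_markdown` is a hypothetical helper, not part of the pandoc API.
local function to_markdown(blocks)
  -- pandoc.write expects a whole document, so wrap the blocks first.
  local doc = pandoc.Pandoc(blocks)
  return pandoc.write(doc, 'markdown')
end

-- Filter function: called once for every paragraph in the input.
function Para(para)
  -- Log the Markdown rendering of this paragraph to stderr.
  io.stderr:write(to_markdown({para}), '\n')
  -- Returning nil leaves the paragraph unchanged.
  return nil
end
```

The same helper should work for any list of blocks (a BulletList, for
example), since it only relies on wrapping them in a document.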

You might also find good use for [pandoc.utils.make_sections], a
function to turn a flat document into a more hierarchical structure.
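
For illustration, a hedged sketch of how `make_sections` could be used to
list the titles of the top-level sections. `section_titles` is a hypothetical
helper, and passing `false` and `nil` (no section numbering, default base
level) is only an assumption about sensible arguments for this use case.

```lua
-- Sketch only: `section_titles` is a hypothetical helper.
local function section_titles(blocks)
  local titles = {}
  -- false: do not number sections; nil: keep the default base level.
  local sections = pandoc.utils.make_sections(false, nil, blocks)
  for _, blk in ipairs(sections) do
    -- Each section is a Div with class "section"; its first child is the Header.
    if blk.t == 'Div' and blk.classes:includes('section') then
      titles[#titles + 1] = pandoc.utils.stringify(blk.content[1])
    end
  end
  return titles
end

function Pandoc(doc)
  -- Log the top-level section titles to stderr; the document is unchanged.
  for _, title in ipairs(section_titles(doc.blocks)) do
    io.stderr:write(title, '\n')
  end
  return nil
end
```

Nested subsections end up inside the content of their parent Div, so a real
filter would probably recurse into `blk.content` as well.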

[pandoc.write]: https://pandoc.org/lua-filters.html#pandoc.write
[pandoc.utils.make_sections]: https://pandoc.org/lua-filters.html#pandoc.utils.make_sections


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124



* Re: document to json
       [not found]     ` <878ruuecx0.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2022-02-03 11:20       ` jgran
       [not found]         ` <ee6fd91d-76dd-4af0-b4d1-f41a584f283cn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: jgran @ 2022-02-03 11:20 UTC (permalink / raw)
  To: pandoc-discuss



Thanks a lot for your answer.

But, beginner question: what's the basic command line for executing the
pandoc.write example?
Pandoc requires a document as input. In the following example, should it be
'pandoc -L .\write.lua anyfile'? How do I output only the result of the
'write.lua' filter? What should it return?
Thanks

file write.lua:
    local doc = pandoc.Pandoc( {pandoc.Para {pandoc.Strong 'Tea'}} )
    local html = pandoc.write(doc, 'html')
    assert(html == "<p><strong>Tea</strong></p>")

    return { ?? } -- what to return to get the HTML as the sole output


* Re: document to json
       [not found]         ` <ee6fd91d-76dd-4af0-b4d1-f41a584f283cn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-02-06 10:04           ` Albert Krewinkel
  0 siblings, 0 replies; 4+ messages in thread
From: Albert Krewinkel @ 2022-02-06 10:04 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: jgran


jgran <jgrnduel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> But, beginner question, what's the basic command line for executing the
> pandoc.write example?

The example in the docs is not meant as standalone code. It must be used
as part of a Lua filter or custom writer. A good start is probably to
study some of the complete examples in the docs
<https://pandoc.org/lua-filters.html#examples>. Work your way from there.
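
To make that concrete, here is a minimal sketch of a complete filter built
around the snippet from the question. The file name `write.lua` comes from
the question; the command line in the comment and `input.docx` are
assumptions. The filter renders the whole input to HTML and returns a new
document whose only block is that HTML as a raw block, so converting with
`-t html` should print just that markup.

```lua
-- write.lua: sketch only, requires pandoc 2.17 or later for pandoc.write.
-- Possible invocation (file names are placeholders):
--   pandoc --lua-filter write.lua input.docx -t html
function Pandoc(doc)
  -- Render the complete input document to an HTML string.
  local html = pandoc.write(doc, 'html')
  -- Return a document containing only that string as raw HTML;
  -- the HTML writer passes raw HTML blocks through unchanged.
  return pandoc.Pandoc({pandoc.RawBlock('html', html)}, doc.meta)
end
```

A filter function returns the (possibly modified) element it was given, so
"what to return" is always a pandoc AST value, never a bare string.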

Cheers,
Albert


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124

