public inbox archive for pandoc-discuss@googlegroups.com
* process of converting into HTML
@ 2018-08-15 15:07 M. Mos
       [not found] ` <2d09b7e0-734e-4269-b476-7cafdd50d883-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: M. Mos @ 2018-08-15 15:07 UTC (permalink / raw)
  To: pandoc-discuss



As I understand it, Pandoc converts every type of input into JSON (an AST).
This JSON is then converted into the output format we actually want.
When converting any format into HTML, Text.Pandoc.Writers.HTML
<http://hackage.haskell.org/package/pandoc-2.2.3.2/docs/src/Text.Pandoc.Writers.HTML.html#writeHtml4>
converts the resulting JSON into HTML.
Does this mean the JSON produced by every Text.Pandoc.Readers.X follows the
same pattern?
If so, is there any description of this pattern? Or do you know of any
program (in an imperative language) that operates on Pandoc's JSON?
Extracting the pattern from Pandoc's writers is not an option for me because
my Haskell knowledge is very basic.

P.S. I'm interested in LaTeX to HTML conversion.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/2d09b7e0-734e-4269-b476-7cafdd50d883%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: process of converting into HTML
       [not found] ` <2d09b7e0-734e-4269-b476-7cafdd50d883-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2018-08-15 16:47   ` John MacFarlane
       [not found]     ` <yh480ktvnvh8zq.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: John MacFarlane @ 2018-08-15 16:47 UTC (permalink / raw)
  To: M. Mos, pandoc-discuss


Yes, things get converted into an AST.  But it's a
Haskell data structure (not JSON).  It is documented
here:

http://hackage.haskell.org/package/pandoc-types-1.17.5.1/docs/Text-Pandoc-Definition.html

You can see it using pandoc -t native.

Pandoc can produce a JSON representation of this
structure.  It is not separately documented, but
by comparing the output of -t json and -t native,
you can understand the pattern.
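
To make the pattern concrete: in the JSON quoted in this thread, each AST
element is an object whose "t" key names the Haskell constructor and whose
"c" key (when present) holds the constructor's arguments. The following is
only a minimal Python sketch under that assumption; the helper name
iter_nodes is made up for illustration and is not part of pandoc.

```python
import json


def iter_nodes(value):
    """Yield every object carrying a "t" (constructor) key, depth first."""
    if isinstance(value, dict):
        if "t" in value:
            yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_nodes(child)


# A fragment in the shape that `pandoc -t json` emits for a level-1 header.
fragment = json.loads("""
{"t": "Header",
 "c": [1, ["greetings", ["unnumbered"], []],
       [{"t": "Str", "c": "Greetings"}]]}
""")

# Collect the constructor names in the order they are encountered.
constructors = [node["t"] for node in iter_nodes(fragment)]
print(constructors)
```

Listing the constructors this way and looking each one up in
Text.Pandoc.Definition is one way to line the JSON up with the native output.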

There are libraries for writing pandoc JSON filters
in various languages, e.g.
http://scorreia.com/software/panflute/
These include code for manipulating the JSON.
Pandoc also includes some native Lua support
for this.

See

http://pandoc.org/filters.html
http://pandoc.org/lua-filters.html

Depending on what you want to do, you may
want to consider using either a pandoc filter
or a custom Lua writer.
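
For readers who want to see what a JSON filter amounts to without a helper
library, here is a rough Python sketch using only the standard library. It
assumes the pandoc 2.x document layout (a top-level object with
"pandoc-api-version", "meta" and "blocks"); the names walk, upper_str and
run_filter are illustrative, not pandoc or panflute APIs.

```python
import json
import sys


def walk(value, action):
    """Rebuild value bottom-up, letting action replace any {"t": ...} node."""
    if isinstance(value, list):
        return [walk(item, action) for item in value]
    if isinstance(value, dict):
        rebuilt = {key: walk(child, action) for key, child in value.items()}
        if "t" in rebuilt:
            replacement = action(rebuilt)
            return rebuilt if replacement is None else replacement
        return rebuilt
    return value


def upper_str(node):
    """Example action: upper-case the text of every Str node."""
    if node["t"] == "Str":
        return {"t": "Str", "c": node["c"].upper()}
    return None  # None means "leave the node unchanged"


def run_filter(source_text, action):
    """Parse a pandoc JSON document, apply action, and serialise it again."""
    return json.dumps(walk(json.loads(source_text), action))


def main():
    """Entry point for filter use: JSON arrives on stdin, leaves on stdout."""
    sys.stdout.write(run_filter(sys.stdin.read(), upper_str))


# In-memory demonstration with a tiny hand-written document.
doc = json.dumps({
    "pandoc-api-version": [1, 17, 5, 1],
    "meta": {},
    "blocks": [{"t": "Para", "c": [{"t": "Str", "c": "hello"}]}],
})
result = json.loads(run_filter(doc, upper_str))
print(result["blocks"][0]["c"][0]["c"])
```

A script that calls main() can be passed to pandoc with --filter; the demo
at the bottom only exercises the transformation in memory.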


* Re: process of converting into HTML
       [not found]     ` <yh480ktvnvh8zq.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2018-08-16 18:56       ` mos
       [not found]         ` <05c4d44a-291b-4dbc-81c2-e316eac82458-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: mos @ 2018-08-16 18:56 UTC (permalink / raw)
  To: pandoc-discuss



I want to convert LaTeX into a specific HTML format.
I thought I could convert LaTeX into JSON using Pandoc and then convert the
resulting JSON into the intended format.
Text-Pandoc-Definition
<http://hackage.haskell.org/package/pandoc-types-1.17.5.1/docs/Text-Pandoc-Definition.html>
shows the rough structure of this JSON, but that's not enough.
E.g. Pandoc converts
\section*{Greetings}
into
{
  "t": "Header",
  "c": [
    1,
    [
      "greetings",
      [
        "unnumbered",
        "unnumbered"
      ],
      []
    ],
    [
      {
        "t": "Str",
        "c": "Greetings"
      }
    ]
  ]
}
and Text-Pandoc-Definition
<http://hackage.haskell.org/package/pandoc-types-1.17.5.1/docs/Text-Pandoc-Definition.html#t:Attr>
says type Attr = (String, [String], [(String, String)]).
But ["unnumbered","unnumbered"] is not a random string array. It has a 
specific purpose and meaning.
- Would it not be better to replace strings like "unnumbered" with a new 
Algebraic data type?
- Do we have to understand LaTeX Reader/HTML Writer to get the complete 
structure of the resulting AST/JSON?



* Re: process of converting into HTML
       [not found]         ` <05c4d44a-291b-4dbc-81c2-e316eac82458-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2018-08-16 19:48           ` John MacFarlane
       [not found]             ` <yh480kk1oqf5ye.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: John MacFarlane @ 2018-08-16 19:48 UTC (permalink / raw)
  To: mos, pandoc-discuss

mos <m.mossadegh-Mmb7MZpHnFY@public.gmane.org> writes:

> I want to convert LaTeX into a specific HTML format.
> I thought I could convert LaTeX into JSON using Pandoc and to convert the 
> resulting JSON into the intended format.
> Text-Pandoc-Definition 
> <http://hackage.haskell.org/package/pandoc-types-1.17.5.1/docs/Text-Pandoc-Definition.html> shows 
> the rough structure of such a JSON but that's not enough.
> E.g. Pandoc converts
> \section*{Greetings}
> into
> {
>   "t": "Header",
>   "c": [
>     1,
>     [
>       "greetings",
>       [
>         "unnumbered",
>         "unnumbered"
>       ],
>       []
>     ],
>     [
>       {
>         "t": "Str",
>         "c": "Greetings"
>       }
>     ]
>   ]
> }
> and Text-Pandoc-Definition 
> <http://hackage.haskell.org/package/pandoc-types-1.17.5.1/docs/Text-Pandoc-Definition.html#t:Attr>
>  says 
>  type Attr = (String, [String], [(String, String)]).
> But ["unnumbered","unnumbered"] is not a random string array. It has a 
> specific purpose and meaning.

In fact, it represents a list of classes.  Yes, they could be
any strings.  Imagine converting this HTML:

    <h1 class="foo bar baz">Hello</h1>

However, there should only be one "unnumbered". That's
a bug in the LaTeX reader. I've reported it here:
https://github.com/jgm/pandoc/issues/4838
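
Until a fixed release is in use, a consumer of the JSON could drop the
repeated class itself. This is a small Python sketch, not pandoc's own fix;
the function name dedupe_classes is hypothetical, and it assumes the Header
node has the shape shown in the quoted JSON.

```python
def dedupe_classes(header):
    """Return a copy of a Header node whose class list has no repeats."""
    level, (identifier, classes, keyvals), inlines = header["c"]
    # dict.fromkeys keeps the first occurrence of each class, in order.
    unique = list(dict.fromkeys(classes))
    return {"t": "Header", "c": [level, [identifier, unique, keyvals], inlines]}


header = {
    "t": "Header",
    "c": [
        1,
        ["greetings", ["unnumbered", "unnumbered"], []],
        [{"t": "Str", "c": "Greetings"}],
    ],
}
cleaned = dedupe_classes(header)
print(cleaned["c"][1][1])
```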

> - Would it not be better to replace strings like "unnumbered" with a new 
> Algebraic data type?

We can't anticipate in advance all the class
attributes people might use. To be sure, the Attr type
would be better represented as something more
structured. See
https://github.com/jgm/pandoc/issues/3861. This was a
poor early decision that would be a fair amount of
work to change now throughout the code base.

I'll also acknowledge that, in the particular case
of "unnumbered," it would be better to handle that
with another field on Header rather than checking
a list of classes.  Again, it's a tradeoff between
API stability (and the difficulty of modifying the
whole code base) and encoding as much as possible
in the types.
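
The tradeoff can be illustrated with Python stand-ins. These are not
pandoc's Haskell types: both classes below are hypothetical, and they leave
out the header's inline content and key-value attributes for brevity. The
first mirrors the current approach (a free-form class list that code must
inspect); the second shows what a dedicated field might look like.

```python
from dataclasses import dataclass, field


@dataclass
class Header:
    """Stand-in for the current shape: "unnumbered" is just another class."""
    level: int
    identifier: str
    classes: list = field(default_factory=list)


def is_unnumbered(header):
    """Every reader and writer has to know about this magic string."""
    return "unnumbered" in header.classes


@dataclass
class HeaderWithFlag:
    """Hypothetical alternative: numbering is encoded in the type itself."""
    level: int
    identifier: str
    numbered: bool = True
    classes: list = field(default_factory=list)


current = Header(1, "greetings", ["unnumbered"])
alternative = HeaderWithFlag(1, "greetings", numbered=False)
print(is_unnumbered(current), alternative.numbered)
```

The second form is harder to misuse, but every piece of code that builds or
inspects a header would have to change to accommodate the extra field.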

> - Do we have to understand LaTeX Reader/HTML Writer to get the complete 
> structure of the resulting AST/JSON?

I don't see why. The AST structure is completely
defined by the module I linked to before.
The translation to JSON is also very predictable.
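
As an illustration of that predictability: the three arguments of Header
(level, Attr, inlines) appear as a three-element "c" array, and the Attr
triple appears as a three-element array inside it. A hedged Python sketch
that renders such a node to HTML follows; render_header and render_inlines
are illustrative names, only Str and Space inlines are handled, and this is
not how pandoc's own HTML writer is implemented.

```python
import html


def render_inlines(inlines):
    """Render the small subset of inline nodes this sketch understands."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(html.escape(node["c"]))
        elif node["t"] == "Space":
            parts.append(" ")
        else:
            raise NotImplementedError(node["t"])
    return "".join(parts)


def render_header(node):
    """Turn a Header node into an <hN> element carrying its Attr."""
    level, (identifier, classes, keyvals), inlines = node["c"]
    attrs = ""
    if identifier:
        attrs += ' id="' + html.escape(identifier) + '"'
    if classes:
        attrs += ' class="' + html.escape(" ".join(classes)) + '"'
    for key, value in keyvals:
        attrs += " " + key + '="' + html.escape(value) + '"'
    return "<h{0}{1}>{2}</h{0}>".format(level, attrs, render_inlines(inlines))


header = {
    "t": "Header",
    "c": [1, ["greetings", ["unnumbered"], []], [{"t": "Str", "c": "Greetings"}]],
}
rendered = render_header(header)
print(rendered)
```

A writer for a specific HTML format could start from a function like this
and decide what to emit when "unnumbered" is among the classes.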



* Re: process of converting into HTML
       [not found]             ` <yh480kk1oqf5ye.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2018-08-16 21:43               ` mos
       [not found]                 ` <4c0a2ced-29d7-4fcc-a6bf-2c5cc33d1622-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: mos @ 2018-08-16 21:43 UTC (permalink / raw)
  To: pandoc-discuss




>
> > I don't see why. The AST structure is completely 
> > defined by the module I linked to before. 
> > The translation to JSON is also very predictable.
>
 
I do understand that the AST structure has to be general to fit all inputs
and outputs.
However, in the case of LaTeX to AST, I would prefer to see strings like
"unnumbered" as part of the structure of the AST, because in this special
case "unnumbered" is not a random string: it is produced by pandoc itself
<http://hackage.haskell.org/package/pandoc-2.2.3.2/docs/src/Text.Pandoc.Readers.LaTeX.html#blockCommands>.
That's why it's necessary to read the LaTeX Reader to find these special
strings and get the complete structure of the AST produced from LaTeX.
Without reading the LaTeX Reader/HTML Writer, and only by studying
Text-Pandoc-Definition
<http://hackage.haskell.org/package/pandoc-types-1.17.5.1/docs/Text-Pandoc-Definition.html>,
I would know nothing about "unnumbered" at all.


* Re: process of converting into HTML
       [not found]                 ` <4c0a2ced-29d7-4fcc-a6bf-2c5cc33d1622-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2018-08-17  0:36                   ` John MacFarlane
  0 siblings, 0 replies; 6+ messages in thread
From: John MacFarlane @ 2018-08-17  0:36 UTC (permalink / raw)
  To: mos, pandoc-discuss


Yeah.  As I said in my last message, I agree that in
an ideal world, "unnumbered" would be an algebraic
data type and part of the Header constructor.

We've made this kind of compromise here and there in
order to move forward.  When we wanted to support
unnumbered headers, it was a choice: revise the
Pandoc structure, a breaking API change, and modify
most of the 100 or so modules that depend on this?
Or just make this sensitive to the presence of
"unnumbered" as a class and modify a few readers
and writers?  The latter approach won out at the time.

Another reason to just make it sensitive to
the "unnumbered" class is that our current way of
marking this distinction in pandoc markdown is
precisely to add "unnumbered" as a class.  It would
be confusing if this had a different effect on the
AST.

Anyway, this is all to say that I see your point.


