Span and Div elements without any attributes (in filter output)

public inbox archive for pandoc-discuss@googlegroups.com

* Span and Div elements without any attributes (in filter output)
@ 2013-10-11 11:55 BP Jonsson
       [not found] ` <5257E71D.9070706-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: BP Jonsson @ 2013-10-11 11:55 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

In a filter it's sometimes desirable to replace an element in its
parent element's content list with e.g. the contents of the
element itself, modified in some way. In practice this is hard to
do as you'll have to walk the AST data structure and collect
elements along with a reference to their parent element's content
list, which is a bit more complicated than just collecting the
(child) elements themselves. One possible workaround is to
convert the element into a Div or Span element and set its
contents to whatever one wants to replace the original element
with. It works in the sense that the Span or Div element will
just sit there and, well, contain the data, but in HTML output it
will show up as a `<span>...</span>` or `<div>...</div>`, even
though it probably doesn't have any meaningful purpose in the
HTML document; it just makes the HTML harder to read and harder
to render.

I've tried to write a filter which removes Span and Div elements
which don't have any attributes at all (id, class or other
attributes) -- or alternatively those which have a `disembowel=1`
attribute, although the absence of any attributes seems a better
criterion -- but for various reasons this has proved hard within a
reasonable level of
'parsing', especially since the AST data structure is rather
radically altered in the process -- paths to elements change
during the process in ways that make the processing hard. What if
pandoc itself replaced such attribute-less Span and Div elements
with their content at least when the `--normalize` option is set?
After all pandoc parses the whole document anyway!

/bpj

P.S.
:   I am aware that there may be situations when a `<span>` or `<div>`
     may be meaningful in an HTML document, most notably perhaps
     when a CSS rule targets them as children of some other
     element, but it seems to me that even then it is probably
     most user friendly to give them a class describing their
     function if they are really intended to fulfill a function in
     the document, so that it might be reasonable for pandoc to
     remove such attribute-less elements at least under
     `--normalize`.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found] ` <5257E71D.9070706-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-10-12 14:50   ` John MacFarlane
       [not found]     ` <20131012145025.GE95559-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: John MacFarlane @ 2013-10-12 14:50 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I don't want to make removing empty spans the default, since
it breaks expected behavior that HTML tags will be passed through
verbatim.

Note that if you use the python pandocfilters library to write
your filters, your transformation functions can return a list
instead of an object, in which case the list will be spliced in
to the result (which I think is what you want).

If you're writing the filters in Haskell, you can just use a
function Inline -> [Inline] or Inline -> IO [Inline].
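
To illustrate the list-splicing idea, here is a minimal, self-contained
sketch in Python. It is not the pandocfilters source: `walk` and
`unwrap_bare` are hypothetical names written for this example, and the
element shape is an assumption (each element is a dict with a `t` key for
the type and a `c` key for the contents, where a Span or Div has contents
`[attr, children]` and `attr` is `[id, classes, key-value pairs]`).

```python
def walk(node, action):
    """Apply `action` to every element; splice list results into the parent."""
    if isinstance(node, list):
        out = []
        for item in node:
            if isinstance(item, dict) and "t" in item:
                result = action(item["t"], item.get("c"))
                if result is None:
                    # Unchanged: keep the element, but still visit its children.
                    out.append(walk(item, action))
                elif isinstance(result, list):
                    # A list result takes the element's place in the parent's
                    # content list; the spliced items are processed as well.
                    out.extend(walk(result, action))
                else:
                    # A single replacement element; only its children are visited.
                    out.append(walk(result, action))
            else:
                out.append(walk(item, action))
        return out
    if isinstance(node, dict):
        return {key: walk(value, action) for key, value in node.items()}
    return node


def unwrap_bare(key, value):
    """Return the contents of an attribute-less Span or Div, else None."""
    if key in ("Span", "Div"):
        (ident, classes, keyvals), contents = value
        if not ident and not classes and not keyvals:
            return contents
    return None


# Assumed sample input: one paragraph with a bare Span and a Span with an id.
doc = [{"t": "Para", "c": [
    {"t": "Str", "c": "a"},
    {"t": "Span", "c": [["", [], []], [{"t": "Str", "c": "b"}]]},
    {"t": "Span", "c": [["x", [], []], [{"t": "Str", "c": "c"}]]},
]}]
result = walk(doc, unwrap_bare)
```

In `result` the bare Span is gone and its `Str` child sits directly in the
paragraph, while the Span with the id `x` is left alone. No reference to the
parent's content list is needed, because the walker does the splicing.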


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]     ` <20131012145025.GE95559-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>
@ 2013-10-13 12:15       ` BP Jonsson
       [not found]         ` <CAFC_yuSeX5gHKAGD8zYfbGkFFjFYjZdH3QNTzjRuZpryAp-0CQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: BP Jonsson @ 2013-10-13 12:15 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I'm trying to write something similar to pandocfilters.py to help with
writing filters in Perl.

I noticed that the `walk` function in pandocfilters.py seems to
expect that the dict objects it receives have an element name like
`CodeBlock` as a key, with the contents of the element as value. This was
indeed how the JSON output by pandoc looked prior to pandoc 1.12:

{"CodeBlock":[...]}

but in pandoc 1.12 I get JSON output where each element object has a `tag`
key with the element type, e.g. `CodeBlock`, as value and a `contents` key
with the element contents as value:

{"tag":"CodeBlock","contents":[...]}

I guessed that it's the json module which automatically converts from the
'new style' to the 'old style' behind the scenes? I tried to locate its
documentation but couldn't find anything relevant.

I'm probably revealing my utter ignorance of python here -- I'm not a
programmer but a philologist who learned perl years ago to work on my data
--, and it's probably a good time to remedy that, but I want to be sure
what kind of data I should be expecting/returning, or if something is
broken in my pandoc installation, however unlikely!
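
For what it's worth, a helper library could cope with both shapes by
normalizing each object before dispatching on it. The following is only a
sketch under that assumption; `element_parts` is a made-up name, and
treating every other single-key dict as an old-style element is a
simplification that a real tool would need to check against pandoc's
actual output.

```python
def element_parts(obj):
    """Return (name, contents) for either JSON shape, or None otherwise."""
    if not isinstance(obj, dict):
        return None
    if "tag" in obj:
        # New style: {"tag": "CodeBlock", "contents": [...]}
        return obj["tag"], obj.get("contents")
    if len(obj) == 1:
        # Old style: {"CodeBlock": [...]}, the single key is the element name.
        (name, contents), = obj.items()
        return name, contents
    return None


old_style = element_parts({"CodeBlock": ["..."]})
new_style = element_parts({"tag": "CodeBlock", "contents": ["..."]})
```

Both calls give the same `(name, contents)` pair, so the rest of a filter
would not need to care which style pandoc emitted.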


-- 
/BP

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAFC_yuSeX5gHKAGD8zYfbGkFFjFYjZdH3QNTzjRuZpryAp-0CQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]         ` <CAFC_yuSeX5gHKAGD8zYfbGkFFjFYjZdH3QNTzjRuZpryAp-0CQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-10-13 21:29           ` John MacFarlane
       [not found]             ` <20131013212946.GA25277-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: John MacFarlane @ 2013-10-13 21:29 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

There was a change in the aeson (json) library that caused pandoc's
JSON format to change, briefly.  (The latest code works around the
change and restores the old behavior.)

However, I think it would be worth considering changing to the
format aeson now defaults to:

    {"tag": "CodeBlock", "contents": ...}

instead of

    {"CodeBlock": ...}

The former is more verbose but much easier to work with programmatically.
And we could remove some of the verbosity by changing "tag" to "k"
and "contents" to "v".

Thoughts?
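
As a rough illustration of "easier to work with", here is a small Python
sketch that counts element types in a tagged tree. The function name
`count_elements` and the sample `tree` are invented for this example. The
point is that recognizing an element is a single key lookup, and that
shortening "tag" to "k" would only change the `tag_key` argument.

```python
from collections import Counter


def count_elements(node, tag_key="tag", counts=None):
    """Count element types in a tagged tree; `tag_key` names the type field."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        if tag_key in node:
            # An element is recognized by the presence of the tag key alone.
            counts[node[tag_key]] += 1
        for value in node.values():
            count_elements(value, tag_key, counts)
    elif isinstance(node, list):
        for item in node:
            count_elements(item, tag_key, counts)
    return counts


# Assumed sample data in the tagged shape discussed above.
tree = [
    {"tag": "Para", "contents": [{"tag": "Str", "contents": "hello"}]},
    {"tag": "CodeBlock", "contents": [["", [], []], "x = 1"]},
]
counts = count_elements(tree)
```

With the `{"CodeBlock": ...}` shape the same function would have to guess
which single-key dicts are elements, which is where the extra work comes
from.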


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]             ` <20131013212946.GA25277-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2013-10-14  2:06               ` Peter Sefton
       [not found]                 ` <CAGQnt7U9F7HcQ70yp11CsnKPW6r2R_CLe3WB8OGBeO=GiC46yg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-10-14  6:01               ` BP Jonsson
  2013-10-16 14:16               ` BP Jonsson
  2 siblings, 1 reply; 19+ messages in thread
From: Peter Sefton @ 2013-10-14  2:06 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

John,

As a new user I agree that the {"tag": "X"} format makes more sense, as
it is easier to code against. Is the key 'tag' an artefact of the JSON
serialiser?  In this particular case 'tag' almost works, but I think a
meaningful name like 'element' or 'node' would be better than 'k'.

Peter


-- 

Peter Sefton +61410326955 pt-uoIRqaBSbk9Wk0Htik3J/w@public.gmane.org http://ptsefton.com
Gmail, Twitter & Skype name: ptsefton


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]                 ` <CAGQnt7U9F7HcQ70yp11CsnKPW6r2R_CLe3WB8OGBeO=GiC46yg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-10-14  2:12                   ` Peter Sefton
       [not found]                     ` <CAGQnt7UOMMtsAu7+Lhmb2NvGxSq+zcLUPq0yaBcs=xL-Nyh0sQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Sefton @ 2013-10-14  2:12 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Further to this, I might be missing the point, but this could also be
made much easier to work with in arbitrary languages:


On Mon, Oct 14, 2013 at 1:06 PM, Peter Sefton <ptsefton-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> John,
>
> As a new user I agree that the {"tag": "X" format makes more sense as
> it easier to code against. Is the key 'tag' and artefact of the json
> serialiser?  In this particular case 'tag' almost works, I think a
> meaningful name like 'element' or 'node' would be better than 'k'.
>
> Peter
>
> On Mon, Oct 14, 2013 at 8:29 AM, John MacFarlane <fiddlosopher-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> There was a change in the aeson (json) library that caused pandoc's
>> JSON format to change, briefly.  (The latest code works around the
>> change and restores the old behavior.)
>>
>> However, I think it would be worth considering changing to the
>> format aeson now defaults to:
>>
>>     {"tag": "CodeBlock", "contents": ...}
>>
>> instead of
>>
>>     {"CodeBlock": ...}
>>
>> The former is more verbose but much easier to work with programmatically.
>> And we could remove some of the verbosity by changing "tag" to "k"
>> and "contents" to "v".
>>
>> Thoughts?
>>
>> +++ BP Jonsson [Oct 13 13 14:15 ]:
>>>    I'm trying to write something similar to pandocfilters.py to help with
>>>    writing filters in Perl.
>>>
>>>    I noticed that the `walk` function in pandocfilters.py seems
>>>    to expect that the dict objects it receives have an element name like
>>>    `CodeBlock` as a key, with the contents of the element as value. This
>>>    was indeed how the JSON output by pandoc looked prior to pandoc 1.12:
>>>
>>>    {"CodeBlock":[...]}
>>>
>>>    but in pandoc 1.12 I get JSON output where each element object has a
>>>    `tag` key with the element type, e.g. `CodeBlock` as value and a key
>>>    "contents" with the element contents as value:
>>>
>>>    {"tag":"CodeBlock","contents":[...]}
>>>
>>>    I guessed that it's the json module which automatically converts from
>>>    the 'new style' to the 'old style' behind the scenes? I tried to locate
>>>    its documentation but couldn't find anything relevant.
>>>
>>>    I'm probably revealing my utter ignorance of python here -- I'm not a
>>>    programmer but a philologist who learned perl years ago to work on my
>>>    data --, and it's probably a good time to remedy that, but I want to be
>>>    sure what kind of data I should be expecting/returning, or if something
>>>    is broken in my pandoc installation, however unlikely!
>>>
>>>    On Saturday, 12 October 2013, John MacFarlane wrote:
>>>
>>>      I don't want to make removing empty spans the default, since
>>>      it breaks expected behavior that HTML tags will be passed through
>>>      verbatim.
>>>      Note that if you use the python pandocfilters library to write
>>>      your filters, your transformation functions can return a list
>>>      instead of an object, in which case the list will be spliced in
>>>      to the result (which I think is what you want).
>>>      If you're writing the filters in Haskell, you can just use a
>>>      function Inline -> [Inline] or Inline -> IO [Inline].
>>>
>>>    --
>>>    /BP



-- 

Peter Sefton +61410326955 pt-uoIRqaBSbk9Wk0Htik3J/w@public.gmane.org http://ptsefton.com
Gmail, Twitter & Skype name: ptsefton


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]                     ` <CAGQnt7UOMMtsAu7+Lhmb2NvGxSq+zcLUPq0yaBcs=xL-Nyh0sQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-10-14  2:16                       ` Peter Sefton
       [not found]                         ` <CAGQnt7V4HxtkBDjaFtYK2o77B0HWqj6o__423hHGJyed7eAf2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Sefton @ 2013-10-14  2:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Oh dear, sorry for sending that last incomplete thought, fingers slipped.

Further to this, I might be missing the point, but this could also be
made much easier to work with in arbitrary languages:

[{"Header":[1,["heading-1",[],[]]

Something like:
[{"node" : "Header", "level":1, "id": "heading", ...
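A minimal sketch of that idea in Python, assuming the single-key objects shown above and assuming a Header's contents are ordered as level, attributes (identifier, classes, key-value pairs), then inline content. The field names `node`, `level`, `id`, `classes`, `attributes` and `content`, and the helper `header_to_named`, are only illustrative; pandoc does not emit them:

```python
def header_to_named(elem):
    """Turn {"Header": [level, [id, classes, keyvals], inlines]} into named fields.

    Hypothetical helper: the output field names are made up for illustration.
    """
    level, (ident, classes, keyvals), inlines = elem["Header"]
    return {
        "node": "Header",
        "level": level,
        "id": ident,
        "classes": classes,
        "attributes": dict(keyvals),  # [[key, value], ...] becomes a mapping
        "content": inlines,
    }


# Positional form, as in the truncated example above, completed with one Str inline.
old = {"Header": [1, ["heading-1", [], []], [{"Str": "Title"}]]}
named = header_to_named(old)
print(named["node"], named["level"], named["id"])
```

With named fields a filter can read `named["level"]` directly instead of remembering that the level sits at index 0.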





-- 

Peter Sefton +61410326955 pt-uoIRqaBSbk9Wk0Htik3J/w@public.gmane.org http://ptsefton.com
Gmail, Twitter & Skype name: ptsefton


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]             ` <20131013212946.GA25277-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2013-10-14  2:06               ` Peter Sefton
@ 2013-10-14  6:01               ` BP Jonsson
  2013-10-16 14:16               ` BP Jonsson
  2 siblings, 0 replies; 19+ messages in thread
From: BP Jonsson @ 2013-10-14  6:01 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 10649 bytes --]

Oh, I must have missed a pandoc update!

I agree that the altered format is easier to work with programmatically; in
fact I thought you had made the change deliberately for that reason. I
don't see any real benefit in ultra-short keys though; I prefer
descriptive but not *too* long names for what are in effect attributes, and
any good editor should be expected to support tab completion. Using "key"
and "value" or "k" and "v" as keys might confuse newcomers, however
suggestive of the earlier format they might be. I'd suggest
"type" and "data" as both short enough and adequately descriptive.

Not that it's a very big deal; part of my Perl scripting support module is
already about munging pandoc elements as decoded from JSON into manageable
objects in order not to have to deal with things like this all the time:

    my($keyval) = grep { $_->[0] eq 'foo' } @{$elem->{$key}[0][2]};

so on the whole a

    my $type = ( keys %$elem )[0];

is really nothing!

Please don't get me wrong: being able to rely on every hash in the input
being an element is a feature, and being able to rely on those hashes
having exactly one key which is the name of the element type works well in
practice. It probably makes the json more human readable too.
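For comparison, here is a rough Python counterpart of the two Perl one-liners above that accepts either JSON shape. It is only a sketch: the helper names `element_type`, `element_contents` and `attr_value` are hypothetical, and it assumes the element's first content item is an attribute triple `[id, classes, keyvals]`, as for a CodeBlock:

```python
def element_type(elem):
    """Return the element type name from either JSON shape."""
    if "tag" in elem:
        return elem["tag"]
    (name,) = elem  # single-key shape: the only key is the type name
    return name


def element_contents(elem):
    """Return the element contents from either JSON shape."""
    if "tag" in elem:
        return elem.get("contents")
    return elem[element_type(elem)]


def attr_value(elem, key):
    """Look up `key` among the key-value attributes, or return None.

    Assumes contents[0] is [id, classes, [[key, value], ...]].
    """
    _ident, _classes, keyvals = element_contents(elem)[0]
    for k, v in keyvals:
        if k == key:
            return v
    return None


attrs = ["", ["perl"], [["foo", "bar"]]]
single_key = {"CodeBlock": [attrs, "print 1;"]}
tagged = {"tag": "CodeBlock", "contents": [attrs, "print 1;"]}

print(element_type(single_key), attr_value(single_key, "foo"))
print(element_type(tagged), attr_value(tagged, "foo"))
```

Hiding the shape behind two small accessors keeps the rest of a filter unchanged whichever format pandoc ends up emitting.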

On Sunday, 13 October 2013, John MacFarlane wrote:

> There was a change in the aeson (json) library that caused pandoc's
> JSON format to change, briefly.  (The latest code works around the
> change and restores the old behavior.)
>
> However, I think it would be worth considering changing to the
> format aeson now defaults to:
>
>     {"tag": "CodeBlock", "contents": ...}
>
> instead of
>
>     {"CodeBlock": ...}
>
> The former is more verbose but much easier to work with programmatically.
> And we could remove some of the verbosity by changing "tag" to "k"
> and "contents" to "v".
>
> Thoughts?


-- 
/BP

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAFC_yuQT%3DA30x6fogB5a482OgVjX6uYtdmx1USoXnhSBo_mGqg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

[-- Attachment #2: Type: text/html, Size: 14722 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]                         ` <CAGQnt7V4HxtkBDjaFtYK2o77B0HWqj6o__423hHGJyed7eAf2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-10-14  7:03                           ` John MacFarlane
  0 siblings, 0 replies; 19+ messages in thread
From: John MacFarlane @ 2013-10-14  7:03 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Peter Sefton [Oct 14 13 13:16 ]:
> Further to this, I might be missing the point, but this could also be
> made much easier to work with in arbitrary languages:
> 
> [{"Header":[1,["heading-1",[],[]]
> 
> Something like:
> [{"node" : "Header", "level":1, "id": "heading", ...

Well, yes.  I could do that, with custom ToJSON/FromJSON instances
for Inline and Block.  Right now I'm relying on generic instances
which are automatically generated, and which can be predicted from
the types.  I've contemplated doing something fancier, but I'm not
sure the payoff is big enough.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]             ` <20131013212946.GA25277-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2013-10-14  2:06               ` Peter Sefton
  2013-10-14  6:01               ` BP Jonsson
@ 2013-10-16 14:16               ` BP Jonsson
       [not found]                 ` <CAFC_yuRuOgEKK8BG=eDsm1KG0_CbCda8_GAFWJPjp0Agt3Kq_A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2 siblings, 1 reply; 19+ messages in thread
From: BP Jonsson @ 2013-10-16 14:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Is that restoration in the development version only? I tried to update but
cabal said everything was up to date.

I guess that in the meantime I could write a filter which will convert
from the changed to the traditional JSON format and one to convert the
other way around and call them before and after other filters, provided it
is possible to chain filters without pandoc seeing the data between the
filters. That should work, shouldn't it?
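
The conversion itself, independent of how the filters are chained, might look
roughly like the following Python sketch. It assumes the two layouts quoted
in this thread, `{"CodeBlock": ...}` and `{"tag": "CodeBlock", "contents": ...}`.
The function names are hypothetical, and the "single key starting with a
capital letter" test used to recognise old-style elements is a heuristic
assumption that could misfire on metadata maps with capitalized keys.

```python
def to_old(x):
    """Convert {"tag": T, "contents": C} objects to {T: C}, recursively."""
    if isinstance(x, list):
        return [to_old(item) for item in x]
    if isinstance(x, dict):
        if "tag" in x and set(x) <= {"tag", "contents"}:
            # A missing "contents" is treated as an empty argument list.
            return {x["tag"]: to_old(x.get("contents", []))}
        return {key: to_old(value) for key, value in x.items()}
    return x


def to_new(x):
    """Convert {T: C} objects to {"tag": T, "contents": C}, recursively."""
    if isinstance(x, list):
        return [to_new(item) for item in x]
    if isinstance(x, dict):
        if len(x) == 1:
            ((key, value),) = x.items()
            # Heuristic: element constructor names start with a capital.
            if key[:1].isupper():
                return {"tag": key, "contents": to_new(value)}
        return {key: to_new(value) for key, value in x.items()}
    return x


new_style = [{"tag": "Para", "contents": [
    {"tag": "Str", "contents": "hi"},
    {"tag": "Space", "contents": []},
]}]
old_style = to_old(new_style)
round_trip = to_new(old_style)
```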

/bpj

On Sunday, 13 October 2013, John MacFarlane wrote:

> There was a change in the aeson (json) library that caused pandoc's
> JSON format to change, briefly.  (The latest code works around the
> change and restores the old behavior.)
>
> However, I think it would be worth considering changing to the
> format aeson now defaults to:
>
>     {"tag": "CodeBlock", "contents": ...}
>
> instead of
>
>     {"CodeBlock": ...}
>
> The former is more verbose but much easier to work with programmatically.
> And we could remove some of the verbosity by changing "tag" to "k"
> and "contents" to "v".
>
> Thoughts?


-- 
/BP


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]                 ` <CAFC_yuRuOgEKK8BG=eDsm1KG0_CbCda8_GAFWJPjp0Agt3Kq_A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-10-16 16:13                   ` John MacFarlane
  2013-10-16 16:32                   ` BP Jonsson
  1 sibling, 0 replies; 19+ messages in thread
From: John MacFarlane @ 2013-10-16 16:13 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I'm reworking the JSON format right now.

Note that even if pandoc is up to date, it may be compiled against the
older pandoc-types library.  So try reinstalling both:

    cabal install pandoc-types pandoc

+++ BP Jonsson [Oct 16 13 16:16 ]:
> Is that restoration in the development version only? I tried to update but
> cabal said everything was up to date.
> 
> I guess that in the meanwhile I could write a filter which will convert
> from the changed to the traditional JSON format and one to convert the
> other way around and call them before and after other filters, provided it
> is possible to chain filters without pandoc seeing the data between the
> filters. That should work, shouldn't it?
> 
> /bpj



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]                 ` <CAFC_yuRuOgEKK8BG=eDsm1KG0_CbCda8_GAFWJPjp0Agt3Kq_A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-10-16 16:13                   ` John MacFarlane
@ 2013-10-16 16:32                   ` BP Jonsson
       [not found]                     ` <525EBF8A.7050106-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 19+ messages in thread
From: BP Jonsson @ 2013-10-16 16:32 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

2013-10-16 16:16, BP Jonsson wrote:
> Is that restoration in the development version only? I tried to update but
> cabal said everything was up to date.
>
> I guess that in the meanwhile I could write a filter which will convert
> from the changed to the traditional JSON format and one to convert the
> other way around and call them before and after other filters, provided it
> is possible to chain filters without pandoc seeing the data between the
> filters. That should work, shouldn't it?

I discovered it doesn't, but I incidentally peeked at
<https://github.com/jgm/pandocfilters/blob/460404290a3e956dff3cb0321aa908c4cffabbaf/pandocfilters.py>

And found that you have decided to change the JSON format so that
objects have a "tag" key and a "val" key. Assuming this will be
the format in the next release, I can make do for now with an
environment variable plus a variable in my code, so that "contents"
is temporarily used instead of "val" as the value key.
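
A small Python sketch of that kind of switch, assuming a hypothetical
environment variable named `PANDOC_JSON_VALUE_KEY` (the name is invented
here, not something pandoc or pandocfilters defines) that selects which key
holds an element's value:

```python
import os


def value_key(environ=os.environ):
    """Return the key that holds an element's value.

    Defaults to "val"; the hypothetical PANDOC_JSON_VALUE_KEY variable
    can override it, e.g. with "contents".
    """
    return environ.get("PANDOC_JSON_VALUE_KEY", "val")


def element_value(node, environ=os.environ):
    """Look up an element's value under whichever key is configured."""
    return node[value_key(environ)]


# The mapping is passed in explicitly here so the example does not
# depend on the real process environment.
override = {"PANDOC_JSON_VALUE_KEY": "contents"}
value = element_value({"tag": "Str", "contents": "x"}, override)
```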

Are there any other changes to the JSON format?

/BPJ




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]                     ` <525EBF8A.7050106-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-10-16 17:49                       ` John MacFarlane
       [not found]                         ` <20131016174957.GA59114-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: John MacFarlane @ 2013-10-16 17:49 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ BP Jonsson [Oct 16 13 18:32 ]:
> 2013-10-16 16:16, BP Jonsson wrote:
> >Is that restoration in the development version only? I tried to update but
> >cabal said everything was up to date.
> >
> >I guess that in the meanwhile I could write a filter which will convert
> >from the changed to the traditional JSON format and one to convert the
> >other way around and call them before and after other filters, provided it
> >is possible to chain filters without pandoc seeing the data between the
> >filters. That should work, shouldn't it?
> 
> I discovered it doesn't, but I incidentally peeked at
> <https://github.com/jgm/pandocfilters/blob/460404290a3e956dff3cb0321aa908c4cffabbaf/pandocfilters.py>
> 
> And found that you have decided to change the JSON format so that
> objects have a "tag" key and a "val" key. Assuming this will be the
> format in the next release I can make do for now with an
> environment variable + variable to temporarily use "contents"
> instead of "val" for the "val" key.
> 
> Are there any other changes to the JSON format?

Since you checked, I've changed "tag" to "t" and "val" to "v".
This will cut down a lot on the size of the serialized strings,
which will also help performance.  I don't worry too much about
confusing users, since anyone who interacts directly with the
JSON will have to study the format pretty closely anyway.
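
As a rough illustration of what such a key renaming amounts to for
someone bridging two versions of the format, here is a minimal Python
sketch. It is not code from pandoc or pandocfilters: the helper name
rename_keys and the sample element are hypothetical, and the sample
only assumes the general "tagged object" shape discussed above.

```python
def rename_keys(node, mapping):
    """Recursively rename dict keys in a JSON-like structure.

    `mapping` maps old key names to new ones, e.g. {"tag": "t", "val": "v"}.
    Keys not listed in `mapping` are kept unchanged.
    Caveat: this is a blind rename, so any other dict in the document that
    happens to use the same key names would be renamed as well.
    """
    if isinstance(node, dict):
        return {mapping.get(k, k): rename_keys(v, mapping) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_keys(item, mapping) for item in node]
    return node  # strings, numbers, booleans, None are returned as-is


# Hypothetical sample: an emphasised word written with the longer key names.
old_style = {"tag": "Emph", "val": [{"tag": "Str", "val": "hello"}]}
new_style = rename_keys(old_style, {"tag": "t", "val": "v"})
```

Running the same function with the mapping reversed converts back, so
one helper could serve both directions of a compatibility shim.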


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]                         ` <20131016174957.GA59114-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>
@ 2013-10-17  9:02                           ` BP Jonsson
       [not found]                             ` <525FA791.60509-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: BP Jonsson @ 2013-10-17  9:02 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

2013-10-16 19:49, John MacFarlane wrote:
> +++ BP Jonsson [Oct 16 13 18:32 ]:
>> 2013-10-16 16:16, BP Jonsson wrote:
>>> Is that restoration in the development version only? I tried to update but
>>> cabal said everything was up to date.
>>>
>>> I guess that in the meanwhile I could write a filter which will convert
>>> from the changed to the traditional JSON format and one to convert the
>>> other way around and call them before and after other filters, provided it
>>> is possible to chain filters without pandoc seeing the data between the
>>> filters. That should work, shouldn't it?
>>
>> I discovered it doesn't, but I incidentally peeked at
>> <https://github.com/jgm/pandocfilters/blob/460404290a3e956dff3cb0321aa908c4cffabbaf/pandocfilters.py>
>>
>> And found that you have decided to change the JSON format so that
>> objects have a "tag" key and a "val" key. Assuming this will be the
>> format in the next release I can make do for now with an
>> environment variable + variable to temporarily use "contents"
>> instead of "val" for the "val" key.
>>
>> Are there any other changes to the JSON format?
>
> Since you checked, I've changed "tag" to "t" and "val" to "v".
> This will cut down a lot on the size of the serialized strings,
> which will also help performance.  I don't worry too much about
> confusing users, since anyone who interacts directly with the
> JSON will have to study the format pretty closely anyway.

No problem as long as I know what format(s) to support.
BTW is the JSON format documented anywhere? And can I expect
a pandoc release with the new format soon?

/bpj



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]                             ` <525FA791.60509-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-10-17 16:09                               ` John MacFarlane
       [not found]                                 ` <20131017160958.GC65594-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: John MacFarlane @ 2013-10-17 16:09 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

It is not documented, but it is predictable from the definitions in
Text.Pandoc.Definition.  Looking at a few examples should give you the
idea.

You might just try to copy the tree-walking code from pandocfilters.py.

> No problem as long as I know what format(s) to support.
> BTW is the JSON format documented anywhere? And can I expect
> a pandoc release with the new format soon?
> 
> /bpj


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]                                 ` <20131017160958.GC65594-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>
@ 2013-10-17 19:31                                   ` BP Jonsson
       [not found]                                     ` <52603B2E.1080100-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: BP Jonsson @ 2013-10-17 19:31 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

2013-10-17 18:09, John MacFarlane wrote:
> It is not documented, but it is predictable from the definitions in
> Text.Pandoc.Definition.  Looking at a few examples should give you the
> idea.

I've done that several times, and am mostly confident with the
notation. I'm not quite sure what the parentheses mean in e.g.

     DefinitionList [([Inline], [[Block]])]

but I guess that it means "one or more tuple(s)", since that's
what you actually get in this case.
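
For readers with the same question: in Haskell type notation the
parentheses denote a tuple (here a pair) and the square brackets
denote a list, so [([Inline], [[Block]])] reads as "a list of zero or
more (term, definitions) pairs", where the term is a list of inlines
and the definitions are a list of definitions, each of which is a list
of blocks. In JSON a pair shows up as a two-element array. The Python
sketch below only illustrates that nesting; the function name and the
placeholder strings are made up for the example.

```python
def describe_definition_list(items):
    """Summarise the contents of a DefinitionList.

    `items` mirrors [([Inline], [[Block]])]: a list of (term, definitions)
    pairs. Returns, per pair, the number of inlines in the term and the
    number of blocks in each definition.
    """
    summary = []
    for term, definitions in items:  # each pair arrives as a 2-element list
        summary.append((len(term), [len(blocks) for blocks in definitions]))
    return summary


# Placeholder strings stand in for real Inline and Block elements.
items = [
    [["inline-a", "inline-b"], [["block-1"], ["block-2", "block-3"]]],
    [["inline-c"], [["block-4"]]],
]
summary = describe_definition_list(items)
```

Here the first term has two inlines and two definitions (of one and
two blocks), and the second term has one inline and one definition.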

>
> You might just try to copy the tree-walking code from pandocfilters.py.

I'm no stranger to tree-walking and am already using such code,
but I want to pass the callback function an object rather than
the raw document element data so that one can use methods to
modify the value(s) of attributes etc. without worrying about the
JSON format -- although by the time I've written the code for
those objects I'll have learned the JSON format by heart I
suppose -- and also allow users to build instances of such
objects, taking advantage of the JSON.pm module's support for
automatically calling TO_JSON methods on objects to obtain
appropriate data structures to serialize in place of the objects.

/bpj

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]                                     ` <52603B2E.1080100-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-10-17 23:41                                       ` John MacFarlane
       [not found]                                         ` <20131017234100.GC25883-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: John MacFarlane @ 2013-10-17 23:41 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ BP Jonsson [Oct 17 13 21:31 ]:
> 2013-10-17 18:09, John MacFarlane wrote:
> >It is not documented, but it is predictable from the definitions in
> >Text.Pandoc.Definition.  Looking at a few examples should give you the
> >idea.
> 
> I've done that several times, and am mostly confident with the
> notation. I'm not quite sure what the parentheses mean in e.g.
> 
>     DefinitionList [([Inline], [[Block]])]
> 
> but I guess that it means "one or more tuple(s)", since that's
> what you actually get in this case.
> 
> >
> >You might just try to copy the tree-walking code from pandocfilters.py.
> 
> I'm no stranger to tree-walking and am already using such code,
> but I want to pass the callback function an object rather than
> the raw document element data so that one can use methods to
> modify the value(s) of attributes etc. without worrying about the
> JSON format -- although by the time I've written the code for
> those objects I'll have learned the JSON format by heart I
> suppose -- and also allow users to build instances of such
> objects, taking advantage of the JSON.pm module's support for
> automatically calling TO_JSON methods on objects to obtain
> appropriate data structures to serialize in place of the objects.

Yes, that's probably a nicer approach, but it requires duplicating
all of the pandoc data structures as native python/perl objects,
and providing code to translate between JSON and those.  Seemed a
bit too much for the moment, but I did provide python "constructor"
functions for all the inline and block elements, so people don't
need to directly construct JSON objects.  (This also makes filters
a bit more independent of the underlying representation....when I
modified the JSON format just now, I didn't need to change any
of the example filters.)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]                                         ` <20131017234100.GC25883-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2013-10-18  9:26                                           ` BP Jonsson
       [not found]                                             ` <5260FECB.30409-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: BP Jonsson @ 2013-10-18  9:26 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

2013-10-18 01:41, John MacFarlane wrote:
>>> You might just try to copy the tree-walking code from pandocfilters.py.
>>
>> I'm no stranger to tree-walking and am already using such code,
>> but I want to pass the callback function an object rather than
>> the raw document element data so that one can use methods to
>> modify the value(s) of attributes etc. without worrying about the
>> JSON format -- although by the time I've written the code for
>> those objects I'll have learned the JSON format by heart I
>> suppose -- and also allow users to build instances of such
>> objects, taking advantage of the JSON.pm module's support for
>> automatically calling TO_JSON methods on objects to obtain
>> appropriate data structures to serialize in place of the objects.

> Yes, that's probably a nicer approach, but it requires duplicating
> all of the pandoc data structures as native python/perl objects,

My thought is to make that half-lazy by letting the user specify
a wanted_tags parameter to the walker call, and if that
parameter, a single tag or an array of tags, is provided then
only elements with those tags, and their descendants, will be
'objectified'. I suppose I could make that even more lazy by
fetching/converting data from the structures obtained from JSON
only when requested -- but too many checks get expensive too, at
least in terms of code clarity.
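
A minimal Python sketch of the wanted_tags idea, under stated
assumptions: it only filters which elements reach the callback and
does not attempt the 'objectification' step; the function walk and its
signature are hypothetical (this is not the walker from
pandocfilters.py); and the key names "t" and "c" are an assumption
taken from the latest format mentioned in this thread.

```python
TAG_KEY, CONTENT_KEY = "t", "c"  # assumption: key names as last mentioned in the thread


def walk(node, callback, wanted_tags=None):
    """Visit every tagged element under `node`, depth first.

    `callback(tag, contents)` is invoked only for elements whose tag is in
    `wanted_tags`. A single tag may be given as a string; None means
    "every tag".
    """
    if isinstance(wanted_tags, str):
        wanted_tags = [wanted_tags]
    if isinstance(node, list):
        for item in node:
            walk(item, callback, wanted_tags)
    elif isinstance(node, dict):
        tag = node.get(TAG_KEY)
        if tag is not None and (wanted_tags is None or tag in wanted_tags):
            callback(tag, node.get(CONTENT_KEY))
        # Recurse into all values; plain strings and numbers are ignored.
        for value in node.values():
            walk(value, callback, wanted_tags)


# Hypothetical mini-document: one paragraph holding a word and a space.
doc = [{"t": "Para", "c": [{"t": "Str", "c": "hi"}, {"t": "Space"}]}]
only_str = []
walk(doc, lambda tag, contents: only_str.append(tag), wanted_tags="Str")
everything = []
walk(doc, lambda tag, contents: everything.append(tag))
```

With wanted_tags="Str" only the Str element reaches the callback,
while the unrestricted call sees Para, Str and Space in that order.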

> and providing code to translate between JSON and those.

There is no way around that, but naturally I'm modularizing that
as much as possible, e.g. the attr property of a span object is
an instance of an attr class, which is also used by
div/header/code/codeblock objects so code for constructing and
serializing attributes need not be duplicated.

> Seemed a bit too much for the moment,

I've decided that it is worth it because perl code for handling
nested structures, especially finding values in nested arrays,
tends to become hairy and thus error prone, so that I would
anyway need to provide helper functions for such things.
Consider the example for how to get a keyval I gave the other day
(as modified for the new format):

     my($keyval) = grep { $_->[0] eq 'foo' } @{ $elem->{v}[0][2] };

Things get much cleaner on all levels, and faster if one needs to
look up many keyvals, if the keyvals are converted to a hashmap
when the attr object is constructed and converted back just once by
the attr object's TO_JSON method; both conversions are
comparatively benign code.
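
The same idea, sketched in Python rather than Perl and only as an
illustration: the class name Attr and its methods from_json and
to_json are hypothetical stand-ins for the attr class and TO_JSON
method described above. The sketch assumes the attribute layout
implied by the one-liner, i.e. [identifier, [classes], [[key, value],
...]] with the key/value pairs at index 2.

```python
class Attr:
    """Attribute wrapper for [identifier, [classes], [[key, value], ...]]."""

    def __init__(self, identifier="", classes=None, keyvals=None):
        self.identifier = identifier
        self.classes = list(classes or [])
        # List of pairs -> dict, done once at construction time.
        # Caveat: if a key occurs twice, only the last value is kept.
        self.keyvals = dict(keyvals or [])

    @classmethod
    def from_json(cls, raw):
        """Build an Attr from the three-element list found in the JSON."""
        identifier, classes, keyvals = raw
        return cls(identifier, classes, keyvals)

    def to_json(self):
        """Convert back to the list form; dict order is insertion order."""
        pairs = [[key, value] for key, value in self.keyvals.items()]
        return [self.identifier, self.classes, pairs]


attr = Attr.from_json(["sec1", ["note"], [["foo", "1"], ["bar", "2"]]])
foo_before = attr.keyvals["foo"]   # direct lookup instead of a grep over pairs
attr.keyvals["foo"] = "9"          # modify through the dict
round_trip = attr.to_json()
```

Lookups and updates then go through an ordinary dict, and the list of
pairs is rebuilt only when the element is serialized again.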

> but I did provide python "constructor" functions for all the
> inline and block elements, so people don't need to directly
> construct JSON objects.

I have a 'filter' object which is instantiated once for each
filter, and various utility routines, including the walking
routine, are provided as methods of that object. It has a
new_elem_obj method which calls the appropriate constructor
depending on which tag is provided to it, mostly because class
names tend to get long with perl's namespace model. You can pass
that method or the specific constructors either separate
parameters for tag/format/classes/whatever or an element
parameter obtained from the JSON data, which is 'mined' for data
by the constructor if it exists and the specific parameters
don't, so both the walking routine and the user use the same
method to get their element objects. Moreover you can subclass
the 'filter' object and provide functions with names like
div_handler, div_handler_for_someclass or div_handler_to_latex for
the walker to use as callbacks for specific elements w/o specific
classes if applicable and for specific target formats if desired.
The goal is that the user shall not have to litter a single
callback function with a lot of conditional chains.
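
A hedged Python sketch of that naming-based dispatch (the module
described above is Perl; this only mirrors the idea). The class
Filter, the method find_handler and the lookup order (target format
first, then class, then the generic handler) are assumptions made for
the example, not the actual design.

```python
class Filter:
    """Base class: subclasses add methods named <tag>_handler and variants."""

    def __init__(self, target_format=None):
        self.target_format = target_format

    def find_handler(self, tag, classes=()):
        """Return the most specific handler method available, or None."""
        base = tag.lower() + "_handler"
        candidates = []
        if self.target_format:
            candidates.append(f"{base}_to_{self.target_format}")
        candidates.extend(f"{base}_for_{cls}" for cls in classes)
        candidates.append(base)  # generic fallback comes last
        for name in candidates:
            handler = getattr(self, name, None)
            if callable(handler):
                return handler
        return None


class MyFilter(Filter):
    def div_handler(self, elem):
        return "generic div"

    def div_handler_for_someclass(self, elem):
        return "div with class someclass"

    def div_handler_to_latex(self, elem):
        return "div for latex output"


plain = MyFilter()
latex = MyFilter(target_format="latex")
```

A walker would call find_handler for each element and invoke the
result if there is one, so each handler stays free of conditionals on
tag, class or output format.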

> (This also makes filters a bit more independent of the
> underlying representation....when I modified the JSON format
> just now, I didn't need to change any of the example filters.)

Yes, that's my goal too, and to move repetitive code out of the
filter scripts into modules used by the filter scripts.

/bpj

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Span and Div elements without any attributes (in filter output)
       [not found]                                             ` <5260FECB.30409-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-10-18 15:10                                               ` John MacFarlane
  0 siblings, 0 replies; 19+ messages in thread
From: John MacFarlane @ 2013-10-18 15:10 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Note:  I've tweaked things again, using 'c' instead of 'v'.
Hope to have a proper release soon.



^ permalink raw reply	[flat|nested] 19+ messages in thread

