* Span and Div elements without any attributes (in filter output)
From: BP Jonsson @ 2013-10-11 11:55 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

In a filter it's sometimes desirable to replace an element in its parent element's content list with e.g. the contents of the element itself, modified in some way. In practice this is hard to do, as you have to walk the AST data structure and collect elements along with a reference to their parent element's content list, which is a bit more complicated than just collecting the (child) elements themselves.

One possible workaround is to convert the element into a Div or Span element and set its contents to whatever one wants to replace the original element with. It works in the sense that the Span or Div element will just sit there and, well, contain the data, but in HTML output it will show up as a `<span>...</span>` or `<div>...</div>`, even though it probably doesn't have any meaningful purpose in the HTML document; it just makes the HTML harder to read and harder to render.

I've tried to write a filter which removes Span and Div elements which don't have any attributes at all (id, class or other attributes) -- or alternatively those which have a `disembowel=1` attribute, although the absence of any attributes seems a better criterion -- but for various reasons this has proved hard within a reasonable level of 'parsing', especially since the AST data structure is rather radically altered in the process -- paths to elements change during the process in ways that make the processing hard.

What if pandoc itself replaced such attribute-less Span and Div elements with their content, at least when the `--normalize` option is set? After all pandoc parses the whole document anyway!

/bpj

P.S.
:   I am aware that there may be situations when a `<span>` or `<div>` may be meaningful in an HTML document, most notably perhaps when a CSS rule targets them as children of some other element, but it seems to me that even then it is probably most user friendly to give them a class describing their function if they are really intended to fulfill a function in the document, so that it might be reasonable for pandoc to remove such attribute-less elements at least under `--normalize`.
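The "paths change during the process" problem described above can often be sidestepped by visiting each content list from its last index to its first, so that a splice never moves an element that is still waiting to be visited. What follows is only a minimal sketch in Python under stated assumptions: elements are dicts with `t` (type) and `c` (contents) keys, as current pandoc releases emit, and an attribute value is `[id, classes, key-value pairs]`. The names `is_disposable` and `unwrap_in_place` are hypothetical, and the walk only descends through Span and Div, not through every container type.

```python
def is_disposable(attr):
    """True if attr has no id, no classes and no key-value pairs,
    or if it carries the disembowel=1 marker mentioned in the message."""
    ident, classes, keyvals = attr
    if not ident and not classes and not keyvals:
        return True
    return ["disembowel", "1"] in [list(kv) for kv in keyvals]


def unwrap_in_place(children):
    """Splice disposable Span/Div wrappers into `children` (a content list).

    Assumption: elements look like {"t": name, "c": [attr, contents]}.
    """
    # Go from the end so a splice never shifts an index still to be visited.
    for i in range(len(children) - 1, -1, -1):
        node = children[i]
        if isinstance(node, dict) and node.get("t") in ("Span", "Div"):
            attr, inner = node["c"]
            unwrap_in_place(inner)           # deepest wrappers first
            if is_disposable(attr):
                children[i:i + 1] = inner    # replace wrapper by its contents
```

Because the parent list is passed in directly, no separate record of "element plus reference to its parent" has to be collected beforehand.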
* Re: Span and Div elements without any attributes (in filter output)
From: John MacFarlane @ 2013-10-12 14:50 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I don't want to make removing empty spans the default, since it breaks expected behavior that HTML tags will be passed through verbatim.

Note that if you use the python pandocfilters library to write your filters, your transformation functions can return a list instead of an object, in which case the list will be spliced in to the result (which I think is what you want).

If you're writing the filters in Haskell, you can just use a function Inline -> [Inline] or Inline -> IO [Inline].
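The "return a list and it gets spliced in" behaviour can be illustrated without any library. The following is a self-contained approximation in Python, not the pandocfilters source: `walk` and `unwrap_bare` are names chosen for this sketch, and the `t`/`c` element keys and the `[id, classes, key-value pairs]` attribute shape are assumptions about the JSON being processed.

```python
def walk(node, action):
    """Return a copy of `node` with `action` applied to every element in a list.

    `action(name, contents)` may return None (keep the element as is),
    a dict (replace it) or a list (splice the items into the parent list).
    """
    if isinstance(node, list):
        out = []
        for item in node:
            if isinstance(item, dict) and "t" in item:
                result = action(item["t"], item.get("c"))
                if result is None:
                    out.append(walk(item, action))
                elif isinstance(result, list):
                    # Splice: the parent list receives the items directly.
                    out.extend(walk(result, action))
                else:
                    out.append(walk(result, action))
            else:
                out.append(walk(item, action))
        return out
    if isinstance(node, dict):
        return {key: walk(value, action) for key, value in node.items()}
    return node


def unwrap_bare(name, contents):
    """Replace a Span or Div that has no attributes at all by its contents."""
    if name in ("Span", "Div"):
        attr, inner = contents
        if attr == ["", [], []]:
            return inner
    return None
```

Since the walker rebuilds each list as it goes, the action never needs a reference to the parent: returning the inner list is enough.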
* Re: Span and Div elements without any attributes (in filter output)
From: BP Jonsson @ 2013-10-13 12:15 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I'm trying to write something similar to pandocfilters.py to help with writing filters in Perl.

I noticed that the `walk` function in pandocfilters.py seems to expect that the dict objects it receives have an element name like `CodeBlock` as a key, with the contents of the element as value. This was indeed how the JSON output by pandoc looked prior to pandoc 1.12:

    {"CodeBlock":[...]}

but in pandoc 1.12 I get JSON output where each element object has a `tag` key with the element type, e.g. `CodeBlock`, as value and a key "contents" with the element contents as value:

    {"tag":"CodeBlock","contents":[...]}

I guessed that it's the json module which automatically converts from the 'new style' to the 'old style' behind the scenes? I tried to locate its documentation but couldn't find anything relevant.

I'm probably revealing my utter ignorance of Python here -- I'm not a programmer but a philologist who learned Perl years ago to work on my data --, and it's probably a good time to remedy that, but I want to be sure what kind of data I should be expecting/returning, or if something is broken in my pandoc installation, however unlikely!

--
/BP
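A filter helper that has to cope with both shapes quoted in this message could normalise them up front. This is only a sketch under the assumption that these two shapes are the only ones encountered; `element_parts` is a hypothetical helper name, not part of pandocfilters.

```python
def element_parts(obj):
    """Return (name, contents) for either JSON shape of an element.

    Handles {"CodeBlock": [...]} (single-key form) and
    {"tag": "CodeBlock", "contents": [...]} (tagged form).
    """
    if "tag" in obj:
        # Tagged form: the type name and the contents have fixed keys.
        return obj["tag"], obj.get("contents")
    if len(obj) == 1:
        # Single-key form: the only key is the type name.
        name, contents = next(iter(obj.items()))
        return name, contents
    raise ValueError("not a recognised element object")
```

With a helper like this, the rest of a filter can be written once against `(name, contents)` pairs, whichever form the installed pandoc emits.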
* Re: Span and Div elements without any attributes (in filter output)
From: John MacFarlane @ 2013-10-13 21:29 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

There was a change in the aeson (json) library that caused pandoc's JSON format to change, briefly. (The latest code works around the change and restores the old behavior.)

However, I think it would be worth considering changing to the format aeson now defaults to:

    {"tag": "CodeBlock", "contents": ...}

instead of

    {"CodeBlock": ...}

The former is more verbose but much easier to work with programmatically. And we could remove some of the verbosity by changing "tag" to "k" and "contents" to "v".

Thoughts?
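To get a feel for the trade-off between the two key-naming options, a toy tree in the single-key form can be rewritten into the tagged form with either pair of key names and the serialised sizes compared. This is a rough sketch only: `retag` is a hypothetical helper, `sample` is made-up data, and the code assumes that every single-key object is an element, which need not hold for a real pandoc document.

```python
import json


def retag(node, tag_key="tag", contents_key="contents"):
    """Rewrite {"Name": contents} objects as {tag_key: "Name", contents_key: contents}.

    Assumption: every dict with exactly one key is an element.
    """
    if isinstance(node, list):
        return [retag(item, tag_key, contents_key) for item in node]
    if isinstance(node, dict) and len(node) == 1:
        name, contents = next(iter(node.items()))
        return {tag_key: name,
                contents_key: retag(contents, tag_key, contents_key)}
    return node


# Made-up fragment in the single-key form.
sample = [{"Para": [{"Str": "hello"}, {"Emph": [{"Str": "world"}]}]}]

for keys in (("tag", "contents"), ("k", "v")):
    encoded = json.dumps(retag(sample, *keys), separators=(",", ":"))
    print(keys, len(encoded))
```

In the tagged form a consumer can dispatch on a fixed key (`node["tag"]` or `node["k"]`) instead of first discovering which key an object happens to have, which is presumably what makes it easier to work with programmatically.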
* Re: Span and Div elements without any attributes (in filter output)
From: Peter Sefton @ 2013-10-14 2:06 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

John,

As a new user I agree that the {"tag": "X", ...} format makes more sense, as it is easier to code against. Is the key 'tag' an artefact of the json serialiser? In this particular case 'tag' almost works; I think a meaningful name like 'element' or 'node' would be better than 'k'.

Peter

--
Peter Sefton +61410326955 pt-uoIRqaBSbk9Wk0Htik3J/w@public.gmane.org
http://ptsefton.com
Gmail, Twitter & Skype name: ptsefton
* Re: Span and Div elements without any attributes (in filter output)
       [not found] ` <CAGQnt7U9F7HcQ70yp11CsnKPW6r2R_CLe3WB8OGBeO=GiC46yg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-10-14  2:12 ` Peter Sefton
       [not found]   ` <CAGQnt7UOMMtsAu7+Lhmb2NvGxSq+zcLUPq0yaBcs=xL-Nyh0sQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Sefton @ 2013-10-14  2:12 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Further to this, I might be missing the point, but this could also be
made much easier to work with in arbitrary languages:

On Mon, Oct 14, 2013 at 1:06 PM, Peter Sefton <ptsefton-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> John,
>
> As a new user I agree that the {"tag": "X", ...} format makes more
> sense, as it is easier to code against. Is the key 'tag' an artefact
> of the JSON serialiser? In this particular case 'tag' almost works; I
> think a meaningful name like 'element' or 'node' would be better than
> 'k'.
>
> Peter
>
> On Mon, Oct 14, 2013 at 8:29 AM, John MacFarlane <fiddlosopher-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> There was a change in the aeson (json) library that caused pandoc's
>> JSON format to change, briefly. (The latest code works around the
>> change and restores the old behavior.)
>>
>> However, I think it would be worth considering changing to the
>> format aeson now defaults to:
>>
>>     {"tag": "CodeBlock", "contents": ...}
>>
>> instead of
>>
>>     {"CodeBlock": ...}
>>
>> The former is more verbose but much easier to work with
>> programmatically. And we could remove some of the verbosity by
>> changing "tag" to "k" and "contents" to "v".
>>
>> Thoughts?
>>
>> +++ BP Jonsson [Oct 13 13 14:15 ]:
>>> I'm trying to write something similar to pandocfilters.py to help
>>> with writing filters in Perl.
>>>
>>> I noticed that the `walk` function in pandocfilters.py seems to
>>> expect that the dict objects it receives have an element name like
>>> `CodeBlock` as a key, with the contents of the element as value.
>>> This was indeed how the JSON output by pandoc looked prior to
>>> pandoc 1.12:
>>>
>>>     {"CodeBlock":[...]}
>>>
>>> but in pandoc 1.12 I get JSON output where each element object has
>>> a `tag` key with the element type, e.g. `CodeBlock`, as value and a
>>> `contents` key with the element contents as value:
>>>
>>>     {"tag":"CodeBlock","contents":[...]}
>>>
>>> I guessed that it's the json module which automatically converts
>>> from the 'new style' to the 'old style' behind the scenes? I tried
>>> to locate its documentation but couldn't find anything relevant.
>>>
>>> I'm probably revealing my utter ignorance of Python here -- I'm not
>>> a programmer but a philologist who learned Perl years ago to work
>>> on my data -- and it's probably a good time to remedy that, but I
>>> want to be sure what kind of data I should be expecting/returning,
>>> or if something is broken in my pandoc installation, however
>>> unlikely!
>>>
>>> On Saturday, 12 October 2013, John MacFarlane wrote:
>>>
>>> I don't want to make removing empty spans the default, since
>>> it breaks expected behavior that HTML tags will be passed through
>>> verbatim.
>>> Note that if you use the python pandocfilters library to write
>>> your filters, your transformation functions can return a list
>>> instead of an object, in which case the list will be spliced in
>>> to the result (which I think is what you want).
>>> If you're writing the filters in Haskell, you can just use a
>>> function Inline -> [Inline] or Inline -> IO [Inline].
>>> +++ BP Jonsson [Oct 11 13 13:55 ]:
>>> > In a filter it's sometimes desirable to replace an element in its
>>> > parent element's content list with e.g. the contents of the
>>> > element itself, modified in some way. In practice this is hard to
>>> > do as you'll have to walk the AST data structure and collect
>>> > elements along with a reference to their parent element's content
>>> > list, which is a bit more complicated than just collecting the
>>> > (child) elements themselves. One possible workaround is to
>>> > convert the element into a Div or Span element and set its
>>> > contents to whatever one wants to replace the original element
>>> > with. It works in the sense that the Span or Div element will
>>> > just sit there and, well, contain the data, but in HTML output it
>>> > will show up as a `<span>...</span>` or `<div>...</div>`, even
>>> > though it probably doesn't have any meaningful purpose in the
>>> > HTML document; it just makes the HTML harder to read and harder
>>> > to render.
>>> >
>>> > I've tried to write a filter which removes Span and Div elements
>>> > which don't have any attributes at all (id, class or other
>>> > attributes) -- or alternatively those which have a `disembowel=1`
>>> > attribute, although the absence of any attributes seems a better
>>> > criterion -- but for various reasons this has proved hard within
>>> > a reasonable level of 'parsing', especially since the AST data
>>> > structure is rather radically altered in the process -- paths to
>>> > elements change during the process in ways that make the
>>> > processing hard. What if pandoc itself replaced such
>>> > attribute-less Span and Div elements with their content at least
>>> > when the `--normalize` option is set? After all pandoc parses the
>>> > whole document anyway!
>>> >
>>> > /bpj
>>> >
>>> > P.S.: I am aware that there may be situations when a `<span>` or
>>> > `<div>` may be meaningful in an HTML document, most notably
>>> > perhaps when a CSS rule targets them as children of some other
>>> > element, but it seems to me that even then it is probably most
>>> > user friendly to give them a class describing their function if
>>> > they are really intended to fulfill a function in the document,
>>> > so that it might be reasonable for pandoc to remove such
>>> > attribute-less elements at least under `--normalize`.
>>>
>>> --
>>> /BP

-- 
Peter Sefton +61410326955 pt-uoIRqaBSbk9Wk0Htik3J/w@public.gmane.org http://ptsefton.com
Gmail, Twitter & Skype name: ptsefton

^ permalink raw reply	[flat|nested] 19+ messages in thread
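For readers following the original question: the list-splicing idea John mentions in the quoted message can be sketched roughly as below. This is a standalone Python illustration, not the pandocfilters library itself. It assumes the older single-key JSON shape (`{"Span": [attr, inlines]}`) with `attr` being `[id, classes, key-value pairs]`; the names `is_bare_wrapper` and `unwrap` are made up for this example.

```python
# Sketch only: assumes the pre-1.12 single-key JSON shape for elements.
WRAPPERS = ("Span", "Div")


def is_bare_wrapper(node):
    """True for a Span or Div whose id, classes and key-value list are all empty."""
    if not (isinstance(node, dict) and len(node) == 1):
        return False
    name, payload = next(iter(node.items()))
    if name not in WRAPPERS or not isinstance(payload, list) or len(payload) != 2:
        return False
    attr = payload[0]
    # attr is assumed to be [identifier, classes, key-value pairs]
    return isinstance(attr, list) and len(attr) == 3 and not any(attr)


def unwrap(node):
    """Return a copy of the tree with bare Span/Div elements replaced by their contents."""
    if isinstance(node, list):
        result = []
        for item in node:
            if is_bare_wrapper(item):
                contents = next(iter(item.values()))[1]
                # Splice the (recursively cleaned) children into the parent list.
                result.extend(unwrap(contents))
            else:
                result.append(unwrap(item))
        return result
    if isinstance(node, dict):
        return {key: unwrap(value) for key, value in node.items()}
    return node


doc = [{"Para": [
    {"Str": "a"},
    {"Span": [["", [], []], [{"Str": "b"},
                             {"Span": [["", [], []], [{"Str": "c"}]]}]]},
    {"Span": [["x", [], []], [{"Str": "d"}]]},
]}]
cleaned = unwrap(doc)
```

Because the splice happens while the parent list is being rebuilt, no paths or parent references need to be tracked, which is the difficulty described in the first message of the thread.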
[parent not found: <CAGQnt7UOMMtsAu7+Lhmb2NvGxSq+zcLUPq0yaBcs=xL-Nyh0sQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Span and Div elements without any attributes (in filter output)
       [not found] ` <CAGQnt7UOMMtsAu7+Lhmb2NvGxSq+zcLUPq0yaBcs=xL-Nyh0sQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-10-14  2:16 ` Peter Sefton
       [not found]   ` <CAGQnt7V4HxtkBDjaFtYK2o77B0HWqj6o__423hHGJyed7eAf2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Sefton @ 2013-10-14  2:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Oh dear, sorry for sending that last incomplete thought, fingers slipped.

Further to this, I might be missing the point, but this could also be
made much easier to work with in arbitrary languages. Instead of:

    [{"Header":[1,["heading-1",[],[]]

something like:

    [{"node" : "Header", "level":1, "id": "heading", ...

On Mon, Oct 14, 2013 at 1:12 PM, Peter Sefton <ptsefton-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Further to this, I might be missing the point, but this could also be
> made much easier to work with in arbitrary languages:
>
> On Mon, Oct 14, 2013 at 1:06 PM, Peter Sefton <ptsefton-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> John,
>>
>> As a new user I agree that the {"tag": "X", ...} format makes more
>> sense, as it is easier to code against. Is the key 'tag' an artefact
>> of the JSON serialiser? In this particular case 'tag' almost works; I
>> think a meaningful name like 'element' or 'node' would be better than
>> 'k'.
>>
>> Peter
>>
>> On Mon, Oct 14, 2013 at 8:29 AM, John MacFarlane <fiddlosopher-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> There was a change in the aeson (json) library that caused pandoc's
>>> JSON format to change, briefly. (The latest code works around the
>>> change and restores the old behavior.)
>>>
>>> However, I think it would be worth considering changing to the
>>> format aeson now defaults to:
>>>
>>>     {"tag": "CodeBlock", "contents": ...}
>>>
>>> instead of
>>>
>>>     {"CodeBlock": ...}
>>>
>>> The former is more verbose but much easier to work with
>>> programmatically. And we could remove some of the verbosity by
>>> changing "tag" to "k" and "contents" to "v".
>>>
>>> Thoughts?

-- 
Peter Sefton +61410326955 pt-uoIRqaBSbk9Wk0Htik3J/w@public.gmane.org http://ptsefton.com
Gmail, Twitter & Skype name: ptsefton

^ permalink raw reply	[flat|nested] 19+ messages in thread
[parent not found: <CAGQnt7V4HxtkBDjaFtYK2o77B0HWqj6o__423hHGJyed7eAf2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Span and Div elements without any attributes (in filter output)
       [not found] ` <CAGQnt7V4HxtkBDjaFtYK2o77B0HWqj6o__423hHGJyed7eAf2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-10-14  7:03 ` John MacFarlane
  0 siblings, 0 replies; 19+ messages in thread
From: John MacFarlane @ 2013-10-14  7:03 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Peter Sefton [Oct 14 13 13:16 ]:
> Further to this, I might be missing the point, but this could also be
> made much easier to work with in arbitrary languages:
>
> [{"Header":[1,["heading-1",[],[]]
>
> Something like:
> [{"node" : "Header", "level":1, "id": "heading", ...

Well, yes. I could do that, with custom ToJSON/FromJSON instances
for Inline and Block. Right now I'm relying on generic instances
which are automatically generated, and which can be predicted from
the types. I've contemplated doing something fancier, but I'm not
sure the payoff is big enough.

^ permalink raw reply	[flat|nested] 19+ messages in thread
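To make the trade-off a little more concrete, here is a rough Python sketch of the mapping Peter's proposal implies for a single Header element. It only illustrates the two shapes; it is not how pandoc serialises anything. The keys `node`, `level` and `id` come from Peter's message, while `classes`, `attributes` and `content` are hypothetical names chosen for this example, as are the two function names.

```python
def header_to_named(element):
    """Turn the positional form {"Header": [level, attr, inlines]} into named fields."""
    level, (identifier, classes, keyvals), inlines = element["Header"]
    return {
        "node": "Header",
        "level": level,
        "id": identifier,
        "classes": classes,        # hypothetical key name
        "attributes": keyvals,     # hypothetical key name
        "content": inlines,        # hypothetical key name
    }


def header_from_named(obj):
    """Inverse of header_to_named: rebuild the positional form."""
    return {"Header": [
        obj["level"],
        [obj["id"], obj["classes"], obj["attributes"]],
        obj["content"],
    ]}


positional = {"Header": [1, ["heading-1", [], []], [{"Str": "Heading"}]]}
named = header_to_named(positional)
```

A mapping like this has to be written out per element type, which is presumably the cost John is weighing against the automatically generated instances.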
* Re: Span and Div elements without any attributes (in filter output)
       [not found] ` <20131013212946.GA25277-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2013-10-14  2:06   ` Peter Sefton
@ 2013-10-14  6:01   ` BP Jonsson
  2013-10-16 14:16   ` BP Jonsson
  2 siblings, 0 replies; 19+ messages in thread
From: BP Jonsson @ 2013-10-14  6:01 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 10649 bytes --]

Oh, I must have missed a pandoc update!

I agree that the altered format is easier to work with programmatically;
in fact I thought you had made the change deliberately for that reason.

I don't see any real benefit in ultra-short keys though; I prefer
descriptive but not *too* long names for what are in effect attributes,
and any good editor should be expected to support tab completion. Using
"key" and "value" or "k" and "v" as keys might confuse newcomers,
however suggestive of the earlier format they might be. I'd suggest
"type" and "data" as both short enough and adequately descriptive.

Not that it's a very big deal; part of my Perl scripting support module
is already about munging pandoc elements as decoded from JSON into
manageable objects in order to not have to deal with things like this
all the time:

    my($keyval) = grep { $_->[0] eq 'foo' } @{$elem->{$key}[0][2]};

so on the whole a

    my $type = ( keys %$elem )[0];

is really nothing!

Please don't get me wrong: being able to rely on every hash in the
input being an element is a feature, and being able to rely on those
hashes having exactly one key which is the name of the element type
works well in practice. It probably makes the JSON more human readable
too.

On Sunday, 13 October 2013, John MacFarlane wrote:

> There was a change in the aeson (json) library that caused pandoc's
> JSON format to change, briefly. (The latest code works around the
> change and restores the old behavior.)
>
> However, I think it would be worth considering changing to the
> format aeson now defaults to:
>
>     {"tag": "CodeBlock", "contents": ...}
>
> instead of
>
>     {"CodeBlock": ...}
>
> The former is more verbose but much easier to work with
> programmatically. And we could remove some of the verbosity by
> changing "tag" to "k" and "contents" to "v".
>
> Thoughts?

-- 
/BP

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAFC_yuQT%3DA30x6fogB5a482OgVjX6uYtdmx1USoXnhSBo_mGqg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

[-- Attachment #2: Type: text/html, Size: 14722 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread
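The two JSON shapes under discussion are mechanically interconvertible, which a short Python sketch can show. It leans on the assumption stated in the message above, namely that every single-key object in the input is an element; any other single-key object (one inside document metadata, say) would be misread, so the sketch is meant for a list of blocks only. The function names `to_tagged` and `to_keyed` are invented for this example.

```python
def to_tagged(node):
    """Convert {"CodeBlock": x} style objects to {"tag": "CodeBlock", "contents": x}."""
    if isinstance(node, list):
        return [to_tagged(item) for item in node]
    if isinstance(node, dict) and len(node) == 1:
        # Same idea as the Perl one-liner: the only key is the element type.
        tag, contents = next(iter(node.items()))
        return {"tag": tag, "contents": to_tagged(contents)}
    return node


def to_keyed(node):
    """Convert {"tag": t, "contents": x} style objects back to {t: x}."""
    if isinstance(node, list):
        return [to_keyed(item) for item in node]
    if isinstance(node, dict) and set(node) == {"tag", "contents"}:
        return {node["tag"]: to_keyed(node["contents"])}
    return node


keyed = [
    {"Para": [{"Str": "hi"}, {"Space": []}]},
    {"CodeBlock": [["", [], []], "x = 1"]},
]
tagged = to_tagged(keyed)
```

With a pair of helpers like these, a filter library written against one shape could in principle accept the other by converting on the way in and out.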
* Re: Span and Div elements without any attributes (in filter output)
       [not found] ` <20131013212946.GA25277-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2013-10-14  2:06   ` Peter Sefton
  2013-10-14  6:01   ` BP Jonsson
@ 2013-10-16 14:16   ` BP Jonsson
       [not found]     ` <CAFC_yuRuOgEKK8BG=eDsm1KG0_CbCda8_GAFWJPjp0Agt3Kq_A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2 siblings, 1 reply; 19+ messages in thread
From: BP Jonsson @ 2013-10-16 14:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 9819 bytes --]

Is that restoration in the development version only? I tried to update
but cabal said everything was up to date.

I guess that in the meantime I could write one filter which converts
from the changed to the traditional JSON format and another which
converts the other way around, and call them before and after other
filters, provided it is possible to chain filters without pandoc seeing
the data between the filters. That should work, shouldn't it?

/bpj

On Sunday, 13 October 2013, John MacFarlane wrote:

> There was a change in the aeson (json) library that caused pandoc's
> JSON format to change, briefly. (The latest code works around the
> change and restores the old behavior.)
>
> However, I think it would be worth considering changing to the
> format aeson now defaults to:
>
>     {"tag": "CodeBlock", "contents": ...}
>
> instead of
>
>     {"CodeBlock": ...}
>
> The former is more verbose but much easier to work with
> programmatically. And we could remove some of the verbosity by
> changing "tag" to "k" and "contents" to "v".
>
> Thoughts?

-- 
/BP

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAFC_yuRuOgEKK8BG%3DeDsm1KG0_CbCda8_GAFWJPjp0Agt3Kq_A%40mail.gmail.com. For more options, visit https://groups.google.com/groups/opt_out. [-- Attachment #2: Type: text/html, Size: 13692 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
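The list-splicing behavior John describes (a transformation function returns a list, and the list replaces the element in its parent's content list) can be sketched without the pandocfilters library. This is a minimal illustration, not pandocfilters' actual code: `walk` and `unwrap_bare` are names made up here, the `t`/`c` key names are an assumption (the JSON key names were changing during this thread, so adjust them to what your pandoc emits), and the Span/Div layout `[[id, classes, key-value pairs], children]` is likewise assumed.

```python
TAG, CONTENTS = "t", "c"  # assumed key names; adjust to what your pandoc emits


def walk(node, action):
    """Rebuild `node`, calling `action(tag, contents)` for every element.

    `action` may return None (keep the element and descend into it),
    a list (spliced into the parent's list in place of the element),
    or a single replacement element.
    """
    if isinstance(node, list):
        out = []
        for item in node:
            if isinstance(item, dict) and TAG in item:
                result = action(item[TAG], item.get(CONTENTS))
                if result is None:
                    out.append(walk(item, action))
                elif isinstance(result, list):
                    # Splice: the parent list receives the items directly.
                    out.extend(walk(result, action))
                else:
                    out.append(walk(result, action))
            else:
                out.append(walk(item, action))
        return out
    if isinstance(node, dict):
        return {key: walk(value, action) for key, value in node.items()}
    return node


def unwrap_bare(tag, contents):
    """Replace a Span or Div without id, classes or attributes by its children."""
    if tag in ("Span", "Div"):
        (ident, classes, keyvals), children = contents
        if not ident and not classes and not keyvals:
            return children
    return None
```

Because the parent list is rebuilt while walking, no reference to the parent and no element paths need to be tracked. A complete filter would additionally read the document JSON from standard input and write the result to standard output.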
[parent not found: <CAFC_yuRuOgEKK8BG=eDsm1KG0_CbCda8_GAFWJPjp0Agt3Kq_A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Span and Div elements without any attributes (in filter output)
  [not found] ` <CAFC_yuRuOgEKK8BG=eDsm1KG0_CbCda8_GAFWJPjp0Agt3Kq_A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-10-16 16:13 ` John MacFarlane
  2013-10-16 16:32 ` BP Jonsson
  1 sibling, 0 replies; 19+ messages in thread
From: John MacFarlane @ 2013-10-16 16:13 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I'm reworking the JSON format right now.

Note that even if pandoc is up to date, it may be compiled against the
older pandoc-types library. So run:

    cabal install pandoc-types pandoc

+++ BP Jonsson [Oct 16 13 16:16 ]:
> Is that restoration in the development version only? I tried to update but
> cabal said everything was up to date.
>
> I guess that in the meanwhile I could write a filter which will convert
> from the changed to the traditional JSON format and one to convert the
> other way around and call them before and after other filters, provided it
> is possible to chain filters without pandoc seeing the data between the
> filters. That should work, shouldn't it?
>
> /bpj
>
> On Sunday, 13 October 2013, John MacFarlane wrote:
>
> > There was a change in the aeson (json) library that caused pandoc's
> > JSON format to change, briefly. (The latest code works around the
> > change and restores the old behavior.)
> >
> > However, I think it would be worth considering changing to the
> > format aeson now defaults to:
> >
> >     {"tag": "CodeBlock", "contents": ...}
> >
> > instead of
> >
> >     {"CodeBlock": ...}
> >
> > The former is more verbose but much easier to work with programmatically.
> > And we could remove some of the verbosity by changing "tag" to "k"
> > and "contents" to "v".
> >
> > Thoughts?
> >
> > +++ BP Jonsson [Oct 13 13 14:15 ]:
> > > I'm trying to write something similar to pandocfilters.py to help with
> > > writing filters in Perl.

^ permalink raw reply [flat|nested] 19+ messages in thread
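The two element shapes compared in this message, `{"CodeBlock": ...}` and `{"tag": "CodeBlock", "contents": ...}`, differ only in how the element type is attached to the contents, so translating between them is mechanical. The following Python sketch shows the idea under stated assumptions: the function names `to_tagged` and `to_single_key` are hypothetical, and the heuristic "a one-key dict whose key starts with an uppercase letter is an element" is a guess that could misfire on metadata maps. It relies only on the two shapes quoted in the thread and is not a tested converter for any particular pandoc version.

```python
def to_tagged(node):
    """Turn {"CodeBlock": x} into {"tag": "CodeBlock", "contents": x}, recursively."""
    if isinstance(node, list):
        return [to_tagged(item) for item in node]
    if isinstance(node, dict):
        if len(node) == 1:
            (key, value), = node.items()
            # Heuristic (assumption): element names are capitalized.
            if key[:1].isupper():
                return {"tag": key, "contents": to_tagged(value)}
        return {key: to_tagged(value) for key, value in node.items()}
    return node


def to_single_key(node):
    """Turn {"tag": "CodeBlock", "contents": x} into {"CodeBlock": x}, recursively."""
    if isinstance(node, list):
        return [to_single_key(item) for item in node]
    if isinstance(node, dict):
        if set(node) == {"tag", "contents"}:
            return {node["tag"]: to_single_key(node["contents"])}
        return {key: to_single_key(value) for key, value in node.items()}
    return node
```

With the tagged shape a program can read the element type from a fixed key instead of inspecting which key happens to be present, which is what makes it easier to work with.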
* Re: Span and Div elements without any attributes (in filter output)
  [not found] ` <CAFC_yuRuOgEKK8BG=eDsm1KG0_CbCda8_GAFWJPjp0Agt3Kq_A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-10-16 16:13 ` John MacFarlane
@ 2013-10-16 16:32 ` BP Jonsson
  [not found] ` <525EBF8A.7050106-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 19+ messages in thread
From: BP Jonsson @ 2013-10-16 16:32 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 2013-10-16 16:16, BP Jonsson wrote:
> Is that restoration in the development version only? I tried to update but
> cabal said everything was up to date.
>
> I guess that in the meanwhile I could write a filter which will convert
> from the changed to the traditional JSON format and one to convert the
> other way around and call them before and after other filters, provided it
> is possible to chain filters without pandoc seeing the data between the
> filters. That should work, shouldn't it?

I discovered it doesn't, but I incidentally peeked at

<https://github.com/jgm/pandocfilters/blob/460404290a3e956dff3cb0321aa908c4cffabbaf/pandocfilters.py>

and found that you have decided to change the JSON format so that
objects have a "tag" key and a "val" key. Assuming this will be the
format in the next release, I can make do for now with an
environment variable plus a variable to temporarily use "contents"
instead of "val" as the name of the "val" key.

Are there any other changes to the JSON format?

/BPJ

^ permalink raw reply [flat|nested] 19+ messages in thread
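One way to read the stopgap described in this message is that the name of the contents key is looked up at run time instead of being hard-coded. The Python sketch below illustrates that reading only; the environment variable name `PANDOC_JSON_KEYS`, the style names and both function names are hypothetical and do not come from pandoc, pandocfilters or the poster's code.

```python
import os

# Known key-name pairs: (tag key, contents key).
KEY_STYLES = {
    "contents": ("tag", "contents"),  # what pandoc 1.12 emitted, per the thread
    "val": ("tag", "val"),            # what the linked pandocfilters.py expected
}


def key_names(environ=None):
    """Pick the (tag key, contents key) pair named by an environment variable."""
    environ = os.environ if environ is None else environ
    style = environ.get("PANDOC_JSON_KEYS", "val")  # hypothetical variable name
    if style not in KEY_STYLES:
        raise ValueError("unknown key style: %r" % style)
    return KEY_STYLES[style]


def element_parts(element, environ=None):
    """Return (type name, contents) of an element using the selected key names."""
    tag_key, contents_key = key_names(environ)
    return element[tag_key], element[contents_key]
```

Keeping the lookup in one place means that a later rename of the keys only touches the `KEY_STYLES` table.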
[parent not found: <525EBF8A.7050106-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: Span and Div elements without any attributes (in filter output)
  [not found] ` <525EBF8A.7050106-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-10-16 17:49 ` John MacFarlane
  [not found] ` <20131016174957.GA59114-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: John MacFarlane @ 2013-10-16 17:49 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ BP Jonsson [Oct 16 13 18:32 ]:
> I discovered it doesn't, but I incidentally peeked at
> <https://github.com/jgm/pandocfilters/blob/460404290a3e956dff3cb0321aa908c4cffabbaf/pandocfilters.py>
>
> And found that you have decided to change the JSON format so that
> objects have a "tag" key and a "val" key. Assuming this will be the
> format in the next release I can make do for now with an
> environment variable + variable to temporarily use "contents"
> instead of "val" for the "val" key.
>
> Are there any other changes to the JSON format?

Since you checked, I've changed "tag" to "t" and "val" to "v".
This will cut down a lot on the size of the serialized strings,
which will also help performance. I don't worry too much about
confusing users, since anyone who interacts directly with the
JSON will have to study the format pretty closely anyway.

^ permalink raw reply [flat|nested] 19+ messages in thread
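The size argument can be checked on a toy document: every element object carries both keys, so shortening "tag" to "t" and "val" to "v" saves four characters per element. The Python sketch below is only an illustration; `rename_keys` is a hypothetical helper, the sample document is made up, and the helper renames every dict key with a matching name, so it would also rename a metadata key that happened to be called "tag" or "val".

```python
import json


def rename_keys(node, mapping):
    """Return a copy of `node` with dict keys renamed according to `mapping`."""
    if isinstance(node, list):
        return [rename_keys(item, mapping) for item in node]
    if isinstance(node, dict):
        return {mapping.get(key, key): rename_keys(value, mapping)
                for key, value in node.items()}
    return node


# A made-up four-element fragment using the verbose key names.
verbose = [{"tag": "Para", "val": [
    {"tag": "Str", "val": "Hello"},
    {"tag": "Space", "val": []},
    {"tag": "Str", "val": "world"},
]}]

short = rename_keys(verbose, {"tag": "t", "val": "v"})
saved = len(json.dumps(verbose)) - len(json.dumps(short))
```

Since nearly every word of running text is its own `Str` element, the per-element saving adds up over a whole document.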
[parent not found: <20131016174957.GA59114-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>]
* Re: Span and Div elements without any attributes (in filter output)
  [not found] ` <20131016174957.GA59114-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>
@ 2013-10-17 9:02 ` BP Jonsson
  [not found] ` <525FA791.60509-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: BP Jonsson @ 2013-10-17 9:02 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 2013-10-16 19:49, John MacFarlane wrote:
> Since you checked, I've changed "tag" to "t" and "val" to "v".
> This will cut down a lot on the size of the serialized strings,
> which will also help performance. I don't worry too much about
> confusing users, since anyone who interacts directly with the
> JSON will have to study the format pretty closely anyway.

No problem as long as I know what format(s) to support.
BTW, is the JSON format documented anywhere? And can I expect
a pandoc release with the new format soon?

/bpj

^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <525FA791.60509-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: Span and Div elements without any attributes (in filter output)
  [not found] ` <525FA791.60509-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-10-17 16:09 ` John MacFarlane
  [not found] ` <20131017160958.GC65594-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: John MacFarlane @ 2013-10-17 16:09 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

It is not documented, but it is predictable from the definitions in
Text.Pandoc.Definition. Looking at a few examples should give you the
idea.

You might just try to copy the tree-walking code from pandocfilters.py.

> No problem as long as I know what format(s) to support.
> BTW is the JSON format documented anywhere? And can I expect
> a pandoc release with the new format soon?
>
> /bpj

^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <20131017160958.GC65594-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>]
* Re: Span and Div elements without any attributes (in filter output)
  [not found] ` <20131017160958.GC65594-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>
@ 2013-10-17 19:31 ` BP Jonsson
  [not found] ` <52603B2E.1080100-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: BP Jonsson @ 2013-10-17 19:31 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 2013-10-17 18:09, John MacFarlane wrote:
> It is not documented, but it is predictable from the definitions in
> Text.Pandoc.Definition. Looking at a few examples should give you the
> idea.

I've done that several times, and am mostly confident with the
notation. I'm not quite sure what the parentheses mean in e.g.

    DefinitionList [([Inline], [[Block]])]

but I guess that it means "one or more tuple(s)", since that's
what you actually get in this case.

> You might just try to copy the tree-walking code from pandocfilters.py.

I'm no stranger to tree-walking and am already using such code,
but I want to pass the callback function an object rather than
the raw document element data, so that one can use methods to
modify the value(s) of attributes etc. without worrying about the
JSON format -- although by the time I've written the code for
those objects I'll have learned the JSON format by heart, I
suppose. I also want to allow users to build instances of such
objects, taking advantage of the JSON.pm module's support for
automatically calling TO_JSON methods on objects to obtain
appropriate data structures to serialize in place of the objects.

/bpj

^ permalink raw reply [flat|nested] 19+ messages in thread
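On the notation question: in Haskell type syntax square brackets denote a list and parentheses with a comma denote a tuple, so `[([Inline], [[Block]])]` reads as a list of pairs, each pairing a term (a list of inlines) with a list of definitions (each definition being a list of blocks). The Python sketch below combines that shape with the object approach described in this message, using the `default` hook of `json.dumps` as a rough counterpart of JSON.pm's TO_JSON support. The class names, the `to_json` method name and the `t`/`v` key names (taken from earlier in the thread) are assumptions made for illustration; pairs are written out as two-element JSON arrays.

```python
import json


class Str:
    def __init__(self, text):
        self.text = text

    def to_json(self):
        return {"t": "Str", "v": self.text}


class Plain:
    def __init__(self, inlines):
        self.inlines = list(inlines)

    def to_json(self):
        return {"t": "Plain", "v": self.inlines}


class DefinitionList:
    def __init__(self, items):
        # items: list of (term, definitions) pairs, i.e. [([Inline], [[Block]])]
        self.items = [(list(term), [list(blocks) for blocks in definitions])
                      for term, definitions in items]

    def to_json(self):
        return {"t": "DefinitionList",
                "v": [[term, definitions] for term, definitions in self.items]}


def _default(obj):
    """Called by json.dumps for objects it cannot serialize itself."""
    to_json = getattr(obj, "to_json", None)
    if to_json is None:
        raise TypeError("cannot serialize %r" % (obj,))
    return to_json()


def serialize(obj):
    return json.dumps(obj, default=_default)
```

The value returned by the hook is serialized in turn, so nested wrapper objects are converted without any explicit recursion in the classes.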
* Re: Span and Div elements without any attributes (in filter output)
From: John MacFarlane @ 2013-10-17 23:41 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ BP Jonsson [Oct 17 13 21:31 ]:
> 2013-10-17 18:09, John MacFarlane wrote:
> >It is not documented, but it is predictable from the definitions in
> >Text.Pandoc.Definition. Looking at a few examples should give you the
> >idea.
>
> I've done that several times, and am mostly confident with the
> notation. I'm not quite sure what the parentheses mean in e.g.
>
>     DefinitionList [([Inline], [[Block]])]
>
> but I guess that it means "one or more tuple(s)", since that's
> what you actually get in this case.
>
> >You might just try to copy the tree-walking code from pandocfilters.py.
>
> I'm no stranger to tree-walking and am already using such code,
> but I want to pass the callback function an object rather than
> the raw document element data so that one can use methods to
> modify the value(s) of attributes etc. without worrying about the
> JSON format -- although by the time I've written the code for
> those objects I'll have learned the JSON format by heart I
> suppose -- and also allow users to build instances of such
> objects, taking advantage of the JSON.pm module's support for
> automatically calling TO_JSON methods on objects to obtain
> appropriate data structures to serialize in place of the objects.

Yes, that's probably a nicer approach, but it requires duplicating
all of the pandoc data structures as native python/perl objects,
and providing code to translate between JSON and those.
Seemed a bit too much for the moment, but I did provide python
"constructor" functions for all the inline and block elements, so
people don't need to directly construct JSON objects. (This also
makes filters a bit more independent of the underlying
representation. When I modified the JSON format just now, I didn't
need to change any of the example filters.)
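The point about constructor functions keeping filters independent of the JSON layout can be illustrated with a short sketch. This is not the code from pandocfilters.py. `make_constructor`, `TAG_KEY` and `CONTENT_KEY` are hypothetical names, and the key values are assumptions about the format (which was still changing during this thread).

```python
import json

# Assumed key names. If the format changes, only these two lines change;
# filter code written against the constructors below stays the same.
TAG_KEY = "t"
CONTENT_KEY = "c"


def make_constructor(tag, nargs):
    """Return a function that builds the JSON object for one element type."""
    def construct(*args):
        if len(args) != nargs:
            raise ValueError(f"{tag} expects {nargs} argument(s), got {len(args)}")
        if nargs == 0:
            contents = []
        elif nargs == 1:
            contents = args[0]
        else:
            contents = list(args)
        return {TAG_KEY: tag, CONTENT_KEY: contents}
    construct.__name__ = tag
    return construct


# A few illustrative element constructors.
Str = make_constructor("Str", 1)
Emph = make_constructor("Emph", 1)
Space = make_constructor("Space", 0)


def shout(text):
    """Example filter logic: never mentions the key names directly."""
    return Emph([Str(text.upper())])


print(json.dumps(shout("hello")))
```

Because `shout` only calls `Emph` and `Str`, renaming the contents key is a one-line change to `CONTENT_KEY`, which matches the experience described above of not having to touch the example filters.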
* Re: Span and Div elements without any attributes (in filter output)
From: BP Jonsson @ 2013-10-18 9:26 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

2013-10-18 01:41, John MacFarlane wrote:
>>> You might just try to copy the tree-walking code from pandocfilters.py.
>>
>> I'm no stranger to tree-walking and am already using such code,
>> but I want to pass the callback function an object rather than
>> the raw document element data so that one can use methods to
>> modify the value(s) of attributes etc. without worrying about the
>> JSON format -- although by the time I've written the code for
>> those objects I'll have learned the JSON format by heart I
>> suppose -- and also allow users to build instances of such
>> objects, taking advantage of the JSON.pm module's support for
>> automatically calling TO_JSON methods on objects to obtain
>> appropriate data structures to serialize in place of the objects.
>
> Yes, that's probably a nicer approach, but it requires duplicating
> all of the pandoc data structures as native python/perl objects,

My thought is to make that half-lazy by letting the user specify a
wanted_tags parameter to the walker call. If that parameter, a single
tag or an array of tags, is provided, then only elements with those
tags, and their descendants, will be 'objectified'. I suppose I could
make that even more lazy by fetching/converting data from the
structures obtained from JSON only when requested -- but too many
checks get expensive too, at least in terms of code clarity.

> and providing code to translate between JSON and those.

There is no way around that, but naturally I'm modularizing that as
much as possible, e.g. the attr property of a span object is an
instance of an attr class, which is also used by
div/header/code/codeblock objects so code for constructing and
serializing attributes need not be duplicated.

> Seemed a bit too much for the moment,

I've decided that it is worth it because perl code for handling
nested structures, especially finding values in nested arrays, tends
to become hairy and thus error prone, so that I would anyway need to
provide helper functions for such things. Consider the example for
how to get a keyval I gave the other day (as modified for the new
format):

    my($keyval) = grep { $_->[0] eq 'foo' } @{ $elem->{v}[0][2] };

Things get much cleaner on all levels, and faster if one needs to
look up many keyvals, if the keyvals are converted to a hashmap when
the attr object is constructed and converted back once by the attr
object's TO_JSON method, both of which are comparatively benign code.

> but I did provide python "constructor" functions for all the
> inline and block elements, so people don't need to directly
> construct JSON objects.

I have a 'filter' object which is instantiated once for each filter,
and various utility routines, including the walking routine, are
provided as methods of that object. It has a new_elem_obj method
which calls the appropriate constructor depending on which tag is
provided to it, mostly because class names tend to get long with
perl's namespace model. You can pass that method or the specific
constructors either separate parameters for tag/format/classes/whatever
or an element parameter obtained from the JSON data, which is 'mined'
for data by the constructor if it exists and the specific parameters
don't, so both the walking routine and the user use the same method
to get their element objects.
Moreover you can subclass the 'filter' object and provide functions
with names like div_handler, div_handler_for_someclass or
div_handler_to_latex for the walker to use as callback for specific
elements w/o specific classes if applicable and for specific target
formats if desired. The goal is that the user shall not have to
litter a single callback function with a lot of conditional chains.

> (This also makes filters a bit more independent of the
> underlying representation....when I modified the JSON format
> just now, I didn't need to change any of the example filters.)

Yes, that's my goal too, and to move repetitive code out of the
filter scripts into modules used by the filter scripts.

/bpj
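The name-based handler dispatch described above might look roughly like this in Python. This is a sketch of the idea only, not the Perl module under discussion. The `Filter` class, its method names, the lookup order (class-specific, then format-specific, then generic) and the "t"/"c" key names are all assumptions made for illustration. The attr layout `[id, classes, keyvals]` follows pandoc's Attr type.

```python
class Filter:
    """Hypothetical base class that dispatches elements to methods by name."""

    def __init__(self, target_format=""):
        self.target_format = target_format

    def classes_of(self, elem):
        """Return the class list of a Div or Span, else an empty list."""
        if elem["t"] in ("Div", "Span"):
            return elem["c"][0][1]  # attr = [id, classes, keyvals]
        return []

    def find_handler(self, elem):
        """Pick the most specific handler method defined on this object."""
        tag = elem["t"].lower()
        names = [f"{tag}_handler_for_{cls}" for cls in self.classes_of(elem)]
        if self.target_format:
            names.append(f"{tag}_handler_to_{self.target_format}")
        names.append(f"{tag}_handler")
        for name in names:
            handler = getattr(self, name, None)
            if callable(handler):
                return handler
        return None

    def walk(self, node):
        """Return a transformed copy of the tree.

        A handler may return a replacement element; returning None means
        "leave this element alone and keep walking into it".
        """
        if isinstance(node, list):
            return [self.walk(x) for x in node]
        if isinstance(node, dict):
            if "t" in node:
                handler = self.find_handler(node)
                if handler is not None:
                    result = handler(node)
                    if result is not None:
                        return result
            return {k: self.walk(v) for k, v in node.items()}
        return node


class MyFilter(Filter):
    def div_handler_for_note(self, elem):
        # Illustrative only: replace a Div with class "note" by a paragraph.
        return {"t": "Para", "c": [{"t": "Str", "c": "NOTE"}]}


doc = [
    {"t": "Div", "c": [["", ["note"], []], []]},
    {"t": "Div", "c": [["", [], []], []]},
]
result = MyFilter("latex").walk(doc)
print(result[0]["t"], result[1]["t"])
```

Each handler stays small because the selection logic lives in `find_handler`, not in a chain of conditionals inside one callback. One limitation of this sketch: class names that are not valid identifier fragments (for example, names containing hyphens) cannot be matched by method name.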
* Re: Span and Div elements without any attributes (in filter output)
From: John MacFarlane @ 2013-10-18 15:10 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Note: I've tweaked things again, using 'c' instead of 'v'. Hope to
have a proper release soon.
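A change of contents key from 'v' to 'c' is the kind of detail that the attribute-object approach discussed earlier in the thread is meant to hide. The following Python sketch shows that idea under stated assumptions: keyvals are turned into a dictionary on construction and back into pairs on serialization. The `Attr` class, `attr_of`, and `CONTENT_KEY` are hypothetical names, and `json.dumps(..., default=...)` stands in here for the TO_JSON hook of Perl's JSON.pm.

```python
import json

CONTENT_KEY = "c"  # assumed; "v" in the earlier iteration of the format


class Attr:
    """Hypothetical wrapper around pandoc's attr triple [id, classes, keyvals]."""

    def __init__(self, identifier="", classes=None, keyvals=None):
        self.identifier = identifier
        self.classes = list(classes or [])
        # [[key, value], ...] -> {key: value}; a repeated key keeps its last value
        self.keyvals = dict(keyvals or [])

    @classmethod
    def from_json(cls, raw):
        identifier, classes, keyvals = raw
        return cls(identifier, classes, keyvals)

    def to_json(self):
        """Convert back to the array form expected in the JSON tree."""
        pairs = [[k, v] for k, v in self.keyvals.items()]
        return [self.identifier, self.classes, pairs]


def attr_of(elem):
    """Build an Attr from an element whose first content item is an attr."""
    return Attr.from_json(elem[CONTENT_KEY][0])


span = {"t": "Span", CONTENT_KEY: [["", [], [["foo", "bar"]]], []]}

attr = attr_of(span)
attr.keyvals["lang"] = "sv"      # plain dictionary access, no nested indexing
span[CONTENT_KEY][0] = attr      # the object sits in the tree until serialization

# json.dumps calls `default` for objects it cannot serialize itself.
print(json.dumps(span, default=lambda obj: obj.to_json()))
```

Only `attr_of` and the `span` literal refer to `CONTENT_KEY`, so user code that works through `Attr` would not need to change when the key name does.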