public inbox archive for pandoc-discuss@googlegroups.com
From: BP Jonsson <bpjonsson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: "pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org"
	<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Span and Div elements without any attributes (in filter output)
Date: Mon, 14 Oct 2013 08:01:06 +0200
Message-ID: <CAFC_yuQT=A30x6fogB5a482OgVjX6uYtdmx1USoXnhSBo_mGqg@mail.gmail.com>
In-Reply-To: <20131013212946.GA25277-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>

Oh, I must have missed a pandoc update!

I agree that the altered format is easier to work with programmatically; in
fact I thought you had made the change deliberately for that reason. I
don't see any real benefit in ultra-short keys though; I prefer
descriptive but not *too* long names for what are in effect attributes, and
any good editor can be expected to support tab completion. Using "key"
and "value" or "k" and "v" as keys might confuse newcomers, however
suggestive of the earlier format they may be. I'd suggest "type" and
"data" as both short enough and adequately descriptive.

Not that it's a very big deal; part of my Perl scripting support module is
already about munging pandoc elements as decoded from JSON into manageable
objects in order to not have to deal with things like this all the time:

    my($keyval) = grep { $_->[0] eq 'foo' } @{$elem->{$key}[0][2]};

so on the whole a

    my $type = ( keys %$elem )[0];

is really nothing!

Please don't get me wrong: being able to rely on every hash in the input
being an element is a feature, and being able to rely on those hashes
having exactly one key which is the name of the element type works well in
practice. It probably makes the json more human readable too.
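
For anyone following along in Python rather than Perl, here is a minimal
sketch of the same "what type is this element?" lookup, written so that it
accepts both JSON shapes mentioned in this thread. The function name
`element_type_and_contents` and the sample data are made up for illustration;
this is not part of pandocfilters or pandoc itself.

```python
import json


def element_type_and_contents(elem):
    """Return (type, contents) for either JSON shape discussed in this thread."""
    if not isinstance(elem, dict):
        raise TypeError("expected a decoded JSON object (dict)")
    # Tagged shape: {"tag": "CodeBlock", "contents": ...}
    if "tag" in elem:
        return elem["tag"], elem.get("contents")
    # Single-key shape: {"CodeBlock": ...}; the only key is the element type.
    if len(elem) == 1:
        (etype, contents), = elem.items()
        return etype, contents
    raise ValueError("not a recognised element object")


# Hypothetical sample elements, one per shape.
old_style = json.loads('{"CodeBlock": [["", [], []], "x = 1"]}')
new_style = json.loads('{"tag": "CodeBlock", "contents": [["", [], []], "x = 1"]}')
```

Both calls, `element_type_and_contents(old_style)` and
`element_type_and_contents(new_style)`, give the same pair back, so the rest
of a filter need not care which shape pandoc emitted.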

On Sunday, 13 October 2013, John MacFarlane wrote:

> There was a change in the aeson (json) library that caused pandoc's
> JSON format to change, briefly.  (The latest code works around the
> change and restores the old behavior.)
>
> However, I think it would be worth considering changing to the
> format aeson now defaults to:
>
>     {"tag": "CodeBlock", "contents": ...}
>
> instead of
>
>     {"CodeBlock": ...}
>
> The former is more verbose but much easier to work with programmatically.
> And we could remove some of the verbosity by changing "tag" to "k"
> and "contents" to "v".
>
> Thoughts?
>
> +++ BP Jonsson [Oct 13 13 14:15 ]:
> >    I'm trying to write something similar to pandocfilters.py to help with
> >    writing filters in Perl.
> >
> >    I noticed that the `walk` function in pandocfilters.py seems
> >    to expect that the dict objects it receives have an element name like
> >    `CodeBlock` as a key, with the contents of the element as value. This
> >    was indeed how the JSON output by pandoc looked prior to pandoc 1.12:
> >
> >    {"CodeBlock":[...]}
> >
> >    but in pandoc 1.12 I get JSON output where each element object has a
> >    `tag` key with the element type, e.g. `CodeBlock` as value and a key
> >    "contents" with the element contents as value:
> >
> >    {"tag":"CodeBlock","contents":[...]}
> >
> >    I guessed that it's the json module which automatically converts from
> >    the 'new style' to the 'old style' behind the scenes? I tried to
> >    locate its documentation but couldn't find anything relevant.
> >
> >    I'm probably revealing my utter ignorance of python here -- I'm not a
> >    programmer but a philologist who learned perl years ago to work on my
> >    data --, and it's probably a good time to remedy that, but I want to
> >    be sure what kind of data I should be expecting/returning, or if
> >    something is broken in my pandoc installation, however unlikely!
> >
> >    On Saturday, 12 October 2013, John MacFarlane wrote:
> >
> >      I don't want to make removing empty spans the default, since
> >      it breaks expected behavior that HTML tags will be passed through
> >      verbatim.
> >      Note that if you use the python pandocfilters library to write
> >      your filters, your transformation functions can return a list
> >      instead of an object, in which case the list will be spliced in
> >      to the result (which I think is what you want).
> >      If you're writing the filters in Haskell, you can just use a
> >      function Inline -> [Inline] or Inline -> IO [Inline].
> >      +++ BP Jonsson [Oct 11 13 13:55 ]:
> >      > In a filter it's sometimes desirable to replace an element in its
> >      > parent element's content list with e.g. the contents of the
> >      > element itself, modified in some way. In practice this is hard to
> >      > do as you'll have to walk the AST data structure and collect
> >      > elements along with a reference to their parent element's content
> >      > list, which is a bit more complicated than just collecting the
> >      > (child) elements themselves. One possible workaround is to
> >      > convert the element into a Div or Span element and set its
> >      > contents to whatever one wants to replace the original element
> >      > with. It works in the sense that the Span or Div element will
> >      > just sit there and, well, contain the data, but in HTML output it
> >      > will show up as a `<span>...</span>` or `<div>...</div>`, even
> >      > though it probably doesn't have any meaningful purpose in the
> >      > HTML document; it just makes the HTML harder to read and harder
> >      > to render.
> >      >
> >      > I've tried to write a filter which removes Span and Div elements
> >      > which don't have any attributes at all (id, class or other
> >      > attributes) -- or alternatively those which have a `disembowel=1`
> >      > attribute, although the absence of any attributes seems a better
> >      > criterion -- but for various
> >      > reasons this has proved hard within a reasonable level of
> >      > 'parsing', especially since the AST data structure is rather
> >      > radically altered in the process -- paths to elements change
> >      > during the process in ways that make the processing hard. What if
> >      > pandoc itself replaced such attribute-less Span and Div elements
> >      > with their content at least when the `--normalize` option is set?
> >      > After all pandoc parses the whole document anyway!
> >      >
> >      > /bpj
> >      >
> >      > P.S.
> >      > : I am aware that there may be situations when a `<span>` or
> >      > `<div>` may be meaningful in an HTML document, most notably perhaps
> >      > when a CSS rule targets them as children of some other
> >      > element, but it seems to me that even then it is probably
> >      > most user friendly to give them a class describing their
> >      > function if they are really intended to fulfill a function in
> >      > the document, so that it might be reasonable for pandoc to
> >      > remove such attribute-less elements at least under
> >      > `--normalize`.
> >      >
> >      > --
> >      > You received this message because you are subscribed to the Google
> >      > Groups "pandoc-discuss" group.
> >      > To unsubscribe from this group and stop receiving emails from it,
> >      > send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> >      > To post to this group, send email to
> >      > pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> >      > To view this discussion on the web visit
> >      > https://groups.google.com/d/msgid/pandoc-discuss/5257E71D.9070706%40gmail.com.
> >      > For more options, visit
> >      > https://groups.google.com/groups/opt_out.
> >
> >    --
> >    /BP
>
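
The splicing idea quoted above (a transformation that yields a list which is
spliced into the parent's content list) can be sketched in plain Python
without any library. This is only an illustration under stated assumptions,
not pandocfilters' own code: it assumes the single-key JSON shape
`{"Span": [attr, contents]}` with `attr` laid out as `[id, [classes],
[[key, value], ...]]`, and the names `is_bare_container` and
`unwrap_bare_containers` are hypothetical.

```python
def is_bare_container(elem):
    """True for a Span or Div whose id, classes and key-value list are all empty."""
    if not (isinstance(elem, dict) and len(elem) == 1):
        return False
    (etype, value), = elem.items()
    if etype not in ("Span", "Div"):
        return False
    if not (isinstance(value, list) and len(value) == 2):
        return False
    attr = value[0]
    if not (isinstance(attr, list) and len(attr) == 3):
        return False
    ident, classes, keyvals = attr
    return ident == "" and not classes and not keyvals


def unwrap_bare_containers(node):
    """Return a copy of node with attribute-less Span/Div elements replaced by their contents."""
    if isinstance(node, list):
        result = []
        for item in node:
            walked = unwrap_bare_containers(item)  # children first
            if is_bare_container(walked):
                # Splice the contents into the parent's list.
                (value,) = walked.values()
                result.extend(value[1])
            else:
                result.append(walked)
        return result
    if isinstance(node, dict):
        return {key: unwrap_bare_containers(val) for key, val in node.items()}
    return node


# Hypothetical sample paragraph: one nested bare Span and one Span with an id.
para = {"Para": [
    {"Str": "a"},
    {"Span": [["", [], []], [{"Str": "b"},
                             {"Span": [["", [], []], [{"Str": "c"}]]}]]},
    {"Span": [["x", [], []], [{"Str": "d"}]]},
]}
cleaned = unwrap_bare_containers(para)
```

Because children are processed before their parent is inspected, nested bare
Spans collapse in a single pass, and no bookkeeping of paths or parent
references is needed. A bare element that is not itself inside a list (for
example, one passed in as the top-level node) is left as it is.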


-- 
/BP

