From: Peter Sefton
Newsgroups: gmane.text.pandoc
Subject: Re: Span and Div elements without any attributes (in filter output)
Date: Mon, 14 Oct 2013 13:12:21 +1100

Further to this, I might be missing the point, but this could also be
made much easier to work with in arbitrary languages:

On Mon, Oct 14, 2013 at 1:06 PM, Peter Sefton wrote:
> John,
>
> As a new user I agree that the {"tag": "X", ...} format makes more sense,
> as it is easier to code against. Is the key 'tag' an artefact of the JSON
> serialiser? In this particular case 'tag' almost works; I think a
> meaningful name like 'element' or 'node' would be better than 'k'.
>
> Peter
>
> On Mon, Oct 14, 2013 at 8:29 AM, John MacFarlane wrote:
>> There was a change in the aeson (JSON) library that caused pandoc's
>> JSON format to change, briefly. (The latest code works around the
>> change and restores the old behavior.)
>>
>> However, I think it would be worth considering changing to the
>> format aeson now defaults to:
>>
>> {"tag": "CodeBlock", "contents": ...}
>>
>> instead of
>>
>> {"CodeBlock": ...}
>>
>> The former is more verbose but much easier to work with programmatically.
>> And we could remove some of the verbosity by changing "tag" to "k"
>> and "contents" to "v".
>>
>> Thoughts?
>>
>> +++ BP Jonsson [Oct 13 13 14:15 ]:
>>> I'm trying to write something similar to pandocfilters.py to help with
>>> writing filters in Perl.
>>>
>>> I noticed that the `walk` function in pandocfilters.py seems
>>> to expect that the dict objects it receives have an element name like
>>> `CodeBlock` as a key, with the contents of the element as value. This
>>> was indeed how the JSON output by pandoc looked prior to pandoc 1.12:
>>>
>>> {"CodeBlock":[...]}
>>>
>>> but in pandoc 1.12 I get JSON output where each element object has a
>>> `tag` key with the element type, e.g.
>>> `CodeBlock`, as value and a key "contents" with the element contents
>>> as value:
>>>
>>> {"tag":"CodeBlock","contents":[...]}
>>>
>>> I guessed that it's the json module which automatically converts from
>>> the 'new style' to the 'old style' behind the scenes? I tried to locate
>>> its documentation but couldn't find anything relevant.
>>>
>>> I'm probably revealing my utter ignorance of Python here -- I'm not a
>>> programmer but a philologist who learned Perl years ago to work on my
>>> data -- and it's probably a good time to remedy that, but I want to be
>>> sure what kind of data I should be expecting/returning, or if something
>>> is broken in my pandoc installation, however unlikely!
>>>
>>> On Saturday, 12 October 2013, John MacFarlane wrote:
>>>
>>> I don't want to make removing empty spans the default, since
>>> it breaks expected behavior that HTML tags will be passed through
>>> verbatim.
>>> Note that if you use the Python pandocfilters library to write
>>> your filters, your transformation functions can return a list
>>> instead of an object, in which case the list will be spliced into
>>> the result (which I think is what you want).
>>> If you're writing the filters in Haskell, you can just use a
>>> function Inline -> [Inline] or Inline -> IO [Inline].
>>> +++ BP Jonsson [Oct 11 13 13:55 ]:
>>> > In a filter it's sometimes desirable to replace an element in its
>>> > parent element's content list with e.g. the contents of the
>>> > element itself, modified in some way. In practice this is hard to
>>> > do as you'll have to walk the AST data structure and collect
>>> > elements along with a reference to their parent element's content
>>> > list, which is a bit more complicated than just collecting the
>>> > (child) elements themselves. One possible workaround is to
>>> > convert the element into a Div or Span element and set its
>>> > contents to whatever one wants to replace the original element
>>> > with.
>>> > It works in the sense that the Span or Div element will
>>> > just sit there and, well, contain the data, but in HTML output it
>>> > will show up as a `<span>...</span>` or `<div>...</div>`, even
>>> > though it probably doesn't have any meaningful purpose in the
>>> > HTML document; it just makes the HTML harder to read and harder
>>> > to render.
>>> >
>>> > I've tried to write a filter which removes Span and Div elements
>>> > which don't have any attributes at all (id, class or other
>>> > attributes) -- or alternatively those which have a `disembowel=1`
>>> > attribute, although the absence of any attributes seems a better
>>> > criterion -- but for various reasons this has proved hard within a
>>> > reasonable level of 'parsing', especially since the AST data
>>> > structure is rather radically altered in the process -- paths to
>>> > elements change during the process in ways that make the processing
>>> > hard. What if pandoc itself replaced such attribute-less Span and
>>> > Div elements with their content, at least when the `--normalize`
>>> > option is set? After all, pandoc parses the whole document anyway!
>>> >
>>> > /bpj
>>> >
>>> > P.S.: I am aware that there may be situations when a `<span>` or
>>> > `<div>` may be meaningful in an HTML document, most notably perhaps
>>> > when a CSS rule targets them as children of some other
>>> > element, but it seems to me that even then it is probably
>>> > most user-friendly to give them a class describing their
>>> > function if they are really intended to fulfill a function in
>>> > the document, so that it might be reasonable for pandoc to
>>> > remove such attribute-less elements at least under
>>> > `--normalize`.
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google
>>> > Groups "pandoc-discuss" group.
>>> > To unsubscribe from this group and stop receiving emails from it,
>>> > send an email to [1]pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>> > To post to this group, send email to
>>> > [2]pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>> > To view this discussion on the web visit
>>> > [3]https://groups.google.com/d/msgid/pandoc-discuss/5257E71D.9070706%40gmail.com.
>>> > For more options, visit
>>> > [4]https://groups.google.com/groups/opt_out.
>>>
>>> --
>>> /BP
>>>
>>> References
>>>
>>> 1. javascript:;
>>> 2. javascript:;
>>> 3. https://groups.google.com/d/msgid/pandoc-discuss/5257E71D.9070706%40gmail.com
>>> 4. https://groups.google.com/groups/opt_out
>>> 5. javascript:;
>>> 6. javascript:;
>>> 7. https://groups.google.com/d/msgid/pandoc-discuss/20131012145025.GE95559%40Johns-MacBook-Pro.local
>>> 8. https://groups.google.com/groups/opt_out
>>> 9. https://groups.google.com/d/msgid/pandoc-discuss/CAFC_yuSeX5gHKAGD8zYfbGkFFjFYjZdH3QNTzjRuZpryAp-0CQ%40mail.gmail.com
>>> 10. https://groups.google.com/groups/opt_out
>
> --
> Peter Sefton +61410326955 pt-uoIRqaBSbk9Wk0Htik3J/w@public.gmane.org http://ptsefton.com
> Gmail, Twitter & Skype name: ptsefton

--
Peter Sefton +61410326955 pt-uoIRqaBSbk9Wk0Htik3J/w@public.gmane.org http://ptsefton.com
Gmail, Twitter & Skype name: ptsefton
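The following is a minimal Python sketch of the filter BP describes. It makes two assumptions: the {"tag": ..., "contents": ...} encoding quoted above, and the pandoc 1.12 layout in which a Span or Div holds an [id, classes, key-value pairs] triple followed by its children. The helpers walk and unwrap_bare are hypothetical and are not the pandocfilters.py API. Returning a list splices its items into the parent list, as John describes for pandocfilters transformation functions, so an attribute-less Span or Div is simply replaced by its children:

```python
#!/usr/bin/env python3
"""Remove Span and Div elements that carry no attributes (sketch)."""
import json
import sys

# Assumption: the pandoc 1.12 encoding from this thread, where each element is
# {"tag": ..., "contents": ...}. Current pandoc releases use "t" and "c".
TAG, CONTENTS = "tag", "contents"


def walk(node, action):
    """Rebuild a pandoc JSON tree, applying action bottom-up.

    For each element object found in a list, action(tag, contents) may return
    None (keep the element), one element (replace it), or a list of elements
    (splice them into the parent list in place of the original).
    """
    if isinstance(node, list):
        out = []
        for item in node:
            item = walk(item, action)  # children first, so nested cases work
            if isinstance(item, dict) and TAG in item:
                result = action(item[TAG], item.get(CONTENTS))
                if result is None:
                    out.append(item)
                elif isinstance(result, list):
                    out.extend(result)  # splice into the parent list
                else:
                    out.append(result)
            else:
                out.append(item)
        return out
    if isinstance(node, dict):
        return {key: walk(value, action) for key, value in node.items()}
    return node  # strings, numbers, booleans, null


def unwrap_bare(tag, contents):
    """Hypothetical action: replace an attribute-less Span/Div by its children."""
    if tag in ("Span", "Div"):
        # contents is [[id, classes, key-value pairs], children]
        (ident, classes, keyvals), children = contents
        if not ident and not classes and not keyvals:
            return children
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # pandoc --filter passes the output format as the first argument; requiring
    # it keeps a bare `python3 unwrap.py` from waiting on stdin.
    doc = json.load(sys.stdin)
    json.dump(walk(doc, unwrap_bare), sys.stdout)
```

To use it, save it under a name such as unwrap.py (hypothetical) and run `pandoc input.md --filter ./unwrap.py -t html` with the file made executable. Alternatively, run `pandoc input.md -t json | python3 unwrap.py html | pandoc -f json -t html`. Children are processed before their parents, so a bare Span nested inside another bare Span is removed as well. A Span or Div with an id, a class or any key-value attribute is left alone. As John notes above, doing this by default would break the expectation that HTML tags pass through verbatim, so it is better kept as an opt-in filter.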