* Simplifying pandoc's HTML output even more
@ 2017-02-11 21:14 Marc Haber
       [not found] ` <20170211211439.GD2488-MEsB+WDYHc7QKvwJT6wXshvVK+yQ3ZXh@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Haber @ 2017-02-11 21:14 UTC (permalink / raw)
  To: pandoc-discuss

Hi,

I am using pandoc to generate simple HTML from markdown. Simple HTML
is required because the German tax authority wants footnotes and
explanations in a rather limited subset of XHTML.

For example, here is a test markdown input:

Right Left Center Default
------- ------ -------- -------
12 12 12 12
123 123 123 123
1 1 1 1

This creates the following HTML:

<table>
<thead>
<tr class="header">
<th align="right">Right</th>
<th align="right">Left</th>
<th align="right">Center</th>
<th>Default</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="right">12</td>
<td align="right">12</td>
<td align="right">12</td>
<td>12</td>
</tr>
<tr class="even">
<td align="right">123</td>
<td align="right">123</td>
<td align="right">123</td>
<td>123</td>
</tr>
<tr class="odd">
<td align="right">1</td>
<td align="right">1</td>
<td align="right">1</td>
<td>1</td>
</tr>
</tbody>
</table>

This HTML does not pass tax validation due to the thead and tbody
elements and the class attribute on the tr tag.

Can I make pandoc omit those tags and attributes, or do I need to do
post-processing of the generated HTML?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

^ permalink raw reply	[flat|nested] 8+ messages in thread
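The three constructs named in the message above (thead, tbody, and a class attribute on tr) can be spotted mechanically before a file is submitted. The following is only a minimal sketch using Python's standard library; it is not the tax authority's validator, it checks nothing beyond those three constructs, and the names `RestrictionChecker` and `find_problems` are made up for this illustration.

```python
from html.parser import HTMLParser


class RestrictionChecker(HTMLParser):
    """Collect (line, description) pairs for the constructs named in the thread."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # getpos() reports the position of the tag currently being handled.
        line, _column = self.getpos()
        if tag in ("thead", "tbody"):
            self.problems.append((line, f"<{tag}> element"))
        if tag == "tr" and any(name == "class" for name, _value in attrs):
            self.problems.append((line, "class attribute on <tr>"))


def find_problems(source):
    """Return the list of problems found in an HTML string (hypothetical helper)."""
    checker = RestrictionChecker()
    checker.feed(source)
    checker.close()
    return checker.problems


SAMPLE = """<table>
<thead>
<tr class="header">
<th align="right">Right</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="right">12</td>
</tr>
</tbody>
</table>
"""

if __name__ == "__main__":
    for line, description in find_problems(SAMPLE):
        print(f"line {line}: {description}")
```

An empty result from `find_problems` only means that these three constructs are absent; it says nothing about whether the file would pass the real validation.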
[parent not found: <20170211211439.GD2488-MEsB+WDYHc7QKvwJT6wXshvVK+yQ3ZXh@public.gmane.org>]
* Re: Simplifying pandoc's HTML output even more
       [not found] ` <20170211211439.GD2488-MEsB+WDYHc7QKvwJT6wXshvVK+yQ3ZXh@public.gmane.org>
@ 2017-02-11 22:00   ` BP Jonsson
       [not found]     ` <CAFC_yuRk+EGMRy6Bw0p2u6EiTwSHVwr589MZPm_+da3hZudUiQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-02-11 22:25   ` John MacFarlane
  1 sibling, 1 reply; 8+ messages in thread
From: BP Jonsson @ 2017-02-11 22:00 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 4024 bytes --]

If you make a list of everything you want changed/removed in terms of HTML
I'll try to write an HTML filter.

/bpj

On Sat, 11 Feb 2017 at 22:14, Marc Haber <mh+pandoc-discuss@zugschlus.de> wrote:

> Hi,
>
> I am using pandoc to generate simple HTML from markdown. Simple HTML
> is required because the German tax authority wants footnotes and
> explanations in a rather limited subset of XHTML.
>
> For example, here is a test markdown input:
> Right Left Center Default
> ------- ------ -------- -------
> 12 12 12 12
> 123 123 123 123
> 1 1 1 1
>
> This creates the following HTML:
> <table>
> <thead>
> <tr class="header">
> <th align="right">Right</th>
> <th align="right">Left</th>
> <th align="right">Center</th>
> <th>Default</th>
> </tr>
> </thead>
> <tbody>
> <tr class="odd">
> <td align="right">12</td>
> <td align="right">12</td>
> <td align="right">12</td>
> <td>12</td>
> </tr>
> <tr class="even">
> <td align="right">123</td>
> <td align="right">123</td>
> <td align="right">123</td>
> <td>123</td>
> </tr>
> <tr class="odd">
> <td align="right">1</td>
> <td align="right">1</td>
> <td align="right">1</td>
> <td>1</td>
> </tr>
> </tbody>
> </table>
>
> This HTML does not pass tax validation due to the thead and tbody
> elements and the class attribute on the tr tag.
>
> Can I make pandoc omit those tags and attributes, or do I need to do
> post-processing of the generated HTML?
>
> Greetings
> Marc
>
> --
> -----------------------------------------------------------------------------
> Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
> Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
> Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/20170211211439.GD2488%40torres.zugschlus.de.
> For more options, visit https://groups.google.com/d/optout.

[-- Attachment #2: Type: text/html, Size: 7338 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread
[parent not found: <CAFC_yuRk+EGMRy6Bw0p2u6EiTwSHVwr589MZPm_+da3hZudUiQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Simplifying pandoc's HTML output even more
       [not found]     ` <CAFC_yuRk+EGMRy6Bw0p2u6EiTwSHVwr589MZPm_+da3hZudUiQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-02-12  7:10       ` Marc Haber
       [not found]         ` <20170212071012.GI2488-MEsB+WDYHc7QKvwJT6wXshvVK+yQ3ZXh@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Haber @ 2017-02-12 7:10 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Sat, Feb 11, 2017 at 10:00:04PM +0000, BP Jonsson wrote:
> If you make a list of everything you want changed/removed in terms of HTML
> I'll try to write an HTML filter.

So pandoc's output is not configurable in this regard, and there are no
run-time changeable templates being used?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

^ permalink raw reply	[flat|nested] 8+ messages in thread
[parent not found: <20170212071012.GI2488-MEsB+WDYHc7QKvwJT6wXshvVK+yQ3ZXh@public.gmane.org>]
* Re: Simplifying pandoc's HTML output even more
       [not found]         ` <20170212071012.GI2488-MEsB+WDYHc7QKvwJT6wXshvVK+yQ3ZXh@public.gmane.org>
@ 2017-02-13 12:33           ` BP Jonsson
       [not found]             ` <735216ab-2350-d8a5-d582-10d82d7a8d61-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-02-13 15:27           ` John MacFarlane
  1 sibling, 1 reply; 8+ messages in thread
From: BP Jonsson @ 2017-02-13 12:33 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 1376 bytes --]

On 2017-02-12 at 08:10, Marc Haber wrote:
> So pandoc's output is not configurable in this regard, and there are no
> run-time changeable templates being used?

No, you have to post-process. When I need to do that I usually
write an HTML filter based on [Mojo::DOM][] like the one attached
(which should do the trick for you if thead, tbody and tr.class
are the only issues). Of course you need [perl][][^1] and the
Mojo::DOM [modules][] installed, but that should be a piece of
cake. Then pipe pandoc's output through the HTML filter:

    $ pandoc input.md | perl strip-table-parts.pl > output.html

[Mojo::DOM]: https://metacpan.org/pod/Mojo::DOM
[perl]: https://www.perl.org/get.html
[modules]: http://www.cpan.org/modules/INSTALL.html

[^1]: I recommend Strawberry Perl if you are on Windows.

/bpj
[-- Attachment #2: strip-table-parts.pl --]
[-- Type: text/x-perl, Size: 749 bytes --]

#!/usr/bin/env perl

# strip thead, tbody tags and tr classes from HTML (as produced by pandoc)

use utf8;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use open qw(:std :utf8);

use Mojo::DOM;

sub trim {
    my($string) = @_;
    $string =~ s/\A\s+//;    # remove leading whitespace
    $string =~ s/\s+\z//;    # remove trailing whitespace
    return $string;
}

my $html = do { local $/; <>; };    # slurp STDIN

my $dom = Mojo::DOM->new($html);

# replace each thead/tbody element with its own (trimmed) content
my $stripped = $dom->find('thead, tbody');

for my $elem ( @$stripped ) {
    # trim content so no empty lines inside table
    $elem->replace( trim $elem->content );
}

# drop the class attribute from every table row
my $trs = $dom->find('tr');

for my $tr ( @$trs ) {
    delete $tr->attr->{class};
}

print $dom->to_string;

__END__

^ permalink raw reply	[flat|nested] 8+ messages in thread
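For readers without Perl and Mojo::DOM at hand, the same post-processing idea can be sketched with Python's standard library alone. This is an illustrative alternative, not the script attached above: it re-emits the HTML through `html.parser` while dropping the thead and tbody wrappers and the class attribute on tr. The names `TableSimplifier` and `simplify_tables` are invented for this sketch, and because the parser lowercases tag and attribute names and re-quotes attribute values, output may differ from the input in those details.

```python
import html
from html.parser import HTMLParser

DROPPED_TAGS = {"thead", "tbody"}  # wrappers to remove; their children are kept


class TableSimplifier(HTMLParser):
    """Re-emit HTML without <thead>/<tbody> tags and without class on <tr>."""

    def __init__(self):
        # convert_charrefs=False passes entities such as &amp; through unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self._swallow_newline = False

    def _emit(self, text):
        self.out.append(text)
        self._swallow_newline = False

    def _start(self, tag, attrs, closer):
        if tag in DROPPED_TAGS:
            # Also skip the newline right after the tag, so no empty line remains.
            self._swallow_newline = True
            return
        if tag == "tr":
            attrs = [(name, value) for name, value in attrs if name != "class"]
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{html.escape(value)}"'
            for name, value in attrs
        )
        self._emit(f"<{tag}{rendered}{closer}")

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, " />")

    def handle_endtag(self, tag):
        if tag in DROPPED_TAGS:
            self._swallow_newline = True
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        if self._swallow_newline and data.startswith("\n"):
            data = data[1:]
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")

    def handle_pi(self, data):
        self._emit(f"<?{data}>")


def simplify_tables(source):
    """Return the HTML string with the table wrappers and tr classes removed."""
    parser = TableSimplifier()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


SAMPLE = """<table>
<thead>
<tr class="header">
<th align="right">Right</th>
<th>Default</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="right">12</td>
<td>12</td>
</tr>
</tbody>
</table>
"""

if __name__ == "__main__":
    print(simplify_tables(SAMPLE), end="")
```

To use it in a pipeline like the one shown for the Perl filter, the demonstration at the bottom could be replaced by reading `sys.stdin` and writing the result of `simplify_tables` to `sys.stdout`; that wiring is left out here so the sketch runs without any input.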
[parent not found: <735216ab-2350-d8a5-d582-10d82d7a8d61-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: Simplifying pandoc's HTML output even more
       [not found]             ` <735216ab-2350-d8a5-d582-10d82d7a8d61-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-02-16 11:46               ` Marc Haber
  0 siblings, 0 replies; 8+ messages in thread
From: Marc Haber @ 2017-02-16 11:46 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Mon, Feb 13, 2017 at 01:33:30PM +0100, BP Jonsson wrote:
> No, you have to post-process. When I need to do that I usually
> write an HTML filter based on [Mojo::DOM][] like the one attached
> (which should do the trick for you if thead, tbody and tr.class
> are the only issues). Of course you need [perl][][^1] and the
> Mojo::DOM [modules][] installed, but that should be a piece of
> cake. Then pipe pandoc's output through the HTML filter:
>
>     $ pandoc input.md | perl strip-table-parts.pl > output.html

Thanks for the code and the insight into Mojo::DOM, but writing a
custom Lua filter was easier for me. I'll keep Mojo::DOM in mind
though.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: Simplifying pandoc's HTML output even more
       [not found]         ` <20170212071012.GI2488-MEsB+WDYHc7QKvwJT6wXshvVK+yQ3ZXh@public.gmane.org>
  2017-02-13 12:33           ` BP Jonsson
@ 2017-02-13 15:27           ` John MacFarlane
       [not found]             ` <20170213152705.GB67285-l/d5Ua9yGnxXsXJlQylH7w@public.gmane.org>
  1 sibling, 1 reply; 8+ messages in thread
From: John MacFarlane @ 2017-02-13 15:27 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Another option is to use a custom Lua writer (see
the manual).

The example included with pandoc (data/sample.lua)
generates HTML, so it would be easy to modify this
slightly for your needs.

+++ Marc Haber [Feb 12 17 08:10 ]:
>On Sat, Feb 11, 2017 at 10:00:04PM +0000, BP Jonsson wrote:
>> If you make a list of everything you want changed/removed in terms of HTML
>> I'll try to write an HTML filter.
>
>So pandoc's output is not configurable in this regard, and there are no
>run-time changeable templates being used?
>
>Greetings
>Marc
>
>--
>-----------------------------------------------------------------------------
>Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
>Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
>Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

^ permalink raw reply	[flat|nested] 8+ messages in thread
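A custom writer as suggested above would be written in Lua, starting from data/sample.lua. Purely to illustrate the output shape such a modified table function would aim for, here is a small Python sketch (Python rather than Lua, and not pandoc's API): it takes column alignments, header cells and body rows and emits a table with no thead, no tbody and no class attributes. The function name `render_table`, the alignment keywords and the plain-text cells are assumptions made for this example.

```python
from html import escape

# Hypothetical alignment keywords mapped to the attribute text to emit;
# "default" produces no align attribute at all.
ALIGN_ATTR = {
    "right": ' align="right"',
    "left": ' align="left"',
    "center": ' align="center"',
    "default": "",
}


def render_table(aligns, headers, rows):
    """Render a table using only <table>, <tr>, <th> and <td>."""

    def render_row(cell_tag, cells):
        lines = ["<tr>"]
        for align, cell in zip(aligns, cells):
            lines.append(
                f"<{cell_tag}{ALIGN_ATTR[align]}>{escape(cell)}</{cell_tag}>"
            )
        lines.append("</tr>")
        return lines

    lines = ["<table>"]
    lines += render_row("th", headers)  # header row, without a <thead> wrapper
    for row in rows:
        lines += render_row("td", row)  # body rows, without <tbody> or classes
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_table(
            ["right", "default"],
            ["Right", "Default"],
            [["12", "12"], ["123", "123"]],
        )
    )
```

The point of the sketch is only that the restriction is easy to satisfy at generation time: the row renderer never emits the wrappers or classes in the first place, so no post-processing pass is needed.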
[parent not found: <20170213152705.GB67285-l/d5Ua9yGnxXsXJlQylH7w@public.gmane.org>]
* Re: Simplifying pandoc's HTML output even more
       [not found]             ` <20170213152705.GB67285-l/d5Ua9yGnxXsXJlQylH7w@public.gmane.org>
@ 2017-02-16 11:45               ` Marc Haber
  0 siblings, 0 replies; 8+ messages in thread
From: Marc Haber @ 2017-02-16 11:45 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Mon, Feb 13, 2017 at 04:27:05PM +0100, John MacFarlane wrote:
> Another option is to use a custom Lua writer (see
> the manual).
>
> The example included with pandoc (data/sample.lua)
> generates HTML, so it would be easy to modify this
> slightly for your needs.

Perfect. Thanks!

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: Simplifying pandoc's HTML output even more
       [not found] ` <20170211211439.GD2488-MEsB+WDYHc7QKvwJT6wXshvVK+yQ3ZXh@public.gmane.org>
  2017-02-11 22:00   ` BP Jonsson
@ 2017-02-11 22:25   ` John MacFarlane
  1 sibling, 0 replies; 8+ messages in thread
From: John MacFarlane @ 2017-02-11 22:25 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Marc Haber [Feb 11 17 22:14 ]:
>Can I make pandoc omit those tags and attributes, or do I need to do
>post-processing of the generated HTML?

No, you need to do post-processing.

^ permalink raw reply	[flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-02-16 11:46 UTC | newest]

Thread overview: 8+ messages
2017-02-11 21:14 Simplifying pandoc's HTML output even more Marc Haber
     [not found] ` <20170211211439.GD2488-MEsB+WDYHc7QKvwJT6wXshvVK+yQ3ZXh@public.gmane.org>
2017-02-11 22:00   ` BP Jonsson
     [not found]     ` <CAFC_yuRk+EGMRy6Bw0p2u6EiTwSHVwr589MZPm_+da3hZudUiQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-02-12  7:10       ` Marc Haber
     [not found]         ` <20170212071012.GI2488-MEsB+WDYHc7QKvwJT6wXshvVK+yQ3ZXh@public.gmane.org>
2017-02-13 12:33           ` BP Jonsson
     [not found]             ` <735216ab-2350-d8a5-d582-10d82d7a8d61-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-02-16 11:46               ` Marc Haber
2017-02-13 15:27           ` John MacFarlane
     [not found]             ` <20170213152705.GB67285-l/d5Ua9yGnxXsXJlQylH7w@public.gmane.org>
2017-02-16 11:45               ` Marc Haber
2017-02-11 22:25   ` John MacFarlane