* Simplifying pandoc's HTML output even more
From: Marc Haber @ 2017-02-11 21:14 UTC
To: pandoc-discuss
Hi,
I am using pandoc to generate simple HTML from Markdown. Simple HTML
is required because the German tax authority wants footnotes and
explanations in a rather limited subset of XHTML.
For example, here is a test Markdown input:
Right Left Center Default
------- ------ -------- -------
12 12 12 12
123 123 123 123
1 1 1 1
This creates the following HTML:
<table>
<thead>
<tr class="header">
<th align="right">Right</th>
<th align="right">Left</th>
<th align="right">Center</th>
<th>Default</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="right">12</td>
<td align="right">12</td>
<td align="right">12</td>
<td>12</td>
</tr>
<tr class="even">
<td align="right">123</td>
<td align="right">123</td>
<td align="right">123</td>
<td>123</td>
</tr>
<tr class="odd">
<td align="right">1</td>
<td align="right">1</td>
<td align="right">1</td>
<td>1</td>
</tr>
</tbody>
</table>
This HTML does not pass the tax authority's validation because of the
thead and tbody tags and the class attribute on the tr tags.
Can I make pandoc omit those tags and attributes, or do I need to do
post-processing of the generated HTML?
Greetings
Marc
--
-----------------------------------------------------------------------------
Marc Haber | "I don't trust Computers. They | Mail address in header
Leimen, Germany | lose things." Winona Ryder | Phone: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
* Re: Simplifying pandoc's HTML output even more
From: BP Jonsson @ 2017-02-11 22:00 UTC
To: pandoc-discuss
If you make a list of everything you want changed or removed in the HTML,
I'll try to write an HTML filter.
/bpj
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAFC_yuRk%2BEGMRy6Bw0p2u6EiTwSHVwr589MZPm_%2Bda3hZudUiQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
* Re: Simplifying pandoc's HTML output even more
From: John MacFarlane @ 2017-02-11 22:25 UTC
To: pandoc-discuss
+++ Marc Haber [Feb 11 17 22:14 ]:
>Can I make pandoc omit those tags and attributes, or do I need to do
>post-processing of the generated HTML?
No, you need to do post-processing.
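For illustration, here is a minimal post-processing sketch in Perl (the language of the filter attached later in this thread). It is hypothetical: the script and its simplify_tables function are not from pandoc or from this thread, and it uses plain core-Perl regexes instead of an HTML parser, which is only reasonable because pandoc writes its table markup in a regular, predictable form. It removes the thead/tbody tags and the class attribute on tr tags and leaves the align attributes alone.

```perl
#!/usr/bin/env perl
# Hypothetical, dependency-free sketch (not the attached
# strip-table-parts.pl): removes <thead>/<tbody> tags and class
# attributes on <tr> with regexes. It relies on pandoc emitting
# regular table markup; it is not a general-purpose HTML parser.
use strict;
use warnings;

sub simplify_tables {
    my ($html) = @_;
    # Drop opening and closing thead/tbody tags, plus the newline
    # pandoc writes after each one, so no empty lines are left behind.
    $html =~ s{[ \t]*</?(?:thead|tbody)\b[^>]*>[ \t]*\n?}{}gi;
    # Remove a class="..." attribute from <tr> start tags,
    # keeping any other attributes.
    $html =~ s{(<tr\b[^>]*?)\s+class="[^"]*"}{$1}gi;
    return $html;
}

# Act as a filter (STDIN to STDOUT) when run as a script.
unless (caller) {
    my $html = do { local $/; <STDIN> } // '';
    print simplify_tables($html);
}
```

Assuming the script is saved under the hypothetical name simplify-tables.pl, it is used as a pipe: `pandoc input.md | perl simplify-tables.pl > output.html`. If the input may contain arbitrary or hand-written HTML rather than only pandoc's own tables, a DOM-based filter (for example one built on Mojo::DOM) is the more robust choice.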
* Re: Simplifying pandoc's HTML output even more
From: Marc Haber @ 2017-02-12 7:10 UTC
To: pandoc-discuss
On Sat, Feb 11, 2017 at 10:00:04PM +0000, BP Jonsson wrote:
> If you make a list of everything you want changed/removed in terms of HTML
> I'll try to write an HTML filter.
So pandoc's output is not configurable in this regard; there are no
run-time changeable templates being used?
Greetings
Marc
* Re: Simplifying pandoc's HTML output even more
From: BP Jonsson @ 2017-02-13 12:33 UTC
To: pandoc-discuss
On 2017-02-12 at 08:10, Marc Haber wrote:
> So pandoc's output is not configurable in this regard; there are no
> run-time changeable templates being used?
No, you have to post-process. When I need to do that, I usually
write an HTML filter based on [Mojo::DOM][] like the one attached,
which should do the trick for you if thead, tbody and tr.class
are the only issues. Of course you need [perl][][^1] and the
Mojo::DOM [modules][] installed, but that should be a piece of
cake. Then pipe pandoc's output through the HTML filter:
$ pandoc input.md | perl strip-table-parts.pl > output.html
[Mojo::DOM]: https://metacpan.org/pod/Mojo::DOM
[perl]: https://www.perl.org/get.html
[modules]: http://www.cpan.org/modules/INSTALL.html
[^1]: I recommend Strawberry Perl if you are on Windows.
/bpj
[-- Attachment #2: strip-table-parts.pl --]
[-- Type: text/x-perl, Size: 749 bytes --]
#!/usr/bin/env perl
# strip thead, tbody tags and tr classes from HTML (as produced by pandoc)
use utf8;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use open qw(:std :utf8);
use Mojo::DOM;
sub trim {
    my($string) = @_;
    $string =~ s/\A\s+//; # remove leading whitespace
    $string =~ s/\s+\z//; # remove trailing whitespace
    return $string;
}
my $html = do { local $/; <>; }; # slurp STDIN (or files named on the command line)
my $dom = Mojo::DOM->new($html);
# replace each thead/tbody element with its own content
my $stripped = $dom->find('thead, tbody');
for my $elem ( @$stripped ) {
    # trim content so no empty lines inside table
    $elem->replace( trim $elem->content );
}
# drop the class attribute (header/odd/even) from every table row
my $trs = $dom->find('tr');
for my $tr ( @$trs ) {
    delete $tr->attr->{class};
}
print $dom->to_string;
__END__
* Re: Simplifying pandoc's HTML output even more
From: John MacFarlane @ 2017-02-13 15:27 UTC
To: pandoc-discuss
Another option is to use a custom Lua writer (see
the manual).
The example included with pandoc (data/sample.lua)
generates HTML, so it would be easy to adapt it
slightly to your needs.
+++ Marc Haber [Feb 12 17 08:10 ]:
>On Sat, Feb 11, 2017 at 10:00:04PM +0000, BP Jonsson wrote:
>> If you make a list of everything you want changed/removed in terms of HTML
>> I'll try to write an HTML filter.
>
>So pandoc's output is not configurable in this regard; there are no
>run-time changeable templates being used?
* Re: Simplifying pandoc's HTML output even more
From: Marc Haber @ 2017-02-16 11:45 UTC
To: pandoc-discuss
On Mon, Feb 13, 2017 at 04:27:05PM +0100, John MacFarlane wrote:
> Another option is to use a custom lua writer (see
> the manual).
>
> The example included with pandoc (data/sample.lua)
> generates HTML, so it would be easy to modify this
> slightly for your needs.
Perfect. Thanks!
Greetings
Marc
* Re: Simplifying pandoc's HTML output even more
From: Marc Haber @ 2017-02-16 11:46 UTC
To: pandoc-discuss
On Mon, Feb 13, 2017 at 01:33:30PM +0100, BP Jonsson wrote:
> No, you have to post-process. When I need to do that, I usually
> write an HTML filter based on [Mojo::DOM][] like the one attached,
> which should do the trick for you if thead, tbody and tr.class
> are the only issues. Of course you need [perl][][^1] and the
> Mojo::DOM [modules][] installed, but that should be a piece of
> cake. Then pipe pandoc's output through the HTML filter:
>
> $ pandoc input.md | perl strip-table-parts.pl > output.html
Thanks for the code and the insight into Mojo::DOM, but writing a
custom Lua writer was easier for me. I'll keep Mojo::DOM in mind, though.
Greetings
Marc