* Custom Markdown template
@ 2021-12-08 1:32 'Angel Blue01' via pandoc-discuss
From: 'Angel Blue01' via pandoc-discuss @ 2021-12-08 1:32 UTC (permalink / raw)
To: pandoc-discuss
Hello,
I'm relatively new to pandoc and completely new to templates. I have
already searched the documentation on pandoc.org and could not find much on
templates per se. There is a great deal on different types of filters, but
I am not sure what the difference is between templates and filters.
I'm trying to customize the conversion of HTML to Markdown (GitHub
Markdown).
- If the page's author and authored date are specified in the <head>
tag, insert them on new lines below the page title
- If there are <header>, <footer> or <nav> elements, those should be
skipped and not included in the markdown file
- If it's possible to also skip parts of the input file according to
their class or id attributes, that would be great
- If there is an <iframe> tag, say for an embedded video, a hyperlink
to, or at least the value of, the src attribute should appear in place
of the iframe
- Insert the page's URL in the last line at the bottom of the markdown
file
Thanks.
* Re: Custom Markdown template
From: John MacFarlane @ 2021-12-08 2:40 UTC (permalink / raw)
To: 'Angel Blue01' via pandoc-discuss, pandoc-discuss
"'Angel Blue01' via pandoc-discuss"
<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> writes:
> Hello,
>
> I'm relatively new to pandoc and completely new to templates. I have
> already searched the documentation on pandoc.org and could not find much on
> templates per se. There is a great deal on different types of filters, but
> I am not sure what the difference is between templates and filters.
Templates: https://pandoc.org/MANUAL.html#templates
Filters: https://pandoc.org/filters.html
> - If the page's author and authored date are specified in the <head>
> tag, insert them on new lines below the page title
Specified how, exactly?
> - If there are <header>, <footer> or <nav> elements, those should be
> skipped and not included in the markdown file
Pandoc doesn't do this, but I think this could be a reasonable
thing to do. I'm not sure what others think about the idea.
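One workaround is to strip those elements before pandoc reads the HTML. What follows is only a minimal sketch, using Python's standard-library html.parser and assuming reasonably well-formed input; the names ElementStripper and strip_elements are hypothetical, not part of pandoc.

```python
import sys
from html.parser import HTMLParser

# Elements to drop together with everything nested inside them.
SKIP = {"header", "footer", "nav"}


class ElementStripper(HTMLParser):
    """Re-emit the HTML it is fed, minus the elements named in `skip`."""

    def __init__(self, skip=SKIP):
        # convert_charrefs=False keeps entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.skip = set(skip)
        self.depth = 0  # > 0 while inside a skipped element
        self.out = []

    def _emit(self, text):
        if self.depth == 0:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in self.skip:
            self.depth += 1
        else:
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag not in self.skip:
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.skip:
            self.depth = max(0, self.depth - 1)
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def strip_elements(html, skip=SKIP):
    parser = ElementStripper(skip)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sys.stdout.write(strip_elements(sys.stdin.read()))
```

Saved under a hypothetical name such as strip_elements.py, it could be used as `python strip_elements.py < page.html | pandoc -f html -t gfm`.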
> - If it's possible to also skip parts of the input file according to
> their class or id attributes, that would be great
Your best bet is probably preprocessing the HTML with a tool
that can omit certain elements, before passing it to pandoc.
But if the parts are divs or spans, you may be able to use
a filter to remove the parsed Div or Span from the AST.
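The filter route could look roughly like the following JSON filter. This is a sketch under assumptions: the class and id names in DROP_CLASSES and DROP_IDS are made up for illustration, the function names are hypothetical, and it relies on the parts in question actually arriving in the AST as Div or Span nodes.

```python
import json
import sys

# Hypothetical class and id names; replace with the ones used on your pages.
DROP_CLASSES = {"sidebar", "advert"}
DROP_IDS = {"comments"}


def is_unwanted(node):
    """True for a Div or Span whose id or classes mark it for removal."""
    if not isinstance(node, dict) or node.get("t") not in ("Div", "Span"):
        return False
    # The first field of a Div/Span is its attributes: [id, [classes], [[key, value], ...]]
    ident, classes, _keyvals = node["c"][0]
    return ident in DROP_IDS or bool(DROP_CLASSES.intersection(classes))


def prune(node):
    """Return a copy of the AST fragment without the unwanted Divs and Spans."""
    if isinstance(node, list):
        return [prune(child) for child in node if not is_unwanted(child)]
    if isinstance(node, dict):
        return {key: prune(value) for key, value in node.items()}
    return node


def main():
    raw = sys.stdin.read()
    if not raw.strip():
        return  # no document on stdin, nothing to do
    doc = json.loads(raw)
    doc["blocks"] = prune(doc["blocks"])
    json.dump(doc, sys.stdout)


if __name__ == "__main__":
    main()
```

With the script saved under a hypothetical name such as drop_by_attr.py, it would be run as `pandoc -f html -t gfm --filter ./drop_by_attr.py page.html`.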
> - If there is an <iframe> tag, say for an embedded video, a hyperlink
> to, or at least the value of, the src attribute should appear in place
> of the iframe
Currently pandoc deals with this by attempting to download
the content that the src attribute points to; for videos this often
gives a bad result which, though it does include the URL, could be improved.
If you like, you can submit an issue to the issue tracker.
If you do
    pandoc -f html+raw_html
which enables raw HTML blocks in the AST, you'll just get the
raw contents of the iframe, which you could intercept with a
filter and turn into something else.
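A sketch of such an intercepting filter, written as a plain JSON filter in Python. It assumes the iframe shows up as a RawBlock or RawInline with format "html" whose text starts with the opening tag, possibly followed by a separate raw node for the closing tag; the exact shape may differ between pandoc versions, and the function names are hypothetical.

```python
import json
import re
import sys

# Opening <iframe ...> tag with a quoted src attribute.
IFRAME_SRC = re.compile(r"""<iframe\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
# A raw node that holds nothing but the closing tag.
IFRAME_CLOSE = re.compile(r"\s*</iframe\s*>\s*", re.IGNORECASE)

DROP = object()  # sentinel: remove this node from its parent list


def make_link(url):
    """Link inline whose text and target are both the URL."""
    return {"t": "Link", "c": [["", [], []], [{"t": "Str", "c": url}], [url, ""]]}


def convert(node):
    """Return a replacement for raw iframe HTML, DROP, or the node unchanged."""
    if isinstance(node, dict) and node.get("t") in ("RawBlock", "RawInline"):
        fmt, text = node["c"]
        if fmt == "html":
            if IFRAME_CLOSE.fullmatch(text):
                return DROP
            match = IFRAME_SRC.search(text)
            if match:
                link = make_link(match.group(1))
                # A block-level iframe becomes a paragraph holding the link.
                return {"t": "Para", "c": [link]} if node["t"] == "RawBlock" else link
    return node


def rewrite(node):
    """Walk the AST, replacing iframe raw HTML wherever it occurs."""
    if isinstance(node, list):
        converted = [convert(child) for child in node]
        return [rewrite(child) for child in converted if child is not DROP]
    if isinstance(node, dict):
        return {key: rewrite(value) for key, value in node.items()}
    return node


def main():
    raw = sys.stdin.read()
    if not raw.strip():
        return  # no document on stdin, nothing to do
    doc = json.loads(raw)
    doc["blocks"] = rewrite(doc["blocks"])
    json.dump(doc, sys.stdout)


if __name__ == "__main__":
    main()
```

Saved as, say, iframe_to_link.py (a hypothetical name), it would be run as `pandoc -f html+raw_html -t gfm --filter ./iframe_to_link.py page.html`.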
> - Insert the page's URL in the last line at the bottom of the markdown
> file
See the manual under sourcefile. The sourcefile variable will be
set and you can use it in a template.