public inbox archive for pandoc-discuss@googlegroups.com
* Custom Markdown template
From: 'Angel Blue01' via pandoc-discuss @ 2021-12-08  1:32 UTC
  To: pandoc-discuss


Hello,

I'm relatively new to pandoc and completely new to templates. I have 
already searched the documentation on pandoc.org and could not find much on 
templates per se. There is a great deal on different types of filters, but 
I am not sure what the difference is between templates and filters.

I'm trying to customize the conversion of HTML to Markdown (GitHub 
Markdown).

   - If the page's author and authored date are specified in the <head> 
   tag, insert them on new lines below the page title
   - If there are <header>, <footer> or <nav> elements, those should be 
   skipped and not included in the markdown file
   - If it's possible to also skip parts of the input file according to 
   their class or id attributes, that would be great
   - If there is an <iframe> tag, say for an embedded video, a hyperlink 
   to, or at least the value of, the src attribute should appear in place 
   of the iframe
   - Insert the page's URL in the last line at the bottom of the markdown 
   file

Thanks.


* Re: Custom Markdown template
From: John MacFarlane @ 2021-12-08  2:40 UTC
  To: 'Angel Blue01' via pandoc-discuss, pandoc-discuss

'Angel Blue01' via pandoc-discuss writes:

> Hello,
>
> I'm relatively new to pandoc and completely new to templates. I have 
> already searched the documentation on pandoc.org and could not find much on 
> templates per se. There is a great deal on different types of filters, but 
> I am not sure what the difference is between templates and filters.

Templates: https://pandoc.org/MANUAL.html#templates
Filters:   https://pandoc.org/filters.html

>    - If the page's author and authored date are specified in the <head> 
>    tag, insert them on new lines below the page title

Specified how, exactly?

>    - If there are <header>, <footer> or <nav> elements, those should be 
>    skipped and not included in the markdown file

Pandoc doesn't do this, but I think this could be a reasonable
thing to do.  I'm not sure what others think about the idea.
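
Until pandoc does something like this itself, one workaround is to strip
those elements from the HTML before conversion. The following is only a
minimal sketch using Python's standard html.parser module; the names
ElementSkipper and strip_elements are made up for illustration, and the
code assumes reasonably well-formed markup in which every skipped element
has a matching closing tag.

```python
from html.parser import HTMLParser


class ElementSkipper(HTMLParser):
    """Re-emit HTML while dropping chosen elements and everything inside them."""

    def __init__(self, skip=("header", "footer", "nav")):
        # convert_charrefs=False keeps entities such as &amp; intact
        super().__init__(convert_charrefs=False)
        self.skip = set(skip)
        self.depth = 0  # greater than 0 while inside a skipped element
        self.out = []

    def emit(self, text):
        if self.depth == 0:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in self.skip:
            self.depth += 1
        else:
            self.emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.skip:
            self.depth = max(0, self.depth - 1)
        else:
            self.emit(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/>
        if tag not in self.skip:
            self.emit(self.get_starttag_text())

    def handle_data(self, data):
        self.emit(data)

    def handle_entityref(self, name):
        self.emit(f"&{name};")

    def handle_charref(self, name):
        self.emit(f"&#{name};")

    def handle_comment(self, data):
        self.emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.emit(f"<!{decl}>")


def strip_elements(html, skip=("header", "footer", "nav")):
    """Return html with the named elements and their contents removed."""
    parser = ElementSkipper(skip)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

The cleaned string can then be written to a file or piped to pandoc on
standard input. A dedicated HTML library would cope better with badly
nested markup; this version only tracks nesting depth.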

>    - If it's possible to also skip parts of the input file according to 
>    their class or id attributes, that would be great

Your best bet is probably preprocessing the HTML with a tool
that can omit certain elements, before passing it to pandoc.
But if the parts are divs or spans, you may be able to use
a filter to remove the parsed Div or Span from the AST.
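
As a rough illustration of the filter approach, here is a sketch of a JSON
filter written against pandoc's JSON representation of the AST, using only
the Python standard library. The class and id names in SKIP_CLASSES and
SKIP_IDS are placeholders, and the function names are invented for this
example. It relies on Div and Span elements carrying their attributes as
[id, [classes], [[key, value], ...]] in the first slot of "c".

```python
import json
import sys

# Hypothetical selectors; replace with the classes and ids used on your pages.
SKIP_CLASSES = {"sidebar", "advert"}
SKIP_IDS = {"comments"}


def is_unwanted(node):
    """True for a Div or Span whose id or classes match the skip lists."""
    if not (isinstance(node, dict) and node.get("t") in ("Div", "Span")):
        return False
    identifier, classes, _keyvals = node["c"][0]
    return identifier in SKIP_IDS or bool(SKIP_CLASSES.intersection(classes))


def prune(node):
    """Return a copy of the AST with unwanted Div and Span elements removed."""
    if isinstance(node, list):
        return [prune(child) for child in node if not is_unwanted(child)]
    if isinstance(node, dict):
        return {key: prune(value) for key, value in node.items()}
    return node


def main():
    # pandoc sends the document as JSON on stdin and reads the result from stdout
    json.dump(prune(json.load(sys.stdin)), sys.stdout)

# To use this file as a filter, call main() as the last line of the script.
```

Saved under a name of your choosing (say strip_divs.py, with the main()
call added), it could be run with something like
`pandoc -f html -t gfm --filter strip_divs.py input.html`.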

>    - If there is an <iframe> tag, say for an embedded video, a hyperlink 
>    to, or at least the value of, the src attribute should appear in place 
>    of the iframe

Currently pandoc will deal with this by attempting to download
the content at the URL in the src attribute; for videos this often
gives a bad result, which, though it does include the URL, could be
improved. If you like, you can submit an issue to the issue tracker.

If you do

    pandoc -f html+raw_html

which enables raw html blocks in the AST, you'll just get the
raw contents of the iframe, which you could intercept with a
filter and turn into something else.
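
A filter along those lines might look like the sketch below, again a JSON
filter using only the Python standard library. It makes several
assumptions that should be checked against real output (for instance with
`pandoc -f html+raw_html -t json`): that the iframe arrives as a top-level
RawBlock with format "html", that the src value is quoted, and that a
separate RawBlock holding just the closing tag may follow. The function
names are invented for this example.

```python
import json
import re
import sys

# Captures the quoted src attribute of an opening <iframe> tag.
IFRAME_SRC = re.compile(
    r"""<iframe\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE
)


def link_paragraph(url):
    """Build the JSON for a paragraph holding one link whose text is the URL."""
    empty_attr = ["", [], []]
    link = {"t": "Link", "c": [empty_attr, [{"t": "Str", "c": url}], [url, ""]]}
    return {"t": "Para", "c": [link]}


def replace_iframes(blocks):
    """Swap raw <iframe> blocks in a top-level block list for link paragraphs."""
    result = []
    for block in blocks:
        if block.get("t") == "RawBlock" and block["c"][0] == "html":
            html = block["c"][1]
            match = IFRAME_SRC.search(html)
            if match:
                result.append(link_paragraph(match.group(1)))
                continue
            if html.strip().lower() == "</iframe>":
                continue  # drop a closing tag left over from a converted iframe
        result.append(block)
    return result


def main():
    doc = json.load(sys.stdin)
    doc["blocks"] = replace_iframes(doc["blocks"])
    json.dump(doc, sys.stdout)

# To use this file as a filter, call main() as the last line of the script.
```

Only top-level blocks are handled here; an iframe nested inside a Div or
a list item would need a recursive walk.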

>    - Insert the page's URL in the last line at the bottom of the markdown 
>    file

See the manual under sourcefile.  The sourcefile variable will be
set and you can use it in a template.
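
For example, a very small template could print the body and then the
source. The sketch below keeps the template text in a Python string and
builds the matching pandoc command line; TEMPLATE, write_template and
build_command are illustrative names, not part of pandoc. It assumes the
page is converted by passing its URL (or file name) to pandoc, so that
sourcefile holds what was given on the command line; the manual notes
that sourcefile can be a list, hence the loop.

```python
# Minimal custom template: the converted body, then each source on its own line.
TEMPLATE = """$body$

$if(sourcefile)$
$for(sourcefile)$
Source: $sourcefile$
$endfor$
$endif$
"""


def write_template(path):
    """Write the template text to path and return the path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(TEMPLATE)
    return path


def build_command(source, output, template_path):
    """Return the pandoc argument list; source may be a file name or a URL."""
    return [
        "pandoc", "-f", "html", "-t", "gfm",
        f"--template={template_path}",
        "-o", output,
        source,
    ]
```

The resulting list can be passed to subprocess.run(..., check=True) once
the template file has been written. Note that this template drops the
title block of the default Markdown template; start from the output of
`pandoc -D markdown` if you want to keep it.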

